A deep reinforcement learning experiment pitting three algorithms against each other (DQN, MCTS, and PPO) to see which one masters the art of exponential tile merging.
The Proximal Policy Optimization agent navigates the 4×4 grid through trial and error, learning corner strategies and merge sequences that push tiles to their theoretical maximum.
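
To make "exponential tile merging" concrete, here is a minimal sketch of how a single row might collapse when the agent moves left. It assumes 2048-style rules (0 marks an empty cell, equal neighbours merge into their sum, and a merged tile cannot merge again in the same move). The function name `merge_row_left` and the idea of using the merged value as a reward are illustrative assumptions, not taken from the project's code.

```python
from typing import List, Tuple


def merge_row_left(row: List[int]) -> Tuple[List[int], int]:
    """Slide one row to the left and merge equal neighbours once per move.

    Hypothetical helper: assumes 2048-style rules where 0 is an empty cell.
    Returns the new row and the sum of the tiles created by merging,
    which an agent could use as a per-move reward signal.
    """
    tiles = [v for v in row if v != 0]  # drop empty cells, keep order
    merged: List[int] = []
    reward = 0
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            # Two equal neighbours combine into one tile of double the value.
            merged.append(tiles[i] * 2)
            reward += tiles[i] * 2
            i += 2  # skip both source tiles so the new tile cannot merge again
        else:
            merged.append(tiles[i])
            i += 1
    # Pad with empty cells so the row keeps its original width.
    merged += [0] * (len(row) - len(merged))
    return merged, reward


if __name__ == "__main__":
    print(merge_row_left([2, 2, 4, 0]))
```

The other three move directions can be built from the same helper by reversing or transposing the grid first, so the merge rule only has to be written once.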