2048 is a game about merging powers of 2 into higher powers of 2 by sliding tiles up, down, left, and right, with the goal of reaching the 2048 tile in a single game. Success depends on careful space management, since newly spawned tiles can appear on any free cell and cannot merge on the move that created them, and on delayed rewards, which require the agent to plan multi-step sequences toward a favorable board state.

The agent takes as input the current board state along with its previous moves, and outputs the next action, aiming to reach the 2048 tile through a learned policy.
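As a concrete sketch of one possible input representation (the channel count and exponent encoding here are assumptions, not a fixed design), the board can be stored as a 4×4 grid of tile exponents and one-hot encoded per cell before being fed to a value network:

```python
import numpy as np

def encode_board(board, num_channels=16):
    """One-hot encode a 4x4 board of tile exponents (0 = empty, k = tile 2**k).

    Returns a (num_channels, 4, 4) float array suitable as network input.
    """
    encoded = np.zeros((num_channels, 4, 4), dtype=np.float32)
    for r in range(4):
        for c in range(4):
            encoded[board[r, c], r, c] = 1.0  # mark the cell's exponent channel
    return encoded

board = np.array([
    [1, 0, 0, 0],   # a 2-tile anchored in the top-left corner
    [0, 2, 0, 0],   # a 4-tile
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
x = encode_board(board)
print(x.shape)     # (16, 4, 4)
print(x[1, 0, 0])  # 1.0 -- channel 1 marks the 2-tile
```

Per-channel one-hot encoding avoids implying that tile values are linearly comparable, which helps a network distinguish, say, a 512 tile from a 256 tile as categorically different cells.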

Minimum 512

Implement a basic RL agent using Q-learning or DQN that plays 2048 coherently, either reaching a 512 tile or performing measurably better than random moves.
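The tabular Q-learning core for this milestone could look like the following minimal sketch; the hyperparameter values, string action names, and the hashable-tuple state assumption are all illustrative choices, not decided parts of the project:

```python
import random
from collections import defaultdict

ACTIONS = ["up", "right", "down", "left"]
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # assumed hyperparameters

# Q maps (state, action) -> value; a state is a hashable tuple of tile exponents.
Q = defaultdict(float)

def choose_action(state):
    """Epsilon-greedy selection over the four moves."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One-step Q-learning backup:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

A pure lookup table will not scale to 2048's enormous state space, which is exactly what motivates replacing `Q` with a DQN function approximator later.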

Realistic 1024

Compare different RL algorithms against each other — testing varying reward thresholds and strategies — to evaluate performance differences and examine learning curves over time.

Moonshot 2048

Focus on a single RL algorithm until the agent reaches the 2048 tile somewhat consistently, or extend the approach to n×n board configurations.

The primary approach uses model-free, off-policy methods — specifically Q-learning and Deep Q-Networks (DQN). These are well-suited to 2048's structure: a discrete action space, clear state representation as a 4×4 grid, and rewards tied to tile merges.
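The merge-based reward can be made concrete with a small sketch of the row-slide mechanic, using the standard 2048 scoring rule (each merge scores the value of the newly created tile). Storing tiles as exponents is an implementation assumption:

```python
def slide_and_merge(row):
    """Slide one row left, merging equal adjacent tiles at most once per move.

    Tiles are stored as exponents (k means 2**k). Returns (new_row, reward),
    where reward is the total value of tiles created by merges this move.
    """
    tiles = [t for t in row if t != 0]  # compress out empty cells
    merged, reward, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] + 1)       # 2**k + 2**k = 2**(k+1)
            reward += 2 ** (tiles[i] + 1)     # score the new tile's value
            i += 2                            # a merged tile can't merge again this move
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), reward

# Exponents [1, 1, 2, 0] represent tiles [2, 2, 4, _]:
print(slide_and_merge([1, 1, 2, 0]))  # ([2, 2, 0, 0], 4)
```

The other three move directions reduce to this same routine by rotating or reflecting the grid, so the merge reward is computed identically for every action.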

If time allows, the project may explore interactive reinforcement learning — benchmarking the agent against varying human skill levels as an additional evaluation axis.

Quantitative

Track highest tile reached and average score across training runs. Compare against a random-move baseline and a greedy tile-merging heuristic.
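A small helper for aggregating these metrics across training runs might look like the following; the field names and example numbers are illustrative, not real results:

```python
import statistics

def summarize_runs(episode_results):
    """Aggregate per-episode (score, highest_tile) pairs into summary metrics.

    Returns mean score, best tile seen, and the fraction of episodes
    that reached at least a 512 tile (the Minimum milestone).
    """
    scores = [score for score, _ in episode_results]
    tiles = [tile for _, tile in episode_results]
    return {
        "mean_score": statistics.mean(scores),
        "max_tile": max(tiles),
        "tile_512_rate": sum(t >= 512 for t in tiles) / len(tiles),
    }

# Hypothetical episode results: (final score, highest tile reached)
runs = [(3120, 256), (5204, 512), (2788, 256), (6480, 512)]
print(summarize_runs(runs))
```

Running the same summary over the random-move baseline and the greedy heuristic gives directly comparable numbers for each policy.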

Qualitative

Examine emergent strategies: corner anchoring, edge consolidation, multi-step merge planning. Visualize board state over time and Q-value heatmaps.

Visualizations will include board state playback over game episodes and Q-value heatmaps showing which moves the agent values in a given state, making the learned strategy interpretable and comparable across algorithms.

AI tooling was used to assist with installation and setup of training environments.

PowerOf2 · CS 175 Status Report