Setting up
This lab should be done in groups of 1 or 2 students (3 is tolerated if there is an uneven number of students and you do not want to work alone).
To get the base code for the lab, you should:
- choose your group members
- head to the lab's github classroom
- create a team or join an existing one
- the team name should follow the convention
{GROUP} {SURNAME1} / {SURNAME2}
- accept the assignment
This process will create a new repository for your group with the base code for the lab. The repositories are private, and only the members of the group and the teachers can see them.
You can now clone the repository and start working on the lab.
Repository content
- `src/board.rs`: the board representation
- `src/search.rs`: placeholder for the search algorithm (expectimax)
- `src/eval.rs`: provided evaluation function (discussed in the course)
- `src/main.rs`: main file to run the game
- `src/bench.rs`: alternative executable to evaluate your algorithms on several problems
Board representation
The `board.rs` file contains the board representation. It is a 4x4 grid of `u8` values, with a few helper functions to manipulate it.
The `Board` struct in the file is the one that implements most of the game logic. However, you should not manipulate it directly.
Instead, we provide two wrapper types around it:
- `PlayableBoard`: a board that can be played on (i.e. you can move on it). It provides the following methods:
  - `init` to create a new board with a single random tile
  - `apply` to apply an action to the board. This method returns either `None` if the action is invalid, or a `RandableBoard` with the action applied.
- `RandableBoard`: a board that can be randomly filled with a new tile. It provides the following methods:
  - `with_random_tile` to fill the board with a new tile in an arbitrary empty cell. The method returns a `PlayableBoard` with the new tile added.
  - `successors` to generate all possible successors of the board. The method returns an iterator of `PlayableBoard`s together with the probability of getting each board.
  - `evaluate` to evaluate the board, using the implementation provided in `eval.rs`
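To illustrate how the two wrapper types are meant to alternate, here is a small usage sketch. The method names follow the description above, but the exact signatures (and the `Action` type used below) are assumptions; check `board.rs` for the real declarations.

```rust
use crate::board::{Action, PlayableBoard};

// Sketch of one play/randomize cycle: PlayableBoard --apply--> RandableBoard
// --with_random_tile--> PlayableBoard. Assumes `Action` is a small Copy enum
// with the four move directions.
fn one_step() {
    let board = PlayableBoard::init(); // fresh board with a single random tile
    for action in [Action::Up, Action::Down, Action::Left, Action::Right] {
        // `apply` returns None when the action is not applicable.
        if let Some(randable) = board.apply(action) {
            let value = randable.evaluate(); // heuristic value from eval.rs
            println!("{:?} -> {:.2}", action, value);
            // Placing the random tile gives back a playable board.
            let _next: PlayableBoard = randable.with_random_tile();
        }
    }
}
```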
Main
The `main.rs` file contains the main entry point of the program. It is a simple loop that will:
- create an initial board
- print it
- ask for a move (using the `search::select_action` function)
- apply the move
- add a random tile
- repeat until the game is over
The current implementation of `select_action` is a naive one that simply returns a random applicable action.
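In outline, the loop looks roughly like the sketch below (simplified, with assumed signatures; the actual code in `main.rs` is the reference).

```rust
use crate::board::PlayableBoard;
use crate::search;

// Simplified outline of the game loop (assumes `select_action` takes the
// current board and returns `None` when no action is applicable).
fn run_game() {
    let mut board = PlayableBoard::init();
    loop {
        println!("{:?}", board); // print the current board
        let Some(action) = search::select_action(&board) else {
            break; // no applicable action: the game is over
        };
        let randable = board
            .apply(action)
            .expect("selected action should be applicable");
        board = randable.with_random_tile(); // add a random tile, then repeat
    }
}
```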
To run the program, simply run (the `--bin` argument is needed to specify the binary to run):

```
cargo run --release --bin main
```
Evaluating your AI
The `bench.rs` file is an alternative executable that allows you to evaluate your AI on a set of problems.
It will run your AI on each problem and output the average score and the average number of moves. For instance, the command line below will run your AI on 100 problems and provide you with some statistics on its performance.
```
cargo run --release --bin bench -- --num-games 100
```
This is primarily intended for statistical evaluation but can be slow when the algorithm becomes complex. To get a feeling for how your AI performs, running it on a single problem with the `main` executable is usually sufficient.
Greedy strategy
At this point, the `search::select_action` function just calls a predefined `select_action_randomly` that randomly selects an action among the applicable ones.
- TODO: implement the `select_action_greedily` function so that it:
  - simulates all possible actions
  - evaluates all applicable ones (the `RandableBoard` type provides an `evaluate` method)
  - returns the action with the highest evaluation
  - or returns `None` if no action is applicable
This is called a greedy method: it selects the seemingly best action based on a heuristic, without any further thinking or search.
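A possible shape for `select_action_greedily`, under the same API assumptions as the earlier sketches (adapt names and signatures to the actual declarations in `search.rs`):

```rust
use crate::board::{Action, PlayableBoard};

// Greedy choice: simulate every action, evaluate the applicable ones, and
// return the action with the highest evaluation (or None if nothing applies).
fn select_action_greedily(board: &PlayableBoard) -> Option<Action> {
    let mut best_action = None;
    let mut best_value = f32::NEG_INFINITY;
    for action in [Action::Up, Action::Down, Action::Left, Action::Right] {
        if let Some(randable) = board.apply(action) {
            let value = randable.evaluate(); // heuristic from eval.rs
            if value > best_value {
                best_value = value;
                best_action = Some(action);
            }
        }
    }
    best_action
}
```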
- TODO: evaluate the greedy strategy:
  - call it in the `select_action` method
  - run the benchmarks for it
  - how does it compare to the random move strategy?
Expectimax
At this point, the evaluation function serves as an estimation for the utility of a state.
As you may have seen, this already improves upon a purely random play but leaves much to be desired. In this part, we aim to improve upon this by using the Expectimax algorithm.
The key idea behind Expectimax is to improve our evaluation of the available actions by simulating the results that can be expected after a fixed number of actions.
You will find in the `search.rs` file the function declarations to be filled in according to the following algorithm (also seen in the course):
```
select_action_expectimax(s, max_depth):
    applicable_actions = { actions that are applicable in s }
    return the applicable action a that maximizes eval_randable(result(s, a), max_depth)

eval_playable(s, d) =
    applicable_actions = { actions that are applicable in s }
    successors = { result(s, action) | action in applicable_actions }
    max { eval_randable(succ, d-1) | succ in successors }

eval_randable(s, d) =
    if d == 0:
        evaluate(s)
    else:
        Sum { p * eval_playable(succ, d) | (p, succ) in successors(s) }
```
- TODO: implement the three functions above
- TODO: evaluate the algorithm with various depth limits:
- in terms of performance (average number of actions played in a game)
- in terms of runtime
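To make the recursion concrete, here is one way the three functions can be laid out in Rust. It follows the pseudocode above; the types, the `(probability, board)` ordering of `successors`, and the handling of dead ends are assumptions to adapt to the actual declarations in `search.rs`.

```rust
use crate::board::{Action, PlayableBoard, RandableBoard};

// Assumes `Action` is a small Copy enum with the four move directions.
const ACTIONS: [Action; 4] = [Action::Up, Action::Down, Action::Left, Action::Right];

// Root: pick the applicable action whose resulting chance node has the
// highest expectimax value.
fn select_action_expectimax(board: &PlayableBoard, max_depth: usize) -> Option<Action> {
    let mut best_action = None;
    let mut best_value = f32::NEG_INFINITY;
    for action in ACTIONS {
        if let Some(randable) = board.apply(action) {
            let value = eval_randable(&randable, max_depth);
            if value > best_value {
                best_value = value;
                best_action = Some(action);
            }
        }
    }
    best_action
}

// Max node: the player picks the best applicable action.
fn eval_playable(board: &PlayableBoard, depth: usize) -> f32 {
    ACTIONS
        .iter()
        .filter_map(|&action| board.apply(action))
        .map(|randable| eval_randable(&randable, depth - 1))
        // No applicable action means the game is over from here; NEG_INFINITY
        // is one (crude) way to penalize that.
        .fold(f32::NEG_INFINITY, f32::max)
}

// Chance node: probability-weighted average over all tile placements,
// or the heuristic evaluation once the depth budget is exhausted.
fn eval_randable(board: &RandableBoard, depth: usize) -> f32 {
    if depth == 0 {
        board.evaluate()
    } else {
        board
            .successors()
            .map(|(p, succ)| p * eval_playable(&succ, depth))
            .sum()
    }
}
```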
Going Beyond: Algorithmic improvements
Transposition table (caching)
As the number of visited states increases, the likelihood of visiting the same state multiple times also increases. With a naive implementation of the search algorithm, this can lead to a lot of redundant computation.
Caching the results of the evaluation function can help reduce the number of evaluations.
- TODO: cache the value of `eval_randable(s, d)` into a `HashMap`
  - the hash map should be created once and passed as an additional parameter to the `eval_randable` and `eval_playable` functions
The whole purpose of using expectimax is to increase the precision of the evaluation of the game state. Essentially, the more remaining actions are allowed, the more precise the evaluation is (the algorithm will average over more simulations).
When caching an evaluation, it is essential to store the evaluation of the state (`f32`) together with the number of remaining actions (`usize`) that were available when it was computed.
When looking up a cached evaluation, you should only use it if it was computed with the same number of remaining actions or more (i.e. the cached evaluation is at least as precise as the one you were about to compute).
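A sketch of what the cached chance-node evaluation could look like. It assumes the board type can be used as a `HashMap` key (i.e. implements `Hash`, `Eq`, and `Clone`) and that `eval_playable_cached` is `eval_playable` with the cache threaded through; all names are placeholders.

```rust
use std::collections::HashMap;

use crate::board::RandableBoard;

// Value stored in the cache: the evaluation together with the depth
// (remaining actions) it was computed with.
type Cache = HashMap<RandableBoard, (usize, f32)>;

fn eval_randable_cached(board: &RandableBoard, depth: usize, cache: &mut Cache) -> f32 {
    // Only reuse a cached value computed with at least as many remaining
    // actions, i.e. at least as precise as what we are about to compute.
    if let Some(&(cached_depth, value)) = cache.get(board) {
        if cached_depth >= depth {
            return value;
        }
    }
    let value = if depth == 0 {
        board.evaluate()
    } else {
        board
            .successors()
            .map(|(p, succ)| p * eval_playable_cached(&succ, depth, cache))
            .sum()
    };
    cache.insert(board.clone(), (depth, value));
    value
}
```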
Iterative deepening
The expectimax algorithm is a depth-limited search. The depth limit is a parameter of the algorithm. The time complexity of the algorithm is \( O(b^d) \) where b is the branching factor and d is the depth limit.
You may have noticed that the runtime for a fixed depth limit can vary greatly depending on the state of the board. This is primarily due to the branching factor of the game tree that is proportional to the number of empty cells.
Iterative deepening is a technique that consists in running the search algorithm multiple times with increasing depth limits, until a time limit is reached.
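A minimal driver for this idea, reusing the (assumed) `select_action_expectimax` from above with a per-move time budget; note that this sketch only checks the deadline between searches, so a real implementation may also want to abort a search that runs over.

```rust
use std::time::{Duration, Instant};

use crate::board::{Action, PlayableBoard};

// Run the depth-limited search with depth 1, 2, 3, ... and keep the result
// of the deepest search that completed before the time budget ran out.
fn select_action_iterative_deepening(board: &PlayableBoard, budget: Duration) -> Option<Action> {
    let start = Instant::now();
    let mut best = None;
    let mut depth = 1;
    while start.elapsed() < budget {
        match select_action_expectimax(board, depth) {
            Some(action) => best = Some(action),
            None => break, // no applicable action at all: game over
        }
        depth += 1;
    }
    best
}
```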
Evaluating the algorithm
We plan to evaluate the performance of your algorithm in the same way as the `bench` executable: it will be run on 100 problems with a time limit of 180 seconds. The time limit is there to ensure that the algorithm is implemented efficiently: to reach the 16384 tile, you should take no more than 20 milliseconds per decision.
```
# Simulate what the benchmark will do
cargo run --release --bin bench -- --num-games 100 --timeout 180
```
If we are able to build and run your code, we will publish its performance on the leaderboard.