Setting up
This lab should be done in groups of 1 or 2 students (3 is tolerated if there is an uneven number of students and you do not want to work alone).
To get the base code for the lab, you should:
- choose your group members
- head to the lab's github classroom
- create a team or join an existing one
- the team name should follow the convention
{GROUP} {SURNAME1} / {SURNAME2}
- accept the assignment
This process will create a new repository for your group with the base code for the lab. The repositories are private, and only the members of the group and the teachers can see them.
You can now clone the repository and start working on the lab.
Repository content
- `src/board.rs`: the board representation
- `src/search.rs`: placeholder for the search algorithm (expectimax)
- `src/eval.rs`: provided evaluation function (discussed in the course)
- `src/main.rs`: main file to run the game
- `src/bench.rs`: alternative executable to evaluate your algorithms on several problems
Board representation
The `board.rs` file contains the board representation. It is a 4x4 grid of `u8` values, with a few helper functions to manipulate it.
The `Board` struct in the file is the one that implements most of the game logic. However, you should not manipulate it directly.
Instead, we provide two wrapper types around it:
- `PlayableBoard`: a board that can be played on (i.e. you can move on it). It provides the following methods:
  - `init` to create a new board with a single random tile
  - `apply` to apply an action to the board. This method returns either `None` if the action is invalid, or a `RandableBoard` with the action applied.
- `RandableBoard`: a board that can be randomly filled with a new tile. It provides the following methods:
  - `with_random_tile` to fill the board with a new tile in an arbitrary empty cell. The method returns a `PlayableBoard` with the new tile added.
  - `successors` to generate all possible successors of the board. The method returns an iterator of `PlayableBoard`s together with the probability of getting each board.
  - `evaluate` to evaluate the board, using the implementation provided in `eval.rs`
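To illustrate how the two wrapper types are meant to alternate, here is a small usage sketch. The method names follow the description above, but the exact signatures (and the `Action` type used below) are assumptions; check `board.rs` for the real declarations.

```rust
use crate::board::{Action, PlayableBoard};

// Sketch of one play/randomize cycle: PlayableBoard --apply--> RandableBoard
// --with_random_tile--> PlayableBoard. Assumes `Action` is a small Copy enum
// with the four move directions.
fn one_step() {
    let board = PlayableBoard::init(); // fresh board with a single random tile
    for action in [Action::Up, Action::Down, Action::Left, Action::Right] {
        // `apply` returns None when the action is not applicable.
        if let Some(randable) = board.apply(action) {
            let value = randable.evaluate(); // heuristic value from eval.rs
            println!("{:?} -> {:.2}", action, value);
            // Placing the random tile gives back a playable board.
            let _next: PlayableBoard = randable.with_random_tile();
        }
    }
}
```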
Main
The `main.rs` file contains the main entry point of the program. It is a simple loop that will:
- create an initial board
- print it
- ask for a move (using the `search::select_action` function)
- apply the move
- add a random tile
- repeat until the game is over
The current implementation of `select_action` is a naive one that simply returns a random applicable action.
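In outline, the loop looks roughly like the sketch below (simplified, with assumed signatures; the actual code in `main.rs` is the reference).

```rust
use crate::board::PlayableBoard;
use crate::search;

// Simplified outline of the game loop (assumes `select_action` takes the
// current board and returns `None` when no action is applicable).
fn run_game() {
    let mut board = PlayableBoard::init();
    loop {
        println!("{:?}", board); // print the current board
        let Some(action) = search::select_action(&board) else {
            break; // no applicable action: the game is over
        };
        let randable = board
            .apply(action)
            .expect("selected action should be applicable");
        board = randable.with_random_tile(); // add a random tile, then repeat
    }
}
```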
To run the program, simply run (the `--bin` argument is needed to specify the binary to run):

```
cargo run --release --bin main
```
Evaluating your AI
The `bench.rs` file is an alternative executable that allows you to evaluate your AI on a set of problems.
It will run your AI on each problem and output the average score and the average number of moves. For instance, the command line below will run your AI on 100 problems and provide you with some statistics on its performance.
```
cargo run --release --bin bench -- --num-games 100
```
This is primarily intended for statistical evaluation but can be slow when the algorithm becomes complex. To get a feeling for how your AI performs, running it on a single problem with the `main` executable is usually sufficient.
Greedy strategy
At this point, the `search::select_action` function just calls a predefined `select_action_randomly` that randomly selects an action among the applicable ones.
- TODO: implement the `select_action_greedily` function so that it:
  - simulates all possible actions
  - evaluates all applicable ones (the `RandableBoard` type provides an `evaluate` method)
  - returns the action with the highest evaluation
  - or returns `None` if no action is applicable
This is called a greedy method: it selects the seemingly best action based on a heuristic, without any further thinking or search.
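A possible shape for `select_action_greedily`, under the same API assumptions as the earlier sketches (adapt names and signatures to the actual declarations in `search.rs`):

```rust
use crate::board::{Action, PlayableBoard};

// Greedy choice: simulate every action, evaluate the applicable ones, and
// return the action with the highest evaluation (or None if nothing applies).
fn select_action_greedily(board: &PlayableBoard) -> Option<Action> {
    let mut best_action = None;
    let mut best_value = f32::NEG_INFINITY;
    for action in [Action::Up, Action::Down, Action::Left, Action::Right] {
        if let Some(randable) = board.apply(action) {
            let value = randable.evaluate(); // heuristic from eval.rs
            if value > best_value {
                best_value = value;
                best_action = Some(action);
            }
        }
    }
    best_action
}
```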
- TODO: evaluate the greedy strategy:
  - call it in the `select_action` method
  - run the benchmarks for it
  - how does it compare to the random move strategy?
Expectimax
At this point, the evaluation function serves as an estimation for the utility of a state.
As you may have seen, this already improves upon a purely random play but leaves much to be desired. In this part, we aim to improve upon this by using the Expectimax algorithm.
The key idea behind Expectimax is to improve our evaluation of the available actions by simulating the results that can be expected after a fixed number of actions.
You will find in the `search.rs` file the function declarations to be filled in according to the following algorithm (also seen in the course):
```
select_action_expectimax(s, max_depth):
    applicable_actions = { actions that are applicable in s }
    return the applicable action a that maximizes eval_randable(result(s, a), max_depth)

eval_playable(s, d) =
    applicable_actions = { actions that are applicable in s }
    successors = { result(s, action) | action in applicable_actions }
    max { eval_randable(succ, d-1) | succ in successors }

eval_randable(s, d) =
    if d == 0:
        evaluate(s)
    else:
        Sum { p * eval_playable(succ, d) | (p, succ) in successors(s) }
```
- TODO: implement the three functions above
- TODO: evaluate the algorithm with various depth limits:
- in terms of performance (average number of actions played in a game)
- in terms of runtime
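To make the recursion concrete, here is one way the three functions can be laid out in Rust. It follows the pseudocode above; the types, the `(probability, board)` ordering of `successors`, and the handling of dead ends are assumptions to adapt to the actual declarations in `search.rs`.

```rust
use crate::board::{Action, PlayableBoard, RandableBoard};

// Assumes `Action` is a small Copy enum with the four move directions.
const ACTIONS: [Action; 4] = [Action::Up, Action::Down, Action::Left, Action::Right];

// Root: pick the applicable action whose resulting chance node has the
// highest expectimax value.
fn select_action_expectimax(board: &PlayableBoard, max_depth: usize) -> Option<Action> {
    let mut best_action = None;
    let mut best_value = f32::NEG_INFINITY;
    for action in ACTIONS {
        if let Some(randable) = board.apply(action) {
            let value = eval_randable(&randable, max_depth);
            if value > best_value {
                best_value = value;
                best_action = Some(action);
            }
        }
    }
    best_action
}

// Max node: the player picks the best applicable action.
fn eval_playable(board: &PlayableBoard, depth: usize) -> f32 {
    ACTIONS
        .iter()
        .filter_map(|&action| board.apply(action))
        .map(|randable| eval_randable(&randable, depth - 1))
        // No applicable action means the game is over from here; NEG_INFINITY
        // is one (crude) way to penalize that.
        .fold(f32::NEG_INFINITY, f32::max)
}

// Chance node: probability-weighted average over all tile placements,
// or the heuristic evaluation once the depth budget is exhausted.
fn eval_randable(board: &RandableBoard, depth: usize) -> f32 {
    if depth == 0 {
        board.evaluate()
    } else {
        board
            .successors()
            .map(|(p, succ)| p * eval_playable(&succ, depth))
            .sum()
    }
}
```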
Going Beyond: Algorithmic improvements
Transposition table (caching)
As the number of visited states increases, the likelihood of visiting the same state multiple times also increases. With a naive implementation of the search algorithm, this can lead to a lot of redundant computation.
Caching the results of the evaluation function can help reduce the number of evaluations.
- TODO: cache the value of `eval_randable(s, d)` into a `HashMap`
  - the hash map should be created once and passed as an additional parameter to the `eval_randable` and `eval_playable` functions
The whole purpose of using expectimax is to increase the precision of the evaluation of the game state. Essentially, the more remaining actions are allowed, the more precise the evaluation is (the algorithm will average over more simulations).
When caching an evaluation, it is essential to store the evaluation of the state (`f32`) together with the number of remaining actions (`usize`) that were available when it was computed.
When looking up a cached evaluation, you should only use it if it was computed with the same number of remaining actions or more (i.e. the cached evaluation is at least as precise as the one you were about to compute).
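A sketch of what the cached chance-node evaluation could look like. It assumes the board type can be used as a `HashMap` key (i.e. implements `Hash`, `Eq`, and `Clone`) and that `eval_playable_cached` is `eval_playable` with the cache threaded through; all names are placeholders.

```rust
use std::collections::HashMap;

use crate::board::RandableBoard;

// Value stored in the cache: the evaluation together with the depth
// (remaining actions) it was computed with.
type Cache = HashMap<RandableBoard, (usize, f32)>;

fn eval_randable_cached(board: &RandableBoard, depth: usize, cache: &mut Cache) -> f32 {
    // Only reuse a cached value computed with at least as many remaining
    // actions, i.e. at least as precise as what we are about to compute.
    if let Some(&(cached_depth, value)) = cache.get(board) {
        if cached_depth >= depth {
            return value;
        }
    }
    let value = if depth == 0 {
        board.evaluate()
    } else {
        board
            .successors()
            .map(|(p, succ)| p * eval_playable_cached(&succ, depth, cache))
            .sum()
    };
    cache.insert(board.clone(), (depth, value));
    value
}
```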
Iterative deepening
The expectimax algorithm is a depth-limited search. The depth limit is a parameter of the algorithm. The time complexity of the algorithm is \( O(b^d) \) where b is the branching factor and d is the depth limit.
You may have noticed that the runtime for a fixed depth limit can vary greatly depending on the state of the board. This is primarily due to the branching factor of the game tree that is proportional to the number of empty cells.
Iterative deepening is a technique that consists in running the search algorithm multiple times with increasing depth limits, until a time limit is reached.
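A minimal driver for this idea, reusing the (assumed) `select_action_expectimax` from above with a per-move time budget; note that this sketch only checks the deadline between searches, so a real implementation may also want to abort a search that runs over.

```rust
use std::time::{Duration, Instant};

use crate::board::{Action, PlayableBoard};

// Run the depth-limited search with depth 1, 2, 3, ... and keep the result
// of the deepest search that completed before the time budget ran out.
fn select_action_iterative_deepening(board: &PlayableBoard, budget: Duration) -> Option<Action> {
    let start = Instant::now();
    let mut best = None;
    let mut depth = 1;
    while start.elapsed() < budget {
        match select_action_expectimax(board, depth) {
            Some(action) => best = Some(action),
            None => break, // no applicable action at all: game over
        }
        depth += 1;
    }
    best
}
```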
Evaluating the algorithm
We plan to evaluate the performance of your algorithm in the same way as the `bench` executable: it will be run on 100 problems with a time limit of 180 seconds. The time limit is there to ensure that the algorithm is implemented efficiently: to reach the 16384 tile, you should take no more than 20 milliseconds per decision.
```
# Simulate what the benchmark will do
cargo run --release --bin bench -- --num-games 100 --timeout 180
```
If we are able to build and run your code, we will publish its performance on the leaderboard.