This site is the central point for the lectures on Artificial Intelligence (INSA 4IR). It will be updated regularly with the slides and other resources.
Lecture Slides
Below are the slides for the lectures; they will be made available as the course progresses.
- Lecture 1: Introduction (AIMA Chap. 1)
- Lecture 2: The Agent Model (AIMA Chap. 2)
- Lecture 3: Solving problems by searching (AIMA Chap. 3)
- Lecture 4: Utility and Decision Theories (AIMA Chap. 15, Making simple decisions)
- Lecture 5: Markov Decision Processes (AIMA Chap. 16, Making complex decisions)
- Lecture 6: Online MDP: 2048 & Expectimax (AIMA Chap. 16 & 5)
- Lecture 7: Multi-agent environments & Adversarial games (AIMA Chap. 17 & 5)
- Lecture 8: Monte Carlo Tree Search (AIMA Chap. 5)
- Conclusion: Some elements on remaining labs and exams
Working at home: The course is based on the book Artificial Intelligence: A Modern Approach (AIMA), by Stuart Russell and Peter Norvig.
Evaluation
The evaluation will be based on:
- a written exam (~50%)
  - exam on Moodle, with short open and multiple-choice questions
  - date: 7th of May (07/05), 8am
  - everything seen in the course and the labs is examinable, including exercises done during the course but not present in the slides
  - examples of questions can be found in the last set of slides (conclusion)
  - no documents allowed
- a project-based evaluation (~50%), based on the last two labs of the course (MCTS for checkers)
  - deadline: May the fourth (04/05), 23:59
  - the project will be done in groups of 2-3 students
  - the project will be evaluated based on the performance of the code and on a short report detailing the evaluation conducted
  - modalities: push your code and report (README.md) to the shared GitHub repository
The Rust Programming Language
All labs will be done in Rust. Besides the intrinsic qualities of the language, there are a few reasons for this choice in the context of this course:
- performance: we are going to work on some CPU-intensive algorithms, and Rust will let you have excellent performance without having to think about it.
- reliability: the Rust compiler can often feel annoyingly strict, but it will save you from many bugs that would be hard to catch otherwise.
- cross-platform: Rust tends to very reliably work on most platforms.
In the context of this course, it means that you can focus on the AI algorithms and problems we are going to study.
Of course, the downside is that you will have to build some basic skills in Rust. We will only require the fundamentals, and provide a lot of help and examples.
Getting Acquainted with Rust
The recommended way to get started with Rust is to follow the official book.
Ideally, before the first lab, you should have read the following chapters:
- Install & Hello World
- Quick Tour of a Rust program
  - walks you through a simple program, explaining the basic syntax and concepts
- Common programming concepts (variables, functions, control flow, ...)
  - you will learn about the basic building blocks of Rust programs, which you already know from other languages but whose syntax may differ
- Understanding Ownership
  - Ownership is Rust’s most unique feature, and I highly recommend carefully reading this chapter.
  - Note that the concept of ownership is not unique to Rust, but the Rust compiler checks that you respect it, which is unique. In C/C++, you typically would have to follow the same rules, but if you fail to apply them, the compiler won't help you (and your program may crash 10 minutes later).
Chapters 5 to 9 cover some aspects that would be useful but can for the most part be picked up as we go.
Chapters 10 and beyond cover intermediate topics that we will try to avoid in the labs.
Additional Resources
- Editors:
- Visual Studio Code with the Rust analyzer extension
- RustRover (for JetBrains aficionados)
- Any editor with a rust-analyzer plugin (zed, neovim, ...)
- Rustlings: a series of small exercises to get started with the language
- Rust Playground: to quickly test some code without having to install anything
- Comprehensive Rust: a fairly elaborate course built for in-class teaching, but it can be used for self-learning by looking at the speaker notes
Lab 1: 8-puzzle (fr: taquin)
What's an 8-puzzle?
The 8-puzzle is a sliding puzzle that consists of a frame of numbered square tiles in random order, with one tile missing. The object of the puzzle is to place the tiles in order by making sliding moves that use the empty space, reaching the configuration below:
Goal state:
┏━━━┳━━━┳━━━┓
┃ 1 ┃ 2 ┃ 3 ┃
┣━━━╋━━━╋━━━┫
┃ 4 ┃ 5 ┃ 6 ┃
┣━━━╋━━━╋━━━┫
┃ 7 ┃ 8 ┃ ┃
┗━━━┻━━━┻━━━┛
Practice with the 8-puzzle: try solving the instance below (with your own, non-artificial, intelligence):
Install Rust
The project is a Rust project, and you will need to have Rust installed on your machine to run it (it is not preinstalled).
Installation instructions are available on the Rust website.
Get acquainted
Get the code from the repository: https://github.com/arbimo/insa-4ir-ai-lab1-template
The code already contains a number of files:
- `src/board.rs`: the board representation, together with the actions that can be performed on it
- `src/main.rs`: the main file, where you can test your code
- `src/search.rs`: placeholder for the search algorithm
- `src/heuristic.rs`: placeholder for the heuristic function used by the search
- `src/min_heap.rs`: a very simple min-heap implementation that you can use
Please deactivate any AI-based auto-completion in your IDE (Copilot, ...). While it may improve your productivity in some cases, it is in general not helpful for learning and understanding. You are free to use external chat-based tools (that do not spontaneously propose answers before you have asked a question).
Understanding the board representation
If you run the test suite with `cargo test`, you will see that the board representation is not complete yet and has a few tests that fail.
- ▢ complete the `test_position` and `test_apply` unit tests in `board.rs`
- ▢ implement the `is_valid_plan` method (so that the corresponding test passes)
One thing that you will quickly notice is that `board.apply(Direction::Down)`:
- does not modify the `board` in place, but returns the result of applying an action to the board
- has the return type `Option<Board>`, which will contain a new board wrapped in `Some(new_board)` if the action is valid, and `None` otherwise
Like any decent language, Rust has a `match` construct that allows you to handle this situation in a very elegant way:

```rust
match board.apply(Direction::Down) {
    Some(new_board) => { /* do something with new_board */ },
    None => { /* the action is not valid */ }
}
```
Solving your first 8-puzzle
- ❓What is the sequence of moves to reach the goal state from the following position? (hint: you only need 4)
┏━━━┳━━━┳━━━┓
┃ 1 ┃ 2 ┃ 3 ┃
┣━━━╋━━━╋━━━┫
┃ 4 ┃ 8 ┃ 5 ┃
┣━━━╋━━━╋━━━┫
┃ ┃ 7 ┃ 6 ┃
┗━━━┻━━━┻━━━┛
- in the `main.rs` file, implement the `main` function to:
  - ▢ create the board above
  - ▢ create the plan (sequence of actions) to reach the goal state
  - ▢ validate the plan (use the `is_valid_plan` method)
  - ▢ simulate the plan with the `play` method
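For reference, a minimal sketch of what this could look like. The `Board::new` constructor used here is an assumption (check the actual constructor in `board.rs`), while `Direction`, `is_valid_plan` and `play` are the items mentioned above:

```rust
fn main() {
    // hypothetical constructor: the board shown above, with 0 for the empty cell
    let board = Board::new([[1, 2, 3], [4, 8, 5], [0, 7, 6]]);

    // the plan: fill in your 4-move answer to the question above
    let plan: Vec<Direction> = vec![
        // e.g. Direction::Up, ...
    ];

    // check that every action of the plan is applicable before simulating it
    assert!(board.is_valid_plan(&plan));

    // simulate the plan step by step
    board.play(&plan);
}
```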
Uninformed search (Dijkstra)
At this point we will try to implement an AI to solve the 8-puzzle. This means that, for any initial state, we want to find the sequence of moves to reach the goal state.
We are interested in finding the shortest sequence of moves. We will therefore assume that each action has a cost of `1`.
- ▢ implement the `search` method in `search.rs`, not using any heuristic (it is a Dijkstra algorithm)
  - Dijkstra's algorithm is a variant of best-first search where the nodes are expanded in order of their distance to the start node.
    - this means that `f(s)` (the priority of a state `s`) should be set to the path cost to reach `s` from the start node.
  - the algorithm for best-first search is given in the slides of the course (CM3), in a format that closely matches the one of the `search` method.
The `search` implementation will force you to manipulate a number of hashmaps. Here are a few tips on how to use them:

```rust
use std::collections::HashMap;

let mut path_costs: HashMap<Board, u32> = HashMap::new();

// associates the board with a path cost of 9 (replacing any previous value)
path_costs.insert(board, 9);

// get the path cost of a board
// the result would be Some(9) for the board inserted above
// but None for a board that was not inserted before
let optional_cost: Option<&u32> = path_costs.get(&board);

// when you have an option that you know will contain something,
// you can use the unwrap method (that will panic if the option is None)
let cost: &u32 = optional_cost.unwrap();

// In Dijkstra's algorithm, it will often be the case that you know a value is present.
// In this case, you can use the index notation to directly access the value
// Note: it will panic if nothing is there
let cost: u32 = path_costs[&board];
```
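To make the overall structure concrete, here is a rough sketch of such a search. It makes several assumptions about the template that you should check against the real code: that `Board` is a small `Copy` type implementing `Eq` and `Hash`, that the provided `MinHeap` offers `insert`/`pop`-style operations, that a `GOAL` board and a `DIRECTIONS` array of the four moves exist, and it leaves the plan reconstruction to you:

```rust
use std::collections::HashMap;

// Sketch of a uniform-cost (Dijkstra) search; names are assumptions, not the template's API.
fn search(init: Board) -> Option<Vec<Direction>> {
    let mut queue = MinHeap::new();                            // frontier, ordered by f(s)
    let mut path_costs: HashMap<Board, u32> = HashMap::new();  // g(s): best known cost so far
    let mut predecessors: HashMap<Board, (Board, Direction)> = HashMap::new();

    path_costs.insert(init, 0);
    queue.insert(init, 0); // for Dijkstra, the priority is f(s) = g(s)

    while let Some(state) = queue.pop() {
        if state == GOAL {
            // walk back through `predecessors` to rebuild the plan (helper left to you)
            return Some(reconstruct_plan(&predecessors, state));
        }
        let g = path_costs[&state];
        for dir in DIRECTIONS {
            if let Some(succ) = state.apply(dir) {
                let new_cost = g + 1; // every move costs 1
                if path_costs.get(&succ).map_or(true, |&c| new_cost < c) {
                    path_costs.insert(succ, new_cost);
                    predecessors.insert(succ, (state, dir));
                    queue.insert(succ, new_cost);
                }
            }
        }
    }
    None // the goal is not reachable from the initial state
}
```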
- ▢ test it on different instances of the 8-puzzle: `INSTANCES` in `board.rs` gives a number of initial states with a known distance to the goal
- ▢ use the statistics to explore the evolution of the runtime and the number of nodes explored with the distance to the goal

If you are to compare the runtime of the search algorithm, make sure to run it in release mode (i.e. with compiler optimisations turned on):

```
cargo run --release
```
Informed, optimal search (A*)
- ▢ in the `heuristic.rs` file, implement the `estimate` method for the Hamming distance
- ▢ modify the search method to use the heuristic (taking an additional parameter `heuristic: &Heuristic`)
- ▢ compare the performance of the search with the heuristic to the blind search (at this point, printing output in a form that you can copy-paste into a spreadsheet is a good idea)
  - what is the difference between Dijkstra's algorithm and A* with the blind heuristic?
  - is there a possibility for your heuristic to overestimate the distance to the goal?
- ▢ repeat with the Manhattan distance heuristic
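For illustration, a sketch of what a Hamming-distance estimate could look like, assuming the board exposes the tile in each cell through a hypothetical `value_at` accessor (adapt to the real `Board` API):

```rust
// Hamming distance: number of tiles (excluding the blank) that are not on their goal cell.
// It never overestimates the true distance, since each misplaced tile needs at least one move.
fn hamming(board: &Board) -> u32 {
    let mut misplaced = 0;
    for index in 0..9 {
        let tile = board.value_at(index); // hypothetical accessor returning the tile in cell `index`
        // in the goal state, cell `index` holds tile `index + 1`, except the last cell (blank, 0)
        let expected = if index == 8 { 0 } else { (index as u8) + 1 };
        if tile != 0 && tile != expected {
            misplaced += 1;
        }
    }
    misplaced
}
```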
Suboptimal search
- ▢ implement weighted A* search
  - given the similarity with the A* algorithm, you can probably do it by generalizing the `search` method in `search.rs` and have A* and Weighted-A* only differ by the parameters passed to the `search` method
- Analytically derive a (reasonably small) value of W that would make weighted A* equivalent to greedy best-first search.
- The min-heap provided only supports integers as priorities; how would you handle the case where you would like a rational weight (e.g. `4/3`)? (see the note after this list)
  - intermediate question: what happens to the search if you multiply both g(s) and h(s) by the same number?
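Regarding the last two questions: multiplying all priorities by the same positive constant does not change their ordering, so rational weights can be handled by scaling both terms to integers. One possible (illustrative, not prescribed) way to generalize the `search` signature:

```rust
// f(s) = g_weight * g(s) + h_weight * h(s)
// A*:                        g_weight = 1, h_weight = 1
// Weighted A* with W = 2:    g_weight = 1, h_weight = 2
// Weighted A* with W = 4/3:  scale both by 3 -> g_weight = 3, h_weight = 4
fn search(init: Board, heuristic: &Heuristic, g_weight: u32, h_weight: u32) -> Option<Vec<Direction>> {
    // identical to the A* search, except that the priority of a state is computed as
    //     g_weight * path_cost + h_weight * heuristic.estimate(&state)
    unimplemented!()
}
```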
Going further
- counting states: propose and implement a method to count the number of states from which the goal state can be reached
  - there is an argument that there are only `9!/2` reachable states for the 8-puzzle (but doing the math is probably harder than counting them)
- enumerating: how many states are there at distance `17` from the goal state?
- unreachable states: can you find a state that is not reachable from the goal state?
- generating instances: propose a way to generate random solvable instances of the 8-puzzle (one possible approach is sketched after this list)
  - you can use the `rand` crate for getting random numbers.
- larger boards: make the puzzle more challenging by increasing the size of the board (4x4)
  - the size of the board is controlled by the `N` constant in `board.rs`
    - modifying it should be easy enough but will invalidate all the predefined boards defined for testing
  - careful: the state space is much larger; you cannot expect to fit it all in memory (for the larger instances, you can consider memory-frugal variants such as IDA*)
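For the instance-generation item above, one common approach (an assumption, not the only valid one) is to start from the goal state and apply a number of random valid moves: every state produced this way is solvable by construction. A minimal sketch, assuming a `Board::goal()`-style constructor, a `DIRECTIONS` array, `Clone` on `Board`, and a recent version of the `rand` crate:

```rust
use rand::seq::IndexedRandom;

// Generate a random solvable instance by walking `num_moves` random steps away from the goal.
fn random_instance(num_moves: usize) -> Board {
    let mut rng = rand::rng();
    let mut board = Board::goal(); // hypothetical constructor for the goal state
    for _ in 0..num_moves {
        // boards reachable in one move from the current board
        let successors: Vec<Board> = DIRECTIONS
            .iter()
            .filter_map(|&dir| board.apply(dir))
            .collect();
        // pick one at random (there is always at least one applicable move)
        board = successors.choose(&mut rng).unwrap().clone();
    }
    board
}
```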
Corrections
As many of you did not have time to finish it, below are the corrections for the lab.
It is meant to help you finish it yourselves: if you get blocked on something, the content below should help you get past most difficulties.
The associated code is present in the repository below. In addition, for each part, there is a link to the commit as well as a short video explaining it. Note: the explanations in the code/videos are limited, and you should experiment with it yourselves to get a better feel for it.
Dijkstra
Implementation of Dijkstra.
A*
Implementation of the heuristics and adaptation of the search to consider the heuristics.
Performance comparison (Dijkstra / A*)
Short demonstration of how to compare algorithms (printing, export to csv and load in spreadsheet).
Setting up
This lab should be done in groups of 1 or 2 students (3 is tolerated if there is an odd number of students and you don't want to be alone).
To get the base code for the lab, you should:
- choose your group members
- head to the lab's github classroom
- create a team or join an existing one
- team name should follow the convention `{GROUP} {SURNAME1} / {SURNAME2}`
- accept the assignment
This process will create a new repository for your group with the base code for the lab. The repositories are private, and only the members of the group and the teachers can see them.
You can now clone the repository and start working on the lab.
Repository content
- `src/board.rs`: the board representation
- `src/search.rs`: placeholder for the search algorithm (expectimax)
- `src/eval.rs`: provided evaluation function (discussed in the course)
- `src/main.rs`: main file to run the game
- `src/bench.rs`: alternative executable to evaluate your algorithms on several problems
Board representation
The `board.rs` file contains the board representation. It is a 4x4 grid of `u8` values, with a few helper functions to manipulate it.
The `Board` struct in the file is the one that implements most of the game logic. However, you should not manipulate it directly.
Instead, we provide two wrapper types around it:
- `PlayableBoard`: a board that can be played on (i.e. you can move on it). It provides the following methods:
  - `init` to create a new board with a single random tile
  - `apply` to apply an action to the board. This method returns either `None` if the action is invalid, or a `RandableBoard` with the action applied.
- `RandableBoard`: a board that can be randomly filled with a new tile. It provides:
  - `with_random_tile` to fill the board with a new tile in an arbitrary empty cell. The method returns a `PlayableBoard` with the new tile added.
  - `successors` to generate all possible successors of the board. The method returns an iterator of `PlayableBoard`s, each together with the probability of getting this board.
  - `evaluate` to evaluate the board, using the implementation provided in `eval.rs`
Main
The `main.rs` file contains the main entry point of the program. It is a simple loop that will:
- create an initial board
- print it
- ask for a move (using the `search::select_action` function)
- apply the move
- add a random tile
- repeat until the game is over
The current implementation of `select_action` is a naive one that simply returns a random applicable action.
To run the program, simply run (the `--bin` argument is needed to specify the binary to run):

```
cargo run --release --bin main
```
Evaluating your AI
The `bench.rs` file is an alternative executable that allows you to evaluate your AI on a set of problems.
It will run your AI on a set of problems and output the average score and the average number of moves. For instance, the command line below will run your AI on 100 problems and provide you with some statistics on its performance.

```
cargo run --release --bin bench -- --num-games 100
```

This is primarily intended for statistical evaluation but can be slow when the algorithm becomes complex. To get a feel for how your AI performs, using it on a single problem with the `main` binary is usually sufficient.
Greedy strategy
At this point, the `search::select_action` function just calls a predefined `select_action_randomly` that randomly selects an action among the applicable ones.
- TODO: implement the `select_action_greedily` function so that it
  - simulates all possible actions
  - evaluates all applicable ones (the `RandableBoard` type provides an `evaluate` method)
  - returns the action with the highest evaluation
  - or returns `None` if there is no applicable action
This is called a greedy method: it selects the seemingly best action based on a heuristic, without any further thinking or search.
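A possible shape for this function, as a sketch rather than the expected implementation: it assumes the action type is a small `Copy` value and that an `Action::ALL` list of the four candidate actions exists (neither name is taken from the template), and that `evaluate` returns an `f32`:

```rust
// Greedy selection: simulate each applicable action once and keep the best one.
fn select_action_greedily(board: &PlayableBoard) -> Option<Action> {
    let mut best: Option<(Action, f32)> = None;
    for action in Action::ALL {                  // hypothetical list of the four actions
        if let Some(result) = board.apply(action) {
            let eval = result.evaluate();        // evaluation of the resulting RandableBoard
            if best.as_ref().map_or(true, |&(_, best_eval)| eval > best_eval) {
                best = Some((action, eval));
            }
        }
    }
    best.map(|(action, _)| action)               // None if no action was applicable
}
```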
- TODO: evaluate the greedy strategy:
  - call it in the `select_action` method
  - run the benchmarks for it
  - how does it compare to the random move strategy?
Expectimax
At this point, the evaluation function serves as an estimation for the utility of a state.
As you may have seen, this already improves upon a purely random play but leaves much to be desired. In this part, we aim to improve upon this by using the Expectimax algorithm.
The key idea behind Expectimax is to improve our evaluation of the available actions by simulating the results that can be expected after a fixed number of actions.
You will find in the `search.rs` file the function declarations to be filled in according to the following algorithm (also seen in the course):

```
select_action_expectimax(s, max_depth):
    applicable_actions = { actions that are applicable in s }
    return the applicable action a that maximizes eval_randable(result(s, a), max_depth)

eval_playable(s, d) =
    applicable_actions = { actions that are applicable in s }
    successors = { result(s, action) | action in applicable_actions }
    max { eval_randable(succ, d-1) | succ in successors }

eval_randable(s, d) =
    if d == 0:
        evaluate(s)
    else:
        Sum { p * eval_playable(succ, d) | (p, succ) in successors(s) }
```
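A rough Rust translation of this pseudocode, under the same assumptions as the greedy sketch above (hypothetical `Action::ALL`, `Copy` actions, `evaluate` returning `f32`, `successors()` yielding `(probability, board)` pairs); the recursion structure is the point, not the exact signatures:

```rust
// Root call: pick the applicable action with the best expectimax value.
fn select_action_expectimax(board: &PlayableBoard, max_depth: usize) -> Option<Action> {
    let mut best: Option<(Action, f32)> = None;
    for action in Action::ALL {
        if let Some(result) = board.apply(action) {
            let value = eval_randable(&result, max_depth);
            if best.as_ref().map_or(true, |&(_, v)| value > v) {
                best = Some((action, value));
            }
        }
    }
    best.map(|(action, _)| action)
}

// Max node: the player picks the action with the highest expected value.
fn eval_playable(board: &PlayableBoard, depth: usize) -> f32 {
    Action::ALL
        .iter()
        .filter_map(|&action| board.apply(action))
        .map(|succ| eval_randable(&succ, depth - 1))
        .fold(f32::NEG_INFINITY, f32::max) // no applicable action: treated as a lost game
}

// Chance node: expectation over the possible random tile placements.
fn eval_randable(board: &RandableBoard, depth: usize) -> f32 {
    if depth == 0 {
        board.evaluate()
    } else {
        board
            .successors()
            .map(|(p, succ)| p * eval_playable(&succ, depth))
            .sum()
    }
}
```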
- TODO: implement the three functions above
- TODO: evaluate the algorithm with various depth limits:
- in terms of performance (average number of actions played in a game)
- in terms of runtime
Going Beyond: Algorithmic improvements
Transposition table (caching)
As the number of states visited increases, the likelihood of visiting the same state multiple times also increases. With the naive implementation of the search algorithm, this can lead to a lot of redundant computation.
Caching the results of the evaluation function can help reduce the number of evaluations.
- TODO: cache the value of `eval_randable(s, d)` into a `HashMap`
  - the hash map should be created once and passed as an additional parameter to the `eval_randable` and `eval_playable` functions
The whole purpose of using expectimax is to increase the precision of the evaluation of the game state. Essentially, the more remaining actions (depth) are allowed, the more precise the evaluation is (the algorithm averages over more simulations).
When caching an evaluation, it is essential to store the evaluation of the state (`f32`) together with the remaining actions (`usize`) that were available when it was computed.
When looking up a cached evaluation, you should only use it if it was computed with the same or more remaining actions (i.e. the cached evaluation is at least as precise as the one you were going to compute).
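One way to organise such a cache, sketched under the same assumptions as above and additionally assuming that `RandableBoard` implements `Eq`, `Hash` and `Clone` (the matching cached `eval_playable` is omitted):

```rust
use std::collections::HashMap;

// Cached value: the evaluation together with the remaining depth it was computed with.
type Cache = HashMap<RandableBoard, (usize, f32)>;

fn eval_randable(board: &RandableBoard, depth: usize, cache: &mut Cache) -> f32 {
    // reuse a cached evaluation only if it is at least as precise as the one requested
    if let Some(&(cached_depth, value)) = cache.get(board) {
        if cached_depth >= depth {
            return value;
        }
    }
    let value = if depth == 0 {
        board.evaluate()
    } else {
        // expectation over the successors, as in the non-cached version
        board
            .successors()
            .map(|(p, succ)| p * eval_playable(&succ, depth, cache))
            .sum()
    };
    cache.insert(board.clone(), (depth, value));
    value
}
```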
Iterative deepening
The expectimax algorithm is a depth-limited search. The depth limit is a parameter of the algorithm. The time complexity of the algorithm is \( O(b^d) \) where b is the branching factor and d is the depth limit.
You may have noticed that the runtime for a fixed depth limit can vary greatly depending on the state of the board. This is primarily due to the branching factor of the game tree that is proportional to the number of empty cells.
Iterative deepening is a technique that consists in running the search algorithm multiple times with increasing depth limits, until a time limit is reached.
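A sketch of how the action selection could wrap the depth-limited search in such a loop, reusing the `select_action_expectimax` sketch from above and `Instant`/`Duration` from the standard library (simplified: a real implementation may also want to abort a search that overruns the deadline and to reuse the transposition table across iterations):

```rust
use std::time::{Duration, Instant};

// Iterative deepening: search deeper and deeper until the time budget is exhausted,
// and return the action chosen by the deepest completed search.
fn select_action_iterative(board: &PlayableBoard, budget: Duration) -> Option<Action> {
    let deadline = Instant::now() + budget;
    let mut best = None;
    let mut depth = 1;
    while Instant::now() < deadline {
        if let Some(action) = select_action_expectimax(board, depth) {
            best = Some(action);
        }
        depth += 1;
    }
    best
}
```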
Evaluating the algorithm
- We plan to run your algorithm on a set of problems to evaluate its performance; this is equivalent to running the `bench` executable.

We plan to run your algorithm on 100 problems with a time limit of 180 seconds. The time limit is there to ensure that the algorithm is implemented efficiently: to reach the `16384` tile you should take no more than 20 milliseconds per decision.

```
# Simulate what the benchmark will do
cargo run --release --bin bench -- --num-games 100 --timeout 180
```
If we are able to build and run your code, we will publish the performance in the leaderboard.
Leaderboard
The columns 256 to 32768 give the percentage of games in which a tile of that value was reached.
Team | Average Score | 256 | 512 | 1024 | 2048 | 4096 | 8192 | 16384 | 32768 |
---|---|---|---|---|---|---|---|---|---|
c1-marty | 9647.53 | 100.00 | 100.00 | 100.00 | 100.00 | 98.00 | 92.00 | 56.00 | 0.00 |
c1-chourret-pouilles | 8981.02 | 100.00 | 100.00 | 100.00 | 99.00 | 94.00 | 90.00 | 51.00 | 1.00 |
c1-rodriguez | 8395.38 | 100.00 | 100.00 | 100.00 | 100.00 | 99.00 | 92.00 | 65.00 | 0.00 |
c1-lasserre | 7877.87 | 100.00 | 100.00 | 100.00 | 99.00 | 98.00 | 92.00 | 62.00 | 0.00 |
a1-alinot-siri | 7066.36 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 79.00 | 25.00 | 0.00 |
b2-lacau-loubejac | 6423.01 | 100.00 | 100.00 | 100.00 | 98.00 | 92.00 | 71.00 | 20.00 | 0.00 |
b2-green | 6055.08 | 100.00 | 100.00 | 100.00 | 99.00 | 97.00 | 79.00 | 4.00 | 0.00 |
a1-fornaro-clerembaux | 5648.27 | 100.00 | 100.00 | 100.00 | 98.00 | 93.00 | 63.00 | 12.00 | 0.00 |
c1-servieres | 5640.51 | 100.00 | 100.00 | 100.00 | 100.00 | 93.00 | 61.00 | 10.00 | 0.00 |
b1-boukari | 5595.15 | 100.00 | 100.00 | 100.00 | 99.00 | 94.00 | 68.00 | 7.00 | 0.00 |
c1-rebillard | 5523.63 | 100.00 | 100.00 | 100.00 | 100.00 | 97.00 | 63.00 | 9.00 | 0.00 |
a2-berger-audic | 5348.90 | 100.00 | 100.00 | 100.00 | 100.00 | 93.00 | 62.00 | 6.00 | 0.00 |
a1-doganis-rees | 5271.02 | 100.00 | 100.00 | 100.00 | 99.00 | 95.00 | 62.00 | 9.00 | 0.00 |
c1-plana-sleiman | 5160.51 | 100.00 | 100.00 | 100.00 | 99.00 | 92.00 | 59.00 | 8.00 | 0.00 |
b2-doumbouya-iben-brahim | 4853.05 | 100.00 | 100.00 | 100.00 | 98.00 | 97.00 | 51.00 | 1.00 | 0.00 |
b2-keizer-lavergne | 4743.17 | 100.00 | 100.00 | 100.00 | 99.00 | 95.00 | 64.00 | 0.00 | 0.00 |
b2-chen-juxin | 4580.14 | 100.00 | 100.00 | 100.00 | 97.00 | 92.00 | 51.00 | 1.00 | 0.00 |
a1-levy-mathieu | 4550.48 | 100.00 | 100.00 | 100.00 | 99.00 | 92.00 | 39.00 | 2.00 | 0.00 |
a2-meetoo-hindi | 4491.71 | 100.00 | 100.00 | 100.00 | 98.00 | 94.00 | 44.00 | 0.00 | 0.00 |
b1-amri-lemeilleur | 3484.05 | 100.00 | 100.00 | 99.00 | 98.00 | 80.00 | 21.00 | 0.00 | 0.00 |
a2-olivan-juan | 3213.46 | 100.00 | 100.00 | 100.00 | 100.00 | 75.00 | 11.00 | 0.00 | 0.00 |
anglade | 3120.52 | 100.00 | 100.00 | 99.00 | 94.00 | 75.00 | 10.00 | 0.00 | 0.00 |
b2-martin-sellami | 3093.88 | 100.00 | 100.00 | 98.00 | 95.00 | 73.00 | 12.00 | 0.00 | 0.00 |
a2-bezmaternykh | 2415.57 | 100.00 | 100.00 | 100.00 | 99.00 | 99.00 | 0.00 | 0.00 | 0.00 |
a2-escudie | 1663.75 | 98.00 | 97.00 | 94.00 | 72.00 | 17.00 | 0.00 | 0.00 | 0.00 |
a2-bongibault | 1270.78 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
b1-gimond-large | 357.33 | 93.00 | 29.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
b2-ferreira | 338.25 | 88.00 | 28.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
b1-misbahi | 122.60 | 10.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
a1-therond-wallart | 122.54 | 8.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
b2-jimenez | 120.68 | 7.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c1-leko | 120.46 | 6.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
a1 | 119.82 | 7.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
a2-rousseau | 119.22 | 5.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c-hassouna-mohamed-amine | 116.12 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
lacire | 115.49 | 8.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Generated with:
- 100 games per player
- 180s timeout per game
- repositories updated before 2025-04-11 12:23:06.948968
The International Checkers Game
The lab will focus on the most common variant of checkers (at least in France): International Checkers (sometimes called draughts).
If you are unfamiliar with checkers, you can check the International Checkers Rules.
Unlike some other variants (notably American checkers):
- the game is played on a 10x10 board
- pawns (aka pieces/checkers) can only move forward
- pawns can capture forward and backward
- when a capture is possible, the player must play the move that captures the most adversaries
- a queen (aka king) appears when a pawn reaches the opposite end of the board
- a queen can move freely multiple steps in one direction
The implementation provided slightly differs from these rules. In particular:
- by default, the board size is 8x8
  - it can be configured to any reasonable multiple of 2 with the `N` constant
- by default, the game is declared a draw after 200 moves
  - this can be configured with the `MAX_PLY` constant (larger numbers are needed for larger boards)
Setting up
This lab should be done in groups of 1 or 2 students (3 is tolerated if there is an odd number of students and you don't want to be alone).
To get the base code for the lab, you should:
- choose your group members
- head to the lab's github classroom
- create a team or join an existing one
- team name should follow the convention `{GROUP} {SURNAME1} / {SURNAME2} -- LAB3`
  - IMPORTANT: the name must be different from the one used for LAB2, hence the `-- LAB3` at the end.
- accept the assignment
- TROUBLESHOOTING: if GitHub Classroom is in a bad mood, you can get the base code from this repository.
This process will create a new repository for your group with the base code for the lab. The repositories are private, and only the members of the group and the teachers can see them.
You can now clone the repository and start working on the lab.
Provided code
You are provided with an implementation of the board and rules (`board.rs`) and of a simple AI engine (`minimax.rs`).
Below is a highlight of the main parts (note: you do not have to understand everything in the board implementation).
- `board.rs`: "simple" implementation of the board and basic rules for International Checkers. It notably contains the following elements:
  - `Color`: enum with `White` and `Black` variants
  - `Cell`: representation of the content of a cell of the board: `Empty`, or a `White`/`Black` `Pawn`/`Queen`
  - `Position`: position of a valid cell on the board. Its implementation is fairly low-level but you can access its coordinates (`coords()`)
  - `Dir`: one of the four possible directions for moving: `UpLeft`, `UpRight`, ...
  - `Move`: elementary movement on the board: a number of cells traversed in a single direction. If the move goes over an adversary's piece, it results in a capture
  - `Action`: a sequence of `Move`s from a given position. Represents any possible action available to the player, from a single pawn displacement to multiple captures.
  - `Board`: board representation (implementation detail: using a padded array)
    - `turn`: indicates which player is to play next (white/black)
    - `num_ply`: number of turns already played
    - `init()`: creates the initial board
    - `count()`: counts the number of occurrences of a particular cell value
    - `is_draw()`: true iff the game is considered a draw
    - `actions()`: returns all the valid actions available to the current player
    - `apply()`: returns the board resulting from applying the (assumed valid) action
- `minimax.rs`: very primitive implementation of minimax (in its negamax variant)
  - `select(...)`: returns the best action according to a minimax search of a fixed depth (defined in the `MinimaxEngine`), ignoring any deadline given to it
- `mcts.rs`: skeleton for the implementation of MCTS (more below)
- `engine.rs`: interface for abstracting over different engines (MCTS or minimax). It defines an `Engine` trait (similar to a Java interface) that allows abstracting over different kinds of engines.
Testing
- ▢ In the main file, simulate a few games between two minimax engines.
  - the `main::example_game()` function is the main one there
Monte Carlo Tree Search (MCTS)
The objective of the lab is to implement the MCTS method for the checkers game. We will split it into several steps.
Scoring
- ▢ Implement the `mcts::white_score()` function.
  - The function can freely assume that it is only evaluated on a final board, that is: the game is a draw or the current player has no available action.
  - It must compute the score from the point of view of the white player.
  - Useful methods/attributes on `Board`: `is_draw()`, `actions()` and `turn`
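A sketch of the kind of logic expected, assuming the 1 / 0.5 / 0 win/draw/loss scale used in the rest of this page (check the scale actually expected by the skeleton):

```rust
// Score of a final board from white's point of view: 1.0 win, 0.5 draw, 0.0 loss.
// Assumed to be called only on final boards (a draw, or the current player has no action).
fn white_score(board: &Board) -> f32 {
    if board.is_draw() {
        0.5
    } else {
        // the player to move has no action left and therefore loses
        match board.turn {
            Color::White => 0.0, // white cannot move: white loses
            Color::Black => 1.0, // black cannot move: white wins
        }
    }
}
```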
Rollouts
- ▢ Implement the `mcts::rollout()` function.

Starting from the board given as parameter, the function should:
1. select a random action among the valid ones
2. generate the resulting state
3. repeat 1./2. until the board is final (no action left for the current player, or a declared draw)
4. return white's evaluation of the final board
Picking a random value from a `Vec` is possible using the `rand` crate (library). It provides a random number generator (RNG) through `rand::rng()` and several convenience methods for common types.

```rust
use rand::seq::IndexedRandom; // makes the `choose` method visible on `Vec`

let ints: Vec<i32> = vec![1, 2, 3];
let chosen: Option<&i32> = ints.choose(&mut rand::rng());
```
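Putting these pieces together, a rollout could look like the sketch below; it assumes `Board` can be passed by value (or cloned cheaply), that `actions()` returns a `Vec` of actions, and it reuses the `white_score` function above (adapt to the actual signatures):

```rust
use rand::seq::IndexedRandom;

// Simulate a random game from `board` and return white's evaluation of the final position.
fn rollout(mut board: Board) -> f32 {
    loop {
        let actions = board.actions();
        // final board: a draw, or no action left for the current player
        if board.is_draw() || actions.is_empty() {
            return white_score(&board);
        }
        // pick a random valid action and apply it
        let action = actions.choose(&mut rand::rng()).unwrap();
        board = board.apply(action);
    }
}
```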
- ▢ Test your rollout function:
  - in `main.rs`, call your rollout function a large number of times
    - what is the average value of the calls? (hint: it should be close to 0.5)
    - how many rollouts can you do in 1 second? (hint: it should be several tens of thousands in `release` mode)
Basic MCTS Engine
MCTS graph
The `mcts.rs` file contains several basic data structures that you will use for the implementation of the algorithm.
- `Node`: a node of the MCTS graph that has already been evaluated with a rollout.
  - `board`: (s) the state of the game this node represents
  - `count`: (N(s)) number of times this node has been selected
  - `out_edges`: outgoing edges for all valid actions available on the state. Each `OutEdge` contains:
    - the corresponding `action` (a)
    - the number of times the action was `visited` (N(s,a))
    - the evaluation of the target state (Q(s,a))
  - `initial_eval`: (U(s)) result of the initial rollout
  - `eval`: (Q(s)) estimated utility of the node, computed from the children states' utilities
The `Node` structure essentially represents a node of the MCTS tree with all associated metadata and outgoing edges.
The `MctsEngine` structure allows finding the `Node` associated with a given `Board`.
Playout implementation
For implementing MCTS, you have three methods to implement:
- ▢ `playout`: recursively updates the board (s) and returns its updated evaluation (Q(s))
  - if the state was never visited before (not in the graph), it should evaluate it with a rollout and add it to the graph
    - the `Node::init(...)` function will probably prove useful
  - otherwise, it should
    - select an action (see `select_ucb1` below)
    - perform a recursive playout for the resulting state
    - update the evaluation (see `update_eval` below)
- ▢ `select_ucb1`: selects the best action according to UCB1 (course playout algorithm, line 6); a sketch of the score formula is given further below
  - from the node associated with the board (s) it should:
    - compute the UCB1 score for each action
    - return the action with the highest UCB1 score
  - it should use the `exploration_factor` (C) from the `MctsEngine` structure
- ▢ `update_eval`: updates the node (eval and counts) at the end of a playout (course algorithm, lines 10-12)
  - it is given the `board` (s) that should be updated, the action (a) that was taken and the evaluation of the next state (Q(s,a))
  - it must update the counts (N(s), N(s,a)) and store the given `action_eval` (Q(s,a)) on the corresponding edge
  - it must then recompute and update the node's evaluation (Q(s))
  - it should return the updated evaluation (Q(s))
In the algorithm we are implementing, all evaluations (Q(.)) are from the perspective of the white player.
- ❓ In the UCB1 selection, how does the UCB1 value allow capturing what black would do in this situation?
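For reference, the exploitation + exploration shape of the UCB1 score is sketched below; this assumes the usual formulation seen in the course (Q(s,a) plus an exploration bonus weighted by C) and deliberately leaves aside how the side to move is taken into account, which is the question above:

```rust
// UCB1 score of one outgoing edge, from the point of view of the player maximising Q.
// q_sa = Q(s,a): current evaluation of the action
// n_s  = N(s):   number of times the node was selected
// n_sa = N(s,a): number of times this action was visited
// c    = exploration factor (`exploration_factor` in `MctsEngine`)
fn ucb1(q_sa: f32, n_s: u32, n_sa: u32, c: f32) -> f32 {
    let exploration = ((n_s as f32).ln() / n_sa as f32).sqrt();
    q_sa + c * exploration
}
```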
The `test_mcts()` function at the end of the file contains a test-bed for observing the behavior of the method on a reasonably interesting state.
Below is a sample of what it should look like when you increase the number of playouts to 1000:
```
ABCDEFGH White (0 plies)
1 b . b b
2 . . . b
3 . . . w
4 . . . .
5 . . . .
6 . b w .
7 . . w .
8 w w w .
After 1000 playouts:
Q: 0.78091437 (N: 1000) <- Q(s) and N(s)
301 C8 B7 [0.87960243] <- Q(s,a), here for the action "C8 B7"
230 C8 D7 [0.84222764]
184 F7 G6 [0.8040308]
103 E6 D5 [0.693204]
86  E6 F5 [0.6523256]
49  A8 B7 [0.48979592]
46  E8 D7 [0.47826087]
--  ----- ------------
^   ^     ^
|   |     Evaluation for this action (Q(s,a))
|   Action (a)
Number of samples per action (N(s,a))
```
Note that:
- Q(s) and Q(s,a) are always in [0,1] (they are weighted averages of the utilities of rollouts)
- best actions are sampled more often (increasingly so as N(s) increases, because the exploration term decreases)
- Q(s) tends towards max_a Q(s,a) as the number of samples grows
MCTS Main Loop
With all those ingredients, you should be able to implement the main MCTS loop in the `select` method.
- ▢ implement the `select` method
  - while the deadline has not passed, do playouts from the current state
  - then return the action with the most selections
For measuring time, you can use the `Instant` structure from the standard library:

```rust
use std::time::Instant;

let time_remaining: bool = Instant::now() < deadline;
```
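The overall shape of `select` could then be something like the sketch below, written as a method of `MctsEngine`; the `node` accessor retrieving the `Node` associated with a board is an assumption (adapt to your own structures, and to the actual `Engine` trait signature):

```rust
use std::time::Instant;

impl MctsEngine {
    // Run playouts until the deadline, then play the most-selected action of the root node.
    fn select(&mut self, board: &Board, deadline: Instant) -> Option<Action> {
        while Instant::now() < deadline {
            self.playout(board.clone());
        }
        let root = self.node(board)?; // hypothetical accessor: Node associated with `board`
        root.out_edges
            .iter()
            .max_by_key(|edge| edge.visited)    // N(s,a): number of times the action was selected
            .map(|edge| edge.action.clone())
    }
}
```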
Evaluating your engines
Variations of your engine
There are a number of elements that can be tuned and may affect the performance of your engine:
- the `exploration_weight` parameter
- use the average over more than one rollout (more accurate but potentially costly)
- use arbitrary evaluation function instead of a rollout (notably, the one used by minimax)
- the time allowed for MCTS at each turn
Evaluation set up
Engine evaluation is typically done over a number of games against another engine, to compute the average score of an engine against another in a statistically meaningful manner.
When evaluating your engine you can do so against either:
- a baseline solver: a solver with well-understood performance. These include:
  - the random solver that acts randomly at each turn (not implemented for you)
  - the minimax solver with a given depth
  - either of the above two can be made artificially stronger by placing them in a favorable situation (e.g. removing pieces from their opponent)
- another configuration of your solver

- Checkers is not a symmetric game; for an unbiased evaluation you should make sure that both engines play both sides an equal number of times.
- The MCTS engine accumulates data at each turn. You should make sure that it starts with a fresh graph for each game, by either:
  - creating a new engine, or
  - calling the `Engine::clear()` method in between games
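A possible skeleton for such an evaluation loop, alternating which engine plays white. It assumes the `Engine` trait is object-safe and relies on a hypothetical `play_game` helper (which you could write using `main::example_game` as a model); the `clear()` call is the one mentioned above:

```rust
// Hypothetical helper: plays one full game and returns white's final score (0.0, 0.5 or 1.0).
fn play_game(white: &mut dyn Engine, black: &mut dyn Engine) -> f32 {
    unimplemented!("adapt main::example_game to return the final score")
}

// Play `num_games` games between two engines, alternating colors, and return the
// average score of `engine_a` (0 = always loses, 1 = always wins).
fn evaluate(engine_a: &mut dyn Engine, engine_b: &mut dyn Engine, num_games: usize) -> f32 {
    let mut total = 0.0;
    for game in 0..num_games {
        // discard any search data accumulated during the previous game
        engine_a.clear();
        engine_b.clear();
        let score_a = if game % 2 == 0 {
            play_game(engine_a, engine_b)       // engine A plays white
        } else {
            1.0 - play_game(engine_b, engine_a) // engine A plays black: invert white's score
        };
        total += score_a;
    }
    total / num_games as f32
}
```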
Performance characterization
In addition to evaluating your solver against others, you should provide some metrics shedding some light on the functioning of your solver. This may notably include:
- number of playouts per second
- average playout depth (number of actions selected before reaching a leaf of the tree)
- length of the principal variation in the tree. The principal variation is the game that would have been played if both players were to follow the MCTS policy (always selecting the action with the most playouts) until reaching a tree leaf. It is deemed the most likely by the MCTS search, and this metric tells how far the algorithm explored this path.
  - at the end of the principal variation, the nodes would have very few selections and are thus of low confidence; you may restrict the principal variation to stop as soon as an action was selected fewer than a given number of times (e.g. 100), indicating that the algorithm is not yet convinced of the value of this action.
These metrics may differ between start, middle and end games, and should be computed for several representative states of these categories.
Playing against each other
I haven't had time to provide a mechanism to automatically export an agent. If you want to test against someone else's AI, and they are willing to share the code, you can:
- copy their engine's code (`mcts.rs`) into a renamed file in your repository
  - e.g. `src/uce_mcts.rs`, where UCE (Unbeatable Checkers Engine) is the name of the engine
- declare the code as a module to the Rust compiler
  - add `pub mod uce_mcts;` at the top of your `main.rs`
- use their engine from the imported module, e.g., `let black_engine = uce_mcts::MctsEngine::new(1.);`
Last AI Lab
The primary objectives of this lab are to:
- finish and consolidate the MCTS code for checkers
- set up an evaluation of your code
- prepare your report for evaluation
If your code is not available in a team repository from Github classroom, set it up now. Otherwise the evaluators will not have access to your code.
You will find the link to register in the third lab (hopefully, the problems with github classroom will be fixed).
Evaluation criteria
You will be evaluated based on:
- the code for your MCTS implementation
- a synthetic report presenting the results of your engine evaluation
  - this should be the `README.md` of your repository, starting with your names
The report should present:
- the experiments you have conducted and the associated results (as tables or graphs), matching the instructions at the end of the Lab3 subject. In particular, you should:
- have a relative evaluation of your own engine (under different configuration) allowing you to select the best configuration
- have an absolute evaluation of your engine (against baseline and with associated metric)
- a discussion of the results (why is a given configuration better than another, ...)
The report should:
- NOT explain the basic functioning of the checkers implementation or of the MCTS algorithm (we know how it works)
- state what differs in your implementation from the basic implementation (different heuristic, specific parameter, ...)
- in other words, what you contributed with respect to the simplest implementation (or some known limitation/bug of your implementation)
The report may (optional):
- provide details on extra work that you have done in the context of the labs, related or not to MCTS and checkers.
  - for instance, if you have implemented an adaptation of alpha-beta pruning to the expectimax for 2048, we want to know about it (even if your evaluation turns out disappointing)
  - some hints of what can be done are given below, in the going further section.
Going further
A well conducted and functional MCTS implementation with associated evaluation is sufficient to obtain the average grade. However, there are a number of things that you can do in order to improve your grade and skills:
- MCTS: there are various things that can be done to exploit the graph structure of the MCTS search which are not done in the basic implementation. The data structure you use is compatible with graphs, but this is not fully exploited. This MCTS for graphs article goes over the specifics of MCTS on trees vs graphs, with the linked section being a potentially easy gain (to be measured)
- minimax: implement alpha-beta pruning for checkers
- checkers: improve the evaluation function of checkers to account for runaway pawns (pawns that have a free path to become a queen) or quiescence (if there are possible captures in the evaluated state, simulate them before running your evaluation, to avoid giving a high score to a situation where you have just captured a pawn but will be captured in the next turn).
- this evaluation function can be used for either minimax or mcts
- 2048: add a form of pruning to the search:
  - ignore states that are highly unlikely (e.g. those that require four `4` tiles in a row to occur)
  - exploit alpha-beta pruning principles in the search (hint: the contribution of each move to the expectation term is capped by the maximum value and its probability)
- 8-puzzle: evaluate on much larger puzzles, requiring memory-limited methods (e.g. iterative deepening A*, beam search)
Teaching Resources
Teaching resources are provided in private git repositories to which only teachers have access.
- Main Repository
- Sources of the lectures and the content of this website
- Corrections for hands-on sessions (inlined in the markdown sources)