This site is the central point for the lectures on Artificial Intelligence (INSA 4IR). It will be updated regularly with the slides and other resources.
Lecture Slides
Below are the slides for the lectures; they will be made available as the course progresses.
- Lecture 1: Introduction (AIMA Chap. 1)
- Lecture 2: The Agent Model (AIMA Chap. 2)
- Lecture 3: Solving problems by searching (AIMA Chap. 3)
- Lecture 4: Utility and Decision Theories (AIMA Chap. 15, Making simple decisions)
- Lecture 5: Markov Decision Processes (AIMA Chap. 16, Making complex decisions)
- Lecture 6: Online MDP: 2048 & Expectimax (AIMA Chap. 16 & 5)
- Lecture 7: Multi-agent environments & Adversarial games (AIMA Chap. 17 & 5)
- Lecture 8: Monte Carlo Tree Search (AIMA Chap. 5)
- Conclusion: Some elements on remaining labs and exams
Working at home: The course is based on the book Artificial Intelligence: A Modern Approach (AIMA), by Stuart Russell and Peter Norvig.
Evaluation
The evaluation will be based on:
- a written exam (~50%)
  - exam on Moodle, with short open and multiple-choice questions
  - date: 7th of May (07/05), 8am
  - everything seen in the course and the labs is examinable, including exercises done during the course but not present in the slides
  - examples of questions can be found in the last set of slides (conclusion)
  - no documents allowed
- a project-based evaluation (~50%), based on the last two labs of the course (MCTS for checkers)
  - deadline: May the fourth (04/05), 23:59
  - the project will be done in groups of 2-3 students
  - the project will be evaluated based on the performance of the code and on a short report detailing the evaluation conducted
  - modalities: push your code and report (README.md) to the shared GitHub repository
The Rust Programming Language
All labs will be done in Rust. Besides the intrinsic qualities of the language, there are a few reasons for this choice in the context of this course:
- performance: we are going to work on some CPU-intensive algorithms, and Rust will let you have excellent performance without having to think about it.
- reliability: the Rust compiler can often feel annoyingly strict, but it will save you from many bugs that would be hard to catch otherwise.
- cross-platform: Rust tends to very reliably work on most platforms.
In the context of this course, it means that you can focus on the AI algorithms and problems we are going to study.
Of course, the downside is that you will have to build some basic skills in Rust. We will only require the fundamentals, and provide a lot of help and examples.
Getting Acquainted with Rust
The recommended way to get started with Rust is to follow the official book.
Ideally, before the first lab, you should have read the following chapters:
- Install & Hello World
- Quick Tour of a Rust program
  - walks you through a simple program, explaining the basic syntax and concepts
- Common programming concepts (variables, functions, control flow, ...)
  - you will learn about the basic building blocks of Rust programs, which you already know from other languages but whose syntax may differ
- Understanding Ownership
  - Ownership is Rust’s most unique feature, and I highly recommend carefully reading this chapter.
  - Note that the concept of ownership is not unique to Rust, but the Rust compiler checks that you respect it, which is unique. In C/C++, you typically would have to follow the same rules, but if you fail to apply them, the compiler won't help you (and your program may crash 10 minutes later).
Chapters 5 to 9 cover some aspects that would be useful but can for the most part be picked up as we go.
Chapters 10 and beyond cover intermediate topics that we will try to avoid in the labs.
Additional Resources
- Editors:
- Visual Studio Code with the Rust analyzer extension
- RustRover (for JetBrains aficionados)
- Any editor with a rust-analyzer plugin (zed, neovim, ...)
- Rustlings: a series of small exercises to get started with the language
- Rust Playground: to quickly test some code without having to install anything
- Comprehensive Rust: a fairly elaborate course built for in-class teaching, but it can be used for self-learning by looking at the speaker notes
Lab 1: 8-puzzle (fr: taquin)
What's an 8-puzzle?
The 8-puzzle is a sliding puzzle that consists of a frame of numbered square tiles in random order, with one tile missing. The object of the puzzle is to place the tiles in order by making sliding moves that use the empty space, reaching the configuration below:
Goal state:
┏━━━┳━━━┳━━━┓
┃ 1 ┃ 2 ┃ 3 ┃
┣━━━╋━━━╋━━━┫
┃ 4 ┃ 5 ┃ 6 ┃
┣━━━╋━━━╋━━━┫
┃ 7 ┃ 8 ┃ ┃
┗━━━┻━━━┻━━━┛
Practice with the 8-puzzle: try solving the instance below (with your own, non-artificial, intelligence):
Install Rust
The project is a Rust project, and you will need to have Rust installed on your machine to run it (it is not preinstalled).
Installation instructions are available on the Rust website.
Get acquainted
Get the code from the repository: https://github.com/arbimo/insa-4ir-ai-lab1-template
The code already contains a number of files:
- `src/board.rs`: the board representation, together with the actions that can be performed on it
- `src/main.rs`: the main file, where you can test your code
- `src/search.rs`: placeholder for the search algorithm
- `src/heuristic.rs`: placeholder for the heuristic function used by the search
- `src/min_heap.rs`: a very simple min-heap implementation that you can use
Please deactivate any AI-based auto-completion in your IDE (Copilot, ...). While it may improve your productivity in some cases, it is in general not helpful for learning and understanding. You are free to use external chat-based tools (that do not spontaneously propose answers before you have asked a question).
Understanding the board representation
If you run the test suite with `cargo test`, you will see that the board representation is not complete yet and has a few tests that fail.
- ▢ complete the `test_position` and `test_apply` unit tests in `board.rs`
- ▢ implement the `is_valid_plan` method (so that the corresponding test passes)
One thing that you will quickly notice is that `board.apply(Direction::Down)`:
- does not modify the `board` in place, but returns the result of applying an action to the board
- has the return type `Option<Board>`, which will contain a new board wrapped in `Some(new_board)` if the action is valid, and `None` otherwise
Like any decent language, Rust has a `match` construct that allows you to handle this situation in a very elegant way:

```rust
match board.apply(Direction::Down) {
    Some(new_board) => { /* do something with new_board */ },
    None => { /* the action is not valid */ }
}
```
Solving your first 8-puzzle
- ❓What is the sequence of moves to reach the goal state from the following position? (hint: you only need 4)
┏━━━┳━━━┳━━━┓
┃ 1 ┃ 2 ┃ 3 ┃
┣━━━╋━━━╋━━━┫
┃ 4 ┃ 8 ┃ 5 ┃
┣━━━╋━━━╋━━━┫
┃ ┃ 7 ┃ 6 ┃
┗━━━┻━━━┻━━━┛
- in the `main.rs` file, implement the `main` function to:
  - ▢ create the board above
  - ▢ create the plan (sequence of actions) to reach the goal state
  - ▢ validate the plan (use the `is_valid_plan` method)
  - ▢ simulate the plan with the `play` method
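For reference, a minimal sketch of what this could look like. The `Board::new` constructor used here is an assumption (check the actual constructor in `board.rs`), while `Direction`, `is_valid_plan` and `play` are the items mentioned above:

```rust
fn main() {
    // hypothetical constructor: the board shown above, with 0 for the empty cell
    let board = Board::new([[1, 2, 3], [4, 8, 5], [0, 7, 6]]);

    // the plan: fill in your 4-move answer to the question above
    let plan: Vec<Direction> = vec![
        // e.g. Direction::Up, ...
    ];

    // check that every action of the plan is applicable before simulating it
    assert!(board.is_valid_plan(&plan));

    // simulate the plan step by step
    board.play(&plan);
}
```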
Uninformed search (Dijkstra)
At this point we will try to implement an AI to solve the 8-puzzle. This means that, for any initial state, we want to find the sequence of moves to reach the goal state.
We are interested in finding the shortest sequence of moves. We will therefore assume that each action has a cost of `1`.
- ▢ implement the `search` method in `search.rs`, not using any heuristic (it is a Dijkstra algorithm)
  - Dijkstra's algorithm is a variant of best-first search where the nodes are expanded in order of their distance to the start node.
    - this means that `f(s)` (the priority of a state `s`) should be set to the path cost to reach `s` from the start node.
  - the algorithm for best-first search is given in the slides of the course (CM3), in a format that closely matches the one of the `search` method.
The `search` implementation will force you to manipulate a number of hashmaps. Here are a few tips on how to use them:

```rust
use std::collections::HashMap;

let mut path_costs: HashMap<Board, u32> = HashMap::new();

// associates the board with a path cost of 9 (replacing any previous value)
path_costs.insert(board, 9);

// get the path cost of a board
// the result would be Some(9) for the board inserted above
// but None for a board that was not inserted before
let optional_cost: Option<&u32> = path_costs.get(&board);

// when you have an option that you know will contain something,
// you can use the unwrap method (that will panic if the option is None)
let cost: &u32 = optional_cost.unwrap();

// In Dijkstra's algorithm, it will often be the case that you know a value is present.
// In this case, you can use the index notation to directly access the value
// Note: it will panic if nothing is there
let cost: u32 = path_costs[&board];
```
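To make the overall structure concrete, here is a rough sketch of such a search. It makes several assumptions about the template that you should check against the real code: that `Board` is a small `Copy` type implementing `Eq` and `Hash`, that the provided `MinHeap` offers `insert`/`pop`-style operations, that a `GOAL` board and a `DIRECTIONS` array of the four moves exist, and it leaves the plan reconstruction to you:

```rust
use std::collections::HashMap;

// Sketch of a uniform-cost (Dijkstra) search; names are assumptions, not the template's API.
fn search(init: Board) -> Option<Vec<Direction>> {
    let mut queue = MinHeap::new();                            // frontier, ordered by f(s)
    let mut path_costs: HashMap<Board, u32> = HashMap::new();  // g(s): best known cost so far
    let mut predecessors: HashMap<Board, (Board, Direction)> = HashMap::new();

    path_costs.insert(init, 0);
    queue.insert(init, 0); // for Dijkstra, the priority is f(s) = g(s)

    while let Some(state) = queue.pop() {
        if state == GOAL {
            // walk back through `predecessors` to rebuild the plan (helper left to you)
            return Some(reconstruct_plan(&predecessors, state));
        }
        let g = path_costs[&state];
        for dir in DIRECTIONS {
            if let Some(succ) = state.apply(dir) {
                let new_cost = g + 1; // every move costs 1
                if path_costs.get(&succ).map_or(true, |&c| new_cost < c) {
                    path_costs.insert(succ, new_cost);
                    predecessors.insert(succ, (state, dir));
                    queue.insert(succ, new_cost);
                }
            }
        }
    }
    None // the goal is not reachable from the initial state
}
```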
- ▢ test it on different instances of the 8-puzzle: `INSTANCES` in `board.rs` gives a number of initial states with a known distance to the goal
- ▢ use the statistics to explore the evolution of the runtime and the number of nodes explored with the distance to the goal

If you are to compare the runtime of the search algorithm, make sure to run it in release mode (i.e. with compiler optimisations turned on):

```
cargo run --release
```
Informed, optimal search (A*)
- ▢ in the `heuristic.rs` file, implement the `estimate` method for the Hamming distance
- ▢ modify the search method to use the heuristic (taking an additional parameter `heuristic: &Heuristic`)
- ▢ compare the performance of the search with the heuristic to the blind search (at this point, printing output in a form that you can copy-paste into a spreadsheet is a good idea)
  - what is the difference between Dijkstra's algorithm and A* with the blind heuristic?
  - is there a possibility for your heuristic to overestimate the distance to the goal?
- ▢ repeat with the Manhattan distance heuristic
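For illustration, a sketch of what a Hamming-distance estimate could look like, assuming the board exposes the tile in each cell through a hypothetical `value_at` accessor (adapt to the real `Board` API):

```rust
// Hamming distance: number of tiles (excluding the blank) that are not on their goal cell.
// It never overestimates the true distance, since each misplaced tile needs at least one move.
fn hamming(board: &Board) -> u32 {
    let mut misplaced = 0;
    for index in 0..9 {
        let tile = board.value_at(index); // hypothetical accessor returning the tile in cell `index`
        // in the goal state, cell `index` holds tile `index + 1`, except the last cell (blank, 0)
        let expected = if index == 8 { 0 } else { (index as u8) + 1 };
        if tile != 0 && tile != expected {
            misplaced += 1;
        }
    }
    misplaced
}
```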
Suboptimal search
- ▢ implement weighted A* search
  - given the similarity with the A* algorithm, you can probably do it by generalizing the `search` method in `search.rs` and have A* and Weighted-A* only differ by the parameters passed to the `search` method
- Analytically derive a (reasonably small) value of W that would make weighted A* equivalent to greedy best-first search.
- The min-heap provided only supports integers as priorities; how would you handle the case where you would like a rational weight (e.g. `4/3`)? (see the note after this list)
  - intermediate question: what happens to the search if you multiply both g(s) and h(s) by the same number?
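Regarding the last two questions: multiplying all priorities by the same positive constant does not change their ordering, so rational weights can be handled by scaling both terms to integers. One possible (illustrative, not prescribed) way to generalize the `search` signature:

```rust
// f(s) = g_weight * g(s) + h_weight * h(s)
// A*:                        g_weight = 1, h_weight = 1
// Weighted A* with W = 2:    g_weight = 1, h_weight = 2
// Weighted A* with W = 4/3:  scale both by 3 -> g_weight = 3, h_weight = 4
fn search(init: Board, heuristic: &Heuristic, g_weight: u32, h_weight: u32) -> Option<Vec<Direction>> {
    // identical to the A* search, except that the priority of a state is computed as
    //     g_weight * path_cost + h_weight * heuristic.estimate(&state)
    unimplemented!()
}
```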
Going further
- counting states: propose and implement a method to count the number of states from which the goal state can be reached
  - there is an argument that there are only `9!/2` reachable states for the 8-puzzle (but doing the math is probably harder than counting them)
- enumerating: how many states are there at distance `17` from the goal state?
- unreachable states: can you find a state that is not reachable from the goal state?
- generating instances: propose a way to generate random solvable instances of the 8-puzzle (one possible approach is sketched after this list)
  - you can use the `rand` crate for getting random numbers.
- larger boards: make the puzzle more challenging by increasing the size of the board (4x4)
  - the size of the board is controlled by the `N` constant in `board.rs`
    - modifying it should be easy enough but will invalidate all the predefined boards defined for testing
  - careful: the state space is much larger; you cannot expect to fit it all in memory (for the larger instances, you can consider memory-frugal variants such as IDA*)
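For the instance-generation item above, one common approach (an assumption, not the only valid one) is to start from the goal state and apply a number of random valid moves: every state produced this way is solvable by construction. A minimal sketch, assuming a `Board::goal()`-style constructor, a `DIRECTIONS` array, `Clone` on `Board`, and a recent version of the `rand` crate:

```rust
use rand::seq::IndexedRandom;

// Generate a random solvable instance by walking `num_moves` random steps away from the goal.
fn random_instance(num_moves: usize) -> Board {
    let mut rng = rand::rng();
    let mut board = Board::goal(); // hypothetical constructor for the goal state
    for _ in 0..num_moves {
        // boards reachable in one move from the current board
        let successors: Vec<Board> = DIRECTIONS
            .iter()
            .filter_map(|&dir| board.apply(dir))
            .collect();
        // pick one at random (there is always at least one applicable move)
        board = successors.choose(&mut rng).unwrap().clone();
    }
    board
}
```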
Corrections
As many of you did not have time to finish it, below are the corrections for the lab.
It is meant to help you finish it yourselves: if you get blocked on something, the content below should help you get past most difficulties.
The associated code is present in the repository below. In addition, for each part, there is a link to the commit as well as a short video explaining it. Note: the explanations in the code/videos are limited, and you should experiment with it yourselves to get a better feel for it.
Dijkstra
Implementation of Dijkstra.
A*
Implementation of the heuristics and adaptation of the search to consider the heuristics.
Performance comparison (Dijkstra / A*)
Short demonstration of how to compare algorithms (printing, export to csv and load in spreadsheet).
Setting up
This lab should be done in groups of 1 or 2 students (3 is tolerated if there is an odd number of students and you don't want to be alone).
To get the base code for the lab, you should:
- choose your group members
- head to the lab's github classroom
- create a team or join an existing one
- team name should follow the convention `{GROUP} {SURNAME1} / {SURNAME2}`
- accept the assignment
This process will create a new repository for your group with the base code for the lab. The repositories are private, and only the members of the group and the teachers can see them.
You can now clone the repository and start working on the lab.
Repository content
- `src/board.rs`: the board representation
- `src/search.rs`: placeholder for the search algorithm (expectimax)
- `src/eval.rs`: provided evaluation function (discussed in the course)
- `src/main.rs`: main file to run the game
- `src/bench.rs`: alternative executable to evaluate your algorithms on several problems
Board representation
The `board.rs` file contains the board representation. It is a 4x4 grid of `u8` values, with a few helper functions to manipulate it.
The `Board` struct in the file is the one that implements most of the game logic. However, you should not manipulate it directly.
Instead, we provide two wrapper types around it:
- `PlayableBoard`: a board that can be played on (i.e. you can move on it). It provides the following methods:
  - `init` to create a new board with a single random tile
  - `apply` to apply an action to the board. This method returns either `None` if the action is invalid, or a `RandableBoard` with the action applied.
- `RandableBoard`: a board that can be randomly filled with a new tile. It provides:
  - `with_random_tile` to fill the board with a new tile in an arbitrary empty cell. The method returns a `PlayableBoard` with the new tile added.
  - `successors` to generate all possible successors of the board. The method returns an iterator of `PlayableBoard`s, each together with the probability of getting this board.
  - `evaluate` to evaluate the board, using the implementation provided in `eval.rs`
Main
The `main.rs` file contains the main entry point of the program. It is a simple loop that will:
- create an initial board
- print it
- ask for a move (using the `search::select_action` function)
- apply the move
- add a random tile
- repeat until the game is over
The current implementation of `select_action` is a naive one that simply returns a random applicable action.
To run the program, simply run (the `--bin` argument is needed to specify the binary to run):

```
cargo run --release --bin main
```
Evaluating your AI
The `bench.rs` file is an alternative executable that allows you to evaluate your AI on a set of problems.
It will run your AI on a set of problems and output the average score and the average number of moves. For instance, the command line below will run your AI on 100 problems and provide you with some statistics on its performance.

```
cargo run --release --bin bench -- --num-games 100
```

This is primarily intended for statistical evaluation but can be slow when the algorithm becomes complex. To get a feel for how your AI performs, using it on a single problem with the `main` binary is usually sufficient.
Greedy strategy
At this point, the `search::select_action` function just calls a predefined `select_action_randomly` that randomly selects an action among the applicable ones.
- TODO: implement the `select_action_greedily` function so that it
  - simulates all possible actions
  - evaluates all applicable ones (the `RandableBoard` type provides an `evaluate` method)
  - returns the action with the highest evaluation
  - or returns `None` if there is no applicable action
This is called a greedy method: it selects the seemingly best action based on a heuristic, without any further thinking or search.
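A possible shape for this function, as a sketch rather than the expected implementation: it assumes the action type is a small `Copy` value and that an `Action::ALL` list of the four candidate actions exists (neither name is taken from the template), and that `evaluate` returns an `f32`:

```rust
// Greedy selection: simulate each applicable action once and keep the best one.
fn select_action_greedily(board: &PlayableBoard) -> Option<Action> {
    let mut best: Option<(Action, f32)> = None;
    for action in Action::ALL {                  // hypothetical list of the four actions
        if let Some(result) = board.apply(action) {
            let eval = result.evaluate();        // evaluation of the resulting RandableBoard
            if best.as_ref().map_or(true, |&(_, best_eval)| eval > best_eval) {
                best = Some((action, eval));
            }
        }
    }
    best.map(|(action, _)| action)               // None if no action was applicable
}
```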
- TODO: evaluate the greedy strategy:
  - call it in the `select_action` method
  - run the benchmarks for it
  - how does it compare to the random move strategy?
Expectimax
At this point, the evaluation function serves as an estimation for the utility of a state.
As you may have seen, this already improves upon a purely random play but leaves much to be desired. In this part, we aim to improve upon this by using the Expectimax algorithm.
The key idea behind Expectimax is to improve our evaluation of the available actions by simulating the results that can be expected after a fixed number of actions.
You will find in the `search.rs` file the function declarations to be filled in according to the following algorithm (also seen in the course):

```
select_action_expectimax(s, max_depth):
    applicable_actions = { actions that are applicable in s }
    return the applicable action a that maximizes eval_randable(result(s, a), max_depth)

eval_playable(s, d) =
    applicable_actions = { actions that are applicable in s }
    successors = { result(s, action) | action in applicable_actions }
    max { eval_randable(succ, d-1) | succ in successors }

eval_randable(s, d) =
    if d == 0:
        evaluate(s)
    else:
        Sum { p * eval_playable(succ, d) | (p, succ) in successors(s) }
```
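A rough Rust translation of this pseudocode, under the same assumptions as the greedy sketch above (hypothetical `Action::ALL`, `Copy` actions, `evaluate` returning `f32`, `successors()` yielding `(probability, board)` pairs); the recursion structure is the point, not the exact signatures:

```rust
// Root call: pick the applicable action with the best expectimax value.
fn select_action_expectimax(board: &PlayableBoard, max_depth: usize) -> Option<Action> {
    let mut best: Option<(Action, f32)> = None;
    for action in Action::ALL {
        if let Some(result) = board.apply(action) {
            let value = eval_randable(&result, max_depth);
            if best.as_ref().map_or(true, |&(_, v)| value > v) {
                best = Some((action, value));
            }
        }
    }
    best.map(|(action, _)| action)
}

// Max node: the player picks the action with the highest expected value.
fn eval_playable(board: &PlayableBoard, depth: usize) -> f32 {
    Action::ALL
        .iter()
        .filter_map(|&action| board.apply(action))
        .map(|succ| eval_randable(&succ, depth - 1))
        .fold(f32::NEG_INFINITY, f32::max) // no applicable action: treated as a lost game
}

// Chance node: expectation over the possible random tile placements.
fn eval_randable(board: &RandableBoard, depth: usize) -> f32 {
    if depth == 0 {
        board.evaluate()
    } else {
        board
            .successors()
            .map(|(p, succ)| p * eval_playable(&succ, depth))
            .sum()
    }
}
```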
- TODO: implement the three functions above
- TODO: evaluate the algorithm with various depth limits:
- in terms of performance (average number of actions played in a game)
- in terms of runtime
Going Beyond: Algorithmic improvements
Transposition table (caching)
As the number of states visited increases, the likelihood of visiting the same state multiple times also increases. With the naive implementation of the search algorithm, this can lead to a lot of redundant computation.
Caching the results of the evaluation function can help reduce the number of evaluations.
- TODO: cache the value of `eval_randable(s, d)` into a `HashMap`
  - the hash map should be created once and passed as an additional parameter to the `eval_randable` and `eval_playable` functions
The whole purpose of using expectimax is to increase the precision of the evaluation of the game state. Essentially, the more remaining actions (depth) are allowed, the more precise the evaluation is (the algorithm averages over more simulations).
When caching an evaluation, it is essential to store the evaluation of the state (`f32`) together with the remaining actions (`usize`) that were available when it was computed.
When looking up a cached evaluation, you should only use it if it was computed with the same or more remaining actions (i.e. the cached evaluation is at least as precise as the one you were going to compute).
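One way to organise such a cache, sketched under the same assumptions as above and additionally assuming that `RandableBoard` implements `Eq`, `Hash` and `Clone` (the matching cached `eval_playable` is omitted):

```rust
use std::collections::HashMap;

// Cached value: the evaluation together with the remaining depth it was computed with.
type Cache = HashMap<RandableBoard, (usize, f32)>;

fn eval_randable(board: &RandableBoard, depth: usize, cache: &mut Cache) -> f32 {
    // reuse a cached evaluation only if it is at least as precise as the one requested
    if let Some(&(cached_depth, value)) = cache.get(board) {
        if cached_depth >= depth {
            return value;
        }
    }
    let value = if depth == 0 {
        board.evaluate()
    } else {
        // expectation over the successors, as in the non-cached version
        board
            .successors()
            .map(|(p, succ)| p * eval_playable(&succ, depth, cache))
            .sum()
    };
    cache.insert(board.clone(), (depth, value));
    value
}
```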
Iterative deepening
The expectimax algorithm is a depth-limited search. The depth limit is a parameter of the algorithm. The time complexity of the algorithm is \( O(b^d) \) where b is the branching factor and d is the depth limit.
You may have noticed that the runtime for a fixed depth limit can vary greatly depending on the state of the board. This is primarily due to the branching factor of the game tree that is proportional to the number of empty cells.
Iterative deepening is a technique that consists in running the search algorithm multiple times with increasing depth limits, until a time limit is reached.
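A sketch of how the action selection could wrap the depth-limited search in such a loop, reusing the `select_action_expectimax` sketch from above and `Instant`/`Duration` from the standard library (simplified: a real implementation may also want to abort a search that overruns the deadline and to reuse the transposition table across iterations):

```rust
use std::time::{Duration, Instant};

// Iterative deepening: search deeper and deeper until the time budget is exhausted,
// and return the action chosen by the deepest completed search.
fn select_action_iterative(board: &PlayableBoard, budget: Duration) -> Option<Action> {
    let deadline = Instant::now() + budget;
    let mut best = None;
    let mut depth = 1;
    while Instant::now() < deadline {
        if let Some(action) = select_action_expectimax(board, depth) {
            best = Some(action);
        }
        depth += 1;
    }
    best
}
```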
Evaluating the algorithm
- We plan to run your algorithm on a set of problems to evaluate its performance; this is equivalent to running the `bench` executable.

We plan to run your algorithm on 100 problems with a time limit of 180 seconds. The time limit is there to ensure that the algorithm is implemented efficiently: to reach the `16384` tile you should take no more than 20 milliseconds per decision.

```
# Simulate what the benchmark will do
cargo run --release --bin bench -- --num-games 100 --timeout 180
```
If we are able to build and run your code, we will publish the performance in the leaderboard.
Leaderboard
The columns 256 to 32768 give the percentage of games in which a tile of that value was reached.
Team | Average Score | 256 | 512 | 1024 | 2048 | 4096 | 8192 | 16384 | 32768 |
---|---|---|---|---|---|---|---|---|---|
c1-marty | 9647.53 | 100.00 | 100.00 | 100.00 | 100.00 | 98.00 | 92.00 | 56.00 | 0.00 |
c1-chourret-pouilles | 8981.02 | 100.00 | 100.00 | 100.00 | 99.00 | 94.00 | 90.00 | 51.00 | 1.00 |
c1-rodriguez | 8395.38 | 100.00 | 100.00 | 100.00 | 100.00 | 99.00 | 92.00 | 65.00 | 0.00 |
c1-lasserre | 7877.87 | 100.00 | 100.00 | 100.00 | 99.00 | 98.00 | 92.00 | 62.00 | 0.00 |
a1-alinot-siri | 7066.36 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 79.00 | 25.00 | 0.00 |
b2-lacau-loubejac | 6423.01 | 100.00 | 100.00 | 100.00 | 98.00 | 92.00 | 71.00 | 20.00 | 0.00 |
b2-green | 6055.08 | 100.00 | 100.00 | 100.00 | 99.00 | 97.00 | 79.00 | 4.00 | 0.00 |
a1-fornaro-clerembaux | 5648.27 | 100.00 | 100.00 | 100.00 | 98.00 | 93.00 | 63.00 | 12.00 | 0.00 |
c1-servieres | 5640.51 | 100.00 | 100.00 | 100.00 | 100.00 | 93.00 | 61.00 | 10.00 | 0.00 |
b1-boukari | 5595.15 | 100.00 | 100.00 | 100.00 | 99.00 | 94.00 | 68.00 | 7.00 | 0.00 |
c1-rebillard | 5523.63 | 100.00 | 100.00 | 100.00 | 100.00 | 97.00 | 63.00 | 9.00 | 0.00 |
a2-berger-audic | 5348.90 | 100.00 | 100.00 | 100.00 | 100.00 | 93.00 | 62.00 | 6.00 | 0.00 |
a1-doganis-rees | 5271.02 | 100.00 | 100.00 | 100.00 | 99.00 | 95.00 | 62.00 | 9.00 | 0.00 |
c1-plana-sleiman | 5160.51 | 100.00 | 100.00 | 100.00 | 99.00 | 92.00 | 59.00 | 8.00 | 0.00 |
b2-doumbouya-iben-brahim | 4853.05 | 100.00 | 100.00 | 100.00 | 98.00 | 97.00 | 51.00 | 1.00 | 0.00 |
b2-keizer-lavergne | 4743.17 | 100.00 | 100.00 | 100.00 | 99.00 | 95.00 | 64.00 | 0.00 | 0.00 |
b2-chen-juxin | 4580.14 | 100.00 | 100.00 | 100.00 | 97.00 | 92.00 | 51.00 | 1.00 | 0.00 |
a1-levy-mathieu | 4550.48 | 100.00 | 100.00 | 100.00 | 99.00 | 92.00 | 39.00 | 2.00 | 0.00 |
a2-meetoo-hindi | 4491.71 | 100.00 | 100.00 | 100.00 | 98.00 | 94.00 | 44.00 | 0.00 | 0.00 |
b1-amri-lemeilleur | 3484.05 | 100.00 | 100.00 | 99.00 | 98.00 | 80.00 | 21.00 | 0.00 | 0.00 |
a2-olivan-juan | 3213.46 | 100.00 | 100.00 | 100.00 | 100.00 | 75.00 | 11.00 | 0.00 | 0.00 |
anglade | 3120.52 | 100.00 | 100.00 | 99.00 | 94.00 | 75.00 | 10.00 | 0.00 | 0.00 |
b2-martin-sellami | 3093.88 | 100.00 | 100.00 | 98.00 | 95.00 | 73.00 | 12.00 | 0.00 | 0.00 |
a2-bezmaternykh | 2415.57 | 100.00 | 100.00 | 100.00 | 99.00 | 99.00 | 0.00 | 0.00 | 0.00 |
a2-escudie | 1663.75 | 98.00 | 97.00 | 94.00 | 72.00 | 17.00 | 0.00 | 0.00 | 0.00 |
a2-bongibault | 1270.78 | 100.00 | 100.00 | 100.00 | 100.00 | 0.00 | 0.00 | 0.00 | 0.00 |
b1-gimond-large | 357.33 | 93.00 | 29.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
b2-ferreira | 338.25 | 88.00 | 28.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
b1-misbahi | 122.60 | 10.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
a1-therond-wallart | 122.54 | 8.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
b2-jimenez | 120.68 | 7.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c1-leko | 120.46 | 6.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
a1 | 119.82 | 7.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
a2-rousseau | 119.22 | 5.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
c-hassouna-mohamed-amine | 116.12 | 2.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
lacire | 115.49 | 8.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Generated with:
- 100 games per player
- 180s timeout per game
- repositories updated before 2025-04-11 12:23:06.948968
The International Checkers Game
The lab will focus on the most common variant of checkers (at least in France): International Checkers (sometimes called draughts).
If you are unfamiliar with checkers, you can check the International Checkers Rules.
Unlike some other variants (notably American checkers):
- the game is played on a 10x10 board
- pawns (aka pieces/checkers) can only move forward
- pawns can capture forward and backward
- when a capture is possible, the player must play the move that captures the most adversaries
- a queen (aka king) appears when a pawn reaches the opposite end of the board
- a queen can move freely multiple steps in one direction
The implementation provided slightly differs from these rules. In particular:
- by default, the board size is 8x8
  - it can be configured to any reasonable multiple of 2 with the `N` constant
- by default, the game is declared a draw after 200 moves
  - this can be configured with the `MAX_PLY` constant (larger numbers are needed for larger boards)
Setting up
This lab should be done in groups of 1 or 2 students (3 is tolerated if there is an odd number of students and you don't want to be alone).
To get the base code for the lab, you should:
- choose your group members
- head to the lab's github classroom
- create a team or join an existing one
- team name should follow the convention `{GROUP} {SURNAME1} / {SURNAME2} -- LAB3`
  - IMPORTANT: the name must be different from the one used for LAB2, hence the `-- LAB3` at the end.
- accept the assignment
- TROUBLESHOOTING: if GitHub Classroom is in a bad mood, you can get the base code from this repository.
This process will create a new repository for your group with the base code for the lab. The repositories are private, and only the members of the group and the teachers can see them.
You can now clone the repository and start working on the lab.
Provided code
You are provided with an implementation of the board and rules (`board.rs`) and of a simple AI engine (`minimax.rs`).
Below is a highlight of the main parts (note: you do not have to understand everything in the board implementation).
- `board.rs`: "simple" implementation of the board and basic rules for International Checkers. It notably contains the following elements:
  - `Color`: enum with `White` and `Black` variants
  - `Cell`: representation of the content of a cell of the board: `Empty`, or a `White`/`Black` `Pawn`/`Queen`
  - `Position`: position of a valid cell on the board. Its implementation is fairly low-level but you can access its coordinates (`coords()`)
  - `Dir`: one of the four possible directions for moving: `UpLeft`, `UpRight`, ...
  - `Move`: elementary movement on the board: a number of cells traversed in a single direction. If the move goes over an adversary's piece, it results in a capture
  - `Action`: a sequence of `Move`s from a given position. Represents any possible action available to the player, from a single pawn displacement to multiple captures.
  - `Board`: board representation (implementation detail: using a padded array)
    - `turn`: indicates which player is to play next (white/black)
    - `num_ply`: number of turns already played
    - `init()`: creates the initial board
    - `count()`: counts the number of occurrences of a particular cell value
    - `is_draw()`: true iff the game is considered a draw
    - `actions()`: returns all the valid actions available to the current player
    - `apply()`: returns the board resulting from applying the (assumed valid) action
- `minimax.rs`: very primitive implementation of minimax (in its negamax variant)
  - `select(...)`: returns the best action according to a minimax search of a fixed depth (defined in the `MinimaxEngine`), ignoring any deadline given to it
- `mcts.rs`: skeleton for the implementation of MCTS (more below)
- `engine.rs`: interface for abstracting over different engines (MCTS or minimax). It defines an `Engine` trait (similar to a Java interface) that allows abstracting over different kinds of engines.
Testing
- ▢ In the main file, simulate a few games between two minimax engines.
  - the `main::example_game()` function is the main one there
Monte Carlo Tree Search (MCTS)
The objective of the lab is to implement the MCTS method for the checkers game. We will split it into several steps.
Scoring
- ▢ Implement the `mcts::white_score()` function.
  - The function can freely assume that it is only evaluated on a final board, that is: the game is a draw or the current player has no available action.
  - It must compute the score from the point of view of the white player.
  - Useful methods/attributes on `Board`: `is_draw()`, `actions()` and `turn`
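A sketch of the kind of logic expected, assuming the 1 / 0.5 / 0 win/draw/loss scale used in the rest of this page (check the scale actually expected by the skeleton):

```rust
// Score of a final board from white's point of view: 1.0 win, 0.5 draw, 0.0 loss.
// Assumed to be called only on final boards (a draw, or the current player has no action).
fn white_score(board: &Board) -> f32 {
    if board.is_draw() {
        0.5
    } else {
        // the player to move has no action left and therefore loses
        match board.turn {
            Color::White => 0.0, // white cannot move: white loses
            Color::Black => 1.0, // black cannot move: white wins
        }
    }
}
```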
Rollouts
- ▢ Implement the `mcts::rollout()` function.

Starting from the board given as parameter, the function should:
1. select a random action among the valid ones
2. generate the resulting state
3. repeat 1./2. until the board is final (no action left for the current player, or a declared draw)
4. return white's evaluation of the final board
Picking a random value from a `Vec` is possible using the `rand` crate (library). It provides a random number generator (RNG) through `rand::rng()` and several convenience methods for common types.

```rust
use rand::seq::IndexedRandom; // makes the `choose` method visible on `Vec`

let ints: Vec<i32> = vec![1, 2, 3];
let chosen: Option<&i32> = ints.choose(&mut rand::rng());
```
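Putting these pieces together, a rollout could look like the sketch below; it assumes `Board` can be passed by value (or cloned cheaply), that `actions()` returns a `Vec` of actions, and it reuses the `white_score` function above (adapt to the actual signatures):

```rust
use rand::seq::IndexedRandom;

// Simulate a random game from `board` and return white's evaluation of the final position.
fn rollout(mut board: Board) -> f32 {
    loop {
        let actions = board.actions();
        // final board: a draw, or no action left for the current player
        if board.is_draw() || actions.is_empty() {
            return white_score(&board);
        }
        // pick a random valid action and apply it
        let action = actions.choose(&mut rand::rng()).unwrap();
        board = board.apply(action);
    }
}
```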
- ▢ Test your rollout function:
  - in `main.rs`, call your rollout function a large number of times
    - what is the average value of the calls? (hint: it should be close to 0.5)
    - how many rollouts can you do in 1 second? (hint: it should be several tens of thousands in `release` mode)
Basic MCTS Engine
MCTS graph
The `mcts.rs` file contains several basic data structures that you will use for the implementation of the algorithm.
- `Node`: a node of the MCTS graph that has already been evaluated with a rollout.
  - `board`: (s) the state of the game this node represents
  - `count`: (N(s)) number of times this node has been selected
  - `out_edges`: outgoing edges for all valid actions available on the state. Each `OutEdge` contains:
    - the corresponding `action` (a)
    - the number of times the action was `visited` (N(s,a))
    - the evaluation of the target state (Q(s,a))
  - `initial_eval`: (U(s)) result of the initial rollout
  - `eval`: (Q(s)) estimated utility of the node, computed from the children states' utilities
The `Node` structure essentially represents a node of the MCTS tree with all associated metadata and outgoing edges.
The `MctsEngine` structure allows finding the `Node` associated with a given `Board`.
Playout implementation
For implementing MCTS, you have three methods to implement:
- ▢ `playout`: recursively updates the board (s) and returns its updated evaluation (Q(s))
  - if the state was never visited before (not in the graph), it should evaluate it with a rollout and add it to the graph
    - the `Node::init(...)` function will probably prove useful
  - otherwise, it should
    - select an action (see `select_ucb1` below)
    - perform a recursive playout for the resulting state
    - update the evaluation (see `update_eval` below)
- ▢ `select_ucb1`: selects the best action according to UCB1 (course playout algorithm, line 6); a sketch of the score formula is given further below
  - from the node associated with the board (s) it should:
    - compute the UCB1 score for each action
    - return the action with the highest UCB1 score
  - it should use the `exploration_factor` (C) from the `MctsEngine` structure
- ▢ `update_eval`: updates the node (eval and counts) at the end of a playout (course algorithm, lines 10-12)
  - it is given the `board` (s) that should be updated, the action (a) that was taken and the evaluation of the next state (Q(s,a))
  - it must update the counts (N(s), N(s,a)) and store the given `action_eval` (Q(s,a)) on the corresponding edge
  - it must then recompute and update the node's evaluation (Q(s))
  - it should return the updated evaluation (Q(s))
In the algorithm we are implementing, all evaluations (Q(.)) are from the perspective of the white player.
- ❓ In the UCB1 selection, how does the UCB1 value allow capturing what black would do in this situation?
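For reference, the exploitation + exploration shape of the UCB1 score is sketched below; this assumes the usual formulation seen in the course (Q(s,a) plus an exploration bonus weighted by C) and deliberately leaves aside how the side to move is taken into account, which is the question above:

```rust
// UCB1 score of one outgoing edge, from the point of view of the player maximising Q.
// q_sa = Q(s,a): current evaluation of the action
// n_s  = N(s):   number of times the node was selected
// n_sa = N(s,a): number of times this action was visited
// c    = exploration factor (`exploration_factor` in `MctsEngine`)
fn ucb1(q_sa: f32, n_s: u32, n_sa: u32, c: f32) -> f32 {
    let exploration = ((n_s as f32).ln() / n_sa as f32).sqrt();
    q_sa + c * exploration
}
```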
The `test_mcts()` function at the end of the file contains a test-bed for observing the behavior of the method on a reasonably interesting state.
Below is a sample of what it should look like when you increase the number of playouts to 1000:
```
ABCDEFGH White (0 plies)
1 b . b b
2 . . . b
3 . . . w
4 . . . .
5 . . . .
6 . b w .
7 . . w .
8 w w w .
After 1000 playouts:
Q: 0.78091437 (N: 1000) <- Q(s) and N(s)
301 C8 B7 [0.87960243] <- Q(s,a), here for the action "C8 B7"
230 C8 D7 [0.84222764]
184 F7 G6 [0.8040308]
103 E6 D5 [0.693204]
86  E6 F5 [0.6523256]
49  A8 B7 [0.48979592]
46  E8 D7 [0.47826087]
--  ----- ------------
^   ^     ^
|   |     Evaluation for this action (Q(s,a))
|   Action (a)
Number of samples per action (N(s,a))
```
Note that:
- Q(s) and Q(s,a) are always in [0,1] (they are weighted averages of the utilities of rollouts)
- best actions are sampled more often (increasingly so as N(s) increases, because the exploration term decreases)
- Q(s) tends towards max_a Q(s,a) as the number of samples grows
MCTS Main Loop
With all those ingredients, you should be able to implement the main MCTS loop in the `select` method.
- ▢ implement the `select` method
  - while the deadline has not passed, do playouts from the current state
  - then return the action with the most selections
For measuring time, you can use the `Instant` structure from the standard library:

```rust
use std::time::Instant;

let time_remaining: bool = Instant::now() < deadline;
```
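The overall shape of `select` could then be something like the sketch below, written as a method of `MctsEngine`; the `node` accessor retrieving the `Node` associated with a board is an assumption (adapt to your own structures, and to the actual `Engine` trait signature):

```rust
use std::time::Instant;

impl MctsEngine {
    // Run playouts until the deadline, then play the most-selected action of the root node.
    fn select(&mut self, board: &Board, deadline: Instant) -> Option<Action> {
        while Instant::now() < deadline {
            self.playout(board.clone());
        }
        let root = self.node(board)?; // hypothetical accessor: Node associated with `board`
        root.out_edges
            .iter()
            .max_by_key(|edge| edge.visited)    // N(s,a): number of times the action was selected
            .map(|edge| edge.action.clone())
    }
}
```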
Evaluating your engines
Variations of your engine
There are a number of elements that can be tuned and may affect the performance of your engine:
- the `exploration_weight` parameter
- use the average over more than one rollout (more accurate but potentially costly)
- use arbitrary evaluation function instead of a rollout (notably, the one used by minimax)
- the time allowed for MCTS at each turn
Evaluation set up
Engine evaluation is typically done over a number of games against another engine, to compute the average score of an engine against another in a statistically meaningful manner.
When evaluating your engine you can do so against either:
- a baseline solver: a solver with well-understood performance. These include:
  - the random solver that acts randomly at each turn (not implemented for you)
  - the minimax solver with a given depth
  - either of the above two can be made artificially stronger by placing them in a favorable situation (e.g. removing pieces from their opponent)
- another configuration of your solver

- Checkers is not a symmetric game; for an unbiased evaluation you should make sure that both engines play both sides an equal number of times.
- The MCTS engine accumulates data at each turn. You should make sure that it starts with a fresh graph for each game, by either:
  - creating a new engine, or
  - calling the `Engine::clear()` method in between games
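A possible skeleton for such an evaluation loop, alternating which engine plays white. It assumes the `Engine` trait is object-safe and relies on a hypothetical `play_game` helper (which you could write using `main::example_game` as a model); the `clear()` call is the one mentioned above:

```rust
// Hypothetical helper: plays one full game and returns white's final score (0.0, 0.5 or 1.0).
fn play_game(white: &mut dyn Engine, black: &mut dyn Engine) -> f32 {
    unimplemented!("adapt main::example_game to return the final score")
}

// Play `num_games` games between two engines, alternating colors, and return the
// average score of `engine_a` (0 = always loses, 1 = always wins).
fn evaluate(engine_a: &mut dyn Engine, engine_b: &mut dyn Engine, num_games: usize) -> f32 {
    let mut total = 0.0;
    for game in 0..num_games {
        // discard any search data accumulated during the previous game
        engine_a.clear();
        engine_b.clear();
        let score_a = if game % 2 == 0 {
            play_game(engine_a, engine_b)       // engine A plays white
        } else {
            1.0 - play_game(engine_b, engine_a) // engine A plays black: invert white's score
        };
        total += score_a;
    }
    total / num_games as f32
}
```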
Performance characterization
In addition to evaluating your solver against others, you should provide some metrics shedding some light on the functioning of your solver. This may notably include:
- number of playouts per second
- average playout depth (number of actions selected before reaching a leaf of the tree)
- length of the principal variation in the tree. The principal variation is the game that would have been played if both players were to follow the MCTS policy (always selecting the action with the most playouts) until reaching a tree leaf. It is deemed the most likely by the MCTS search, and this metric tells how far the algorithm explored this path.
  - at the end of the principal variation, the nodes would have very few selections and are thus of low confidence; you may restrict the principal variation to stop as soon as an action was selected fewer than a given number of times (e.g. 100), indicating that the algorithm is not yet convinced of the value of this action.
These metrics may differ between start, middle and end games, and should be computed for several representative states of these categories.
Playing against each other
I haven't had time to provide a mechanism to automatically export an agent. If you want to test against someone else's AI, and they are willing to share the code, you can:
- copy their engine's code (`mcts.rs`) into a renamed file in your repository
  - e.g. `src/uce_mcts.rs`, where UCE (Unbeatable Checkers Engine) is the name of the engine
- declare the code as a module to the Rust compiler
  - add `pub mod uce_mcts;` at the top of your `main.rs`
- use their engine from the imported module, e.g., `let black_engine = uce_mcts::MctsEngine::new(1.);`
Last AI Lab
The primary objectives of this lab are to:
- finish and consolidate the MCTS code for checkers
- set up an evaluation of your code
- prepare your report for evaluation
If your code is not available in a team repository from Github classroom, set it up now. Otherwise the evaluators will not have access to your code.
You will find the link to register in the third lab (hopefully, the problems with github classroom will be fixed).
Evaluation criteria
You will be evaluated based on:
- the code for your MCTS implementation
- a synthetic report presenting the results of your engine evaluation
  - this should be the `README.md` of your repository, starting with your names
The report should present:
- the experiments you have conducted and the associated results (as tables or graphs), matching the instructions at the end of the Lab3 subject. In particular, you should:
- have a relative evaluation of your own engine (under different configuration) allowing you to select the best configuration
- have an absolute evaluation of your engine (against baseline and with associated metric)
- a discussion of the results (why is a given configuration better than another, ...)
The report should:
- NOT explain the basic functioning of the checkers implementation or of the MCTS algorithm (we know how it works)
- state what differs in your implementation from the basic implementation (different heuristic, specific parameter, ...)
- in other words, what you contributed with respect to the simplest implementation (or some known limitation/bug of your implementation)
The report may (optional):
- provide details on extra work that you have done in the context of the labs, related or not to MCTS and checkers.
  - for instance, if you have implemented an adaptation of alpha-beta pruning to the expectimax for 2048, we want to know about it (even if your evaluation turns out disappointing)
  - some hints of what can be done are given below, in the going further section.
Going further
A well conducted and functional MCTS implementation with associated evaluation is sufficient to obtain the average grade. However, there are a number of things that you can do in order to improve your grade and skills:
- MCTS: there are various things that can be done to exploit the graph structure of the MCTS search which are not done in the basic implementation. The data structure you use is compatible with graphs, but this is not fully exploited. This MCTS for graphs article goes over the specifics of MCTS on trees vs graphs, with the linked section being a potentially easy gain (to be measured)
- minimax: implement alpha-beta pruning for checkers
- checkers: improve the evaluation function of checkers to account for runaway pawns (pawns that have a free path to become a queen) or quiescence (if there are possible captures in the evaluated state, simulate them before running your evaluation, to avoid giving a high score to a situation where you have just captured a pawn but will be captured in the next turn).
- this evaluation function can be used for either minimax or mcts
- 2048: add a form of pruning to the search:
  - ignore states that are highly unlikely (e.g. those that require four `4` tiles in a row to occur)
  - exploit alpha-beta pruning principles in the search (hint: the contribution of each move to the expectation term is capped by the maximum value and its probability)
- 8-puzzle: evaluate on much larger puzzles, requiring memory-limited methods (e.g. iterative deepening A*, beam search)
Teaching Resources
Teaching resources are provided in private git repositories to which only teachers have access.
- Main Repository
- Sources of the lectures and the content of this website
- Corrections for hands-on sessions (inlined in the markdown sources)