Monte Carlo Tree Search

The process of Upper Confidence Bounds for Trees (UCT)

Monte Carlo Tree Search

1. Selection

Starting at the root node R, search a node L which is not expanded fully.
Specifically, we use UCB(upper confidence bound) algorithm to choose the descendant node with the most potential.
Monte Carlo Tree Search
v is the parent node of the v’.
Q(v’) represent the value of v’, which is ruled by yourself.
N(v’) represent the visit times of v’. N(v) represent the visit times of the parent node of v’.
Cp is the constant that trade off exploration and exploitation.

From the root R:

  • If all child nodes of R are visited, we choose the child node which has the maximum UCB value.
  • Otherwise, we randomly choose a child node which has not been visited.

2. Expansion

After selection, we choose the node L. If L isn’t a terminal node(it means that the game is not over yet), randomly create a node as its child node.
The node which is created represent the next step of the game.
If L is a terminal node, then return the current node L. Because it is already a exit status.

3. Simulation

Get node c from the above two steps. And starting from node c, simulate the game until the end of the game.
We can randomly play until we can decide the winner.

4.Backpropagation

In the step 3, we get a result from the simulation of the game. And we must use the result to update the node.
From the node c to the root R, we update all the node as follows:
N(v) = N(v) + 1
Q(v) = Q(v) + reward

The reason we use MCTS

Monte carlo tree search is based on sampling to get results, rather than exhaustive enumeration.
Although we can’t enumerate all the game cases, the results is still fairly reliable.

Pseudocode

Monte Carlo Tree Search

Code

to be continue…