Artificial Intelligence
Adversarial Search and Games
Dr. Bilgin Avenoğlu

Adversarial Search and Games

  • Competitive environments: two or more agents have conflicting goals, giving rise to adversarial search problems.

Two-player zero-sum games

A game can be formally defined with the following elements (a minimal code sketch follows the list):

  • S0: the initial state, which specifies how the game is set up at the start.
  • TO-MOVE(s): the player whose turn it is to move in state s.
  • ACTIONS(s): the set of legal moves in state s.
  • RESULT(s, a): the transition model, which defines the state resulting from taking action a in state s.
  • IS-TERMINAL(s): a terminal test, which is true when the game is over and false otherwise. States where the game has ended are called terminal states.
  • UTILITY(s, p): a utility function, which defines the final numeric value to player p when the game ends in terminal state s.
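These six elements map naturally onto a small interface. Below is a minimal sketch (mine, not from the slides) of how a game might be expressed in Python; the method names mirror the definitions above but are otherwise hypothetical.

    from typing import Iterable

    class Game:
        """Abstract two-player zero-sum game, mirroring the six elements above."""

        initial_state = None  # S0: how the game is set up at the start

        def to_move(self, state):
            """TO-MOVE(s): the player whose turn it is to move in state s."""
            raise NotImplementedError

        def actions(self, state) -> Iterable:
            """ACTIONS(s): the set of legal moves in state s."""
            raise NotImplementedError

        def result(self, state, action):
            """RESULT(s, a): the state resulting from taking action a in state s."""
            raise NotImplementedError

        def is_terminal(self, state) -> bool:
            """IS-TERMINAL(s): true when the game is over, false otherwise."""
            raise NotImplementedError

        def utility(self, state, player) -> float:
            """UTILITY(s, p): final numeric value to player p in terminal state s."""
            raise NotImplementedError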

Game Tree

  • For tic-tac-toe, the game tree is relatively small: fewer than 9! = 362,880 terminal nodes (with only 5,478 distinct states).
  • But for chess there are over 10^40 nodes, so the game tree is best thought of as a theoretical construct that we cannot realize in the physical world.

[Figure 6.1: A (partial) game tree for the game of tic-tac-toe. The top node is the initial state, and MAX moves first, placing an X in an empty square. Part of the tree is shown, giving alternating moves by MIN (O) and MAX (X), until we eventually reach terminal states, which can be assigned utilities according to the rules of the game.]

6.2 Optimal Decisions in Games

MAX wants to find a sequence of actions leading to a win, but MIN has something to say about it.

MINIMAX Search Algorithm

  • The minimax algorithm performs a complete depth-first exploration of the game tree.
  • If the maximum depth of the tree is m and there are b legal moves at each point, then the time complexity of the minimax algorithm is O(b^m).
  • The space complexity is O(bm) for an algorithm that generates all actions at once.
  • The exponential complexity makes MINIMAX impractical for complex games: for example, chess has a branching factor of about 35 and the average game has a depth of about 80 ply, so it is not feasible to search 35^80 states.
  • MINIMAX nevertheless serves as a basis for the mathematical analysis of games.

MINIMAX Algorithm

function MINIMAX-SEARCH(game, state) returns an action
  player ← game.TO-MOVE(state)
  value, move ← MAX-VALUE(game, state)
  return move

function MAX-VALUE(game, state) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v, move ← −∞, null
  for each a in game.ACTIONS(state) do
    v2, a2 ← MIN-VALUE(game, game.RESULT(state, a))
    if v2 > v then v, move ← v2, a
  return v, move

function MIN-VALUE(game, state) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v, move ← +∞, null
  for each a in game.ACTIONS(state) do
    v2, a2 ← MAX-VALUE(game, game.RESULT(state, a))
    if v2 < v then v, move ← v2, a
  return v, move

Figure 6.3: An algorithm for calculating the optimal move using minimax: the move that leads to a terminal state with maximum utility, under the assumption that the opponent plays to minimize utility. The functions MAX-VALUE and MIN-VALUE go through the whole game tree, all the way to the leaves, to determine the backed-up value of a state and the move to get there.
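The pseudocode transcribes almost line for line into Python. Below is a minimal runnable sketch (mine, not from the slides) against the Game interface sketched earlier; the only structural change is that player is passed down explicitly rather than captured from the enclosing scope.

    import math

    def minimax_search(game, state):
        """Figure 6.3 in Python: return the minimax-optimal action."""
        player = game.to_move(state)
        _, move = max_value(game, state, player)
        return move

    def max_value(game, state, player):
        """Best (value, move) pair for MAX from this state."""
        if game.is_terminal(state):
            return game.utility(state, player), None
        v, move = -math.inf, None
        for a in game.actions(state):
            v2, _ = min_value(game, game.result(state, a), player)
            if v2 > v:
                v, move = v2, a
        return v, move

    def min_value(game, state, player):
        """Best (value, move) pair for MIN from this state."""
        if game.is_terminal(state):
            return game.utility(state, player), None
        v, move = math.inf, None
        for a in game.actions(state):
            v2, _ = max_value(game, game.result(state, a), player)
            if v2 < v:
                v, move = v2, a
        return v, move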

https://www.youtube.com/watch?v=zDskcx8FStA

[Figure 6.2: A two-ply game tree. The △ nodes are "MAX nodes," in which it is MAX's turn to move, and the ▽ nodes are "MIN nodes." The terminal nodes show the utility values for MAX; the other nodes are labeled with their minimax values. MAX's best move at the root is a1, because it leads to the state with the highest minimax value, and MIN's best reply is b1, because it leads to the state with the lowest minimax value.]

MIN prefers a state of minimum value (that is, minimum value for MAX and thus maximum value for MIN). So we have:

MINIMAX(s) =
  UTILITY(s, MAX)                               if IS-TERMINAL(s)
  max_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))    if TO-MOVE(s) = MAX
  min_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))    if TO-MOVE(s) = MIN

Let us apply these definitions to the game tree in Figure 6.2. The terminal nodes on the bottom level get their utility values from the game's UTILITY function. The first MIN node, labeled B, has three successor states with values 3, 12, and 8, so its minimax value is 3. Similarly, the other two MIN nodes have minimax value 2. The root node is a MAX node; its successor states have minimax values 3, 2, and 2, so it has a minimax value of 3. We can also identify the minimax decision at the root: action a1 is the optimal choice for MAX because it leads to the state with the highest minimax value.

This definition of optimal play for MAX assumes that MIN also plays optimally. What if MIN does not play optimally? Then MAX will do at least as well, and possibly better.
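As a sanity check, the two-ply tree of Figure 6.2 can be encoded as a tiny toy game (DictGame below is illustration-only, not from the slides) and fed to the minimax_search sketch above; the root value comes out as 3 and the chosen move as a1, matching the figure.

    # Leaf utilities from Figure 6.2: B -> 3, 12, 8;  C -> 2, 4, 6;  D -> 14, 5, 2
    TREE = {
        "A": {"a1": "B", "a2": "C", "a3": "D"},
        "B": {"b1": 3, "b2": 12, "b3": 8},
        "C": {"c1": 2, "c2": 4, "c3": 6},
        "D": {"d1": 14, "d2": 5, "d3": 2},
    }

    class DictGame:
        """Toy game over the explicit tree above; MAX moves at the root."""
        def to_move(self, state):
            return "MAX" if state == "A" else "MIN"
        def actions(self, state):
            return TREE[state].keys()
        def result(self, state, action):
            return TREE[state][action]
        def is_terminal(self, state):
            return not isinstance(state, str)   # leaves are numbers
        def utility(self, state, player):
            return state                        # leaf utilities are given for MAX

    print(minimax_search(DictGame(), "A"))      # -> 'a1' (root value 3)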

Alpha-Beta Pruning

  • The number of game states is exponential in the depth of the tree.
  • No algorithm can completely eliminate the exponent.
  • But we can sometimes cut it in half, computing the correct minimax decision without examining every state, by pruning large parts of the tree that make no difference to the outcome.

Alpha-Beta Pruning

  • (a) The first leaf below B has the value 3. Hence B, which is a MIN node, has a value of at most 3.
  • (b) The second leaf below B has a value of 12; MIN would avoid this move, so the value of B is still at most 3.
  • (c) The third leaf below B has a value of 8; we have now seen all of B's successor states, so the value of B is exactly 3. Now we can infer that the value of the root is at least 3, because MAX has a choice worth 3 at the root.
  • (d) The first leaf below C has the value 2. Hence C, which is a MIN node, has a value of at most 2. But we know that B is worth 3, so MAX would never choose C. Therefore there is no point in looking at the other successor states of C. This is an example of alpha–beta pruning.
  • (e) The first leaf below D has the value 14, so D is worth at most 14. This is still higher than MAX's best alternative (i.e., 3), so we need to keep exploring D's successor states. Notice also that we now have bounds on all of the successors of the root, so the root's value is also at most 14.
  • (f) The second successor of D is worth 5, so again we need to keep exploring. The third successor is worth 2, so now D is worth exactly 2. MAX's decision at the root is to move to B, giving a value of 3.

[Figure 6.5: Stages in the calculation of the optimal decision for the game tree in Figure 6.2. At each point, we show the range of possible values for each node.]

Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(game, state) returns an action
  player ← game.TO-MOVE(state)
  value, move ← MAX-VALUE(game, state, −∞, +∞)
  return move

function MAX-VALUE(game, state, α, β) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v ← −∞
  for each a in game.ACTIONS(state) do
    v2, a2 ← MIN-VALUE(game, game.RESULT(state, a), α, β)
    if v2 > v then
      v, move ← v2, a
      α ← MAX(α, v)
    if v ≥ β then return v, move
  return v, move

function MIN-VALUE(game, state, α, β) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v ← +∞
  for each a in game.ACTIONS(state) do
    v2, a2 ← MAX-VALUE(game, game.RESULT(state, a), α, β)
    if v2 < v then
      v, move ← v2, a
      β ← MIN(β, v)
    if v ≤ α then return v, move
  return v, move
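In Python, the algorithm differs from the plain minimax sketch only in the α/β bookkeeping and the early returns. Here is a transcription under the same Game-interface assumptions as before (again a sketch, not the slides' own code).

    import math

    def alpha_beta_search(game, state):
        """Return the minimax-optimal action, pruning branches that cannot matter."""
        player = game.to_move(state)
        _, move = ab_max_value(game, state, player, -math.inf, math.inf)
        return move

    def ab_max_value(game, state, player, alpha, beta):
        if game.is_terminal(state):
            return game.utility(state, player), None
        v, move = -math.inf, None
        for a in game.actions(state):
            v2, _ = ab_min_value(game, game.result(state, a), player, alpha, beta)
            if v2 > v:
                v, move = v2, a
                alpha = max(alpha, v)
            if v >= beta:           # MIN above would never allow this line
                return v, move
        return v, move

    def ab_min_value(game, state, player, alpha, beta):
        if game.is_terminal(state):
            return game.utility(state, player), None
        v, move = math.inf, None
        for a in game.actions(state):
            v2, _ = ab_max_value(game, game.result(state, a), player, alpha, beta)
            if v2 < v:
                v, move = v2, a
                beta = min(beta, v)
            if v <= alpha:          # MAX above already has something better
                return v, move
        return v, move

Running alpha_beta_search(DictGame(), "A") on the toy tree from earlier returns 'a1', the same answer as plain minimax, while skipping the pruned leaves described in stages (d) through (f) above.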

  • α = the highest value found so far for MAX.
    • Think: α = "at least."
  • β = the lowest value found so far for MIN.
    • Think: β = "at most."

https://pascscha.ch/info2/abTreePractice/

Move Ordering

  • The effectiveness of alpha–beta pruning is highly dependent on the order in which the states are examined.
  • For example, in Figure 6.5 (e) and (f), we could not prune any successors of D at all because the worst successors (from the point of view of MIN) were generated first.
  • If the third successor of D had been generated first, with value 2, we would have been able to prune the other two successors, as in the sketch below.
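One common way to exploit this in code is to score each successor with a cheap static evaluator and sort before the alpha-beta loop, so that likely-best moves are searched first and cut off the rest. The sketch below is illustrative only: eval_fn is a hypothetical heuristic (higher is better for MAX), and to_move is assumed to return "MAX" or "MIN" as in the toy game earlier.

    def ordered_actions(game, state, eval_fn):
        """Try the most promising moves first to maximize alpha-beta cutoffs.

        eval_fn(state) is a hypothetical cheap static evaluator; MAX nodes
        search the highest-valued successors first, MIN nodes the lowest.
        """
        maximizing = game.to_move(state) == "MAX"
        return sorted(game.actions(state),
                      key=lambda a: eval_fn(game.result(state, a)),
                      reverse=maximizing)

With good ordering, pruning can approach the best case in which the exponent of the search cost is roughly halved, which is the "cut it in half" mentioned above.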

Transposition table

  • Idea: cache and reuse information about previous searches using a hash table, the transposition table.
  • Avoid searching the same subtree twice.
  • Get best-move information from earlier, shallower searches.
  • In game tree search, repeated states can occur because of transpositions:
    • the move sequence [w1, b1, w2, b2] leads to some state s;
    • after exploring a large subtree below s, we find its backed-up value, which we store in the transposition table;
    • when we later search the sequence of moves [w2, b2, w1, b1], we end up in s again, and we can look up the value instead of repeating the search.
  • In chess, use of transposition tables is very effective, allowing us to double the reachable search depth in the same amount of time. A minimal caching sketch follows.
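In its simplest form a transposition table is just a memoization cache keyed by position. The sketch below (mine, not from the slides) wraps the earlier ab_max_value in a dictionary lookup; it assumes states are hashable (real chess engines use Zobrist hashing rather than Python's built-in hash) and glosses over a practical subtlety noted after the code.

    transposition_table = {}   # state -> (backed-up value, best move)

    def cached_max_value(game, state, player, alpha, beta):
        """ab_max_value from earlier, wrapped with a transposition-table lookup."""
        if state in transposition_table:
            return transposition_table[state]   # reuse the earlier search result
        value_move = ab_max_value(game, state, player, alpha, beta)
        transposition_table[state] = value_move
        return value_move

One caveat: a value computed under an α/β window may be only a lower or upper bound rather than the exact minimax value, so real engines store a flag with each entry saying which of the three it is; the sketch above ignores that for brevity.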

Claude Shannon’s Strategy

  • Even with alpha–beta pruning and clever move ordering, minimax won't work for games like chess and Go: there are still too many states to explore in the time available.
  • A Type A strategy considers all possible moves to a certain depth in the search tree, and then uses a heuristic evaluation function to estimate the utility of states at that depth. It explores a wide but shallow portion of the tree.
  • A Type B strategy ignores moves that look bad, and follows promising lines "as far as possible." It explores a deep but narrow portion of the tree.
  • Chess programs are often Type A.
  • Go programs are often Type B (due to the high branching factor).
  • Type B programs have shown world-champion-level play across a variety of games, including chess.

Heuristic Alpha–Beta Tree Search

  • Define categories or equivalence classes of states based on experience. For example, suppose that in the two-pawns-versus-one-pawn category:
    • 82% of the states lead to a win (utility +1);
    • 2% to a loss (0);
    • 16% to a draw (1/2).
  • A reasonable evaluation for states in the category is then the expected value: (0.82 × +1) + (0.02 × 0) + (0.16 × 1/2) = 0.90, as computed below.
  • Any given category will contain some states that lead to wins and some that lead to losses or draws.
  • In practice, this kind of analysis requires too many categories, and hence too much experience, to estimate all the probabilities.
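The expected-value arithmetic above is a one-liner; the probabilities are the slide's figures for the hypothetical two-pawns-versus-one-pawn category.

    # (probability, utility) pairs: win = +1, loss = 0, draw = 1/2
    outcomes = [(0.82, 1.0), (0.02, 0.0), (0.16, 0.5)]
    expected = sum(p * u for p, u in outcomes)
    print(expected)   # -> 0.9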

Heuristic Alpha–Beta Tree Search

  • Approximate material value:
    • each pawn is worth 1,
    • a knight or bishop is worth 3,
    • a rook 5,
    • the queen 9.
    • Other features such as "good pawn structure" and "king safety" might be worth half a pawn.
  • These feature values are then simply added up to obtain the evaluation of the position: a weighted linear function (see the sketch below).
  • Each fi is a feature of the position (such as "number of white bishops") and each wi is a weight (saying how important that feature is).
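A minimal sketch of such a weighted linear evaluation, using the material values quoted above. The feature counts passed in are hypothetical and would come from a real board representation in practice.

    # Material weights from the slide: pawn 1, knight/bishop 3, rook 5, queen 9.
    WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

    def material_eval(features):
        """EVAL(s) = sum over i of w_i * f_i(s), a weighted linear function.

        `features` maps each feature name to f_i(s), e.g. the count of that
        piece for White minus the count for Black (hypothetical convention).
        """
        return sum(WEIGHTS[name] * value for name, value in features.items())

    # A side up a knight and two pawns, as in the Figure 6.8(a) discussion:
    print(material_eval({"pawn": 2, "knight": 1}))   # -> 5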

[Figure 6.8: Two chess positions that differ only in the position of the rook at lower right; in both, it is White to move. In (a), Black has an advantage of a knight and two pawns, which should be enough to win the game. In (b), White will capture the queen, giving it an advantage that should be strong enough to win.]

For centuries, chess players have developed ways of judging the value of a position using just this idea. For example, introductory chess books give an approximate material value for each piece: each pawn is worth 1, a knight or bishop is worth 3, a rook 5, and the queen 9. Other features such as "good pawn structure" and "king safety" might be worth half a pawn, say. These feature values are then simply added up to obtain the evaluation of the position.

Mathematically, this kind of evaluation function is called a weighted linear function because it can be expressed as

EVAL(s) = w1·f1(s) + w2·f2(s) + ··· + wn·fn(s) = Σ_{i=1}^{n} wi·fi(s),

where each fi is a feature of the position (such as "number of white bishops") and each wi is a weight (saying how important that feature is). The weights should be normalized so that the sum is always within the range of a loss (0) to a win (+1). A secure advantage equivalent to a pawn gives a substantial likelihood of winning, and a secure advantage equivalent to three pawns should give almost certain victory, as illustrated in Figure 6.8(a). We said that the