Artificial Intelligence
Adversarial Search and Games
Dr. Bilgin Avenoğlu

Adversarial Search and Games

  • Competitive environments: two or more agents have conflicting goals, giving rise to adversarial search problems.

Two-player zero-sum games

A game can be formally defined with the following elements (a minimal code sketch follows the list):

  • S0: the initial state, which specifies how the game is set up at the start.
  • TO-MOVE(s): the player whose turn it is to move in state s.
  • ACTIONS(s): the set of legal moves in state s.
  • RESULT(s, a): the transition model, which defines the state resulting from taking action a in state s.
  • IS-TERMINAL(s): a terminal test, which is true when the game is over and false otherwise. States where the game has ended are called terminal states.
  • UTILITY(s, p): a utility function, which defines the final numeric value to player p when the game ends in terminal state s.
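These six elements map naturally onto a small interface. Below is a minimal sketch (mine, not from the slides) of how a game might be expressed in Python; the method names mirror the definitions above but are otherwise hypothetical.

    from typing import Iterable

    class Game:
        """Abstract two-player zero-sum game, mirroring the six elements above."""

        initial_state = None  # S0: how the game is set up at the start

        def to_move(self, state):
            """TO-MOVE(s): the player whose turn it is to move in state s."""
            raise NotImplementedError

        def actions(self, state) -> Iterable:
            """ACTIONS(s): the set of legal moves in state s."""
            raise NotImplementedError

        def result(self, state, action):
            """RESULT(s, a): the state resulting from taking action a in state s."""
            raise NotImplementedError

        def is_terminal(self, state) -> bool:
            """IS-TERMINAL(s): true when the game is over, false otherwise."""
            raise NotImplementedError

        def utility(self, state, player) -> float:
            """UTILITY(s, p): final numeric value to player p in terminal state s."""
            raise NotImplementedError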

Game Tree

  • For tic-tac-toe, the game tree is relatively small: fewer than 9! = 362,880 terminal nodes (with only 5,478 distinct states).
  • But for chess there are over 10^40 nodes, so the game tree is best thought of as a theoretical construct that we cannot realize in the physical world.

[Figure 6.1: A (partial) game tree for the game of tic-tac-toe. The top node is the initial state, and MAX moves first, placing an X in an empty square. Part of the tree is shown, giving alternating moves by MIN (O) and MAX (X), until we eventually reach terminal states, which can be assigned utilities according to the rules of the game.]

6.2 Optimal Decisions in Games

MAX wants to find a sequence of actions leading to a win, but MIN has something to say about it.

MINIMAX Search Algorithm

  • The minimax algorithm performs a complete depth-first exploration of the game tree.
  • If the maximum depth of the tree is m and there are b legal moves at each point, then the time complexity of the minimax algorithm is O(b^m).
  • The space complexity is O(bm) for an algorithm that generates all actions at once.
  • The exponential complexity makes MINIMAX impractical for complex games: for example, chess has a branching factor of about 35 and the average game has a depth of about 80 ply, so it is not feasible to search 35^80 states.
  • MINIMAX nevertheless serves as a basis for the mathematical analysis of games.

MINIMAX Algorithm

function MINIMAX-SEARCH(game, state) returns an action
  player ← game.TO-MOVE(state)
  value, move ← MAX-VALUE(game, state)
  return move

function MAX-VALUE(game, state) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v, move ← −∞, null
  for each a in game.ACTIONS(state) do
    v2, a2 ← MIN-VALUE(game, game.RESULT(state, a))
    if v2 > v then v, move ← v2, a
  return v, move

function MIN-VALUE(game, state) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v, move ← +∞, null
  for each a in game.ACTIONS(state) do
    v2, a2 ← MAX-VALUE(game, game.RESULT(state, a))
    if v2 < v then v, move ← v2, a
  return v, move

Figure 6.3: An algorithm for calculating the optimal move using minimax: the move that leads to a terminal state with maximum utility, under the assumption that the opponent plays to minimize utility. The functions MAX-VALUE and MIN-VALUE go through the whole game tree, all the way to the leaves, to determine the backed-up value of a state and the move to get there.
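The pseudocode transcribes almost line for line into Python. Below is a minimal runnable sketch (mine, not from the slides) against the Game interface sketched earlier; the only structural change is that player is passed down explicitly rather than captured from the enclosing scope.

    import math

    def minimax_search(game, state):
        """Figure 6.3 in Python: return the minimax-optimal action."""
        player = game.to_move(state)
        _, move = max_value(game, state, player)
        return move

    def max_value(game, state, player):
        """Best (value, move) pair for MAX from this state."""
        if game.is_terminal(state):
            return game.utility(state, player), None
        v, move = -math.inf, None
        for a in game.actions(state):
            v2, _ = min_value(game, game.result(state, a), player)
            if v2 > v:
                v, move = v2, a
        return v, move

    def min_value(game, state, player):
        """Best (value, move) pair for MIN from this state."""
        if game.is_terminal(state):
            return game.utility(state, player), None
        v, move = math.inf, None
        for a in game.actions(state):
            v2, _ = max_value(game, game.result(state, a), player)
            if v2 < v:
                v, move = v2, a
        return v, move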

https://www.youtube.com/watch?v=zDskcx8FStA

[Figure 6.2: A two-ply game tree. The △ nodes are "MAX nodes," in which it is MAX's turn to move, and the ▽ nodes are "MIN nodes." The terminal nodes show the utility values for MAX; the other nodes are labeled with their minimax values. MAX's best move at the root is a1, because it leads to the state with the highest minimax value, and MIN's best reply is b1, because it leads to the state with the lowest minimax value.]

MIN prefers a state of minimum value (that is, minimum value for MAX and thus maximum value for MIN). So we have:

MINIMAX(s) =
  UTILITY(s, MAX)                               if IS-TERMINAL(s)
  max_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))    if TO-MOVE(s) = MAX
  min_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))    if TO-MOVE(s) = MIN

Let us apply these definitions to the game tree in Figure 6.2. The terminal nodes on the bottom level get their utility values from the game's UTILITY function. The first MIN node, labeled B, has three successor states with values 3, 12, and 8, so its minimax value is 3. Similarly, the other two MIN nodes have minimax value 2. The root node is a MAX node; its successor states have minimax values 3, 2, and 2, so it has a minimax value of 3. We can also identify the minimax decision at the root: action a1 is the optimal choice for MAX because it leads to the state with the highest minimax value.

This definition of optimal play for MAX assumes that MIN also plays optimally. What if MIN does not play optimally? Then MAX will do at least as well, and possibly better.
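As a sanity check, the two-ply tree of Figure 6.2 can be encoded as a tiny toy game (DictGame below is illustration-only, not from the slides) and fed to the minimax_search sketch above; the root value comes out as 3 and the chosen move as a1, matching the figure.

    # Leaf utilities from Figure 6.2: B -> 3, 12, 8;  C -> 2, 4, 6;  D -> 14, 5, 2
    TREE = {
        "A": {"a1": "B", "a2": "C", "a3": "D"},
        "B": {"b1": 3, "b2": 12, "b3": 8},
        "C": {"c1": 2, "c2": 4, "c3": 6},
        "D": {"d1": 14, "d2": 5, "d3": 2},
    }

    class DictGame:
        """Toy game over the explicit tree above; MAX moves at the root."""
        def to_move(self, state):
            return "MAX" if state == "A" else "MIN"
        def actions(self, state):
            return TREE[state].keys()
        def result(self, state, action):
            return TREE[state][action]
        def is_terminal(self, state):
            return not isinstance(state, str)   # leaves are numbers
        def utility(self, state, player):
            return state                        # leaf utilities are given for MAX

    print(minimax_search(DictGame(), "A"))      # -> 'a1' (root value 3)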

Alpha-Beta Pruning

  • The number of game states is exponential in the depth of the tree.
  • No algorithm can completely eliminate the exponent.
  • But we can sometimes cut it in half, computing the correct minimax decision without examining every state, by pruning large parts of the tree that make no difference to the outcome.

Alpha-Beta Pruning

  • (a) The first leaf below B has the value 3. Hence B, which is a MIN node, has a value of at most 3.
  • (b) The second leaf below B has a value of 12; MIN would avoid this move, so the value of B is still at most 3.
  • (c) The third leaf below B has a value of 8; we have now seen all of B's successor states, so the value of B is exactly 3. Now we can infer that the value of the root is at least 3, because MAX has a choice worth 3 at the root.
  • (d) The first leaf below C has the value 2. Hence C, which is a MIN node, has a value of at most 2. But we know that B is worth 3, so MAX would never choose C. Therefore there is no point in looking at the other successor states of C. This is an example of alpha–beta pruning.
  • (e) The first leaf below D has the value 14, so D is worth at most 14. This is still higher than MAX's best alternative (i.e., 3), so we need to keep exploring D's successor states. Notice also that we now have bounds on all of the successors of the root, so the root's value is also at most 14.
  • (f) The second successor of D is worth 5, so again we need to keep exploring. The third successor is worth 2, so now D is worth exactly 2. MAX's decision at the root is to move to B, giving a value of 3.

[Figure 6.5: Stages in the calculation of the optimal decision for the game tree in Figure 6.2. At each point, we show the range of possible values for each node.]

Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(game, state) returns an action
  player ← game.TO-MOVE(state)
  value, move ← MAX-VALUE(game, state, −∞, +∞)
  return move

function MAX-VALUE(game, state, α, β) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v ← −∞
  for each a in game.ACTIONS(state) do
    v2, a2 ← MIN-VALUE(game, game.RESULT(state, a), α, β)
    if v2 > v then
      v, move ← v2, a
      α ← MAX(α, v)
    if v ≥ β then return v, move
  return v, move

function MIN-VALUE(game, state, α, β) returns a (utility, move) pair
  if game.IS-TERMINAL(state) then return game.UTILITY(state, player), null
  v ← +∞
  for each a in game.ACTIONS(state) do
    v2, a2 ← MAX-VALUE(game, game.RESULT(state, a), α, β)
    if v2 < v then
      v, move ← v2, a
      β ← MIN(β, v)
    if v ≤ α then return v, move
  return v, move
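In Python, the algorithm differs from the plain minimax sketch only in the α/β bookkeeping and the early returns. Here is a transcription under the same Game-interface assumptions as before (again a sketch, not the slides' own code).

    import math

    def alpha_beta_search(game, state):
        """Return the minimax-optimal action, pruning branches that cannot matter."""
        player = game.to_move(state)
        _, move = ab_max_value(game, state, player, -math.inf, math.inf)
        return move

    def ab_max_value(game, state, player, alpha, beta):
        if game.is_terminal(state):
            return game.utility(state, player), None
        v, move = -math.inf, None
        for a in game.actions(state):
            v2, _ = ab_min_value(game, game.result(state, a), player, alpha, beta)
            if v2 > v:
                v, move = v2, a
                alpha = max(alpha, v)
            if v >= beta:           # MIN above would never allow this line
                return v, move
        return v, move

    def ab_min_value(game, state, player, alpha, beta):
        if game.is_terminal(state):
            return game.utility(state, player), None
        v, move = math.inf, None
        for a in game.actions(state):
            v2, _ = ab_max_value(game, game.result(state, a), player, alpha, beta)
            if v2 < v:
                v, move = v2, a
                beta = min(beta, v)
            if v <= alpha:          # MAX above already has something better
                return v, move
        return v, move

Running alpha_beta_search(DictGame(), "A") on the toy tree from earlier returns 'a1', the same answer as plain minimax, while skipping the pruned leaves described in stages (d) through (f) above.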

  • α = the highest value found so far for MAX.
    • Think: α = "at least."
  • β = the lowest value found so far for MIN.
    • Think: β = "at most."

https://pascscha.ch/info2/abTreePractice/

Move Ordering

  • The effectiveness of alpha–beta pruning is highly dependent on the order in which the states are examined.
  • For example, in Figure 6.5 (e) and (f), we could not prune any successors of D at all because the worst successors (from the point of view of MIN) were generated first.
  • If the third successor of D had been generated first, with value 2, we would have been able to prune the other two successors, as in the sketch below.
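One common way to exploit this in code is to score each successor with a cheap static evaluator and sort before the alpha-beta loop, so that likely-best moves are searched first and cut off the rest. The sketch below is illustrative only: eval_fn is a hypothetical heuristic (higher is better for MAX), and to_move is assumed to return "MAX" or "MIN" as in the toy game earlier.

    def ordered_actions(game, state, eval_fn):
        """Try the most promising moves first to maximize alpha-beta cutoffs.

        eval_fn(state) is a hypothetical cheap static evaluator; MAX nodes
        search the highest-valued successors first, MIN nodes the lowest.
        """
        maximizing = game.to_move(state) == "MAX"
        return sorted(game.actions(state),
                      key=lambda a: eval_fn(game.result(state, a)),
                      reverse=maximizing)

With good ordering, pruning can approach the best case in which the exponent of the search cost is roughly halved, which is the "cut it in half" mentioned above.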

Transposition table

  • Idea: cache and reuse information about previous searches using a hash table, the transposition table.
  • Avoid searching the same subtree twice.
  • Get best-move information from earlier, shallower searches.
  • In game tree search, repeated states can occur because of transpositions:
    • the move sequence [w1, b1, w2, b2] leads to some state s;
    • after exploring a large subtree below s, we find its backed-up value, which we store in the transposition table;
    • when we later search the sequence of moves [w2, b2, w1, b1], we end up in s again, and we can look up the value instead of repeating the search.
  • In chess, use of transposition tables is very effective, allowing us to double the reachable search depth in the same amount of time. A minimal caching sketch follows.
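In its simplest form a transposition table is just a memoization cache keyed by position. The sketch below (mine, not from the slides) wraps the earlier ab_max_value in a dictionary lookup; it assumes states are hashable (real chess engines use Zobrist hashing rather than Python's built-in hash) and glosses over a practical subtlety noted after the code.

    transposition_table = {}   # state -> (backed-up value, best move)

    def cached_max_value(game, state, player, alpha, beta):
        """ab_max_value from earlier, wrapped with a transposition-table lookup."""
        if state in transposition_table:
            return transposition_table[state]   # reuse the earlier search result
        value_move = ab_max_value(game, state, player, alpha, beta)
        transposition_table[state] = value_move
        return value_move

One caveat: a value computed under an α/β window may be only a lower or upper bound rather than the exact minimax value, so real engines store a flag with each entry saying which of the three it is; the sketch above ignores that for brevity.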

Claude Shannon’s Strategy

  • Even with alpha–beta pruning and clever move ordering, minimax won't work for games like chess and Go: there are still too many states to explore in the time available.
  • A Type A strategy considers all possible moves to a certain depth in the search tree, and then uses a heuristic evaluation function to estimate the utility of states at that depth. It explores a wide but shallow portion of the tree.
  • A Type B strategy ignores moves that look bad, and follows promising lines "as far as possible." It explores a deep but narrow portion of the tree.
  • Chess programs are often Type A.
  • Go programs are often Type B (due to the high branching factor).
  • Type B programs have shown world-champion-level play across a variety of games, including chess.

Heuristic Alpha–Beta Tree Search

  • Define categories or equivalence classes of states based on experience. For example, suppose that in the two-pawns-versus-one-pawn category:
    • 82% of the states lead to a win (utility +1);
    • 2% to a loss (0);
    • 16% to a draw (1/2).
  • A reasonable evaluation for states in the category is then the expected value: (0.82 × +1) + (0.02 × 0) + (0.16 × 1/2) = 0.90, as computed below.
  • Any given category will contain some states that lead to wins and some that lead to losses or draws.
  • In practice, this kind of analysis requires too many categories, and hence too much experience, to estimate all the probabilities.
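The expected-value arithmetic above is a one-liner; the probabilities are the slide's figures for the hypothetical two-pawns-versus-one-pawn category.

    # (probability, utility) pairs: win = +1, loss = 0, draw = 1/2
    outcomes = [(0.82, 1.0), (0.02, 0.0), (0.16, 0.5)]
    expected = sum(p * u for p, u in outcomes)
    print(expected)   # -> 0.9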

Heuristic Alpha–Beta Tree Search

  • Approximate material value:
    • each pawn is worth 1,
    • a knight or bishop is worth 3,
    • a rook 5,
    • the queen 9.
    • Other features such as "good pawn structure" and "king safety" might be worth half a pawn.
  • These feature values are then simply added up to obtain the evaluation of the position: a weighted linear function (see the sketch below).
  • Each fi is a feature of the position (such as "number of white bishops") and each wi is a weight (saying how important that feature is).
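A minimal sketch of such a weighted linear evaluation, using the material values quoted above. The feature counts passed in are hypothetical and would come from a real board representation in practice.

    # Material weights from the slide: pawn 1, knight/bishop 3, rook 5, queen 9.
    WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

    def material_eval(features):
        """EVAL(s) = sum over i of w_i * f_i(s), a weighted linear function.

        `features` maps each feature name to f_i(s), e.g. the count of that
        piece for White minus the count for Black (hypothetical convention).
        """
        return sum(WEIGHTS[name] * value for name, value in features.items())

    # A side up a knight and two pawns, as in the Figure 6.8(a) discussion:
    print(material_eval({"pawn": 2, "knight": 1}))   # -> 5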

[Figure 6.8: Two chess positions that differ only in the position of the rook at lower right; in both, it is White to move. In (a), Black has an advantage of a knight and two pawns, which should be enough to win the game. In (b), White will capture the queen, giving it an advantage that should be strong enough to win.]

For centuries, chess players have developed ways of judging the value of a position using just this idea. For example, introductory chess books give an approximate material value for each piece: each pawn is worth 1, a knight or bishop is worth 3, a rook 5, and the queen 9. Other features such as "good pawn structure" and "king safety" might be worth half a pawn, say. These feature values are then simply added up to obtain the evaluation of the position.

Mathematically, this kind of evaluation function is called a weighted linear function because it can be expressed as

EVAL(s) = w1·f1(s) + w2·f2(s) + ··· + wn·fn(s) = Σ_{i=1}^{n} wi·fi(s),

where each fi is a feature of the position (such as "number of white bishops") and each wi is a weight (saying how important that feature is). The weights should be normalized so that the sum is always within the range of a loss (0) to a win (+1). A secure advantage equivalent to a pawn gives a substantial likelihood of winning, and a secure advantage equivalent to three pawns should give almost certain victory, as illustrated in Figure 6.8(a). We said that the