Interacting with the Environment: Connectionism & Feedback in Learning Systems, Slides of Artificial Intelligence

These slides cover the concepts of connectionism, feedback, and credit assignment in the context of Collective Learning Systems (CLS). They discuss the role of automata, algedonic algorithms, and the environment in CLS, as well as the importance of selection and compensation methods. The slides also include the update rules for implementing CLS and explanations of reward and punishment compensation.


Interacting with the environment

Connectionism

Stimulus

Response

Feedback

Feedback at work: playing tic-tac-toe

[Figure: a tree of possibilities, boards and plays, ending in a won game.]

Credit Assignment and Connectionism

  • For all the actions, the environment delivers one single composite feedback
  • The automaton must distribute the reward or punishment among the network, generating a credit assignment model
  • In this way the automaton generates an adequate internal pattern of behavior
  • This is what we call learning
  • There are many methodologies to model such behavior
  • Here we shall use CLS (Collective Learning Systems)

So far…

Artificial Intelligence deals with knowledge and learning.

Artificial learning is obtained by

  • Traversing knowledge bases (rule-based and logical programming)
  • Artificial selection (genetic and evolutionary algorithms)
  • Adaptive methods (connectionism and feedback)

Adaptive behavior studies have their roots in Pavlov's studies of animal conditioning.

CLS Formalization

CLS = [ AUTOMATA, MA ]

Where AUTOMATA = { I, O, STM, A }

I : Is a vector of possible inputs or stimuli

O : Is a vector of possible responses or actions

STM : Is the transition matrix where the Probability Pij of choosing Response Oj is stored for each Stimulus Ii

A : Is an algedonic algorithm (punishment / reward) that modifies the probabilities Pij according to the compensation policy of the automaton; it is precisely this algorithm that represents learning

MA : Is the environment, which emits a series of stimuli I and evaluates the responses O of the AUTOMATA; that evaluation determines the values applied to Pij through the algorithm A, updating the matrix STM
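
As a concrete illustration of this formalization, here is a minimal Python sketch (the class and method names are mine, not from the slides) of an AUTOMATA with a stimulus vector I, a response vector O, and a stochastic transition matrix STM whose rows start out uniform:

    import random

    class Automaton:
        """Minimal CLS automaton: stimuli I, responses O, transition matrix STM.
        STM[i][j] holds the probability of choosing response O[j] given stimulus I[i]."""

        def __init__(self, stimuli, responses):
            self.I = list(stimuli)
            self.O = list(responses)
            n = len(self.O)
            # Each row starts as a uniform distribution: purely descriptive information.
            self.STM = [[1.0 / n for _ in self.O] for _ in self.I]

        def select(self, i):
            """Selection: draw a response index j for stimulus index i according to STM[i]."""
            return random.choices(range(len(self.O)), weights=self.STM[i])[0]

The algedonic algorithm A is not shown here; it is sketched together with the compensation formulas further below.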

CLS mapping

[Figure: the game mapped onto the CLS, showing the possible moves, the other player's turn, and the second turn; an initial method, a selection method, and a compensation method turn the descriptive STM of possible moves (example transition probabilities shown for each move) into a prescription of the best moves.]

Looking at the options, the CLS selects its move and gives the board to the other player. After the other player selects a move, the CLS takes a second move, and so on until the game is over.

After the game is over and the winner is determined, the compensation method modifies the probabilities and the STM becomes more prescriptive (knowledge) rather than just descriptive (information).
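
A sketch of that game loop, assuming the Automaton class above and a purely hypothetical env object that encapsulates the game rules (initial_stimulus, apply, opponent_move, game_over, and cls_won are invented names); the reward and punish helpers are the algedonic updates given on the next slide:

    def play_one_game(automaton, env, beta=0.1):
        """One game of the CLS against the environment.
        Every (stimulus, selected response) pair is remembered, and only after the
        single composite feedback (won or lost) arrives is the whole trajectory
        compensated: the credit assignment step."""
        trajectory = []
        i = env.initial_stimulus()                # index of the starting board
        while not env.game_over():
            k = automaton.select(i)               # selection method: draw from STM[i]
            env.apply(automaton.O[k])             # the CLS plays and hands the board over
            trajectory.append((i, k))
            if env.game_over():
                break
            i = env.opponent_move()               # other player's turn yields a new stimulus
        won = env.cls_won()
        for i, k in trajectory:                   # compensation method
            (reward if won else punish)(automaton.STM, i, k, beta)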

Algedonic compensation

In case of a Reward (with 0 < ß < 1):

For the selection i -> k in the STM (the selected play)

STM(t+1)i,k = STM(t)i,k + ß*(1 - STM(t)i,k)

For the other transitions i -> j, with j ≠ k (n being the number of possible responses)

STM(t+1)i,j = STM(t)i,j - ß*(1 - STM(t)i,k)/(n-1)

In case of a Punishment (with 0 < ß < 1):

For the selection i -> k in the STM (the selected play)

STM(t+1)i,k = STM(t)i,k - ß*STM(t)i,k

For the other transitions i -> j, with j ≠ k

STM(t+1)i,j = STM(t)i,j + ß*STM(t)i,k/(n-1)
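
A direct Python sketch of these two updates (the function names are mine; n is the number of possible responses, and the delta is computed from the pre-update value, as the formulas require):

    def reward(STM, i, k, beta):
        """Algedonic reward: move probability toward the selected response k and
        take the same total mass, split evenly, from the other responses."""
        n = len(STM[i])
        delta = beta * (1.0 - STM[i][k])
        for j in range(n):
            STM[i][j] += delta if j == k else -delta / (n - 1)

    def punish(STM, i, k, beta):
        """Algedonic punishment: take probability away from the selected response k
        and redistribute it evenly among the other responses."""
        n = len(STM[i])
        delta = beta * STM[i][k]
        for j in range(n):
            STM[i][j] += -delta if j == k else delta / (n - 1)

Note that each row of the STM keeps summing to 1: the mass added to (or removed from) the chosen transition is exactly the mass removed from (or added to) the rest.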

CLS Non-linear Compensation

Think about the case of a reward or punishment (R/P) in my everyday life. How much do I listen to an R/P? It depends on:

  • Who is giving it to me
  • What my expectation is
  • The recent evaluations I have had

We can take such concerns into account, for example:

  • The domain of ß is 0 < ß < 1: a value of 0 will cause no learning, while a value of 1 will saturate the STM, driving it to one selection only
  • A reward/inaction scheme is achieved by using ß = 0 for the punishment update
  • When punishing, using ß/2 reduces the chances of wrongly updating probabilities

Rewards more at the beginning:

STM[i][k] = STM[i][k] + ß*(1 - STM[i][k])*(1 - STM[i][k])

Rewards more at the end:

STM[i][k] = STM[i][k] + ß*(1 - STM[i][k])*STM[i][k]

Punishes more at the beginning:

STM[i][k] = STM[i][k] - (ß/2)*STM[i][k]*(1 - STM[i][k])

Punishes more at the end:

STM[i][k] = STM[i][k] - (ß/2)*STM[i][k]*STM[i][k]
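
These four variants could be sketched as a single hypothetical helper (the mode names are mine; the slides only give the update of the selected entry STM[i][k], so redistributing the mass evenly over the other entries of row i is my assumption, kept as in the linear case):

    def nonlinear_compensate(STM, i, k, beta, mode):
        """Non-linear algedonic update of the selected transition STM[i][k]."""
        p = STM[i][k]
        if mode == "reward_early":        # rewards more at the beginning
            delta = beta * (1 - p) * (1 - p)
        elif mode == "reward_late":       # rewards more at the end
            delta = beta * (1 - p) * p
        elif mode == "punish_early":      # punishes more at the beginning
            delta = -(beta / 2) * p * (1 - p)
        elif mode == "punish_late":       # punishes more at the end
            delta = -(beta / 2) * p * p
        else:
            raise ValueError(mode)
        STM[i][k] = p + delta
        n = len(STM[i])
        for j in range(n):                # keep the row summing to 1 (assumption)
            if j != k:
                STM[i][j] -= delta / (n - 1)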

Boltzmann entropy

As a measurement of order, entropy can be defined as

S = - Σ STM[i][j] * log2(STM[i][j]) over all i, j

Using entropy, we can check how well organized the STM is: the lower the entropy, the more uneven the probabilities in the STM.
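
A direct sketch of that measure in Python (terms with zero probability are skipped, taking 0·log 0 as 0):

    import math

    def entropy(STM):
        """S = - sum over all i, j of STM[i][j] * log2(STM[i][j]).
        A uniform, purely descriptive STM gives the highest value; as the
        probabilities concentrate on the best moves, S falls toward 0."""
        s = 0.0
        for row in STM:
            for p in row:
                if p > 0:
                    s -= p * math.log2(p)
        return s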

STM in time

The initial STM represents a descriptive matrix of possible actions in the game, mostly the rules of the game (information).

[Figure: the STM, with inputs i as rows and output options j as columns, evolving over Δ time.]

As time goes by, the STM becomes more prescriptive of the game, showing the right moves (knowledge).

In reality it is not only Δ time but also Δ information that reduces the entropy and guides the STM towards the right answers; this is the reason why information is also called negentropy.
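
As a toy illustration (not from the slides), repeatedly rewarding the same transition with the reward and entropy sketches above shows exactly this drop in entropy:

    STM = [[0.25, 0.25, 0.25, 0.25]]   # one stimulus, four equally likely responses
    print(entropy(STM))                # 2.0 bits: purely descriptive, no knowledge yet
    for _ in range(20):
        reward(STM, 0, 2, beta=0.3)    # the feedback keeps favoring response 2
    print(STM[0])                      # probability mass has concentrated on index 2
    print(entropy(STM))                # entropy is now close to 0: prescriptive knowledge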

Knowledge Representation (basic)

In addition to the transitions in a game, it is also necessary to represent other aspects of the game, such as:

  • The board or state of the game
  • The value of each board state

This becomes an important part of knowledge representation.

The board could be represented as a string of nine positions, for example ----O---- (an O in the center) or --X-O----.

[Figure: several drawn boards with Xs and Os in different positions.]

Yet it pays to represent equal boards with the same notation, e.g. --X-O----.

Why?

If we use only arrays, you will see that the number of boards makes the amount of information to represent quite big.

As a matter of fact, this is where data structures first began to be used instead of arrays, and the concept of the linked list was developed in languages such as IPL, Lisp, and Snobol.
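
A sketch of that idea for tic-tac-toe, with the board kept as a nine-character string such as "--X-O----": all rotations and reflections of a board are generated and the lexicographically smallest one is used as its single, shared notation (the helper names are mine, not from the slides):

    def rotate(board):
        """Rotate a nine-character board string 90 degrees clockwise."""
        return "".join(board[r * 3 + c] for c in range(3) for r in (2, 1, 0))

    def mirror(board):
        """Mirror the board left to right."""
        return "".join(board[r * 3 + c] for r in range(3) for c in (2, 1, 0))

    def canonical(board):
        """One shared notation for all boards that are equal up to symmetry,
        so each distinct position is stored only once."""
        variants = []
        b = board
        for _ in range(4):
            variants.append(b)
            variants.append(mirror(b))
            b = rotate(b)
        return min(variants)

    # An X in a corner is the same position no matter which corner it is in.
    print(canonical("X--------") == canonical("--------X"))   # True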

Learning versus Preprogrammed Knowledge

[Figure: Δ performance (up to 100 %) plotted against Δ time: the learning system starts with slow learning, below the preprogrammed level, and then comes back to stability.]

A preprogrammed system will remain limited until it is reprogrammed; the learning system, however, will relearn.