Probability Basics
Introduction
Let A be an uncertain event with possible outcomes A1, A2, …, An
e.g. A = “Flipping a coin”, A1 = {Head}, A2 = {Tail}
The sample space S of event A is the collection of all possible outcomes of A (Ai is the ith outcome)
e.g. A = “Flipping a coin”, S = {Head, Tail}
“∪” is called union; A1 ∪ A2 means that A1 or A2 or both of them happen
“∩” is called intersection; A1 ∩ B1 means that both A1 and B1 happen, so A1 ∩ B1 is sometimes also written “A1 and B1”
Introduction (Cont’d)
If A1, A2, …, An are all the possible outcomes of event A and no two of them can occur at the same time, their probabilities must sum to 1; A1, A2, …, An are said to be collectively exhaustive and mutually exclusive:

Pr(A1) + Pr(A2) + ••• + Pr(An) = 1

[Venn diagram: the sample space S partitioned into A1, A2, …, An]

If two outcomes A1 and B1 can occur at the same time, then the probability that A1 or B1 or both happen equals the sum of their individual probabilities minus the probability that both happen at the same time:

Pr(A1 ∪ B1) = Pr(A1) + Pr(B1) − Pr(A1 ∩ B1)

[Venn diagram: overlapping outcomes A1 and B1 inside S]
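As a quick numeric illustration of the addition rule (an example added here, not from the original slides): for one card drawn from a standard 52-card deck, Pr(heart ∪ face card) = 13/52 + 12/52 − 3/52 = 22/52 ≈ 0.42, because the three face cards that are also hearts would otherwise be counted twice.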
Basic Probability Rules
Conditional Probability
The conditional probability of an outcome B relative to an outcome A is the probability that outcome B occurs given that outcome A has already occurred.
Informally, conditioning on an event amounts to shrinking the sample space down to the conditioning event.
Pr(B|A) = Pr(B ∩ A) / Pr(A)

[Venn diagram: within S, A = “Dow Jones up”, B = “stock price up”, B ∩ A = “stock price up and Dow Jones up”]
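The definition translates directly into code. Below is a minimal Python sketch for the Dow Jones / stock price picture above; the two input probabilities are made-up numbers for illustration, not values from the slides.

# Pr(B|A) = Pr(B ∩ A) / Pr(A), with hypothetical inputs.
p_A = 0.55          # Pr(A): Dow Jones up (assumed)
p_B_and_A = 0.40    # Pr(B ∩ A): stock price up AND Dow Jones up (assumed)

p_B_given_A = p_B_and_A / p_A
print(f"Pr(B|A) = {p_B_given_A:.3f}")  # 0.727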
Basic Probability Rules (Cont’d)
Multiplicative Rule
Calculating the probability of two outcomes happening at the same time
Pr(Ai ∩ Bj) = Pr(Ai|Bj)·Pr(Bj) = Pr(Bj|Ai)·Pr(Ai)
No arrow between two chance nodes in an influence diagram implies independence between the associated uncertain events
Dependence between A and B does not mean causation; it only means information
about A helps in determining the likelihood of outcomes of B
Events A (with outcomes A1,…,An) and B (with outcomes B1,…,Bm) are independent if and only if information about A does not provide any information about B and vice versa. Mathematically,
Pr(Ai|Bj) = Pr(Ai), Pr(Bj|Ai) = Pr(Bj), and Pr(Ai ∩ Bj) = Pr(Ai)·Pr(Bj)
(for i=1,2,…,n; j=1,2,…,m)
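As a sketch of what the independence conditions mean in practice, the Python snippet below checks Pr(Ai ∩ Bj) = Pr(Ai)·Pr(Bj) on a small joint table; the numbers are hypothetical and chosen so that the check passes.

# Hypothetical 2x2 joint distribution over A (A1, A2) and B (B1, B2).
joint = {("A1", "B1"): 0.12, ("A1", "B2"): 0.28,
         ("A2", "B1"): 0.18, ("A2", "B2"): 0.42}

# Marginals Pr(Ai) and Pr(Bj), obtained by summing out the other event.
pA = {a: sum(p for (x, _), p in joint.items() if x == a) for a in ("A1", "A2")}
pB = {b: sum(p for (_, y), p in joint.items() if y == b) for b in ("B1", "B2")}

# Independence holds iff every cell factors into its marginals.
independent = all(abs(joint[(a, b)] - pA[a] * pB[b]) < 1e-9
                  for a in pA for b in pB)
print(independent)  # True, e.g. 0.12 = 0.4 * 0.3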
Basic Probability Rules (Cont’d)
Events A (with outcomes A1,…,An) and B (with outcomes B1,…,Bm) are conditionally independent given C (with outcomes C1,…,Cp) if and only if, once C is known, any knowledge about A does not provide more information about B and vice versa. Mathematically,

Pr(Ai|Bj,Ck) = Pr(Ai|Ck), Pr(Bj|Ai,Ck) = Pr(Bj|Ck), and Pr(Ai ∩ Bj|Ck) = Pr(Ai|Ck)·Pr(Bj|Ck)
(for i=1,…,n; j=1,…,m; k=1,…,p)

[Influence diagrams: conditional independence appears as arrows from C into A and from C into B, with no arrow between A and B]
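A small Python sketch (with hypothetical numbers) makes the distinction concrete: A and B below are conditionally independent given C, yet they are not unconditionally independent, because both depend on C.

# Joint model built from the factorization Pr(A|C)·Pr(B|C)·Pr(C).
pC = {"C1": 0.5, "C2": 0.5}
pA1_given_C = {"C1": 0.8, "C2": 0.3}   # Pr(A1|Ck), assumed values
pB1_given_C = {"C1": 0.6, "C2": 0.2}   # Pr(B1|Ck), assumed values

p_A1_B1 = sum(pA1_given_C[c] * pB1_given_C[c] * pC[c] for c in pC)  # 0.27
p_A1 = sum(pA1_given_C[c] * pC[c] for c in pC)                      # 0.55
p_B1 = sum(pB1_given_C[c] * pC[c] for c in pC)                      # 0.40

print(p_A1_B1, p_A1 * p_B1)  # 0.27 vs 0.22: dependent once C is unknown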
Law of Total Probability
If B1, B2, …, Bn are mutually exclusive and collectively exhaustive, then

Pr(Ai) = Pr(Ai ∩ B1) + Pr(Ai ∩ B2) + ••• + Pr(Ai ∩ Bn)
= Pr(Ai|B1)·Pr(B1) + Pr(Ai|B2)·Pr(B2) + ••• + Pr(Ai|Bn)·Pr(Bn)
= Σ_{j=1}^{n} Pr(Ai|Bj)·Pr(Bj)

[Venn diagram: S partitioned into B1, B2, B3, with Ai cutting across the partition into Ai∩B1, Ai∩B2, Ai∩B3]
Oil Example
An oil company is considering a site for an exploratory well. If the rock strata underlying the site are characterized by what geologists call a “dome” structure, the chances of finding oil are somewhat greater than if no dome structure exists. The probability of a dome structure is Pr(Dome) = 0.6. The conditional probabilities of finding oil at this site are as follows.
Pr(Dry|Dome) = 0.6    Pr(Low|Dome) = 0.25    Pr(High|Dome) = 0.15
Pr(Dry|No Dome) = 0.85    Pr(Low|No Dome) = 0.125    Pr(High|No Dome) = 0.025
Pr(Dry)=? Pr(Low)=? Pr(High)=?
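A short Python sketch answers the three questions by the law of total probability, using only the numbers given above (the results, 0.7 / 0.2 / 0.1, match the flipped tree later in these slides):

# Law of total probability applied to the oil example.
p_dome, p_no_dome = 0.6, 0.4

cond = {  # Pr(oil condition | dome structure)
    "Dome":    {"Dry": 0.6,  "Low": 0.25,  "High": 0.15},
    "No Dome": {"Dry": 0.85, "Low": 0.125, "High": 0.025},
}

for outcome in ("Dry", "Low", "High"):
    p = cond["Dome"][outcome] * p_dome + cond["No Dome"][outcome] * p_no_dome
    print(f"Pr({outcome}) = {p:.2f}")
# Pr(Dry) = 0.70, Pr(Low) = 0.20, Pr(High) = 0.10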
Bayes’ Theorem
If B1, B2, …, Bn are mutually exclusive and collectively exhaustive:
From the multiplicative rule it follows that:

Pr(Ai ∩ Bj) = Pr(Bj|Ai)·Pr(Ai) = Pr(Ai|Bj)·Pr(Bj)   (Eq. 1)

Dividing both sides of Eq. 1 by Pr(Ai) yields:

Pr(Bj|Ai) = Pr(Ai|Bj)·Pr(Bj) / Pr(Ai)   (Eq. 2)
From the law of total probability it follows that:

Pr(Ai) = Σ_{j=1}^{n} Pr(Ai|Bj)·Pr(Bj)   (Eq. 3)
Pr(Bj): prior probability (it does not take into account any information about A)
Pr(Bj|Ai): posterior probability (it is derived from the specific outcome of A)
Substituting Eq. 3 into Eq. 2 yields one of the most well-known theorems in probability theory, Bayes’ Theorem:

Pr(Bj|Ai) = Pr(Ai|Bj)·Pr(Bj) / Σ_{k=1}^{n} Pr(Ai|Bk)·Pr(Bk)   (Eq. 4)
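Eq. 4 transcribes directly into code; the helper below is a minimal Python sketch (its name and signature are illustrative, not a library API):

# Bayes' Theorem (Eq. 4) for a discrete, exhaustive set of outcomes Bj.
def posterior(prior, likelihood):
    """prior[j] = Pr(Bj); likelihood[j] = Pr(Ai|Bj) for the observed Ai.
    Returns the list of posteriors Pr(Bj|Ai)."""
    evidence = sum(l * p for l, p in zip(likelihood, prior))   # Eq. 3
    return [l * p / evidence for l, p in zip(likelihood, prior)]

print(posterior([0.6, 0.4], [0.25, 0.125]))  # [0.75, 0.25]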
Oil Example (Cont’d)
Using Bayes’ Theorem to flip the probability tree (condition on oil condition first, then on dome structure):

Pr(Dome|Dry) = Pr(Dry|Dome)·Pr(Dome) / Pr(Dry) = (0.6 × 0.6) / 0.7 ≈ 0.514
Pr(No Dome|Dry) = 1 − 0.514 = 0.486
Pr(Dome|Low) = Pr(Low|Dome)·Pr(Dome) / Pr(Low) = (0.25 × 0.6) / 0.2 = 0.75
Pr(No Dome|Low) = 1 − 0.75 = 0.25
Pr(Dome|High) = Pr(High|Dome)·Pr(Dome) / Pr(High) = (0.15 × 0.6) / 0.1 = 0.90
Pr(No Dome|High) = 1 − 0.90 = 0.10
[Probability trees: the original tree branches on dome structure (Dome 0.6, No Dome 0.4) and then on oil condition (Dry 0.6, Low 0.25, High 0.15 given Dome; Dry 0.85, Low 0.125, High 0.025 given No Dome); the flipped tree branches on oil condition (Dry 0.7, Low 0.2, High 0.1) and then on dome structure, whose branch probabilities are the posteriors computed above]
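A self-contained Python check of the flipped tree, using only numbers from the slides:

# Flip the oil-example tree: compute Pr(dome structure | oil condition).
prior = {"Dome": 0.6, "No Dome": 0.4}
likelihood = {"Dry":  {"Dome": 0.6,  "No Dome": 0.85},
              "Low":  {"Dome": 0.25, "No Dome": 0.125},
              "High": {"Dome": 0.15, "No Dome": 0.025}}

for obs, lik in likelihood.items():
    evidence = sum(lik[s] * prior[s] for s in prior)           # Pr(obs), Eq. 3
    post = {s: lik[s] * prior[s] / evidence for s in prior}    # Eq. 4
    print(obs, round(evidence, 2), {s: round(p, 3) for s, p in post.items()})
# Dry  0.7  {'Dome': 0.514, 'No Dome': 0.486}
# Low  0.2  {'Dome': 0.75, 'No Dome': 0.25}
# High 0.1  {'Dome': 0.9, 'No Dome': 0.1}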
Uncertain Quantities (Cont’d)
A random variable (rv) is usually denoted by a capital letter (such as X), and the specific value it takes is usually represented by a lowercase letter (such as x)
A rv is discrete if its possible values either constitute a finite set or
can be listed in an infinite sequence in which there is a first element, a
second element, etc.
e.g. The number of heads you get after you flip a coin 3 times
A rv is continuous if its set of possible values consists of an entire interval on the number line
e.g. The failure time of a component
Discrete Probability Distributions
The probability distribution of a discrete rv can be expressed in two ways: as a probability mass function (PMF) or as a cumulative distribution function (CDF)
The PMF of X lists the probability Pr(X = x) of each possible discrete outcome x.

e.g. X = the number of heads you get after flipping a fair coin three times:

x    Pr(X = x)
0    1/8
1    3/8
2    3/8
3    1/8
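The table can be reproduced by brute-force enumeration of the eight equally likely flip sequences; a minimal Python sketch:

from itertools import product

# PMF of X = number of heads in three fair coin flips.
flips = list(product("HT", repeat=3))   # all 8 equally likely sequences
pmf = {x: sum(seq.count("H") == x for seq in flips) / len(flips)
       for x in range(4)}
print(pmf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}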
Expected Value
We know a rv X can have many possible values. However, if you had to give a “best guess” for X, what number would you give? The expected value of X, E(X), is usually used as the “best guess”.
Interpretation of Expected Value
If you were able to observe the outcomes of X a large number of times, the calculated average of these observations would be close to E(X).
If X can take on any value in the set {x1, x2, …, xn}, then

E(X) = Σ_{i=1}^{n} xi · Pr(X = xi)
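Applied to the three-coin-flip PMF above, the formula gives E(X) = 0·(1/8) + 1·(3/8) + 2·(3/8) + 3·(1/8) = 1.5; in Python:

# Expected value as the probability-weighted sum of outcomes.
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
e_x = sum(x * p for x, p in pmf.items())
print(e_x)  # 1.5 heads on average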
Expected Value (Cont’d)
If Y = g(X), then

E(Y) = Σ_{i=1}^{n} g(xi) · Pr(X = xi)
If Y = aX + b, where a and b are constants, then

E(Y) = a·E(X) + b

If Z = aX + bY, where a and b are constants, then

E(Z) = a·E(X) + b·E(Y)
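A quick numerical check of these rules on the same PMF (a and b are arbitrary illustrative constants):

# Verify E(aX + b) = a·E(X) + b using the definition of E(g(X)).
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
a, b = 2, 5
e_x = sum(x * p for x, p in pmf.items())            # E(X) = 1.5
e_y = sum((a * x + b) * p for x, p in pmf.items())  # E(Y) via g(x) = a*x + b
print(e_y, a * e_x + b)  # both 8.0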