






















































Data Models, Parametric Estimation, Statistical Testing and Linear Regression
Typology: Exercises
Each theme begins with an abstract of the lecture and an exercise with a detailed solution. The computations were made with software; because of rounding, the results may differ slightly from those obtained with statistical tables.
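Let $(x_1, \ldots, x_n)$ be a sample, i.e. a series of numerical values for a certain variable in a set of $n$ individuals.
The empirical mean of the sample is:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i . \]
The empirical variance of the sample is:
\[ s_x^2 = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^2 \right) - \bar{x}^2 . \]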
Exercise 1.1.1. Here are the numbers of non-smoking mothers by age at delivery.
age      21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
number    7   8   9  10  12   3   2   5   4   5   2   4   2   0   1
The modalities are the whole numbers between 21 and 35.
The standard deviation is the square root of the variance:
\[ s_x = \sqrt{s_x^2} , \]
that is approximately 3 years and 7 months.
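As a minimal sketch, the mean, variance and standard deviation can be recomputed from the frequency table of Exercise 1.1.1 with the formulas above. The variable names are illustrative and not part of the original text.

```python
import math

# Frequency table of Exercise 1.1.1: ages 21..35 and number of mothers per age.
ages = list(range(21, 36))
counts = [7, 8, 9, 10, 12, 3, 2, 5, 4, 5, 2, 4, 2, 0, 1]

n = sum(counts)  # sample size
# Empirical mean: each age is weighted by the number of mothers of that age.
mean = sum(a * c for a, c in zip(ages, counts)) / n
# Empirical variance: mean of the squares minus the square of the mean.
mean_of_squares = sum(a * a * c for a, c in zip(ages, counts)) / n
variance = mean_of_squares - mean ** 2
std_dev = math.sqrt(variance)

# The fractional part of the standard deviation, converted to months.
months = (std_dev - int(std_dev)) * 12
print(n, round(mean, 2), round(variance, 2), round(std_dev, 2), round(months))
```

The fractional part of the standard deviation converts to roughly 7 months, in line with the value quoted in the solution.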
The values of the empirical distribution function are the cumulated sums of frequencies.
age          21     22     23     24     25     26     27
cum. freq.   7/74   15/74  24/74  34/74  46/74  49/74  51/74
rounded      0.095  0.203  0.324  0.459  0.622  0.662  0.689

age          28     29     30     31     32     33     34     35
cum. freq.   56/74  60/74  65/74  67/74  71/74  73/74  73/74  74/74
rounded      0.757  0.811  0.878  0.905  0.959  0.986  0.986  1
It is the sum of empirical frequencies for the modalities 22, 23, 24, 25, or else the increment of the empirical distribution function $F(25) - F(21)$, that is $39/74 \approx 0.527$. More than half of the women in the sample are between 22 and 25 years old.
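The increment of the empirical distribution function can be checked with exact fractions. This is a small sketch on the data of Exercise 1.1.1; the dictionary `F` is an illustrative name.

```python
from fractions import Fraction
from itertools import accumulate

ages = list(range(21, 36))
counts = [7, 8, 9, 10, 12, 3, 2, 5, 4, 5, 2, 4, 2, 0, 1]
n = sum(counts)

# F[age] is the proportion of the sample with a value less than or equal to age.
F = {a: Fraction(c, n) for a, c in zip(ages, accumulate(counts))}

# Proportion of mothers aged 22 to 25: increment of F between 21 and 25.
share_22_to_25 = F[25] - F[21]
print(share_22_to_25, round(float(share_22_to_25), 3))
```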
[Figure not recovered; horizontal axis: age, from 20 to 36.]
The median is 25 years; the first quartile is 23 years, the last quartile is 28 years.
The mean is larger than the median, as expected for a distribution skewed to the right. For the same reason, the gap between the last quartile and the median is larger than that between the median and the first quartile. Both gaps are smaller than the standard deviation: this is the case for most distributions, whether they are symmetrical or skewed.
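The median and quartiles can be read off the cumulated counts. The sketch below assumes one common convention, namely that the quantile of order p is the smallest value whose cumulative frequency reaches p; other conventions exist and may give slightly different values on other data.

```python
from itertools import accumulate

ages = list(range(21, 36))
counts = [7, 8, 9, 10, 12, 3, 2, 5, 4, 5, 2, 4, 2, 0, 1]
n = sum(counts)
cum_counts = list(accumulate(counts))

def quantile(p):
    """Smallest age whose cumulated count reaches p * n (assumed convention)."""
    for age, cum in zip(ages, cum_counts):
        if cum >= p * n:
            return age
    return ages[-1]

q1, median, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
mean = sum(a * c for a, c in zip(ages, counts)) / n
print(q1, median, q3, mean > median)
```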
Exercise 1.1.2. Here are the numbers of smoking mothers by age at delivery.
age      21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
number    5   5   4   3   3   5   1   4   3   2   3   2   1   1   1
Exercise 1.1.3. Consider the sample (1 , 0 , 2 , 1 , 1 , 0 , 1 , 0 , 0).
Exercise 1.1.4. Consider the sample (1.2, 0.2, 1.6, 1.1, 0.9, 0.3, 0.7, 0.1, 0.4).
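In what follows, $\overline{M}$ and $\overline{T}$ denote the complements of the events $M$ and $T$.
\[ P[T \mid \overline{M}] = 1 - P[\overline{T} \mid \overline{M}] = 1 - 0.9 = 0.1 . \]
\[ P[T \text{ and } \overline{M}] = P[T \mid \overline{M}] \, P[\overline{M}] = 0.1 \times 0.7 = 0.07 . \]
Use the formula of total probabilities, or compute it directly by splitting the sheep that react positively into those that are ill and those that are not.
\[
\begin{aligned}
P[T] &= P[T \text{ and } M] + P[T \text{ and } \overline{M}] \\
     &= P[T \mid M] \, P[M] + P[T \mid \overline{M}] \, P[\overline{M}] \\
     &= 0.8 \times 0.3 + 0.1 \times 0.7 = 0.24 + 0.07 = 0.31 .
\end{aligned}
\]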
Use the Bayes formula or prove it again as follows.
\[ P[M \mid T] = \frac{P[T \text{ and } M]}{P[T]} . \]
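The total probability and Bayes computations can be sketched numerically. The three input values are read off the computation above; reading M as "the sheep is ill" and T as "the test is positive" is an assumption, since the statement of the exercise is not reproduced here.

```python
# Inputs taken from the worked computation (interpretation of M and T assumed).
p_M = 0.3               # P[M]
p_T_given_M = 0.8       # P[T | M]
p_T_given_not_M = 0.1   # P[T | complement of M]

# Formula of total probabilities: split T according to M and its complement.
p_T = p_T_given_M * p_M + p_T_given_not_M * (1 - p_M)

# Bayes formula: P[M | T] = P[T and M] / P[T].
p_M_given_T = (p_T_given_M * p_M) / p_T

print(round(p_T, 2), round(p_M_given_T, 3))
```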
Exercise 1.2.2. There are three sorts of a given plant: early, normal, and late. It can also be either dwarf or tall. In a sample of plants grown from 1000 seeds, there are 600 dwarf, 200 late, 300 early dwarf, 250 normal tall, 100 late tall. Consider the plant grown from a seed taken at random.
Exercise 1.2.3. In a batch of manufactured items, 5% are faulty. The items are checked, but the checking is not perfect. If the item is good, it is accepted with probability 0.96; if it is faulty, it is rejected with probability 0.98. An item is chosen at random, then checked.
Exercise 1.2.4. Here are the percentages of the different blood types in France.
Factor \ Group    O      A      B      AB
Rhesus +          37.0   38.1   6.2    2.
Rhesus -          7.0    7.2    1.2    0.
\[
\begin{aligned}
P[X \geq 3] &= P[X = 3] + P[X = 4] + P[X = 5] \\
            &= \binom{5}{3} 0.9^3 \, 0.1^2 + \binom{5}{4} 0.9^4 \, 0.1^1 + \binom{5}{5} 0.9^5 \, 0.1^0 .
\end{aligned}
\]
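The sum of binomial terms above can be evaluated directly. This is a minimal sketch with n = 5 and p = 0.9 as in the formula; the helper `binom_pmf` is an illustrative name.

```python
from math import comb

def binom_pmf(k, n, p):
    """P[X = k] for X following the binomial distribution B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# P[X >= 3] = P[X = 3] + P[X = 4] + P[X = 5] for B(5, 0.9).
p_at_least_3 = sum(binom_pmf(k, 5, 0.9) for k in range(3, 6))
print(round(p_at_least_3, 5))
```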
Exercise 1.3.2. When a hunter aims at a helpless rabbit, he has 1 chance out of 10 to hit it.
(a) neither of them hit; (b) only one of them hits; (c) both hunters hit.
(a) What is the probability distribution of the number of shots suffered by the poor animal? Give the expectation and variance of that distribution. (b) What is the probability that the rabbit is hit at most twice? (c) What is the probability that the rabbit is hit at least twice?
(a) What is the probability for the rabbit not to be hit? (b) What is the probability that the rabbit becomes inedible, i.e. that it has received at least 5 shots?
Exercise 1.3.3. At an identification session, 6 witnesses are asked to identify a murderer among 4 suspects, including yourself.
(a) of not being pointed out? (b) of being pointed out exactly once? (c) of being pointed out twice or more?
\[ P[X = k] = \frac{\binom{m}{k} \binom{N-m}{n-k}}{\binom{N}{n}} . \]
Exercise 1.4.1. There are 18 girls and 11 boys in a certain group of students. A sample of 5 persons is chosen at random in that group. Let X be the random variable equal to the number of girls in that sample.
The distribution of $X$ is the hypergeometric distribution with parameters $N = 29$ (total number of persons), $m = 18$ (the "marked" individuals are the girls), and $n = 5$ (the size of the sample). The values are the integers between 0 and 5. For any integer $k = 0, 1, \ldots, 5$:
\[ P[X = k] = \frac{\binom{18}{k} \binom{11}{5-k}}{\binom{29}{5}} . \]
The expectation of $X$ is $5 \times 18/29 \approx 3.1$. It is the size of the sample, multiplied by the proportion of girls in the group.
\[ P[X = 5] = \frac{\binom{18}{5}}{\binom{29}{5}} . \]
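The hypergeometric probabilities of Exercise 1.4.1 can be tabulated exactly. The sketch below uses the parameters N = 29, m = 18, n = 5 from the solution; the function name `hypergeom_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, m, n):
    """P[X = k]: k marked individuals in a sample of size n drawn from N, m of them marked."""
    return Fraction(comb(m, k) * comb(N - m, n - k), comb(N, n))

N, m, n = 29, 18, 5
pmf = [hypergeom_pmf(k, N, m, n) for k in range(n + 1)]

expectation = sum(k * p for k, p in enumerate(pmf))  # equals n * m / N
p_all_girls = pmf[5]          # all five persons sampled are girls
p_at_least_one = 1 - pmf[0]   # complement of "no girl in the sample"

print(float(expectation), float(p_all_girls), float(p_at_least_one))
```

Computing the probability of at least one girl through the complement avoids summing five terms.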
The value of $P[X \geq 1]$ must be calculated. It could be done as $P[X = 1] + P[X =$
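If $X$ follows the $N(\mu, \sigma^2)$ distribution, then $(X - \mu)/\sqrt{\sigma^2}$ follows the $N(0, 1)$ distribution. Thus:
\[
P[a \leq X \leq b]
= P\left[ \frac{a - \mu}{\sqrt{\sigma^2}} \leq \frac{X - \mu}{\sqrt{\sigma^2}} \leq \frac{b - \mu}{\sqrt{\sigma^2}} \right]
= F\left( \frac{b - \mu}{\sqrt{\sigma^2}} \right) - F\left( \frac{a - \mu}{\sqrt{\sigma^2}} \right) ,
\]
where $F$ is the distribution function of the $N(0, 1)$.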
Exercise 1.5.1. The height X of men in France is modeled by a normal distribution N (172 , 196) (unit: cm).
\[ P[X < 160] = P\left[ \frac{X - 172}{\sqrt{196}} < \frac{160 - 172}{\sqrt{196}} \right] = F(-0.857) = 1 - F(0.857) = 0.1957 , \]
where $F$ denotes the distribution function of the $N(0, 1)$ distribution.
\[ P[X > 200] = P\left[ \frac{X - 172}{\sqrt{196}} > \frac{200 - 172}{\sqrt{196}} \right] = 1 - F(2) = 0.02275 . \]
\[ P[165 < X < 185] = P\left[ \frac{165 - 172}{\sqrt{196}} < \frac{X - 172}{\sqrt{196}} < \frac{185 - 172}{\sqrt{196}} \right] \]
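The three probabilities of Exercise 1.5.1 can be evaluated by standardizing and calling the N(0, 1) distribution function. This sketch relies on `statistics.NormalDist` from the Python standard library; the helper `standardize` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

mu, variance = 172, 196   # height modeled by N(172, 196), in cm
F = NormalDist().cdf      # distribution function of N(0, 1)

def standardize(x):
    """Map a height x to the corresponding value of (X - mu) / sqrt(variance)."""
    return (x - mu) / sqrt(variance)

p_below_160 = F(standardize(160))
p_above_200 = 1 - F(standardize(200))
p_between_165_185 = F(standardize(185)) - F(standardize(165))

print(round(p_below_160, 4), round(p_above_200, 5), round(p_between_165_185, 4))
```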
The question amounts to finding the height such that 90% of French men are smaller, i.e. the quantile of order 0.9, or ninth decile. Let $x$ be that height.
\[ P[X < x] = P\left[ \frac{X - 172}{\sqrt{196}} < \frac{x - 172}{\sqrt{196}} \right] = 0.9 \]
Thus $\frac{x - 172}{\sqrt{196}}$ is the value of the quantile function of the $N(0, 1)$ distribution for $p = 0.9$, that is 1.2816. Therefore:
\[ x = 172 + 1.2816 \times \sqrt{196} \approx 190 \text{ cm.} \]
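The quantile can be recovered with the inverse distribution function. A minimal sketch, again using `statistics.NormalDist`; both routes below should agree up to rounding.

```python
from math import sqrt
from statistics import NormalDist

# Quantile of order 0.9 of N(0, 1), then transformed back to the height scale.
u = NormalDist().inv_cdf(0.9)
x = 172 + u * sqrt(196)

# Same quantile obtained directly from N(172, 196); sigma is the standard deviation.
x_direct = NormalDist(mu=172, sigma=sqrt(196)).inv_cdf(0.9)

print(round(u, 4), round(x, 1), round(x_direct, 1))
```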
Let $X$ denote the height of the man and $Y$ that of the woman, and suppose they are independent. Then $X - Y$ follows the normal distribution $N(10, 340)$. The probability for $X$ to be larger than $Y$ is the probability for $X - Y$ to be positive:
\[ P[X - Y > 0] = P\left[ \frac{(X - Y) - 10}{\sqrt{340}} > \frac{-10}{\sqrt{340}} \right] = 1 - F(-0.5423) = 0.7062 . \]
Exercise 1.5.2. Let X be a random variable with N (0 , 1) distribution.
(a) $P[X > 1.45]$ (b) $P[-1.65 \leq X \leq 1.34]$ (c) $P[|X| < 2.05]$
(a) $P[X < u] = 0.63$ (b) $P[X > u] = 0.63$ (c) $P[|X| < u] = 0.63$
Exercise 1.5.3. Let X be a random variable with N (0 , 1) distribution. Let Y = 2 X − 3.
Exercise 1.5.4. Let X be a random variable with N (3 , 25) distribution.
(a) $P[X < 6]$ (b) $P[X > -2]$ (c) $P[-1 \leq X \leq 1.5]$
\[
P[a \leq X \leq b]
= P\left[ \frac{a - np}{\sqrt{np(1-p)}} \leq \frac{X - np}{\sqrt{np(1-p)}} \leq \frac{b - np}{\sqrt{np(1-p)}} \right]
\approx F\left( \frac{b - np}{\sqrt{np(1-p)}} \right) - F\left( \frac{a - np}{\sqrt{np(1-p)}} \right) ,
\]
where $F$ is the distribution function of the $N(0, 1)$.
Exercise 1.6.1. From past experience, it is known that a certain surgery has a 90% chance to succeed. This surgery is performed by a certain clinic 400 times each year. Let N be the number of successes next year. The normal approximation will be used for N.
The expectation is $400 \times 0.9 = 360$, the variance is $400 \times 0.9 \times 0.1 = 36$.
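\[ P[N > 345] = P\left[ \frac{N - 360}{\sqrt{36}} > \frac{345 - 360}{\sqrt{36}} \right] \]
\[ P[N \leq 372] = P\left[ \frac{N - 360}{\sqrt{36}} \leq \frac{372 - 360}{\sqrt{36}} \right] \]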
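Under the normal approximation, both probabilities reduce to values of the N(0, 1) distribution function. The sketch below evaluates them for Exercise 1.6.1; no continuity correction is applied, which is an assumption about the intended method.

```python
from math import sqrt
from statistics import NormalDist

n_surgeries, p_success = 400, 0.9
mean = n_surgeries * p_success                        # 360
variance = n_surgeries * p_success * (1 - p_success)  # 36, up to floating-point rounding

# Normal distribution used to approximate the binomial B(400, 0.9).
approx = NormalDist(mu=mean, sigma=sqrt(variance))

p_more_than_345 = 1 - approx.cdf(345)   # standardized bound: (345 - 360) / 6 = -2.5
p_at_most_372 = approx.cdf(372)         # standardized bound: (372 - 360) / 6 = 2

print(round(p_more_than_345, 4), round(p_at_most_372, 4))
```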
Let $n$ be the number of failed surgeries that must be determined. The corresponding number of successes is $400 - n$. Therefore $P[N \leq 400 - n] = 0.01$. Now:
\[
P[N \leq 400 - n]
= P\left[ \frac{N - 360}{\sqrt{36}} \leq \frac{400 - n - 360}{\sqrt{36}} \right]
\approx F\left( \frac{40 - n}{\sqrt{36}} \right) = 0.01 .
\]
The number $\frac{40 - n}{\sqrt{36}}$ is the quantile at 0.01 of the $N(0, 1)$, that is $-2.3263$. Thus:
\[ \frac{40 - n}{\sqrt{36}} = -2.3263 \implies n = 40 + 2.3263 \times \sqrt{36} \approx 53.96 . \]
The reasoning could also be applied to the number of failed surgeries $R = 400 - N$. It follows the binomial distribution $B(400, 0.1)$, which can be approximated by the normal $N(40, 36)$. The desired number is such that $P[R \geq n] = 0.01$.
\[
P[R \geq n]
= P\left[ \frac{R - 40}{\sqrt{36}} \geq \frac{n - 40}{\sqrt{36}} \right]
\approx 1 - F\left( \frac{n - 40}{\sqrt{36}} \right)
= F\left( \frac{40 - n}{\sqrt{36}} \right) = 0.01 .
\]
Of course the result is the same.
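Both routes can be compared numerically. This sketch solves for the threshold once through the successes N and once through the failures R, using the normal approximations N(360, 36) and N(40, 36) from the solution.

```python
from math import sqrt
from statistics import NormalDist

# Route 1: (40 - n) / sqrt(36) is the quantile of order 0.01 of N(0, 1).
q = NormalDist().inv_cdf(0.01)
n_from_successes = 40 - q * sqrt(36)

# Route 2: n is the quantile of order 0.99 of the approximation N(40, 36) of R.
n_from_failures = NormalDist(mu=40, sigma=sqrt(36)).inv_cdf(0.99)

print(round(q, 4), round(n_from_successes, 2), round(n_from_failures, 2))
```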
Exercise 1.6.2. Among people old enough to receive an injection against the flu, 40% of them ask for it. In a population of 150000 persons old enough to receive the injection, let N be the number of those that will ask for it.
Exercise 1.6.3. A restaurant, serving only upon reservation, has 50 seats. The probability that someone with a reservation does not show up is 1/5. Let N be the number of meals served on a given day. The normal approximation will be used for N.
Exercise 1.6.4. Suppose there is probability 0.1 of having one's ticket checked on a tramway trip. Mr A. makes 700 trips per year. The normal approximation will be used for the number of ticket checks.
Exercise 2.1.1. Consider the statistical sample (1 , 0 , 2 , 1 , 1 , 0 , 1 , 0 , 0).
\[ \bar{x} = \frac{2}{3} \quad \text{and} \quad s_x^2 = \frac{4}{9} . \]
The empirical mean (2/3) is an unbiased estimate of the expectation. An unbiased estimate of the variance is obtained by multiplying $s_x^2$ by 9/8: this gives 1/2.
The expectation of the $B(2, p)$ distribution is $2p$. It is estimated by the empirical mean (here 2/3). Thus $p$ can be estimated by:
\[ \frac{2/3}{2} = \frac{1}{3} . \]
The variance of the $B(2, p)$ distribution is $2p(1 - p)$. It is estimated by 1/2. The value of $p$ can be estimated by solving the equation $2p(1 - p) = 1/2$, giving $p = 1/2$.
The parameter $\lambda$ can be estimated by the empirical mean, 2/3.
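The estimates of Exercise 2.1.1 can be reproduced with exact fractions. A minimal sketch; the variable names are illustrative, and the quadratic 2p(1 − p) = v is solved by taking its smaller root.

```python
from fractions import Fraction
from math import sqrt

sample = [1, 0, 2, 1, 1, 0, 1, 0, 0]
n = len(sample)

mean = Fraction(sum(sample), n)
var_biased = Fraction(sum(x * x for x in sample), n) - mean ** 2
var_unbiased = var_biased * Fraction(n, n - 1)   # multiply by 9/8 here

# Estimate of p for B(2, p) from the expectation 2p.
p_from_mean = mean / 2

# Estimate of p from the variance: 2p(1 - p) = v gives p = (1 - sqrt(1 - 2v)) / 2
# for the smaller root; with v = 1/2 both roots coincide.
p_from_variance = (1 - sqrt(1 - 2 * float(var_unbiased))) / 2

print(mean, var_biased, var_unbiased, p_from_mean, p_from_variance)
```

The two estimates of p differ (1/3 and 1/2), which illustrates that different moments can lead to different estimates from the same sample.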
Exercise 2.1.2. Consider the statistical sample (1 , 3 , 2 , 3 , 2 , 2 , 0 , 2 , 3 , 1).
Exercise 2.1.3. Consider the statistical sample (1.2, 0.2, 1.6, 1.1, 0.9, 0.3, 0.7, 0.1, 0.4).
A Gaussian sample is an $n$-tuple $(X_1, \ldots, X_n)$ of independent random variables with normal distribution $N(\mu, \sigma^2)$. The empirical mean and variance of the sample are given by:
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{and} \quad S^2 = \left( \frac{1}{n} \sum_{i=1}^{n} X_i^2 \right) - \bar{X}^2 . \]
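When $\sigma^2$ is known, a confidence interval of level $1 - \alpha$ for $\mu$ is:
\[ \left[ \bar{X} - u_\alpha \frac{\sqrt{\sigma^2}}{\sqrt{n}} \; ; \; \bar{X} + u_\alpha \frac{\sqrt{\sigma^2}}{\sqrt{n}} \right] , \]
where $u_\alpha$ is the quantile of order $1 - \alpha/2$ for the normal distribution $N(0, 1)$.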
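The interval with known variance can be sketched as a small function. The data below reuse the sample of Exercise 2.1.3, while the known variance 0.25 and the level α = 0.05 are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean

def mean_ci_known_variance(sample, variance, alpha=0.05):
    """Confidence interval of level 1 - alpha for the mean, variance assumed known."""
    n = len(sample)
    x_bar = fmean(sample)
    u = NormalDist().inv_cdf(1 - alpha / 2)   # quantile of order 1 - alpha/2 of N(0, 1)
    half_width = u * sqrt(variance) / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Sample of Exercise 2.1.3; the variance value is a hypothetical assumption.
sample = [1.2, 0.2, 1.6, 1.1, 0.9, 0.3, 0.7, 0.1, 0.4]
low, high = mean_ci_known_variance(sample, variance=0.25, alpha=0.05)
print(round(low, 3), round(high, 3))
```

The interval is centred on the empirical mean, and its half-width shrinks like one over the square root of the sample size.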
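When $\sigma^2$ is unknown, a confidence interval of level $1 - \alpha$ for $\mu$ is:
\[ \left[ \bar{X} - t_\alpha \frac{\sqrt{S^2}}{\sqrt{n - 1}} \; ; \; \bar{X} + t_\alpha \frac{\sqrt{S^2}}{\sqrt{n - 1}} \right] , \]
where $t_\alpha$ is the quantile of order $1 - \alpha/2$ for the Student distribution with parameter $n - 1$.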