








Mathematical Expectation
and it follows that

σ² = 0.2732 − (0.4413)² = 0.0785

and σ = √0.0785 ≈ 0.28.
The following is another theorem that is of importance in work connected with standard deviations or variances.
THEOREM 7. If X has the variance σ², then

var(aX + b) = a²σ²
The proof of this theorem will be left to the reader, but let us point out the following corollaries: For a = 1, we find that the addition of a constant to the values of a random variable, resulting in a shift of all the values of X to the left or to the right, in no way affects the spread of its distribution; for b = 0, we find that if the values of a random variable are multiplied by a constant, the variance is multiplied by the square of that constant, resulting in a corresponding change in the spread of the distribution.
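As a quick numerical check of Theorem 7, the following Python sketch compares the sample variance of aX + b with a² times the sample variance of X; the exponential distribution and the constants a and b are arbitrary choices made only for this illustration.

```python
# Illustrative check of Theorem 7: var(aX + b) = a^2 * var(X).
# The distribution of X and the constants a, b are arbitrary choices for this sketch.
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.exponential(scale=2.0, size=200_000)   # any random variable with finite variance
a, b = -3.0, 5.0

lhs = np.var(a * x + b)        # sample variance of aX + b
rhs = a**2 * np.var(x)         # a^2 times the sample variance of X
print(lhs, rhs)                # the two values agree up to sampling error
```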
To demonstrate how σ or σ 2 is indicative of the spread or dispersion of the distribu- tion of a random variable, let us now prove the following theorem, called Chebyshev’s theorem after the nineteenth-century Russian mathematician P. L. Chebyshev. We shall prove it here only for the continuous case, leaving the discrete case as an exercise.
THEOREM 8. (Chebyshev's Theorem) If μ and σ are the mean and the standard deviation of a random variable X, then for any positive constant k the probability is at least 1 − 1/k² that X will take on a value within k standard deviations of the mean; symbolically,

P(|X − μ| < kσ) ≥ 1 − 1/k²,   σ ≠ 0
Proof According to Definitions 4 and 5, we write
σ² = E[(X − μ)²] = ∫_{−∞}^{∞} (x − μ)² · f(x) dx
Figure 2. Diagram for proof of Chebyshev’s theorem.
Then, dividing the integral into three parts as shown in Figure 2, we get
σ² = ∫_{−∞}^{μ−kσ} (x − μ)² · f(x) dx + ∫_{μ−kσ}^{μ+kσ} (x − μ)² · f(x) dx + ∫_{μ+kσ}^{∞} (x − μ)² · f(x) dx
Since the integrand ( x − μ)^2 · f ( x ) is nonnegative, we can form the inequality
σ² ≥ ∫_{−∞}^{μ−kσ} (x − μ)² · f(x) dx + ∫_{μ+kσ}^{∞} (x − μ)² · f(x) dx
by deleting the second integral. Therefore, since (x − μ)² ≥ k²σ² for x ≤ μ − kσ or x ≥ μ + kσ, it follows that
σ² ≥ ∫_{−∞}^{μ−kσ} k²σ² · f(x) dx + ∫_{μ+kσ}^{∞} k²σ² · f(x) dx
and hence that
1/k² ≥ ∫_{−∞}^{μ−kσ} f(x) dx + ∫_{μ+kσ}^{∞} f(x) dx
provided σ² ≠ 0. Since the sum of the two integrals on the right-hand side is the probability that X will take on a value less than or equal to μ − kσ or greater than or equal to μ + kσ, we have thus shown that
P(|X − μ| ≥ kσ) ≤ 1/k²

and it follows that

P(|X − μ| < kσ) ≥ 1 − 1/k²
For instance, the probability is at least 1 − 1/2² = 3/4 that a random variable X will take on a value within two standard deviations of the mean, the probability is at least 1 − 1/3² = 8/9 that it will take on a value within three standard deviations of the mean, and the probability is at least 1 − 1/5² = 24/25 that it will take on a value within five standard deviations of the mean.
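The bound of Chebyshev's theorem can also be examined numerically. The sketch below draws a large sample from a gamma distribution (an arbitrary choice; the theorem applies to any distribution with finite variance) and compares the observed proportion of values within k standard deviations of the mean with the lower bound 1 − 1/k².

```python
# Numerical illustration of Chebyshev's theorem: P(|X - mu| < k*sigma) >= 1 - 1/k^2.
# The gamma distribution and its parameters are arbitrary choices for this sketch.
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.gamma(shape=2.0, scale=1.5, size=500_000)
mu, sigma = x.mean(), x.std()

for k in (2, 3, 5):
    empirical = np.mean(np.abs(x - mu) < k * sigma)   # observed proportion
    bound = 1 - 1 / k**2                              # Chebyshev lower bound
    print(f"k={k}: observed {empirical:.4f} >= bound {bound:.4f}")
```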
Mathematical Expectation
41. Prove that cov(X, Y) = cov(Y, X) for both discrete and continuous random variables X and Y.
42. If X and Y have the joint probability distribution f(x, y) = 1/4 for x = −3 and y = −5, x = −1 and y = −1, x = 1 and y = 1, and x = 3 and y = 5, find cov(X, Y).
43. This has been intentionally omitted for this edition.
44. This has been intentionally omitted for this edition.
45. This has been intentionally omitted for this edition.
46. If X and Y have the joint probability distribution f(−1, 0) = 0, f(−1, 1) = 1/4, f(0, 0) = 1/6, f(0, 1) = 0, f(1, 0) = 1/12, and f(1, 1) = 1/2, show that (a) cov(X, Y) = 0; (b) the two random variables are not independent.
47. If the probability density of X is given by
f(x) = 1 + x   for −1 < x ≤ 0
       1 − x   for 0 < x < 1
       0       elsewhere

and U = X and V = X², show that (a) cov(U, V) = 0; (b) U and V are dependent.
48. For k random variables X 1 , X 2 ,... , Xk , the values of their joint moment-generating function are given by
E( e^{t1X1 + t2X2 + ··· + tkXk} )
(a) Show for either the discrete case or the continuous case that the partial derivative of the joint moment-generating function with respect to ti at t1 = t2 = ··· = tk = 0 is E(Xi).
(b) Show for either the discrete case or the continuous case that the second partial derivative of the joint moment-generating function with respect to ti and tj, i ≠ j, at t1 = t2 = ··· = tk = 0 is E(Xi Xj).
(c) If two random variables have the joint density given by
f(x, y) = e^{−x−y}   for x > 0, y > 0
          0           elsewhere
find their joint moment-generating function and use it to determine the values of E ( XY ), E ( X ), E ( Y ), and cov( X , Y ).
49. If X1, X2, and X3 are independent and have the means 4, 9, and 3 and the variances 3, 7, and 5, find the mean and the variance of (a) Y = 2X1 − 3X2 + 4X3; (b) Z = X1 + 2X2 − X3.
50. Repeat both parts of Exercise 49, dropping the assumption of independence and using instead the information that cov(X1, X2) = 1, cov(X2, X3) = −2, and cov(X1, X3) = −3.
51. If the joint probability density of X and Y is given by
f(x, y) = (1/3)(x + y)   for 0 < x < 1, 0 < y < 2
          0               elsewhere
find the variance of W = 3 X + 4 Y − 5.
52. Prove Theorem 15.
53. Express var(X + Y), var(X − Y), and cov(X + Y, X − Y) in terms of the variances and covariance of X and Y.
54. If var(X1) = 5, var(X2) = 4, var(X3) = 7, cov(X1, X2) = 3, cov(X1, X3) = −2, and X2 and X3 are independent, find the covariance of Y1 = X1 − 2X2 + 3X3 and Y2 = −2X1 + 3X2 + 4X3.
55. With reference to Exercise 49, find cov(Y, Z).
56. This question has been intentionally omitted for this edition.
57. This question has been intentionally omitted for this edition.
58. This question has been intentionally omitted for this edition.
59. This question has been intentionally omitted for this edition.
60. (a) Show that the conditional distribution function of the continuous random variable X, given a < X ≤ b, is given by
F(x | a < X ≤ b) = 0                               for x ≤ a
                   [F(x) − F(a)] / [F(b) − F(a)]   for a < x ≤ b
                   1                               for x > b
(b) Differentiate the result of part (a) with respect to x to find the conditional probability density of X given a < X ≤ b, and show that
E[u(X) | a < X ≤ b] = ∫_a^b u(x) f(x) dx / ∫_a^b f(x) dx
Special Probability Densities
40. With reference to Exercise 39, show that for normal distributions κ2 = σ² and all other cumulants are zero.
41. Show that if X is a random variable having the Poisson distribution with the parameter λ and λ → ∞, then the moment-generating function of
Z = (X − λ)/√λ
that is, that of a standardized Poisson random variable, approaches the moment-generating function of the stan- dard normal distribution.
42. Show that when α → ∞ and β remains constant, the moment-generating function of a standardized gamma random variable approaches the moment-generating function of the standard normal distribution.
Among multivariate densities, of special importance is the multivariate normal dis- tribution , which is a generalization of the normal distribution in one variable. As it is best (indeed, virtually necessary) to present this distribution in matrix notation, we shall give here only the bivariate case; discussions of the general case are listed among the references at the end of this chapter.
DEFINITION 8. BIVARIATE NORMAL DISTRIBUTION. A pair of random variables X and Y have a bivariate normal distribution and they are referred to as jointly nor- mally distributed random variables if and only if their joint probability density is given by
f(x, y) = 1 / (2πσ1σ2√(1 − ρ²)) · exp{ −1/[2(1 − ρ²)] · [ ((x − μ1)/σ1)² − 2ρ((x − μ1)/σ1)((y − μ2)/σ2) + ((y − μ2)/σ2)² ] }
for −∞ < x < ∞ and −∞ < y < ∞, where σ1 > 0, σ2 > 0, and −1 < ρ < 1.
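To make the formula of Definition 8 concrete, the following sketch evaluates it directly and checks it against scipy's multivariate normal density, whose covariance matrix is built from σ1, σ2, and ρ. The numerical values (μ1 = 2, μ2 = 5, σ1 = 3, σ2 = 6, ρ = 2/3) are borrowed from Exercise 47 below purely as an example.

```python
# Sketch: evaluate the bivariate normal density of Definition 8 and compare it
# with scipy's multivariate normal pdf (covariance built from sigma1, sigma2, rho).
import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2, s1, s2, rho = 2.0, 5.0, 3.0, 6.0, 2 / 3

def f(x, y):
    """Bivariate normal density written exactly as in Definition 8."""
    u = (x - mu1) / s1
    v = (y - mu2) / s2
    q = (u**2 - 2 * rho * u * v + v**2) / (2 * (1 - rho**2))
    return np.exp(-q) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2))

cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
mvn = multivariate_normal(mean=[mu1, mu2], cov=cov)

print(f(1.0, 4.0), mvn.pdf([1.0, 4.0]))   # the two values coincide
```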
To study this joint distribution, let us first show that the parameters μ1, μ2, σ1, and σ2 are, respectively, the means and the standard deviations of the two random variables X and Y. To begin with, we integrate on y from −∞ to ∞, getting
g(x) = [ exp{ −((x − μ1)/σ1)² / [2(1 − ρ²)] } / (2πσ1σ2√(1 − ρ²)) ] · ∫_{−∞}^{∞} exp{ −1/[2(1 − ρ²)] · [ ((y − μ2)/σ2)² − 2ρ((x − μ1)/σ1)((y − μ2)/σ2) ] } dy
for the marginal density of X. Then, temporarily making the substitution u = (x − μ1)/σ1 to simplify the notation and changing the variable of integration by letting v = (y − μ2)/σ2, we obtain
g(x) = [ e^{−u²/[2(1 − ρ²)]} / (2πσ1√(1 − ρ²)) ] · ∫_{−∞}^{∞} e^{ −(v² − 2ρuv)/[2(1 − ρ²)] } dv
After completing the square by letting v² − 2ρuv = (v − ρu)² − ρ²u² and collecting terms, this becomes
g(x) = [ e^{−u²/2} / (σ1√(2π)) ] · [ 1/(√(2π)·√(1 − ρ²)) ] · ∫_{−∞}^{∞} e^{ −(1/2)·((v − ρu)/√(1 − ρ²))² } dv

Since the second factor, together with the integral, is the total area under a normal density with the mean ρu and the variance 1 − ρ², it equals 1, and hence g(x) = e^{−u²/2}/(σ1√(2π)). Substituting back u = (x − μ1)/σ1 shows that the marginal density of X is a normal density with the mean μ1 and the standard deviation σ1, and by symmetry the marginal density of Y is normal with the mean μ2 and the standard deviation σ2. Dividing f(x, y) by g(x) gives the conditional density of Y given X = x.
Then, expressing this result in terms of the original variables, we obtain
w(y | x) = 1 / (σ2√(2π)·√(1 − ρ²)) · exp{ −(1/2)·[ ( y − {μ2 + ρ(σ2/σ1)(x − μ1)} ) / (σ2√(1 − ρ²)) ]² }
for −∞ < y < ∞, and it can be seen by inspection that this is a normal density with the mean μ_{Y|x} = μ2 + ρ(σ2/σ1)(x − μ1) and the variance σ²_{Y|x} = σ2²(1 − ρ²). The corresponding results for the conditional density of X given Y = y follow by symmetry.
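These conditional-moment formulas can be checked by simulation: drawing many (X, Y) pairs from a bivariate normal distribution and keeping only those with X near a fixed value x0, the retained Y values should have mean μ2 + ρ(σ2/σ1)(x0 − μ1) and variance σ2²(1 − ρ²), up to sampling error. The parameters below again match Exercise 47, and the conditioning window of ±0.05 is an arbitrary choice for this sketch.

```python
# Sketch: empirical check of the conditional mean and variance of Y given X = x0
# for jointly normal X and Y.
import numpy as np

rng = np.random.default_rng(seed=2)
mu1, mu2, s1, s2, rho = 2.0, 5.0, 3.0, 6.0, 2 / 3
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
x_samp, y_samp = rng.multivariate_normal([mu1, mu2], cov, size=2_000_000).T

x0 = 1.0
y_cond = y_samp[np.abs(x_samp - x0) < 0.05]      # keep Y values with X near x0

print(y_cond.mean(), mu2 + rho * (s2 / s1) * (x0 - mu1))   # conditional mean
print(y_cond.var(),  s2**2 * (1 - rho**2))                 # conditional variance
```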
The bivariate normal distribution has many important properties, some statisti- cal and some purely mathematical. Among the former, there is the following prop- erty, which the reader will be asked to prove in Exercise 43.
THEOREM 10. If two random variables have a bivariate normal distribution, they are independent if and only if ρ = 0.
In this connection, if ρ = 0, the random variables are said to be uncorrelated. Also, we have shown that for two random variables having a bivariate normal distribution the two marginal densities are normal, but the converse is not necessar- ily true. In other words, the marginal distributions may both be normal without the joint distribution being a bivariate normal distribution. For instance, if the bivariate density of X and Y is given by
f*(x, y) = 2f(x, y)   inside squares 2 and 4 of Figure 10
           0           inside squares 1 and 3 of Figure 10
           f(x, y)     elsewhere
where f ( x , y ) is the value of the bivariate normal density with μ 1 = 0, μ 2 = 0, and ρ = 0 at ( x , y ), it is easy to see that the marginal densities of X and Y are normal even though their joint density is not a bivariate normal distribution.
Figure 10. Sample space for the bivariate density given by f*(x, y).
Figure 11. Bivariate normal surface.
Many interesting properties of the bivariate normal density are obtained by studying the bivariate normal surface , pictured in Figure 11, whose equation is z = f ( x , y ), where f ( x , y ) is the value of the bivariate normal density at ( x , y ). As the reader will be asked to verify in some of the exercises that follow, the bivariate nor- mal surface has a maximum at (μ 1 , μ 2 ), any plane parallel to the z -axis intersects the surface in a curve having the shape of a normal distribution, and any plane parallel to the xy -plane that intersects the surface intersects it in an ellipse called a contour of constant probability density. When ρ = 0 and σ 1 = σ 2 , the contours of constant probability density are circles, and it is customary to refer to the corresponding joint density as a circular normal distribution.
43. To prove Theorem 10, show that if X and Y have a bivariate normal distribution, then (a) their independence implies that ρ = 0; (b) ρ = 0 implies that they are independent.
44. Show that any plane perpendicular to the xy-plane intersects the bivariate normal surface in a curve having the shape of a normal distribution.
45. If the exponent of e of a bivariate normal density is
−(1/102)·[(x + 2)² − 2.8(x + 2)(y − 1) + 4(y − 1)²]
find (a) μ1, μ2, σ1, σ2, and ρ; (b) μ_{Y|x} and σ²_{Y|x}.
46. If the exponent of e of a bivariate normal density is
−(1/54)·(x² + 4y² + 2xy + 2x + 8y + 4)
find σ 1 , σ 2 , and ρ, given that μ 1 = 0 and μ 2 = −1.
47. If X and Y have the bivariate normal distribution with μ1 = 2, μ2 = 5, σ1 = 3, σ2 = 6, and ρ = 2/3, find μ_{Y|1} and σ_{Y|1}.
48. If X and Y have a bivariate normal distribution and U = X + Y and V = X − Y, find an expression for the correlation coefficient of U and V.
49. If X and Y have a bivariate normal distribution, it can be shown that their joint moment-generating function is given by
M_{X,Y}(t1, t2) = E(e^{t1X + t2Y}) = exp{ μ1t1 + μ2t2 + (1/2)(σ1²t1² + 2ρσ1σ2t1t2 + σ2²t2²) }
Verify that (a) the first partial derivative of this function with respect to t1 at t1 = 0 and t2 = 0 is μ1; (b) the second partial derivative with respect to t1 at t1 = 0 and t2 = 0 is σ1² + μ1²; (c) the second partial derivative with respect to t1 and t2 at t1 = 0 and t2 = 0 is ρσ1σ2 + μ1μ2.
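For readers who prefer to verify these derivatives symbolically rather than by hand, the following sympy sketch differentiates the stated moment-generating function and evaluates the results at t1 = t2 = 0.

```python
# Sketch: symbolic verification of parts (a)-(c) of Exercise 49 with sympy.
import sympy as sp

t1, t2, m1, m2, s1, s2, r = sp.symbols('t1 t2 mu1 mu2 sigma1 sigma2 rho')
M = sp.exp(t1 * m1 + t2 * m2
           + sp.Rational(1, 2) * (s1**2 * t1**2 + 2 * r * s1 * s2 * t1 * t2 + s2**2 * t2**2))

at0 = {t1: 0, t2: 0}
print(sp.simplify(sp.diff(M, t1).subs(at0)))       # mu1
print(sp.simplify(sp.diff(M, t1, 2).subs(at0)))    # sigma1**2 + mu1**2
print(sp.simplify(sp.diff(M, t1, t2).subs(at0)))   # rho*sigma1*sigma2 + mu1*mu2
```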
In many of the applications of statistics it is assumed that the data are approximately normally distributed. Thus, it is important to make sure that the assumption of normality is at least approximately satisfied.
Regression and Correlation
The resulting equation is called the regression equation of Y on X. Alternatively, the regression equation of X on Y is given by
μ_{X|y} = E(X | y) = ∫_{−∞}^{∞} x · f(x | y) dx
In the discrete case, when we are dealing with probability distributions instead of probability densities, the integrals in the two regression equations given in Definition 1 are simply replaced by sums. When we do not know the joint probability density or distribution of the two random variables, or at least not all its parameters, the determination of μ Y | x or μ X | y becomes a problem of estimation based on sample data; this is an entirely different problem, which we shall discuss in Sections 3 and 4.
EXAMPLE 1
Given the two random variables X and Y that have the joint density
f(x, y) = x · e^{−x(1+y)}   for x > 0 and y > 0
          0                  elsewhere
find the regression equation of Y on X and sketch the regression curve.
Solution Integrating out y , we find that the marginal density of X is given by
g(x) = e^{−x}   for x > 0
       0         elsewhere
and hence the conditional density of Y given X = x is given by
w(y | x) = f(x, y)/g(x) = x · e^{−x(1+y)} / e^{−x} = x · e^{−xy}
for y > 0 and w(y | x) = 0 elsewhere, which we recognize as an exponential density with θ = 1/x. Hence, by evaluating
μ_{Y|x} = ∫_0^∞ y · x · e^{−xy} dy
or by referring to the corollary of a theorem given here ("The mean and the variance of the exponential distribution are given by μ = θ and σ² = θ²"), we find that the regression equation of Y on X is given by
μ_{Y|x} = 1/x
The corresponding regression curve is shown in Figure 1.
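The same derivation can be reproduced symbolically. The sketch below integrates out y to obtain g(x), forms the conditional density w(y | x), and evaluates the conditional mean, recovering μ_{Y|x} = 1/x.

```python
# Sketch: re-derive the regression curve of Example 1 with sympy by integrating
# the joint density f(x, y) = x*exp(-x*(1 + y)).
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = x * sp.exp(-x * (1 + y))

g = sp.integrate(f, (y, 0, sp.oo))                 # marginal density of X: exp(-x)
w = sp.simplify(f / g)                             # conditional density of Y given x
mu_Y_given_x = sp.integrate(y * w, (y, 0, sp.oo))  # regression equation of Y on X

print(g, w, sp.simplify(mu_Y_given_x))             # exp(-x), x*exp(-x*y), 1/x
```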
Figure 1. Regression curve μ_{Y|x} = 1/x of Example 1.
If X and Y have the multinomial distribution
f(x, y) = [ n! / (x! y! (n − x − y)!) ] · θ1^x θ2^y (1 − θ1 − θ2)^{n−x−y}

for x = 0, 1, 2, ..., n, and y = 0, 1, 2, ..., n, with x + y ≤ n, find the regression equation of Y on X.
Solution The marginal distribution of X is given by
g(x) = Σ_{y=0}^{n−x} [ n! / (x! y! (n − x − y)!) ] · θ1^x θ2^y (1 − θ1 − θ2)^{n−x−y} = (n choose x) θ1^x (1 − θ1)^{n−x}
for x = 0, 1, 2,... , n , which we recognize as a binomial distribution with the parame- ters n and θ 1. Hence,
w(y | x) = f(x, y)/g(x) = (n − x choose y) θ2^y (1 − θ1 − θ2)^{n−x−y} / (1 − θ1)^{n−x}
for y = 0, 1, 2,... , n − x , and, rewriting this formula as
w(y | x) = (n − x choose y) · (θ2/(1 − θ1))^y · ((1 − θ1 − θ2)/(1 − θ1))^{n−x−y}

we recognize it as a binomial distribution with the parameters n − x and θ2/(1 − θ1), so that the regression equation of Y on X is μ_{Y|x} = (n − x) · θ2/(1 − θ1).
Note that the conditional expectation obtained in the preceding example depends on x 1 but not on x 3. This could have been expected, since there is a pairwise independence between X 2 and X 3.
An important feature of Example 2 is that the regression equation is linear; that is, it is of the form
μ Y | x = α + β x
where α and β are constants, called the regression coefficients. There are several reasons why linear regression equations are of special interest: First, they lend them- selves readily to further mathematical treatment; then, they often provide good approximations to otherwise complicated regression equations; and, finally, in the case of the bivariate normal distribution, the regression equations are, in fact, linear. To simplify the study of linear regression equations, let us express the regression coefficients α and β in terms of some of the lower moments of the joint distribution of X and Y , that is, in terms of E ( X ) = μ 1 , E ( Y ) = μ 2 , var( X ) = σ 12 , var( Y ) = σ 22 , and cov( X , Y ) = σ 12. Then, also using the correlation coefficient
ρ = σ12 / (σ1σ2)
we can prove the following results.
THEOREM 1. If the regression of Y on X is linear, then
μ_{Y|x} = μ2 + ρ(σ2/σ1)(x − μ1)
and if the regression of X on Y is linear, then
μ_{X|y} = μ1 + ρ(σ1/σ2)(y − μ2)
Proof Since μ_{Y|x} = α + βx, it follows that

∫ y · w(y | x) dy = α + βx
and if we multiply the expression on both sides of this equation by g(x), the corresponding value of the marginal density of X, and integrate on x, we obtain

∫∫ y · w(y | x) g(x) dy dx = α ∫ g(x) dx + β ∫ x · g(x) dx
or μ 2 = α + βμ 1
since w(y | x) g(x) = f(x, y). If we had multiplied the equation for μ_{Y|x} on both sides by x · g(x) before integrating on x, we would have obtained

∫∫ xy · f(x, y) dy dx = α ∫ x · g(x) dx + β ∫ x² · g(x) dx
or E(XY) = αμ1 + βE(X²)

Solving μ2 = α + βμ1 and E(XY) = αμ1 + βE(X²) for α and β and making use of the fact that E(XY) = σ12 + μ1μ2 and E(X²) = σ1² + μ1², we find that
α = μ2 − (σ12/σ1²) · μ1 = μ2 − ρ(σ2/σ1) · μ1
and
β = σ12/σ1² = ρ(σ2/σ1)
This enables us to write the linear regression equation of Y on X as
μ_{Y|x} = μ2 + ρ(σ2/σ1)(x − μ1)
When the regression of X on Y is linear, similar steps lead to the equation
μ_{X|y} = μ1 + ρ(σ1/σ2)(y − μ2)
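In practice the moments in these formulas are usually replaced by sample moments. The following sketch (the simulated bivariate normal data and the parameter values are arbitrary choices) estimates β = σ12/σ1² and α = μ2 − βμ1 from data and compares them with the values implied by Theorem 1.

```python
# Sketch: estimating the linear-regression coefficients of Theorem 1 from data
# by replacing population moments with sample moments.
import numpy as np

rng = np.random.default_rng(seed=3)
mu1, mu2, s1, s2, rho = 2.0, 5.0, 3.0, 6.0, 2 / 3
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
x, y = rng.multivariate_normal([mu1, mu2], cov, size=1_000_000).T

sigma12 = np.cov(x, y)[0, 1]            # sample covariance of X and Y
beta = sigma12 / np.var(x, ddof=1)      # slope: sigma12/sigma1^2 = rho*sigma2/sigma1
alpha = y.mean() - beta * x.mean()      # intercept: mu2 - beta*mu1

print(alpha, beta)   # compare with mu2 - (4/3)*mu1 = 7/3 and rho*s2/s1 = 4/3
```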
It follows from Theorem 1 that if the regression equation is linear and ρ = 0, then μ_{Y|x} does not depend on x (or μ_{X|y} does not depend on y). When ρ = 0 and hence σ12 = 0, the two random variables X and Y are uncorrelated, and we can say that if two random variables are independent, they are also uncorrelated, but if two random variables are uncorrelated, they are not necessarily independent; the latter is again illustrated in Exercise 9. The correlation coefficient and its estimates are of importance in many statistical investigations, and they will be discussed in some detail in Section 5. At this time, let us again point out that −1 ≤ ρ ≤ +1, as the reader will be asked to prove in Exercise 11, and the sign of ρ tells us directly whether the slope of a regression line is upward or downward.
In the preceding sections we have discussed the problem of regression only in connection with random variables having known joint distributions. In actual practice, there are many problems where a set of paired data gives the indication that the regression is linear, where we do not know the joint distribution of the random variables under consideration but, nevertheless, want to estimate the regression coefficients α and β.