
Topic 16

Interval Estimation

Our strategy for estimation thus far has been to use a method to find an estimator, e.g., method of moments or maximum likelihood, and to evaluate the quality of the estimator by evaluating its bias and variance. Often, we know more about the distribution of the estimator, and this allows us to make a more comprehensive statement about the estimation procedure.

Interval estimation is an alternative to the variety of techniques we have examined. Given data $x$, we replace the point estimate $\hat{\theta}(x)$ for the parameter $\theta$ by a statistic that is a subset $\hat{C}(x)$ of the parameter space. We will consider both the classical and Bayesian approaches to choosing $\hat{C}(x)$. As we shall learn, the two approaches have very different interpretations.

16.1 Classical Statistics

In this case, the random set $\hat{C}(X)$ is chosen to have a prescribed high probability, $\gamma$, of containing the true parameter value $\theta$. In symbols,

$$P_\theta\{\theta \in \hat{C}(X)\} = \gamma.$$

Figure 16.1: Upper tail critical values. $\alpha$ is the area under the standard normal density to the right of the vertical line at the critical value $z_\alpha$.

In this case, the set $\hat{C}(x)$ is called a $\gamma$-level confidence set. In the case of a one dimensional parameter set, the typical choice of confidence set is a confidence interval

$$\hat{C}(x) = (\hat{\theta}_\ell(x), \hat{\theta}_u(x)).$$

Often this interval takes the form

$$\hat{C}(x) = (\hat{\theta}(x) - m(x), \hat{\theta}(x) + m(x)) = \hat{\theta}(x) \pm m(x)$$

where the two statistics,

  • $\hat{\theta}(x)$ is a point estimate, and
  • $m(x)$ is the margin of error.

16.1.1 Means

Example 16.1 (1-sample z interval). If $X_1, X_2, \ldots, X_n$ are normal random variables with unknown mean $\mu$ but known variance $\sigma_0^2$, then

$$Z = \frac{\bar{X} - \mu}{\sigma_0/\sqrt{n}}$$

is a standard normal random variable. For any $\alpha$ between 0 and 1, let $z_\alpha$ satisfy

$$P\{Z > z_\alpha\} = \alpha \quad \text{or equivalently} \quad P\{Z \le z_\alpha\} = 1 - \alpha.$$

The value $\alpha$ is known as the upper tail probability with critical value $z_\alpha$. We can compute this in R using, for example,

qnorm(0.975)
[1] 1.959964

for $\alpha = 0.025$. If $\gamma = 1 - 2\alpha$, then $\alpha = (1 - \gamma)/2$. In this case, we have that

$$P\{-z_\alpha < Z < z_\alpha\} = \gamma.$$

Let $\mu_0$ be the state of nature. Taking in turn each of the two inequalities in the line above and isolating $\mu_0$, we find that

$$\frac{\bar{X} - \mu_0}{\sigma_0/\sqrt{n}} = Z < z_\alpha$$

implies

$$\bar{X} - \mu_0 < z_\alpha \frac{\sigma_0}{\sqrt{n}}, \qquad \bar{X} - z_\alpha \frac{\sigma_0}{\sqrt{n}} < \mu_0.$$

Similarly,

$$\frac{\bar{X} - \mu_0}{\sigma_0/\sqrt{n}} = Z > -z_\alpha$$

implies

$$\mu_0 < \bar{X} + z_\alpha \frac{\sigma_0}{\sqrt{n}}.$$

Thus

$$\bar{X} - z_\alpha \frac{\sigma_0}{\sqrt{n}} < \mu_0 < \bar{X} + z_\alpha \frac{\sigma_0}{\sqrt{n}}$$

has probability $\gamma$. Thus, for data $x$,

$$\bar{x} \pm z_{(1-\gamma)/2} \frac{\sigma_0}{\sqrt{n}}$$

is a confidence interval with confidence level $\gamma$. In this case, $\hat{\mu}(x) = \bar{x}$ is the estimate for the mean and $m(x) = z_{(1-\gamma)/2}\, \sigma_0/\sqrt{n}$ is the margin of error.

We can use the z-interval above as the confidence interval for $\mu$ for data that are not necessarily normally distributed, as long as the central limit theorem applies. For one population tests for means, $n > 30$ and data not strongly skewed is a good rule of thumb.
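As a small illustration, the z-interval can be computed directly in R. Here the data are simulated and $\sigma_0$ is treated as known; both are stand-ins for this sketch rather than data from the text.

set.seed(1)
sigma0 <- 2                                      # known standard deviation (assumed)
x <- rnorm(50, mean = 10, sd = sigma0)           # simulated data, for illustration only
gamma <- 0.95
z <- qnorm(1 - (1 - gamma)/2)                    # z_{(1-gamma)/2}
mean(x) + c(-1, 1) * z * sigma0/sqrt(length(x))  # the z-interval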

Generally, the standard deviation is not known and must be estimated. So, let $X_1, X_2, \ldots, X_n$ be normal random variables with unknown mean and unknown standard deviation. Let $S^2$ be the unbiased sample variance. If we are forced to replace the unknown variance $\sigma^2$ with its unbiased estimate $s^2$, then the statistic is known as $t$:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}.$$

The term $s/\sqrt{n}$, which estimates the standard deviation of the sample mean, is called the standard error. The remarkable discovery by William Gosset is that the distribution of the $t$ statistic can be determined exactly. Write

$$T_{n-1} = \frac{\sqrt{n}(\bar{X} - \mu)}{S}.$$

Then, Gosset was able to establish the following three facts:

Figure 16.3: Upper critical values for the $t$ confidence interval with $\gamma = 0.90$ (black), 0.95 (red), 0.98 (magenta) and 0.99 (blue) as a function of $df$, the number of degrees of freedom. Note that these critical values decrease to the critical value for the $z$ confidence interval and increase with $\gamma$.

Thus, the interval is

$$2.490 \pm 2.0674\, \frac{s}{\sqrt{200}} = 2.490 \pm 0.099 \quad \text{or} \quad (2.391, 2.589).$$

Example 16.3. We can obtain the data for the Michelson-Morley experiment using R by typing

data(morley)

The data have 100 rows: 5 experiments (column 1) of 20 runs (column 2). The speed is in column 3. The values for speed are the amounts over 299,000 km/sec. Thus, a t-confidence interval will have 99 degrees of freedom. We can see a histogram by writing hist(morley$Speed). To determine a 95% confidence interval, we find

mean(morley$Speed)
[1] 852.4
sd(morley$Speed)
[1] 79.01055
qt(0.975,99)
[1] 1.984217

Thus, our confidence interval for the speed of light is

$$299{,}852.4 \pm 1.9842\, \frac{79.0}{\sqrt{100}} = 299{,}852.4 \pm 15.7 \quad \text{or the interval} \quad (299{,}836.7,\ 299{,}868.1).$$

This confidence interval does not include the presently determined value of 299,792.458 km/sec for the speed of light. The confidence interval can also be found by typing t.test(morley$Speed). We will study this command in more detail when we describe the t-test.
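The same interval can be assembled in a few lines, or read from the output of t.test; both use only commands that appear in these notes.

data(morley)
xbar <- mean(morley$Speed)
m <- qt(0.975, 99) * sd(morley$Speed)/sqrt(100)  # margin of error
299000 + xbar + c(-1, 1) * m                     # interval in km/sec
t.test(morley$Speed)                             # same interval, without the 299,000 offset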

Figure 16.4: Histogram of the measurements of the speed of light. Actual values are 299,000 kilometers per second plus the value shown.

Exercise 16.4. Give a 90% and a 98% confidence interval for the example above.

We often wish to determine a sample size that will guarantee a desired margin of error. For a $\gamma$-level t-interval, this is

$$m = t_{n-1,(1-\gamma)/2}\, \frac{s}{\sqrt{n}}.$$

Solving this for $n$ yields

$$n = \left( \frac{t_{n-1,(1-\gamma)/2}\, s}{m} \right)^2.$$

Because the number of degrees of freedom, $n-1$, for the $t$ distribution is unknown, the quantity $n$ appears on both sides of the equation, and the value of $s$ is unknown. We search for a conservative value for $n$, i.e., a margin of error that will be no greater than the desired length. This can be achieved by overestimating $t_{n-1,(1-\gamma)/2}$ and $s$. For the speed of light example above, if we desire a margin of error of $m = 10$ km/sec for a 95% confidence interval, then we set $t_{n-1,(1-\gamma)/2} = 2$ and $s = 80$ to obtain

$$n \approx \left( \frac{2 \cdot 80}{10} \right)^2 = 256$$

measurements necessary to obtain the desired margin of error.
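Setting $t = 2$ avoids the circularity of $n$ appearing on both sides. As a sketch of the iterative alternative, with the same $s = 80$ and $m = 10$, we can start from the $z$ critical value and iterate; it settles on a slightly less conservative answer.

s <- 80; m <- 10
n <- (qnorm(0.975) * s/m)^2                  # start from the z critical value
for (i in 1:10)                              # iterate n = (t_{n-1,0.975} s/m)^2
  n <- (qt(0.975, ceiling(n) - 1) * s/m)^2
ceiling(n)                                   # about 249, versus the conservative 256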

The next set of confidence intervals are determined, in the case in which the distributional variance is known, by finding the standardized score and using the normal approximation as given via the central limit theorem. In the cases in which the variance is unknown, we replace the distributional variance with a variance estimated from the observations. In this case, the procedure that is analogous to the standardized score is called the studentized score.

Example 16.5 (matched pair t interval). We begin with two quantitative measurements

$$(X_{1,1}, \ldots, X_{1,n}) \quad \text{and} \quad (X_{2,1}, \ldots, X_{2,n}),$$

on the same $n$ individuals. Assume that the first set of measurements has mean $\mu_1$ and the second set has mean $\mu_2$.

no longer has a $t$-distribution. Welch and Satterthwaite have provided an approximation to the $t$ distribution with effective degrees of freedom given by the Welch-Satterthwaite equation

$$\nu = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{s_1^4}{n_1^2(n_1-1)} + \dfrac{s_2^4}{n_2^2(n_2-1)}}.$$

This gives a $\gamma$-level confidence interval

$$\bar{x}_1 - \bar{x}_2 \pm t_{\nu,(1-\gamma)/2} \sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }.$$

For two sample tests, the number of observations per group may need to be at least 40 for a good approximation to the normal distribution.

Exercise 16.8. Show that the effective degrees of freedom is between the worst case of the minimum choice from a one sample t-interval and the best case of equal variances:

$$\min\{n_1, n_2\} - 1 \le \nu \le n_1 + n_2 - 2.$$

For data on the life span in days of 88 wildtype and 99 transgenic mosquitoes, we have the summary

              observations    mean    standard deviation
  wildtype         88        20.784         12.99
  transgenic       99        16.546         10.78

Using the conservative 95% confidence interval based on $\min\{n_1, n_2\} - 1 = 87$ degrees of freedom, we use

qt(0.975,87)
[1] 1.987608

to obtain the interval

$$(20.78 - 16.55) \pm 1.9876 \sqrt{ \frac{12.99^2}{88} + \frac{10.78^2}{99} } = 4.23 \pm 3.49 \quad \text{or} \quad (0.74,\ 7.72).$$

Using the Welch-Satterthwaite equation, we obtain $\nu = 169.665$. The increase in the number of degrees of freedom gives a slightly narrower interval $(0.768, 7.710)$.
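The computation above is easy to script. The following sketch wraps the Welch-Satterthwaite equation in a function (the function name is ours, not standard R) and reproduces the interval from the summary statistics.

welch.ci <- function(xbar1, s1, n1, xbar2, s2, n2, gamma = 0.95) {
  se2 <- s1^2/n1 + s2^2/n2                       # squared standard error of the difference
  nu  <- se2^2 / (s1^4/(n1^2*(n1-1)) + s2^4/(n2^2*(n2-1)))  # effective df
  m   <- qt(1 - (1 - gamma)/2, nu) * sqrt(se2)   # margin of error
  c(df = nu, lower = xbar1 - xbar2 - m, upper = xbar1 - xbar2 + m)
}
welch.ci(20.784, 12.99, 88, 16.546, 10.78, 99)   # df = 169.665, interval (0.768, 7.710)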

16.1.2 Linear Regression

For ordinary linear regression, we have given least squares estimates for the slope $\beta$ and the intercept $\alpha$. For data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, our model is

$$y_i = \alpha + \beta x_i + \epsilon_i$$

where the $\epsilon_i$ are independent $N(0, \sigma^2)$ random variables. Recall that the estimator for the slope,

$$\hat{\beta}(x, y) = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)},$$

is unbiased.

Exercise 16.9. Show that the variance of $\hat{\beta}$ equals $\sigma^2/((n-1)\mathrm{var}(x))$.

If $\sigma$ is known, this suggests a z-interval for a $\gamma$-level confidence interval

$$\hat{\beta} \pm z_{(1-\gamma)/2}\, \frac{\sigma}{s_x \sqrt{n-1}}.$$

Generally, $\sigma$ is unknown. However, the variance of the residuals,

$$s_u^2 = \frac{1}{n-2} \sum_{i=1}^n (y_i - (\hat{\alpha} + \hat{\beta} x_i))^2, \qquad (16.2)$$

is an unbiased estimator of $\sigma^2$, and the studentized score $(\hat{\beta} - \beta)\big/\big(s_u/(s_x\sqrt{n-1})\big)$ has a $t$ distribution with $n-2$ degrees of freedom. This gives the t-interval

$$\hat{\beta} \pm t_{n-2,(1-\gamma)/2}\, \frac{s_u}{s_x \sqrt{n-1}}.$$

As the formula shows, the margin of error is proportional to the standard deviation of the residuals and inversely proportional to the standard deviation of the $x$ measurements. Thus, we can reduce the margin of error by taking a broader set of values for the explanatory variable. For the data on the humerus and femur of the five specimens of Archeopteryx, we have $\hat{\beta} = 1.197$, $s_u = 1.982$, $s_x = 13.2$, and $t_{3,0.025} = 3.1824$. Thus, the 95% confidence interval is $1.197 \pm 0.239$ or $(0.958, 1.436)$.
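In R, lm followed by confint produces this slope t-interval automatically. Since the Archeopteryx measurements are not reproduced in this excerpt, the sketch below uses made-up data.

set.seed(2)
x <- c(41, 54, 59, 64, 72)          # hypothetical explanatory values
y <- 1.2*x - 3 + rnorm(5, sd = 2)   # hypothetical responses
fit <- lm(y ~ x)                    # least squares fit
confint(fit, "x", level = 0.95)     # t-interval for the slope, df = n - 2 = 3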

16.1.3 Sample Proportions

Example 16.10 (proportions). For $n$ Bernoulli trials with success parameter $p$, the sample proportion $\hat{p}$ has

$$\text{mean } p \quad \text{and variance } \frac{p(1-p)}{n}.$$

The parameter $p$ appears in both the mean and in the variance. Thus, we need to make a choice $\tilde{p}$ to replace $p$ in the confidence interval

$$\hat{p} \pm z_{(1-\gamma)/2} \sqrt{ \frac{\tilde{p}(1-\tilde{p})}{n} }.$$

One simple choice for $\tilde{p}$ is $\hat{p}$. Based on extensive numerical experimentation, one more recent popular choice is

$$\tilde{p} = \frac{x + 2}{n + 4}$$

where $x$ is the number of successes. For population proportions, we ask that the mean number of successes $np$ and the mean number of failures $n(1-p)$ each be at least 10. We have this requirement so that a normal random variable is a good approximation to the appropriate binomial random variable.

Example 16.11. For Mendel's data, the F2 generation consisted of 428 with the dominant allele, green pods, and 152 with the recessive allele, yellow pods. Thus, the sample proportion of green pod alleles is

$$\hat{p} = \frac{428}{580} = 0.7379.$$

The confidence interval, using

$$\tilde{p} = \frac{428 + 2}{580 + 4} = 0.7363,$$

is

$$0.7379 \pm z_{(1-\gamma)/2} \sqrt{ \frac{0.7363 \cdot 0.2637}{580} } = 0.7379 \pm 0.0183\, z_{(1-\gamma)/2}.$$

For $\gamma = 0.98$, $z_{0.01} = 2.326$ and the confidence interval is $0.7379 \pm 0.0426 = (0.6953, 0.7805)$. Note that this interval contains the predicted value of $p = 3/4$.
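In R, this interval takes only a few lines; the numbers are those of the Mendel example.

x <- 428; n <- 580
ptilde <- (x + 2)/(n + 4)        # the adjusted choice for p
x/n + c(-1, 1) * qnorm(0.99) * sqrt(ptilde*(1 - ptilde)/n)  # 98% interval: (0.6953, 0.7805)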

The first confidence interval for $\mu_1 - \mu_2$ is the two-sample t procedure. If we can assume that the two samples have a common standard deviation, then we pool the data to compute $s_p$, the pooled standard deviation (see the sketch below). Matched pair procedures use a one sample procedure on the difference in the observed values.

For these tests, we need a sample size large enough so that the central limit theorem is a sufficiently good approximation. For one population tests for means, $n > 30$ and data not strongly skewed is a good rule of thumb. For two population tests, $n > 40$ may be necessary. For population proportions, we ask that the mean number of successes $np$ and the mean number of failures $n(1-p)$ each be at least 10. For the standard error for $\hat{\beta}$ in linear regression, $s_u$ is defined in (16.2) and $s_x$ is the standard deviation of the values of the explanatory variable.
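For reference, a sketch of the pooled computation, using the standard degrees-of-freedom-weighted formula (which this excerpt does not display explicitly):

pooled.sd <- function(s1, n1, s2, n2)    # standard pooling formula (assumed)
  sqrt(((n1 - 1)*s1^2 + (n2 - 1)*s2^2)/(n1 + n2 - 2))
pooled.sd(12.99, 88, 10.78, 99)          # e.g., the mosquito summary values, about 11.9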

16.1.5 Interpretation of the Confidence Interval

The confidence interval for a parameter $\theta$ is based on two statistics: $\hat{\theta}_\ell(x)$, the lower end of the confidence interval, and $\hat{\theta}_u(x)$, the upper end of the confidence interval. As with all statistics, these two statistics cannot be based on the value of the parameter. In addition, these two statistics are determined in advance of having the actual data.

The term confidence can be related to the production of confidence intervals. We can think of the situation in which we produce independent confidence intervals repeatedly. Each time, we may either succeed or fail to include the true parameter in the confidence interval. In other words, the inclusion of the parameter value in the confidence interval is a Bernoulli trial with success probability $\gamma$.

For example, after having seen the 100 intervals in Figure 16.5, we can conclude that the lowest and highest intervals are much less likely than 95% to contain the true parameter value. This phenomenon can be seen in the presidential polls for the 2012 election. Three days before the election we see the following spreads between Mr. Obama and Mr. Romney:

0% -1% 0% 1% 5% 0% -5% -1% 1% 1%

with the 95% confidence interval having a margin of error of approximately 3% based on a sample of size approximately 1000. Because these values are highly dependent, the values of ±5% are less likely to contain the true spread.

Exercise 16.15. Perform the computations needed to determine the margin of error in the example above.

The following example, although never likely to be used in an actual problem, may shed some insight into the difference between confidence and probability.

Example 16.16. Let $X_1$ and $X_2$ be two independent observations from a uniform distribution on the interval $[\theta - 1, \theta + 1]$ where $\theta$ is an unknown parameter. In this case, an observation is greater than $\theta$ with probability 1/2, and less than $\theta$ with probability 1/2. Thus,

  • with probability 1/4, both observations are above $\theta$,
  • with probability 1/4, both observations are below $\theta$, and
  • with probability 1/2, one observation is below $\theta$ and the other is above.

In the third case alone, the confidence interval contains the parameter value. As a consequence of these considerations, the interval

$$(\hat{\theta}_\ell(X_1, X_2), \hat{\theta}_u(X_1, X_2)) = (\min\{X_1, X_2\}, \max\{X_1, X_2\})$$

is a 50% confidence interval for the parameter. Sometimes, $\max\{X_1, X_2\} - \min\{X_1, X_2\} > 1$. Because any subinterval of the interval $[\theta - 1, \theta + 1]$ that has length at least 1 must contain $\theta$, the midpoint of the interval, this confidence interval must contain the parameter value. In other words, sometimes the 50% confidence interval is certain to contain the parameter.
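A quick simulation sketch makes both the 50% coverage and the length claim of Exercise 16.17 below plausible; the choice $\theta = 0$ is arbitrary.

set.seed(3)
theta <- 0                                          # arbitrary true parameter
x1 <- runif(10000, theta - 1, theta + 1)
x2 <- runif(10000, theta - 1, theta + 1)
mean(pmin(x1, x2) < theta & theta < pmax(x1, x2))   # close to 1/2
mean(abs(x1 - x2) > 1)                              # close to 1/4; see Exercise 16.17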

Exercise 16.17. For the example above, show that

$$P\{\text{confidence interval has length} > 1\} = 1/4.$$

Hint: Draw the square $[\theta - 1, \theta + 1] \times [\theta - 1, \theta + 1]$ and shade the region in which the confidence interval has length greater than 1.

Figure 16.5: One hundred confidence intervals built from repeatedly simulating 100 standard normal random variables and constructing 95% confidence intervals for the mean value, 0. Note that the 24th interval is entirely below 0 and so does not contain the actual parameter. The 11th, 80th and 91st intervals are entirely above 0 and again do not contain the parameter.

16.1.6 Extensions on the Use of Confidence Intervals

Example 16.18 (delta method). For estimating the distribution mean $\mu$ by the sample mean $\bar{X}$, the delta method provides an alternative for the example above. In this case, the standard deviation of $g(\bar{X})$ is approximately

$$\frac{|g'(\mu)|\, \sigma}{\sqrt{n}}.$$

We replace $\mu$ with $\bar{X}$ to obtain the confidence interval for $g(\mu)$:

$$g(\bar{X}) \pm z_{\alpha/2}\, \frac{|g'(\bar{X})|\, \sigma}{\sqrt{n}}.$$
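As a sketch of this recipe, take $g(\mu) = \mu^2$ with simulated data, and let the sample standard deviation stand in for $\sigma$; everything here is an illustrative assumption rather than the example from the text.

set.seed(4)
x <- rnorm(100, mean = 5, sd = 2)     # simulated measurements
g  <- function(m) m^2
gp <- function(m) 2*m                 # the derivative g'
m.err <- qnorm(0.975) * abs(gp(mean(x))) * sd(x)/sqrt(length(x))
g(mean(x)) + c(-1, 1) * m.err         # 95% delta-method interval for g(mu)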

Using the notation for the example of estimating $\alpha^3$, the coefficient of volume expansion is based on independent length measurements $Y_1, Y_2, \ldots, Y_n$ measured at temperature $T_1$ of an object having length $\ell_0$ at temperature $T_0$.

We can extend the Welch and Satterthwaite method to include the delta method to create a t-interval with effective degrees of freedom

$$\nu = \frac{\left( \left(\dfrac{\partial g(\bar{x},\bar{y})}{\partial x}\right)^2 \dfrac{s_1^2}{n_1} + \left(\dfrac{\partial g(\bar{x},\bar{y})}{\partial y}\right)^2 \dfrac{s_2^2}{n_2} \right)^2}{\left(\dfrac{\partial g(\bar{x},\bar{y})}{\partial x}\right)^4 \dfrac{s_1^4}{n_1^2(n_1-1)} + \left(\dfrac{\partial g(\bar{x},\bar{y})}{\partial y}\right)^4 \dfrac{s_2^4}{n_2^2(n_2-1)}}.$$

We compute to find that $\nu = 19.4$ and then use the t-interval

$$\hat{\theta} \pm t_{\nu,(1-\gamma)/2}\, s_{\hat{\theta}}.$$

For a 95% confidence level, this is the slightly larger interval $0.4634 \pm 0.0063 = (0.4571, 0.4697)$ radians, or $(26.19^\circ, 26.91^\circ)$.

Example 16.20 (maximum likelihood estimation). The Fisher information is the main tool used to set a confidence interval for a maximum likelihood estimator. Two choices are typical. First, we can use the Fisher information $I_n(\theta)$. Of course, the estimator is for the unknown value of the parameter $\theta$. This gives a confidence interval

$$\hat{\theta} \pm z_{\alpha/2}\, \frac{1}{\sqrt{n I(\hat{\theta})}}.$$

More recently, the more popular method is to use the observed information based on the observations $x = (x_1, x_2, \ldots, x_n)$:

$$J(\hat{\theta}) = -\frac{\partial^2}{\partial \theta^2} \log L(\hat{\theta}|x) = -\sum_{i=1}^n \frac{\partial^2}{\partial \theta^2} \log f_X(x_i|\hat{\theta}).$$

This is the negative of the second derivative of the log-likelihood (the derivative of the score function) evaluated at the maximum likelihood estimator. Then, the confidence interval is

$$\hat{\theta} \pm z_{\alpha/2}\, \frac{1}{\sqrt{J(\hat{\theta})}}.$$

Note that $E_\theta J(\theta) = nI(\theta)$, the Fisher information for $n$ observations. Thus, by the law of large numbers,

$$\frac{1}{n} J(\theta) \to I(\theta) \quad \text{as } n \to \infty.$$

If the estimator is consistent and $I$ is continuous at $\theta$, then

$$\frac{1}{n} J(\hat{\theta}) \to I(\theta) \quad \text{as } n \to \infty.$$
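A sketch of the observed-information interval, with an exponential model and simulated data chosen purely for illustration; optim returns the Hessian of the negative log-likelihood at the minimizer, which is exactly $J(\hat{\theta})$ here.

set.seed(5)
x <- rexp(200, rate = 2)                         # simulated data, true rate theta = 2
negll <- function(theta) -sum(dexp(x, rate = theta, log = TRUE))
fit <- optim(1, negll, method = "Brent", lower = 0.01, upper = 10, hessian = TRUE)
thetahat <- fit$par                              # maximum likelihood estimate
J <- fit$hessian[1, 1]                           # observed information J(thetahat)
thetahat + c(-1, 1) * qnorm(0.975)/sqrt(J)       # 95% confidence interval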

16.2 The Bootstrap

The confidence regions have been determined using aspects of the distribution of the data. In particular, these regions have often been specified by appealing to the central limit theorem and normal approximations. The notion behind bootstrap techniques begins with the concession that the information about the source of the data is insufficient to perform the analysis needed to produce the necessary description of the distribution of the estimator. This is particularly true for small data sets or highly skewed data. The strategy is to take the data and treat it as if it were the distribution underlying the data, and to use a resampling protocol to describe the estimator.

For the example above, we estimated the angle in a right triangle by estimating $\ell$ and $h$, the lengths of two adjacent sides, by taking the mean of our measurements and then computing the arctangent of the ratio of these means. Using the delta method, our confidence interval was based on a normal approximation of the estimator. The bootstrap takes another approach. We take the data

$$x_1, x_2, \ldots, x_{n_1}, \quad y_1, y_2, \ldots, y_{n_2},$$

Figure 16.6: Bootstrap distribution of $\hat{\theta}$.

the empirical distribution of the measurements of $\ell$ and $h$, and act as if it were the actual distribution. The next step is to use the data to randomly create the results of the experiment many times over. In the example, we choose, with replacement, $n_1$ measurements from the $x$ data and $n_2$ measurements from the $y$ data. We then compute the bootstrap means $\bar{x}_b$ and $\bar{y}_b$ and the estimate

$$\hat{\theta}(\bar{x}_b, \bar{y}_b) = \tan^{-1}\left( \frac{\bar{y}_b}{\bar{x}_b} \right).$$

Repeating this many times gives an empirical distribution for the estimator $\hat{\theta}$. This can be accomplished in just a couple lines of R code.

angle <- rep(0, 10000)                       # storage for the bootstrap estimates
for (i in 1:10000){
  xb <- sample(x, length(x), replace = TRUE) # resample the x data with replacement
  yb <- sample(y, length(y), replace = TRUE) # resample the y data with replacement
  angle[i] <- atan(mean(yb)/mean(xb))*180/pi # bootstrap estimate of the angle, in degrees
}
hist(angle)

We can use this bootstrap distribution of $\hat{\theta}$ to construct a confidence interval.

q <- c(0.005, 0.01, 0.025, 0.5, 0.975, 0.99, 0.995)
quantile(angle, q)
    0.5%       1%     2.5%      50%    97.5%      99%    99.5%
26.09837 26.14807 26.21860 26.55387 26.86203 26.91486 26.

A 95% confidence interval, $(26.21^\circ, 26.86^\circ)$, can be constructed using the 2.5th percentile as the lower end point and the 97.5th percentile as the upper end point. This confidence interval is very similar to the one obtained using the delta method.

Exercise 16.21. Give the 98% bootstrap confidence interval for the angle in the example above.

This gives a 95% credible interval of $(0.4650, 0.8203)$. This is indicated in the figure above by the two vertical lines. Thus, the area under the density function from the vertical lines outward totals 5%. The narrowest credible interval is $(0.4737, 0.8276)$. At these values, the density equals 0.695. The density is lower for more extreme values and higher between these values. The beta distribution has a probability 0.0306 below the lower value for the credible interval and 0.0194 above the upper value, satisfying the criterion (16.4) with $\gamma = 0.95$.

Example 16.23. For the example having both a normal prior distribution and normal data, we find that we also have a normal posterior distribution. In particular, if the prior is normal with mean $\theta_0$ and variance $1/\lambda$, and our data have sample mean $\bar{x}$ with each observation having variance 1, then the posterior distribution has mean

$$\theta_1(x) = \frac{\lambda}{\lambda + n}\, \theta_0 + \frac{n}{\lambda + n}\, \bar{x}$$

and variance $1/(n + \lambda)$. Thus the credible interval is

$$\theta_1(x) \pm z_{\alpha/2}\, \frac{1}{\sqrt{\lambda + n}}.$$
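A sketch of this computation, with prior and data values chosen only for illustration:

theta0 <- 0; lambda <- 1     # prior mean and precision (assumed values)
xbar <- 1.5; n <- 20         # sample mean and size, unit-variance data (assumed values)
post.mean <- lambda/(lambda + n)*theta0 + n/(lambda + n)*xbar
post.mean + c(-1, 1) * qnorm(0.975)/sqrt(lambda + n)   # 95% credible interval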

16.4 Answers to Selected Exercises

16.4. Using R to find upper tail probabilities, we find that

qt(0.95,99)
[1] 1.660391
qt(0.99,99)
[1] 2.364606

For the 90% confidence interval,

$$299{,}852.4 \pm 1.6604\, \frac{79.0}{\sqrt{100}} = 299{,}852.4 \pm 13.1 \quad \text{or the interval} \quad (299{,}839.3,\ 299{,}865.5).$$

For the 98% confidence interval,

$$299{,}852.4 \pm 2.3646\, \frac{79.0}{\sqrt{100}} = 299{,}852.4 \pm 18.7 \quad \text{or the interval} \quad (299{,}833.7,\ 299{,}871.1).$$

16.8. Let

$$c = \frac{s_2^2/n_2}{s_1^2/n_1}, \quad \text{so that} \quad \frac{s_2^2}{n_2} = c\, \frac{s_1^2}{n_1}.$$

Then, substitute for $s_2^2/n_2$ and divide by $s_1^4/n_1^2$ to obtain

$$\nu = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{s_1^4}{n_1^2(n_1-1)} + \dfrac{s_2^4}{n_2^2(n_2-1)}} = \frac{\left( \dfrac{s_1^2}{n_1} + c\,\dfrac{s_1^2}{n_1} \right)^2}{\dfrac{s_1^4}{n_1^2(n_1-1)} + \dfrac{c^2 s_1^4}{n_1^2(n_2-1)}} = \frac{(1 + c)^2}{\dfrac{1}{n_1-1} + \dfrac{c^2}{n_2-1}} = \frac{(n_1-1)(n_2-1)(1+c)^2}{(n_2-1) + (n_1-1)c^2}.$$

Take a derivative to see that

$$\frac{d\nu}{dc} = (n_1-1)(n_2-1)\, \frac{((n_2-1) + (n_1-1)c^2) \cdot 2(1+c) - (1+c)^2 \cdot 2(n_1-1)c}{((n_2-1) + (n_1-1)c^2)^2} = 2(n_1-1)(n_2-1)(1+c)\, \frac{(n_2-1) - (n_1-1)c}{((n_2-1) + (n_1-1)c^2)^2}.$$

So the maximum takes place at $c = (n_2-1)/(n_1-1)$, with value of $\nu$

$$\frac{(n_1-1)(n_2-1)(1 + (n_2-1)/(n_1-1))^2}{(n_2-1) + (n_1-1)((n_2-1)/(n_1-1))^2} = \frac{(n_1-1)(n_2-1)((n_1-1) + (n_2-1))^2}{(n_1-1)^2(n_2-1) + (n_1-1)(n_2-1)^2} = \frac{((n_1-1) + (n_2-1))^2}{(n_1-1) + (n_2-1)} = n_1 + n_2 - 2.$$

Note that for this value of $c$,

$$\frac{s_1^2}{s_2^2} = \frac{n_1(n_1-1)}{n_2(n_2-1)}$$

and the variances are nearly equal. Notice that this is a global maximum, with

$$\nu \to n_1 - 1 \text{ as } c \to 0 \text{ (i.e., } s_2 \ll s_1\text{)} \quad \text{and} \quad \nu \to n_2 - 1 \text{ as } c \to \infty \text{ (i.e., } s_1 \ll s_2\text{)}.$$

The smaller of these two limits is the global minimum.

16.9. Recall that $\hat{\beta}$ is an unbiased estimator for $\beta$, thus $E_{(\alpha,\beta)}\hat{\beta} = \beta$, and $E_{(\alpha,\beta)}[(\hat{\beta} - \beta)^2]$ is the variance of $\hat{\beta}$. Write

$$\hat{\beta}(x, y) - \beta = \frac{1}{(n-1)\mathrm{var}(x)} \left( \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) - \beta \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x}) \right) = \frac{1}{(n-1)\mathrm{var}(x)} \sum_{i=1}^n (x_i - \bar{x})\big((y_i - \beta x_i) - (\bar{y} - \beta\bar{x})\big)$$

$$= \frac{1}{(n-1)\mathrm{var}(x)} \left( \sum_{i=1}^n (x_i - \bar{x})(y_i - \beta x_i) - \sum_{i=1}^n (x_i - \bar{x})(\bar{y} - \beta\bar{x}) \right).$$

The second sum is 0. For the first, we use the fact that $y_i - \beta x_i = \alpha + \epsilon_i$. Thus,

$$\mathrm{Var}_{(\alpha,\beta)}(\hat{\beta}) = \mathrm{Var}_{(\alpha,\beta)}\left( \frac{1}{(n-1)\mathrm{var}(x)} \sum_{i=1}^n (x_i - \bar{x})(\alpha + \epsilon_i) \right) = \frac{1}{(n-1)^2 \mathrm{var}(x)^2} \sum_{i=1}^n (x_i - \bar{x})^2\, \mathrm{Var}_{(\alpha,\beta)}(\alpha + \epsilon_i)$$

$$= \frac{1}{(n-1)^2 \mathrm{var}(x)^2} \sum_{i=1}^n (x_i - \bar{x})^2\, \sigma^2 = \frac{\sigma^2}{(n-1)\mathrm{var}(x)}.$$

Because the $\epsilon_i$ are independent, we can use the Pythagorean identity that the variance of the sum is the sum of the variances.

16.14. The confidence interval for the proportion of yellow pod genes, $1 - p$, is $(0.2195, 0.3047)$. The proportion of yellow pod phenotype is $(1-p)^2$, and a 95% confidence interval has as its endpoints the squares of these numbers: $(0.0482, 0.0928)$.

16.15. The critical value is $z_{0.025} = 1.96$. For $\hat{p} = 0.468$ and $n = 1500$, the number of successes is $x = 702$. The margin of error is

$$z_{0.025} \sqrt{ \frac{\hat{p}(1-\hat{p})}{n} } = 1.96 \sqrt{ \frac{0.468 \cdot 0.532}{1500} } = 0.0253.$$