








In Chapter 1 we learned about i.i.d. trials. In this chapter, we study a very important special case of these, namely Bernoulli trials (BT). If each trial has exactly two possible outcomes, then we have BT. B/c this is so important, I will be a bit redundant and explicitly present the assumptions of BT.
The Assumptions of Bernoulli Trials. There are three:

1. Each trial has exactly two possible outcomes, called success (S) and failure (F).
2. The probability of a success, denoted by p, is the same for every trial; the probability of a failure is q = 1 − p.
3. The trials are independent.
When we are doing arithmetic, it will be convenient to represent S by the number 1 and F by the number 0. One reason that BT are so important is that if we have BT, we can calculate probabilities of a great many events. Our first tool for calculation, of course, is the multiplication rule that we learned in Chapter 1. For example, suppose that we have n = 5 BT with p = 0.70. The probability that the BT yield four successes followed by a failure is:
P(SSSSF) = ppppq = (0.70)^4 (0.30) = 0.0720.
Our next tool is extremely powerful and very useful in science. It is the binomial probability distribution. Suppose that we plan to perform/observe n BT. Let X denote the total number of successes in the n trials. The probability distribution of X is given by the following equation.
P(X = x) = [n! / (x!(n − x)!)] p^x q^(n−x), for x = 0, 1, ..., n.   (2.1)
To use this formula, recall that n! is read ‘n-factorial’ and is computed as follows.
1! = 1; 2! = 2(1) = 2; 3! = 3(2)(1) = 6; 4! = 4(3)(2)(1) = 24;
and so on. By special definition, 0! = 1. (Note to the extremely interested reader: if you want to see why Equation 2.1 is correct, go to the link to the Revised Student Study Guide on my webpage and read the More Mathematics sections of Chapters 2, 3 and 5.) I will do an extended example to illustrate the use of Equation 2.1. Suppose that n = 5 and p = 0.60. I will obtain the probability distribution for X.
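Here is a minimal sketch, in Python, of evaluating Equation 2.1 for this Bin(5, 0.60) example; the helper name binomial_pmf is my own choice, not part of the notes.

```python
# A sketch of Equation 2.1 applied to the Bin(5, 0.60) example.
from math import comb

def binomial_pmf(n, p, x):
    """P(X = x) = n!/(x!(n - x)!) * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 5, 0.60
for x in range(n + 1):
    print(f"P(X = {x}) = {binomial_pmf(n, p, x):.4f}")

# Expected output:
# P(X = 0) = 0.0102, P(X = 1) = 0.0768, P(X = 2) = 0.2304,
# P(X = 3) = 0.3456, P(X = 4) = 0.2592, P(X = 5) = 0.0778
```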
You should check the above computations to make sure you are comfortable using Equation 2.1. Here are some guidelines for this class. If n ≤ 8, you should be able to evaluate Equation 2.1 'by hand' as I have done above for n = 5. For n ≥ 9, I recommend using a statistical software package on a computer or the website I describe later. For example, the probability distribution for X for n = 25 and p = 0.50 is presented in Table 2.1.

Equation 2.1 is called the binomial probability distribution with parameters n and p; it is denoted by Bin(n, p). With this notation, we see that my earlier 'by hand' effort was the Bin(5,0.60) and Table 2.1 is the Bin(25,0.50).

Sadly, life is a bit more complicated than the above. In particular, a statistical software package should not be considered a panacea for the binomial. For example, if I direct my computer to calculate the Bin(n,0.50) for any n ≥ 1023, I get an error message; the computer program is smart enough to realize that it has messed up and that its answer would be wrong; hence, it does not give me an answer. Why does this happen? Well, consider the computation of P(X = 1000) for the Bin(2000,0.50). This involves a really huge number (2000!) divided by the square of a really huge number (1000!), and then multiplied by a really, really small positive number ((0.50)^2000). Unless the computer programmer exhibits incredible care in writing the code, the result will be an overflow or an underflow or both.
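Here is a minimal sketch of the kind of care the notes have in mind: instead of forming 2000!, 1000! and (0.50)^2000 directly, one can work on the logarithmic scale so the huge and tiny factors cancel before exponentiating. The helper log_binom_pmf is my own illustration, not the code of any particular package.

```python
# A sketch of avoiding overflow/underflow by working with logarithms.
from math import lgamma, exp, log

def log_binom_pmf(n, p, x):
    """log P(X = x) for Bin(n, p), using log-gamma: log n! = lgamma(n + 1)."""
    log_coef = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return log_coef + x * log(p) + (n - x) * log(1 - p)

# P(X = 1000) for the Bin(2000, 0.50): the huge 2000! and the tiny (0.50)^2000
# cancel safely in log space before we exponentiate.
print(exp(log_binom_pmf(2000, 0.50, 1000)))   # about 0.0178
```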
Figure 2.4: The Bin(50, 0.1) Distribution.
Before we condemn the programmer for carelessness or bemoan the limits of the human mind, note the following. We do not need to evaluate Equation 2.1 for large n's b/c there is a very easy way to obtain a good approximation to the exact answer. Figures 2.1–2.4 present probability histograms for several binomial probability distributions. Here is how they are drawn. The method I am going to give you works only if the possible values of the random variable are equally spaced on the number line, with spacing δ between consecutive values. This definition can be modified for other situations, but we won't need to do this.
For a probability histogram (PH) the area of a rectangle equals the probability of its center value, b/c
Area = Base × Height = δ × [P(X = x) / δ] = P(X = x).
A PH allows us to 'see' a probability distribution. For example, for our pictures we can see that the binomial is symmetric for p = 0.50 and not symmetric for p ≠ 0.50. This is not an accident of the particular examples chosen; the Bin(n, p) is symmetric if, and only if, p = 0.50.
Remember π, the famous number from math which is the area of a circle with radius equal to 1. Another famous number from math is e, which is the limit as n goes to infinity of (1 + 1/n)^n. As decimals, π = 3.1416 and e = 2.7183, both approximations. If you want to learn more about π or e, go to Wikipedia. Let μ denote any real number: positive, zero or negative. Let σ denote any positive real number. In order to avoid really small type, when t is complicated we write e^t = exp(t). Consider the following function.
f(x) = [1 / (σ√(2π))] exp( −(x − μ)^2 / (2σ^2) ), for all real numbers x.   (2.2)
The graph of the function f is called the normal curve with parameters μ and σ; it is pictured in Figure 2.5. By allowing μ and σ to vary, we generate the family of normal curves. We use the notation N(μ, σ) to designate the normal curve with parameters μ and σ. Here is a list of important properties of normal curves.

1. The curve is symmetric about the vertical line x = μ, where it attains its maximum.
2. The total area under the curve equals 1.
3. The parameter σ controls the spread: a larger σ gives a shorter, more spread-out curve, and a smaller σ gives a taller, more concentrated curve.
Statisticians often want to calculate areas under a normal curve. (We will see one reason why in the next section.) Fortunately, there exists a website that will do this for us. It is:
http://davidmlane.com/hyperstat/z_table.html
Our course webpage also contains a link to this site. Below are some examples of using this site; you should check that you can reproduce them, and you will be asked to do similar things on homework.
Figure 2.5: The Normal Curve with Parameters μ and σ; i.e. the N(μ, σ) Curve.
[The horizontal axis is marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ and μ + 3σ; the vertical axis is marked in units of 1/σ.]
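As an alternative to the website, areas under a normal curve can also be computed in code. Here is a minimal sketch using Python's scipy.stats.norm; the parameters μ = 100 and σ = 15 and the cutoffs are chosen purely for illustration and do not come from the notes.

```python
# A sketch of computing areas under the N(100, 15) curve in code,
# as a check on the z_table website.
from scipy.stats import norm

curve = norm(loc=100, scale=15)

print(curve.sf(130))                   # area above 130 (sf = 1 - cdf), about 0.0228
print(curve.cdf(85))                   # area below 85, about 0.1587
print(curve.cdf(115) - curve.cdf(85))  # area between 85 and 115, about 0.6827
```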
Suppose that X ∼ Bin(100,0.50). I want to calculate P(X ≥ 55). With the help of my computer, I know that this probability equals 0.1841. We will now approximate this probability using a normal curve. First, note that for this binomial μ = np = 100(0.50) = 50 and σ = √(npq) = √(100(0.50)(0.50)) = √25 = 5. We use N(50,5) to approximate the binomial; i.e. we pair the binomial with the normal curve that has the same mean and standard deviation. Look at Figure 2.1, the PH for the Bin(100,0.50). The probability that we want is the area of the rectangle centered at 55 plus the areas of all rectangles to the right of it. The rectangle centered at 55 actually begins at 54.5; thus, we want the sum of all the areas beginning at 54.5 and going to the right. This picture-based conversion of 55 to 54.5 is called a continuity correction and it greatly improves the accuracy of the approximation. We proceed as follows:
P(X ≥ 55) = P(X ≥ 54.5); this is the continuity correction. Next, we approximate P(X ≥ 54.5) by the area under the N(50,5) curve to the right of 54.5. Using the website, this area is 0.1841, which agrees (to four digits) with the exact answer.
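For readers who prefer to check such calculations in code, here is a minimal sketch (using Python's scipy.stats, which is not part of the notes) comparing the exact binomial probability with the continuity-corrected normal approximation.

```python
# A sketch comparing the exact binomial answer with the continuity-corrected
# normal approximation for P(X >= 55) when X ~ Bin(100, 0.50).
from math import sqrt
from scipy.stats import binom, norm

n, p = 100, 0.50
mu, sigma = n * p, sqrt(n * p * (1 - p))      # 50 and 5

exact = binom.sf(54, n, p)                    # P(X > 54) = P(X >= 55)
approx = norm.sf(54.5, loc=mu, scale=sigma)   # area above 54.5 under N(50, 5)

print(exact, approx)                          # both print as roughly 0.1841
```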
There is no magic value of n at which the normal curve suddenly becomes a good approximation to the binomial. Books like to claim there are magic thresholds, but that is simply not true. I will try to discuss this issue in an intellectually honest manner. Look again at Figure 2.4, the PH for the Bin(50,0.1) distribution. This picture is strikingly asymmetrical, so one suspects that the normal curve will not provide a good approximation. This observation directs us towards the situation in which the normal curve might provide a bad approximation, namely when p is close to 0 (or, by symmetry, close to 1). But for a fixed value of p, even one close to 0 or 1, the normal curve approximation improves as n grows large. A common magic threshold for this problem is the following:
If both np ≥ 15 and nq ≥ 15 then the normal approximation to the binomial is good.
In the next section we will see that there is a website that calculates exact binomial probabilities, so the above magic threshold does not cause any practical problems.
There is a website that calculates exact binomial probabilities. Its address is
http://stattrek.com/Tables/Binomial.aspx#binomialnormal
It is linked to our webpage, reached by first clicking on 'Calculators for Various Statistical Problems' and then 'Binomial Probability Calculator.' Below I illustrate how to use it. You should verify these results. To use the site, you must enter: p in the box 'Probability of success on a single trial;' n in the box 'Number of trials;' and your value of interest for the number of successes (more on this below) in the box 'Number of successes (x).' As output, the site gives you the values of P(X = x) and cumulative probabilities such as P(X ≤ x) and P(X ≥ x).
Here is an example. If I enter 0.50, 100 and 55, I get, in particular, P(X ≤ 55) = 0.8644 and P(X ≥ 55) = 0.1841.
Depending on our purpose, we might need to be algebraically clever to obtain our answer. For example, suppose that for X ∼ Bin(100,0.50) we want to determine P (43 ≤ X ≤ 55). Our website does not give us these ‘between probabilities,’ so we need to be clever. Write
P(43 ≤ X ≤ 55) = P(X ≤ 55) − P(X ≤ 42) = 0.8644 − P(X ≤ 42),
from our earlier output for x = 55. To finish our quest, we need to enter the website with 0.5, 100 and 42. The result is P(X ≤ 42) = 0.0666. Thus, our final answer is
P(43 ≤ X ≤ 55) = 0.8644 − 0.0666 = 0.7978.
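Here is a minimal sketch, using Python's scipy.stats.binom rather than the website, that reproduces the numbers above.

```python
# A sketch reproducing the between-probability calculation in code.
from scipy.stats import binom

n, p = 100, 0.50
print(binom.cdf(55, n, p))                         # P(X <= 55), about 0.8644
print(binom.cdf(42, n, p))                         # P(X <= 42), about 0.0666
print(binom.cdf(55, n, p) - binom.cdf(42, n, p))   # P(43 <= X <= 55), about 0.7978
```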
Buried in the description of the binomial calculator website is the following passage.
Note: When the number of trials is greater than 20,000, the Binomial Calculator uses a normal distribution to estimate the cumulative binomial probability. In most cases, this yields very good results.
I am skeptical about this. As stated earlier, the computer software package that I use (which has been on the market for over 35 years) does not work for n ≥ 1023 if p = 0.5. Thus, I find it hard to believe that the website works at n = 20,000. Let's investigate this. Let's take n = 16,000 and p = 0.5. This gives μ = np = 8000 and σ = √(npq) = √(16000(0.5)(0.5)) = 63.246. Suppose I want to find P(X ≤ 8050). According to the website, this probability is 0.7877 and the normal curve approximation (details not shown) is also 0.7877. I am impressed. For theoretical math reasons, I trust the normal curve approximation and am quite impressed that the exact answer appears to be, well, correct. Thus, it appears that whoever programmed the website was careful about it. To summarize, it seems to me that you can trust this website provided n is 20,000 or smaller. As I will show you below, do not use it for larger values of n. As mentioned earlier, the normal curve approximation should not be trusted if np < 15 or nq < 15. Let's look at some examples.
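Here is a minimal sketch, again with scipy.stats, that checks the n = 16,000 example; scipy is used here as a stand-in for the website and for my software package, not as the code either of them actually runs.

```python
# A sketch checking the n = 16,000 example: exact binomial versus the
# continuity-corrected normal approximation for P(X <= 8050).
from math import sqrt
from scipy.stats import binom, norm

n, p = 16_000, 0.50
mu, sigma = n * p, sqrt(n * p * (1 - p))      # 8000 and about 63.246

exact = binom.cdf(8050, n, p)
approx = norm.cdf(8050.5, loc=mu, scale=sigma)

print(exact, approx)                          # both about 0.7877
```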
In Chapter 4 we will learn a way to obtain good approximations for the binomial when n is large and either np or nq is small. For our work in Chapter 3 and later in these notes, I need to tell you about standardizing a random variable. Let X be any random variable with mean μ and standard deviation σ. Then the standardized version of X is denoted by Z and is given by the equation:
Z = (X − μ) / σ.
Before I further discuss standardizing, let's do a simple example. Suppose that X ∼ Bin(3,0.25). You can verify the following facts about X: its mean is μ = np = 3(0.25) = 0.75 and its standard deviation is σ = √(npq) = √(3(0.25)(0.75)) = 0.75.
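Here is a minimal sketch of these facts in code; I am assuming the facts of interest are the distribution of X, its mean and standard deviation, and the standardized value z = (x − μ)/σ for each possible x.

```python
# A sketch of facts about X ~ Bin(3, 0.25): its distribution, its mean and
# standard deviation, and the standardized value for each possible x.
from math import sqrt
from scipy.stats import binom

n, p = 3, 0.25
mu, sigma = n * p, sqrt(n * p * (1 - p))      # both equal 0.75 here

for x in range(n + 1):
    z = (x - mu) / sigma
    print(x, round(binom.pmf(x, n, p), 4), round(z, 2))

# Rows printed: (0, 0.4219, -1.0), (1, 0.4219, 0.33),
#               (2, 0.1406, 1.67), (3, 0.0156, 3.0)
```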
Consider a population box of cards, each marked either '1' or '0.' The total number of cards marked '1' is denoted by s (for success) and the total number of cards marked '0' is denoted by f (for failure). The total number of cards in the box is denoted by N = s + f. Also, let p = s/N denote the proportion of the cards in the box marked '1' and q = f/N denote the proportion of the cards in the box marked '0.' For example, one could have: s = 60 and f = 40, giving N = 100, p = 0.60 and q = 0.40. Clearly, there is a great deal of redundancy in these five numbers; statisticians prefer to focus on N and p. Knowledge of this pair allows one to determine the other three numbers. We refer to a population box as Box(N, p) to denote a box with N cards, of which N × p cards are marked '1.'

Consider the CM: select one card at random from Box(N, p). After operating this CM, place the selected card back into the population box. Repeat this process n times. This operation is referred to as selecting n cards at random with replacement. Viewing each selection as a trial, we can see that we have BT.
Thus, everything we have learned (the binomial sampling distribution) or will learn about BT is also true when one selects n cards at random with replacement from Box(N, p). Below is a two-part example to solidify these ideas.
P(X ≥ 3) = P(X = 3) + P(X = 4) + P(X = 5) = [5!/(3!2!)] p^3 q^2 + [5!/(4!1!)] p^4 q + [5!/(5!0!)] p^5;
for example, with p = 0.60 (the Box(100, 0.60) above) this equals 0.3456 + 0.2592 + 0.0778 = 0.6826.
What is the probability that Larry will correctly predict the winner? Solution: Using the website, the answer is 0.99977. For practice, I will obtain the answer with the normal curve approximation. First, μ = 601(0.571) = 343.17 and σ = √(343.17(0.429)) = 12.13. Using the website, the normal curve approximation agrees closely with the exact answer.
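Here is a minimal simulation sketch of the claim that selecting n cards at random with replacement from Box(N, p) yields BT, using the s = 60, f = 40 box from above; the simulation size (100,000 repetitions) and the Python tools are my choices, not part of the notes.

```python
# A simulation sketch: drawing n cards at random WITH replacement from
# Box(100, 0.60) (s = 60 cards marked '1', f = 40 marked '0') and counting
# the '1's gives, to simulation accuracy, the Bin(n, 0.60) distribution.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
box = np.array([1] * 60 + [0] * 40)           # the Box(100, 0.60) example
n, p = 5, 0.60

draws = rng.choice(box, size=(100_000, n), replace=True)
counts = draws.sum(axis=1)                    # number of successes per sample of n

for x in range(n + 1):
    print(x, (counts == x).mean(), round(binom.pmf(x, n, p), 4))
```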
We see above that if we sample at random with replacement from a finite population then we get BT. But suppose that we sample at random without replacement, which, of course, seems more sensible. In this new case, we say that we have a (simple) random sample from the finite population. Another way to say this is the following. A sample of size n from a finite population of size N is called a (simple) random sample if, and only if, every subset of size n is equally likely to be selected.

Here is a common error. Some people believe that you have a (simple) random sample if every member of the population has the same probability of being in the sample. This is wrong. Here is a simple example why. Suppose that a population consists of N = 10 members and we want a sample of size n = 2. For convenience, label the members a_1, a_2, ..., a_10. A systematic random sample is obtained as follows. Select one of the members a_1, a_2, ..., a_5 at random. Denote the selected member by a_s. Then let the sample be a_s and a_{s+5}. Each member of the population has a 20% chance of being in the sample, but most possible subsets (40 out of 45) are impossible; hence, this is not a (simple) random sample. Now there are many situations in practice in which one might prefer a systematic random sample to a (simple) random sample (typically for reasons of ease in sampling). My point is not that one is better than the other, simply that they are different.

Another popular way of sampling is the stratified random sample, in which the researcher divides the population into two or more strata, say males and females, and then selects a (simple) random sample from each stratum. The common feature of (simple) random samples, systematic random samples and stratified random samples is that they are probability samples. As such, they are particularly popular with scientists, statisticians and probabilists b/c they allow one to compute, in advance, probabilities of what will happen when the sample is selected. There are a number of important ways to sample that are not probability samples; the most important of these are: judgment sampling, convenience sampling and volunteer sampling. There are many examples of the huge biases that can occur with convenience or volunteer sampling, but judgment sampling, provided one has good judgment, can be quite useful. Sadly, these topics are beyond the scope of these notes, primarily b/c they are mainly of interest in social science applications.

In this course we will focus on (simple) random sampling. (Well, with one exception, clearly stated, much later, of stratified sampling.) As a result I will drop the adjective simple and refer to it as random sampling. Much of this chapter has been devoted to showing you how to compute probabilities when we have BT. If, instead, we have a random sample, the formulas for computing probabilities are much messier and, in fact, cannot be used unless we know N exactly; and often researchers don't know N. Here is an incredibly useful fact; it will be illustrated with an example in the lecture notes. Provided n ≤ 0.05N, the probability distribution of X, the total number of successes in a sample of size n for a random sample, can be well-approximated by the Bin(n, p). In words: sample either way, but when you calculate probabilities for X you may use the binomial.
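Here is a minimal sketch of this closing fact: when sampling without replacement, the exact distribution of X is hypergeometric, and for n ≤ 0.05N it is very close to Bin(n, p). The specific values of N, p and n below are chosen only for illustration.

```python
# A sketch of the closing fact: with n <= 0.05 * N, probabilities for random
# sampling WITHOUT replacement (the hypergeometric distribution) are very
# close to the Bin(n, p) probabilities.
from scipy.stats import binom, hypergeom

N, p, n = 10_000, 0.60, 50                    # here n = 50 <= 0.05 * N = 500
s = round(N * p)                              # number of cards marked '1'

for x in (25, 30, 35):
    exact = hypergeom.pmf(x, N, s, n)         # sampling without replacement
    approx = binom.pmf(x, n, p)               # binomial approximation
    print(x, round(exact, 4), round(approx, 4))
```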