Statistics test analysis and study, Lecture notes of Statistics

A short guide to statistical tests for preparing for the subject.

Typology: Lecture notes, 2019/2020. Uploaded on 02/19/2020 by oliver-sen.


Statistical tests

Steps

1. Make an initial appraisal of your data (see Data types and initial appraisal).
2. Select the type of test you require based on the question you are asking (see Categories).
3. Select the actual test you need to use from the appropriate key.
4. Determine any preliminary tests you need to carry out prior to performing the statistical test.
5. If your data are suitable for the test chosen, based on the results from step 4, proceed to the test.
6. If your data do not meet the demands of the chosen test, go back to step 3 and choose the non-parametric equivalent.
7. It may be that your data are still not suitable, in which case you need to search wider than this website, get more data, or discard them (one of the problems you may face if you have not planned properly).

Chi-Square Test

  • All chi-square tests are concerned with counts of things (frequencies) that you can put into categories. For example, you might be investigating flower colour and have frequencies of red flowers and white flowers. Or you might be investigating human health and have frequencies of smokers and non-smokers.
  • The test looks at the frequencies you obtained and compares them with the frequencies you might expect given your null hypothesis. The null hypothesis is: there is no significant difference between the observed and expected frequencies.

The Chi-square Distribution

  • Before discussing the unfortunately-named "chi-square" test, it's necessary to talk about the actual chi-square distribution. The chi-square distribution, itself, is based on a complicated mathematical formula. There are many other distributions used by statisticians (for example, F and t) that are also based on complicated mathematical formulas. Fortunately, this is not our problem. Plenty of people have already done the relevant calculations, and computers can do them very quickly today.
  • When we perform a statistical test using a test statistic, we make the assumption that the test statistic follows a known probability distribution. We somehow compare our observed and expected results, summarize these comparisons in a single test statistic, and compare the value of the test statistic to its supposed underlying distribution. Good test statistics are easy to calculate and closely follow a known distribution. The various chi-square tests (and the related G-tests) assume that the test statistic follows the chi-square distribution.

The Chi-square Distribution

  • When we perform a statistical test, we refer to this probability of "mistakenly rejecting our hypothesis" as "alpha." Usually, we equate alpha with a p-value. Thus, using the numbers from before, we would say p=0.0863 for a chi-square value of 4.901 and 2 d.f. We would not reject our hypothesis, since p is greater than 0.05 (that is, p>0.05). You should note that many statistical packages for computers can calculate exact p-values for chi-square distributed test statistics. However, it is common for people to simply refer to chi-square tables. Consider the (partial) table of critical values below:

| d.f. | p=0.50 | p=0.10 | p=0.05 |
|------|--------|--------|--------|
| 1    | 0.455  | 2.706  | 3.841  |
| 2    | 1.386  | 4.605  | 5.991  |
| 3    | 2.366  | 6.251  | 7.815  |

  • The first column lists degrees of freedom. The top row shows the p-value in question. The cells of the table give the critical value of chi-square for a given p-value and a given number of degrees of freedom. Thus, the critical value of chi-square for p=0.05 with 2 d.f. is 5.991. Earlier, remember, we considered a value of 4.901. Notice that this is less than 5.991, and that critical values of chi-square increase as p-values decrease. Even without a computer, then, we could safely say that for a chi-square value of 4.901 with 2 d.f., 0.05<p<0.10. That's because, for the row corresponding to 2 d.f., 4.901 falls between 4.605 and 5.991 (the critical values for p=0.10 and p=0.05, respectively).
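For 2 degrees of freedom the chi-square upper-tail probability happens to have the simple closed form exp(-x/2), so the numbers above can be checked without a statistics package. A minimal sketch (the function names are illustrative):

```python
import math

def chi2_p_value_2df(stat: float) -> float:
    """Upper-tail p-value for the chi-square distribution with 2 d.f.
    For 2 d.f. the survival function is exactly exp(-x/2)."""
    return math.exp(-stat / 2)

def chi2_critical_2df(alpha: float) -> float:
    """Critical value for 2 d.f.: invert exp(-x/2) = alpha."""
    return -2 * math.log(alpha)

p = chi2_p_value_2df(4.901)     # ~0.0863, as in the text
crit = chi2_critical_2df(0.05)  # ~5.991, matching the table row for 2 d.f.
```

Since p > 0.05 (equivalently, since 4.901 < 5.991), we would not reject the hypothesis.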

A Simple Goodness-of-fit Chi-square Test

  • Consider the following coin-toss experiment. We flip a coin 20 times, getting 12 "heads" and 8 "tails." Using the binomial distribution, we can calculate the exact probability of getting 12H/8T and any of the other possible outcomes. Remember, for the binomial distribution, we must define k (the number of successes), N (the number of Bernoulli trials) and p (the probability of success). Here, N is 20 and p is 0.5 (if our hypothesis is that the coin is "fair"). The following table shows the exact probability p(k | N, p) for all possible outcomes of the experiment. The probability of 12 heads/8 tails is highlighted.

A Simple Goodness-of-fit Chi-square Test

  • Using the Sum Rule, we get a p-value of 0.50344467. Following the convention of failing to reject a hypothesis if p>0.05, we fail to reject the hypothesis that the coin is fair.
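The exact two-tailed p-value can be reproduced with a short script using only the standard library (a sketch; the function names are illustrative). Because the p=0.5 binomial is symmetric, the two-tailed p-value is twice the upper-tail sum:

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def two_tailed_p(k: int, n: int) -> float:
    """Two-tailed p-value for a fair coin (p = 0.5): twice the
    upper-tail sum, valid because the distribution is symmetric."""
    tail = sum(binom_pmf(i, n, 0.5) for i in range(k, n + 1))
    return 2 * tail

p_value = two_tailed_p(12, 20)  # ~0.50344467, as in the text
```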

It happens that doing this type of calculation, while tedious, can be accomplished pretty easily -- especially if we know how to use a spreadsheet program. However, we run into practical problems once the numbers start to get large. We may find ourselves having to calculate hundreds or thousands of individual binomial probabilities. Consider testing the same hypothesis by flipping the coin 10,000 times. What is the exact probability, based on the binomial distribution, of getting 4,865 heads/5,135 tails or any outcome as far or farther from 5,000 heads/5,000 tails? You should recognize that you'll be adding 9,732 individual probabilities to get the p-value. You will also find that getting those probabilities in the first place is often impossible. Try calculating 10,000! (1 x 2 x 3 x ... x 9,998 x 9,999 x 10,000).

  • As sample size gets large, we can substitute a simple test statistic that follows the chi-square distribution. Even with small sample sizes (like the 20 coin flips we used to test the hypothesis that the coin was fair), the chi-square goodness-of-fit test works pretty well. The test statistic usually referred to as "chi-square" (unfortunately, in my opinion) is calculated by comparing observed results to expected results. The calculation is straightforward. For each possible outcome, we first subtract the expected number from the observed number. Note: we do not subtract percentages, but the actual numbers! This is very important. After we do this, we square the result (that is, multiply it by itself). Then we divide this result by the expected number. We sum these values across all possible outcome classes to calculate the chi-square test statistic.

The formula for the test statistic is basically this: χ² = Σ (O − E)² / E, where O is the observed count and E is the expected count for each outcome class.

  • Here's the earlier table, with two columns added so we can calculate the chi-square test statistic. One is for our observed data, the other for the calculation. The relevant rows are:

| Outcome | Observed | Expected | (O−E)²/E |
|---------|----------|----------|----------|
| Heads   | 12       | 10       | 0.4      |
| Tails   | 8        | 10       | 0.4      |
| Total   | 20       | 20       | 0.8      |
  • Notice that the totals for observed and expected numbers are the same (both are 20). If you ever do this test and the columns do not add up to the same total, you have done something wrong!

In this case, the sum of the last column is 0.8. For this type of test, the number of degrees of freedom is simply the number of outcome classes minus one. Since we have two outcome classes ("heads" and "tails"), we have 1 degree of freedom. Going to the chi-square table, we look in the row for 1 d.f. to see where the value 0.8 lies. It lies between 0.455 and 2.706. Therefore, we would say that 0.1<p<0.5. If we were to calculate the p-value exactly, using a computer, we would say p=0.371. So the chi-square test doesn't give us exactly the right answer. However, as sample sizes increase, it does a better and better job. Also, p-values of 0.371 and 0.503 aren't qualitatively very different. In neither case would we be inclined to reject our hypothesis.
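The whole goodness-of-fit calculation can be sketched in a few lines of standard-library Python (for 1 d.f. the upper-tail probability has the closed form erfc(√(x/2)); the function names are illustrative):

```python
import math

def chi_square_stat(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all classes.
    The observed and expected totals must match."""
    assert abs(sum(observed) - sum(expected)) < 1e-9, "totals must match!"
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_p_value_1df(stat):
    """Upper-tail p-value for 1 d.f.: P(X >= stat) = erfc(sqrt(stat/2))."""
    return math.erfc(math.sqrt(stat / 2))

stat = chi_square_stat([12, 8], [10, 10])  # 0.8, as in the text
p = chi2_p_value_1df(stat)                 # ~0.371
```

Note that the expected values passed in are counts (10 and 10), never percentages, exactly as the text warns.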

TEST YOUR UNDERSTANDING

  • There are 110 houses in a particular neighborhood. Liberals live in 25 of them, moderates in 55 of them, and conservatives in the remaining 30. An airplane carrying 65 lb. sacks of flour passes over the neighborhood. For some reason, 20 sacks fall from the plane, each miraculously slamming through the roof of a different house. None hit the yards or the street, or land in trees, or anything like that. Each one slams through a roof. Anyway, 2 slam through a liberal roof, 15 slam through a moderate roof, and 3 slam through a conservative roof. Should we reject the hypothesis that the sacks of flour hit houses at random?

SOLUTION
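The worked solution is not reproduced here, but the calculation can be sketched directly from the problem statement (standard library only; with three outcome classes there are 2 d.f., so the p-value is exp(-x/2)):

```python
import math

houses = {"liberal": 25, "moderate": 55, "conservative": 30}
hits = {"liberal": 2, "moderate": 15, "conservative": 3}
n_sacks = 20
total_houses = sum(houses.values())  # 110

# Under the random-hits hypothesis, expected hits are proportional
# to the number of houses of each type.
expected = {k: n_sacks * v / total_houses for k, v in houses.items()}

stat = sum((hits[k] - expected[k]) ** 2 / expected[k] for k in houses)
p = math.exp(-stat / 2)  # 2 d.f. (three classes minus one)

# stat is about 5.03, so p is about 0.08: greater than 0.05, so we
# would fail to reject the hypothesis that the sacks hit at random.
```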

Independent Assortment of Genes

  • The standard approach to testing for independent assortment of genes involves crossing individuals heterozygous for each gene with individuals homozygous recessive for both genes (i.e., a two-point testcross). Consider an individual with the AaBb genotype. Regardless of linkage, we expect half of the gametes to have the A allele and half the a allele. Similarly, we expect half to have the B allele and half the b allele. These expectations are drawn from Mendel's First Law: that alleles in heterozygotes segregate equally into gametes. If the alleles are independently assorting (and equally segregating), we expect 25% of the offspring to have each of the gametic types: AB, Ab, aB and ab. Therefore, since only recessive alleles are provided in the gametes from the homozygous recessive parent, we expect 25% of the offspring to have each of the four possible phenotypes.

  • If the genes are not independently assorting, we expect the parental allele combinations to stay together more than 50% of the time. Thus, if the heterozygote has the AB/ab genotype, we expect more than 50% of the gametes to be AB or ab (parental), and we expect fewer than 50% to be Ab or aB (recombinant). Alternatively, if the heterozygote has the Ab/aB genotype, we expect the opposite: more than 50% Ab or aB and less than 50% AB or ab.

  • The old-fashioned way to test for independent assortment by the two-point testcross involves two steps. First, one determines that there are more parental offspring than recombinant offspring. While it's possible to see the opposite (more recombinant than parental), this cannot be explained by linkage; the simplest explanation would be selection favoring the recombinants. The second step is to determine whether there are significantly more parental than recombinant offspring, since some deviation from expectations is always expected. If the testcross produced N offspring, one would expect 25% x N of each phenotype. The chi-square test would be performed as before.
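As a concrete sketch (the offspring counts below are hypothetical, not from the text): for 400 testcross offspring we would expect 25% x 400 = 100 of each phenotype, and the test is the same goodness-of-fit calculation, now with four classes and hence 3 degrees of freedom.

```python
# Hypothetical two-point testcross counts, in the order AB, ab, Ab, aB
# (the first two are parental, the last two recombinant).
observed = [120, 115, 85, 80]
n = sum(observed)           # 400 offspring
expected = [n / 4] * 4      # 25% of N for each phenotype

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# stat = 12.5 with 3 d.f.; the p=0.05 critical value for 3 d.f. is
# 7.815, so these counts would lead us to reject independent assortment.
```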

Independent Assortment of Genes

  • However, there is a minor flaw with this statistical test. It assumes equal segregation of alleles. That is, it assumes that the A allele is found in exactly 50% of the offspring, and it assumes that the B allele is found in exactly 50% of the offspring. However, deviations from 25% of each phenotype could arise because the alleles are not represented equally. As an extreme example, consider 100 testcross offspring, where 1/5 have the lower-case allele of each gene. If the genes are independently assorting, we would actually expect the phenotypes in the following frequencies: 1/25 ab, 4/25 aB, 4/25 Ab and 16/25 AB. Let's say that we observed exactly 25 of each phenotype. If we did the chi-square test assuming equal segregation, we would set up the following table:

| Phenotype | Observed | Expected (equal segregation) | (O−E)²/E |
|-----------|----------|------------------------------|----------|
| AB        | 25       | 25                           | 0        |
| Ab        | 25       | 25                           | 0        |
| aB        | 25       | 25                           | 0        |
| ab        | 25       | 25                           | 0        |
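Putting numbers on that flaw (a sketch; the expected counts follow directly from the frequencies quoted above): against the equal-segregation expectation of 25 per class the statistic is 0, so the naive test sees nothing wrong, yet against the expectation based on the actual allele frequencies the deviation is enormous.

```python
observed = [25, 25, 25, 25]           # AB, Ab, aB, ab: 25 of each
equal_seg_expected = [25, 25, 25, 25]  # 100 offspring / 4 classes
# Expected counts when 1/5 of offspring carry each lower-case allele:
# 16/25 AB, 4/25 Ab, 4/25 aB, 1/25 ab, out of 100 offspring.
freq_expected = [64, 16, 16, 4]

def chi_square(obs, exp):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

naive = chi_square(observed, equal_seg_expected)  # 0.0: no deviation seen
honest = chi_square(observed, freq_expected)      # ~144.1: huge deviation
```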