






This document covers Chi-Square analysis, a statistical method used to test hypotheses about the distribution of observations in different categories. The analysis is particularly useful when dealing with categorical data, such as age groups or gender, and aims to determine whether there is a significant difference between observed and expected frequencies. It includes a step-by-step guide on how to perform the Chi-Square test, including calculating observed and expected frequencies and assessing the obtained Chi-Square value in relation to critical values.
Typology: Lecture notes
Chi Square Analysis
When do we use chi square?
More often than not in psychological research, we find ourselves collecting scores from participants. These data are usually continuous measures, and might be scores on a questionnaire or psychological scale, reaction time data or memory scores, for example. When we have this kind of data, we will usually use it to look for mean differences on scores between or within groups (e.g. using t-tests or ANOVAs), or perhaps to look for relationships between different types of scores that we have collected (e.g. correlation, regression).
However, sometimes we do not have this kind of data. Sometimes data will be a lot simpler than this, instead consisting only of frequency data. In these cases participants do not contribute scores for analysis; instead they each contribute to a “head count” within different grouping categories. This kind of data is known as categorical data, examples of which could be gender (male or female) or university degree classifications (1, 2:1, 2:2, 3, pass or fail) – or any other variable where each participant falls into one category. When the data we want to analyse is like this, a chi-square test, denoted χ², is usually the appropriate test to use.
What does a chi-square test do?
Chi-square is used to test hypotheses about the distribution of observations in different categories. The null hypothesis (H₀) is that the observed frequencies are the same as the expected frequencies (except for chance variation). If the observed and expected frequencies are the same, then χ² = 0. The more the frequencies you observe differ from the expected frequencies, the larger the value of χ² becomes. The larger the value of χ², the more likely it is that the distributions are significantly different.
…but what does this mean in English?
To try and explain this a little better, let's think about a concrete example. Imagine that you were interested in the relationship between road traffic accidents and the age of the driver. We could randomly obtain records of 60 accidents from police archives, and see how many of the drivers fell into each of the following age categories: 17-20, 21-30, 31-40, 41-50, 51-60 and over 60. If there is no relationship between accident rate and age, then the drivers should be equally spread across the different age bands (i.e. there should be similar numbers of drivers in each category). This would be the null hypothesis. However, if younger drivers are more likely to have accidents, then there would be a large number of accidents in the younger age categories and a low number of accidents in the older age categories.
So… say we actually collected this data, and found that out of 60 accidents, there were 25 individuals aged 17-20, 15 drivers aged 21-30 and 5 cases in each of the other age groups. This data would now make up our set of observed frequencies.
We might now ask: are these observed frequencies similar to what we might expect to find by chance, or is there some non-random pattern to them? In this particular case, from just looking at the frequencies it seems fairly obvious that a larger proportion of the accidents involved younger drivers. However, the question of whether this distribution could have just occurred by chance is yet to be answered. The Chi-Square test helps us to decide this by comparing our observed frequencies to the frequencies that we might expect to obtain purely by chance.
It is important to note at this point that Chi-Square is a very versatile statistic that crops up in lots of different circumstances. However, for the purposes of this handout we will only concentrate on two applications of it:
Chi-Square "Goodness of Fit" test: This is used when you have categorical data for one independent variable, and you want to see whether the distribution of your data is similar or different to that expected (i.e. you want to compare the observed distribution of the categories to a theoretical expected distribution).
Chi-Square Test of Association between two variables: This is appropriate to use when you have categorical data for two independent variables, and you want to see if there is an association between them.
Step One Take each observed frequency and subtract from it its associated expected frequency (i.e., work out (O-E)). With 60 accidents spread evenly across six age categories, each expected frequency is 60/6 = 10:
25-10 = 15    15-10 = 5    5-10 = -5    5-10 = -5    5-10 = -5    5-10 = -5
Step Two Square each value obtained in step 1 (i.e., work out (O-E)^2):
225    25    25    25    25    25
Step Three Divide each of the values obtained in step 2 by its associated expected frequency (i.e., work out (O-E)^2 / E):
225/10 = 22.5    25/10 = 2.5    25/10 = 2.5    25/10 = 2.5    25/10 = 2.5    25/10 = 2.5
Step Four Add together all of the values obtained in step 3, to get your value of Chi-Square:
χ^2 = 22.5 + 2.5 + 2.5 + 2.5 + 2.5 + 2.5 = 35
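The four steps above can also be checked with a few lines of Python. This is only a sketch of the hand calculation: the observed counts come from the accident example, the expected counts assume all six age categories are equally likely, and the function name chi_square is our own label rather than anything from a statistics package.

```python
# Goodness-of-fit chi-square for the road accident example.
observed = [25, 15, 5, 5, 5, 5]            # accidents per age category
total = sum(observed)                       # 60 accidents in all
# Assumption: under the null hypothesis every category is equally likely.
expected = [total / len(observed)] * len(observed)   # 10 per category


def chi_square(observed, expected):
    """Add up (O - E)^2 / E over all categories (Steps One to Four)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Step Three values, one per category, then the Step Four total.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2_value = chi_square(observed, expected)

print(contributions)
print(chi2_value)
```

The first printed list holds the six Step Three values, and the second line is their sum, which should agree with the hand-calculated Chi-Square of 35.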
Assessing the size of our obtained Chi-Square value:
What you do, in a nutshell...
(a) Work out how many "degrees of freedom" (d.f.) you have.
(b) Decide on a probability level.
(c) Find a table of "critical Chi-Square values" (in most statistics textbooks).
(d) Establish the critical Chi-Square value for this particular test, and compare to your obtained value.
If your obtained Chi-Square value is bigger than the one in the table, then you conclude that your obtained Chi-Square value is too large to have arisen by chance; it is more likely to stem from the fact that there were real differences between the observed and expected frequencies. In other words, contrary to our null hypothesis, the categories did not occur with similar frequencies.
If, on the other hand, your obtained Chi-Square value is smaller than the one in the table, you conclude that the observed pattern of frequencies could plausibly be due simply to chance (i.e., we retain our initial assumption that the discrepancies between the observed and expected frequencies are due merely to random sampling variation, and hence we have no reason to believe that the categories did not occur with equal frequency).
For our worked example...
(a) First we work out our degrees of freedom. For the Goodness of Fit test, this is simply the number of categories minus one. As we have six categories, there are 6-1 = 5 degrees of freedom.
(b) Next we establish the probability level. In psychology, we use p < 0.05 as standard – and this is represented by the 5% column.
(c) We now need to consult a table of "critical values of Chi-Square". Here's an excerpt from a typical table:
(d) The values in each column are "critical" values of Chi-Square. A Chi-Square value at least this large would be expected to occur by chance with the probability shown at the top of the column. The relevant value for this test is found at the intersection of the appropriate d.f. row and probability column. As our obtained Chi-Square has 5 d.f., we are interested in the values in the 5 d.f. row. As the probability level is p < .05, we then need to look in the 5% column (as .05 represents a chance level of 5 in 100… or 5%) to find the critical value for this statistical test. In this case, the critical value is 11.07.
Finally, we need to compare our obtained Chi-Square to the critical value. If the obtained Chi-Square is larger than the critical value in the table, it implies that it is unlikely to have occurred by chance. Our obtained value of 35 is much larger than the critical value of 11.07. We can therefore be relatively confident in concluding that our observed frequencies are significantly different from the frequencies that we would expect to obtain if all categories were equally distributed. In other words, age is related to the number of road traffic accidents that occur.
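If no table is to hand, the same decision can be reached by working out the probability directly. The sketch below is not part of the handout's procedure: it relies on the closed-form expressions that exist for the upper tail of the Chi-Square distribution when d.f. is a whole number, and the function name chi2_sf is simply our own label.

```python
import math


def chi2_sf(x, df):
    """Probability of a chi-square value >= x with df degrees of freedom.

    Assumes df is a positive whole number, for which finite sums exist.
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^j / j! for j = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus
    # sqrt(2x/pi) * exp(-x/2) * sum of x^j / (1*3*...*(2j+1)), j = 0 .. (df-3)/2
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    tail = math.erfc(math.sqrt(x / 2))
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


obtained, df = 35.0, 5                 # values from the worked example
p_value = chi2_sf(obtained, df)
print(p_value < 0.05)                  # same decision as the table lookup
print(round(chi2_sf(11.07, df), 3))    # the tabled critical value sits at about .05
```

Comparing p_value with .05 is equivalent to comparing the obtained Chi-Square with the critical value: both say whether a value this large would arise by chance less than 5% of the time.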
Step 2: Calculate expected numbers for each individual cell (i.e. the frequencies we would expect to obtain if there were no association between the two variables). You do this by multiplying row sum by column sum and dividing by total number.
Expected Frequency = (Row Total x Column Total) / Grand Total
For example: using the first cell in table (Male/Pass);
(19 x 20) / 40 = 9.5
…and the cell below (Male/Fail):
(21 x 20) / 40 = 10.5
Do this for each cell in the table above.
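Doing this for every cell is repetitive, so it may help to see it as a short Python sketch. The observed counts are the pass/fail by gender figures from the worked example (7, 12, 13 and 8); the variable names are our own.

```python
# Observed counts from the worked example (rows: result, columns: gender).
observed = {
    "Pass": {"Male": 7, "Female": 12},
    "Fail": {"Male": 13, "Female": 8},
}

# Row totals, column totals and the grand total.
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count
grand_total = sum(row_totals.values())

# Expected frequency = (row total x column total) / grand total, cell by cell.
expected = {
    row: {col: row_totals[row] * col_totals[col] / grand_total
          for col in col_totals}
    for row in observed
}

for row, cells in expected.items():
    print(row, cells)
```

Each printed row should match the hand calculation: 9.5 for both Pass cells and 10.5 for both Fail cells.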
Step 3: Now you should have an observed number and expected number for each cell. The observed number is the number already in the first table. The expected number is the number found in the last step (step 2). Redo the contingency table, this time adding in the expected frequencies in brackets below the obtained frequencies:
            Male        Female
Pass        7           12          total = 19
            (9.5)       (9.5)
Fail        13          8           total = 21
            (10.5)      (10.5)
            total = 20  total = 20  grand total = 40
Step 4: Now calculate Chi Square using the same formula as before:
χ² = Σ (O - E)^2 / E
...or...
χ² = Sum of (Observed - Expected)^2 / Expected
Calculate this formula for each cell, one at a time. For example, cell #1 (Male/Pass):
Observed number is: 7
Expected number is: 9.5
Plugging this into the formula, you have: (7 – 9.5)^2 / 9.5 = 0.6579
Continue doing this for the rest of the cells.
Step 5: Add together all the final numbers for each cell, obtained in Step 4. There are 4 total cells, so at the end you should be adding four numbers together for your final Chi Square number.
In this case, you should have:
0.6579 + 0.6579 + 0.5952 + 0.5952 = 2.5062
So, 2.506 is our obtained value of Chi-Square; it is a single-number summary of the discrepancy between our obtained frequencies, and the frequencies which we would expect if there was no association between our two variables. The bigger this number, the greater the difference between the observed and expected frequencies.
Step 6: Calculate degrees of freedom (df):
df = (Number of Rows – 1) x (Number of Columns – 1)
   = (2 – 1) x (2 – 1)
   = 1 x 1
   = 1 df (degrees of freedom)
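Steps 2 to 6 can be bundled into one small function. The sketch below is our own illustration (the name chi_square_association is hypothetical); it assumes the table is given as a list of rows of observed counts, and it applies no continuity correction, matching the hand calculation above.

```python
def chi_square_association(table):
    """Return (chi-square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2_value = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Step 2: expected frequency for this cell.
            expected = row_totals[i] * col_totals[j] / grand_total
            # Steps 4 and 5: add this cell's (O - E)^2 / E to the running total.
            chi2_value += (observed - expected) ** 2 / expected

    # Step 6: (rows - 1) x (columns - 1).
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2_value, df


# Rows: Pass, Fail. Columns: Male, Female (counts from the worked example).
chi2_value, df = chi_square_association([[7, 12], [13, 8]])
print(round(chi2_value, 3), df)
```

Because the function keeps full precision for each cell instead of rounding to four decimal places, its total can differ from the hand-calculated sum in the last decimal place.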
Assessing the size of our obtained Chi-Square value:
The procedure here is the same as for the Goodness of Fit test. We just need:
(a) Our "degrees of freedom" (d.f.)
(b) A suitable probability level (p = 0.05 in psychology)
(c) A table of "critical Chi-Square values"
(d) Establish the critical Chi-Square value for this test (at the intersection of the appropriate d.f. row and probability column), and compare to your obtained value.
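For the pass/fail example, points (a) to (d) can be sketched in Python as follows. Two things here are our own additions rather than figures from the text: the critical value of 3.84 is the standard tabled value for 1 d.f. at p = .05, and the shortcut used for the probability only works when there is exactly 1 degree of freedom.

```python
import math

obtained = 2.506      # sum of the four cell values from Step 5
df = 1                # (2 - 1) x (2 - 1) from Step 6
critical_05 = 3.84    # assumed: standard tabled value for 1 d.f. at p = .05

# With exactly 1 d.f., the probability of a chi-square value this large
# or larger under the null hypothesis is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(obtained / 2))

# The two checks always agree: exceeding the critical value means p < .05.
significant = obtained > critical_05
print(significant, p_value < 0.05)
```

Both checks lead to the same decision, because the critical value is simply the Chi-Square value whose probability is .05.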
As before, if the obtained Chi-Square value is bigger than the one in the table, then you conclude that your obtained Chi-Square value is too large to have arisen by chance. This would mean that the two variables are likely to be related in some way. NB - the Chi-Square test merely tells you that there is some relationship between the two variables in question: it does not tell you what that relationship is, and most
Assumptions of the Chi-Square test
For the results of a Chi-Square test to be reliable, the following assumptions must hold true: