
















In this chapter, you can learn how to test whether the difference between two means is statistically significant. The chapter answers several questions about 1980 GSS young adults, such as whether they were more willing to let controversial persons give public speeches than to let them teach college, and whether employed males worked significantly more hours per week than employed females.
Two types of hypothesis tests are introduced in this chapter. Both check whether a difference between two means is significant. Paired-samples t tests compare scores on two different variables for the same group of cases; independent-samples t tests compare scores on the same variable for two different groups of cases. The chapter also discusses three topics related to hypothesis testing: types of errors, substantive importance, and distributions that are significantly different but overlapping.
Research hypotheses that can be tested using paired-samples t tests include, for example, whether people are more willing to let controversial persons give public speeches than to let them teach college, and whether fathers’ average years of schooling differ from mothers’.
To duplicate the examples in this chapter, use the fourGroups.sav data set. Before doing the first three examples, set the select cases condition to GROUP = 1. For the final example, set the select cases condition to “all cases.”
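If you prefer command syntax to the Data | Select Cases dialog, a minimal sketch of the equivalent commands is shown below. The variable GROUP comes from fourGroups.sav; the filter variable name filter_$ simply mirrors the name SPSS generates by default, and any valid name would work.

* Restrict the first three examples to 1980 GSS young adults (GROUP = 1).
USE ALL.
COMPUTE filter_$ = (GROUP = 1).
FILTER BY filter_$.
EXECUTE.

* For the final example, remove the filter so that all cases are used.
FILTER OFF.
USE ALL.
EXECUTE.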
Step 1: The research hypothesis for our first example is “the average willingness of 1980 young adults to let controversial persons give public speeches was greater than their willingness to let them teach college” (μOKSPEECH > μOKTEACH). The corresponding null hypothesis is “the average willingness of 1980 young adults to let controversial persons give public speeches was the same or less than their willingness to let them teach college” (μOKSPEECH ≤ μOKTEACH). These hypotheses are claims about the means on two variables (OKSPEECH and OKTEACH) for one population (1980 young adults). The sample of OKSPEECH scores and the sample of OKTEACH scores are paired samples because the two samples consist of the same persons. This is a one-tailed hypothesis test since the difference between the means must be sufficiently large and in a particular direction (greater tolerance for speeches than for college teaching) to reject the null hypothesis.

Step 2: The SPSS Paired-Samples T Test procedure provides both the sample means and, should it be needed, the significance level. Pull down the Analyze menu, move the cursor over Compare Means, and click on Paired-Samples T Test.

Analyze | Compare Means | Paired-Samples T Test

A dialog similar to Figure 13.1 appears. Select the two variables whose means will be compared and move them into the “Paired Variables” list. When you select your first variable, it becomes variable1 for pair 1. When you select your second variable, it becomes variable2 for pair 1. Fields open up to specify a second pair, but we will test just one pair at a time. Pay attention to level of measurement! Since the mean for each variable will be calculated, both variables must be interval/ratio. Once you have moved the pair of variables into the “Paired Variables” list, click OK. Figure 13.2 shows the resulting output.

Figure 13.1 Dialog to Produce a Paired-Samples t Test
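For readers who prefer syntax to the menus, a minimal sketch of the equivalent command is shown below. The variable names come from the chapter’s fourGroups.sav examples, and the subcommand values shown are simply the procedure’s defaults. Running it should produce the same output boxes as the dialog.

* Paired-samples t test comparing willingness to allow speeches with willingness to allow teaching.
T-TEST PAIRS=OKSPEECH WITH OKTEACH (PAIRED)
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.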
Figure 13.2 Paired-Samples T Test Output for OKSPEECH and OKTEACH for 1980 GSS Young Adults

The “Paired Samples Statistics” box shows the mean for each variable. These means are based on the 293 persons in the data set with valid scores on both variables. Cases without valid scores on one or both variables are dropped from the analysis. In the sample, the average tolerance for giving a public speech is greater than for teaching college. This is consistent with the research hypothesis. So, we proceed to Step 3.

Step 3: Skip over the second box of output, the “Paired Samples Correlations” box. The information in that box is not relevant for a hypothesis about means. The “Paired Samples Correlations” box is testing a hypothesis about the correlation between the two variables. The significance level you see in this second box is not the one we want.

The significance level we are looking for is in the “Paired Samples Test” box. This appears as a single long box on a computer screen but has been divided into two parts for Figure 13.2. The significance level appears at the extreme right of the box. It is a two-tailed significance. Since we are testing a one-tailed hypothesis, the two-tailed significance must be divided in half, but, of course, .000/2 is still .000.

Before going on to Step 4, however, a few words should be said about the other information in the “Paired Samples Test” box. The degrees of freedom for the hypothesis test are to the left of the significance level.
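For reference, the t statistic reported in the “Paired Samples Test” box is based on the difference scores for each pair. A standard statement of the formula (background material, not something printed in the SPSS output) is

t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad df = n - 1,

where \bar{d} is the mean of the paired differences, s_d is their standard deviation, and n is the number of valid pairs. With the 293 valid pairs in this example, df = 292.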
Figure 13.3 Paired-Samples T Test Output for PAEDUC and MAEDUC for 1980 GSS Young Adults
Back in Chapter 4, the mean for MAEDUC was reported to be 11.41 years, and the mean for PAEDUC was 11.59 years. These are not the means appearing in Figure 13.3. What is going on? The difference results from how cases with missing data are handled. The Descriptives procedure back in Chapter 4 by default used every valid case to calculate the statistics for each variable. All 301 cases with valid data on MAEDUC were used to calculate its mean, and all 256 cases with valid data on PAEDUC were used to calculate its mean. The Paired-Samples T Test procedure, however, can only use cases that have valid values on both of the variables being compared. Only 246 cases in the data set had valid information on both MAEDUC and PAEDUC, and those are the cases being used for this procedure. Dropping the cases with valid data on just one of the two variables accounts for the shift in means.
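A sketch of syntax that should reproduce the analysis in Figure 13.3 appears below. The /MISSING=ANALYSIS subcommand is the default behavior described above: each pair is tested using only the cases that have valid values on both variables in that pair.

* Paired-samples t test comparing fathers' and mothers' years of schooling.
T-TEST PAIRS=PAEDUC WITH MAEDUC (PAIRED)
  /MISSING=ANALYSIS.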
Without looking back, can you answer the following questions:
With a little bit of extra work, every paired-samples t test can be reduced to a one-sample t test. It was pointed out that SPSS actually computes a new variable, a difference score, whenever it does a paired-samples t test. It then tests the null hypothesis using this new variable. Well, we could do the same thing. Our first example of paired-samples t testing had as a research hypothesis that the average willingness of 1980 young adults to let controversial persons give public speeches was greater than their willingness to let them teach college. We could have calculated a new variable using the Compute procedure. It might be named DIFF and calculated as OKSPEECH minus OKTEACH. In terms of this new variable, our research hypothesis would be μDIFF > 0 and our null hypothesis would be μDIFF ≤ 0. Testing this with a one-sample t test would produce values for the t statistic, degrees of freedom, and significance level that are identical to the values produced with the paired-samples t test procedure. Using the paired-samples t test procedure simply saves the effort of computing a new variable!
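A minimal syntax sketch of this equivalence, assuming the new variable is named DIFF as in the text:

* Compute the difference score, then test whether its mean differs from zero.
COMPUTE DIFF = OKSPEECH - OKTEACH.
EXECUTE.
T-TEST
  /TESTVAL=0
  /VARIABLES=DIFF.

The t statistic, degrees of freedom, and two-tailed significance reported for DIFF should match the “Paired Samples Test” results in Figure 13.2.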
Like paired-samples t tests, independent-samples t tests also test hypotheses about differences between two means; however, the means are for the same variable but for two different populations. Research hypotheses such as “husbands average more hours of sleep per night than wives” or “employed males average more hours of work per week than employed females” would use independent-samples t tests. (In the mathematical statements of these hypotheses, subscripts identify the two populations being compared.)
The research hypothesis “husbands average more hours of sleep per night than wives,” however, provides no evidence that the husbands and wives are necessarily married to one another. Someone took a sample from the adult population. Everyone in the sample was asked how many hours per day they typically sleep. Some of the persons in the sample were married men; some were married women. The married men in the sample can be viewed as a random sample of all married men in the population. The married women in the sample can be viewed as a random sample of all married women in the population. There is no basis for linking up scores on SLEEP reported by individual husbands with scores on SLEEP reported by individual wives. The situation calls for independent-samples t testing. Unless there is clear evidence that there is a basis for pairing up individual scores, assume you have independent samples.

By the way, there may have been others in the sample besides married men and married women, for example, persons who have never married or persons who are divorced or widowed. Since independent-samples t tests only permit comparisons of two groups, these other cases would be excluded from the analysis.
“Was the number of hours worked per week by employed 1980 GSS young adults significantly higher for males than for females?”

Step 1: The research hypothesis for this question is “the average number of hours worked per week by employed 1980 young adults was higher for males than for females” (μmales > μfemales on HOURS). The null hypothesis is “the average number of hours worked per week by employed 1980 young adults was the same or lower for males than for females” (μmales ≤ μfemales on HOURS). These hypotheses require an independent-samples t test. They are claims about the mean on one variable (HOURS) for two populations (male 1980 young adults and female 1980 young adults). No particular reason exists for pairing up particular male answers with particular female answers. This is a one-tailed hypothesis test since only sample results in which the male average is higher than the female average could lead to the rejection of the null hypothesis.

Step 2: To do an Independent-Samples T Test procedure, pull down the Analyze menu, move the cursor over Compare Means, and click on Independent-Samples T Test.

Analyze | Compare Means | Independent-Samples T Test

Figure 13.4 shows the dialog that appears. Move the variable on whose mean the groups will be compared into the “Test Variable” list. Since means will be calculated for this variable, it must be interval/ratio. Next, the two groups to be compared must be identified. Select the variable whose attributes will define the two groups and move that variable into the “Grouping Variable” field. The grouping variable can be any level of measurement. As soon as a variable is moved into that field, a set of parentheses containing two question marks appears after the name of the variable.
SPSS wants to know which codes on this variable identify the two groups to be compared. Even if the variable is a dichotomy and has just two codes, those codes must be specified. Click on “Define Groups.” (The button only becomes active once a grouping variable has been specified.) The dialog in Figure 13.5 appears.

The two groups can be defined by specifying exact values or by designating a cut point. When specifying exact values, enter in the “Group 1” field the value that cases must have to be in the first group, and enter in the “Group 2” field the value that cases must have to be in the second group. For this example, 0 was entered for Group 1 since males are coded 0 on the variable SEX, and 1 was entered for Group 2 since females are coded 1. Designating a cut point will also define two groups to be compared. If the grouping variable were years of schooling, for example, and 13 was entered as the cut point, the first group would include all valid cases with less than 13 years of schooling, and the second group would include all valid cases with 13 or more years of schooling.

After defining the groups, click “Continue” to return to the initial independent-samples t test dialog. The question marks behind the grouping variable are now replaced by the codes that define the groups. Click OK to produce output similar to Figure 13.6.

The first box of output, the “Group Statistics” box, shows the sample means. The mean number of hours worked reported by employed males was 43.66, and the mean reported by employed females was 39.76. The sample results are consistent with the research hypothesis that predicted the mean for males would be higher than the mean for females. We proceed to Step 3.

Figure 13.4 Dialog to Produce an Independent-Samples t Test
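For readers who prefer syntax, a minimal sketch of this test using the SEX codes described above (0 = male, 1 = female) would look like the following.

* Independent-samples t test: compare mean HOURS for males (0) and females (1).
T-TEST GROUPS=SEX(0 1)
  /VARIABLES=HOURS
  /CRITERIA=CI(.95).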
Step 3: This step uses the “Independent Samples Test” output box, which appears on a computer screen as a single long box but has been broken into two parts to better fit in Figure 13.6. Notice that this box contains two lines of results for the independent-samples t test: there are two values for the t statistic, for degrees of freedom, and for the significance level. Step 3 for independent-samples t tests therefore includes an additional decision because you must decide which result line is the correct one to use.

The reason for two sets of results is that there are two ways of estimating the standard error of the sampling distribution for independent-samples t tests. One method assumes that the variance in the male population on number of hours worked is exactly equal to the variance in the female population on number of hours worked. The other method does not assume the population variances are equal. Whether the variances in the two populations are the same or different affects the calculation of the standard error, the value of the t statistic, the degrees of freedom, and the probability or significance level.

To reach a conclusion about whether the variances are or are not equal, an entirely separate hypothesis test is done with the null hypothesis that the variances are equal. What appears in the output under the heading “Levene’s Test for Equality of Variances” are the results of that separate hypothesis test. If the significance level (“Sig.”) of Levene’s test is .05 or less, use the second line of t test results, the “Equal variances not assumed” line. If the significance level of Levene’s test is greater than .05, use the first line of t test results, the “Equal variances assumed” line. Sometimes the two methods of estimating the standard error give you almost identical results (as they do in Figure 13.6), and sometimes they do not. Always check the significance of Levene’s test and then use the correct line of t test results.

In Figure 13.6, the significance level of Levene’s test is .668. Since this is greater than .05, the first line of results is used. The two-tailed significance level for the t test for equality of means is .041. Since this is a one-tailed hypothesis, we want the one-tailed significance level, which is .041/2, or .0205. (Make sure you leave Step 3 with the significance level for the t test for equality of means. Do not leave Step 3 with the significance for Levene’s test. Once the significance of Levene’s test tells you which row of output to use, you are done with it.)

How degrees of freedom are calculated for independent-samples t tests depends on whether equal variances are or are not assumed. When equal variances are assumed, degrees of freedom are simply N minus 2. A more complex formula is used when equal variances are not assumed.

The “Independent Samples Test” box also shows you the mean difference, the standard error of the difference, and the 95% confidence interval of the difference. The mean difference is always calculated by subtracting the sample mean for Group 2 from the sample mean for Group 1. If Group 1 has the higher mean, the mean difference will be positive; if Group 2 has the higher mean, the mean difference will be negative. In the sample data, employed males report an average of 3.90 more hours of work per week than employed females.

Step 4: Since the probability is .0205, we reject the null hypothesis.
Among the 1980 GSS young adults, employed males did work significantly more hours per week than employed females.
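The “more complex formula” mentioned in Step 3 is the Welch–Satterthwaite approximation for the degrees of freedom; it is standard statistical background rather than something printed in the SPSS output. With sample variances s_1^2 and s_2^2 and sample sizes n_1 and n_2,

df \approx \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\dfrac{\left( s_1^2/n_1 \right)^2}{n_1 - 1} + \dfrac{\left( s_2^2/n_2 \right)^2}{n_2 - 1}},

which is why the degrees of freedom on the “Equal variances not assumed” line are usually not a whole number.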
“Was there a significant difference between the employed 1980 GSS young adults and the employed 1980 GSS middle-age adults in the number of hours worked per week?”
The null hypothesis is “the average number of hours worked per week by employed 1980 young adults and by employed 1980 middle-age adults is the same” (μ1980 young adults = μ1980 middle-age adults on HOURS). The research hypothesis is “the average number of hours worked per week by employed 1980 young adults and by employed 1980 middle-age adults is different” (μ1980 young adults ≠ μ1980 middle-age adults on HOURS). Hypotheses about a difference in means for one variable (HOURS) in two populations (1980 young adults and 1980 middle-age adults) call for an independent-samples t test. This is a two-tailed hypothesis test because a sample difference in either direction could lead to rejecting the null hypothesis.

For setting up the independent-samples t test, the test variable is HOURS and the grouping variable is GROUP. Codes 1 (1980 GSS young adults) and 2 (1980 GSS middle-age adults) define the comparison groups. Cases with codes other than 1 or 2 on GROUP are excluded from the analysis. (Since GROUP is the variable used to identify the comparison groups, Select Cases was set to “All cases.”) The output appears in Figure 13.7.

Figure 13.7 Independent-Samples T Test Output for HOURS by GROUP (1, 2)
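A minimal syntax sketch for this comparison, assuming any previously set filter has been removed so that all cases are available:

* Make sure no filter is active, then compare mean HOURS across GROUP codes 1 and 2.
FILTER OFF.
USE ALL.
T-TEST GROUPS=GROUP(1 2)
  /VARIABLES=HOURS.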
You would have to know what row and what column you are in before you could determine if you made an error or not. But in reality, a researcher never knows what column she is in. To know if the research hypothesis is actually true or actually false, you would have to do a census, which the researcher has not done. She is working with a sample and, based on that sample, rejects or does not reject the null hypothesis. In other words, she knows which row in Figure 13.8 she is in but not which column. When she rejects the null hypothesis, she knows she is either making a correct decision or committing a Type I error. When she does not reject a null hypothesis, she knows she is either making a correct decision or committing a Type II error.

While it is true that when you reject the null hypothesis, you do not know if you are making a correct decision or committing a Type I error, you do know the risk you are taking of making a Type I error. The probability of making a Type I error when rejecting the null hypothesis is equal to the significance level that comes from Step 3 of hypothesis testing. The smaller the significance level is, the less likely that you are rejecting a null hypothesis that is actually true, which also means the less likely that you are committing a Type I error and the greater the likelihood you are making a correct decision.

As noted earlier, science is particularly concerned about rejecting null hypotheses that are actually true. In other words, it is very reluctant to commit Type I errors. That is why the greatest risk we will take of committing a Type I error is .05. But here comes the dilemma! The greater the evidence we require before rejecting a null hypothesis, the greater the risk we run of committing a Type II error. Both are errors, and both are to be avoided, but rejecting a true null hypothesis (Type I error) is considered a graver problem than not rejecting a false null hypothesis (Type II error). Better to proceed slowly but correctly than rapidly but uncertainly. Null hypotheses not rejected now can always be rejected later if the evidence becomes stronger, but null hypotheses rejected now rarely get further examined.

Whatever decision you make at Step 4, you may be making a correct decision or an error. The significance level indicates the risk of making a Type I error if the null hypothesis is rejected. Usually, social scientists refuse to accept more than a 5% chance of making a Type I error.

Figure 13.8 Hypothesis-Testing Conclusions by Actual Population Conditions

                                              If the researcher did a census, he or she would find:
Based on hypothesis testing using             The null hypothesis          The null hypothesis
sample results, the researcher                is true                      is false
Rejected the null hypothesis                  Type I error                 Correct decision
Did not reject the null hypothesis            Correct decision             Type II error
Rules for the admissibility of evidence in American courts are stringent. Furthermore, jurors are instructed to render a guilty verdict only if they are confident “beyond a reasonable doubt.” The system is designed to safeguard against convicting innocent persons. That is the good news. But those same stringent rules of evidence and the requirement that the evidence against a defendant be overwhelming also mean that more guilty persons will get off. That is the bad news.

Just like in the system of justice, so also in inferential hypothesis testing, the greater the effort you make to avoid one type of error, the greater the probability of making the other type of error. In the American justice system, it is considered a more serious error to convict an innocent person than to not convict a guilty one. That is why the rules for evidence are so strict and the requirements for rejecting the assumption of innocence so high. Just like in the justice system, the two types of error are not considered equally bad in inferential hypothesis testing. The graver error is to reject a null hypothesis that is actually true, and to avoid that error, analysts are willing to take on an increased risk of not rejecting a null hypothesis that is actually false.
When testing a hypothesis, data analysts can tell you if the results are statistically significant. When an analyst says the results are statistically significant, he is saying that he rejects the null hypothesis; he is saying the results support the research hypothesis. He is not saying the results are substantively important.

In the first example of independent-samples t testing, we found that the average number of hours worked per week by employed 1980 GSS young adult males (43.66 hours) was significantly higher than the average hours worked per week by employed 1980 GSS young adult females (39.76 hours)—a difference of 3.90 hours. The difference is statistically significant, but is the difference important? Will a difference that size affect how important work is in a person’s definition of self? Will a difference that size impact the allocation of responsibilities within the family? Does a difference that size represent prima facie evidence of workplace discrimination?

If all the analyst knows is that the difference is statistically significant, then the proper answer to these questions is “I do not know.” When you ask about importance, you are asking a substantive question, not a statistical question. The data analyst can tell you with a specified degree of confidence what the difference is between men and women on hours worked per week, but the analyst cannot tell you whether that difference matters.

So, is the data analyst unimportant? Hardly! Unless statistical significance is established, it remains unclear whether the observed sample differences actually exist in the population. Before someone starts talking about the consequences of gender differences in hours worked, he wants to be very confident that those differences aren’t just sampling error. To find that out, he needs the data analyst.
Figure 13.9 Separate Male and Female Histograms for HOURS Worked by 1980 GSS Young Adults (each panel plots frequency against the number of hours worked last week)
The chapter began with some questions about 1980 GSS young adults. On the basis of our analyses, what do we now know?
Key terms: central tendency, dispersion, independent samples, Independent-Samples T Test procedure, paired samples, Paired-Samples T Test procedure, statistical significance, substantive importance, Type I error, Type II error