Chapter 8
Module 2
For self-study: Running tests
Assignment 8.2 (Assignment 2)
Read through the data file for MovieZone first to familiarize yourself with the variables. Do the following calculations. In your report please only include the essential part of the SPSS output. Moreover, describe in words (full sentences) what you have found and how to interpret the results. Do realize, however, that sometimes there is not much interpretation required. We are still dealing with basic statistical procedures that help you describe the data.
Calculate the (a) mean, (b) median, (c) mode, (d) range and (e) standard deviation for the variable age (see section 8.7 of Chapter 8 for instructions). Use the definitions of these terms in Chapter 8 to describe your findings. Answer the following questions: (f) what would be the use of describing this part of your data, and (g) what is the advantage of the different measures of central tendency?
Find out what the distribution of male and female subjects is in this data set.
Imagine we want one measure for subjects’ evaluation of ckv-education as opposed to the four we have now (ckv1, ckv2, ckv3 and ckv4). How to go about doing this is discussed in Chapter 7. However, it is important to first see whether it makes sense bringing these variables together in one new variable (see section 8.9 of the chapter). Conduct a reliability analysis. Describe the results, and interpret them.
We want to have a general measure for subjects’ interest in art house movies. The last four questions in the questionnaire were intended to obtain information about subjects’ attitudes towards these movies (arthouse, filmint, visit, curious). Conduct a reliability analysis on these four variables. Describe and explain your findings. What conclusion can you draw about the next step in your analyses?
Examine the relation between gender on the one hand and subjects’ enjoyment of museum visits (museum1) on the other. Describe the results using the section on Boxplots in Chapter 8.
Examine the relation between subjects’ gender and their interest in art house movies (filmint). Again describe the SPSS-output using Chapter 8.
Hoping all went well, we expect that you did the following:
This has resulted in the following information about the variable age:
| Statistics: AGE | | |
|---|---|---|
| N | Valid | 675 |
| | Missing | 5 |
| Mean | | 16,0667 |
| Median | | 16,0000 |
| Mode | | 16,00 |
| Std. Deviation | | ,80797 |
As you can see, there are 5 participants for whom we do not have information about their age (Missing). In total, 675 participants did enter their age. The mean is 16.1, the median is 16, and the mode is 16. What we can conclude from this is that there are few strong differences within the group. In other words, it is rather homogeneous. This is confirmed by the standard deviation, which is rather small.
If you followed the instruction you also had the following table in your output:
| AGE | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | 14,00 | 2 | ,3 | ,3 | ,3 |
| | 15,00 | 161 | 23,7 | 23,9 | 24,1 |
| | 16,00 | 333 | 49,0 | 49,3 | 73,5 |
| | 17,00 | 149 | 21,9 | 22,1 | 95,6 |
| | 18,00 | 29 | 4,3 | 4,3 | 99,9 |
| | 19,00 | 1 | ,1 | ,1 | 100,0 |
| | Total | 675 | 99,3 | 100,0 | |
| Missing | System | 5 | ,7 | | |
| Total | | 680 | 100,0 | | |
As you can see, there were two relatively young participants of 14 years old and one participant of 19 years old. The distribution of the variable age seems to be approximately normal: look at the column marked Percent and see how the values rise and then fall again as you move down.
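The summary statistics for age can be checked by hand against the frequency table. The following is a minimal Python sketch, not part of the SPSS procedure. It assumes only the valid frequencies reported in the table, and the variable names are illustrative. It also yields the range, which is not listed in the output shown here.

```python
import statistics

# Valid frequencies for AGE, copied from the frequency table.
age_counts = {14: 2, 15: 161, 16: 333, 17: 149, 18: 29, 19: 1}

# Expand the table into one value per participant (valid cases only).
ages = [age for age, count in age_counts.items() for _ in range(count)]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
mode_age = statistics.mode(ages)
age_range = max(ages) - min(ages)
# Sample standard deviation (divides by n - 1).
sd_age = statistics.stdev(ages)

print(f"N = {len(ages)}")
print(f"mean = {mean_age:.4f}, median = {median_age}, mode = {mode_age}")
print(f"range = {age_range}, sd = {sd_age:.5f}")
```

The mean, median, mode and standard deviation agree with the Statistics table, which suggests that the reported standard deviation is the sample version that divides by n - 1.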
1.2. You were asked to find out what the distribution of male and female subjects is in this data set. If all went well, this is probably the first table you saw in your output.
| Statistics: GENDER | | |
|---|---|---|
| N | Valid | 674 |
| | Missing | 6 |
We advise you not to include tables like these in your report. However, it does contain information that you need to include in your text itself: there are six cases missing, which means that six participants did not answer this question. When you see this in your own analysis and the variable that you describe is important, you may want to check which cases are incomplete. Try this: right-click on the field marked gender. Choose ascending or descending from the menu that appears. This will enable you to locate the cases without information about gender. The case numbers will (hopefully) refer to the questionnaire numbers. If you had these available, you could now easily check in the pile of questionnaires whether the participants did not answer the question, or whether the person who entered the data made a mistake.
| GENDER | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | male | 312 | 45,9 | 46,3 | 46,3 |
| | female | 362 | 53,2 | 53,7 | 100,0 |
| | Total | 674 | 99,1 | 100,0 | |
| Missing | System | 6 | ,9 | | |
| Total | | 680 | 100,0 | | |
Above you see the table that you needed. As you can see, 312 male and 362 female students participated in this study. For all reports in our field this is important information to include in your text and/or in a table.
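The difference between the Percent and Valid Percent columns is only a matter of the base used for the division. A small Python sketch, using the counts from the gender table and illustrative variable names, makes this explicit:

```python
# Counts copied from the GENDER frequency table.
counts = {"male": 312, "female": 362}
n_missing = 6

n_valid = sum(counts.values())  # cases with an answer
n_total = n_valid + n_missing   # all cases in the data file

table = {}
for label, count in counts.items():
    table[label] = {
        # Percent: share of all cases, including the missing ones.
        "percent": round(100 * count / n_total, 1),
        # Valid Percent: share of the cases that answered the question.
        "valid_percent": round(100 * count / n_valid, 1),
    }
    print(label, table[label])
```

When you report the gender distribution in your text, it is usually the valid percentage that you want, together with a mention of the number of missing cases.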
If you conducted the test properly, this is what you will see in your output:
| Statistics for | Mean | Variance | Std Dev | N of Variables |
|---|---|---|---|---|
| SCALE | 11,2144 | 14,8681 | 3,8559 | 4 |

Item-total Statistics

| | Scale Mean if Item Deleted | Scale Variance if Item Deleted | Corrected Item-Total Correlation | Alpha if Item Deleted |
|---|---|---|---|---|
| CKV1 | 7,9525 | 8,5944 | ,6945 | ,8493 |
| CKV2 | 8,6309 | 9,0798 | ,7058 | ,8437 |
| CKV3 | 8,3017 | 8,3644 | ,7972 | ,8066 |
| CKV4 | 8,7580 | 8,8463 | ,7089 | ,8423 |

Reliability Coefficients

| N of Cases = 653,0 | N of Items = 4 | Alpha = ,8716 |
|---|---|---|
The alpha of .87 indicates that we have a reliable scale. Looking at the last column, what does this table tell us about the individual items? Would it be useful to exclude one of these items? In each case, excluding the item would decrease the value of alpha, so it seems a good idea to use all these variables in a new scale.
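To see what a reliability analysis computes, here is a minimal Python sketch of Cronbach's alpha using the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the sum score). The function names and the six hypothetical respondents are assumptions made for illustration. They are not the MovieZone data, and SPSS remains the tool to use for the assignment.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of items, each a list of scores per respondent."""
    k = len(items)
    # Sum score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(statistics.variance(item) for item in items)
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical scores of six respondents on four items (NOT MovieZone data).
items = [
    [1, 2, 3, 3, 4, 5],
    [2, 2, 3, 4, 4, 5],
    [1, 3, 3, 4, 5, 5],
    [2, 1, 3, 3, 4, 4],
]

print(round(cronbach_alpha(items), 3))
print([round(a, 3) for a in alpha_if_item_deleted(items)])
```

The second function mirrors the "Alpha if Item Deleted" column: if leaving an item out gives a higher alpha than the full scale, that item weakens the scale.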
If all went well, the output shows a disaster. The alpha is spectacularly low: .38. The last column, however, shows you what to do. If you exclude the first item, alpha increases considerably (to .68, which is above the criterion of .65). Also take a look at the way the questions were formulated (see Question 25 and compare it with 26, 27, and 28). Maybe this will help you explain the outcome, and maybe you have a suggestion for how to improve the scale.
| Statistics for | Mean | Variance | Std Dev | N of Variables |
|---|---|---|---|---|
| SCALE | 13,0693 | 7,3459 | 2,7103 | 4 |

Item-total Statistics

| | Scale Mean if Item Deleted | Scale Variance if Item Deleted | Corrected Item-Total Correlation | Alpha if Item Deleted |
|---|---|---|---|---|
| ARTHOUSE | 8,4922 | 8,3962 | −,3626 | ,6843 |
| FILMINT | 10,6205 | 4,5553 | ,3052 | ,2046 |
| VISIT | 9,9185 | 3,6097 | ,4767 | −,0489 |
| CURIOUS | 10,1768 | 3,3680 | ,4581 | −,0522 |

Reliability Coefficients

| N of Cases = 577,0 | N of Items = 4 | Alpha = ,3827 |
|---|---|---|
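Reading the last column of this output comes down to a simple comparison, sketched here in Python. The numbers are copied from the output, with decimal commas written as points. The variable names are illustrative, and the .65 threshold is the criterion mentioned in the text.

```python
# Values copied from the reliability output for the four art house items.
alpha_all_items = 0.3827
alpha_if_deleted = {
    "ARTHOUSE": 0.6843,
    "FILMINT": 0.2046,
    "VISIT": -0.0489,
    "CURIOUS": -0.0522,
}
CRITERION = 0.65  # threshold for an acceptable scale, as used in the text

# Items whose removal would raise alpha above its current value.
candidates = {item: a for item, a in alpha_if_deleted.items() if a > alpha_all_items}

# The single item whose removal gives the highest alpha.
best_item = max(alpha_if_deleted, key=alpha_if_deleted.get)
meets_criterion = alpha_if_deleted[best_item] >= CRITERION

print(candidates)
print(best_item, meets_criterion)
```

Only one item qualifies, and it is also the only item with a negative corrected item-total correlation. A negative correlation of this kind often points to a question that was worded in the opposite direction from the others, which is worth checking in the questionnaire before you decide on the next step.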
Here are the box plots you wanted:
There is a good chance that you got something that does not quite look like the figure presented here. Maybe you had the missing values as a separate category. To prevent this, use the following procedure:
Select “Simple” and make sure that Summaries for groups of cases is selected. Click on Define. This opens the following Dialogue box.
Select museum1 as variable, and gender for Category axis. Click on Options and this is what you will see.
Make sure that the box for Display groups defined by missing values is not ticked! Now click Continue and then OK.
Let us return to the output. The box plots for men and women look much the same. The middle 50% of the male part of the sample lies between 2 and 3. The same holds for the females. For both male and female participants, the observed values that are not outliers cover the same range: 1–4. None of this is very strange, considering that the scale we are looking at runs from 1 to 5. We do see a number of outliers, that is, values that are more than 1.5 box lengths removed from the 75th percentile. In this situation there is no reason to consider removing these cases from your data set. What is interesting is that the median for female participants is higher than that for the men in the sample. As described in the text of Chapter 8, the median is a much more informative measure of central tendency than the mean.
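The elements of a box plot (box, whiskers, outliers) follow from a few simple calculations. The Python sketch below illustrates the 1.5 box-length rule on hypothetical scores. The function name and the data are assumptions, not MovieZone output, and SPSS may compute the box edges with a slightly different quartile rule, so small differences from its figures are possible.

```python
import statistics

def box_summary(scores):
    """Quartiles, whisker ends and outliers according to the 1.5 x box-length rule."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    box_length = q3 - q1  # the interquartile range
    lower_fence = q1 - 1.5 * box_length
    upper_fence = q3 + 1.5 * box_length
    # Whiskers run to the most extreme values that are still inside the fences.
    inside = [s for s in scores if lower_fence <= s <= upper_fence]
    outliers = sorted(s for s in scores if s < lower_fence or s > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical answers on a 1-5 scale (NOT MovieZone data).
scores = [1, 2, 2, 2, 2, 3, 3, 3, 5]
summary = box_summary(scores)
print(summary)
```

In this hypothetical example the box runs from 2 to 3, so any score above 4.5 is flagged. This shows why a score of 5 can appear as an outlier on a five-point scale even though it is a perfectly legitimate answer.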
Follow the same procedures and this is what SPSS will show you.
What this graph shows you is that the median is the same for male and female subjects. However, the rest looks different. We see that the females have some outliers (scoring 5), but for the men 5 is the largest observed value that is not an outlier. There seems to be a stronger consensus among the women: the spread among the men is larger. The middle 50% of the men score between 1 and 3, whereas for the women it is between 2 and 3 (or rather, 2 or 3, because the scale that was used does not allow for scores between these numbers). On the whole, the graph suggests that the men were less interested in art cinema than the women.