Chapter 9
Module 2
For self study: Running tests

How to make scatterplots

In this chapter we pointed out that SPSS can graphically represent the relation between two variables you have studied. Here we give you some assignments you can use to train yourself to do this.

Have a look at the file nao_te_amo.sav and open it in SPSS. Suppose you want to know whether readers, by the time they reach lines 7 and 8, have been ‘programmed’ by the monotony of the lines, so that they basically react to these two lines in the same way. Make a scatterplot for lines 7 and 8 and interpret it in such a way that you can answer this question. How can you make the interpretation easier?
Now do the same for lines 8 and 9. Do you get a similar result? How do you interpret this?

Correlations

One of our students, Ilonka, investigated the relation between children’s ethnic background and their use of television. Open her questionnaire to see what it looked like. In her analyses she wanted to combine the participants’ background variables (questions 4, 5, and 6) into a single “degree of ethnicity”, a three-point scale with scores ranging from 0 to 2. In this assignment we investigate whether Ilonka’s data reveal a statistical association between degree of ethnicity and frequency of “exposure to television”. The results of previous studies indicate a correlation between these two variables, but some researchers disagree. Let us see what Ilonka’s results suggest. Open the SPSS file Ilonka.sav, look through the list of variables she entered, and compare them with the questionnaire.

2.1

First of all, determine which levels of measurement you are dealing with, and explain in your answer how you arrived at this. Which test should you run to examine the correlations?

2.2

Examine the correlation between degree on the one hand and the two variables for exposure to television (daystv and hourstv) on the other. Interpret the results.

Correlations

For this assignment we look at the file Moviezone.sav, which contains the results of a study by two of our students. They were interested in the relation between high school students’ background variables and their appreciation of the arts. Open their questionnaire and look at the types of questions they included.

3.1

Investigate the correlation between subjects’ age on the one hand and their evaluation of the culture and arts education program (variables ckv1, ckv2, ckv3 and ckv4) on the other. Look again at the questionnaire, questions 1 and 4, and consider which correlation coefficient you want to use here, Spearman’s or Pearson’s. Interpret the results.

3.2

Investigate the correlation between the educational level of the subjects (level) and their evaluation of museum visits (museum1, museum2) and of theater performances (theater1, theater2). First consider at what level each of these variables is measured: which test for correlations do you use here? Describe your results.

Key

Assignment 1

We asked you to investigate whether lines 7 and 8 in the ‘poem’ produce essentially the same reaction in readers. The most obvious way to start answering this question is to plot readers’ reactions to one of the lines on the horizontal axis and their reactions to the other line on the vertical axis. That is what a scatterplot does: it shows you whether there is a particular pattern in the data, or whether the responses are just random. Go to GRAPHS, then LEGACY DIALOGS, then SCATTER/DOT, click on SIMPLE, then on DEFINE. Now move the variables for lines 7 and 8 into the boxes for the X and Y axes and click on OK. If you did everything correctly, you will see the following graph on your screen:

As you can see, each reader appears as a dot, with the reaction to line 7 on the horizontal axis and the reaction to line 8 on the vertical axis; these scattered dots are what give the scatterplot its name. If most readers had reacted differently to the two lines, the dots would be all over the place. This is not the case: the dots are more or less grouped in a cloud that runs from the bottom left of the graph to the upper right-hand corner. That is a clear indication of a positive relation: readers who gave line 7 a low evaluation also tended to give low marks to line 8, and readers who gave line 8 high marks also tended to evaluate line 7 highly.

Perhaps you still find this a bit vague to interpret. SPSS can also show you a line that represents the ‘best fit’ through all the points in the scatterplot. Such a line is called a regression line. To obtain it, first produce the scatterplot as before: go to GRAPHS, then LEGACY DIALOGS, then SCATTER/DOT, then SIMPLE and DEFINE. In the window that opens, drag lines 7 and 8 into the boxes for the horizontal and vertical axes, then click on OK. SPSS produces the same scatterplot as before. Now double-click on the graph (this opens the Chart Editor), click on one of the points in the scatterplot, go to ELEMENTS (upper left) and finally click on FIT LINE AT TOTAL. If all went well, you will now see the following result.

As you can see, the regression line now clearly shows a positive correlation (compare Figure 9.5 in the chapter). Moreover, SPSS has calculated the R-square measure for you (it does this automatically when you add a regression line to a scatterplot). Here this measure is 0.92, meaning that 92% of the variability in the answers to line 8 can be explained by the variability in the responses to line 7. That leaves 8% of the differences in responses to be explained by factors other than the response to the other (presumably preceding) line. As you may appreciate, 92% is a very high figure: readers reacted to the two lines in an almost identical way!
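The ‘best fit’ line and the R-square value can also be computed by hand. The Python sketch below computes an ordinary least-squares line y = a + b*x and R-square as the share of the variance in y accounted for by that line, and shows that for a single predictor R-square equals the square of Pearson’s r. The ratings are invented for illustration (they are not the data in nao_te_amo.sav), and the function names are our own.

```python
import math


def least_squares(x, y):
    """Intercept a and slope b of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(x, y):
    """Share of the variance in y accounted for by the least-squares line."""
    a, b = least_squares(x, y)
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_residual / ss_total


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings (scale 1-7) of line 7 and line 8 by eight readers
line7 = [2, 3, 3, 4, 5, 5, 6, 7]
line8 = [2, 2, 3, 4, 4, 6, 6, 7]

a, b = least_squares(line7, line8)
print(f"fit line: line8 = {a:.2f} + {b:.2f} * line7")
print(f"R-square:  {r_squared(line7, line8):.2f}")
print(f"r squared: {pearson_r(line7, line8) ** 2:.2f}")  # same value as R-square
```

With these made-up ratings the fit line rises from bottom left to top right (positive slope), and the last two lines print the same value, about 0.89.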

Assignment 2

You were asked to produce a scatterplot for lines 8 and 9 and again to interpret the results. If you followed the same steps as before, you should see the following graph:

As you can see, this scatterplot is much less clear, and therefore more difficult to interpret. In such a case it makes even more sense to ask SPSS for a graph with the regression line and the R-square measure. Here is what you should have obtained:

You see that there is still some kind of positive correlation between the reactions to the two lines, but the regression line is less steep than for lines 7 and 8, and the dots lie much farther away from it. This means that the relation between the responses to these two lines is much weaker. The R-square measure tells you the same thing: it is 0.11 here, which means that only 11% of the variability in the responses to line 9 can be explained by readers’ reactions to line 8. In other words, 89% has to be explained by other factors (which we do not know). Hence the conclusion must be that there is hardly any relationship between readers’ judgments of lines 8 and 9.
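Because R-square for a single fit line is the square of Pearson’s r, the two R-square values reported in these assignments can be translated back into correlation coefficients, and 1 minus R-square gives the share of the variability left to other factors. A small Python sketch; the sign of r has to be read from the direction of the fit line (positive in both assignments), because squaring loses it.

```python
import math


def r_from_r_squared(r2, positive=True):
    """Pearson's r from R-square; the sign must be taken from the slope of the fit line."""
    r = math.sqrt(r2)
    return r if positive else -r


# R-square values reported in the two scatterplot assignments
for pair, r2 in [("lines 7 and 8", 0.92), ("lines 8 and 9", 0.11)]:
    r = r_from_r_squared(r2)
    print(f"{pair}: R-square = {r2:.2f}, r = {r:.2f}, unexplained = {1 - r2:.0%}")
```

This gives r of about .96 for lines 7 and 8 and about .33 for lines 8 and 9, with 8% and 89% of the variability left unexplained.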

Correlations

2.1

Before we run any tests, we should always determine the level of measurement of the variables involved. Sometimes, however, there is room for doubt about which level we are dealing with. In her study Ilonka constructed a new variable, ‘degree of ethnicity’, from the scores on three other variables. The scores ranged from 0 to 2. You can argue two ways here. First, you could say that it is a measurement at the ratio level, because there is an absolute zero: some participants are not part of an ethnic minority to any degree, and their score on degree is 0.

However, you can also argue that Ilonka’s scale is nothing more than a construct. If there is anything ‘out there’ that we could call “degree of ethnicity”, it is determined by many more factors than she was able to assess with her questionnaire. In that case we are dealing with an ordinal scale.

In our view both positions can be defended, but the first seems a reasonable decision. Of course, things are more complex than the design of the questionnaire suggests, but within the context of the study there are good reasons to treat degree as a ratio-level measurement.

Your decision determines the choice of test. If you follow our advice, you need to use the Pearson correlation. If you decided that the scale is ordinal, you should have used Spearman’s correlation.
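To see why the choice of test matters, the Python sketch below computes both coefficients on the same small data set. Spearman’s rho is computed in the usual textbook way, as Pearson’s r applied to the ranks of the scores, with tied scores sharing the average of their ranks; a 0 to 2 scale like degree produces many ties. The scores are invented for illustration and are not Ilonka’s data; the function names are our own.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n; tied values all get the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next value in sorted order is tied
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the (average) ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical scores: degree of ethnicity (0-2) and hours of television per day
degree = [0, 0, 0, 1, 1, 1, 2, 2]
hours = [1, 2, 1, 2, 3, 2, 3, 8]

print(f"Pearson r    = {pearson_r(degree, hours):.2f}")
print(f"Spearman rho = {spearman_rho(degree, hours):.2f}")
```

With these made-up scores Pearson’s r is about .74 and Spearman’s rho about .87. Pearson’s r uses the actual distances between scores (such as the one viewer with 8 hours), whereas rho only uses their order.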

2.2

In the feedback we concentrate on Pearson. If all went well, you found CORRELATE in the ANALYZE menu and selected BIVARIATE. The dialogue box below shows how you should have instructed SPSS.

Click on OK and SPSS produces the following output.

Correlations

                            Degree of    Days per     Hours per
                            ethnicity    week         day
Degree of ethnicity
   Pearson Correlation      1            .145*        .176**
   Sig. (2-tailed)                       .019         .004
   N                        262          262          262
Number of days per week television watching
   Pearson Correlation      .145*        1            .287**
   Sig. (2-tailed)          .019                      .000
   N                        262          262          262
Number of hours per day
   Pearson Correlation      .176**       .287**       1
   Sig. (2-tailed)          .004         .000
   N                        262          262          262

* Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).

We see, first, that degree of ethnicity correlates positively with both variables for television watching and, second, that both correlations are significant. This means that the higher participants scored on Ilonka’s scale, the more television they tended to watch. We cannot say, of course, that ethnicity causes this behavior, but we can say that there is a statistical relationship. In your final report (e.g., your paper or thesis), it is important that you do not paste in SPSS output: the SPSS tables often take up much more space than necessary. In the present example you report that the Pearson correlation is .145 (p = .019) for the relation between degree of ethnicity and the number of days per week that participants watched television, and .176 (p = .004) for the number of hours per day.

Compare the two results: .145 vs. .176. The two coefficients are similar in size, so degree of ethnicity is about equally strongly related to both measures of television watching. This is what you would expect, since the two variables measure about the same type of behavior. However, the probability of finding a correlation at least this strong if there were no relation in the population is larger for the number of days (1.9%) than for the number of hours per day (0.4%).
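The p-values in the SPSS table can be reproduced from r and N alone. For a Pearson correlation, t = r * sqrt(N - 2) / sqrt(1 - r^2) follows a t distribution with N - 2 degrees of freedom when there is no correlation in the population. The Python sketch below computes t and its two-tailed p-value; because the Python standard library has no t distribution, it integrates the t density numerically with Simpson’s rule, a choice made for this sketch only. Small differences from SPSS can arise because r is rounded to three decimals in the output.

```python
import math


def t_from_r(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule for the area between 0 and |t|."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 <= T <= t)
    return 1 - 2 * central


# Values taken from the SPSS table for Ilonka's data (N = 262, so df = 260)
for label, r in [("days per week", 0.145), ("hours per day", 0.176)]:
    t = t_from_r(r, 262)
    print(f"{label}: r = {r}, t(260) = {t:.2f}, p = {two_tailed_p(t, 260):.3f}")
```

This gives t = 2.36 with p of about .019, and t = 2.88 with p of about .004, in line with the SPSS output.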

Correlations

3.1

For this assignment we first determine the level of measurement of the variables involved. Age is of course measured at the ratio level, and the appreciation of the art classes at the interval level (see Chapter 5). This means that you can use Pearson to see whether these variables correlate. If you ran the test correctly, you will see the following output on your screen.

Correlations

                             age      cae1     cae2     cae3     cae4
age    Pearson Correlation   1        −.032    .017     −.012    .073
       Sig. (2-tailed)                .410     .655     .763     .062
       N                     675      663      657      658      658
cae1   Pearson Correlation   −.032    1        .538**   .735**   .559**
       Sig. (2-tailed)       .410              .000     .000     .000
       N                     663      666      656      656      655
cae2   Pearson Correlation   .017     .538**   1        .653**   .664**
       Sig. (2-tailed)       .655     .000              .000     .000
       N                     657      656      660      656      657
cae3   Pearson Correlation   −.012    .735**   .653**   1        .641**
       Sig. (2-tailed)       .763     .000     .000              .000
       N                     658      656      656      661      655
cae4   Pearson Correlation   .073     .559**   .664**   .641**   1
       Sig. (2-tailed)       .062     .000     .000     .000
       N                     658      655      657      655      661

** Correlation is significant at the 0.01 level (2-tailed).

You can conclude from these data that there is no significant correlation between age on the one hand and the four items for appreciation on the other. This means that you cannot say that the older (or younger) the participants are, the more they appreciate the art classes. If you looked further than the assignment asked, you may have noticed that the four appreciation items do correlate significantly with each other (all p’s < .001). The higher participants score on one of these items, the higher they tend to score on the others.
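Since you should report results in your text rather than paste SPSS tables, it can help to turn the values in the output into short report strings. The Python sketch below is one way to do this; the exact format (‘r = .145, p = .019’) is an assumption, so adapt it to the style guide you are asked to follow. Because SPSS rounds p to three decimals, a displayed .000 only means that p is below .0005, which is why it is reported as p < .001.

```python
def format_p(p):
    """p-value for a report: three decimals, or 'p < .001' for very small values.

    SPSS rounds to three decimals, so a displayed .000 means p < .0005, not p = 0.
    """
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def format_r(r, p):
    """Correlation and p-value in a compact form, e.g. 'r = .145, p = .019'."""
    r_text = f"{r:.3f}".replace("0.", ".", 1)  # drop the leading zero, keep any minus sign
    return f"r = {r_text}, {format_p(p)}"


# Values copied from the SPSS tables in this module
print(format_r(0.145, 0.019))   # degree of ethnicity and days per week
print(format_r(0.538, 0.000))   # cae1 and cae2: SPSS shows .000
print(format_r(-0.032, 0.410))  # age and cae1
```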

3.2

As before, we are mostly dealing with variables at the interval level, which would mean we can use Pearson again. However, what about the education level? Can we say that the distances (in other words, the intervals) between the levels are equal? We have three levels: 4 HAVO, 4 VWO and 5 VWO. The distance between 4 HAVO and 4 VWO is not the same as that between 4 VWO and 5 VWO. In the first case we are dealing with a difference between two education types (HAVO vs. VWO), in the second with two years within one type (4 VWO vs. the next year, 5 VWO). There is no way we can substantiate the claim that the intervals are equal. Since this variable is therefore ordinal, we need to use Spearman’s correlation. If you run the test, SPSS produces the following output.

Correlations

Spearman’s rho
                                             level of
                                             education  museum1    museum2    theater1   theater2
level of education  Correlation Coefficient  1.000      .168**     .158**     .101**     .070
                    Sig. (2-tailed)          .          .000       .000       .009       .072
                    N                        678        665        663        671        660
museum1             Correlation Coefficient  .168**     1.000      .774**     .379**     .371**
                    Sig. (2-tailed)          .000       .          .000       .000       .000
                    N                        665        665        654        662        656
museum2             Correlation Coefficient  .158**     .774**     1.000      .381**     .382**
                    Sig. (2-tailed)          .000       .000       .          .000       .000
                    N                        663        654        663        658        652
theater1            Correlation Coefficient  .101**     .379**     .381**     1.000      .740**
                    Sig. (2-tailed)          .009       .000       .000       .          .000
                    N                        671        662        658        671        656
theater2            Correlation Coefficient  .070       .371**     .382**     .740**     1.000
                    Sig. (2-tailed)          .072       .000       .000       .000       .
                    N                        660        656        652        656        660

** Correlation is significant at the 0.01 level (2-tailed).

As you can conclude from the results, there is a positive relation between the level of education and the appreciation of museum visits as well as of theater. However, for the second theater variable (theater2) the correlation only shows a trend (p = .072) and is not significant at the .05 level. For the other three variables we can say that the higher the level of education, the higher the appreciation. Also look at the values of Spearman’s rho: they lie between .101 and .168. This shows that the relations may be significant, but they are not very strong.
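The argument for using Spearman in 3.2 can be illustrated with a small Python sketch. Spearman’s rho only uses the order of the education levels, so any numeric coding that keeps 4 HAVO below 4 VWO below 5 VWO gives the same rho, whereas Pearson’s r depends on the spacing you assume between the levels. The pupils’ scores and the two codings are invented for illustration and are not the Moviezone.sav data; as in the usual textbook definition, rho is computed as Pearson’s r on average ranks.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n; tied values all get the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the (average) ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical pupils: education level and rating of a museum visit (1-5)
levels = ["4 HAVO"] * 3 + ["4 VWO"] * 3 + ["5 VWO"] * 4
museum = [2, 3, 1, 3, 2, 4, 4, 3, 5, 4]

# Two numeric codings with the same order but different assumed spacing
codings = {
    "equal steps (1, 2, 3)": {"4 HAVO": 1, "4 VWO": 2, "5 VWO": 3},
    "unequal steps (1, 2, 10)": {"4 HAVO": 1, "4 VWO": 2, "5 VWO": 10},
}
for name, coding in codings.items():
    x = [coding[level] for level in levels]
    print(f"{name}: Pearson r = {pearson_r(x, museum):.2f}, "
          f"Spearman rho = {spearman_rho(x, museum):.2f}")
```

Here Pearson’s r drops from about .73 to about .68 when the assumed spacing changes, while rho stays exactly the same, because the ranks do not change.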
