Examples Using SPSS for Windows
(Note: These examples were written for version 8.0 - As soon as I can, I will update these examples to version 10.0.)
Example for section II. When you first enter SPSS for Windows, you should see the blank data window. The size for any window can be changed to suit your own preferences. The data window is organized like a spreadsheet, where you see that rows are numbered 1, 2, 3, etc., and columns are each labeled var. Data can be entered into each cell by simply using the arrow keys or the mouse to select the desired cell, typing the desired data, and pressing either the Enter key or an arrow key.

Enter the data shown in the figure on the right. As a general rule, data files are organized so that rows represent different cases (i.e., different items in a sample), and columns represent different variables. Note that in the data shown in the figure on the right, the first column is named var00001, and the second column is named var00002. To give the variables (i.e., the columns) more reasonable names, we must first know what variables were recorded to produce this data set. The ones (1.00s) and twos (2.00s) in the first column are used respectively to distinguish between two varieties of corn: variety V and variety W. The values in the second column are yields of corn in bushels per acre.
To name and define categories for the first variable, double click on the name var00001 after which the Define Variable dialogue box will be displayed. In the Variable Name slot of this dialogue box, type the name variety. To define the labels for the two categories of this variable, click on the Labels button after which the Define Labels dialogue box will be displayed. Note that in the Value Labels section of this dialogue box there is a Value slot and a Value Label slot.
In the Value slot type a 1, and in the Value Label slot type a V. Then click on the Add button, and observe that 1.00="V" appears in the section at the bottom of the dialogue box. Now, type a 2 in the Value slot, and type a W in the Value Label slot. Then click on the Add button, and observe that 2.00="W" appears in the section at the bottom of the dialogue box. At the top of the dialogue box, there is a Variable Label slot, which allows you to store a longer more meaningful name for the variable; type Variety of Corn. Click on the Continue button to return to the Define Variable dialogue box, and then click on the OK button to return to the data window. You should now see that the first column has been renamed variety and that Vs and Ws are displayed in the first column in place of the 1.00s and 2.00s.
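The idea behind value labels can be sketched outside SPSS: the stored data remain the numeric codes, and the labels act only as a display mapping. The following Python sketch is an illustration, not SPSS syntax. The list of codes is hypothetical (the data from the figure are not reproduced here), and the names value_labels, codes, and displayed are assumptions made for this example.

```python
# Display mapping from stored numeric codes to value labels (1 -> V, 2 -> W).
value_labels = {1: "V", 2: "W"}

# Hypothetical codes for the variety column; not the data from the figure.
codes = [1, 1, 2, 2, 1]

# The stored values stay numeric; only the displayed form changes.
displayed = [value_labels[c] for c in codes]
print(displayed)
```

Because the codes themselves are untouched, any analysis still works with the values 1 and 2, while tables and the data window show V and W.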

To name the second variable, double click on the name var00002 after which the Define Variable dialogue box will be displayed. In the Variable Name slot of this dialogue box, type the name yield. There is no need to define categories for this variable; however, since the yield measurements were recorded to one-decimal-place accuracy, we want the data to be displayed with only one decimal place. To accomplish this, click on the Type button after which the Define Variable Type dialogue box will be displayed. Note that there is a Decimal Places slot in the right portion of this dialogue box. Enter a 1 in the Decimal Places slot. Click on the Continue button to return to the previous dialogue box, and then click on the OK button to return to the data window. You should now see that the second column has been renamed yield and that the yields are displayed with only one decimal place.
In order to save this data file, use either the File>Save Data options or the File>Save As options. Suppose you decide to name the data file variety.sav. Since this is the first time you are saving this data file, no previous version exists; as a result, you must specify both the drive (the a: drive, for example) and the directory. You can use the mouse to click on the appropriate options in order to select the drive and directory, or you can just type the appropriate name in the slot. Type a:\variety.sav, and click on the Save button to save the data in the main directory of the disk in the a: drive.
You should be aware that .sav is the default extension for data files in SPSS; even though you do not have to use this extension, it may be a good idea to do so since this is the type of data file SPSS will automatically search for, and since this will make it easy for you to distinguish your SPSS data files from other types of files.
Note: To proceed with this example, you need a copy of the data file named survey.sav which you may obtain from the network in the folder Server_1\SYS\Apps\Fac_prgs\Sprechini .
Suppose you want to retrieve a copy of the SPSS data file survey.sav which has been previously created and saved on the disk in the a: drive. Choose File>Open from the main menu. You must specify both the drive and the directory as well as the file name. You can use the mouse to click on the appropriate options in order to select the drive, directory, and file name, or you can just type the appropriate name in the slot. Type a:\survey.sav, and click on the Open button, after which you should see that there are 30 rows and 10 columns of data, the first 11 lines of which are displayed in the figure titled a:\survey.sav.
a:\survey.sav

The first variable, named idno, is just a two-digit identification number for each person in a sample. Since we do not want to include such a variable in any statistical analysis, we have defined this variable to be a String type variable. To see this, double click on the name idno after which the Define Variable dialogue box will be displayed. Click on the Type button after which the Define Variable Type dialogue box will be displayed. You should now see a list of several different choices for types of variables: Numeric, Comma, Dot, Scientific notation, Date, Dollar, Custom currency, and String. You should observe that String has been selected. All the other variables in the data set are Numeric. Since you do not want to make any changes in this dialogue box, click on the Cancel button to return to the Define Variable dialogue box.
Since the width of the column for idno is larger than is really needed, we may want to reduce the column width to save space on the screen. To accomplish this, click on the Column Format button after which the Define Column Format dialogue box will be displayed. Note that there is a Column Width slot in this dialogue box. Enter a 3 in the Column Width slot. Click on the Continue button to return to the previous dialogue box, and then click on the OK button to return to the data window. You should now see that the idno column has been reduced in size.
The second variable sex has been coded so that 0 (zero) represents male and 1 (one) represents female. To define the categories for sex, double click on the name sex after which the Define Variable dialogue box will be displayed. Click on the Labels button after which the Define Labels dialogue box will be displayed. In the Value slot type a 0, and in the Value Label slot type the label Male. Then click on the Add button, and observe that .00="Male" appears in the section at the bottom of the dialogue box. Now, type a 1 in the Value slot, and type the label Female in the Value Label slot. Then click on the Add button, and observe that 1.00="Female" appears in the section at the bottom of the dialogue box. Click on the Continue button to return to the Define Variable dialogue box. You should now see that the labels Male and Female appear respectively in place of 0 and 1, although the names may not quite fit inside the column width.
To adjust the column width for sex so that the labels are entirely displayed, click on the Column Format button after which the Define Column Format dialogue box will be displayed. Enter a 6 in the Column Width slot. Click on the Continue button to return to the Define Variable dialogue box, and then click on the OK button to return to the data window. You should now see that the labels Male and Female fit nicely inside the column width.
The third variable residnce (which is an abbreviation for area of residence) has been coded so that 1 (one) represents rural, 2 (two) represents suburban, and 3 (three) represents urban. Enter the Define Labels dialogue box and define the appropriate labels. You should find that these labels fit nicely inside the column width.
The fourth variable polparty (which is an abbreviation for political party affiliation) has been coded so that 1 (one) represents Republican, 2 (two) represents Democrat, 3 (three) represents Independent, and 4 (four) represents Other. Enter the Define Labels dialogue box and define the appropriate labels, although the names may not quite fit inside the column width. Enter the Define Column Format dialogue box and change the column width to 10; you should find that the labels fit nicely inside this column width.
Since we have made a substantial number of changes to the data file, it is a good idea to save it. Use either the File>Save Data options or the File>Save As options to save the data. If you use the File>Save Data options, your copy of the file survey.sav will immediately be updated (and you should see the light for the a: drive go on as this occurs). If you use the File>Save As options, you must specify the a: drive, and you must enter the desired file name (both of which can be accomplished either by using the mouse to select the appropriate choices or by typing A:\SURVEY.SAV). You must then click on the Save button after which you will be asked whether or not you wish to replace the earlier version of survey.sav with the updated version; clicking on the Yes button saves the data file.
Example #1 for section III. We shall refer to the version of the SPSS data file survey.sav which would have been saved after the changes in the previous example were made. The first 11 lines of the data file are displayed in the figure titled a:\survey.dat - modified.
a:\survey.dat - modified

There are basically three kinds of situations where one wants to create a table or graph to display the distribution for a single variable: (1) the variable is categorical and nonnumerical, (2) the variable is numerical with many repetitions of a few different values, or (3) the variable is numerical with few or no repetitions of many different values. Creating a frequency table can be done using the Statistics>Summarize>Frequencies options from the main menu. Creating a graph is a three-step process:
(1) selecting from the main menu the Graph option followed by an option corresponding to the type of graph desired,
(2) setting parameters to produce an initial graph in an SPSS output viewer window,
(3) double clicking on the graph to display the graph in a chart editor window where options from the chart editor menu bar can be used to customize the graph and/or print the graph and/or save the graph.
The variable polparty (political party affiliation) can be considered categorical and nonnumerical. A frequency table listing raw or relative frequencies would be an appropriate tabular display of the distribution for polparty. A bar chart or pie chart would be an appropriate graphical display of the distribution for polparty.
To create a frequency table to display the distribution for polparty, select the Statistics>Summarize>Frequencies options from the main menu. In the Frequencies dialogue box which is displayed, you should see a list of the variables on the left. Select polparty, and click on the arrow button pointing toward the Variable(s) section of the dialogue box. Now, click on the OK button, after which the SPSS output viewer window is displayed. You may change the size of this window as desired. In this window you will see the following:
(i) a table titled Statistics which lists the number of missing values for each variable,
(ii) a frequency table which lists the four political parties together with corresponding frequencies and relative frequencies.
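The frequency table described above can be sketched in a few lines of Python as a check on what the raw and relative frequencies mean. This is only an illustration under stated assumptions: the polparty codes below are hypothetical (the contents of survey.sav are not reproduced here), the labels follow the coding given earlier for polparty, and the layout does not attempt to match the SPSS output.

```python
from collections import Counter

# Hypothetical polparty codes; the actual survey.sav values are not shown here.
polparty = [1, 2, 2, 3, 1, 4, 2, 1, 3, 2]
labels = {1: "Republican", 2: "Democrat", 3: "Independent", 4: "Other"}

counts = Counter(polparty)
n = len(polparty)

# One row per category: label, raw frequency, relative frequency in percent.
table = [(labels[code], counts[code], 100.0 * counts[code] / n)
         for code in sorted(counts)]

for label, freq, pct in table:
    print(f"{label:<12}{freq:>4}{pct:>8.1f}")
```

The raw frequencies add up to the number of cases, and the relative frequencies add up to 100 percent.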
Select the Window option inside the SPSS output viewer window, and notice that you may choose between displaying the data or the output. With the output window displayed, notice that the screen is divided into two sections. The large section on the right displays the actual output, and the small section on the left contains a list of the various portions of the output. (This list can be used in editing the output.) By clicking on a title or table, that title or table becomes selected after which you will find that you can delete, move, or resize it. To deselect the title or table, click in a blank area outside that title or table. By double clicking on a title or table, you will find that you can edit that title or table. To exit from edit mode, double click in a blank area outside that title or table. By selecting the File option, you will find that you can print a copy of the output, and that you can save a copy of the output for future use. (You should be aware that .spo is the default extension for output files in SPSS; even though you do not have to use this extension, it may be a good idea to do so since this is the type of output file SPSS will automatically search for, and since this will make it easy for you to distinguish your SPSS output files from SPSS data files and other types of files.)
Using the Window option from the menu, you can return to the window containing the data. You should be aware that by default any future output you generate will be added to the existing SPSS output viewer window. If you want future output to be placed into a separate output window, you must use the File>New options from the menu.
To create a bar chart to display the distribution for polparty, select the Graphs>Bar options from the main menu. (This completes the first of the three steps to create a graph, listed earlier.) In the Bar Charts dialogue box which is displayed, make certain that the picture corresponding to Simple has been selected, and make certain that the option Summaries for groups of cases has been selected; then, click on the Define button. In the Define Simple Bar dialogue box which is displayed, you should see a list of the variables on the left. Select polparty, and click on the arrow button pointing toward the Category Axis slot of the dialogue box. Select % of cases in the Bars Represent section of the Define Simple Bar dialogue box. Now, click on the OK button, after which the bar chart is displayed in an SPSS output viewer window. (This completes the second of the three steps to create a graph, listed earlier.) Double click on the graph to display the graph in a chart editor window. By clicking on various portions of the graph or by using options from the Chart Editor menu bar, you can customize the graph. For instance, by double clicking on the title for the horizontal axis, POLPARTY, you can enter the Category Axis dialogue box where you may, among other things, edit the title by changing it to POLITICAL PARTY AFFILIATION. Clicking on the OK button will update the graph. By selecting the File>Close option, you will exit the chart editor and return to the SPSS output viewer window. (This completes the last of the three steps to create a graph, listed earlier.)
Using the Window option from the menu, you can return to the data window. Each SPSS output viewer window will remain open unless you use the File>Close options to close the window. (If you close a window without saving the contents, you will be asked if you wish to save the contents; however, an output file can require a lot of space to store, especially when it contains one or more graphs.)
To create a pie chart to display the distribution for polparty, select the Graphs>Pie options from the main menu. In the Pie Charts dialogue box which is displayed, make certain that the option Summaries for groups of cases has been selected, then click on the Define button. In the Define Pie dialogue box which is displayed, you should see a list of the variables on the left. Select polparty, and click on the arrow button pointing toward the Define Slices by slot of the dialogue box. Make certain that the option N of cases has been selected in the Slices Represent section of the Define Pie dialogue box. Now, click on the OK button, after which the pie chart is displayed in an SPSS output viewer window.
Double click on the graph to display the graph in a chart editor window. Select the Chart>Options options from the chart editor menu after which the Pie Options dialogue box is displayed. Select both Values and Percents in the Labels section of this dialogue box. Then click on the Format button to enter the Pie Options: Label Format dialogue box. In the values section of this dialogue box, change the Decimal places from 2 to 0, and click on the Continue button to return to the Pie Options dialogue box. Now, click the OK button to update the graph. By selecting the File>Close option, you will exit the chart editor and return to the SPSS output viewer window.
(To conserve memory, it may be a good idea to close an output window containing a lot of charts; you can decide whether or not you need to save the output for future reference.)
Using the Window option from the menu, return to the window containing the data.
The variable numchild (number of children) can be considered numerical with many repetitions of a few different values. A frequency table listing raw or relative frequencies would be an appropriate tabular display of the distribution for numchild. A histogram would be an appropriate graphical display of the distribution for numchild.
To create a frequency table to display the distribution for numchild, select the Statistics>Summarize>Frequencies options from the main menu. In the Frequencies dialogue box which is displayed, you should see a list of the variables on the left. Select numchild, and click on the arrow button pointing toward the Variable(s) section of the dialogue box. Now, click on the OK button, after which the frequency table is displayed in an SPSS output viewer window.
To create a histogram to display the distribution for numchild, select the Graphs>Histogram options from the main menu. In the Histogram dialogue box which is displayed, you should see a list of the variables on the left. Select numchild, and click on the arrow button pointing toward the Variable slot. Now, click on the OK button, after which the histogram is displayed in an SPSS output viewer window.
Double click on the graph to display the graph in a chart editor window. Double click on the title for the horizontal axis, NUMCHILD, to enter the Interval Axis dialogue box. Change the title to Number of Children. In the Intervals section of the dialogue box, click on the Custom option and then click on the Define button. In the Interval Axis: Define Custom Intervals dialogue box, select the Interval width option, change the interval width to 1, change the Minimum Displayed to -0.5, change the Maximum Displayed to 10.5, and click on the Continue button. Click on the OK button to update the graph. By selecting the File>Close option, you will exit the chart editor and return to the SPSS output viewer window. (Note: since numchild may be treated as a discrete variable rather than a continuous variable, one may prefer not to have the bars touching each other in order to emphasize the fact that numchild is not continuous. To accomplish this, the Graphs>Bar options could be used in place of the Graphs>Histogram options.)
(To conserve memory, it may be a good idea to close an output window containing a lot of charts; you can decide whether or not you need to save the output for future reference.)
Using the Window option from the menu, return to the window containing the data.
The variable income (yearly income in $1000s) can be considered numerical with few or no repetitions of many different values. A frequency table listing raw or relative frequencies would be an appropriate tabular display of the distribution for income. A histogram would be an appropriate graphical display of the distribution for income.
To create a frequency table to display the distribution for income, select the Statistics>Summarize>Frequencies options from the main menu. In the Frequencies dialogue box which is displayed, you should see a list of the variables on the left. Select income, and click on the arrow button pointing toward the Variable(s) section of the dialogue box. Now, click on the OK button, after which the frequency table is displayed in an output window. Looking at the frequency table should confirm the fact that the variable income is one with few repetitions of many different values. We shall need to define income classes in order to display the distribution with a histogram.
To create a histogram to display the distribution for income, select the Graphs>Histogram options from the main menu. In the Histogram dialogue box which is displayed, you should see a list of the variables on the left. Select income, and click on the arrow button pointing toward the Variable slot. (If there is already a variable name in the Variable slot, you will have to click on that variable name and then click on the arrow button in order to remove the name from the box.) Now, click on the OK button, after which the histogram is displayed in an SPSS output viewer window.
Double click on the graph to display the graph in a chart editor window. Double click on the title for the horizontal axis, INCOME, to enter the Interval Axis dialogue box. Change the title to Yearly Income ($1000s). In the Intervals section of the dialogue box, click on the Custom option and then click on the Define button. In the Interval Axis: Define Custom Intervals dialogue box, select the Interval width option, change the interval width to 10, change the Minimum Displayed to 20, change the Maximum Displayed to 80, and click on the Continue button. Click on the OK button to update the graph.
With the previous settings for Minimum Displayed and Maximum Displayed, the first category consists of incomes from 20 thousand up to but not including 30 thousand; the second category consists of incomes from 30 thousand up to but not including 40 thousand; etc. Suppose you would prefer (as is often done) that the first category consist of incomes above 20 thousand up to and including 30 thousand; the second category consist of incomes above 30 thousand up to and including 40 thousand; etc. To accomplish this, double click on any of the numerical labels on the horizontal axis to enter the Interval Axis dialogue box. In the Intervals section of the dialogue box (where the Custom option was previously selected), click on the Define button. In the Interval Axis: Define Custom Intervals dialogue box, change the Minimum Displayed to 20.1, change the Maximum Displayed to 80.1, and click on the Continue button to return to the Interval Axis dialogue box. In the Interval Axis dialogue box, click on the Labels button to enter the Interval Axis: Labels dialogue box. Change the decimal places to 0 (zero), click on the Continue button, and then click on the OK button to update the graph. By selecting the File>Close option, you will exit the chart editor and return to the SPSS output viewer window.
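The effect of moving the Minimum Displayed from 20 to 20.1 can be checked with a short Python sketch. It assumes, as described above, that each class includes its lower boundary and excludes its upper boundary, and that incomes are recorded no more finely than one decimal place. The function class_index and the income values are hypothetical, introduced only for this illustration.

```python
import math

def class_index(value, minimum, width):
    """Return the 0-based class of value, where class k covers
    minimum + k*width up to but not including minimum + (k+1)*width."""
    return math.floor((value - minimum) / width)

# Hypothetical incomes in $1000s, recorded to one decimal place.
incomes = [25.0, 30.0, 40.0, 41.5]

# Minimum 20: a boundary value such as 30.0 goes to the upper class.
lower_closed = [class_index(x, 20.0, 10.0) for x in incomes]

# Minimum 20.1: the same boundary value now goes to the lower class.
shifted = [class_index(x, 20.1, 10.0) for x in incomes]

print(lower_closed)
print(shifted)
```

With the shifted minimum, an income of exactly 30 falls in the first class and an income of exactly 40 falls in the second, which is the "above 20 up to and including 30" behavior wanted here.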
(To conserve memory, it may be a good idea to close an output window containing a lot of charts; you can decide whether or not you need to save the output for future reference.)
Using the Window option from the menu, return to the window containing the data.
Example #2 for section III. We shall refer to the version of the SPSS data file survey.sav on which the previous example was based. There are basically three kinds of situations where one wants to create a table or graph to display the relationship between two variables: (1) both variables are categorical, (2) both variables are numerical, and (3) one variable is categorical and the other variable is numerical.
The variables residnce (area of residence) and polparty (political party affiliation) can each be treated as categorical. A contingency table would be an appropriate tabular display of these two variables. A stacked bar chart would be an appropriate graphical display of the relationship between these two variables.
To create a contingency table to display the relationship between residnce and polparty, select the Statistics>Summarize>Crosstabs options from the main menu. In the Crosstabs dialogue box which is displayed, you should see a list of the variables on the left. Select residnce, and click on the arrow button pointing toward the Row(s) section of the dialogue box; then, select polparty, and click on the arrow button pointing toward the Column(s) section of the dialogue box. Now, click on the OK button, after which a contingency table is displayed in an SPSS output viewer window.
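A contingency table is simply a count of cases for every combination of a row category and a column category. The Python sketch below illustrates that counting step; it is not what SPSS runs internally. The six cases are hypothetical codes rather than the survey.sav data, and the names cell_counts and table are assumptions made for the example.

```python
from collections import Counter

# Hypothetical (residnce, polparty) codes for six cases; not the survey.sav data.
residnce = [1, 1, 2, 2, 3, 3]
polparty = [1, 2, 2, 2, 3, 1]

# Count each (row, column) combination.
cell_counts = Counter(zip(residnce, polparty))

rows = sorted(set(residnce))
cols = sorted(set(polparty))

# Build the table as a list of rows; combinations that never occur count as 0.
table = [[cell_counts[(r, c)] for c in cols] for r in rows]
for r, row in zip(rows, table):
    print(r, row)
```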
Using the Window option from the menu, return to the window containing the data.
To create a stacked bar chart to display the relationship between residnce and polparty, select the Graphs>Bar options from the main menu. In the Bar Charts dialogue box which is displayed, select the Stacked option by clicking on the corresponding picture; then, make certain that the option Summaries for groups of cases has been selected, and click on the Define button. In the Define Stacked Bar dialogue box which is displayed, you should see a list of the variables on the left. Select polparty, and click on the arrow button pointing toward the Category Axis slot of the dialogue box; then select residnce, and click on the arrow button pointing toward the Define Stacks by slot of the dialogue box. Select N of cases in the Bars Represent section of the Define Stacked Bar dialogue box. Now, click on the OK button, after which the stacked bar chart is displayed in an SPSS output viewer window.
Double click on the graph to display the graph in a chart editor window. Let us now explore how options from the Chart Editor menu bar can be used to customize the graph. Select the Chart>Options options from the menu to enter the Bar/Line/Area Options dialogue box. Click on the Change scale to 100% option, and then click on the OK button. Observe how the scaling on the vertical axis makes it easier to compare visually the distribution of the three areas of residence across the four political party categories. Now, select the Chart>Options options again and change the graph back by deselecting the Change scale to 100% option. Now, select the Series>Transpose Data options from the menu, and observe how the roles of residnce and polparty have been reversed. Observe also that since there are the same number (10) of voters in each of the three areas of residence, there is no need to rescale the vertical axis in order to compare visually the distribution of the four political party categories across the three areas of residence. Finally, observe what happens when you select the Format>Swap Axes options. By selecting the File>Close option, you will exit the chart editor and return to the SPSS output viewer window.
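The Change scale to 100% option amounts to dividing each segment of a bar by that bar's total, so that every bar has the same height. A minimal Python sketch of that rescaling follows. The counts are hypothetical (not taken from survey.sav), and the function to_percent is an assumption introduced for illustration only.

```python
# Hypothetical counts: for each political party (one bar), the number of
# cases in each area of residence (the stacked segments). Not survey.sav data.
stacks = {
    "Republican": {"rural": 4, "suburban": 3, "urban": 1},
    "Democrat": {"rural": 2, "suburban": 3, "urban": 5},
}

def to_percent(segments):
    """Rescale one bar's segment counts so that they sum to 100."""
    total = sum(segments.values())
    return {name: 100.0 * count / total for name, count in segments.items()}

scaled = {party: to_percent(segments) for party, segments in stacks.items()}
for party, segments in scaled.items():
    print(party, segments)
```

When every bar already has the same total, as happens after transposing to the three areas of residence with 10 cases each, this rescaling changes only the axis units and not the relative shapes of the bars.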
(To conserve memory, it may be a good idea to close an output window containing a lot of charts; you can decide whether or not you need to save the output for future reference.)
Using the Window option from the menu, return to the window containing the data.
The variables age and income (yearly income in $1000s) can each be treated as numerical. A scatterplot would be an appropriate graphical display of the relationship between these two variables.
To create a scatterplot to display the relationship between age and income, select the Graphs>Scatter options from the main menu. In the Scatterplot dialogue box which is displayed, make certain that the picture corresponding to Simple has been selected, and click on the Define button. In the Simple Scatterplot dialogue box which is displayed, you should see a list of the variables on the left. Select income, and click on the arrow button pointing toward the Y Axis slot of the dialogue box; then select age, and click on the arrow button pointing toward the X Axis slot of the dialogue box. Now, click on the OK button, after which the scatterplot is displayed in an SPSS output viewer window.
(To conserve memory, it may be a good idea to close an output window containing a lot of charts; you can decide whether or not you need to save the output for future reference.)
Using the Window option from the menu, return to the window containing the data.
The variable residnce (area of residence) can be treated as categorical, and the variable numchild (number of children) can be treated as numerical. An appropriate graphical display of the relationship between these two variables could be designed by constructing a separate histogram of the variable numchild for each of the three areas of residence. Of course, one might choose to substitute line graphs or boxplots in place of the histograms.
Let us demonstrate how a boxplot of the variable numchild can be constructed for each of the three areas of residence. Select the Graphs>Boxplot options from the main menu. In the Boxplot dialogue box which is displayed, make certain that the picture corresponding to Simple has been selected, and make certain that the option Summaries for groups of cases has been selected. (Note: to create a boxplot of one or more variables measured with the same units for the entire data set, the option Summaries of separate variables would be selected.) Click on the Define button. In the Define Simple Boxplot dialogue box which is displayed, you should see a list of the variables on the left. Select numchild, and click on the arrow button pointing toward the Variable slot of the dialogue box; then select residnce, and click on the arrow button pointing toward the Category Axis slot of the dialogue box. Now, click on the OK button, after which the boxplots are displayed in an SPSS output viewer window.
Double click on the graph to display the graph in a chart editor window. As before, the chart can now be edited as desired. For instance, you may select the Format>Swap Axes options, or you may click on the title for either axis in order to make modifications. Select the File>Close option to exit the chart editor and return to the SPSS output viewer window.
(To conserve memory, it may be a good idea to close an output window containing a lot of charts; you can decide whether or not you need to save the output for future reference.)
Using the Window option from the menu, return to the window containing the data.
Example #3 for section III. We shall refer to the version of the SPSS data file survey.sav on which previous examples were based. Suppose we want to obtain a pairwise correlation matrix for the variables age, income (yearly income in $1000s), radiohrs (hours spent listening to the radio weekly), and tvhrs (hours spent watching TV weekly). Select the Statistics>Correlate>Bivariate options from the main menu. In the Bivariate Correlations dialogue box which is displayed, you should see a list of the variables on the left. Select age, and click on the arrow button pointing toward the Variables section of the dialogue box; repeat this for income, radiohrs, and tvhrs. In the Correlation Coefficients section of the dialogue box, make certain that Pearson has been selected. (This represents Pearson's Product Moment Correlation which is typically used with variables which are assumed to have at least an approximate normal distribution; the other two choices are used with variables measured on an ordinal level or when the normality assumption is in doubt.) At the bottom of the dialogue box, make certain that the Flag significant correlations option has been selected, and note that in the Test of Significance section of the dialogue box, you have the option of displaying one-tailed or two-tailed P-values. (Unfortunately, in this dialogue box, the phrase significance level is incorrectly used in place of the phrase P-value.) Now, click on the OK button, after which an SPSS output viewer window containing the correlation matrix is displayed. Each entry of the matrix includes the value of the correlation, the sample size on which it is based, and the P-value.
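For readers who want to see what a single entry of the correlation matrix represents, the Pearson product moment correlation can be computed directly from its definition. The Python sketch below does only that; it does not compute the P-value that SPSS also reports. The age and income values are hypothetical rather than taken from survey.sav, and the function name pearson_r is an assumption made for the example.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values; not the survey.sav data.
age = [25, 32, 41, 50, 58]
income = [28.0, 35.5, 44.0, 52.5, 61.0]

r = pearson_r(age, income)
print(round(r, 3))
```

The correlation of a variable with itself is 1, and the correlation of x with y equals that of y with x, which is why the matrix is symmetric with ones on its diagonal.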
Let us now consider the linear regression of income on age. Select the Statistics>Regression>Linear options from the main menu. In the Linear Regression dialogue box which is displayed, you should see a list of the variables on the left. Select income, and click on the arrow button pointing toward the Dependent slot of the dialogue box; then select age, and click on the arrow button pointing toward the Independent(s) section of the dialogue box. (Note: You can enter more than one variable into the Independent(s) section if you want to perform multiple regression; also, from the options in the Method slot of the dialogue box, you can choose to do stepwise regression, forward regression, or backward regression.) Now, click on the OK button, after which an SPSS output viewer window containing the results is displayed. The results include R-values, the standard error of estimate, the ANOVA table with corresponding f statistic and P-value, and the intercept and slope in the least squares line.
Let us now explore how you can obtain a graph of the least squares line and a residual plot. Select once again the Statistics>Regression>Linear options from the main menu, and observe in the Linear Regression dialogue box how all the previous selections are displayed. Click on the Save button to display the Linear Regression: Save dialogue box. In the Predicted values section of the dialogue box select the Unstandardized option, and in the Residuals section of the dialogue box select the Unstandardized option. Click on the Continue button to return to the Linear Regression dialogue box, and then click on the OK button. Once again the results are displayed in the output window; however, if you use the Window option from the menu to return to the data window, you will find that two new variables (columns) have been added to the data: one labeled pre_1 which contains the predicted values and one labeled res_1 which contains the residuals.
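To make clear what the two saved columns contain, here is a minimal Python sketch that fits a least squares line and computes unstandardized predicted values and residuals. The names pre_1 and res_1 simply mirror the SPSS column names; the function least_squares and the data values are hypothetical.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical values (assumption: NOT the contents of survey.sav).
age = [25, 32, 41, 50, 63]
income = [28, 35, 47, 52, 60]

intercept, slope = least_squares(age, income)
# Predicted value = point on the fitted line; residual = observed - predicted.
pre_1 = [intercept + slope * a for a in age]
res_1 = [obs - fit for obs, fit in zip(income, pre_1)]
print(intercept, slope)
```

The residuals from a least squares fit with an intercept always sum to zero, which is a quick check on any saved res_1 column.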
To graph the least squares line on a scatterplot of the data, select the Graphs>Scatter options from the main menu. In the Scatterplot dialogue box which is displayed, select the picture corresponding to Overlay, and click on the Define button. In the Overlay Scatterplot dialogue box which is displayed, you should see a list of the variables on the left. Select age, select pre_1, and click on the arrow button pointing toward the Y-X Pairs section of the dialogue box; then, click on the Swap Pair button so that pre_1 will be on the vertical axis and age will be on the horizontal axis. Next, select age, select income, and click on the arrow button pointing toward the Y-X Pairs section of the dialogue box; then, click on the Swap Pair button so that income will be on the vertical axis and age will be on the horizontal axis. Now, click on the OK button, after which the scatterplot is displayed in an SPSS output viewer window.
Double click on the graph to display the graph in a chart editor window. You should find that on the lower right portion of the graph, legends are displayed, one for INCOME and one for Predicted Value, with different colored square dots for each. Click on the square dot for income (which will select that series), and select the Format>Marker options from the menu. In the Markers dialogue box which is displayed, select one of the circles in the Style section, click on the Apply button, and click on the Close button. (To deselect INCOME, you may click in any blank area of the chart window.) Select the File>Close option to exit the chart editor and return to the SPSS output viewer window.
Using the Window option from the menu, return to the window containing the data.
To graph the residuals, select the Graphs>Scatter options from the menu. In the Scatterplot dialogue box which is displayed, select the picture corresponding to Simple, and click on the Define button. In the Simple Scatterplot dialogue box which is displayed, you should see a list of the variables on the left. Select res_1, and click on the arrow button pointing toward the Y Axis slot of the dialogue box; then, select age, and click on the arrow button pointing toward the X Axis slot of the dialogue box. Now, click on the OK button, after which the residual plot is displayed in an SPSS output viewer window.
(To conserve memory, it may be a good idea to close an output window containing a lot of charts; you can decide whether or not you need to save the output for future reference.)
Using the Window option from the menu, return to the window containing the data.
Example #1 for section VI. We shall refer to the version of the SPSS data file survey.sav on which previous examples were based.
Suppose we want to test the null hypothesis that the mean yearly income in a certain state is $42 thousand against the one-sided alternative hypothesis that the mean yearly income is more than $42 thousand. (We assume that our data represents a random sample from the state.) Select the Statistics>Compare Means>One-Sample T Test options from the main menu. In the One-Sample T Test dialogue box which is displayed, you should see a list of the variables on the left. Select income, and click on the arrow button pointing toward the Test Variable(s) section of the dialogue box. Type 42 in the Test Value slot of the dialogue box. Now, click on the OK button, after which an SPSS output viewer window containing the results of the test is displayed. The results include the sample size, sample mean, sample standard deviation, standard error of the mean, a 95% confidence interval for the difference between the population mean and the hypothesized mean, the value of the one-sample t test statistic and corresponding degrees of freedom, and the two-tailed P-value. It is important to realize that the P-value displayed is two-tailed: for our one-sided test, the correct P-value is half of the P-value displayed in the output when the sample mean is greater than 42 (i.e., when the t statistic is positive), and one minus half of the displayed P-value otherwise.
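The arithmetic behind this test can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the sample values are made up, and the two-tailed P-value is taken as given (as it would be read from the SPSS output) rather than computed.

```python
import math
import statistics

def one_sample_t(sample, test_value):
    """Return (t statistic, degrees of freedom) for a one-sample t test."""
    n = len(sample)
    # Standard error of the mean uses the sample standard deviation (n - 1).
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - test_value) / se, n - 1

def upper_tailed_p(t_stat, two_tailed_p):
    """Convert a two-tailed P-value to one for the alternative 'mean > test value'."""
    half = two_tailed_p / 2
    # Halving is correct only when the data point in the direction of the alternative.
    return half if t_stat > 0 else 1 - half

# Hypothetical incomes in $1000s (assumption: NOT the contents of survey.sav).
t_stat, df = one_sample_t([44, 46, 48], 42)
print(t_stat, df)
```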
Using the Window option from the menu, return to the window containing the data.
A paired t test is the same as a one-sample t test if we treat the differences between pairs of recorded observations as one sample of data. These differences may arise from two measurements recorded on each item in a sample, or from matched pairs in two dependent samples. Often, the null hypothesis states that the mean of the differences is zero. Suppose we want to test the null hypothesis that the mean difference between weekly TV hours and weekly radio hours in a certain state is zero against the two-sided alternative hypothesis that this mean difference is not zero. (We again assume that our data represents a random sample from the state.) Select the Statistics>Compare Means>Paired-Samples T Test options from the main menu. In the Paired-Samples T Test dialogue box which is displayed, you should see a list of the variables on the left. Select radiohrs, and then select tvhrs. Click on the arrow button pointing toward the Paired Variables section of the dialogue box, and then click on the OK button, after which an SPSS output viewer window containing the results of the test is displayed. The results include the sample size, the correlation between the two paired variables, the sample mean and standard deviation for each paired variable, the standard error of the mean for each paired variable, the sample mean difference, a 95% confidence interval for the mean difference, the value of the paired t test statistic and corresponding degrees of freedom, and the two-tailed P-value. The P-value displayed is correct for our test, since our test is two-sided.
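The statement that a paired t test is a one-sample t test on the differences can be sketched in Python as follows. The function paired_t is hypothetical, and the hours shown are made-up values rather than data from survey.sav.

```python
import math
import statistics

def paired_t(first, second):
    """Paired t statistic: a one-sample t test of the differences against zero."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Hypothetical weekly hours for five respondents (assumption).
tvhrs = [10, 12, 8, 15, 9]
radiohrs = [7, 11, 9, 10, 6]
t_stat, df = paired_t(tvhrs, radiohrs)
print(t_stat, df)
```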
Using the Window option from the menu, return to the window containing the data.
A two-sample t test is used to decide if there is a significant difference between two means, with two independent random samples. Suppose we want to test the null hypothesis that the mean weekly TV hours is equal for males and females in a certain state against the two-sided alternative hypothesis that the mean weekly TV hours is different for males and females. (We assume that our data represents two random samples, one of males from the state and one of females from the state.) Select the Statistics>Compare Means>Independent-Samples T Test options from the main menu. In the Independent-Samples T Test dialogue box which is displayed, you should see a list of the variables on the left. Select tvhrs, and click on the arrow button pointing toward the Test Variable(s) section of the dialogue box. Select sex, and click on the arrow button pointing toward the Grouping Variable slot of the dialogue box. Next, click on the Define Groups button; type 0 (zero) in the Group 1 slot, type 1 (one) in the Group 2 slot, and click on the Continue button. Then click on the OK button, after which an SPSS output viewer window containing the results of the test is displayed. The results include the sample sizes, the sample means and standard deviations, the standard error of the mean for each sample, and the difference between sample means. Also included are the results of Levene's test, which is an f test used to decide if there is a significant difference between two variances with two independent random samples. When there is no significant difference between variances, then the pooled two-sample t test is appropriate for deciding whether or not there is a significant difference between means. 
When there is a significant difference between variances, then the pooled two-sample t test could give misleading results (especially if the sample sizes are very different); in this situation, the separate-variance two-sample t test is appropriate for deciding whether or not there is a significant difference between means. For each of the two situations, the results include the value of the two-sample t test statistic and corresponding degrees of freedom, the two-tailed P-value, the standard error of the difference between the two sample means, and a 95% confidence interval for the difference between means.
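The difference between the pooled and separate-variance versions of the test lies in the standard error and the degrees of freedom. Here is a minimal Python sketch of both, assuming the usual textbook formulas (the separate-variance degrees of freedom use the Welch-Satterthwaite approximation); the function names are hypothetical.

```python
import math
import statistics

def pooled_t(a, b):
    """Two-sample t statistic and df assuming equal population variances."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2

def separate_t(a, b):
    """Two-sample t statistic and approximate df without assuming equal variances."""
    n1, n2 = len(a), len(b)
    q1 = statistics.variance(a) / n1
    q2 = statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(q1 + q2)
    # Welch-Satterthwaite approximation; generally not an integer.
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df
```

When the two samples have equal sizes and equal sample variances, the two versions give the same t statistic and the same degrees of freedom; they diverge as the variances and sample sizes become more unequal.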
Using the Window option from the menu, return to the window containing the data.
Example #2 for section VI. We shall refer to the version of the SPSS data file survey.sav on which previous examples were based. The t tests are based on the assumption that data are randomly observed from one or more populations having at least an approximate normal distribution. When there is reason to doubt the assumptions underlying a t test, a nonparametric test can be employed as an alternative. The primary difference between parametric tests and nonparametric tests is that parametric tests focus on a specific parameter such as the mean, whereas nonparametric tests focus more generally on the distribution of values.
The Wilcoxon signed rank test is a nonparametric version of the paired t test. Let us again test the null hypothesis that the distribution of differences between weekly TV hours and weekly radio hours is centered around zero against the two-sided alternative hypothesis that these differences tend to be either above zero more often than below or vice versa; except that this time we shall use the Wilcoxon signed rank test instead of a paired t test. Select the Statistics>Nonparametric Tests>2 Related Samples options from the main menu. In the Two-Related-Samples Tests dialogue box which is displayed, you should see a list of the variables on the left. Select radiohrs, and then select tvhrs. Click on the arrow button pointing toward the Test Pair(s) List section of the dialogue box. Make certain that Wilcoxon is selected in the Test Type section of the dialogue box, and then click on the OK button, after which an SPSS output viewer window containing the results of the test is displayed. The results include information about the rank sums and mean ranks corresponding to positive and negative differences, the number of zero differences (i.e., ties), the value of the Wilcoxon standard normal z test statistic, and the two-tailed P-value of the test. The P-value displayed is correct for our test, since our test is two-sided.
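The rank sums for positive and negative differences can be sketched in Python as follows. This is an illustration only: the function names and data are hypothetical, zero differences are dropped before ranking, and the z statistic and P-value are not computed.

```python
def average_ranks(values):
    """Rank from 1 upward; tied values share the average of their ranks."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]

def signed_rank_sums(first, second):
    """Return (sum of ranks of positive differences, sum for negative differences)."""
    # Zero differences carry no information about direction and are dropped.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    pos = sum(r for d, r in zip(diffs, ranks) if d > 0)
    neg = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return pos, neg

# Hypothetical weekly hours for five respondents (assumption).
tvhrs = [10, 12, 8, 15, 9]
radiohrs = [7, 11, 9, 10, 6]
pos, neg = signed_rank_sums(tvhrs, radiohrs)
print(pos, neg)
```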
Using the Window option from the menu, return to the window containing the data.
The Mann-Whitney rank sum test is a nonparametric version of the two-sample t test. Let us again test the null hypothesis that the distribution of weekly TV hours is the same for males and females against the two-sided alternative hypothesis that the distribution of weekly TV hours is different for males and females; except that this time we shall use the Mann-Whitney rank sum test instead of a two-sample t test. Select the Statistics>Nonparametric Tests>2 Independent Samples options from the main menu. In the Two-Independent-Samples Tests dialogue box which is displayed, you should see a list of the variables on the left. Select tvhrs, and click on the arrow button pointing toward the Test Variable List section of the dialogue box. Select sex, and click on the arrow button pointing toward the Grouping Variable slot of the dialogue box. Next, click on the Define Groups button; type 0 (zero) in the Group 1 slot, type 1 (one) in the Group 2 slot, and click on the Continue button. Make certain that Mann-Whitney U is selected in the Test Type section of the dialogue box, and then click on the OK button, after which an SPSS output viewer window containing the results of the test is displayed. The results include information about the rank sums and mean ranks corresponding to the two samples, the value of the Mann-Whitney statistics (U and W) for each sample together with the exact two-tailed P-value uncorrected for ties, and the value of the Mann-Whitney standard normal z test statistic together with the two-tailed P-value. A two-tailed P-value is correct for our test, since our test is two-sided.
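As a rough illustration of where the rank sums and the U statistic come from, here is a minimal Python sketch. The function names are hypothetical, and the sketch makes no claim about which of the two U values or rank sums SPSS chooses to report.

```python
def average_ranks(values):
    """Rank from 1 upward; tied values share the average of their ranks."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]

def mann_whitney(group1, group2):
    """Return (U for group1, U for group2, rank sum of group1, rank sum of group2)."""
    n1, n2 = len(group1), len(group2)
    # Rank all observations together, then split the ranks back by group.
    ranks = average_ranks(list(group1) + list(group2))
    w1 = sum(ranks[:n1])
    w2 = sum(ranks[n1:])
    u1 = w1 - n1 * (n1 + 1) / 2
    u2 = w2 - n2 * (n2 + 1) / 2
    return u1, u2, w1, w2
```

The two U values always add up to the product of the two sample sizes, so either one determines the other.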
Using the Window option from the menu, return to the window containing the data.
Example #3 for section VI. We shall refer to the version of the SPSS data file survey.sav on which previous examples were based. One-way ANOVA is an abbreviation for one-way Analysis of Variance, which refers to a technique for deciding whether or not there is a significant difference among three or more means from independent samples. The categorical variable which defines the samples is called a factor; ANOVA is called one-way when there is only one such factor. The parametric one-way ANOVA is based on an f test which compares the variation among samples with the variation within samples; this f test is based on the assumption that independent random samples have been selected from populations each having at least an approximate normal distribution and all having equal variance. When there is reason to doubt the assumptions underlying this f test, a nonparametric test known as the Kruskal-Wallis test can be employed as an alternative.
Suppose we want to test the null hypothesis that the mean yearly income for voters is equal in the rural, suburban, and urban areas in a certain state against the alternative hypothesis that the mean yearly income for voters is different for at least one of the three areas of residence. (We assume that our data represents three random samples of voters, one from the rural area, one from the suburban area, and one from the urban area.) Select the Statistics>Compare Means>One-Way ANOVA options from the main menu. In the One-Way ANOVA dialogue box which is displayed, you should see a list of the variables on the left. Select income, and click on the arrow button pointing toward the Dependent List section of the dialogue box; then select residnce, and click on the arrow button pointing toward the Factor slot of the dialogue box. Next, click on the Options button; in the One-Way ANOVA: Options dialogue box which is displayed, select the Descriptive option, select the Homogeneity-of-variance option, and click on the Continue button. Now, click on the OK button, after which an SPSS output viewer window containing the results of the ANOVA is displayed. The results include the sample sizes, the sample means and standard deviations, the standard error of the mean for each sample, a 95% confidence interval for each mean, the results of Levene's f test concerning equality of variance (used to decide if the equal variance assumption in a one-way ANOVA is correct), and the one-way ANOVA table. (Note: Prior to clicking on the OK button in the One-Way ANOVA dialogue box, you can choose to click on the Post Hoc button which displays a dialogue box which allows you to select a multiple comparison procedure, such as Scheffe's method; you may also select a corresponding desired significance level.)
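The f statistic in the one-way ANOVA table compares the variation among samples with the variation within samples. A minimal Python sketch of that computation follows; the function one_way_anova is hypothetical and the P-value is not computed.

```python
def one_way_anova(groups):
    """Return (f statistic, df between, df within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation among samples: group means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation within samples: observations about their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```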
Using the Window option from the menu, return to the window containing the data.
Consider the variable jobsat which is a job satisfaction ranking from 0 to 10. Job satisfaction is really a qualitative variable which has been made quantitative by using a Likert scale to measure it. With data such as these, the assumptions underlying parametric tests, such as t tests and one-way ANOVA, are often in doubt; consequently, a nonparametric test is often used. Suppose we want to test the null hypothesis that the distribution of job satisfaction scores for voters is the same in the rural, suburban, and urban areas in a certain state against the alternative hypothesis that the distribution of job satisfaction scores for voters is different for at least one of the three areas of residence. (We again assume that our data represents three random samples of voters, one from the rural area, one from the suburban area, and one from the urban area.) Select the Statistics>Nonparametric Tests>K Independent Samples options from the main menu. In the Tests for Several Independent Samples dialogue box which is displayed, you should see a list of the variables on the left. Select jobsat, and click on the arrow button pointing toward the Test Variable List section of the dialogue box. Select residnce, and click on the arrow button pointing toward the Grouping Variable slot of the dialogue box. Next, click on the Define Range button; type 1 (one) in the Minimum slot, type 3 (three) in the Maximum slot, and click on the Continue button. (Recall that the integers 1, 2, and 3 were used to code the groups for the variable residnce.) Make certain that Kruskal-Wallis H is selected in the Test Type section of the dialogue box, and then click on the OK button, after which an SPSS output viewer window containing the results of the test is displayed. The results include information about the rank sums and mean ranks corresponding to the three samples, the value of the Kruskal-Wallis chi-square statistic together with its degrees of freedom, and the P-value of the test.
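A minimal Python sketch of the Kruskal-Wallis statistic, using the standard textbook formula with the usual correction for ties, is given below. The function names are hypothetical, and the sketch assumes the observations are not all identical (otherwise the tie correction would be zero).

```python
from collections import Counter

def average_ranks(values):
    """Rank from 1 upward; tied values share the average of their ranks."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]

def kruskal_wallis_h(groups):
    """Return (tie-corrected H statistic, degrees of freedom)."""
    combined = [v for g in groups for v in g]
    n_total = len(combined)
    ranks = average_ranks(combined)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    # Tie correction: each set of t tied values contributes t**3 - t.
    ties = sum(t ** 3 - t for t in Counter(combined).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    return h / correction, len(groups) - 1
```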
Using the Window option from the menu, return to the window containing the data.
Example #4 for section VI. A repeated measures ANOVA is used when, instead of obtaining data from three or more independent samples, data is obtained from repeated measurements recorded on each item in a sample, or from matched items in three or more dependent samples. When certain assumptions concerning normality and variance/covariance structure are fulfilled, the f test in the classical repeated measures ANOVA is appropriate to decide whether or not there is a significant difference among three or more means; when there is reason to doubt the assumptions underlying this f test, a nonparametric test known as Friedman's test can be employed as an alternative. The examples here involve only one within subject factor, which is the simplest type of repeated measures data.
Note: To proceed with this example, you need a copy of the data file named music.sav which you may obtain from the network in the folder Server_1\SYS\Apps\Fac_prgs\Sprechini .
Suppose we want to use the f test to test the null hypothesis that the mean number of minutes to complete a certain task is equal with no background music, with soft background music, and with rock background music against the alternative hypothesis that the mean number of minutes to complete the task is different for at least one of the three types of background music. (We assume that our data consists of subjects who each performed the task three times: once with each type of background music in random order.) Select the Statistics>General Linear Model>GLM-Repeated Measures options from the main menu, after which the Repeated Measures Define Factor(s) dialogue box is displayed. In the Within-Subject Factor Name slot of the dialogue box, type music; in the Number of Levels slot of the dialogue box, type 3. Now, click on the Add button, and then click on the Define button to enter the GLM-Repeated Measures dialogue box where you should see a list of the variables on the left. Select no_mus, and click on the arrow button pointing toward the Within-Subjects Variables section of the dialogue box; repeat this for soft_mus and rock_mus. Now, click on the OK button, after which an SPSS output viewer window containing the results of the repeated measures ANOVA is displayed. The results displayed consist of the results of some tests involving whether or not the assumptions on which the f test is based are valid, the results of some multivariate tests of significance, and an abbreviated ANOVA table containing the results of the f test.
Using the Window option from the menu, return to the window containing the data.
Let us now use Friedman's test to test the null hypothesis that the mean number of minutes to complete a certain task is equal with no background music, with soft background music, and with rock background music against the alternative hypothesis that the mean number of minutes to complete the task is different for at least one of the three types of background music. Select the Statistics>Nonparametric Tests>K Related Samples options from the main menu, after which the Tests for Several Related Samples dialogue box is displayed. Select no_mus, and click on the arrow button pointing toward the Test Variables section of the dialogue box; repeat this for soft_mus and rock_mus. Make certain that Friedman is selected in the Test Type section of the dialogue box, and then click on the OK button, after which an SPSS output viewer window containing the results of the test is displayed. The results include information about the mean ranks corresponding to the different types of background music, the Friedman chi-square test statistic together with its degrees of freedom, and the P-value of the test.
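Friedman's test ranks the measurements within each subject and then compares the rank sums across conditions. Here is a minimal Python sketch using the standard formula without a correction for ties; the function names are hypothetical and the times shown are made-up values, not the contents of music.sav.

```python
def average_ranks(values):
    """Rank from 1 upward; tied values share the average of their ranks."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]

def friedman_chi_square(rows):
    """rows: one list per subject, one value per condition.

    Returns (chi-square statistic, degrees of freedom, mean rank per condition).
    No tie correction is applied (assumption of this sketch).
    """
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        # Rank the k measurements within this subject only.
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    chi2 = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    return chi2, k - 1, [r / n for r in rank_sums]

# Hypothetical minutes per subject in the order no_mus, soft_mus, rock_mus.
times = [[10, 12, 15], [9, 11, 14], [8, 10, 13], [11, 12, 16]]
chi2, df, mean_ranks = friedman_chi_square(times)
print(chi2, df, mean_ranks)
```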
Using the Window option from the menu, return to the window containing the data.
Example #5 for section VI. We shall refer to the version of the SPSS data file survey.sav on which most of the previous examples were based. Two-way ANOVA involves deciding whether or not there is a significant difference among the means for a quantitative variable as defined by two categorical variables called factors. The parametric two-way ANOVA involves an f test for interaction as well as an f test for each of the two factors; these f tests are based on the assumption that independent random samples have been selected from populations each having at least an approximate normal distribution and all having equal variance.
Suppose we want to perform a two-way ANOVA comparing the mean yearly income for voters in a certain state between males and females and among the rural, suburban, and urban areas. (We assume that our data represents random samples of male and female voters from the rural, suburban, and urban areas.) Select the Statistics>General Linear Model>GLM-General Factorial options from the main menu. In the General Factorial dialogue box which is displayed, you should see a list of the variables on the left. Select income, and click on the arrow button pointing toward the Dependent slot of the dialogue box. Select sex, and click on the arrow button pointing toward the Fixed Factor(s) section of the dialogue box; then select residnce, and click on the arrow button pointing toward the Fixed Factor(s) section of the dialogue box.
Click on the Options button to display the General Factorial: Options dialogue box. In the Factor(s) and Factor Interactions section of the dialogue box, select all of the items in the list, and click on the arrow pointing toward the Display Means for section of the dialogue box. Click on the OK button, after which results of the two-way ANOVA are displayed in an SPSS output viewer window. The results displayed include cell means, row means, column means, the grand mean, and a two-way ANOVA table.
From the P-values in the ANOVA table, you will find that interaction is significant at the alpha=0.05 level; in order to create an interaction plot of the means, select the Graphs>Line options. In the Line Charts dialogue box which is displayed, select the picture corresponding to Drop-line, and click on the Define button. In the Define Drop-line dialogue box which is displayed, you should see a list of the variables on the left. Select residnce, and click on the arrow button pointing toward the Category Axis slot of the dialogue box; then, select sex, and click on the arrow button pointing toward the Define Points by slot of the dialogue box. Now, in the Points Represent section of the dialogue box, select Other summary function. Next, select income, and click on the arrow button pointing toward the Variable section of the dialogue box. Click on the OK button, after which the interaction plot is added to the SPSS output viewer window.
Double click on the graph in order to display the graph in an SPSS chart editor window. You should find that on the lower right portion of the graph, legends are displayed, one for male, and one for female, with different colored square dots for each. Click on the square dot for male (which will select that series), and select the Format>Marker options from the chart window menu. In the Markers dialogue box which is displayed, select one of the circles in the Style section, and click on the Apply button. Now, click on the square dot for female (which will select that series); in the Markers dialogue box which should still be displayed, select one of the triangles in the Style section, click on the Apply button, and click on the Close button. To deselect the female series, you may click in any blank area of the chart window. Select the File>Close options in the Chart Editor in order to return to the SPSS output viewer window. (For an alternative interaction plot, you could repeat the process described here with the roles of sex and residnce interchanged.)
Example #6 for section VI. We shall refer to the version of the SPSS data file survey.sav on which previous examples were based. In general, the chi-square goodness-of-fit test is used for deciding whether or not the population percentages (proportions) corresponding to a categorical variable differ significantly from a set of hypothesized percentages (proportions). The chi-square goodness-of-fit test in SPSS for Windows is considered a nonparametric test since it does not focus directly on one or more means.
Suppose we want to test the null hypothesis that the political party affiliations of voters in a certain state are 30% republican, 30% democrat, 20% independent, and 20% other against the alternative hypothesis that at least one of the percentages is significantly different from its corresponding hypothesized value. (We assume that our data represents a random sample of voters.) Select the Statistics>Nonparametric Tests>Chi-Square options from the main menu. In the Chi-Square Test dialogue box which is displayed, you should see a list of the variables on the left. Select polparty, and click on the arrow button pointing toward the Test Variable List section of the dialogue box. To enter the hypothesized percentages, click on the Values slot in the Expected Values section of the dialogue box. Type 30 in the Values slot and click on the Add button; then, type 30 again in the Values slot and click on the Add button; now type 20 in the Values slot and click on the Add button; finally, type 20 again in the Values slot and click on the Add button. The order in which these hypothesized percentages are entered must correspond to the order of the codes used for the different categories. (Recall that the integers 1, 2, 3, and 4 were used respectively as the codes for republican, democrat, independent, and other.) Click on the OK button, after which an SPSS output viewer window containing the results of the chi-square test is displayed. The results displayed consist of a list of the categories with corresponding codes, expected frequencies, and residuals; and the value of the chi-square statistic together with its degrees of freedom and P-value. (Note: this chi-square test is appropriate only if all expected frequencies are greater than 1 and if most of the expected frequencies are greater than 5; a warning message will be printed when this is not the case.)
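The expected frequencies, residuals, and chi-square statistic in this output can be sketched in Python as follows. The function chi_square_gof and the observed counts are hypothetical; the sketch rescales the hypothesized values by their sum, which is an assumption of the sketch rather than a statement about how SPSS treats them.

```python
def chi_square_gof(observed, hypothesized):
    """Return (chi-square statistic, degrees of freedom, expected frequencies)."""
    total = sum(observed)
    scale = sum(hypothesized)
    # Expected frequency = sample size times the hypothesized proportion.
    expected = [total * h / scale for h in hypothesized]
    residuals = [o - e for o, e in zip(observed, expected)]
    chi2 = sum(r ** 2 / e for r, e in zip(residuals, expected))
    return chi2, len(observed) - 1, expected

# Hypothetical counts in code order 1-4: republican, democrat, independent, other.
observed = [35, 25, 22, 18]
chi2, df, expected = chi_square_gof(observed, [30, 30, 20, 20])
print(chi2, df, expected)
```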
Using the Window option from the menu, return to the window containing the data.
Example #7 for section VI. The chi-square test in a two-way contingency table is used for deciding whether or not two categorical variables are independent. Suppose we want to test the null hypothesis that type of job and smoking habits are independent among employees at a large corporation using a random sample of 250 employees for which the following data are observed:
| | Nonsmoker | Light Smoker | Heavy Smoker |
| Blue Collar | 52 | 12 | 49 |
| White Collar | 48 | 63 | 26 |
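For reference, the chi-square statistic for independence can be computed directly from the table above. The following Python sketch uses the standard formula (expected count = row total times column total divided by the grand total); the function name is hypothetical and the P-value is not computed.

```python
# Observed counts from the table above.
observed = [
    [52, 12, 49],  # Blue Collar: Nonsmoker, Light Smoker, Heavy Smoker
    [48, 63, 26],  # White Collar: Nonsmoker, Light Smoker, Heavy Smoker
]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

chi2, df, expected = chi_square_independence(observed)
print(chi2, df)
```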
It is convenient to have access to the data file from which this table is generated. However, if this data file is not available, then we need to enter data which will generate this table. A new data file is opened as soon as you enter SPSS; if you are already in SPSS and need to start a new data file, you can use the File>New>Data options to accomplish this. Recall that data files are organized so that rows represent different cases (i.e., different items in a sample), and columns represent different variables. Each column of a blank data file is named var.
To name and define categories for the first variable, double click on the name var in the first column after which the Define Variable dialogue box will be displayed. In the Variable Name slot of this dialogue box, type the name jobtype. To define the labels for the two categories of this variable, click on the Labels button after which the Define Labels dialogue box will be displayed. Note that there is a Value slot and a Value Label slot in the Value Labels section of this dialogue box.
In the Value slot type a 1, and in the Value Label slot type Blue Collar. Then click on the Add button, and observe that 1.00="Blue Collar" appears in the section at the bottom of the dialogue box. Now, type a 2 in the Value slot, and type White Collar in the Value Label slot. Then click on the Add button, and observe that 2.00="White Collar" appears in the section at the bottom of the dialogue box. Click on the Continue button to return to the Define Variable dialogue box, and then click on the OK button to return to the data window. You should now see that the first column has been renamed jobtype.
To name and define categories for the second variable, double click on the name var in the second column after which the Define Variable dialogue box will be displayed. In the Variable Name slot of this dialogue box, type the name smoking. To define the labels for the three categories of this variable, click on the Labels button after which the Define Labels dialogue box will be displayed. In the Value slot type a 0, and in the Value Label slot type Nonsmoker. Then click on the Add button, and observe that .00="Nonsmoker" appears in the section at the bottom of the dialogue box. Now, type a 1 in the Value slot, and type Light Smoker in the Value Label slot. Then click on the Add button, and observe that 1.00="Light Smoker" appears in the section at the bottom of the dialogue box. Finally, type a 2 in the Value slot, and type Heavy Smoker in the Value Label slot. Then click on the Add button, and observe that 2.00="Heavy Smoker" appears in the section at the bottom of the dialogue box. Click on the Continue button to return to the Define Variable dialogue box, and then click on the OK button to return to the data window. You should now see that the second column has been renamed smoking.
In the contingency table displayed earlier, we see that there are 52 blue collar nonsmokers. In the first cell of the jobtype column enter a 1, which should display as Blue Collar, and in the first cell of the smoking column enter a 0, which should display as Nonsmoker; however, you may have to increase column widths to see the labels completely. Now, click on the number 1 which is the label for the first row, and observe that the data in the first row have been selected. Then, select Edit>Copy from the menu. Next, click and drag from the number 2 which is the label for the second row down to the number 52 which is the label for the 52nd row, and select Edit>Paste from the menu. You should now observe that 52 lines of data, each representing a blue collar nonsmoker, have been entered.
In the contingency table displayed earlier, we see that there are 12 blue collar light smokers. In the 53rd cell of the jobtype column enter a 1, which should display as Blue Collar, and in the 53rd cell of the smoking column enter a 1, which should display as Light Smoker. Now, click on the number 53 which is the label for the 53rd row, and observe that the data in the 53rd row have been selected. Then, select Edit>Copy from the menu. Next, click and drag from the number 54 which is the label for the 54th row down to the number 64 which is the label for the 64th row, and select Edit>Paste from the menu. You should now observe that 12 lines of data, each representing a blue collar light smoker, have been entered.
In the contingency table displayed earlier, we see that there are 49 blue collar heavy smokers. In the 65th cell of the jobtype column enter a 1, which should display as Blue Collar, and in the 65th cell of the smoking column enter a 2, which should display as Heavy Smoker. Now, click on the number 65 which is the label for the 65th row, and observe that the data in the 65th row have been selected. Then, select Edit>Copy from the menu. Next, click and drag from the number 66 which is the label for the 66th row down to the number 113 which is the label for the 113th row, and select Edit>Paste from the menu. You should now observe that 49 lines of data, each representing a blue collar heavy smoker, have been entered.
At this point, since you have put a fair amount of work into this data file, it is a good idea to save what has been done thus far. In order to save this data file, use either the File>Save Data options or the File>Save As options. Suppose you decide to name the data file jobsmoke.sav. (Recall that .sav is the default extension for data files in SPSS.) Since this is the first time you are saving this data file, no previous version exists; as a result, you must specify both the drive (the a: drive, for example) and the directory. You can use the mouse to click on the appropriate options in order to select the drive and directory, or you can just type the appropriate name in the slot. Type a:\jobsmoke.sav, and click on the Save button to save the data in the main directory of the disk in the a: drive.
In the contingency table displayed earlier, we see that there are 48 white collar nonsmokers. In the 114th cell of the jobtype column enter a 2, which should display as White Collar, and in the 114th cell of the smoking column enter a 0, which should display as Nonsmoker. Now, click on the number 114 which is the label for the 114th row, and observe that the data in the 114th row have been selected. Then, select Edit>Copy from the menu. Next, click and drag from the number 115 which is the label for the 115th row down to the number 161 which is the label for the 161st row, and select Edit>Paste from the menu. You should now observe that 48 lines of data, each representing a white collar nonsmoker, have been entered.
In the contingency table displayed earlier, we see that there are 63 white collar light smokers. In the 162nd cell of the jobtype column enter a 2, which should display as White Collar, and in the 162nd cell of the smoking column enter a 1, which should display as Light Smoker. Now, click on the number 162 which is the label for the 162nd row, and observe that the data in the 162nd row have been selected. Then, select Edit>Copy from the menu. Next, click and drag from the number 163 which is the label for the 163rd row down to the number 224 which is the label for the 224th row, and select Edit>Paste from the menu. You should now observe that 63 lines of data, each representing a white collar light smoker, have been entered.
In the contingency table displayed earlier, we see that there are 26 white collar heavy smokers. In the 225th cell of the jobtype column enter a 2, which should display as White Collar, and in the 225th cell of the smoking column enter a 2, which should display as Heavy Smoker. Now, click on the number 225 which is the label for the 225th row, and observe that the data in the 225th row have been selected. Then, select Edit>Copy from the menu. Next, click and drag from the number 226 which is the label for the 226th row down to the number 250 which is the label for the 250th row, and select Edit>Paste from the menu. You should now observe that 26 lines of data, each representing a white collar heavy smoker, have been entered.
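The row numbers used in the copy-and-paste steps above follow directly from the six cell counts, and it is easy to check them. The following Python sketch (an illustration only, with hypothetical names such as CELL_COUNTS and blocks) rebuilds the 250 cases in the same order in which they were entered and reports the first and last row of each group.

```python
# (jobtype code, smoking code, count) in the order the cases were entered.
# Counts are the six cell frequencies quoted in the steps above.
CELL_COUNTS = [
    (1, 0, 52), (1, 1, 12), (1, 2, 49),
    (2, 0, 48), (2, 1, 63), (2, 2, 26),
]

cases = []   # one (jobtype, smoking) pair per row of the data window
blocks = []  # (first row, last row) of each group, numbered from 1

for jobtype, smoking, count in CELL_COUNTS:
    first_row = len(cases) + 1
    cases.extend([(jobtype, smoking)] * count)
    blocks.append((first_row, len(cases)))

for (jobtype, smoking, count), (first, last) in zip(CELL_COUNTS, blocks):
    print(f"jobtype={jobtype} smoking={smoking}: rows {first}-{last} ({count} cases)")

print("total cases:", len(cases))
```

If the last row reported for each group matches the row number where you stopped pasting, the data file should contain the correct number of cases in every cell.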
Use either the File>Save Data options or the File>Save As options to save the data. If you use the File>Save Data options, your copy of the file jobsmoke.sav will immediately be updated (and you should see the light for the a: drive go on as this occurs). If you use the File>Save As options, you must specify the a: drive, and you must enter the desired file name (both of which can be accomplished either by using the mouse to select the appropriate choices or by typing A:\JOBSMOKE.SAV). You must then click on the Save button after which you will be asked whether or not you wish to replace the earlier version of jobsmoke.sav with the updated version; when you click on the Yes button, the data file will be saved.
To create the contingency table displayed earlier and perform the chi-square test, select the Statistics>Summarize>Crosstabs options from the main menu. In the Crosstabs dialogue box which is displayed, you should see a list of the variables on the left. Select jobtype, and click on the arrow button pointing toward the Row(s) section of the dialogue box; then, select smoking, and click on the arrow button pointing toward the Column(s) section of the dialogue box. Next, click on the Statistics button to enter the Crosstabs: Statistics dialogue box. Select the Chi-square option in the upper left corner of the dialogue box, and click on the Continue button. Click on the Cells button to enter the Crosstabs: Cell Display dialogue box. Note that the Observed option has been selected and that other options concerning the display of expected values, percentages, and residuals are available. Select the Row option in the Percentages section of the dialogue box, and click on the Continue button. Now, click on the OK button, after which an SPSS output viewer window is displayed.
The results displayed in the output window consist of a contingency table containing observed frequencies and row percentages along with marginal totals and marginal percentages, results for three chi-square tests, and the minimum expected value. The first chi-square statistic for which results are displayed is the Pearson chi-square; the value of this statistic is displayed together with its degrees of freedom and P-value. The Pearson chi-square is a popular test statistic often used when considering the relationship between two categorical variables; however, this statistic is not easily generalized to situations involving more than two variables. The second chi-square statistic for which results are displayed is the likelihood ratio chi-square; the value of this statistic is displayed together with its degrees of freedom and P-value.
The likelihood ratio chi-square statistic is derived using a different approach than that from which the Pearson chi-square statistic is derived, but the P-values for the two statistics are almost always in close agreement; however, the likelihood ratio chi-square statistic is much more easily generalized to situations involving more than two variables than is the Pearson chi-square statistic. (Note: Both of these chi-square tests are appropriate only if all expected frequencies are greater than 1 and if most of the expected frequencies are greater than 5; a warning message will be printed when this is not the case.)
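To see where the numbers in the output come from, the two chi-square statistics can be computed by hand from the six observed frequencies. The following Python sketch is an illustration under stated assumptions, not a description of how SPSS performs the calculation: it uses the textbook formulas (expected frequency = row total times column total divided by the grand total), all variable names are made up for this example, and the P-value function relies on the fact that for 2 degrees of freedom the upper-tail chi-square probability is exp(-x/2), so it applies only to this 2 by 3 table.

```python
import math

# Observed frequencies: rows are jobtype (Blue Collar, White Collar),
# columns are smoking (Nonsmoker, Light Smoker, Heavy Smoker).
observed = [[52, 12, 49],
            [48, 63, 26]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

cells = [(o, e)
         for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row)]

pearson = sum((o - e) ** 2 / e for o, e in cells)
likelihood_ratio = 2 * sum(o * math.log(o / e) for o, e in cells)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
min_expected = min(e for _, e in cells)

# Row percentages, as requested with the Row option in Cell Display.
row_percent = [[100 * o / r for o in obs_row]
               for obs_row, r in zip(observed, row_totals)]


def p_value_df2(x):
    """Upper-tail chi-square probability; valid only when df == 2."""
    return math.exp(-x / 2)


print("degrees of freedom:", df)
print("minimum expected frequency:", round(min_expected, 1))
print("Pearson chi-square:", round(pearson, 3), "P =", p_value_df2(pearson))
print("likelihood ratio chi-square:", round(likelihood_ratio, 3),
      "P =", p_value_df2(likelihood_ratio))
for row in row_percent:
    print("row percentages:", [round(p, 1) for p in row])
```

Because the minimum expected frequency here is well above 5, the condition stated in the note above is satisfied, and the two statistics lead to the same conclusion. Values printed by this sketch may differ from the SPSS output in the number of decimal places shown.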
Using the Window option from the menu, return to the window containing the data.
You may use the File>Exit SPSS option to exit from SPSS.