In a computerized statistical analysis, descriptive statistics serve two purposes. The first is to describe the data, especially on those variables that will not be a part of the inferential statistical analyses. These might include the demographic characteristics of the sample, correlations of dependent variables with other variables measured, and the characteristics of the dependent variables themselves. The second purpose is to find evidence of errors in the data entry process. No matter how diligent you are in checking your data during the entry process, it is relatively easy to input a data point incorrectly. To spot these data entry errors before doing the inferential statistics, you must be familiar with the data and the variables being measured.
Checking the maximum and minimum scores will often help you spot errors, such as a score that is out of range (i.e., larger or smaller than it could possibly be). Of course, you must know what the largest and smallest possible scores could be to make this strategy work. Also look for scores that are highly unlikely although technically possible. If they show up, check the original data to make sure that such scores actually exist. Check to see that the mean is close to the middle of the numbers you remember seeing for a variable during the data collection and entry processes. Be particularly careful in checking variables that you may have computed, either before data entry or as part of the statistical analyses. Errors in the computations or the formulas given to the computer program are easy to make and will often result in clearly wrong answers that can be easily spotted if you are looking for them.
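The range check described above can also be sketched outside of SPSS. The following is a minimal Python illustration, not an SPSS procedure. The records and the allowable limits are invented for the example and are not the data from Table 5.2; the function name find_out_of_range is likewise hypothetical.

```python
# Hypothetical records, invented for illustration only.
# These are NOT the values from Table 5.2.
records = [
    {"subject": 1, "age": 34, "income": 42000},
    {"subject": 2, "age": 29, "income": 38000},
    {"subject": 3, "age": 410, "income": 51000},  # probably a typo for 41
    {"subject": 4, "age": 52, "income": -5},      # impossible income
]

# Assumed limits; set these from your own knowledge of each measure.
valid_ranges = {"age": (18, 100), "income": (0, 1_000_000)}


def find_out_of_range(rows, ranges):
    """Return (subject, variable, value) for every score outside its range."""
    problems = []
    for row in rows:
        for variable, (low, high) in ranges.items():
            value = row[variable]
            if not low <= value <= high:
                problems.append((row["subject"], variable, value))
    return problems


flagged = find_out_of_range(records, valid_ranges)
for subject, variable, value in flagged:
    print(f"subject {subject}: {variable} = {value} is out of range")
```

A check like this only catches impossible scores. Unlikely but possible scores still have to be compared against the original data by hand.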
In this section we will show you how to set up some basic descriptive statistics for (1) categorical data and (2) score data. In each section, we will show how to generate both descriptive statistics and appropriate graphs.
Our examples will draw heavily on the data set entered previously and shown in Table 5.2 of the text. Before we do the analyses, we must open the data file that was previously prepared and saved. We do so by selecting the File menu, the Open submenu, and the Data choice on that submenu, which will give us this screen. We then select the file and click on OK to open it in the SPSS for Windows program.
Categorical data represent a classification of subjects, and the appropriate summary statistics are frequencies. We compute summary statistics for categorical data in SPSS for Windows by selecting the Statistics menu, the Summarize submenu, and the Frequencies choice, which gives us this screen. We want to compute frequencies for the two categorical variables ("sex" and "polaffil"). To do so, we highlight each of these variables in turn by clicking on it in the left box and then move it to the right box by clicking on the arrow button between the boxes. If we change our mind, we can move a variable back to the left box in the same manner. Once both variables have been moved, we click on OK and the analysis is run, producing the output shown in this screen. This output lists both the frequency and percent of subjects in each category. Notice that the left side of the screen shows a subdirectory structure, which controls access to different sections of complex statistical analyses. Like any data file, the output file can be saved using the Save command on the File menu. It can also be printed using the Print command on the File menu.
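To make clear what a frequency table contains, here is a small Python sketch of the same computation (counts and percents per category). It is an illustration only and not what SPSS runs internally. The letter codes and the ten values below are assumptions made up for the example, not the Table 5.2 data, and frequency_table is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical codes for the "polaffil" variable (assumed: D, R, I).
# Invented for illustration; not the Table 5.2 data.
polaffil = ["D", "R", "I", "D", "R", "D", "I", "R", "D", "R"]


def frequency_table(values):
    """Map each category to a (frequency, percent) pair."""
    counts = Counter(values)
    total = len(values)
    return {
        category: (count, 100.0 * count / total)
        for category, count in sorted(counts.items())
    }


table = frequency_table(polaffil)
for category, (count, percent) in table.items():
    print(f"{category}: frequency = {count}, percent = {percent:.1f}")
```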
Sometimes we want to tabulate frequencies for joint categories (e.g., female independents). To do this we use a procedure called crosstabs (short for cross tabulation). We select the Statistics menu, the Summarize submenu, and the Crosstabs option, which will give us this screen. To do a cross tabulation of sex by polaffil, we move one of these variables to the box marked "row(s)" and the other to the box marked "column(s)" and press the OK button. This will produce the output shown here.
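The logic of a cross tabulation is simply counting how many subjects fall into each pair of categories. A minimal Python sketch of that idea follows. The codes and values are invented for illustration (they are not the Table 5.2 data), and crosstab is a hypothetical helper, not an SPSS function.

```python
from collections import Counter

# Hypothetical codes, invented for illustration; not the Table 5.2 data.
sex = ["F", "M", "F", "F", "M", "M", "F", "M"]
polaffil = ["I", "D", "R", "I", "R", "D", "D", "I"]


def crosstab(row_values, column_values):
    """Count subjects in each (row category, column category) cell."""
    cell_counts = Counter(zip(row_values, column_values))
    row_levels = sorted(set(row_values))
    column_levels = sorted(set(column_values))
    # Counter returns 0 for pairs that never occur, so empty cells show as 0.
    return {
        r: {c: cell_counts[(r, c)] for c in column_levels}
        for r in row_levels
    }


cells = crosstab(sex, polaffil)
for row_level, row in cells.items():
    counts = "  ".join(f"{col}={n}" for col, n in row.items())
    print(f"{row_level}: {counts}")
```

Here sex plays the role of the row variable and polaffil the column variable. Swapping the two arguments transposes the table, just as swapping the row and column boxes does in the dialog.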
Finally, if we want to graph the data with a bar graph, we select the Graphs menu, the Bar submenu, and then click on the Simple icon and the Define button (in that order), which gives us this screen. To produce a graph of the frequencies of political affiliations, we move the variable polaffil to the Category Axis box and click on OK. This produces the bar graph shown in this screen.
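A simple bar graph of a categorical variable plots one bar per category, with the bar's length equal to that category's frequency. The sketch below shows the idea as a plain-text chart in Python. It is only an illustration of what the graph encodes, using invented values rather than the Table 5.2 data, and text_bar_chart is a hypothetical helper.

```python
from collections import Counter

# Hypothetical codes for "polaffil", invented for illustration.
polaffil = ["D", "R", "I", "D", "R", "D", "I", "R", "D", "R"]


def text_bar_chart(values, symbol="#"):
    """Return one text line per category; bar length equals the frequency."""
    counts = Counter(values)
    return [
        f"{category:<3}{symbol * count} {count}"
        for category, count in sorted(counts.items())
    ]


chart_lines = text_bar_chart(polaffil)
print("\n".join(chart_lines))
```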
Descriptive statistics for score data involve more than just the frequencies of each score. We can produce such a frequency distribution if we desire by using the procedure described above for obtaining the frequency counts for our categorical variables. If we want additional summary statistics, such as the mean and variance, we must use the Descriptives option (under the Statistics menu and Summarize submenu). Selecting the Descriptives option will give us this screen. Notice that not all the variables are listed in the left box. Descriptives cannot be run on categorical data, and our alphabetical codes for the "sex" and "polaffil" variables implied that these were categorical variables. Hence, they were excluded from the list. We will produce descriptive statistics for the variables "age," "income," and "voted" by moving them from the left box to the right box in the same manner as described previously. We could do the same for the "subject" variable, but since that variable is simply an identification number for each subject, the analysis would have no purpose. By default, the Descriptives option computes the mean, standard deviation, and minimum and maximum scores for each variable that we select. If we want to compute additional summary statistics, we click on the Options button and select the additional summary statistics we want. When we have identified the variables and selected the summary statistics, clicking on OK will run the analyses, producing this output.
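For readers who want to see what the default summary statistics amount to, here is a minimal Python sketch that computes the same four quantities (mean, standard deviation, minimum, and maximum) for one score variable. The ages are invented for illustration and are not the Table 5.2 data, and describe is a hypothetical helper. The standard deviation here is the sample version (n - 1 in the denominator), which is what statistics.stdev computes.

```python
import statistics

# Hypothetical ages, invented for illustration; not the Table 5.2 data.
age = [23, 35, 41, 29, 52, 36]


def describe(scores):
    """Return N, mean, sample standard deviation, minimum, and maximum."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "std_dev": statistics.stdev(scores),  # sample SD, divides by n - 1
        "minimum": min(scores),
        "maximum": max(scores),
    }


summary = describe(age)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Scanning the minimum and maximum in a summary like this is also the quickest way to apply the out-of-range check recommended at the start of this section.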