LIS397.1:
Introduction to Research Methods in LIS
Survey Data and the Use of Chi Square
Maximum points on final exam: 25
Due: May 3rd (at the time of the final)
Analysis of Survey Data
Most surveys yield variables that are of differing levels of measurement--some are ratio level like age of respondent, some are nominal like gender of respondent. Some Likert-type opinion questions can be considered either ordinal or interval. Also, each question on a survey can yield at least one variable (because the respondents' answers will vary), and a lengthy survey can yield a large number of variables. For this reason, a computerized package, like SPSS, is ideal for the analysis of survey data.
For your chi square assignment, you will have access to the survey data you helped to collect. The survey returned over 180 responses and so you should have little trouble with cell counts of less than 5, but watch for those statements on the SPSS printout.
Also included as separate links are:
1. a copy of the survey
2. a copy of the codebook (the codebook explains how the data were coded from each survey questionnaire).
Objective: The objective for this exercise is to acquaint you with a bit more of the power of SPSS and its hypothesis testing capabilities using the chi square test of independence.
Your assigned task: After reading the questions on the survey, form a tentative research hypothesis. (“I wonder if … there is a relationship between the respondents’ enrollment status (freshman, sophomore, etc.) and their degree of web search skill?”)
Use two separate questions from the survey for each hypothesis you consider, but you can use some questions more than once. For example, enrollment status might also have a relationship with the likelihood of using the web from on/off campus locations, etc. Each hypothesis should be a stated relationship between two questions (variables).
Once you have found the variables/question(s) that correspond to the elements of your hypothesis, ask yourself whether a chi square test is appropriate (that depends on the level of measurement used for the questions’ response sets). All of the survey questions ARE appropriate for chi square, but don’t imagine that ALL surveys yield questions that are appropriate for chi square. If we had the respondents’ ages (ratio level), we could do a t-test to see whether the ages are significantly different when considered in two separate groups, like those who access UTLOL and those who don’t/haven’t.
After running your analysis (more on how to do that below) with SPSS:
Write up the results of your hypothesis test using the 4-Step Method. For the full 25 points you must test five different research hypotheses. And you must endure a bit of SPSS agony....more on that, too, in a moment.
Remember to attach the printout for your test to each of your write-ups. Where you would include the calculation in Step 3, you can write “From the printout, the calculated chi square is XXX.” The assessment of your effort will include:
1. the “sensible-ness” of your hypothesis,
2. the correctness of your write up,
3. the correctness of your final decision: to reject the null or to fail to reject the null, and also
4. the ease with which the reader can find the various steps and elements of your write up.
In order to accomplish this assignment, a few more SPSS screens must be navigated. On the attached pages, I have tried to provide a bare outline of the menus and choices to be made. Hopefully, there will be time for another in-class demonstration.
SPSS Tutorial
The datafile will be provided on the machines that have SPSS for Windows. The file should appear in the default folder, called SPSS. Data files have the extension .sav to indicate a saved data file. Output files are given the extension .spo and there should not be any output files saved to the SPSS directory. If you wish to save yours, place them on an A diskette. The SPSS folder is the default choice when you initially enter SPSS and indicate that you wish to open an existing data file. It should be clearly labeled to indicate the Student Survey data.
Note: Please be careful about saving copies of your executed SPSS sessions to the SPSS directories on the hard disks in the Lab. Sometimes these files can be very large. In saving them you may inadvertently overwrite existing data files. Try to bring your own floppy and copy items you wish to save as you work to the A: drive.
To Calculate a ChiSquare Value
If you identify two variables from the survey that are nominal level or ordinal level as they stand, you can continue as follows:
Choose to open an existing data file from the initial SPSS screen.
Once you have selected the data file, you will find yourself looking at the coded data page, also called the Data Editor screen.
Menus for running the chi square and creating the crosstab tables:
ANALYZE…
...DESCRIPTIVE STATISTICS
...CROSSTABS
From the listing of the variables on the resulting screen, choose one for the ROW variable and one for the COLUMN variable. First you must click on the name of the variable and then on the arrow opposite the box for ROW and the box for COLUMN.
Next, at the bottom of this screen you will see three buttons...
STATISTICS....allows you to choose the chisquare test; you can ignore the others.
CELLS...allows you to tailor what gets reported within each cell in the crosstabulation. Choose to display at least OBSERVED and EXPECTED. If you were actually writing a report on the data you might want to know about the percentages that each frequency represents out of the COLUMN total or the ROW total or out of the Grand TOTAL, but too many numbers in the cell can also be confusing at this stage.
FORMAT...should be fine with the defaults as they are set.
Finally, to run the chisquare and create the tables, click on OK.
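To make the printout less of a black box, here is a minimal sketch, in Python rather than SPSS, of the arithmetic behind a crosstab: the observed counts, the expected counts, and the Pearson chi square. The responses and the labels `status` and `skill` are invented for illustration; they are not taken from the Student Survey data, and the function names are my own.

```python
from collections import Counter


def crosstab(rows, cols):
    """Build a table of observed counts from two lists of paired responses.

    Pairs with a missing answer (None) on either question are dropped.
    """
    pairs = [(r, c) for r, c in zip(rows, cols) if r is not None and c is not None]
    counts = Counter(pairs)
    row_levels = sorted({r for r, _ in pairs})
    col_levels = sorted({c for _, c in pairs})
    observed = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, observed


def expected_counts(observed):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def pearson_chi_square(observed):
    """Return (chi square, degrees of freedom) for a table of observed counts."""
    expected = expected_counts(observed)
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical responses: 40 undergraduates and 60 graduate students.
status = ["under"] * 40 + ["grad"] * 60
skill = ["low"] * 30 + ["high"] * 10 + ["low"] * 20 + ["high"] * 40

row_levels, col_levels, observed = crosstab(status, skill)
chi2, df = pearson_chi_square(observed)
print(row_levels, col_levels, observed)
print(f"chi square = {chi2:.3f}, df = {df}")
```

The value SPSS labels Pearson Chi-Square is computed the same way, so a small table checked by hand like this is a good way to confirm that you are reading the printout correctly.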
NEXT, examine the resulting table. Notice the number of cells that have expected counts of less than 5. These are extremely important to the validity of the calculated chi square. I am asking that you try to reduce the percentage of low expected counts/cells to less than 25% or as close to that as you can come. Obviously, if a variable that has 2 levels is crossed with another that has only 2 levels, there is little you can do to improve that situation, if there are low cell frequencies.
WHY? The chi square test relies on an approximation that becomes unreliable when the expected value in a cell is less than 5, and that weakens the computed test statistic. Most computerized packages will provide you with an indication of how many cells contain an expected value lower than 5. If more than 25% of your cells have expected counts of less than 5, then you will have to collapse or recode some of the levels of the variable, essentially pooling the observed frequencies for several levels of the variable and placing them together into some acceptable existing level. To do this, you need to know how to transform the data using SPSS....
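The 25% rule of thumb can be checked by hand as well. The sketch below, in Python rather than SPSS, computes the expected count for every cell and reports what percentage fall below 5. The table of observed counts is made up for illustration, and the function names are hypothetical.

```python
def expected_counts(observed):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def percent_low_expected(observed, threshold=5):
    """Percentage of cells whose expected count is below the threshold."""
    cells = [e for row in expected_counts(observed) for e in row]
    low = sum(1 for e in cells if e < threshold)
    return 100.0 * low / len(cells)


# Hypothetical 3 x 3 table; the third column has very few responses.
observed = [
    [20, 15, 3],
    [25, 30, 2],
    [18, 22, 1],
]

pct = percent_low_expected(observed)
print(f"{pct:.1f}% of cells have an expected count below 5")
if pct > 25:
    print("Consider collapsing or recoding some levels.")
```

In this invented table only the sparse third column produces low expected counts, so it is the natural candidate to pool with a neighboring level.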
To Collapse or Recode Cells Using the Transform Menu
SPSS is remarkably good at the manipulation of existing data. Some packages will not let you create new variables from your existing variables very easily, but SPSS does....even if that means that you increase the number of observations in some way. The TRANSFORM Menu on the DATA Editor Screen contains the item, Recode. When you go to that option you also get two choices—“into Same Variable” or “into New Variable.” Please choose the “into Same Variable” option. On the next screen choose the Old and New Values dialog box.
Note the Old Value side and the New Value side. Most often you will simply fill in the Old Value (from your first trial effort where there are too few responses) and change it to (or collapse it into) another response value in the New Value box. We have already coded the missing values, that is, values where there was no response from some of the respondents, as a System-Missing item. This will mean that SPSS will not try to use these in the calculation of the chi square value at all.
TRANSFORM
...Recode --> Into Same Variable (again, be sure not to save your recodes to the original data set when you are through!)
Choose the variable you want to recode using the arrow box >
Then choose the Old/New... button
Once you have filled in the Old Value box and the New Value box, click on the Add button… You will then see your recode choices appear in the Old -> New box.
You can continue to make other recode decisions, pressing Add after each.
When you have finished, click on the Continue button at the bottom. To accomplish the recoding task, click on OK… You will see the changes fill various cells in the Data Editor.
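The Old -> New list in the Recode dialog is, in effect, a lookup table. A minimal sketch of the same idea in Python (not SPSS) follows. The codes 1 through 5 are hypothetical, not taken from the codebook, and `None` stands in for a System-Missing value.

```python
def recode(values, old_to_new):
    """Return a new list with each old code replaced by its new code.

    Codes not listed in old_to_new are kept as they are, and
    missing values (None) stay missing.
    """
    return [old_to_new.get(v, v) if v is not None else None for v in values]


# Hypothetical coded responses; very few respondents chose level 5.
responses = [1, 2, 5, 4, None, 3, 5, 2]

# Collapse level 5 into level 4, as one Old -> New line would.
recoded = recode(responses, {5: 4})
print(recoded)  # [1, 2, 4, 4, None, 3, 4, 2]
```

Because the function returns a new list, the original responses are left untouched, which serves the same purpose as not saving your recodes over the original data set.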
Then go back to ....
ANALYZE
...DESCRIPTIVE STATISTICS
...CROSSTABS
Click on OK to rerun with recoded values.
Previous variable names should still be shown on the screen from your previous run.
As you can see, the real power of SPSS or any computerized package is the power to let you ask questions of the data as you are running an analysis. We did not have much time in class to become familiar with SPSS; nevertheless, do spend some effort getting comfortable if you have the time and interest. SPSS contains a tutorial, for example, and some other sample data sets. Also, please compose your hypotheses and the possible recoding you might need to do before you reach the keyboard! You do not need to find a significant (reject the null) relationship in order to be considered successful. As someone must have said to you, “No is also an answer.”
Your submission should have one page for each hypothesis and include:
Step 1) Stated Hypotheses. Also, include mention of which Survey questions you are selecting with their accompanying variable names used to test the hypotheses.
Step 2) Conditions of test. Also include a clear description of any recoding you had to do.
Step 3) Calculation of chisquare. Just write “from the printout” and give the calculated Pearson chi square from the printout. (I don’t want to have to hunt for it.)
Step 4) Conclusion (reject or fail to reject the null) and some brief re-statement or interpretation.
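The decision in Step 4 can be sketched as a comparison between the calculated chi square and a critical value. The Python sketch below assumes a significance level of .05, which this handout does not specify; use whatever level your Step 2 conditions state. The critical values are the standard chi square table values for 1 to 6 degrees of freedom, and the function name `decide` is my own.

```python
# Critical values of chi square at the .05 level (an assumed level),
# keyed by degrees of freedom = (rows - 1) * (columns - 1).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def decide(chi_square, df, critical=CRITICAL_05):
    """Return the Step 4 decision for a calculated chi square."""
    if df not in critical:
        raise ValueError(f"no critical value listed for df = {df}")
    if chi_square > critical[df]:
        return "reject the null"
    return "fail to reject the null"


# Hypothetical values, as if read from a printout.
print(decide(16.667, 1))
print(decide(2.5, 2))
```

Either outcome is a complete answer, provided the write-up then restates in plain words what the decision means for the two survey questions involved.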