Here are four useful tips for writing shorter, more efficient SPSS syntax.
1. A simpler way to calculate scale totals
I often need to calculate a total score for questionnaires with multiple items. For example, I might ask participants to answer ten different questions, responding to each question using a scale of 1 (strongly disagree) to 5 (strongly agree). In particular, I’ll often want to calculate an average of all ten items to use in statistical analyses. I used to calculate scale totals using the following SPSS syntax:
[box] COMPUTE vartotal = (var1 + var2 + var3 + var4 + var5 + var6 + var7 + var8 + var9 + var10) / 10.
EXECUTE.[/box]
So, this would create one new variable “vartotal” which would be the average of all 10 items. A quicker way to do this would be:
[box]COMPUTE vartotal = MEAN(var1 TO var10).
EXECUTE.[/box]
There are two important caveats to keep in mind when using the quicker syntax.
First, the variables need to sit next to each other in your dataset for the TO keyword to work properly. TO selects var1, var10, and every variable between them in file order, so the syntax above averages all of them. If other, unwanted variables sit between var1 and var10 in your dataset (e.g., maybe it went var1, var2, var3, sex, var4 …), SPSS has no way of knowing that you didn’t want them, and will just average them in with the rest.
Second, the two approaches handle missing data differently. The first example returns a “system missing” value for vartotal if ANY of the 10 individual items is missing. In contrast, the shorter MEAN syntax reports the mean of whichever items are not missing (e.g., if you were missing a value for var5, SPSS would add the remaining 9 items together and divide by 9). Depending on how you plan to deal with missing data, this could be undesirable.
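Both caveats can be worked around in the syntax itself. The following is a minimal sketch rather than the only way to do it: listing the items one by one avoids relying on column order, and the MEAN.n form returns a mean only when at least n of the listed variables have valid values. The threshold of 8 and the variable name nmissing are illustrative choices, not part of the original example.

[box]* List items explicitly so that column order does not matter.
COMPUTE vartotal = MEAN(var1, var2, var3, var4, var5, var6, var7, var8, var9, var10).
* Or average only when at least 8 of the 10 items were answered.
COMPUTE vartotal = MEAN.8(var1 TO var10).
* Count how many of the 10 items are missing for each participant.
COMPUTE nmissing = NMISS(var1 TO var10).
EXECUTE.[/box]

You would normally keep only one of the two vartotal lines, since the second overwrites the first. With MEAN.8, a participant who skipped three or more items gets a system-missing vartotal.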
2. A shorter way to reverse-score items
Another thing I often need to do when working with questionnaires is reverse-scoring. For example, I might have these two items:
“Is talkative”
“Tends to be quiet”
These two items are measuring the same thing (Extraversion), but are worded in the opposite way. If I want high values of the total score to indicate high levels of Extraversion, I would reverse code “tends to be quiet” so that low values are now high, and vice versa. So, assuming that this was measured on a 9-point scale from 1 (strongly disagree) to 9 (strongly agree), one way to do this would be:
[box]RECODE var1 (1=9) (2=8) (3=7) (4=6) (5=5) (6=4) (7=3) (8=2) (9=1) INTO var1_r.
EXECUTE.[/box]
That can be a little tedious to write out, so an alternative would be the following:
[box]COMPUTE Var1_r=ABS(Var1 - 10).
EXECUTE.[/box]
In this syntax, I take the absolute value of Var1 - 10. For a scale that starts at 1, you will always subtract a number 1 higher than the highest possible value on your scale (more generally, the lowest possible value plus the highest).
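An equivalent way to write this skips ABS and subtracts the item from (lowest value + highest value), which also covers scales that do not start at 1. Here is a minimal sketch for the 9-point example; the crosstab at the end is only a suggested sanity check, not a required step.

[box]* Reverse-score a 1-to-9 item: new value = (1 + 9) - old value.
COMPUTE var1_r = 10 - var1.
EXECUTE.
* Each value of var1 should line up with exactly one value of var1_r.
CROSSTABS /TABLES=var1 BY var1_r.[/box]

In the crosstab, 1 should pair only with 9, 2 only with 8, and so on. For a 0-to-4 scale, the same idea would be 4 - var1.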
3. Saving a smaller datafile with only a subset of variables
If you’re working on really large datasets, sometimes you want to create a dataset that contains only a handful of variables that you’re interested in (e.g., the full dataset has 1000 variables, but you only care about 5 of them). There’s a very simple bit of syntax that will let you do this with ease:
[box]SAVE OUTFILE='C:\Users\Sean Mackinnon\Desktop\small_data.sav'
/KEEP= var1 var2 var3 var4 var5
/COMPRESSED.[/box]
This will create a new datafile that contains only the five variables you specified and leaves out all the rest. The dataset you have open is not changed. I find this to be very useful when dealing with enormous datasets.
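When it is easier to name the variables you do not want, SAVE also accepts a /DROP subcommand in place of /KEEP. In the sketch below, the file path and the dropped variables (sex, age) are hypothetical placeholders.

[box]* Save every variable except the ones listed after DROP.
SAVE OUTFILE='C:\temp\data_without_demographics.sav'
/DROP=sex age
/COMPRESSED.[/box]

As with /KEEP, this affects only the saved file.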
4. The COUNT command: Counting the number of instances of a particular value
Occasionally, I need to count the frequency of a particular response. For example, when measuring alcohol consumption, I might have 7 variables: drinkday1 TO drinkday7. Each of these variables indicates how many alcoholic beverages a person had on a particular day.
What if I want to know how many days participants did not drink at all? This can be easily done with the COUNT command in SPSS:
[box]COUNT drinkfreq = drinkday1 TO drinkday7 (0).
EXECUTE.[/box]
The above syntax will look at all seven days (i.e., drinkday1 TO drinkday7), and count the number of “0” values for each participant. So if a single participant had these values:
drinkday1 = 1
drinkday2 = 0
drinkday3 = 0
drinkday4 = 0
drinkday5 = 7
drinkday6 = 2
drinkday7 = 3
The above syntax would report a value of “3” because on three of those days, the participant had zero drinks.
What if I want to know how many days participants had at least one drink? We could accomplish this with similar syntax:
[box]COUNT drinkfreq = drinkday1 TO drinkday7 (1 THRU 100).
EXECUTE.[/box]
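In this case, we’re counting all the instances of values from 1 to 100 (assuming that nobody has more than 100 drinks in a day!). So using the same data as above, this time the COUNT command would produce a value of “4.” Note that because this syntax reuses the name drinkfreq, it replaces the earlier count of zero-drink days; pick a different name if you want to keep both. The COUNT command is pretty flexible, and is useful for this kind of problem.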
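Two small variations may help here. The keyword HI (short for HIGHEST) removes the need to guess an upper bound, and the keyword MISSING counts days with no valid answer. The variable names nodrinkdays, drinkdays, and missingdays in this sketch are illustrative.

[box]* Days with zero drinks.
COUNT nodrinkdays = drinkday1 TO drinkday7 (0).
* Days with one or more drinks, with no upper limit.
COUNT drinkdays = drinkday1 TO drinkday7 (1 THRU HI).
* Days with no valid answer.
COUNT missingdays = drinkday1 TO drinkday7 (MISSING).
EXECUTE.[/box]

For the example participant above, these three counts would be 3, 4, and 0.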
Hopefully you find some of these useful! Feel free to post a comment if anything is unclear.