Converting an SPSS datafile into a format readable by Mplus
Mplus is a fabulous statistical program. It’s very flexible, and is my favorite program to use when I need to analyze data using structural equation modeling – and I definitely prefer it over AMOS software. The latter is easier to use because of the graphical user interface (GUI), but I often find myself running into software limitations (e.g., AMOS cannot use bootstrapping when there is missing data) and in complex models, I often find the GUI tends to get clunky, and visually cluttered. This said, Mplus is not terribly user-friendly for new users – despite having an extensive discussion board of answers to various problems.
Much of my initial training – like many in psychology – was running statistics using SPSS software. SPSS has the advantage of being very user friendly, but moving to a syntax-based coding language like the one used by Mplus can be daunting at first. When I was first trying to figure out Mplus for myself during graduate school, I immediately ran into a problem: The datafile I had was not properly formatted for Mplus. Since (at the time) I had been mostly working with SPSS software, my datafile was in .sav format (the proprietary format of SPSS). Before I could get started, I needed to convert the file into format understandable by Mplus. Sounds simple, right? Well, it is actually. But the problem is that there is a LOT of documentation on Mplus, and finding precisely what needs to be done to your dataset to get started isn’t immediately apparent. With this in mind, I’m going to present three simple steps to convert your SPSS datafile into a form readable by Mplus.
Step 1: Make sure missing values are indicated by a specific value
If you’re an SPSS user, you may be used to leaving missing values as “blanks” within SPSS itself. What may not be immediately apparent is that SPSS still needs to indicate missing values with a character of some sort. Specifically, SPSS actually fills in any blanks with a period (.) by default, and designates all periods as a piece of missing data. If you look closely at your SPSS datafile when it’s open, you can actually see the periods filled in all for the blanks.
Unfortunately, Mplus doesn’t like it when you use periods as the symbol for missing data. Even though Mplus can ostensibly use periods as missing data indicators, I would recommend that you pick some other number to represent missing data. When I was first working with Mplus using periods as missing data indicators, I kept getting incredibly uninformative error messages (or alternatively, the program would sometimes instead read the data incorrectly without giving an error message) which I eventually figured out was being caused by having my missing values represented by a period, as is default in SPSS. I usually use “999” to represent missing data instead. You can replace all the periods with “999’s” this very easily in SPSS using the following syntax:
[box] RECODE var1 var2 var3 var4 var5 (SYSMIS=999) (ELSE=COPY). EXECUTE.[/box]
Step 2: Rename variables to be 8 characters or less
Though this is technically optional, Mplus will truncate all variable names to 8 characters in your output. So unless you want to be really confused later when running your analyses, I recommend that you assign new variable names to all your variables that 8 characters or less. For example, if your variable was “self_esteem_academic,” Mplus would shorten that to just “self_est” in the output. A better variable name might be something like “se_a.” In case you want to do this multiple times, you might write syntax to do this instead of changing all the variable names manually in the variable viewer:
Step 3: Convert the file into fixed-format ASCII
For Mplus to work its magic, your datafile needs to be in fixed-format ASCII. All you really need to know is that fixed-format ASCII files have the data arranged in columns with fixed sizes so that every record fits into a standard form (as opposed to, say, comma-delimited format, where each field is separated or ‘delimited’ by a comma). To convert an SPSS file (.sav) into fixed-format ASCII, first go into “variable view” and make sure that the “columns” and “width” columns in SPSS are all the same number. This is going to determine the space in between columns. If you were to pick a number like “12” it should be good for most purposes (unless you have very large numbers, or need many decimal places of precision). Instead of doing this manually, there is a straightforward kind of syntax that can alter the column widths of all your variables:
After you do this, open up your SPSS file and run the following syntax:
[box] WRITE OUTFILE=’C:\FileLocation\datafile_formplus.dat’ TABLE /ALL. EXECUTE.[/box]
Yup, it’s that straightforward. Before getting too far into your analyses, I would also recommend that you do some basic diagnostics by running simple analyses in both programs (e.g., checking means and standard deviations in SPSS and Mplus) to make sure that the conversion worked as expected. Note also that a fixed-format ASCII file doesn’t have variable names listed on the top! They will be in the same order as they were in the SPSS file, but this is another area where you might get confused when starting to run analyses (in every Mplus syntax file, you will list all the variables in order; if you make a mistake in that list though, your analyses will be wrong!). Aside from that though, you should be good to start analyzing data in Mplus!
****Update: Feb 16, 2015****
A reader helpfully pointed out that in version SPSS version 22, there is a problem that requires an additional step. For some reason, version 22 adds some nonsense characters to the beginning of the file that prevents Mplus from reading it. In order to work around this, you will have to open up the saved datafile in the Mplus Editor, and delete the characters manually. Annoyingly, these characters won’t show up if you open the datafile in notepad, excel, or SPSS, so you have to open it in the Mplus editor to find and delete them! Below is a picture showing the problem, and indicating what characters you need to delete. This should only be required if you have SPSS version 22, earlier versions do not require this workaround — when I originally wrote this tutorial, I used SPSS 20, which didn’t have this problem!
[facebook]