
Converting an SPSS datafile to Mplus format

Converting an SPSS datafile into a format readable by Mplus

Mplus is a fabulous statistical program. It’s very flexible, and it is my favorite program for analyzing data with structural equation modeling; I definitely prefer it over AMOS. AMOS is easier to use because of its graphical user interface (GUI), but I often run into software limitations (e.g., AMOS cannot use bootstrapping when there is missing data), and in complex models the GUI tends to get clunky and visually cluttered. That said, Mplus is not terribly user-friendly for new users, despite having an extensive discussion board of answers to various problems.

Much of my initial training, like that of many in psychology, was in running statistics with SPSS. SPSS has the advantage of being very user friendly, but moving to a syntax-based coding language like the one used by Mplus can be daunting at first. When I was first trying to figure out Mplus for myself during graduate school, I immediately ran into a problem: the datafile I had was not properly formatted for Mplus. Since (at the time) I had been working mostly with SPSS, my datafile was in .sav format (the proprietary format of SPSS). Before I could get started, I needed to convert the file into a format Mplus could read. Sounds simple, right? Well, it actually is. The problem is that there is a LOT of documentation on Mplus, and it isn’t immediately apparent what precisely needs to be done to your dataset to get started. With this in mind, I’m going to present three simple steps to convert your SPSS datafile into a form readable by Mplus.

Step 1: Make sure missing values are indicated by a specific value

If you’re an SPSS user, you may be used to leaving missing values as “blanks” within SPSS itself. What may not be immediately apparent is that SPSS still needs to indicate missing values with a character of some sort. Specifically, SPSS fills in any blanks with a period (.) by default and treats every period as missing data. If you look closely at your SPSS datafile when it’s open, you can see the periods filled in for all the blanks.

Unfortunately, Mplus doesn’t like it when you use periods as the symbol for missing data. Even though Mplus can ostensibly use periods as missing data indicators, I would recommend that you pick a number to represent missing data instead. When I was first working with Mplus using periods as missing data indicators, I kept getting incredibly uninformative error messages (or sometimes the program would read the data incorrectly without giving any error message at all). I eventually figured out that the cause was having my missing values represented by a period, as is the default in SPSS. I usually use “999” to represent missing data instead (any number works, as long as it can’t occur as a real value in your data). You can replace all the periods with “999” very easily in SPSS using the following syntax:

[box]
RECODE var1 var2 var3 var4 var5 (SYSMIS=999) (ELSE=COPY).
EXECUTE.
[/box]

Step 2: Rename variables to be 8 characters or less

Though this is technically optional, Mplus will truncate all variable names to 8 characters in your output. So unless you want to be really confused later when running your analyses, I recommend that you assign new names of 8 characters or fewer to all your variables. For example, if your variable was “self_esteem_academic,” Mplus would shorten that to just “self_est” in the output. A better variable name might be something like “se_a.” In case you want to do this multiple times, you might write syntax to do this instead of changing all the variable names manually in the variable viewer:

[box]
RENAME VARIABLES (longvariable1 longvariable2 = var1 var2).
EXECUTE.
[/box]
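If you have a long variable list, it can help to see in advance which names would collide once they are cut to 8 characters. The following is a minimal Python sketch of that check, not part of SPSS or Mplus. The function names and the example variable names (other than “self_esteem_academic”) are hypothetical, and it assumes truncation simply keeps the first 8 characters, as described above.

```python
MAX_LEN = 8  # name length that survives in Mplus output, per the text above


def truncated_names(names, max_len=MAX_LEN):
    """Map each original name to its first `max_len` characters."""
    return {name: name[:max_len] for name in names}


def find_collisions(names, max_len=MAX_LEN):
    """Return {short_name: [original names]} for short names shared by 2+ variables."""
    groups = {}
    for name in names:
        groups.setdefault(name[:max_len], []).append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}


# Hypothetical variable list: the first two names become identical when truncated.
names = ["self_esteem_academic", "self_esteem_social", "age"]
for short, originals in find_collisions(names).items():
    print(f"{short!r} would stand for: {', '.join(originals)}")
```

Any name that shows up in the collision report is a good candidate for the RENAME VARIABLES step.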

Step 3: Convert the file into fixed-format ASCII

For Mplus to work its magic, your datafile needs to be in fixed-format ASCII.  All you really need to know is that fixed-format ASCII files have the data arranged in columns with fixed sizes so that every record fits into a standard form (as opposed to, say, comma-delimited format, where each field is separated or ‘delimited’ by a comma). To convert an SPSS file (.sav) into fixed-format ASCII, first go into “variable view” and make sure that the “columns” and “width” columns in SPSS are all the same number. This is going to determine the space in between columns. If you were to pick a number like “12” it should be good for most purposes (unless you have very large numbers, or need many decimal places of precision).  Instead of doing this manually, there is a straightforward kind of syntax that can alter the column widths of all your variables:

[box]
* F12.0: F = numeric format, 12 = column width, 0 = decimal places.
ALTER TYPE var1 TO var10 (F12.0).
EXECUTE.
[/box]

After you do this, open up your SPSS file and run the following syntax:

[box]
WRITE OUTFILE='C:\FileLocation\datafile_formplus.dat' TABLE /ALL.
EXECUTE.
[/box]

Yup, it’s that straightforward. Before getting too far into your analyses, I would also recommend that you do some basic diagnostics by running simple analyses in both programs (e.g., checking means and standard deviations in SPSS and Mplus) to make sure that the conversion worked as expected. Note also that a fixed-format ASCII file doesn’t have variable names listed on the top! They will be in the same order as they were in the SPSS file, but this is another area where you might get confused when starting to run analyses (in every Mplus syntax file, you will list all the variables in order; if you make a mistake in that list though, your analyses will be wrong!). Aside from that though, you should be good to start analyzing data in Mplus!
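If you want a quick look at the converted file before opening Mplus, the same kind of diagnostic can be sketched outside both programs. The Python below is only an illustration under stated assumptions: every variable was written with a width of 12 (as in the F12.0 example), each case sits on a single line, and 999 is the missing-data code. The function names and the variable names you pass in are hypothetical; the names must be listed in the same order as in the SPSS file.

```python
import statistics

FIELD_WIDTH = 12      # assumption: every variable was written as F12.0
MISSING_CODE = 999.0  # assumption: the missing-data code chosen in Step 1


def parse_fixed_width(lines, names, width=FIELD_WIDTH, missing=MISSING_CODE):
    """Return {name: [non-missing values]} from fixed-width text lines."""
    columns = {name: [] for name in names}
    for line_no, line in enumerate(lines, start=1):
        record = line.rstrip("\r\n")
        if not record.strip():
            continue  # skip blank lines
        # Cut the record into consecutive chunks of `width` characters.
        fields = [record[i:i + width] for i in range(0, len(record), width)]
        if len(fields) != len(names):
            raise ValueError(
                f"line {line_no}: found {len(fields)} fields, expected {len(names)}"
            )
        for name, field in zip(names, fields):
            value = float(field)  # raises ValueError on non-numeric text
            if value != missing:
                columns[name].append(value)
    return columns


def describe(columns):
    """Return {name: (n, mean, sd)}; sd is None with fewer than two values."""
    summary = {}
    for name, values in columns.items():
        n = len(values)
        mean = statistics.mean(values) if n else None
        sd = statistics.stdev(values) if n > 1 else None
        summary[name] = (n, mean, sd)
    return summary


# Tiny in-memory demonstration with made-up records (two variables, three cases).
sample = [f"{1:>12}{2:>12}\n", f"{3:>12}{999:>12}\n", f"{5:>12}{6:>12}\n"]
for name, stats in describe(parse_fixed_width(sample, ["var1", "var2"])).items():
    print(name, stats)
```

For a real file, you would pass an open file object instead of the sample list, for example `parse_fixed_width(open(path), names)`, and compare the resulting means and standard deviations against the SPSS descriptives. A field count that doesn’t match your name list, or a value that can’t be read as a number, raises an error instead of silently shifting columns.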

****Update: Feb 16, 2015****

A reader helpfully pointed out that in SPSS version 22, there is a problem that requires an additional step. For some reason, version 22 adds some nonsense characters to the beginning of the file that prevent Mplus from reading it. To work around this, you will have to open the saved datafile in the Mplus Editor and delete the characters manually. Annoyingly, these characters won’t show up if you open the datafile in Notepad, Excel, or SPSS, so you have to open it in the Mplus Editor to find and delete them! Below is a picture showing the problem and indicating which characters you need to delete. This should only be required if you have SPSS version 22; earlier versions do not require this workaround. When I originally wrote this tutorial, I used SPSS 20, which didn’t have this problem!

[Image: mplus.character (screenshot marking the stray characters to delete)]
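If deleting the characters by hand gets tedious, the cleanup can be scripted. The post does not say what the stray characters actually are. One plausible reading, and it is only an assumption here, is that they are a UTF-8 byte order mark (BOM), which would explain why some editors hide them. The Python sketch below removes a leading UTF-8 BOM if one is present and otherwise leaves the file contents unchanged. The function names and file paths are hypothetical.

```python
import codecs


def strip_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark, if there is one."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def clean_datafile(src_path: str, dst_path: str) -> bool:
    """Copy src_path to dst_path without a leading BOM.

    Returns True if a BOM was found and removed, False otherwise.
    """
    with open(src_path, "rb") as src:
        raw = src.read()
    cleaned = strip_bom(raw)
    with open(dst_path, "wb") as dst:
        dst.write(cleaned)
    return len(cleaned) != len(raw)
```

Writing to a new file instead of overwriting the original keeps the SPSS output around in case the stray characters turn out to be something else. If `clean_datafile` returns False, the assumption above didn’t hold for your file, and the manual fix in the Mplus Editor is still the way to go.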


21 replies on “Converting an SPSS datafile to Mplus format”

Which SPSS version do you use?
I am running the syntax with SPSS22 and it doesn’t work properly.
A .dat-file is created but MPLUS still can’t read it.

Hi Oliver. I used SPSS version 20.0, but that shouldn’t make any difference for the syntax. It could be that something else went wrong. Mplus can’t read string variables (e.g., if you have letters or some other non-numeric characters), so maybe that’s it? Otherwise, it’s hard to diagnose without an error message or a look at the datafile/syntax you wrote.

Hi Sean,

thanks for the reply. It could have something to do with the SPSS version. I found out that SPSS can’t create a proper .dat-file (in the version I am using at least). It adds apparently some bogus characters at the beginning of the file, which make it impossible for MPLUS to read the file. Hence you need to remove those additional characters manually.

The problem has been discussed also here:
http://www.statmodel.com/discussion/messages/12/13334.html?1393010770

Another problem can occur if you are using a large file with many variables. The .dat file with 500 Variables which was produced by SPSS was kind of skewed. Once I cut that number to 100 variables, it worked.

All in all, I find it very difficult to make use of SPSS-files in MPLUS. But otherwise, I agree with you that MPLUS gives you many more options than AMOS. So I am not looking back 🙂

Huh, well leave it to IBM to ruin a good thing they had going in previous versions of the program :p Thanks for the info and the workaround. I also usually keep datafiles small when working in Mplus. Seems the biggest potential for error using Mplus is in this datafile creation process, honestly.

Hi, I’m wondering if the syntax would apply to Mac users too? I am getting an error message that reads “SPSS Statistics cannot access a file with the given file specification. The file specification is either syntactically invalid, specifies an invalid drive, specifies a protected directory, specifies a protected file, or specifies a non-sharable file. Execution of this command stops.”

For Macs, file locations are a bit different than PCs. Omit the “C:\” and the “\” symbols should instead be “/” symbols when specifying where you want to save the file. That could be the issue.

Otherwise, you can always try using the GUI (point and click interface) with:

File –> Save as –> Then on the dropdown menu for “Save as Type” click on Fixed ASCII –> then click save.

For the last step, you could alternatively click “paste” and it’ll save the syntax for you to look at.

Hi,
Wondering if you can help me. I am using SPSS 22 as well and trying to export the data for Mplus. I actually exported this same dataset a couple of years ago but have now added an extra variable in SPSS for some additional analysis. I have created the .dat file and Mplus can read it fine. The only problem is that when re-running the analysis which I previously ran (i.e., without the new variable included), I am getting a slightly different result from the one I got a couple of years ago. I can’t think why this could be. Any ideas?
Thank you

Without seeing the data, can’t be 100% sure, but the most likely answer is that you are reading the wrong variable in. So the list of variable names you put after “NAMES ARE” in the syntax is essentially you putting labels on the columns. This is because the fixed ascii format doesn’t have any variable names / column names attached. Thus, the variable names have to be in *exactly* the same order as they were in the SPSS datafile, or you’ll accidentally be running the analysis on the wrong variable.

I’d start by looking at the descriptives (means, SDs) of the variables in Mplus, and comparing to SPSS output to make sure you’re actually selecting the variables you think you are.

Hi,
So prior to changing the spss file into a mplus format, I did multiple imputation to account for missing data. When I open the file up in mplus, each imputed value has a huge number of decimal places which I don’t think can be read and skews the columns. Is there a way to make these decimals into whole numbers? And would this need to be done in mplus or spss?

Thanks for your help.

Hi Elle,

I would not make the decimal numbers into whole numbers, as that would undermine the value of the multiple imputation. Try doing the same steps, but saving as a comma delimited file (.csv). That might help. Alternatively, maybe changing the number of decimal places in the variable view in SPSS before converting the file might help.

Hello
I have followed the steps and am just getting error messages.

WRITE OUTFILE=’C:\testing’ TABLE /ALL. EXECUTE.

>Warning # 206 in column 20. Text: \
>An invalid character has been found on a command.

>Error # 4702 in column 19. Text: :
>An unexpected symbol was encountered on an output command.
>Execution of this command stops.

>Error # 4700 in column 21. Text: testing’
>An output command contains an unrecognized keyword. The recognized keywords
>are OUTFILE, RECORDS, TABLE, NOTABLE ENCODING and BOM.

>Error # 34 in column 15. Text: ’C
>SPSS Statistics cannot access a file with the given file specification. The
>file specification is either syntactically invalid, specifies an invalid
>drive, specifies a protected directory, specifies a protected file, or
>specifies a non-sharable file.

>Error # 4713. Command name: WRITE
>No variables or strings are specified. If you intended to write empty
>records, use the PRINT SPACE command.

Hi Clare. Try running SPSS in administrator mode (if a PC). There look like other errors in your code too, but that’s a starting place.

Hello! I was wondering if you could help me… I am trying to use a fixed-format dataset, with missing values defined as 99, for Mplus to read when testing an SEM. It does read the file, but the means are completely different from what I get in SPSS.

I am defining the format in the DATA option in Mplus (i.e., 22F8.2: 22 variables, in columns of 8 characters, two of which are after the decimal point).

And I am defining missing in the VARIABLE option in Mplus (i.e., missing = all (99))…

Any thoughts?

Thank you

Hi Paula. Most likely, your list of variables in the VARIABLE command in the Mplus syntax is in the wrong order. The order has to be exactly the same (with exactly the same number of variables) or the procedure I describe will be off. The datafile you create for Mplus has no variable names in it; you create the variable names with the VARIABLE command in Mplus.

To troubleshoot, I would write down the mean you’re getting in Mplus, then calculate means for all the variables in your SPSS file and compare them, to see what mean you are actually getting. Probably, you’re getting the mean for the wrong variable in the Mplus file because your variable list is out of order. If that’s the case, the mean you’re getting in Mplus will correspond to some other variable in the SPSS file.

Alternatively, the “99” values might not be properly counted as missing in either the SPSS or Mplus file.

Hi! Just a quick note – the extra nonsense values that occur when creating a .dat file in SPSS also occur when using SPSS 19.

Hi! I just got the Mplus software. I need help, especially with the steps involved in the analysis, that is, from importing data from Excel or SPSS to the final output, using any analysis example (regression, factorial, etc.).

Good evening. The original dataset I am using is from a national study. I’m wondering if it is necessary to pare the file down before saving it to a .dat file. For example, to have only variable name, label and value? I’m thinking it is necessary to leave the label so that Mplus knows it is categorical.

Yes, you still need to pare it down. You can’t have the variable names at the top. In mplus this part specifies the variable names:

VARIABLE: NAMES ARE

You can’t have labels in mplus either. If your dependent variable is categorical, you can use:

CATEGORICAL ARE

So that Mplus knows which it is.
