Simple utility for converting data to/from SPSS data files
The SPSS Converter is a simple utility that facilitates the easy conversion of SPSS data to / from a variety of formats, including:
To install the SPSS Converter, just execute:
$ pip install spss-converter
* Pandas v0.24 or higher
* Pyreadstat v1.0 or higher
* OpenPyXL v3.0.7 or higher
* PyYAML v3.10 or higher
* simplejson v3.0 or higher
* Validator-Collection v1.5.0 or higher
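A quick way to check whether an environment already meets these minimums is sketched below. It uses only the standard library's importlib.metadata (Python 3.8 or higher). The distribution names in MINIMUMS are assumed to match the usual PyPI names, the helper functions are hypothetical, and the version parser is deliberately naive (it ignores pre-release suffixes).

```python
import re
from importlib import metadata

# Minimum versions from the list above. The distribution names are
# assumed to match the usual PyPI names.
MINIMUMS = {
    "pandas": "0.24",
    "pyreadstat": "1.0",
    "openpyxl": "3.0.7",
    "PyYAML": "3.10",
    "simplejson": "3.0",
    "validator-collection": "1.5.0",
}


def parse_version(text):
    """Turn '3.0.7' into (3, 0, 7), keeping the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding with zeros."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_requirements(minimums=MINIMUMS):
    """Return {name: status}; status is 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


for name, status in check_requirements().items():
    print(f"{name}: {status}")
```

Comparing versions as tuples of integers avoids the string-comparison trap where "3.10" would sort before "3.9".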
Why the SPSS Converter?
If you work with SPSS data in the Python ecosystem, you probably use a combination of two or three key libraries: Pandas, Pyreadstat, and savReaderWriter. All three libraries are vital tools, incredibly well-constructed, designed, and managed. But over the years, I have found that converting from SPSS to other file formats using these libraries requires some fairly repetitive boilerplate code. So why not make it easier?
The SPSS Converter library is a simple wrapper around the Pyreadstat and Pandas libraries that provides a clean and simple API for reading data files in a variety of formats and converting them to other formats.
The semantics are meant to be simple: each conversion is a single function call, as the SPSS Converter code example further below shows.
Key SPSS Converter Features
With one function call, convert an SPSS file into:
With one function call, create an SPSS data file from data in:
With one function call, generate a Pythonic data map or meta-data collection from your SPSS data file.
Decide which variables (columns) you want to include / exclude when doing your conversion.
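To illustrate what a data map restricted to selected variables might look like, the sketch below builds a plain dictionary from a metadata object. It assumes an object exposing column_names, column_names_to_labels, and variable_value_labels, which are the attribute names used by Pyreadstat's metadata container. The build_data_map function and its include / exclude parameters are hypothetical and are not the SPSS Converter's own API.

```python
from types import SimpleNamespace


def build_data_map(meta, include=None, exclude=None):
    """Summarize variables as {name: {"label": ..., "values": {...}}}.

    `include` and `exclude` are optional collections of variable names.
    """
    excluded = set(exclude or ())
    data_map = {}
    for name in meta.column_names:
        if include is not None and name not in include:
            continue
        if name in excluded:
            continue
        data_map[name] = {
            "label": meta.column_names_to_labels.get(name),
            "values": dict(meta.variable_value_labels.get(name, {})),
        }
    return data_map


# Stand-in for the metadata object returned by pyreadstat.read_sav();
# the variables and labels are made up for illustration.
example_meta = SimpleNamespace(
    column_names=["resp_id", "gender", "age"],
    column_names_to_labels={
        "resp_id": "Respondent ID",
        "gender": "Gender of respondent",
        "age": "Age in years",
    },
    variable_value_labels={"gender": {1.0: "Female", 2.0: "Male"}},
)

data_map = build_data_map(example_meta, exclude=["resp_id"])
for name, info in data_map.items():
    print(name, info)
```

Variables without value labels (such as age here) simply get an empty "values" dictionary, so every entry in the map has the same shape.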
SPSS Converter vs Alternatives
The SPSS Converter library is a simple wrapper around the Pyreadstat and Pandas libraries that simplifies the syntax for converting between different file formats.
While I am (I think understandably) biased in favor of the SPSS Converter, there are some perfectly reasonable alternatives:
Obviously, since the SPSS Converter is just a wrapper around Pyreadstat and Pandas, you can simply call their functions directly.
Both libraries are excellent, stable, and use fairly straightforward syntax. However:
* using those libraries directly doubles the number of function calls you need to make to convert between data formats, and
* those libraries (Pyreadstat in particular) provide limited validation and limited Pythonic object representation, so their syntax is less “batteries included”.
Of course, these differences are largely stylistic in nature.
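The "two calls become one, with some validation up front" idea can be sketched generically. The helper below is a hypothetical illustration, not code from the SPSS Converter: it checks the source path, then chains a reader and a writer supplied by the caller. To keep the sketch runnable without third-party packages, the usage example converts CSV to JSON with the standard library; in practice the reader and writer would wrap the Pyreadstat and Pandas calls.

```python
import csv
import json
import tempfile
from pathlib import Path


def convert_file(source, target, reader, writer, expected_suffix=None):
    """Validate `source`, then read it and write the result to `target`."""
    source, target = Path(source), Path(target)
    if not source.is_file():
        raise FileNotFoundError(f"source file not found: {source}")
    if expected_suffix and source.suffix.lower() != expected_suffix:
        raise ValueError(f"expected a {expected_suffix} file, got {source.name}")
    writer(reader(source), target)
    return target


def read_csv_rows(path):
    """Read a CSV file into a list of dictionaries (one per row)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def write_json_rows(rows, path):
    """Write a list of dictionaries to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(rows, handle, indent=2)


# Usage: one call instead of a separate read step and write step.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "example.csv"
    target = Path(workdir) / "example.json"
    source.write_text("id,score\n1,4.5\n2,3.0\n", encoding="utf-8")
    convert_file(source, target, read_csv_rows, write_json_rows,
                 expected_suffix=".csv")
    converted = json.loads(target.read_text(encoding="utf-8"))

print(converted)
```

Passing the reader and writer in as arguments keeps the validation logic in one place regardless of which formats are involved.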
When to use it?
Honestly, since initially building this wrapper I rarely use Pyreadstat and Pandas directly. Mostly, this is a matter of syntactical taste and personal preference.
However, I would definitely look to those libraries directly if I were:
* writing this kind of wrapper
* working in older versions of Python (< 3.7)
* working with data formats other than SPSS
Using the SPSS Converter:

    from spss_converter import read, write

    # SPSS File to CSV
    read.to_csv('my-spss-file.sav', target='my-csv-file.csv')

    # CSV to SPSS File
    write.from_csv('my-csv-file.csv', target='my-spss-file.sav')

    # SPSS File to Excel file
    read.to_excel('my-spss-file.sav', target='my-excel-file.xlsx')

    # Excel to SPSS file
    write.from_excel('my-excel-file.xlsx', target='my-spss-file.sav')

    # ... similar pattern for other formats
Using Pyreadstat and Pandas directly:

    import pyreadstat
    import pandas

    # SPSS File to CSV (to_csv writes the file and returns None when given a path)
    df, metadata = pyreadstat.read_sav('my-spss-file.sav')
    df.to_csv('my-csv-file.csv')

    # CSV to SPSS file
    df = pandas.read_csv('my-csv-file.csv')
    pyreadstat.write_sav(df, 'my-spss-file.sav')

    # SPSS File to Excel File
    df, metadata = pyreadstat.read_sav('my-spss-file.sav')
    df.to_excel('my-excel-file.xlsx')

    # Excel file to SPSS file
    df = pandas.read_excel('my-excel-file.xlsx')
    pyreadstat.write_sav(df, 'my-spss-file.sav')

    # ... similar pattern for other formats
The savReaderWriter library is a powerful library for converting SPSS data to/from different formats. Its core strength is its ability to get very granular metadata about the SPSS data and to sequentially iterate through its records.
However, the library has three significant limitations when it comes to format conversion:
* The library only provides read and write access for SPSS data, which means that you would have to write the actual “conversion” logic yourself. This can get quite complicated, particularly when dealing with data serialization challenges.
* The library depends on the SPSS I/O module, which is packaged with the library. This module has licensing implications and is a “heavy” module to distribute.
* The library’s most recent commits date back to 2017, and it appears that it is no longer actively maintained.
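To give a sense of the conversion logic you would end up writing by hand, here is a minimal sketch of a record-by-record CSV export. It does not call savReaderWriter: the header and records values are hypothetical stand-ins for whatever a case-level reader yields, and the assumption that strings arrive as padded bytes is made purely for illustration.

```python
import csv
import io
from datetime import date, datetime


def serialize_value(value, encoding="utf-8"):
    """Convert one cell into something the csv module can write cleanly."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        # Decode and drop trailing padding.
        return value.decode(encoding).rstrip()
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    if isinstance(value, float) and value.is_integer():
        # Write 1.0 as 1 so whole-number codes stay readable.
        return int(value)
    return value


def records_to_csv(header, records, handle):
    """Write the header and each record to an open text file handle."""
    writer = csv.writer(handle)
    writer.writerow([serialize_value(name) for name in header])
    for record in records:
        writer.writerow([serialize_value(cell) for cell in record])


# Made-up sample data standing in for cases read from an SPSS file.
header = [b"resp_id", b"name", b"interviewed"]
records = [
    [1.0, b"Ada   ", date(2021, 3, 14)],
    [2.0, b"Grace ", None],
]

buffer = io.StringIO()
records_to_csv(header, records, buffer)
print(buffer.getvalue())
```

Even this small example has to make decisions about missing values, text encoding, dates, and numeric formatting, and each target format needs its own version of those decisions.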
When to use it?
* When you actually need to dive into the data at the level of particular cases or values.
* When your data has Multiple Response Sets, which are not (yet) supported by either Pyreadstat or the SPSS Converter.
Questions and Issues
You can ask questions and report issues on the project’s GitHub Issues page.
We welcome contributions and pull requests! For more information, please see the Contributor Guide.
We use TravisCI for our build automation, Codecov.io for our test coverage, and ReadTheDocs for our documentation.
Detailed information about our test suite and how to run tests locally can be found in our Testing Reference.
The SPSS Converter is made available under an MIT License.