SPSS Converter

Simple utility for converting data to/from SPSS data files

Unit test status by branch:

  • latest: Build Status (Travis CI), Code Coverage Status (Codecov), Documentation Status (ReadTheDocs)

  • v.0.1: Build Status (Travis CI), Code Coverage Status (Codecov), Documentation Status (ReadTheDocs)

The SPSS Converter is a simple utility that makes it easy to convert SPSS data to / from a variety of formats, including CSV, Excel, and JSON.


Installation

To install the SPSS Converter, just execute:

$ pip install spss-converter

Dependencies

Python 3.x

* Pandas v0.24 or higher
* Pyreadstat v1.0 or higher
* OpenPyXL v3.0.7 or higher
* PyYAML v3.10 or higher
* simplejson v3.0 or higher

Why the SPSS Converter?

If you work with SPSS data in the Python ecosystem, you probably use a combination of two or three key libraries: Pandas, Pyreadstat, and savReaderWriter. All three are vital tools that are well designed, well constructed, and well managed. But over the years, I have found that converting from SPSS to other file formats using these libraries requires some fairly repetitive boilerplate code. So why not make it easier?

The SPSS Converter library is a simple wrapper around the Pyreadstat and Pandas libraries that provides a clean, simple API for reading data files in a variety of formats and converting them to other formats. Usage is meant to be as simple as spss_converter.to_csv('my-spss-file.sav') or spss_converter.from_json('my-json-file.json').

Key SPSS Converter Features

  • With one function call, convert an SPSS file into another format, such as CSV or Excel.

  • With one function call, create an SPSS data file from data in another format, such as CSV, Excel, or JSON.

  • With one function call, generate a Pythonic data map or meta-data collection from your SPSS data file.

  • Decide which variables (columns) you want to include / exclude when doing your conversion.
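
The include / exclude idea from the last feature can be sketched in a few lines. This is only an illustration of the selection logic, not the SPSS Converter's own implementation: the function name `select_variables` and its `include` / `exclude` parameters are hypothetical, chosen for this example.

```python
from typing import Iterable, List, Optional


def select_variables(columns: Iterable[str],
                     include: Optional[Iterable[str]] = None,
                     exclude: Optional[Iterable[str]] = None) -> List[str]:
    """Return the columns to keep, preserving their original order.

    Hypothetical helper: ``include`` keeps only the named columns,
    ``exclude`` drops the named columns, and both may be combined.
    """
    included = set(include) if include is not None else None
    excluded = set(exclude) if exclude is not None else set()
    return [name for name in columns
            if (included is None or name in included) and name not in excluded]


columns = ['respondent_id', 'age', 'gender', 'q1']
print(select_variables(columns, include=['age', 'q1']))      # ['age', 'q1']
print(select_variables(columns, exclude=['respondent_id']))  # ['age', 'gender', 'q1']
```

With Pandas, a list produced this way could then be used to subset a DataFrame (for example `df[kept_columns]`) before writing it out.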

SPSS Converter vs Alternatives

The SPSS Converter library is a simple wrapper around the Pyreadstat and Pandas libraries that simplifies the syntax for converting between different file formats.

While I am (I think understandably) biased in favor of the SPSS Converter, there are some perfectly reasonable alternatives:

Obviously, since the SPSS Converter is just a wrapper around Pyreadstat and Pandas, you can simply call their functions directly.

Both libraries are excellent, stable, and use fairly straightforward syntax. However:

  • using those libraries directly doubles the number of function calls you need to make to convert between data formats, and

  • those libraries (and Pyreadstat in particular) provide limited validation or Pythonic object representation (their syntactical approach is less “batteries included”).

Of course, these differences are largely stylistic in nature.
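
To make the "double the function calls" point concrete, here is a minimal sketch of how a wrapper can fold the read step and the write step into one call. The helper name `sav_to_csv` and its `reader` parameter are hypothetical and exist only for this illustration; this is not the SPSS Converter's actual code. The only real library calls assumed are `pyreadstat.read_sav` and `DataFrame.to_csv`.

```python
from typing import Callable, Optional


def sav_to_csv(source: str, target: str,
               reader: Optional[Callable] = None) -> None:
    """Hypothetical one-call helper: read an SPSS file and write it as CSV.

    ``reader`` defaults to pyreadstat.read_sav; it is a parameter only so the
    sketch can be exercised without an actual .sav file.
    """
    if reader is None:
        # Imported lazily so the function can be defined without pyreadstat.
        import pyreadstat
        reader = pyreadstat.read_sav

    # The reader returns a (DataFrame, metadata) tuple.
    df, _metadata = reader(source)
    # Write the CSV; index=False leaves out the DataFrame's row index column.
    df.to_csv(target, index=False)
```

Calling `sav_to_csv('my-spss-file.sav', 'my-csv-file.csv')` would then replace the separate read and write calls. Dropping the index is a choice made for this sketch, not something the source specifies.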

Tip

When to use it?

Honestly, since initially building this wrapper I rarely use Pyreadstat and Pandas directly. Mostly, this is a matter of syntactical taste and personal preference.

However, I would definitely look to those libraries directly if I were:

  • writing this kind of wrapper

  • working in older versions of Python (< 3.7)

  • working with data formats other than SPSS

Using the SPSS Converter:

from spss_converter import read, write

# SPSS File to CSV
read.to_csv('my-spss-file.sav',
            target='my-csv-file.csv')

# CSV to SPSS File
write.from_csv('my-csv-file.csv',
               target='my-spss-file.sav')

# SPSS File to Excel file
read.to_excel('my-spss-file.sav',
              target='my-excel-file.xlsx')

# Excel to SPSS file
write.from_excel('my-excel-file.xlsx',
                 target='my-spss-file.sav')

# ... similar pattern for other formats

Using Pyreadstat and Pandas directly:

import pandas
import pyreadstat

# SPSS file to CSV
# read_sav returns a (DataFrame, metadata) tuple
df, metadata = pyreadstat.read_sav('my-spss-file.sav')
# to_csv writes the file and returns None when given a path
df.to_csv('my-csv-file.csv')

# CSV to SPSS file
df = pandas.read_csv('my-csv-file.csv')
pyreadstat.write_sav(df, 'my-spss-file.sav')

# SPSS file to Excel file
df, metadata = pyreadstat.read_sav('my-spss-file.sav')
df.to_excel('my-excel-file.xlsx')

# Excel file to SPSS file
df = pandas.read_excel('my-excel-file.xlsx')
pyreadstat.write_sav(df, 'my-spss-file.sav')

# ... similar pattern for other formats

savReaderWriter is a powerful library for reading and writing SPSS data. Its core strength is its ability to expose very granular metadata about the SPSS data and to iterate sequentially through its records.

However, the library has three significant limitations when it comes to format conversion:

  • The library only provides read and write access for SPSS data, which means you have to write the actual “conversion” logic yourself. This can get quite complicated, particularly when dealing with data serialization challenges.

  • The library depends on the SPSS I/O module, which is packaged with the library. This module has licensing implications and is also a “heavy” module for distribution.

  • The library’s most-recent commits date back to 2017, and it would seem that it is no longer being actively maintained.
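
As a rough illustration of the first limitation, the sketch below shows the kind of hand-written conversion logic that record-level access leaves to you. It assumes, purely for the example, that a reader yields each case as a sequence of values, with strings possibly arriving as padded bytes and dates as `date` objects; the helper names `to_serializable` and `records_to_json` are hypothetical and not part of any library.

```python
import json
from datetime import date, datetime
from typing import Any, Dict, Iterable, List, Sequence


def to_serializable(value: Any) -> Any:
    """Convert values that json.dumps cannot handle into plain JSON types."""
    if isinstance(value, bytes):
        # Assumption: strings arrive as UTF-8 bytes with trailing padding.
        return value.decode('utf-8').rstrip()
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    return value


def records_to_json(variable_names: Sequence[str],
                    records: Iterable[Sequence[Any]]) -> str:
    """Serialize row-oriented records as a JSON array of objects."""
    rows: List[Dict[str, Any]] = [
        {name: to_serializable(value)
         for name, value in zip(variable_names, record)}
        for record in records
    ]
    return json.dumps(rows)


names = ['id', 'name', 'interviewed']
records = [[1.0, b'Ana   ', date(2020, 1, 31)]]
print(records_to_json(names, records))
# [{"id": 1.0, "name": "Ana", "interviewed": "2020-01-31"}]
```

Even this small example has to make decisions about encodings, padding, and date formats, which is the sort of serialization work a conversion wrapper is meant to take off your hands.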

Tip

When to use it?

  • When you actually need to dive into the data at the level of particular cases or values.

  • When your data has Multiple Response Sets, which are not (yet) supported by either Pyreadstat or the SPSS Converter.


Questions and Issues

You can ask questions and report issues on the project’s GitHub Issues page.


Contributing

We welcome contributions and pull requests! For more information, please see the Contributor Guide.


Testing

We use Travis CI for our build automation, Codecov for our test coverage, and ReadTheDocs for our documentation.

Detailed information about our test suite and how to run tests locally can be found in our Testing Reference.


License

The SPSS Converter is made available under an MIT License.