SPSS Converter ¶

Simple utility for converting data to/from SPSS data files

The SPSS Converter is a simple utility that facilitates the easy conversion of SPSS data to / from a variety of formats, including:

CSV

JSON

YAML

Excel

Pandas DataFrame

Installation ¶

To install the SPSS Converter, just execute:

$ pip install spss-converter

Dependencies ¶

* Python 3.x
* Pandas v0.24 or higher
* Pyreadstat v1.0 or higher
* OpenPyXL v.3.0.7 or higher
* PyYAML v3.10 or higher
* simplejson v3.0 or higher
* Validator-Collection v1.5.0 or higher

Why the SPSS Converter? ¶

If you work with SPSS data in the Python ecosystem, you probably use a combination of two or three key libraries: Pandas, Pyreadstat, and savReaderWriter. All three libraries are vital tools, incredibly well-constructed, designed, and managed. But over the years, I have found that converting from SPSS to other file formats using these libraries requires some fairly repetitive boilerplate code. So why not make it easier?

The SPSS Converter library is a simple wrapper around the Pyreadstat and Pandas libraries that provides a clean API for reading SPSS data files and converting them to and from a variety of formats. The semantics should be as simple as: spss_converter.to_csv('my-spss-file.sav') or spss_converter.from_json('my-json-file.json').

Key SPSS Converter Features ¶

With one function call, convert an SPSS file into:
- a Pandas DataFrame
- CSV
- JSON
- YAML
- Excel
- a dict
With one function call, create an SPSS data file from data in:
- a Pandas DataFrame
- CSV
- JSON
- YAML
- Excel
- a dict
With one function call, generate a Pythonic data map or meta-data collection from your SPSS data file.
Decide which variables (columns) you want to include / exclude when doing your conversion.
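
The include / exclude idea in the last feature above can be pictured with a small standard-library sketch. This only illustrates the concept and is not the SPSS Converter's implementation or API: the helper names `select_variables` and `records_to_csv`, the `include` / `exclude` parameters, and the sample records are all hypothetical, and plain dicts stand in for rows read from a real SPSS file.

```python
import csv
import io


def select_variables(records, include=None, exclude=None):
    """Return copies of the records that keep only the wanted variables.

    Hypothetical helper for illustration; not part of any library.
    """
    exclude = set(exclude or [])
    selected = []
    for record in records:
        selected.append({
            name: value
            for name, value in record.items()
            if (include is None or name in include) and name not in exclude
        })
    return selected


def records_to_csv(records):
    """Serialize a list of dicts to a CSV string (header taken from the first record)."""
    if not records:
        return ''
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer,
                            fieldnames=list(records[0].keys()),
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up records standing in for rows read from an SPSS file.
records = [
    {'respondent_id': 1, 'age': 34, 'region': 'North'},
    {'respondent_id': 2, 'age': 51, 'region': 'South'},
]

# Drop one variable before converting.
subset = select_variables(records, exclude=['region'])
csv_text = records_to_csv(subset)
print(csv_text)
```

Filtering before serializing keeps the conversion step itself unaware of which variables were dropped, which is one simple way to structure this kind of option.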

SPSS Converter vs Alternatives ¶

The SPSS Converter library is a simple wrapper around the Pyreadstat and Pandas libraries that simplifies the syntax for converting between different file formats.

While I am (I think understandably) biased in favor of the SPSS Converter, there are some perfectly reasonable alternatives:

Obviously, since the SPSS Converter is just a wrapper around Pyreadstat and Pandas, you can simply call their functions directly.

Both libraries are excellent, stable, and use fairly straightforward syntax. However:

using those libraries directly doubles the number of function calls you need to make to convert between different data formats, and

those libraries (Pyreadstat in particular) provide limited validation and little Pythonic object representation, so their syntax is less “batteries included”.

Of course, these differences are largely stylistic in nature.

Tip

When to use it?

Honestly, since initially building this wrapper I rarely use Pyreadstat and Pandas directly. Mostly, this is a matter of syntactical taste and personal preference.

However, I would definitely look to those libraries directly if I were:

writing this kind of wrapper

working in older versions of Python (< 3.7)

working with other formats of data than SPSS

# Using the SPSS Converter
from spss_converter import read, write

# SPSS File to CSV
read.to_csv('my-spss-file.sav',
            target='my-csv-file.csv')

# CSV to SPSS File
write.from_csv('my-csv-file.csv',
               target='my-spss-file.sav')

# SPSS File to Excel file
read.to_excel('my-spss-file.sav',
              target='my-excel-file.xlsx')

# Excel to SPSS file
write.from_excel('my-excel-file.xlsx',
                 target='my-spss-file.sav')

# ... similar pattern for other formats

# Using Pyreadstat and Pandas directly
import pandas
import pyreadstat

# SPSS File to CSV
# (to_csv writes the file and returns None when given a path)
df, metadata = pyreadstat.read_sav('my-spss-file.sav')
df.to_csv('my-csv-file.csv')

# CSV to SPSS file
# (write_sav writes the file and returns None)
df = pandas.read_csv('my-csv-file.csv')
pyreadstat.write_sav(df, 'my-spss-file.sav')

# SPSS File to Excel File
df, metadata = pyreadstat.read_sav('my-spss-file.sav')
df.to_excel('my-excel-file.xlsx')

# Excel file to SPSS file
df = pandas.read_excel('my-excel-file.xlsx')
pyreadstat.write_sav(df, 'my-spss-file.sav')

# ... similar pattern for other formats

The savReaderWriter library is a powerful library for converting SPSS data to/from different formats. Its core strength is its ability to get very granular metadata about the SPSS data and to sequentially iterate through its records.

However, the library has three significant limitations when it comes to format conversion:

The library only provides read and write access for SPSS data, and this means that you would have to write the actual “conversion” logic yourself. This can get quite complicated, particularly when dealing with data serialization challenges.

The library depends on the SPSS I/O module, which is packaged with the library. This module has licensing implications and is also a “heavy” module for distribution.

The library’s most recent commits date back to 2017, and it would seem that it is no longer being actively maintained.
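
To give a sense of the serialization work mentioned in the first limitation, here is a minimal standard-library sketch of turning record dicts into JSON when some values are dates or byte strings. It does not use savReaderWriter at all: the sample records and the `to_serializable` helper are hypothetical, and decoding bytes as UTF-8 is an assumption about the data.

```python
import datetime
import json


def to_serializable(value):
    """Fallback for values json cannot encode natively (hypothetical helper)."""
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    if isinstance(value, bytes):
        # Assumption: byte strings hold UTF-8 encoded text.
        return value.decode('utf-8')
    raise TypeError(f'Cannot serialize {type(value).__name__}')


# Made-up records standing in for cases read from an SPSS file.
records = [
    {'respondent_id': 1, 'name': b'Ana', 'interviewed': datetime.date(2020, 3, 14)},
    {'respondent_id': 2, 'name': b'Li', 'interviewed': None},
]

# json.dumps calls to_serializable only for values it cannot handle itself.
json_text = json.dumps(records, default=to_serializable)
print(json_text)
```

Every target format needs its own version of these decisions (date formats, text encodings, missing values), which is where hand-written conversion logic tends to grow.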

Tip

When to use it?

When you actually need to dive into the data at the level of particular cases or values.
When your data has Multiple Response Sets, which are not (yet) supported by either Pyreadstat or the SPSS Converter.
