Using the SPSS Converter


Introduction

The SPSS Converter library is a thin wrapper around the Pyreadstat and Pandas libraries. It provides a clean API for reading SPSS data files and converting them to a variety of formats, and for converting data in those formats back to SPSS. Typical calls are as simple as spss_converter.to_csv('my-spss-file.sav') or spss_converter.from_json('my-json-file.json').


Converting Data from SPSS

To read from SPSS files and convert them to a different format you can use functions whose names start with spss_converter.to_*. The examples below provide specifics:

Converting to Pandas DataFrame

To convert from an SPSS file to a Pandas DataFrame, simply call the to_dataframe() function:

import spss_converter

df, metadata = spss_converter.to_dataframe('my-spss-file.sav')

The code above will read your data from the file my-spss-file.sav, convert it into a Pandas DataFrame, and generate an spss_converter.Metadata representation of the SPSS file’s metadata, which includes its data map, labeling, etc.

Converting to CSV

To read data from an SPSS file and convert it into a CSV file, simply call the to_csv() function:

import spss_converter

as_csv = spss_converter.to_csv('my-spss-file.sav')
# Will store the contents of the CSV as a string in as_csv.

spss_converter.to_csv('my-spss-file.sav', target = 'my-csv-file.csv')
# Will save the CSV data to the file my-csv-file.csv.

Both lines of code above will read the SPSS data from my-spss-file.sav, but the first line will store it in the str variable as_csv. The second will instead write it to the file my-csv-file.csv.
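When no target is given, the returned string can be passed to any CSV reader. The sketch below uses Python's standard csv module on a hand-written stand-in string; the column names and values are invented for illustration and are not output from a real SPSS file.

```python
import csv
import io

# Stand-in for the string returned by spss_converter.to_csv() without a target.
# The columns and values are made up for illustration.
as_csv = "respondent_id,gender,age\n1,Female,34\n2,Male,51\n"

# Wrap the string in a file-like object so csv.DictReader can iterate over it.
rows = list(csv.DictReader(io.StringIO(as_csv)))

for row in rows:
    # Every value read by the csv module is a string, including numbers.
    print(row["respondent_id"], row["gender"], row["age"])
```

Note that the csv module returns every value as a string, so numeric columns need an explicit conversion if you parse the text this way.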

Converting to JSON

To read data from an SPSS file and convert it into a JSON object, simply call the to_json() function:

import spss_converter

as_json = spss_converter.to_json('my-spss-file.sav', layout = 'records')
# Stores the JSON data as a string in the variable as_json.

spss_converter.to_json('my-spss-file.sav',
                       target = 'my-json-file.json',
                       layout = 'records')
# Stores the JSON data in the file "my-json-file.json".

# The same two calls using the table layout:
as_json = spss_converter.to_json('my-spss-file.sav', layout = 'table')
# Stores the JSON data as a string in the variable as_json.

spss_converter.to_json('my-spss-file.sav',
                       target = 'my-json-file.json',
                       layout = 'table')
# Stores the JSON data in the file "my-json-file.json".

The SPSS Converter supports two different layouts for JSON representation of data:

  • Records. This layout returns a JSON collection (array) of JSON objects. Each object in the collection represents one record from the SPSS file. The object is a set of key/value pairs where each key represents a variable/column in the SPSS file and its value represents the value of that variable/column for that respondent. This is the default layout.

  • Table. This layout returns a JSON object that includes a schema with the data map, and a separate data key which contains a collection (array) of objects where each object represents a single record from the SPSS data file.
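To make the difference between the two layouts concrete, the sketch below builds both shapes by hand with Python's standard json module. The variable names, values, and the contents of the schema are assumptions made for illustration; the schema that the SPSS Converter actually emits may carry different or additional fields.

```python
import json

# Hypothetical two-respondent data set; names and values are invented.
records_layout = [
    {"respondent_id": 1, "gender": "Female"},
    {"respondent_id": 2, "gender": "Male"},
]

# Assumed shape of the table layout: a "schema" describing the columns plus a
# "data" key holding the records. The schema fields shown are illustrative only.
table_layout = {
    "schema": {
        "fields": [
            {"name": "respondent_id", "type": "integer"},
            {"name": "gender", "type": "string"},
        ]
    },
    "data": records_layout,
}

records_json = json.dumps(records_layout)
table_json = json.dumps(table_layout)

# Once parsed, the table layout's "data" key holds the same list of records.
same_records = json.loads(records_json) == json.loads(table_json)["data"]
```

The records layout is the lighter choice when only the values matter; the table layout keeps a description of the columns alongside them.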

Note

If no target is supplied, then the JSON representation is stored in-memory in the return value. If a target is supplied, then the JSON representation will be written to this file.
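The return-or-write behavior described in this note can be pictured with a small stand-alone helper. This is only a sketch of the pattern, not the library's implementation: emit is a hypothetical function, and returning None after writing to a file is an assumption.

```python
import os
import tempfile
from typing import Optional


def emit(text: str, target: Optional[str] = None) -> Optional[str]:
    """Return text when target is None; otherwise write it to target."""
    if target is None:
        return text
    with open(target, "w", encoding="utf-8") as handle:
        handle.write(text)
    return None  # assumption: nothing is returned once the file is written


payload = '[{"respondent_id": 1}]'

# No target: the text comes back as the return value.
in_memory = emit(payload)

# With a target: the text is written to that file instead.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my-json-file.json")
    written = emit(payload, target=path)
    with open(path, encoding="utf-8") as handle:
        on_disk = handle.read()
```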

Converting to YAML

To read data from an SPSS file and convert it into a YAML object, simply call the to_yaml() function:

import spss_converter

as_yaml = spss_converter.to_yaml('my-spss-file.sav', layout = 'records')
# Stores the YAML data as a string in the variable as_yaml.

spss_converter.to_yaml('my-spss-file.sav',
                       target = 'my-yaml-file.yaml',
                       layout = 'records')
# Stores the YAML data in the file "my-yaml-file.yaml".

# The same two calls using the table layout:
as_yaml = spss_converter.to_yaml('my-spss-file.sav', layout = 'table')
# Stores the YAML data as a string in the variable as_yaml.

spss_converter.to_yaml('my-spss-file.sav',
                       target = 'my-yaml-file.yaml',
                       layout = 'table')
# Stores the YAML data in the file "my-yaml-file.yaml".

The SPSS Converter supports two different layouts for YAML representation of data:

  • Records. This layout returns a YAML collection (array) of YAML objects. Each object in the collection represents one record from the SPSS file. The object is a set of key/value pairs where each key represents a variable/column in the SPSS file and its value represents the value of that variable/column for that respondent. This is the default layout.

  • Table. This layout returns a YAML object that includes a schema with the data map, and a separate data key which contains a collection (array) of objects where each object represents a single record from the SPSS data file.

Note

If no target is supplied, then the YAML representation is stored in-memory in the return value. If a target is supplied, then the YAML representation will be written to this file.

Converting to Excel

To read data from an SPSS file and convert it into a Microsoft Excel file, simply call the to_excel() function:

import spss_converter

as_excel = spss_converter.to_excel('my-spss-file.sav')
# Will store the contents of the Excel file as a binary object in as_excel.

spss_converter.to_excel('my-spss-file.sav', target = 'my-excel-file.xlsx')
# Will save the Excel data to the file my-excel-file.xlsx.

Both lines of code above will read the SPSS data from my-spss-file.sav, but the first line will store it in the bytes variable as_excel. The second will instead write it to the file my-excel-file.xlsx.

Converting to dict

To read data from an SPSS file and convert it into a dict object, simply call the to_dict() function:

import spss_converter

as_dict = spss_converter.to_dict('my-spss-file.sav', layout = 'records')
# Stores the data as a list of dict objects in the variable as_dict.

# The same call using the table layout:
as_dict = spss_converter.to_dict('my-spss-file.sav', layout = 'table')
# Stores the data as a dict with schema and data keys in the variable as_dict.

The SPSS Converter supports two different layouts for dict representation of data:

  • Records. This layout returns a list of dict objects. Each object in the list represents one record from the SPSS file. The object is a dict whose keys each represent a variable/column in the SPSS file and whose values represent the value of that variable/column for that respondent. This is the default layout.

  • Table. This layout returns a dict object that includes a schema key with the data map, and a separate data key which contains a list of objects where each object represents a single record from the SPSS data file.
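Code that consumes the result of to_dict() sometimes needs to work with either layout. The sketch below shows one way to get at the list of records in both cases. The helper records_from and the sample data are hypothetical; the only things taken from the description above are that the records layout is a list of dict and that the table layout keeps its records under the data key.

```python
from typing import Any, Dict, List, Union

Record = Dict[str, Any]


def records_from(as_dict: Union[List[Record], Dict[str, Any]]) -> List[Record]:
    """Return the list of records regardless of which layout was used."""
    if isinstance(as_dict, list):
        return as_dict        # records layout: already a list of dict
    return as_dict["data"]    # table layout: records live under "data"


# Invented sample data standing in for the output of to_dict().
records_layout = [{"respondent_id": 1, "age": 34}, {"respondent_id": 2, "age": 51}]
table_layout = {"schema": {"fields": []}, "data": records_layout}  # schema left empty here

ages_from_records = [record["age"] for record in records_from(records_layout)]
ages_from_table = [record["age"] for record in records_from(table_layout)]
```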


Converting Data to SPSS

To convert other sources of data to SPSS format, you can use any function whose name starts with spss_converter.from_*. The examples below provide specifics:

Converting from Pandas DataFrame

To generate an SPSS file from a Pandas DataFrame, simply call the from_dataframe() function:

Note

The examples below all assume that the variable df contains the DataFrame whose data will be converted to SPSS format and the variable meta contains the Metadata that describes that data frame.

import spss_converter

as_spss = spss_converter.from_dataframe(df, metadata = meta)
# Will store the SPSS data in-memory in a binary bytes object named as_spss.

spss_converter.from_dataframe(df, target = 'my-spss-file.sav', metadata = meta)
# Will store the SPSS data to the hard drive in the file named "my-spss-file.sav".

The code above will convert the data in the DataFrame named df, and generate it in SPSS format either in-memory or on the hard drive.
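An in-memory result can still be written to disk later. The sketch below assumes the returned object is either a bytes value or a file-like io.BytesIO (the exact return type is not confirmed here) and uses placeholder bytes rather than real SPSS data. The helper save_binary is hypothetical.

```python
import io
import os
import tempfile
from typing import Union


def save_binary(data: Union[bytes, io.BytesIO], path: str) -> int:
    """Write bytes, or the full contents of a BytesIO, to path.

    Returns the number of bytes written.
    """
    payload = data.getvalue() if isinstance(data, io.BytesIO) else data
    with open(path, "wb") as handle:
        return handle.write(payload)


# Placeholder content; a real call would pass the in-memory SPSS data instead.
as_spss = io.BytesIO(b"placeholder-bytes")

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my-spss-file.sav")
    count_from_buffer = save_binary(as_spss, path)
    count_from_bytes = save_binary(b"placeholder-bytes", path)
    with open(path, "rb") as handle:
        on_disk = handle.read()
```

Using getvalue() rather than read() copies the whole buffer regardless of the current read position, which avoids writing a truncated file if the buffer has already been partly read.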

Converting from CSV

To read data from a CSV file and convert it into SPSS format, simply call the from_csv() function:

import spss_converter

as_spss = spss_converter.from_csv('my-csv-file.csv')
# Will store the SPSS data as an in-memory binary object called as_spss.

spss_converter.from_csv('my-csv-file.csv', target = 'my-spss-file.sav')
# Will save the SPSS data to the file my-spss-file.sav.

Both lines of code above will read the data from my-csv-file.csv, but the first line will store the resulting SPSS data in the BytesIO variable as_spss. The second will instead write it to the file my-spss-file.sav.

Converting from dict

To read data from a dict and convert it into SPSS format, simply call the from_dict() function:

import spss_converter

as_spss = spss_converter.from_dict(as_dict)
# Stores the data in-memory in the variable as_spss.

spss_converter.from_dict(as_dict, target = 'my-spss-file.sav')
# Stores the data on the hard drive in the file named "my-spss-file.sav".

Converting from JSON

To read data from a JSON file and convert it into SPSS format, simply call the from_json() function:

import spss_converter

as_spss = spss_converter.from_json('my-json-file.json', layout = 'records')
# Stores the SPSS data in-memory in the variable as_spss.

spss_converter.from_json('my-json-file.json',
                         target = 'my-spss-file.sav',
                         layout = 'records')
# Stores the SPSS data in the file "my-spss-file.sav".

# The same two calls using the table layout:
as_spss = spss_converter.from_json('my-json-file.json', layout = 'table')
# Stores the SPSS data in-memory in the variable as_spss.

spss_converter.from_json('my-json-file.json',
                         target = 'my-spss-file.sav',
                         layout = 'table')
# Stores the SPSS data in the file "my-spss-file.sav".

The SPSS Converter supports two different layouts for JSON representation of data:

  • Records. This layout expects a JSON collection (array) of JSON objects. Each object in the collection represents one record in the SPSS file. The object is a set of key/value pairs where each key represents a variable/column in the SPSS file and its value represents the value of that variable/column for that respondent. This is the default layout.

  • Table. This layout expects a JSON object that includes a schema with the data map, and a separate data key which contains a collection (array) of objects where each object represents a single record in the SPSS data file.

Note

If no target is supplied, then the SPSS representation is stored in-memory in the return value. If a target is supplied, then the SPSS representation will be written to this file.

Tip

The from_json() function can accept either a filename or a string with JSON data.
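One way a function can accept either a filename or raw JSON text is to check whether the argument names an existing file and fall back to parsing it as text. The sketch below illustrates that idea only; load_json_input is a hypothetical helper, and this is not necessarily how from_json() decides.

```python
import json
import os
import tempfile
from typing import Any


def load_json_input(value: str) -> Any:
    """Parse value as a JSON file if it names an existing file, else as JSON text."""
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as handle:
            return json.load(handle)
    return json.loads(value)


json_text = '[{"respondent_id": 1, "gender": "Female"}]'

# Passing the JSON text directly.
from_string = load_json_input(json_text)

# Passing the name of a file that contains the same text.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my-json-file.json")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(json_text)
    from_file = load_json_input(path)
```

A side effect of this approach is that a mistyped filename is treated as JSON text and fails with a parse error rather than a file-not-found error.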

Converting from YAML

To read data from a YAML file and convert it into SPSS format, simply call the from_yaml() function:

import spss_converter

as_spss = spss_converter.from_yaml('my-yaml-file.yaml', layout = 'records')
# Stores the SPSS data in-memory in the variable as_spss.

spss_converter.from_yaml('my-yaml-file.yaml',
                         target = 'my-spss-file.sav',
                         layout = 'records')
# Stores the SPSS data in the file "my-spss-file.sav".

# The same two calls using the table layout:
as_spss = spss_converter.from_yaml('my-yaml-file.yaml', layout = 'table')
# Stores the SPSS data in-memory in the variable as_spss.

spss_converter.from_yaml('my-yaml-file.yaml',
                         target = 'my-spss-file.sav',
                         layout = 'table')
# Stores the SPSS data in the file "my-spss-file.sav".

The SPSS Converter supports two different layouts for YAML representation of data:

  • Records. This layout expects a YAML collection (array) of YAML objects. Each object in the collection represents one record in the SPSS file. The object is a set of key/value pairs where each key represents a variable/column in the SPSS file and its value represents the value of that variable/column for that respondent. This is the default layout.

  • Table. This layout expects a YAML object that includes a schema with the data map, and a separate data key which contains a collection (array) of objects where each object represents a single record in the SPSS data file.

Note

If no target is supplied, then the SPSS representation is stored in-memory in the return value. If a target is supplied, then the SPSS representation will be written to this file.

Tip

The from_yaml() function can accept either a filename or a string with YAML data.

Converting from Excel

To read data from an Excel file and convert it into SPSS format, simply call the from_excel() function:

import spss_converter

as_spss = spss_converter.from_excel('my-excel-file.xlsx')
# Will store the SPSS data as a binary object in-memory in as_spss.

spss_converter.from_excel('my-excel-file.xlsx', target = 'my-spss-file.sav')
# Will save the SPSS data to the file my-spss-file.sav.

Both lines of code above will read the data from my-excel-file.xlsx, but the first line will store the resulting SPSS data in the in-memory binary object as_spss. The second will instead write it to the file my-spss-file.sav.


Working with Metadata

Key to working with SPSS data is understanding the distinction between the raw data’s storage format and the metadata that describes that data. Fundamentally, think of metadata as the map of how a value stored in the raw data (such as a numerical value 1) can actually represent a human-readable labeled value (such as the labeled value "Female").
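The idea of metadata as a map from stored values to labels can be shown with plain Python dictionaries. This sketch does not use the Metadata class; the value_labels structure, the apply_labels helper, and the sample records are all invented for illustration.

```python
from typing import Any, Dict, List

# Hypothetical value labels: for each variable, raw stored code -> readable label.
value_labels: Dict[str, Dict[Any, str]] = {
    "gender": {1: "Female", 2: "Male"},
}

# Raw data as it might be stored: codes rather than labels.
raw_records: List[Dict[str, Any]] = [
    {"respondent_id": 1, "gender": 1},
    {"respondent_id": 2, "gender": 2},
    {"respondent_id": 3, "gender": 9},  # 9 has no label in this example
]


def apply_labels(record: Dict[str, Any], labels: Dict[str, Dict[Any, str]]) -> Dict[str, Any]:
    """Replace each raw value with its label when one is defined for that column."""
    return {
        column: labels.get(column, {}).get(value, value)
        for column, value in record.items()
    }


labeled_records = [apply_labels(record, value_labels) for record in raw_records]
```

Values without a defined label, and columns without any labels at all, pass through unchanged, so the raw data is never lost.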

The metadata for an SPSS file can itself be quite verbose and define various rules for what can and should be expected when analyzing the records in the SPSS file. Within the SPSS Converter, this metadata is represented using the Metadata class.

Various functions that read SPSS data produce Metadata instances, and these instances can be manipulated to restate and adjust the human-readable labels applied to your SPSS data.