API Reference¶

Reading Data from SPSS
- to_dataframe
- to_csv
- to_excel
- to_json
- to_yaml
- to_dict
- get_metadata
Writing Data to SPSS
Utility Classes
- Metadata
- ColumnMetadata

Reading Data from SPSS ¶

to_dataframe ¶

to_dataframe(data: Union[bytes, _io.BytesIO, os.PathLike[Any]], limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)[source]¶

Reads SPSS data and returns a tuple with a Pandas DataFrame object and relevant Metadata.

Parameters

data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
limit (int or None) – The number of records to read from the data. If None will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) –
If True, will convert labeled or formatted values to Pandas categories. Defaults to True.

Caution

This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise will return missing values as per the configuration of missing value representation stored in the underlying SPSS data. Defaults to False, which applies the missing value representation configured in the SPSS data itself.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Pythonic datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) –
If True, will return any date values as Pandas datetime64 types. Defaults to False.

Caution

This parameter is only applied if convert_datetimes is set to True.
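
To make the convert_datetimes description more concrete: SPSS stores date and datetime values as a count of seconds since midnight on 14 October 1582. The sketch below shows that arithmetic using only the standard library. The helper name spss_seconds_to_datetime is hypothetical, and this is an illustration of the idea rather than the library's own conversion code.

```python
from datetime import datetime, timedelta

# SPSS epoch: midnight, 14 October 1582.
SPSS_EPOCH = datetime(1582, 10, 14)


def spss_seconds_to_datetime(seconds):
    """Convert a raw SPSS datetime value (seconds since the SPSS epoch)."""
    return SPSS_EPOCH + timedelta(seconds=seconds)


# One day after the epoch, and the raw value that corresponds to 1970-01-01.
one_day = spss_seconds_to_datetime(86_400)
unix_epoch = spss_seconds_to_datetime(12_219_379_200)
```

With convert_datetimes set to False, values of this raw kind are what would be left in the data.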

Returns

A DataFrame representation of the SPSS data (or None) and a Metadata representation of the data’s metadata (values and labels / data map).

Return type

pandas.DataFrame/None and Metadata
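
A minimal usage sketch, assuming to_dataframe can be imported from the top-level spss_converter package. The helper name preview_spss and the file and variable names in the usage note are hypothetical.

```python
def preview_spss(path, n=5, variables=None):
    """Load the first ``n`` records of an SPSS file as (DataFrame, Metadata)."""
    # Imported inside the function so the sketch can be read (and defined)
    # without the package installed. Top-level import path is an assumption.
    from spss_converter import to_dataframe

    df, metadata = to_dataframe(
        path,
        limit=n,                      # read at most n records
        offset=0,                     # start at the first record
        include_variables=variables,  # None means no explicit selection
        apply_labels=True,            # replace coded values with their labels
    )
    return df, metadata
```

For example, preview_spss('survey.sav', n=10, variables=['age', 'region']) would return the first ten records restricted to those two variables, together with the Metadata.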

to_csv ¶

to_csv(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.StringIO]] = None, include_header: bool = True, delimter: str = '|', null_text: str = 'NaN', wrapper_character: str = "'", escape_character: str = '\\', line_terminator: str = '\r\n', decimal: str = '.', limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)[source]¶

Convert the SPSS data into a CSV string where each row represents a record of SPSS data.

Parameters

data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
target (Path-like / StringIO / str / None) – The destination where the CSV representation should be stored. Accepts either a filename, file-pointer or a StringIO, or None. If None, will return a str object stored in-memory. Defaults to None.
include_header (bool) – If True, will include a header row with column labels. If False, will not include a header row. Defaults to True.
delimiter (str) – The delimiter used between columns. Defaults to |.
null_text (str) – The text value to use in place of empty values. Only applies if wrap_empty_values is True. Defaults to 'NaN'.
wrapper_character (str) – The string used to wrap string values when wrapping is necessary. Defaults to '.
escape_character (str) – The character to use when escaping nested wrapper characters. Defaults to \.
line_terminator (str) – The character used to mark the end of a line. Defaults to \r\n.
decimal (str) – The character used to indicate a decimal place in a numerical value. Defaults to ..
limit (int or None) – The number of records to read from the data. If None will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) –
If True, will convert labeled or formatted values to Pandas categories. Defaults to True.

Caution

This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise will return missing values as per the configuration of missing value representation stored in the underlying SPSS data. Defaults to False, which applies the missing value representation configured in the SPSS data itself.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Pythonic datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) –
If True, will return any date values as Pandas datetime64 types. Defaults to False.

Caution

This parameter is only applied if convert_datetimes is set to True.

Returns

None if target was not None, otherwise a str representation of the CSV file.

Return type

None or str
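
Because the documented defaults (| as delimiter, ' as wrapper character, \ as escape character) differ from the usual comma and double-quote conventions, a reader of the output has to be configured to match. The sketch below parses such text with the standard csv module. The helper name parse_default_csv is hypothetical, and the sample string is hand-written to mimic the defaults rather than taken from real library output.

```python
import csv
import io


def parse_default_csv(text):
    """Parse CSV text that follows the defaults documented for to_csv."""
    reader = csv.reader(
        io.StringIO(text, newline=''),
        delimiter='|',    # documented default delimiter
        quotechar="'",    # documented default wrapper_character
        escapechar='\\',  # documented default escape_character
    )
    return list(reader)


# Hand-written sample: a header row plus one record with an escaped quote.
sample = "id|name\r\n1|'O\\'Brien'\r\n"
rows = parse_default_csv(sample)
```

The same helper could be applied to the string returned by to_csv when target is None.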

to_excel ¶

to_excel(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.BytesIO, pandas.io.excel._base.ExcelWriter]] = None, sheet_name: str = 'Sheet1', start_row: int = 0, start_column: int = 0, null_text: str = 'NaN', include_header: bool = True, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)[source]¶

Convert the SPSS data into an Excel file where each row represents a record of SPSS data.

Parameters

data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
target (Path-like / BytesIO / ExcelWriter / None) – The destination where the Excel file should be stored. Accepts either a filename, file-pointer or a BytesIO, an ExcelWriter instance, or None. If None, will return a BytesIO object stored in-memory. Defaults to None.
sheet_name (str) – The worksheet on which the SPSS data should be written. Defaults to 'Sheet1'.
start_row (int) – The row number (starting at 0) where the SPSS data should begin. Defaults to 0.
start_column (int) – The column number (starting at 0) where the SPSS data should begin. Defaults to 0.
null_text (str) – The way that missing values should be represented in the Excel file. Defaults to 'NaN'.
include_header (bool) – If True, will include a header row with column labels. If False, will not include a header row. Defaults to True.
limit (int or None) – The number of records to read from the data. If None will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) –
If True, will convert labeled or formatted values to Pandas categories. Defaults to True.

Caution

This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise will return missing values as per the configuration of missing value representation stored in the underlying SPSS data. Defaults to False, which applies the missing value representation configured in the SPSS data itself.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Pythonic datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) –
If True, will return any date values as Pandas datetime64 types. Defaults to False.

Caution

This parameter is only applied if convert_datetimes is set to True.

Returns

None if target was not None, otherwise a BytesIO representation of the Excel file.

Return type

None or BytesIO
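
A short sketch of the in-memory case, assuming to_excel can be imported from the top-level spss_converter package. The helper name spss_to_excel_bytes and the sheet name are hypothetical.

```python
def spss_to_excel_bytes(path, sheet_name='Survey'):
    """Return the Excel workbook for an SPSS file as raw bytes."""
    # Top-level import path is an assumption.
    from spss_converter import to_excel

    # With target=None the workbook is returned as an in-memory BytesIO.
    buffer = to_excel(path, target=None, sheet_name=sheet_name)
    return buffer.getvalue()
```

The returned bytes can then be attached to an HTTP response or written to disk by the caller.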

to_json ¶

to_json(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.StringIO]] = None, layout: str = 'records', double_precision: int = 10, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)[source]¶

Convert the SPSS data into a JSON string.

Parameters

data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
target (Path-like / StringIO / str / None) – The destination where the JSON representation should be stored. Accepts either a filename, file-pointer or StringIO, or None. If None, will return a str object stored in-memory. Defaults to None.
layout (str) –
Indicates the layout schema to use for the JSON representation of the data. Accepts:
- records, where the resulting JSON object represents an array of objects where each object corresponds to a single record, with key/value pairs for each column and that record’s corresponding value
- table, where the resulting JSON object contains metadata (a data map) describing the data schema along with the resulting collection of record objects
Defaults to records.
double_precision (int) – Indicates the precision (places beyond the decimal point) to apply for floating point values. Defaults to 10.
limit (int or None) – The number of records to read from the data. If None will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) –
If True, will convert labeled or formatted values to Pandas categories. Defaults to True.

Caution

This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise will return missing values as per the configuration of missing value representation stored in the underlying SPSS data. Defaults to False, which applies the missing value representation configured in the SPSS data itself.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Pythonic datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) –
If True, will return any date values as Pandas datetime64 types. Defaults to False.

Caution

This parameter is only applied if convert_datetimes is set to True.

Returns

None if target was not None, otherwise a str representation of the JSON output.

Return type

None or str
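
The records layout described above can be pictured with a hand-built example. The column names and values below are invented for illustration and this is not actual library output. The table layout is not shown because its exact schema is not spelled out here.

```python
import json

# Two invented records with two columns each.
columns = ['id', 'age']
rows = [(1, 34), (2, 51)]

# "records": an array of objects, one object per record,
# with one key/value pair per column.
records = [dict(zip(columns, row)) for row in rows]
as_json = json.dumps(records)
```

Here as_json holds an array with one object per record, which is the shape the records layout describes.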

to_yaml ¶

to_yaml(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.StringIO]] = None, layout: str = 'records', double_precision: int = 10, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)[source]¶

Convert the SPSS data into a YAML string.

Parameters

data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
target (Path-like / StringIO / str / None) – The destination where the YAML representation should be stored. Accepts either a filename, file-pointer or StringIO, or None. If None, will return a str object stored in-memory. Defaults to None.
layout (str) –
Indicates the layout schema to use for the YAML representation of the data. Accepts:
- records, where the resulting YAML object represents an array of objects where each object corresponds to a single record, with key/value pairs for each column and that record’s corresponding value
- table, where the resulting YAML object contains metadata (a data map) describing the data schema along with the resulting collection of record objects
Defaults to records.
double_precision (int) – Indicates the precision (places beyond the decimal point) to apply for floating point values. Defaults to 10.
limit (int or None) – The number of records to read from the data. If None will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) –
If True, will convert labeled or formatted values to Pandas categories. Defaults to True.

Caution

This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise will return missing values as per the configuration of missing value representation stored in the underlying SPSS data. Defaults to False, which applies the missing value representation configured in the SPSS data itself.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Pythonic datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) –
If True, will return any date values as Pandas datetime64 types. Defaults to False.

Caution

This parameter is only applied if convert_datetimes is set to True.

Returns

None if target was not None, otherwise a str representation of the YAML output.

Return type

None or str

to_dict ¶

to_dict(data: Union[os.PathLike[Any], _io.BytesIO, bytes], layout: str = 'records', double_precision: int = 10, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)[source]¶

Convert the SPSS data into a Python dict.

Parameters

data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
layout (str) –
Indicates the layout schema to use for the dict representation of the data. Accepts:
- records, where the result is a list of dict objects where each dict corresponds to a single record, with key/value pairs for each column and that record’s corresponding value
- table, where the resulting dict contains metadata (a data map) describing the data schema along with the resulting collection of record objects
Defaults to records.
double_precision (int) – Indicates the precision (places beyond the decimal point) to apply for floating point values. Defaults to 10.
limit (int or None) – The number of records to read from the data. If None will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) –
If True, will convert labeled or formatted values to Pandas categories. Defaults to True.

Caution

This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise will return missing values as per the configuration of missing value representation stored in the underlying SPSS data. Defaults to False, which applies the missing value representation configured in the SPSS data itself.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Pythonic datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) –
If True, will return any date values as Pandas datetime64 types. Defaults to False.

Caution

This parameter is only applied if convert_datetimes is set to True.

Returns

A list of dict if layout is records, or a dict if layout is table.

Return type

list of dict or dict
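
A small sketch of working with the records layout, assuming to_dict can be imported from the top-level spss_converter package and that each record is keyed by variable name as described above. The helper name count_by_variable is hypothetical.

```python
from collections import Counter


def count_by_variable(path, variable):
    """Count how often each labeled value of ``variable`` occurs."""
    # Top-level import path is an assumption.
    from spss_converter import to_dict

    records = to_dict(
        path,
        layout='records',              # one dict per record
        include_variables=[variable],  # read only the variable of interest
        apply_labels=True,             # count labels instead of numeric codes
    )
    return Counter(record[variable] for record in records)
```

Calling count_by_variable('survey.sav', 'region') would give a frequency table of the labels in a (hypothetical) region variable.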

get_metadata ¶

get_metadata(data)[source]¶

Retrieve the metadata that describes the coded representation of the data, corresponding formatting information, and their related human-readable labels.

Parameters: data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
Returns: The metadata that describes the raw data and its corresponding labels.
Return type: Metadata
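
A sketch of inspecting a file without loading its records, assuming get_metadata can be imported from the top-level spss_converter package. It relies only on the Metadata properties documented under Utility Classes (rows, columns, column_metadata). The helper name describe_spss is hypothetical.

```python
def describe_spss(path):
    """Summarize row/column counts and variable labels for an SPSS file."""
    # Top-level import path is an assumption.
    from spss_converter import get_metadata

    metadata = get_metadata(path)
    summary = {'rows': metadata.rows, 'columns': metadata.columns, 'labels': {}}

    # column_metadata may be None; its values are documented as
    # ColumnMetadata objects "or compatible dict", so handle both.
    for name, column in (metadata.column_metadata or {}).items():
        label = column.get('label') if isinstance(column, dict) else column.label
        summary['labels'][name] = label
    return summary
```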

Writing Data to SPSS ¶

from_dataframe ¶

from_dataframe(df: pandas.core.frame.DataFrame, target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, metadata: Optional[spss_converter.Metadata.Metadata] = None, compress: bool = False)[source]¶

Create an SPSS dataset from a Pandas DataFrame.

Parameters

df (pandas.DataFrame) – The DataFrame to serialize to an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None will return a BytesIO object containing the SPSS dataset. Defaults to None.
metadata (Metadata / None) – The Metadata associated with the dataset. If None, will attempt to derive it from df. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.

Returns

A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None

Return type

BytesIO or None

Raises

ValueError – if df is not a pandas.DataFrame
ValueError – if metadata is not a Metadata
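
The two target modes can be sketched as follows, assuming from_dataframe can be imported from the top-level spss_converter package. Both helper names are hypothetical, and df is expected to be a pandas.DataFrame supplied by the caller.

```python
def dataframe_to_zsav(df, path):
    """Write ``df`` to ``path`` in the compressed ZSAV format."""
    # Top-level import path is an assumption.
    from spss_converter import from_dataframe

    # A filename target means the data is written to that file.
    from_dataframe(df, target=path, compress=True)
    return path


def dataframe_to_sav_bytes(df):
    """Return ``df`` serialized as uncompressed SAV bytes."""
    from spss_converter import from_dataframe

    # target=None: the dataset comes back as a BytesIO object.
    buffer = from_dataframe(df, target=None, compress=False)
    return buffer.getvalue()
```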

from_csv ¶

from_csv(as_csv: Union[str, PathLike[Any], _io.BytesIO], target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, delimiter='|', **kwargs)[source]¶

Convert a CSV file into an SPSS dataset.

Tip

If you pass any additional keyword arguments, those keyword arguments will be passed onto the pandas.read_csv() function.

Parameters

as_csv (str / File-location / BytesIO) – The CSV data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
delimiter (str) – The delimiter used between columns. Defaults to |.
kwargs (dict) – Additional keyword arguments which will be passed onto the pandas.read_csv() function.

Returns

A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None

Return type

BytesIO or None
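
As a sketch of the keyword pass-through mentioned in the tip above: the example assumes from_csv can be imported from the top-level spss_converter package, overrides the | default for a comma-separated file, and forwards encoding, which is a standard pandas.read_csv argument. The helper name csv_to_sav is hypothetical.

```python
def csv_to_sav(csv_path, sav_path):
    """Convert a comma-separated, UTF-8 encoded CSV file into a SAV file."""
    # Top-level import path is an assumption.
    from spss_converter import from_csv

    from_csv(
        csv_path,
        target=sav_path,
        delimiter=',',     # override the documented '|' default
        encoding='utf-8',  # extra keyword, forwarded to pandas.read_csv
    )
    return sav_path
```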

from_excel ¶

from_excel(as_excel, target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)[source]¶

Convert Excel data into an SPSS dataset.

Tip

If you pass any additional keyword arguments, those keyword arguments will be passed onto the pandas.read_excel() function.

Parameters

as_excel (str / File-location / BytesIO / bytes / ExcelFile) – The Excel data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
kwargs (dict) – Additional keyword arguments which will be passed onto the pandas.read_excel() function.

Returns

A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None

Return type

BytesIO or None

from_json ¶

from_json(as_json: Union[str, PathLike[Any], _io.BytesIO], target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)[source]¶

Convert JSON data into an SPSS dataset.

Tip

If you pass any additional keyword arguments, those keyword arguments will be passed onto the pandas.read_json() function.

Parameters

as_json (str / File-location / BytesIO) – The JSON data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
kwargs (dict) – Additional keyword arguments which will be passed onto the pandas.read_json() function.

Returns

A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None

Return type

BytesIO or None

from_yaml ¶

from_yaml(as_yaml: Union[str, PathLike[Any], _io.BytesIO], target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)[source]¶

Convert YAML data into an SPSS dataset.

Tip

If you pass any additional keyword arguments, those keyword arguments will be passed onto the DataFrame.from_dict() method.

Parameters

as_yaml (str / File-location / BytesIO) – The YAML data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
kwargs (dict) – Additional keyword arguments which will be passed onto the DataFrame.from_dict() method.

Returns

A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None

Return type

BytesIO or None

from_dict ¶

from_dict(as_dict: dict, target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)[source]¶

Convert a dict object into an SPSS dataset.

Tip

If you pass any additional keyword arguments, those keyword arguments will be passed onto the DataFrame.from_dict() method.

Parameters

as_dict (dict) – The dict data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
kwargs (dict) – Additional keyword arguments which will be passed onto the DataFrame.from_dict() method.

Returns

A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None

Return type

BytesIO or None
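
A round-trip sketch that chains from_dict with to_dict entirely in memory, assuming both can be imported from the top-level spss_converter package. The helper name round_trip is hypothetical.

```python
def round_trip(as_dict):
    """Serialize a dict to SPSS bytes in memory, then read it back as records."""
    # Top-level import path is an assumption.
    from spss_converter import from_dict, to_dict

    # target=None (the default) returns a BytesIO holding the SPSS dataset.
    buffer = from_dict(as_dict)

    # to_dict accepts raw bytes, so pass the buffer's contents straight back.
    return to_dict(buffer.getvalue(), layout='records')
```

An input such as {'age': [34, 51], 'region': ['North', 'South']} follows the column-oriented shape that DataFrame.from_dict expects by default.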

apply_metadata ¶

apply_metadata(df: pandas.core.frame.DataFrame, metadata: Union[spss_converter.Metadata.Metadata, dict, pyreadstat._readstat_parser.metadata_container], as_category: bool = True)[source]¶

Updates the DataFrame df based on the metadata.

Parameters

df (pandas.DataFrame) – The DataFrame to update.
metadata (Metadata, pyreadstat.metadata_container, or compatible dict) – The Metadata to apply to df.
as_category (bool) – If True, variables with formats will be transformed into categories in the DataFrame. Defaults to True.

Returns

A copy of df updated to reflect metadata.

Return type

DataFrame

Utility Classes ¶

Metadata ¶

class Metadata(**kwargs)[source]¶

Object representation of metadata retrieved from an SPSS file.

classmethod from_dict(as_dict: dict)[source]¶

Create a Metadata instance from a dict representation.

Parameters: as_dict (dict) – A dict representation of the Metadata.
Returns: A Metadata instance
Return type: Metadata

classmethod from_pyreadstat(as_metadata)[source]¶

Create a Metadata instance from a Pyreadstat metadata object.

Parameters

as_metadata (pyreadstat.metadata_container) –

The Pyreadstat metadata object from which the Metadata instance should be created.

Returns

The Metadata instance.

Return type

Metadata

to_dict() → dict [source]¶

Return a dict representation of the instance.

Return type: dict

to_pyreadstat()[source]¶

Create a Pyreadstat metadata representation of the Metadata instance.

Returns

The Pyreadstat metadata.

Return type

pyreadstat.metadata_container

property column_metadata¶

Collection of metadata that describes each column or variable within the dataset.

Returns: A dict where the key is the name of the column/variable and the value is a ColumnMetadata object or compatible dict.
Return type: dict / None

property columns¶

The number of columns/variables in the dataset.

Return type: int

property file_encoding¶

The file encoding for the dataset.

Return type: str or None

property file_label¶

The file label.

Note

This property is irrelevant for SPSS, but is relevant for SAS data.

Return type: str / None

property notes¶

Set of notes related to the file.

Return type: str / None

property rows¶

The number of cases or rows in the dataset.

Return type: int

property table_name¶

The name of the data table.

Return type: str / None
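
The from_dict and to_dict methods above suggest a simple way to copy a Metadata instance, for example before editing it. The sketch imports the class from spss_converter.Metadata, the module path shown in the from_dataframe signature. The helper name clone_metadata is hypothetical.

```python
def clone_metadata(metadata):
    """Return a new Metadata instance built from ``metadata``'s dict form."""
    # Module path taken from the signatures in this reference.
    from spss_converter.Metadata import Metadata

    return Metadata.from_dict(metadata.to_dict())
```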

ColumnMetadata ¶

class ColumnMetadata(**kwargs)[source]¶

Object representation of the metadata that describes a column or variable from an SPSS file.

add_to_pyreadstat(pyreadstat)[source]¶

Update pyreadstat to include the metadata for this column/variable.

Parameters

pyreadstat (pyreadstat.metadata_container) –

The Pyreadstat metadata object where the ColumnMetadata data should be updated.

Returns

The Pyreadstat metadata.

Return type

pyreadstat.metadata_container

classmethod from_dict(as_dict: dict)[source]¶

Create a new ColumnMetadata instance from a dict representation.

Parameters: as_dict (dict) – The dict representation of the ColumnMetadata.
Returns: The ColumnMetadata instance.
Return type: ColumnMetadata

classmethod from_pyreadstat_metadata(name: str, as_metadata)[source]¶

Create a new ColumnMetadata instance from a Pyreadstat metadata object.

Parameters

name (str) – The name of the variable for which a ColumnMetadata instance should be created.
as_metadata (pyreadstat.metadata_container) –
The Pyreadstat metadata object from which the column’s metadata should be extracted.

Returns

The ColumnMetadata instance.

Return type

ColumnMetadata

to_dict() → dict [source]¶

Generate a dict representation of the instance.

Return type: dict

property alignment¶

The alignment to apply to values from this column/variable when displaying data. Defaults to 'unknown'.

Accepts either 'unknown', 'left', 'center', or 'right' as either a case-insensitive str or a VariableAlignmentEnum.

Return type: VariableAlignmentEnum

property display_width¶

The maximum width at which the value is displayed. Defaults to 0.

Return type: int

property label¶

The label applied to the column/variable.

Return type: str / None

property measure¶

A classification of the type of measure (or value type) represented by the variable. Defaults to 'unknown'.

Accepts either 'unknown', 'nominal', 'ordinal', or 'scale'.

Return type: VariableMeasureEnum

property missing_range_metadata¶

Collection of metadata that defines the numerical ranges that are to be considered missing in the underlying data.

Returns: list of dict with keys 'low' and 'high' for the low/high values of the range to apply when raw values are missing (None).
Return type: list of dict or None
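
The list-of-ranges structure can be used to test whether a raw value counts as missing. The sketch below uses plain dicts with the documented 'low' and 'high' keys. The helper name is hypothetical, the sample range is invented, and treating both bounds as inclusive is an assumption.

```python
def is_in_missing_range(value, missing_ranges):
    """Return True if ``value`` falls inside any documented missing range."""
    if not missing_ranges:  # the property may be None or an empty list
        return False
    # Assumption: both ends of each range are inclusive.
    return any(r['low'] <= value <= r['high'] for r in missing_ranges)


# Invented example: codes 97 through 99 mark missing answers.
ranges = [{'low': 97, 'high': 99}]
flags = [is_in_missing_range(v, ranges) for v in (50, 97, 98, 100)]
```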

property missing_value_metadata¶

Value used to represent missing values in the raw data. Defaults to None.

Note

This is not actually relevant for SPSS data, but is an artifact for SAS and Stata data.

Return type: list of int or str / None

property name¶

The name of the column/variable.

Return type: str / None

property storage_width¶

The width of data to store in the data file for the value. Defaults to 0.

Return type: int

property value_metadata¶

Collection of values possible for the column/variable, with corresponding labels for each value.

Returns: dict whose keys are the values in the raw data and whose values are the labels for each value. May be None for variables whose value is not coded.
Return type: dict / None
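
The value-to-label mapping is the information that apply_labels draws on. As a plain-Python picture of the idea, with an invented mapping and invented raw values, and with the assumption that values lacking a label are left unchanged:

```python
# Invented mapping in the documented shape: raw value -> label.
value_metadata = {1: 'Agree', 2: 'Neutral', 3: 'Disagree'}

# Raw coded values as they might appear in one column.
raw_values = [1, 3, 2, 9]

# Replace each code with its label; codes without a label stay as-is
# (an assumption made for this illustration).
labeled = [value_metadata.get(value, value) for value in raw_values]
```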
