API Reference


Reading Data from SPSS

to_dataframe

to_dataframe(data: Union[bytes, _io.BytesIO, os.PathLike[Any]], limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)

Reads SPSS data and returns a tuple with a Pandas DataFrame object and relevant Metadata.

Parameters
  • data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.

  • limit (int or None) – The number of records to read from the data. If None will return all records. Defaults to None.

  • offset (int) – The record at which to start reading the data. Defaults to 0 (first record).

  • exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.

  • include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.

  • metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.

  • apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.

  • labels_as_categories (bool) –

    If True, will convert labeled or formatted values to Pandas categories. Defaults to True.

    Caution

    This parameter will only have an effect if the apply_labels parameter is True.

  • missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise will return missing values as per the configuration of missing value representation stored in the underlying SPSS data. Defaults to False, which applies the missing value representation configured in the SPSS data itself.

  • convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Python datetime, date, or similar representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.

  • dates_as_datetime64 (bool) –

    If True, will return any date values as Pandas datetime64 types. Defaults to False.

    Caution

    This parameter is only applied if convert_datetimes is set to True.

Returns

A DataFrame representation of the SPSS data (or None) and a Metadata representation of the data’s metadata (values and labels / data map).

Return type

tuple of (pandas.DataFrame / None, Metadata)
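
SPSS stores date and datetime values as a number of seconds elapsed since midnight on 14 October 1582. The following minimal sketch shows the kind of conversion that convert_datetimes performs, assuming that encoding; spss_seconds_to_datetime and datetime_to_spss_seconds are hypothetical helpers for illustration, not functions exported by this library.

```python
from datetime import datetime, timedelta

# Assumption: SPSS date/datetime values count seconds from this epoch.
SPSS_EPOCH = datetime(1582, 10, 14)


def spss_seconds_to_datetime(seconds: float) -> datetime:
    """Hypothetical helper: convert an SPSS date value to a Python datetime."""
    return SPSS_EPOCH + timedelta(seconds=seconds)


def datetime_to_spss_seconds(value: datetime) -> float:
    """Hypothetical helper: convert a Python datetime back to SPSS seconds."""
    return (value - SPSS_EPOCH).total_seconds()


raw = datetime_to_spss_seconds(datetime(2020, 1, 31, 12, 30))
print(raw, spss_seconds_to_datetime(raw))
```

With convert_datetimes set to False, the raw value (the first number printed above) is what you would see in the returned data.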


to_csv

to_csv(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.StringIO]] = None, include_header: bool = True, delimter: str = '|', null_text: str = 'NaN', wrapper_character: str = "'", escape_character: str = '\\', line_terminator: str = '\r\n', decimal: str = '.', limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)

Convert the SPSS data into a CSV string where each row represents a record of SPSS data.

Parameters
  • data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.

  • target (Path-like / StringIO / str / None) – The destination where the CSV representation should be stored. Accepts either a filename, file-pointer or a StringIO, or None. If None, will return a str object stored in-memory. Defaults to None.

  • include_header (bool) – If True, will include a header row with column labels. If False, will not include a header row. Defaults to True.

  • delimiter (str) – The delimiter used between columns. Defaults to |. Note that the signature above spells this keyword argument delimter.

  • null_text (str) – The text value to use in place of empty values. Only applies if wrap_empty_values is True. Defaults to 'NaN'.

  • wrapper_character (str) – The string used to wrap string values when wrapping is necessary. Defaults to ' (a single quote).

  • escape_character (str) – The character to use when escaping nested wrapper characters. Defaults to \ (a backslash).

  • line_terminator (str) – The character used to mark the end of a line. Defaults to \r\n.

  • decimal (str) – The character used to indicate a decimal place in a numerical value. Defaults to . (a period).

  • limit (int or None) – The number of records to read from the data. If None will return all records. Defaults to None.

  • offset (int) – The record at which to start reading the data. Defaults to 0 (first record).

  • exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.

  • include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.

  • metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.

  • apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.

  • labels_as_categories (bool) –

    If True, will convert labeled or formatted values to Pandas categories. Defaults to True.

    Caution

    This parameter will only have an effect if the apply_labels parameter is True.

  • missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise will return missing values as per the configuration of missing value representation stored in the underlying SPSS data. Defaults to False, which applies the missing value representation configured in the SPSS data itself.

  • convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Python datetime, date, or similar representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.

  • dates_as_datetime64 (bool) –

    If True, will return any date values as Pandas datetime64 types. Defaults to False.

    Caution

    This parameter is only applied if convert_datetimes is set to True.

Returns

None if target was not None, otherwise a str representation of the CSV file.

Return type

None or str
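
The documented to_csv defaults (a | delimiter, ' as the wrapper character, \ as the escape character, \r\n line terminators, and NaN in place of missing values) can be approximated with Python's standard csv module. This is an analogy to show how the options interact, not this library's implementation; quoting and escaping of edge cases may differ from what to_csv actually emits.

```python
import csv
import io

rows = [
    ["id", "name", "score"],
    [1, "Smith|Jones", 3.5],  # contains the delimiter, so it gets wrapped
    [2, None, None],          # missing values
]

NULL_TEXT = "NaN"  # mirrors the documented null_text default

buffer = io.StringIO()
writer = csv.writer(
    buffer,
    delimiter="|",          # documented default delimiter
    quotechar="'",          # documented default wrapper_character
    escapechar="\\",        # documented default escape_character
    lineterminator="\r\n",  # documented default line_terminator
    quoting=csv.QUOTE_MINIMAL,
)
for row in rows:
    # csv writes None as an empty field, so substitute null_text explicitly.
    writer.writerow([NULL_TEXT if value is None else value for value in row])

output = buffer.getvalue()
print(repr(output))
```

Only the value containing the delimiter is wrapped; the rest are written as-is.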


to_excel

to_excel(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.BytesIO, pandas.io.excel._base.ExcelWriter]] = None, sheet_name: str = 'Sheet1', start_row: int = 0, start_column: int = 0, null_text: str = 'NaN', include_header: bool = True, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)

Convert the SPSS data into an Excel file where each row represents a record of SPSS data.

Parameters
  • data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.

  • target (Path-like / BytesIO / ExcelWriter / None) – The destination where the Excel file should be stored. Accepts either a filename, file-pointer or a BytesIO, an ExcelWriter instance, or None. If None, will return a BytesIO object stored in-memory. Defaults to None.

  • sheet_name (str) – The worksheet on which the SPSS data should be written. Defaults to 'Sheet1'.

  • start_row (int) – The row number (starting at 0) where the SPSS data should begin. Defaults to 0.

  • start_column (int) – The column number (starting at 0) where the SPSS data should begin. Defaults to 0.

  • null_text (str) – The way that missing values should be represented in the Excel file. Defaults to 'NaN'.

  • include_header (bool) – If True, will include a header row with column labels. If False, will not include a header row. Defaults to True.

  • limit (int or None) – The number of records to read from the data. If None will return all records. Defaults to None.

  • offset (int) – The record at which to start reading the data. Defaults to 0 (first record).

  • exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.

  • include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.

  • metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.

  • apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.

  • labels_as_categories (bool) –

    If True, will convert labeled or formatted values to Pandas categories. Defaults to True.

    Caution

    This parameter will only have an effect if the apply_labels parameter is True.

  • missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise will return missing values as per the configuration of missing value representation stored in the underlying SPSS data. Defaults to False, which applies the missing value representation configured in the SPSS data itself.

  • convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Python datetime, date, or similar representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.

  • dates_as_datetime64 (bool) –

    If True, will return any date values as Pandas datetime64 types. Defaults to False.

    Caution

    This parameter is only applied if convert_datetimes is set to True.

Returns

None if target was not None, otherwise a BytesIO representation of the Excel file.

Return type

None or BytesIO


to_json

to_json(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.StringIO]] = None, layout: str = 'records', double_precision: int = 10, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)

Convert the SPSS data into a JSON string.

Parameters
  • data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.

  • target (Path-like / StringIO / str / None) – The destination where the JSON representation should be stored. Accepts either a filename, file-pointer or StringIO, or None. If None, will return a str object stored in-memory. Defaults to None.

  • layout (str) –

    Indicates the layout schema to use for the JSON representation of the data. Accepts:

    • records, where the resulting JSON object represents an array of objects where each object corresponds to a single record, with key/value pairs for each column and that record’s corresponding value

    • table, where the resulting JSON object contains metadata (a data map) describing the data schema along with the resulting collection of record objects

    Defaults to records.

  • double_precision (int) – Indicates the precision (places beyond the decimal point) to apply to floating point values. Defaults to 10.

  • limit (int or None) – The number of records to read from the data. If None will return all records. Defaults to None.

  • offset (int) – The record at which to start reading the data. Defaults to 0 (first record).

  • exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.

  • include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.

  • metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.

  • apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.

  • labels_as_categories (bool) –

    If True, will convert labeled or formatted values to Pandas categories. Defaults to True.

    Caution

    This parameter will only have an effect if the apply_labels parameter is True.

  • missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise will return missing values as per the configuration of missing value representation stored in the underlying SPSS data. Defaults to False, which applies the missing value representation configured in the SPSS data itself.

  • convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Python datetime, date, or similar representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.

  • dates_as_datetime64 (bool) –

    If True, will return any date values as Pandas datetime64 types. Defaults to False.

    Caution

    This parameter is only applied if convert_datetimes is set to True.

Returns

None if target was not None, otherwise a str representation of the JSON output.

Return type

None or str
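
The records layout and the double_precision setting can be sketched with the standard json module: each record becomes one object keyed by column name, and floating point values are rounded to double_precision decimal places. records_to_json is a hypothetical helper; the library's serializer may format numbers and missing values differently, and the table layout is not shown.

```python
import json
from typing import Any, Sequence


def records_to_json(
    columns: Sequence[str],
    rows: Sequence[Sequence[Any]],
    double_precision: int = 10,
) -> str:
    """Hypothetical helper: serialize rows in a 'records' layout."""
    records = []
    for row in rows:
        record = {}
        for column, value in zip(columns, row):
            # Round floats to the requested number of decimal places.
            if isinstance(value, float):
                value = round(value, double_precision)
            record[column] = value
        records.append(record)
    return json.dumps(records)


print(records_to_json(["id", "weight"], [[1, 0.123456], [2, 1.5]], double_precision=3))
```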


to_yaml

to_yaml(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.StringIO]] = None, layout: str = 'records', double_precision: int = 10, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)

Convert the SPSS data into a YAML string.

Parameters
  • data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.

  • target (Path-like / StringIO / str / None) – The destination where the YAML representation should be stored. Accepts either a filename, file-pointer or StringIO, or None. If None, will return a str object stored in-memory. Defaults to None.

  • layout (str) –

    Indicates the layout schema to use for the YAML representation of the data. Accepts:

    • records, where the resulting YAML object represents an array of objects where each object corresponds to a single record, with key/value pairs for each column and that record’s corresponding value

    • table, where the resulting YAML object contains metadata (a data map) describing the data schema along with the resulting collection of record objects

    Defaults to records.

  • double_precision (int) – Indicates the precision (places beyond the decimal point) to apply to floating point values. Defaults to 10.

  • limit (int or None) – The number of records to read from the data. If None will return all records. Defaults to None.

  • offset (int) – The record at which to start reading the data. Defaults to 0 (first record).

  • exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.

  • include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.

  • metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.

  • apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.

  • labels_as_categories (bool) –

    If True, will convert labeled or formatted values to Pandas categories. Defaults to True.

    Caution

    This parameter will only have an effect if the apply_labels parameter is True.

  • missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise will return missing values as per the configuration of missing value representation stored in the underlying SPSS data. Defaults to False, which applies the missing value representation configured in the SPSS data itself.

  • convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Python datetime, date, or similar representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.

  • dates_as_datetime64 (bool) –

    If True, will return any date values as Pandas datetime64 types. Defaults to False.

    Caution

    This parameter is only applied if convert_datetimes is set to True.

Returns

None if target was not None, otherwise a str representation of the YAML output.

Return type

None or str


to_dict

to_dict(data: Union[os.PathLike[Any], _io.BytesIO, bytes], layout: str = 'records', double_precision: int = 10, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)

Convert the SPSS data into a Python dict.

Parameters
  • data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.

  • layout (str) –

    Indicates the layout schema to use for the dict representation of the data. Accepts:

    • records, where the result is a list of dict objects where each dict corresponds to a single record, with key/value pairs for each column and that record’s corresponding value

    • table, where the resulting dict contains metadata (a data map) describing the data schema along with the resulting collection of record objects

    Defaults to records.

  • double_precision (int) – Indicates the precision (places beyond the decimal point) to apply to floating point values. Defaults to 10.

  • limit (int or None) – The number of records to read from the data. If None will return all records. Defaults to None.

  • offset (int) – The record at which to start reading the data. Defaults to 0 (first record).

  • exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.

  • include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.

  • metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.

  • apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.

  • labels_as_categories (bool) –

    If True, will convert labeled or formatted values to Pandas categories. Defaults to True.

    Caution

    This parameter will only have an effect if the apply_labels parameter is True.

  • missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise will return missing values as per the configuration of missing value representation stored in the underlying SPSS data. Defaults to False, which applies the missing value representation configured in the SPSS data itself.

  • convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Python datetime, date, or similar representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.

  • dates_as_datetime64 (bool) –

    If True, will return any date values as Pandas datetime64 types. Defaults to False.

    Caution

    This parameter is only applied if convert_datetimes is set to True.

Returns

A list of dict if layout is records, or a dict if layout is table.

Return type

list of dict or dict


get_metadata

get_metadata(data)

Retrieve the metadata that describes the coded representation of the data, corresponding formatting information, and their related human-readable labels.

Parameters

data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.

Returns

The metadata that describes the raw data and its corresponding labels.

Return type

Metadata


Writing Data to SPSS

from_dataframe

from_dataframe(df: pandas.core.frame.DataFrame, target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, metadata: Optional[spss_converter.Metadata.Metadata] = None, compress: bool = False)

Create an SPSS dataset from a Pandas DataFrame.

Parameters
  • df (pandas.DataFrame) – The DataFrame to serialize to an SPSS dataset.

  • target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None will return a BytesIO object containing the SPSS dataset. Defaults to None.

  • metadata (Metadata / None) – The Metadata associated with the dataset. If None, will attempt to derive it from df. Defaults to None.

  • compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.

Returns

A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None

Return type

BytesIO or None

Raises

from_csv

from_csv(as_csv: Union[str, PathLike[Any], _io.BytesIO], target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, delimiter='|', **kwargs)

Convert a CSV file into an SPSS dataset.

Tip

If you pass any additional keyword arguments, those keyword arguments will be passed onto the pandas.read_csv() function.

Parameters
  • as_csv (str / File-location / BytesIO) – The CSV data that you wish to convert into an SPSS dataset.

  • target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None will return a BytesIO object containing the SPSS dataset. Defaults to None.

  • compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.

  • delimiter (str) – The delimiter used between columns. Defaults to |.

  • kwargs (dict) – Additional keyword arguments which will be passed onto the pandas.read_csv() function.

Returns

A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None

Return type

BytesIO or None


from_excel

from_excel(as_excel, target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)

Convert Excel data into an SPSS dataset.

Tip

If you pass any additional keyword arguments, those keyword arguments will be passed onto the pandas.read_excel() function.

Parameters
  • as_excel (str / File-location / BytesIO / bytes / ExcelFile) – The Excel data that you wish to convert into an SPSS dataset.

  • target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None will return a BytesIO object containing the SPSS dataset. Defaults to None.

  • compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.

  • kwargs (dict) – Additional keyword arguments which will be passed onto the pandas.read_excel() function.

Returns

A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None

Return type

BytesIO or None


from_json

from_json(as_json: Union[str, PathLike[Any], _io.BytesIO], target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)

Convert JSON data into an SPSS dataset.

Tip

If you pass any additional keyword arguments, those keyword arguments will be passed onto the pandas.read_json() function.

Parameters
  • as_json (str / File-location / BytesIO) – The JSON data that you wish to convert into an SPSS dataset.

  • target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None will return a BytesIO object containing the SPSS dataset. Defaults to None.

  • compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.

  • kwargs (dict) – Additional keyword arguments which will be passed onto the pandas.read_json() function.

Returns

A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None

Return type

BytesIO or None


from_yaml

from_yaml(as_yaml: Union[str, PathLike[Any], _io.BytesIO], target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)

Convert YAML data into an SPSS dataset.

Tip

If you pass any additional keyword arguments, those keyword arguments will be passed onto the DataFrame.from_dict() method.

Parameters
  • as_yaml (str / File-location / BytesIO) – The YAML data that you wish to convert into an SPSS dataset.

  • target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None will return a BytesIO object containing the SPSS dataset. Defaults to None.

  • compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.

  • kwargs (dict) – Additional keyword arguments which will be passed onto the DataFrame.from_dict() method.

Returns

A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None

Return type

BytesIO or None


from_dict

from_dict(as_dict: dict, target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)

Convert a dict object into an SPSS dataset.

Tip

If you pass any additional keyword arguments, those keyword arguments will be passed onto the DataFrame.from_dict() method.

Parameters
  • as_dict (dict) – The dict data that you wish to convert into an SPSS dataset.

  • target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None will return a BytesIO object containing the SPSS dataset. Defaults to None.

  • compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.

  • kwargs (dict) – Additional keyword arguments which will be passed onto the DataFrame.from_dict() method.

Returns

A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None

Return type

BytesIO or None


apply_metadata

apply_metadata(df: pandas.core.frame.DataFrame, metadata: Union[spss_converter.Metadata.Metadata, dict, pyreadstat._readstat_parser.metadata_container], as_category: bool = True)

Update the DataFrame df based on the metadata, returning an updated copy.

Parameters
  • df (pandas.DataFrame) – The DataFrame to update.

  • metadata (Metadata, pyreadstat.metadata_container, or compatible dict) – The Metadata to apply to df.

  • as_category (bool) – If True, variables with formats will be transformed into categories in the DataFrame. Defaults to True.

Returns

A copy of df updated to reflect metadata.

Return type

DataFrame


Utility Classes

Metadata

class Metadata(**kwargs)

Object representation of metadata retrieved from an SPSS file.

classmethod from_dict(as_dict: dict)

Create a Metadata instance from a dict representation.

Parameters

as_dict (dict) – A dict representation of the Metadata.

Returns

A Metadata instance

Return type

Metadata

classmethod from_pyreadstat(as_metadata)

Create a Metadata instance from a Pyreadstat metadata object.

Parameters

as_metadata (pyreadstat.metadata_container) –

The Pyreadstat metadata object from which the Metadata instance should be created.

Returns

The Metadata instance.

Return type

Metadata

to_dict() → dict

Return a dict representation of the instance.

Return type

dict

to_pyreadstat()

Create a Pyreadstat metadata representation of the Metadata instance.

Returns

The Pyreadstat metadata.

Return type

pyreadstat._readstat_parser.metadata_container

property column_metadata

Collection of metadata that describes each column or variable within the dataset.

Returns

A dict where the key is the name of the column/variable and the value is a ColumnMetadata object or compatible dict.

Return type

dict / None

property columns

The number of columns/variables in the dataset.

Return type

int

property file_encoding

The file encoding for the dataset.

Return type

str or None

property file_label

The file label.

Note

This property is irrelevant for SPSS, but is relevant for SAS data.

Return type

str / None

property notes

Set of notes related to the file.

Return type

str / None

property rows

The number of cases or rows in the dataset.

Return type

int

property table_name

The name of the data table.

Return type

str / None


ColumnMetadata

class ColumnMetadata(**kwargs)

Object representation of the metadata that describes a column or variable from an SPSS file.

add_to_pyreadstat(pyreadstat)

Update pyreadstat to include the metadata for this column/variable.

Parameters

pyreadstat (pyreadstat._readstat_parser.metadata_container) –

The Pyreadstat metadata object where the ColumnMetadata data should be updated.

Returns

The Pyreadstat metadata.

Return type

pyreadstat._readstat_parser.metadata_container

classmethod from_dict(as_dict: dict)

Create a new ColumnMetadata instance from a dict representation.

Parameters

as_dict (dict) – The dict representation of the ColumnMetadata.

Returns

The ColumnMetadata instance.

Return type

ColumnMetadata

classmethod from_pyreadstat_metadata(name: str, as_metadata)

Create a new ColumnMetadata instance from a Pyreadstat metadata object.

Parameters
  • name (str) – The name of the variable for which a ColumnMetadata instance should be created.

  • as_metadata (pyreadstat.metadata_container) –

    The Pyreadstat metadata object from which the column’s metadata should be extracted.

Returns

The ColumnMetadata instance.

Return type

ColumnMetadata

to_dict() → dict

Generate a dict representation of the instance.

Return type

dict

property alignment

The alignment to apply to values from this column/variable when displaying data. Defaults to 'unknown'.

Accepts either 'unknown', 'left', 'center', or 'right' as either a case-insensitive str or a VariableAlignmentEnum.

Return type

VariableAlignmentEnum
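
Because alignment accepts either a case-insensitive str or an enum member, assigning it involves normalizing both forms to a single enum value. The sketch below uses a stand-in Alignment enum and a hypothetical coerce_alignment function; the member names and values of the real VariableAlignmentEnum are assumptions here.

```python
from enum import Enum
from typing import Union


class Alignment(Enum):
    """Hypothetical stand-in for VariableAlignmentEnum (member values assumed)."""

    UNKNOWN = "unknown"
    LEFT = "left"
    CENTER = "center"
    RIGHT = "right"


def coerce_alignment(value: Union[str, Alignment]) -> Alignment:
    """Normalize a case-insensitive str or an Alignment member to an Alignment."""
    if isinstance(value, Alignment):
        return value
    try:
        # Enum lookup by value; lower-casing makes the match case-insensitive.
        return Alignment(value.strip().lower())
    except ValueError:
        raise ValueError(f"unsupported alignment: {value!r}") from None


print(coerce_alignment("Center"), coerce_alignment(Alignment.RIGHT))
```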

property display_width

The maximum width at which the value is displayed. Defaults to 0.

Return type

int

property label

The label applied to the column/variable.

Return type

str / None

property measure

A classification of the type of measure (or value type) represented by the variable. Defaults to 'unknown'.

Accepts either 'unknown', 'nominal', 'ordinal', or 'scale'.

Return type

VariableMeasureEnum

property missing_range_metadata

Collection of metadata that defines the numerical ranges that are to be considered missing in the underlying data.

Returns

A list of dict, each with keys 'low' and 'high' giving the lower and upper bounds of a range of raw values to treat as missing, or None if no missing ranges are defined.

Return type

list of dict or None
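
Each entry in missing_range_metadata describes a range of raw values to treat as missing. A minimal sketch of that check, assuming the low and high bounds are inclusive; is_missing is a hypothetical helper, not part of this library.

```python
from typing import Dict, List, Optional


def is_missing(value: float, missing_ranges: Optional[List[Dict[str, float]]]) -> bool:
    """Hypothetical helper: True if value lies inside any {'low', 'high'} range."""
    if not missing_ranges:
        return False
    # Assumption: both bounds are inclusive.
    return any(r["low"] <= value <= r["high"] for r in missing_ranges)


ranges = [{"low": 97, "high": 99}, {"low": -1, "high": -1}]
print([is_missing(v, ranges) for v in (1, 98, -1, 100)])
```

A single-value missing code can be expressed as a range whose low and high bounds are equal, as with -1 above.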

property missing_value_metadata

Value used to represent missing values in the raw data. Defaults to None.

Note

This is not actually relevant for SPSS data, but is an artifact for SAS and Stata data.

Return type

list of int or str / None

property name

The name of the column/variable.

Return type

str / None

property storage_width

The width of data to store in the data file for the value. Defaults to 0.

Return type

int

property value_metadata

Collection of values possible for the column/variable, with corresponding labels for each value.

Returns

dict whose keys are the values in the raw data and whose values are the labels for each value. May be None for variables whose value is not coded.

Return type

dict / None
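
value_metadata maps coded raw values to their labels, which is the information used when apply_labels is True. The lookup can be sketched in plain Python as follows; apply_value_labels is a hypothetical helper, and the library itself may additionally produce Pandas categories (see labels_as_categories) rather than a plain list.

```python
from typing import Any, Dict, List, Optional


def apply_value_labels(values: List[Any], value_metadata: Optional[Dict[Any, str]]) -> List[Any]:
    """Hypothetical helper: replace coded raw values with their labels."""
    if value_metadata is None:
        # The variable is not coded, so values are returned unchanged.
        return list(values)
    # Values without a label (e.g. 9.0 below) are left as-is.
    return [value_metadata.get(value, value) for value in values]


labels = {1.0: "Agree", 2.0: "Neutral", 3.0: "Disagree"}
print(apply_value_labels([1.0, 3.0, 9.0], labels))
```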