API Reference
Reading Data from SPSS
to_dataframe
to_dataframe(data: Union[bytes, _io.BytesIO, os.PathLike[Any]], limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)
Reads SPSS data and returns a tuple with a Pandas DataFrame object and the relevant Metadata.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
limit (int or None) – The number of records to read from the data. If None, will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (the first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) – If True, will convert labeled or formatted values to Pandas categories. Defaults to True.
Caution
This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise, missing values are returned using the missing value representation configured in the underlying SPSS data. Defaults to False.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Python datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) – If True, will return any date values as Pandas datetime64 types. Defaults to False.
Caution
This parameter is only applied if convert_datetimes is set to True.
- Returns
A DataFrame representation of the SPSS data (or None) and a Metadata representation of the data’s metadata (values and labels / data map).
- Return type
pandas.DataFrame / None and Metadata
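As a rough illustration of how apply_labels and missing_as_NaN interact for a single numerically-coded column, the sketch below works on plain Python lists rather than on real SPSS data. The helper label_column, its arguments, and the example codes are hypothetical; this is not how to_dataframe is implemented, and it ignores labels_as_categories and date handling.

```python
import math
from typing import Dict, Iterable, List, Optional, Set, Union

Value = Union[int, float, str]


def label_column(
    codes: Iterable[int],
    value_labels: Dict[int, str],
    missing_codes: Optional[Set[int]] = None,
    apply_labels: bool = False,
    missing_as_nan: bool = False,
) -> List[Value]:
    """Hypothetical helper: mimic apply_labels / missing_as_NaN on one column."""
    missing_codes = missing_codes or set()
    result: List[Value] = []
    for code in codes:
        if missing_as_nan and code in missing_codes:
            # User-defined missing codes become NaN.
            result.append(math.nan)
        elif apply_labels:
            # Replace the numeric code with its label; unlabeled codes stay as-is.
            result.append(value_labels.get(code, code))
        else:
            # Leave the raw numeric code untouched.
            result.append(code)
    return result


labels = {1: "Male", 2: "Female"}
print(label_column([1, 2, 9], labels, {9}, apply_labels=True, missing_as_nan=True))
# ['Male', 'Female', nan]
```

With both flags left at their defaults, the raw codes (including the missing code 9) pass through unchanged.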
to_csv
to_csv(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.StringIO]] = None, include_header: bool = True, delimter: str = '|', null_text: str = 'NaN', wrapper_character: str = "'", escape_character: str = '\\', line_terminator: str = '\r\n', decimal: str = '.', limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)
Convert the SPSS data into a CSV string where each row represents a record of SPSS data.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
target (Path-like / StringIO / str / None) – The destination where the CSV representation should be stored. Accepts either a filename, a file-pointer, a StringIO, or None. If None, will return a str object stored in-memory. Defaults to None.
include_header (bool) – If True, will include a header row with column labels. If False, will not include a header row. Defaults to True.
delimiter (str) – The delimiter used between columns. Defaults to '|'. Note that this keyword argument is spelled delimter in the function signature.
null_text (str) – The text value to use in place of empty values. Defaults to 'NaN'.
wrapper_character (str) – The string used to wrap string values when wrapping is necessary. Defaults to ' (a single quote).
escape_character (str) – The character to use when escaping nested wrapper characters. Defaults to \ (a backslash).
line_terminator (str) – The character sequence used to mark the end of a line. Defaults to \r\n.
decimal (str) – The character used to indicate a decimal place in a numerical value. Defaults to '.'.
limit (int or None) – The number of records to read from the data. If None, will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (the first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) – If True, will convert labeled or formatted values to Pandas categories. Defaults to True.
Caution
This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise, missing values are returned using the missing value representation configured in the underlying SPSS data. Defaults to False.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Python datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) – If True, will return any date values as Pandas datetime64 types. Defaults to False.
Caution
This parameter is only applied if convert_datetimes is set to True.
- Returns
None if target was not None, otherwise a str representation of the CSV file.
- Return type
str / None
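The CSV formatting parameters map closely onto options of Python's standard csv module. The sketch below uses csv.writer as a stand-in to show what the to_csv defaults (a pipe delimiter, single-quote wrapping, backslash escaping, \r\n line endings, and 'NaN' for missing values) would roughly produce. format_rows is a hypothetical helper, and the output shown is an assumption about the general shape, not the library's actual implementation, which may differ in details such as quoting rules.

```python
import csv
import io
from typing import Any, Optional, Sequence


def format_rows(
    header: Sequence[str],
    rows: Sequence[Sequence[Optional[Any]]],
    delimiter: str = "|",
    null_text: str = "NaN",
    wrapper_character: str = "'",
    escape_character: str = "\\",
    line_terminator: str = "\r\n",
    include_header: bool = True,
) -> str:
    """Hypothetical stand-in: render rows with to_csv-style defaults via csv."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quotechar=wrapper_character,
        escapechar=escape_character,
        doublequote=False,  # escape nested wrapper characters instead of doubling them
        lineterminator=line_terminator,
        quoting=csv.QUOTE_MINIMAL,  # wrap values only when necessary
    )
    if include_header:
        writer.writerow(header)
    for row in rows:
        # Replace missing values (None) with the configured null text.
        writer.writerow([null_text if value is None else value for value in row])
    return buffer.getvalue()


print(repr(format_rows(["id", "note"], [[1, "a|b"], [2, None]])))
# "id|note\r\n1|'a|b'\r\n2|NaN\r\n"
```

Note how the value containing the delimiter is wrapped in the wrapper character, while the other values are written bare.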
to_excel
to_excel(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.BytesIO, pandas.io.excel._base.ExcelWriter]] = None, sheet_name: str = 'Sheet1', start_row: int = 0, start_column: int = 0, null_text: str = 'NaN', include_header: bool = True, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)
Convert the SPSS data into an Excel file where each row represents a record of SPSS data.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
target (Path-like / BytesIO / ExcelWriter / None) – The destination where the Excel file should be stored. Accepts either a filename, a file-pointer, a BytesIO, an ExcelWriter instance, or None. Defaults to None.
sheet_name (str) – The worksheet on which the SPSS data should be written. Defaults to 'Sheet1'.
start_row (int) – The row number (starting at 0) where the SPSS data should begin. Defaults to 0.
start_column (int) – The column number (starting at 0) where the SPSS data should begin. Defaults to 0.
null_text (str) – The way that missing values should be represented in the Excel file. Defaults to 'NaN'.
include_header (bool) – If True, will include a header row with column labels. If False, will not include a header row. Defaults to True.
limit (int or None) – The number of records to read from the data. If None, will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (the first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) – If True, will convert labeled or formatted values to Pandas categories. Defaults to True.
Caution
This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise, missing values are returned using the missing value representation configured in the underlying SPSS data. Defaults to False.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Python datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) – If True, will return any date values as Pandas datetime64 types. Defaults to False.
Caution
This parameter is only applied if convert_datetimes is set to True.
- Returns
None if target was not None, otherwise a BytesIO representation of the Excel file.
- Return type
BytesIO / None
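Because start_row and start_column are zero-based, the upper-left cell of the written data is offset by one from the row and column numbers Excel displays. The hypothetical helper below (not part of the library) sketches the conversion from those offsets to Excel's A1 cell notation.

```python
def top_left_cell(start_row: int = 0, start_column: int = 0) -> str:
    """Hypothetical helper: convert zero-based offsets to an Excel A1 reference."""
    if start_row < 0 or start_column < 0:
        raise ValueError("start_row and start_column must be non-negative")
    letters = ""
    n = start_column + 1  # Excel columns are 1-based: 1 -> A, 26 -> Z, 27 -> AA
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return f"{letters}{start_row + 1}"


print(top_left_cell())       # A1
print(top_left_cell(2, 27))  # AB3
```

With the defaults (start_row=0, start_column=0) the data begins in cell A1.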
to_json
to_json(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.StringIO]] = None, layout: str = 'records', double_precision: int = 10, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)
Convert the SPSS data into a JSON string.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
target (Path-like / StringIO / str / None) – The destination where the JSON representation should be stored. Accepts either a filename, a file-pointer, a StringIO, or None. If None, will return a str object stored in-memory. Defaults to None.
layout (str) – Indicates the layout schema to use for the JSON representation of the data. Accepts:
records, where the resulting JSON object represents an array of objects, each of which corresponds to a single record, with key/value pairs for each column and that record’s corresponding value
table, where the resulting JSON object contains metadata (a data map) describing the data schema along with the resulting collection of record objects
Defaults to 'records'.
double_precision (int) – Indicates the precision (places beyond the decimal point) to apply for floating point values. Defaults to 10.
limit (int or None) – The number of records to read from the data. If None, will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (the first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) – If True, will convert labeled or formatted values to Pandas categories. Defaults to True.
Caution
This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise, missing values are returned using the missing value representation configured in the underlying SPSS data. Defaults to False.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Python datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) – If True, will return any date values as Pandas datetime64 types. Defaults to False.
Caution
This parameter is only applied if convert_datetimes is set to True.
- Returns
None if target was not None, otherwise a str representation of the JSON output.
- Return type
str / None
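The difference between the two layout values can be sketched with the standard json module. The helper to_layout and the "schema"/"data" key names in the table branch are assumptions for illustration (they echo the shape of pandas' orient='table' output); the exact structure that to_json emits may differ. The sketch also applies double_precision by rounding floats.

```python
import json
from typing import Any, List


def to_layout(columns: List[str], rows: List[List[Any]], layout: str = "records",
              double_precision: int = 10) -> str:
    """Hypothetical sketch of the 'records' and 'table' JSON layouts."""
    def clean(value: Any) -> Any:
        # Apply double_precision to floating point values only.
        return round(value, double_precision) if isinstance(value, float) else value

    records = [{col: clean(val) for col, val in zip(columns, row)} for row in rows]
    if layout == "records":
        return json.dumps(records)
    if layout == "table":
        # Assumed shape: a schema (data map) alongside the record objects.
        schema = {"fields": [{"name": col} for col in columns]}
        return json.dumps({"schema": schema, "data": records})
    raise ValueError(f"unsupported layout: {layout!r}")


print(to_layout(["id", "score"], [[1, 0.123456]], double_precision=2))
# [{"id": 1, "score": 0.12}]
```

The same records/table distinction is documented for to_yaml and to_dict.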
to_yaml
to_yaml(data: Union[os.PathLike[Any], _io.BytesIO, bytes], target: Optional[Union[os.PathLike[Any], _io.StringIO]] = None, layout: str = 'records', double_precision: int = 10, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)
Convert the SPSS data into a YAML string.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
target (Path-like / StringIO / str / None) – The destination where the YAML representation should be stored. Accepts either a filename, a file-pointer, a StringIO, or None. If None, will return a str object stored in-memory. Defaults to None.
layout (str) – Indicates the layout schema to use for the YAML representation of the data. Accepts:
records, where the resulting YAML object represents an array of objects, each of which corresponds to a single record, with key/value pairs for each column and that record’s corresponding value
table, where the resulting YAML object contains metadata (a data map) describing the data schema along with the resulting collection of record objects
Defaults to 'records'.
double_precision (int) – Indicates the precision (places beyond the decimal point) to apply for floating point values. Defaults to 10.
limit (int or None) – The number of records to read from the data. If None, will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (the first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) – If True, will convert labeled or formatted values to Pandas categories. Defaults to True.
Caution
This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise, missing values are returned using the missing value representation configured in the underlying SPSS data. Defaults to False.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Python datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) – If True, will return any date values as Pandas datetime64 types. Defaults to False.
Caution
This parameter is only applied if convert_datetimes is set to True.
- Returns
None if target was not None, otherwise a str representation of the YAML output.
- Return type
str / None
to_dict
to_dict(data: Union[os.PathLike[Any], _io.BytesIO, bytes], layout: str = 'records', double_precision: int = 10, limit: Optional[int] = None, offset: int = 0, exclude_variables: Optional[List[str]] = None, include_variables: Optional[List[str]] = None, metadata_only: bool = False, apply_labels: bool = False, labels_as_categories: bool = True, missing_as_NaN: bool = False, convert_datetimes: bool = True, dates_as_datetime64: bool = False, **kwargs)
Convert the SPSS data into a Python dict.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
layout (str) – Indicates the layout schema to use for the dict representation of the data. Accepts:
records, where the result is a list of dict objects, each of which corresponds to a single record, with key/value pairs for each column and that record’s corresponding value
table, where the result is a dict that contains metadata (a data map) describing the data schema along with the resulting collection of record objects
Defaults to 'records'.
double_precision (int) – Indicates the precision (places beyond the decimal point) to apply for floating point values. Defaults to 10.
limit (int or None) – The number of records to read from the data. If None, will return all records. Defaults to None.
offset (int) – The record at which to start reading the data. Defaults to 0 (the first record).
exclude_variables (iterable of str or None) – A list of the variables that should be ignored when reading data. Defaults to None.
include_variables (iterable of str or None) – A list of the variables that should be explicitly included when reading data. Defaults to None.
metadata_only (bool) – If True, will return no data records in the resulting DataFrame but will return a complete Metadata instance. Defaults to False.
apply_labels (bool) – If True, converts the numerically-coded values in the raw data to their human-readable labels. Defaults to False.
labels_as_categories (bool) – If True, will convert labeled or formatted values to Pandas categories. Defaults to True.
Caution
This parameter will only have an effect if the apply_labels parameter is True.
missing_as_NaN (bool) – If True, will return any missing values as NaN. Otherwise, missing values are returned using the missing value representation configured in the underlying SPSS data. Defaults to False.
convert_datetimes (bool) – If True, will convert the native integer representation of datetime values in the SPSS data to Python datetime, date, etc. representations (or Pandas datetime64, depending on the dates_as_datetime64 parameter). If False, will leave the original integer representation. Defaults to True.
dates_as_datetime64 (bool) – If True, will return any date values as Pandas datetime64 types. Defaults to False.
Caution
This parameter is only applied if convert_datetimes is set to True.
- Returns
A list of dict if layout is 'records', or a dict if layout is 'table'.
- Return type
list of dict / dict
get_metadata
get_metadata(data)
Retrieve the metadata that describes the coded representation of the data, corresponding formatting information, and their related human-readable labels.
- Parameters
data (Path-like filename, bytes or BytesIO) – The SPSS data to load. Accepts either a series of bytes or a filename.
- Returns
The metadata that describes the raw data and its corresponding labels.
- Return type
Metadata
Writing Data to SPSS
from_dataframe
from_dataframe(df: pandas.core.frame.DataFrame, target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, metadata: Optional[spss_converter.Metadata.Metadata] = None, compress: bool = False)
Create an SPSS dataset from a Pandas DataFrame.
- Parameters
df (pandas.DataFrame) – The DataFrame to serialize to an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None, will return a BytesIO object containing the SPSS dataset. Defaults to None.
metadata (Metadata / None) – The Metadata associated with the dataset. If None, will attempt to derive it from df. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
- Returns
A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None.
- Return type
BytesIO / None
- Raises
ValueError – if df is not a pandas.DataFrame
ValueError – if metadata is not a Metadata
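Every from_* function documents the same target convention: a BytesIO is returned unless target is a filename, in which case the file is written and None is returned. The hypothetical write_output helper below sketches that dispatch, with placeholder bytes standing in for an encoded SAV/ZSAV payload. It illustrates the documented return values only; it is not the library's code, and it assumes that a BytesIO passed as target is written to and then returned.

```python
import io
import os
from typing import Optional, Union

Target = Optional[Union[str, os.PathLike, io.BytesIO]]


def write_output(payload: bytes, target: Target = None) -> Optional[io.BytesIO]:
    """Hypothetical sketch of the target/return convention of the from_* functions."""
    if target is None:
        # No target: hand the bytes back in a new in-memory buffer.
        return io.BytesIO(payload)
    if isinstance(target, io.BytesIO):
        # Existing buffer: write into it, rewind it, and return it.
        target.write(payload)
        target.seek(0)
        return target
    # Otherwise treat the target as a filename/path, write the file, return None.
    with open(os.fspath(target), "wb") as handle:
        handle.write(payload)
    return None
```

Callers that pass a filename should therefore not expect a buffer back.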
from_csv
from_csv(as_csv: Union[str, PathLike[Any], _io.BytesIO], target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, delimiter='|', **kwargs)
Convert a CSV file into an SPSS dataset.
Tip
If you pass any additional keyword arguments, those keyword arguments will be passed on to the pandas.read_csv() function.
- Parameters
as_csv (str / File-location / BytesIO) – The CSV data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None, will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
delimiter (str) – The delimiter used between columns. Defaults to '|'.
kwargs (dict) – Additional keyword arguments which will be passed on to the pandas.read_csv() function.
- Returns
A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None.
- Return type
BytesIO / None
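Because from_csv defaults to delimiter='|' rather than the comma that pandas.read_csv() assumes, a comma-separated file read with the defaults would collapse into a single column. The sketch below uses the standard csv.DictReader as a stand-in for pandas.read_csv() to show the effect; the sample data is made up. For comma-separated input, pass delimiter=',' to from_csv.

```python
import csv
import io

pipe_text = "id|name\r\n1|Alice\r\n2|Bob\r\n"
comma_text = "id,name\r\n1,Alice\r\n2,Bob\r\n"

# Both inputs are parsed with from_csv's default delimiter ('|').
pipe_rows = list(csv.DictReader(io.StringIO(pipe_text), delimiter="|"))
comma_rows = list(csv.DictReader(io.StringIO(comma_text), delimiter="|"))

print(pipe_rows[0])   # {'id': '1', 'name': 'Alice'}
print(comma_rows[0])  # {'id,name': '1,Alice'}
```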
from_excel
from_excel(as_excel, target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)
Convert Excel data into an SPSS dataset.
Tip
If you pass any additional keyword arguments, those keyword arguments will be passed on to the pandas.read_excel() function.
- Parameters
as_excel (str / File-location / BytesIO / bytes / ExcelFile) – The Excel data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None, will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
kwargs (dict) – Additional keyword arguments which will be passed on to the pandas.read_excel() function.
- Returns
A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None.
- Return type
BytesIO / None
from_json
from_json(as_json: Union[str, PathLike[Any], _io.BytesIO], target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)
Convert JSON data into an SPSS dataset.
Tip
If you pass any additional keyword arguments, those keyword arguments will be passed on to the pandas.read_json() function.
- Parameters
as_json (str / File-location / BytesIO) – The JSON data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None, will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
kwargs (dict) – Additional keyword arguments which will be passed on to the pandas.read_json() function.
- Returns
A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None.
- Return type
BytesIO / None
from_yaml
from_yaml(as_yaml: Union[str, PathLike[Any], _io.BytesIO], target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)
Convert YAML data into an SPSS dataset.
Tip
If you pass any additional keyword arguments, those keyword arguments will be passed on to the DataFrame.from_dict() method.
- Parameters
as_yaml (str / File-location / BytesIO) – The YAML data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None, will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
kwargs (dict) – Additional keyword arguments which will be passed on to the DataFrame.from_dict() method.
- Returns
A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None.
- Return type
BytesIO / None
from_dict
from_dict(as_dict: dict, target: Optional[Union[PathLike[Any], _io.BytesIO]] = None, compress: bool = False, **kwargs)
Convert a dict object into an SPSS dataset.
Tip
If you pass any additional keyword arguments, those keyword arguments will be passed on to the DataFrame.from_dict() method.
- Parameters
as_dict (dict) – The dict data that you wish to convert into an SPSS dataset.
target (Path-like / BytesIO / None) – The target to which the SPSS dataset should be written. Accepts either a filename/path, a BytesIO object, or None. If None, will return a BytesIO object containing the SPSS dataset. Defaults to None.
compress (bool) – If True, will return data in the compressed ZSAV format. If False, will return data in the standard SAV format. Defaults to False.
kwargs (dict) – Additional keyword arguments which will be passed on to the DataFrame.from_dict() method.
- Returns
A BytesIO object containing the SPSS data if target is None or not a filename, otherwise None.
- Return type
BytesIO / None
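Since as_dict is documented as a dict and extra keyword arguments go to DataFrame.from_dict(), a list of record dicts (such as to_dict returns with layout='records') may first need pivoting into a column-oriented dict, which is what DataFrame.from_dict() expects with its default orient='columns'. Whether from_dict itself performs such a conversion is not stated here, so treat this as an assumption; records_to_columns is a hypothetical helper.

```python
from typing import Any, Dict, List


def records_to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical helper: pivot record dicts into a column-oriented dict."""
    columns: Dict[str, List[Any]] = {}
    for index, record in enumerate(records):
        for key, value in record.items():
            # A column first seen in a later record is padded with None.
            columns.setdefault(key, [None] * index).append(value)
        # Pad columns that this record does not contain.
        for values in columns.values():
            if len(values) < index + 1:
                values.append(None)
    return columns


print(records_to_columns([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))
# {'id': [1, 2], 'name': ['Alice', 'Bob']}
```

Padding with None keeps every column the same length, which DataFrame.from_dict() requires for list values.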
apply_metadata
apply_metadata(df: pandas.core.frame.DataFrame, metadata: Union[spss_converter.Metadata.Metadata, dict, pyreadstat._readstat_parser.metadata_container], as_category: bool = True)
Updates the DataFrame df based on the metadata.
- Parameters
df (pandas.DataFrame) – The DataFrame to update.
metadata (Metadata, pyreadstat.metadata_container, or compatible dict) – The Metadata to apply to df.
as_category (bool) – If True, variables with formats will be transformed into categories in the DataFrame. Defaults to True.
- Returns
A copy of df updated to reflect metadata.
- Return type
pandas.DataFrame
Utility Classes
Metadata
class Metadata(**kwargs)
Object representation of metadata retrieved from an SPSS file.
classmethod from_pyreadstat(as_metadata)
Create a Metadata instance from a Pyreadstat metadata object.
- Parameters
as_metadata (pyreadstat.metadata_container) – The Pyreadstat metadata object from which the Metadata instance should be created.
- Returns
The Metadata instance.
- Return type
Metadata
to_pyreadstat()
Create a Pyreadstat metadata representation of the Metadata instance.
- Returns
The Pyreadstat metadata.
- Return type
pyreadstat.metadata_container
property column_metadata
Collection of metadata that describes each column or variable within the dataset.
- Returns
A dict where the key is the name of the column/variable and the value is a ColumnMetadata object or compatible dict.
- Return type
dict
property file_label
The file label.
Note
This property is irrelevant for SPSS, but is relevant for SAS data.
ColumnMetadata
class ColumnMetadata(**kwargs)
Object representation of the metadata that describes a column or variable from an SPSS file.
add_to_pyreadstat(pyreadstat)
Update pyreadstat to include the metadata for this column/variable.
- Parameters
pyreadstat (pyreadstat.metadata_container) – The Pyreadstat metadata object where the ColumnMetadata data should be updated.
- Returns
The Pyreadstat metadata.
- Return type
pyreadstat.metadata_container
classmethod from_dict(as_dict: dict)
Create a new ColumnMetadata instance from a dict representation.
- Parameters
as_dict (dict) – The dict representation of the ColumnMetadata.
- Returns
The ColumnMetadata instance.
- Return type
ColumnMetadata
classmethod from_pyreadstat_metadata(name: str, as_metadata)
Create a new ColumnMetadata instance from a Pyreadstat metadata object.
- Parameters
name (str) – The name of the variable for which a ColumnMetadata instance should be created.
as_metadata (pyreadstat.metadata_container) – The Pyreadstat metadata object from which the column’s metadata should be extracted.
- Returns
The ColumnMetadata instance.
- Return type
ColumnMetadata
property alignment
The alignment to apply to values from this column/variable when displaying data. Defaults to 'unknown'.
Accepts 'unknown', 'left', 'center', or 'right', as either a case-insensitive str or a VariableAlignmentEnum.
- Return type
VariableAlignmentEnum
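Accepting "either a case-insensitive str or an enum member" is a common coercion pattern. The sketch below shows one way it can work; Alignment is a hypothetical stand-in that mirrors the four documented values of VariableAlignmentEnum, and coerce_alignment is a hypothetical helper, not the library's property setter.

```python
from enum import Enum
from typing import Union


class Alignment(Enum):
    """Hypothetical mirror of VariableAlignmentEnum's documented values."""
    UNKNOWN = "unknown"
    LEFT = "left"
    CENTER = "center"
    RIGHT = "right"


def coerce_alignment(value: Union[str, Alignment]) -> Alignment:
    """Accept an Alignment member or a case-insensitive string."""
    if isinstance(value, Alignment):
        return value
    if isinstance(value, str):
        try:
            # Enum lookup by value; lower-casing makes the match case-insensitive.
            return Alignment(value.lower())
        except ValueError:
            pass
    raise ValueError(f"unsupported alignment: {value!r}")


print(coerce_alignment("Center"))  # Alignment.CENTER
```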
property display_width
The maximum width at which the value is displayed. Defaults to 0.
- Return type
int
property measure
A classification of the type of measure (or value type) represented by the variable. Defaults to 'unknown'.
Accepts either 'unknown', 'nominal', 'ordinal', or 'scale'.
- Return type
VariableMeasureEnum
property missing_range_metadata
Collection of metadata that defines the numerical ranges that are to be considered missing in the underlying data.
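A missing range can be thought of as an inclusive lower and upper bound. The sketch below assumes a pyreadstat-style representation, a list of dicts with "lo" and "hi" keys, where a single discrete missing value is a range whose bounds are equal. Whether missing_range_metadata uses exactly this shape is an assumption, and is_missing is a hypothetical helper.

```python
from typing import Dict, List


def is_missing(value: float, missing_ranges: List[Dict[str, float]]) -> bool:
    """Return True if value falls inside any inclusive lo/hi missing range."""
    return any(r["lo"] <= value <= r["hi"] for r in missing_ranges)


ranges = [{"lo": 97, "hi": 99}, {"lo": -1, "hi": -1}]
print(is_missing(98, ranges))  # True
print(is_missing(-1, ranges))  # True
print(is_missing(5, ranges))   # False
```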
property missing_value_metadata
Value used to represent missing values in the raw data. Defaults to None.
Note
This is not actually relevant for SPSS data, but is an artifact for SAS and Stata data.
property storage_width
The width of data to store in the data file for the value. Defaults to 0.
- Return type
int
property value_metadata
Collection of values possible for the column/variable, with corresponding labels for each value.