SAS datasets contain not only data but also a good deal of metadata. The most commonly used pieces of this metadata are as follows -
- Variable Name
- Variable Label
- Variable Type (num/char)
- Variable Position
In SAS, programmers typically use PROC CONTENTS to extract this dataset-level and column/variable-level metadata, and it looks something like this -
proc contents data = mycas.cars; run;
If you want to read this information using Python, you have a couple of options, but here we will focus on an amazing Python library called pyreadstat. A quick brief about pyreadstat, in the package's own words:
Python package to read sas, spss and stata files into pandas data frames. It is a wrapper for the C library readstat.
Let's follow a few steps to understand how pyreadstat can be used to read SAS dataset metadata in a way that the output looks similar to the SAS PROC CONTENTS output.
Install pyreadstat on your computer, if you haven't already.
pip install pyreadstat pandas
Start a new program, metadata.py, and import pyreadstat and pandas into the program.
import pyreadstat as prs
import pandas as pd
Call the .read_sas7bdat() function from the prs module, which reads the SAS dataset and returns both the data (as a pandas dataframe) and the metadata.
ae001, ae001_meta = prs.read_sas7bdat("c:/ae001.sas7bdat")
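If only the metadata is of interest, the data rows do not need to be loaded at all. Below is a minimal sketch, assuming the metadataonly parameter of read_sas7bdat; the helper name read_sas_metadata is hypothetical and not part of pyreadstat.

```python
def read_sas_metadata(path):
    """Return only the metadata container of a SAS dataset (hypothetical helper)."""
    # Imported inside the function so pyreadstat is only needed when the helper is called.
    import pyreadstat as prs

    # metadataonly=True skips the data rows; the returned dataframe has columns but no data.
    _, meta = prs.read_sas7bdat(path, metadataonly=True)
    return meta


# Example usage (path taken from the step above; adjust it to your own file):
# ae001_meta = read_sas_metadata("c:/ae001.sas7bdat")
# print(ae001_meta.column_names)
```

This can save time and memory on large datasets, since only the file header and column information are parsed.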
Initialise an empty pandas dataframe and assign each piece of information as below.
# initialise an empty pandas dataframe
df_metadata = pd.DataFrame()
# read column names and labels into the new pandas dataframe
df_metadata["name"] = ae001_meta.column_names
df_metadata["label"] = ae001_meta.column_labels
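The remaining information can be read in a similar way. Note that, unlike the column names and labels (which are lists), these three attributes are dictionaries keyed by column name, so it is safer to map them onto the name column, e.g. df_metadata["name"].map(ae001_meta.original_variable_types), than to assign them directly -
- format << ae001_meta.original_variable_types
- type << ae001_meta.readstat_variable_types
- length << ae001_meta.variable_storage_width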
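Putting the pieces together, the whole PROC CONTENTS-like table could be built roughly as follows. This is only a sketch: build_contents and fake_meta are hypothetical names, and fake_meta is a stand-in with made-up values so that the example runs without a SAS file. In a real program you would pass the metadata object returned by read_sas7bdat instead.

```python
from types import SimpleNamespace

import pandas as pd


def build_contents(meta):
    """Build a PROC CONTENTS-like table from a pyreadstat-style metadata object."""
    df = pd.DataFrame()
    # column_names and column_labels are lists in dataset order
    df["name"] = meta.column_names
    df["label"] = meta.column_labels
    # the remaining attributes are dicts keyed by column name, so map them by name
    df["format"] = df["name"].map(meta.original_variable_types)
    df["type"] = df["name"].map(meta.readstat_variable_types)
    df["length"] = df["name"].map(meta.variable_storage_width)
    # 1-based variable position, similar to the "#" column in PROC CONTENTS
    df["position"] = list(range(1, len(df) + 1))
    return df


# Stand-in for the real metadata object; all values below are made up for illustration.
fake_meta = SimpleNamespace(
    column_names=["USUBJID", "AESTDT"],
    column_labels=["Unique Subject Identifier", "Start Date of Adverse Event"],
    original_variable_types={"USUBJID": "$", "AESTDT": "DATE9"},
    readstat_variable_types={"USUBJID": "string", "AESTDT": "double"},
    variable_storage_width={"USUBJID": 20, "AESTDT": 8},
)

contents = build_contents(fake_meta)
print(contents.to_string(index=False))
```

Mapping by name rather than by position keeps each format, type and length attached to the right variable even if the dictionaries are ordered differently from the column list.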
This website gives very good detail about the functions and parameters available in pyreadstat to read SAS datasets.
Hope you found this article useful. Happy coding!