Extract SAS Dataset metadata using Python

SAS datasets contain not only data but also a good deal of metadata. The most commonly used pieces of this metadata are:

  • Variable Name
  • Variable Label
  • Variable Type (num/char)
  • Format
  • Variable Position
  • Length

In SAS, programmers typically use the PROC CONTENTS procedure to extract this dataset-level and variable-level metadata. The call looks something like this:

proc contents data = mycas.cars;
run;

If you want to read this information using Python, you have a couple of options, but here we will focus on an excellent Python library called pyreadstat. First, a quick introduction.

About pyreadstat

A Python package to read SAS, SPSS and Stata files into pandas data frames. It is a wrapper for the C library ReadStat.

Let's follow a few steps to understand how pyreadstat can be used to read SAS dataset metadata in a way that the output looks similar to the SAS Proc Contents output.

Step 1.

Install pyreadstat on your computer, if you haven't already.

pip install pyreadstat pandas

Step 2.

Start a new program metadata.py and import this library into the program.

import pyreadstat as prs
import pandas as pd

Step 3.

Call the read_sas7bdat() function from the pyreadstat module (imported here as prs). It reads the SAS dataset and returns two objects: the data as a pandas dataframe, and a metadata object.

ae001, ae001_meta = prs.read_sas7bdat("c:/ae001.sas7bdat")
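
If you only need the metadata, pyreadstat can skip loading the rows: read_sas7bdat() accepts a metadataonly=True argument. Here is a minimal sketch that reads just the metadata and collects the dataset-level attributes, roughly the header part of a PROC CONTENTS report. The helper names read_metadata_only and dataset_summary are made up for this illustration, and some attributes (number_rows, for example) may come back as None for certain files.

```python
def read_metadata_only(path):
    """Return only the metadata object for a .sas7bdat file."""
    # Imported inside the function so dataset_summary() below can be
    # used on its own, without pyreadstat being loaded.
    import pyreadstat as prs

    # metadataonly=True skips reading the data rows; the returned
    # dataframe is empty, so it is discarded here.
    _, meta = prs.read_sas7bdat(path, metadataonly=True)
    return meta


def dataset_summary(meta):
    """Collect dataset-level attributes from a pyreadstat metadata object."""
    return {
        "table_name": meta.table_name,
        "file_label": meta.file_label,
        "encoding": meta.file_encoding,
        "rows": meta.number_rows,        # may be None for some files
        "columns": meta.number_columns,
        "created": meta.creation_time,
        "modified": meta.modification_time,
    }
```

For example, dataset_summary(read_metadata_only("c:/ae001.sas7bdat")) would return a plain dictionary that you can print or turn into a one-row dataframe.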

Step 4.

Initialise an empty pandas dataframe and assign each piece of information as below.

# initialise an empty pandas dataframe
df_metadata = pd.DataFrame()

# read column names and labels from the metadata object returned in Step 3
df_metadata["name"] = ae001_meta.column_names
df_metadata["label"] = ae001_meta.column_labels

Step 5.

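Follow the same approach to read the remaining information. Note that these three attributes are dictionaries keyed by variable name rather than lists, so align them with the name column (for example with df_metadata["name"].map(...)):

  • format << ae001_meta.original_variable_types
  • type << ae001_meta.readstat_variable_types
  • length << ae001_meta.variable_storage_width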
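
Steps 4 and 5 can be pulled together into one helper. The sketch below is one possible way to do it, not the only one. It assumes the metadata object is the second value returned by read_sas7bdat(), that the type, format and length attributes are dictionaries keyed by variable name, and that SAS types are reported as "double" or "string". The function name build_contents_table, the position column and the num/char relabelling are choices made for this illustration.

```python
import pandas as pd


def build_contents_table(meta):
    """Build a PROC CONTENTS-style table with one row per variable."""
    # Names and labels are parallel lists, so they can be used directly.
    df = pd.DataFrame({"name": meta.column_names, "label": meta.column_labels})

    # These attributes are dicts keyed by variable name, so map() lines
    # each value up with the matching row of the "name" column.
    df["type"] = df["name"].map(meta.readstat_variable_types)
    df["format"] = df["name"].map(meta.original_variable_types)
    df["length"] = df["name"].map(meta.variable_storage_width)

    # Relabel the storage types the way PROC CONTENTS shows them
    # (assumption: SAS files only report "double" and "string").
    df["type"] = df["type"].replace({"double": "num", "string": "char"})

    # Variable position is the 1-based order of the columns in the file.
    df.insert(0, "position", list(range(1, len(df) + 1)))
    return df
```

With the objects from Step 3, calling build_contents_table(ae001_meta) returns a dataframe with the columns position, name, label, type, format and length, which you can print or export with to_csv().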

This website gives very good detail about the functions and parameters available in pyreadstat to read SAS datasets.

Hope you found this article useful. Happy coding!
