Converting datasets from Stata
In practical data science, it's important to know how to convert datasets into the right format and structure. Being able to import data from tool-specific formats like Stata's .dta is especially useful for social science data. Fortunately, that's already available as part of the statsmodels library in Python.
Here's how you do it:
In the terminal, execute the command:
pip install -U statsmodels
This should upgrade you to version 0.5.0 or later. If you already have the latest version of statsmodels, you can skip this step.
In IPython:
import statsmodels.iolib.foreign as smio
from pandas import DataFrame
arr = smio.genfromdta('~/path/to/stata/data.dta')
frame = DataFrame.from_records(arr)
The genfromdta function in statsmodels.iolib.foreign converts a .dta file to a NumPy record array (a NumPy array type with named fields). The last line above shows how to convert the record array into a pandas DataFrame so the data can live happily ever after.
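To see what the record-array-to-DataFrame step does without needing a .dta file on hand, here is a minimal sketch. The array is built by hand as a hypothetical stand-in for what genfromdta returns; the field names (id, income, region) and the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the record array returned by genfromdta.
# Field names and values are invented for this example.
arr = np.array(
    [(1, 23.5, "north"), (2, 31.0, "south"), (3, 27.25, "east")],
    dtype=[("id", "i4"), ("income", "f8"), ("region", "U10")],
)

# Each named field of the record array becomes a DataFrame column.
frame = pd.DataFrame.from_records(arr)

print(frame.columns.tolist())
print(frame)
```

The column names come straight from the array's field names, so whatever variable names the Stata file carries should show up as the DataFrame's columns.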