3
���h�� � @ s d Z ddlmZ ddlZddlmZmZ ddlZddlm Z ddl
Z
ddlZddlm
Z
mZmZmZmZmZmZmZmZmZ ddlZddlmZ ddlZddlmZ dd lmZ dd
l m!Z!m"Z" ddl#m$Z$ ddl%m&Z&m'Z'm(Z( dd
l)m*Z*m+Z+m,Z,m-Z-m.Z.m/Z/m0Z0m1Z1 ddl2m3Z3 ddl4m5Z5 ddl6m7Z7 ddl8m9Z9m:Z:m;Z;m<Z<m=Z= dZ>dZ?dZ@dZAdZBdZCde?� de@� deA� deB� deC� d�ZDde?� de@� d�ZEde?� de@� deA� deC� d� ZFd d!d"d#d$d%d&d'd(g ZGejd)d*d*�ZHe7d+�d,d-�ZIe7eJe7d.�d/d0�ZKd1ZLG d2d3� d3eM�ZNd4ZOG d5d6� d6eM�ZPd7ZQG d8d9� d9eM�ZRd:ZSG d;d<� d<eM�ZTd=ZUe3e3d>�d?d@�ZVG dAdB� dB�ZWG dCdD� dD�ZXG dEdF� dF�ZYG dGdH� dHeYejZ�Z[e$eD�dqe!e\e\eeJ e\e\eeeJ e\ee] e\ee3e[f dK�dLdM��Z^e!eeJeeJeJf df eee\eeeJeeJeJf f f dN�dOdP�Z_eJeJdQ�dRdS�Z`ee]edT�dUdV�ZaeJejbdW�dXdY�Zceee" edZ�d[d\�Zdejbe7e]d]�d^d_�Zedre7e]e\eJda�dbdc�ZfG ddde� deeY�Zgejbe7e\e]df�dgdh�ZheeJeif e]eidT�didj�ZjG dkdl� dl�ZkG dmdn� dneg�ZlG dodp� dpel�ZmdS )sa�
Module contains tools for processing Stata files into DataFrames
The StataReader below was originally written by Joe Presbrey as part of PyDTA.
It has been extended and improved by Skipper Seabold from the Statsmodels
project who also developed the StataWriter and was finally added to pandas in
a once again improved version.
You can find more information on http://presbrey.mit.edu/PyDTA and
https://www.statsmodels.org/devel/
� )�abcN)�BytesIO�IOBase)�Path)
�Any�AnyStr�BinaryIO�Dict�List�Mapping�Optional�Sequence�Tuple�Union)�
relativedelta)�infer_dtype)�max_len_string_array)�FilePathOrBuffer�Label)�Appender)�
ensure_object�is_categorical_dtype�is_datetime64_dtype)�Categorical�
DatetimeIndex�NaT� Timestamp�concat�isna�to_datetime�to_timedelta)� DataFrame)�Index)�Series)�get_compression_method�get_filepath_or_buffer�
get_handle�infer_compression�stringify_pathz�Version of given Stata file is {version}. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables).z�convert_dates : bool, default True
Convert date variables to DataFrame time values.
convert_categoricals : bool, default True
Read value labels and convert columns to Categorical/Factor variables.a index_col : str, optional
Column to set as index.
convert_missing : bool, default False
Flag indicating whether to convert missing values to their Stata
representations. If False, missing values are replaced with nan.
If True, columns containing missing values are returned with
object data types and missing values are represented by
StataMissingValue objects.
preserve_dtypes : bool, default True
Preserve Stata datatypes. If False, numeric data are upcast to pandas
default types for foreign data (float64 or int64).
columns : list or None
Columns to retain. Columns will be returned in the given order. None
returns all columns.
order_categoricals : bool, default True
Flag indicating whether converted categorical data are ordered.zzchunksize : int, default None
Return StataReader object for iterations, returns chunks with
given number of lines.z=iterator : bool, default False
Return StataReader object.z�Notes
-----
Categorical variables read through an iterator may not have the same
categories and dtype. This occurs when a variable stored in a DTA
file is associated to an incomplete set of value labels that only
label a strict subset of the values.a?
Read Stata file into DataFrame.
Parameters
----------
filepath_or_buffer : str, path object or file-like object
Any valid string path is acceptable. The string could be a URL. Valid
URL schemes include http, ftp, s3, and file. For file URLs, a host is
expected. A local file could be: ``file://localhost/path/to/table.dta``.
If you want to pass in a path object, pandas accepts any ``os.PathLike``.
By file-like object, we refer to objects with a ``read()`` method,
such as a file handle (e.g. via builtin ``open`` function)
or ``StringIO``.
�
z�
Returns
-------
DataFrame or StataReader
See Also
--------
io.stata.StataReader : Low-level reader for Stata data files.
DataFrame.to_stata: Export Stata data files.
z�
Examples
--------
Read a Stata dta file:
>>> df = pd.read_stata('filename.dta')
Read a Stata dta file in 10,000 line chunks:
>>> itr = pd.read_stata('filename.dta', chunksize=10000)
>>> for chunk in itr:
... do_something(chunk)
z�Reads observations from Stata file, converting them into a dataframe
Parameters
----------
nrows : int
Number of lines to read from the data file; if None, read the whole file.
z
Returns
-------
DataFrame
a" Class for reading Stata dta files.
Parameters
----------
path_or_buf : path (string), buffer or path object
string, path object (pathlib.Path or py._path.local.LocalPath) or object
implementing a binary read() function.
.. versionadded:: 0.23.0 support for pathlib, py.path.
z
z%tcz%tCz%tdz%dz%twz%tmz%tqz%thz%tyi� � )�returnc sf t jjt jj ��t jtjddd� j� t jtjddd� j��d d d �� d d d �td���fdd�}td���fd d
�}td�� ���fdd�}tj| �}d
}|j � r�d}t| �}d||< |