3
���h� � 1 @ s
U d Z ddlmZmZ ddlZddlZddlmZmZ ddl Z ddl
Z
ddlZddlm
Z
ddlmZmZmZmZmZmZmZ ddlZddlZddljjZddljjZddljj Z ddlm!Z! ddl"m#Z# dd l$m%Z%m&Z& dd
l'm(Z(m)Z)m*Z*m+Z+ ddl,m-Z- ddl.m/Z/ dd
l0m1Z1m2Z2m3Z3m4Z4m5Z5m6Z6m7Z7m8Z8m9Z9m:Z:m;Z;m<Z<m=Z=m>Z>m?Z?m@Z@ ddlAmBZB ddlCmDZD ddlEmFZF ddlGmHZH ddlImJZJ ddlKmLZLmMZMmNZNmOZO ddlPmQZQ ddlRmSZT ddlUmVZVmWZWmXZXmYZY ddlZm[Z[ dZ\de
dj]e^e!��ddd� d Z_dsdd �Z`d!d"� Zae%d#�d$d%�Zbddd&ejcd'd(dd)dddddddd'ddddd'ddd*d(d(d(dddd(dd(dd'd(d'd+�%Zdd(d'd'd(d'd'dd,�Zed)d-dd.�Zfd/hZgd0d1hZhi Zieejef iek� Zleej le-e_jmd2d3d4d5��d6dd)dddd(dd'dddddd(ddddd'd'd(d'd(d(d(dd(d'd(dd)dd*dd&ejcd'ddddd'd'd(eed0 d(df0e%ejd7�d8d2��Zne-e_jmd9d:d;d5��d<dd)dddd(dd'dddddd(ddddd'd'd(d'd(d(d(dd(d'd(dd)dd*dd&ejcd'ddddd'd'd(eed0 d(df0e%ejd7�d=d9��Zodte%d#�d>d?�ZpG d@dA� dAejq�ZrdBdC� Zsduee&eteeu f dD�dEdF�ZvdGdH� ZwdIdJ� ZxdKdL� ZydMdN� ZzdOdP� Z{G dQdR� dR�Z|G dSdT� dTe|�Z}dUdV� Z~dWdX� ZG dYdZ� dZe|�Z�dvd[d\�Z�dwd]d^�Z�d_d`� Z�dxdadb�Z�dcdd� Z�dydedf�Z�dgdh� Z�didj� Z�dkdl� Z�dmdn� Z�G dodp� dpejq�Z�G dqdr� dre��Z�dS )zzM
Module contains tools for processing files into DataFrames or other objects
� )�abc�defaultdictN)�StringIO�
TextIOWrapper)�fill)�Any�Dict�Iterable�List�Optional�Sequence�Set)�
STR_NA_VALUES)�parsing)�FilePathOrBuffer�Union)�AbstractMethodError�EmptyDataError�ParserError�
ParserWarning)�Appender)�astype_nansafe)�
ensure_object�
ensure_str�
is_bool_dtype�is_categorical_dtype�is_dict_like�is_dtype_equal�is_extension_array_dtype�is_file_like�is_float�
is_integer�is_integer_dtype�is_list_like�is_object_dtype� is_scalar�is_string_dtype�pandas_dtype)�CategoricalDtype)�isna)�
algorithms)�Categorical)� DataFrame)�Index�
MultiIndex�
RangeIndex�ensure_index_from_sequences)�Series)� datetimes)�get_filepath_or_buffer�
get_handle�infer_compression�validate_header_arg)�generic_parseru a�
{summary}
Also supports optionally iterating or breaking of the file
into chunks.
Additional help can be found in the online docs for
`IO Tools <https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html>`_.
Parameters
----------
filepath_or_buffer : str, path object or file-like object
Any valid string path is acceptable. The string could be a URL. Valid
URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is
expected. A local file could be: file://localhost/path/to/table.csv.
If you want to pass in a path object, pandas accepts any ``os.PathLike``.
By file-like object, we refer to objects with a ``read()`` method, such as
a file handler (e.g. via builtin ``open`` function) or ``StringIO``.
sep : str, default {_default_sep}
Delimiter to use. If sep is None, the C engine cannot automatically detect
the separator, but the Python parsing engine can, meaning the latter will
be used and automatically detect the separator by Python's builtin sniffer
tool, ``csv.Sniffer``. In addition, separators longer than 1 character and
different from ``'\s+'`` will be interpreted as regular expressions and
will also force the use of the Python parsing engine. Note that regex
delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``.
delimiter : str, default ``None``
Alias for sep.
header : int, list of int, default 'infer'
Row number(s) to use as the column names, and the start of the
data. Default behavior is to infer the column names: if no names
are passed the behavior is identical to ``header=0`` and column
names are inferred from the first line of the file, if column
names are passed explicitly then the behavior is identical to
``header=None``. Explicitly pass ``header=0`` to be able to
replace existing names. The header can be a list of integers that
specify row locations for a multi-index on the columns
e.g. [0,1,3]. Intervening rows that are not specified will be
skipped (e.g. 2 in this example is skipped). Note that this
parameter ignores commented lines and empty lines if
``skip_blank_lines=True``, so ``header=0`` denotes the first line of
data rather than the first line of the file.
names : array-like, optional
List of column names to use. If the file contains a header row,
then you should explicitly pass ``header=0`` to override the column names.
Duplicates in this list are not allowed.
index_col : int, str, sequence of int / str, or False, default ``None``
Column(s) to use as the row labels of the ``DataFrame``, either given as
string name or column index. If a sequence of int / str is given, a
MultiIndex is used.
Note: ``index_col=False`` can be used to force pandas to *not* use the first
column as the index, e.g. when you have a malformed file with delimiters at
the end of each line.
usecols : list-like or callable, optional
Return a subset of the columns. If list-like, all elements must either
be positional (i.e. integer indices into the document columns) or strings
that correspond to column names provided either by the user in `names` or
inferred from the document header row(s). For example, a valid list-like
`usecols` parameter would be ``[0, 1, 2]`` or ``['foo', 'bar', 'baz']``.
Element order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``.
To instantiate a DataFrame from ``data`` with element order preserved use
``pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]`` for columns
in ``['foo', 'bar']`` order or
``pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]``
for ``['bar', 'foo']`` order.
If callable, the callable function will be evaluated against the column
names, returning names where the callable function evaluates to True. An
example of a valid callable argument would be ``lambda x: x.upper() in
['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster
parsing time and lower memory usage.
squeeze : bool, default False
If the parsed data only contains one column then return a Series.
prefix : str, optional
Prefix to add to column numbers when no header, e.g. 'X' for X0, X1, ...
mangle_dupe_cols : bool, default True
Duplicate columns will be specified as 'X', 'X.1', ...'X.N', rather than
'X'...'X'. Passing in False will cause data to be overwritten if there
are duplicate names in the columns.
dtype : Type name or dict of column -> type, optional
Data type for data or columns. E.g. {{'a': np.float64, 'b': np.int32,
'c': 'Int64'}}
Use `str` or `object` together with suitable `na_values` settings
to preserve and not interpret dtype.
If converters are specified, they will be applied INSTEAD
of dtype conversion.
engine : {{'c', 'python'}}, optional
Parser engine to use. The C engine is faster while the python engine is
currently more feature-complete.
converters : dict, optional
Dict of functions for converting values in certain columns. Keys can either
be integers or column labels.
true_values : list, optional
Values to consider as True.
false_values : list, optional
Values to consider as False.
skipinitialspace : bool, default False
Skip spaces after delimiter.
skiprows : list-like, int or callable, optional
Line numbers to skip (0-indexed) or number of lines to skip (int)
at the start of the file.
If callable, the callable function will be evaluated against the row
indices, returning True if the row should be skipped and False otherwise.
An example of a valid callable argument would be ``lambda x: x in [0, 2]``.
skipfooter : int, default 0
Number of lines at bottom of file to skip (Unsupported with engine='c').
nrows : int, optional
Number of rows of file to read. Useful for reading pieces of large files.
na_values : scalar, str, list-like, or dict, optional
Additional strings to recognize as NA/NaN. If dict passed, specific
per-column NA values. By default the following values are interpreted as
NaN: 'z', '�F z )�subsequent_indenta" |