Current File : //usr/local/lib64/python3.6/site-packages/pandas/core/groupby/__pycache__/grouper.cpython-36.pyc
3

���h�r�
@sRdZddlmZmZmZmZmZddlZddlZ	ddl
mZddlm
Z
ddlmZddlmZmZmZmZmZddlmZddljjZdd	lmZmZddljjZ dd
l!m"Z"ddl#m$Z$ddl%m&Z&m'Z'dd
l(m)Z)m*Z*m+Z+ddl,m-Z-ddl.m/Z/Gdd�d�Z0Gdd�d�Z1d ee2e3e3e3e3e3dd�dd�Z4e3d�dd�Z5e*d�dd�Z6dS)!z]
Provide user facing operators for doing the split part of the
split-apply-combine paradigm.
�)�Dict�Hashable�List�Optional�TupleN)�
FrameOrSeries)�InvalidIndexError)�cache_readonly)�is_categorical_dtype�is_datetime64_dtype�is_list_like�	is_scalar�is_timedelta64_dtype)�	ABCSeries)�Categorical�ExtensionArray)�	DataFrame)�ops)�recode_for_groupby�recode_from_groupby)�CategoricalIndex�Index�
MultiIndex)�Series)�pprint_thingcs�eZdZUdZdZeedf�fdd	�Zddd�Ze	dd��Z
ded�dd�Zd e
ed�dd�Ze	dd��Zed�dd�Z�ZS)!�Groupera
    A Grouper allows the user to specify a groupby instruction for an object.

    This specification will select a column via the key parameter, or if the
    level and/or axis parameters are given, a level of the index of the target
    object.

    If `axis` and/or `level` are passed as keywords to both `Grouper` and
    `groupby`, the values passed to `Grouper` take precedence.

    Parameters
    ----------
    key : str, defaults to None
        Groupby key, which selects the grouping column of the target.
    level : name/number, defaults to None
        The level for the target index.
    freq : str / frequency object, defaults to None
        This will group by the specified frequency if the target selection
        (via key or level) is a datetime-like object. For full specification
        of available frequencies, please see `here
        <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_.
    axis : str, int, defaults to 0
        Number/name of the axis.
    sort : bool, default False
        Whether to sort the resulting labels.
    closed : {'left', 'right'}
        Closed end of interval. Only when `freq` parameter is passed.
    label : {'left', 'right'}
        Interval boundary to use for labeling.
        Only when `freq` parameter is passed.
    convention : {'start', 'end', 'e', 's'}
        If grouper is PeriodIndex and `freq` parameter is passed.
    base : int, default 0
        Only when `freq` parameter is passed.
        For frequencies that evenly subdivide 1 day, the "origin" of the
        aggregated intervals. For example, for '5min' frequency, base could
        range from 0 through 4. Defaults to 0.

        .. deprecated:: 1.1.0
            The new arguments that you should use are 'offset' or 'origin'.

    loffset : str, DateOffset, timedelta object
        Only when `freq` parameter is passed.

        .. deprecated:: 1.1.0
            loffset only works for ``.resample(...)`` and not for
            Grouper (:issue:`28302`).
            However, loffset is also deprecated for ``.resample(...)``.
            See: :class:`DataFrame.resample`

    origin : {'epoch', 'start', 'start_day'}, Timestamp or str, default 'start_day'
        The timestamp on which to adjust the grouping. The timezone of origin must
        match the timezone of the index.
        If a timestamp is not used, these values are also supported:

        - 'epoch': `origin` is 1970-01-01
        - 'start': `origin` is the first value of the timeseries
        - 'start_day': `origin` is the first day at midnight of the timeseries

        .. versionadded:: 1.1.0

    offset : Timedelta or str, default is None
        An offset timedelta added to the origin.

        .. versionadded:: 1.1.0

    Returns
    -------
    A specification for a groupby instruction

    Examples
    --------
    Syntactic sugar for ``df.groupby('A')``

    >>> df = pd.DataFrame(
    ...     {
    ...         "Animal": ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"],
    ...         "Speed": [100, 5, 200, 300, 15],
    ...     }
    ... )
    >>> df
       Animal  Speed
    0  Falcon    100
    1  Parrot      5
    2  Falcon    200
    3  Falcon    300
    4  Parrot     15
    >>> df.groupby(pd.Grouper(key="Animal")).mean()
            Speed
    Animal
    Falcon  200.0
    Parrot   10.0

    Specify a resample operation on the column 'Publish date'

    >>> df = pd.DataFrame(
    ...    {
    ...        "Publish date": [
    ...             pd.Timestamp("2000-01-02"),
    ...             pd.Timestamp("2000-01-02"),
    ...             pd.Timestamp("2000-01-09"),
    ...             pd.Timestamp("2000-01-16")
    ...         ],
    ...         "ID": [0, 1, 2, 3],
    ...         "Price": [10, 20, 30, 40]
    ...     }
    ... )
    >>> df
      Publish date  ID  Price
    0   2000-01-02   0     10
    1   2000-01-02   1     20
    2   2000-01-09   2     30
    3   2000-01-16   3     40
    >>> df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean()
                   ID  Price
    Publish date
    2000-01-02    0.5   15.0
    2000-01-09    2.0   30.0
    2000-01-16    3.0   40.0

    If you want to adjust the start of the bins based on a fixed timestamp:

    >>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
    >>> rng = pd.date_range(start, end, freq='7min')
    >>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
    >>> ts
    2000-10-01 23:30:00     0
    2000-10-01 23:37:00     3
    2000-10-01 23:44:00     6
    2000-10-01 23:51:00     9
    2000-10-01 23:58:00    12
    2000-10-02 00:05:00    15
    2000-10-02 00:12:00    18
    2000-10-02 00:19:00    21
    2000-10-02 00:26:00    24
    Freq: 7T, dtype: int64

    >>> ts.groupby(pd.Grouper(freq='17min')).sum()
    2000-10-01 23:14:00     0
    2000-10-01 23:31:00     9
    2000-10-01 23:48:00    21
    2000-10-02 00:05:00    54
    2000-10-02 00:22:00    24
    Freq: 17T, dtype: int64

    >>> ts.groupby(pd.Grouper(freq='17min', origin='epoch')).sum()
    2000-10-01 23:18:00     0
    2000-10-01 23:35:00    18
    2000-10-01 23:52:00    27
    2000-10-02 00:09:00    39
    2000-10-02 00:26:00    24
    Freq: 17T, dtype: int64

    >>> ts.groupby(pd.Grouper(freq='17min', origin='2000-01-01')).sum()
    2000-10-01 23:24:00     3
    2000-10-01 23:41:00    15
    2000-10-01 23:58:00    45
    2000-10-02 00:15:00    45
    Freq: 17T, dtype: int64

    If you want to adjust the start of the bins with an `offset` Timedelta, the two
    following lines are equivalent:

    >>> ts.groupby(pd.Grouper(freq='17min', origin='start')).sum()
    2000-10-01 23:30:00     9
    2000-10-01 23:47:00    21
    2000-10-02 00:04:00    54
    2000-10-02 00:21:00    24
    Freq: 17T, dtype: int64

    >>> ts.groupby(pd.Grouper(freq='17min', offset='23h30min')).sum()
    2000-10-01 23:30:00     9
    2000-10-01 23:47:00    21
    2000-10-02 00:04:00    54
    2000-10-02 00:21:00    24
    Freq: 17T, dtype: int64

    To replace the use of the deprecated `base` argument, you can now use `offset`.
    In this example, it is equivalent to `base=2`:

    >>> ts.groupby(pd.Grouper(freq='17min', offset='2min')).sum()
    2000-10-01 23:16:00     0
    2000-10-01 23:33:00     9
    2000-10-01 23:50:00    36
    2000-10-02 00:07:00    39
    2000-10-02 00:24:00    24
    Freq: 17T, dtype: int64
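
The way `origin` and `offset` shift the bin edges in the examples above can be checked by hand. The following is a simplified, standard-library-only sketch, not pandas' implementation: it assumes a fixed-width frequency and left-closed, left-labelled bins, and it leaves out empty bins. The helper name `bin_sums` is hypothetical.

```python
from datetime import datetime, timedelta


def bin_sums(stamps, values, freq, origin, offset=timedelta(0)):
    """Sum values into left-closed bins of width `freq` anchored at origin + offset.

    Hypothetical helper for illustration only; it is not part of pandas.
    """
    anchor = origin + offset
    sums = {}
    for stamp, value in zip(stamps, values):
        # Whole number of bins between the anchor and this timestamp (floored).
        steps = (stamp - anchor) // freq
        label = anchor + steps * freq
        sums[label] = sums.get(label, 0) + value
    return dict(sorted(sums.items()))


# Same data as the `ts` series above: 9 points, 7 minutes apart, values 0, 3, ..., 24.
start = datetime(2000, 10, 1, 23, 30)
stamps = [start + timedelta(minutes=7 * i) for i in range(9)]
values = [3 * i for i in range(9)]
freq = timedelta(minutes=17)

# origin='epoch': bins are aligned to 1970-01-01.
epoch_bins = bin_sums(stamps, values, freq, origin=datetime(1970, 1, 1))

# origin='start_day' with offset='2min': bins are aligned to midnight plus 2 minutes.
offset_bins = bin_sums(
    stamps, values, freq, origin=datetime(2000, 10, 1), offset=timedelta(minutes=2)
)

for label, total in epoch_bins.items():
    print(label, total)
```

Under these assumptions the sketch gives the same bin labels and sums as the `origin='epoch'` and `offset='2min'` examples above.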
    �key�level�freq�axis�sort.csz|jd�dk	rnddlm}||kr&dnd}|jdd�dk	rJtjdt|d�|jd	d�dk	rjtjd
t|d�|}t�j|�S)Nrr)�TimeGrouper���basez�'base' in .resample() and in Grouper() is deprecated.
The new arguments that you should use are 'offset' or 'origin'.

>>> df.resample("3s", base=2)

becomes:

>>> df.resample("3s", offset="2s")
)�
stacklevelZloffseta'loffset' in .resample() and in Grouper() is deprecated.

>>> df.resample("3s", loffset="8H")

becomes:

>>> from pandas.tseries.frequencies import to_offset
>>> df = df.resample("3s").mean()
>>> df.index = df.index.to_timestamp() + to_offset("8H")
)�getZpandas.core.resampler!�warnings�warn�
FutureWarning�super�__new__)�cls�args�kwargsr!r%)�	__class__��=/tmp/pip-build-5_djhm0z/pandas/pandas/core/groupby/grouper.pyr+�szGrouper.__new__NrFTcCsF||_||_||_||_||_d|_d|_d|_d|_d|_	||_
dS)N)rrrrr �grouper�obj�indexer�binner�_grouper�dropna)�selfrrrrr r7r0r0r1�__init__szGrouper.__init__cCs|jS)N)r2)r8r0r0r1�axsz
Grouper.ax)�validatec	CsH|j|�t|j|jg|j|j|j||jd�\|_}|_|j	|j|jfS)z�
        Parameters
        ----------
        obj : the subject object
        validate : bool, default True
            If True, validate the grouper.

        Returns
        -------
        A tuple of binner, grouper, obj (possibly sorted).
        )rrr r;r7)
�_set_grouper�get_grouperr3rrrr r7r2r5)r8r3r;�_r0r0r1�_get_grouper"s
zGrouper._get_grouper)r3r cCsd|dk	st�|jdk	r(|jdk	r(td��|jdkr:|j|_|jdk	r�|j}t|jdd�|krvt|t�rv|jj	|j
�}n*||jkr�td|�d���t
|||d�}nl|j|j�}|jdk	�r|j}t|t�r�|j|�}t
|j|�|j|d�}n |d|jfk�rtd|�d	���|j�s|�rR|j�rR|jd
d�}|_|j	|�}|j	||jd�}||_||_|jS)
a%
        Given an object and the specifications, set up the internal grouper
        for this particular specification.

        Parameters
        ----------
        obj : Series or DataFrame
        sort : bool, default False
            Whether the resulting grouper should be sorted.
        Nz2The Grouper cannot specify both a key and a level!�namezThe grouper name z
 is not found)r@rz
The level z
 is not validZ	mergesort)�kind)r)�AssertionErrorrr�
ValueErrorr6r2�getattr�
isinstancerZtake�indexZ
_info_axis�KeyErrorr�	_get_axisrrZ_get_level_numberZ_get_level_values�namesr@r Zis_monotonicZargsortr4r3)r8r3r rr:rr4r0r0r1r<:s8





zGrouper._set_groupercCs|jjS)N)r2�groups)r8r0r0r1rJuszGrouper.groups)�returncs8�fdd��jD�}dj|�}t��j}|�d|�d�S)Nc3s4|],}t�|�dk	r|�dtt�|����VqdS)N�=)rD�repr)�.0�	attr_name)r8r0r1�	<genexpr>{sz#Grouper.__repr__.<locals>.<genexpr>z, �(�))�_attributes�join�type�__name__)r8Z
attrs_list�attrsZcls_namer0)r8r1�__repr__ys




zGrouper.__repr__)rrrrr )NNNrFT)T)F)rV�
__module__�__qualname__�__doc__rSr�strr+r9�propertyr:�boolr?rr<rJrX�
__classcell__r0r0)r/r1r#s
=,
;rc@s�eZdZUdZdeeeeeeed�dd�Ze	d�d	d
�Z
dd�ZdZee
jdZeeeed�d
d��Zedd��Zee
jd�dd��Zeed�dd��Zeed�dd��Zdd�dd�Zeeee
jfd�dd��ZdS)�GroupingaN
    Holds the grouping information for a single key

    Parameters
    ----------
    index : Index
    grouper :
    obj : Union[DataFrame, Series]
    name : Label
    level :
    observed : bool, default False
        If we are a Categorical, use the observed values
    in_axis : bool
        Whether the Grouping is a column in self.obj and hence in the
        Groupby.exclusions list.

    Returns
    -------
    **Attributes**:
      * indices : dict of {group -> index_list}
      * codes : ndarray, group codes
      * group_index : unique groups
      * groups : dict of {group -> label_list}
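
The relationship between the four attributes listed above can be shown with a small pure-Python analogue. This is only a sketch of the idea, assuming sorted unique groups and integer codes. It does not use pandas, and the function name `describe_grouping` is hypothetical.

```python
def describe_grouping(keys, labels):
    """Return (group_index, codes, indices, groups) for a list of group keys.

    Hypothetical helper for illustration only; it is not part of pandas.
    """
    # group_index: the unique groups, here in sorted order.
    group_index = sorted(set(keys))
    # codes: for each row, the position of its group in group_index.
    position = {group: code for code, group in enumerate(group_index)}
    codes = [position[key] for key in keys]
    # indices: group -> integer row positions belonging to that group.
    indices = {
        group: [i for i, key in enumerate(keys) if key == group]
        for group in group_index
    }
    # groups: group -> index labels of the rows belonging to that group.
    groups = {group: [labels[i] for i in rows] for group, rows in indices.items()}
    return group_index, codes, indices, groups


keys = ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"]
labels = ["a", "b", "c", "d", "e"]  # assumed index labels, chosen to differ from positions
group_index, codes, indices, groups = describe_grouping(keys, labels)

print(group_index)  # ['Falcon', 'Parrot']
print(codes)        # [0, 1, 0, 0, 1]
print(indices)      # {'Falcon': [0, 2, 3], 'Parrot': [1, 4]}
print(groups)       # {'Falcon': ['a', 'c', 'd'], 'Parrot': ['b', 'e']}
```

Non-numeric index labels are used so that `indices` (row positions) and `groups` (index labels) are visibly different.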
    NTF)rFr3r �observed�in_axisr7c
Cs8||_||_t||�|_d|_||_||_||_||_||_	|	|_
t|tt
f�r`|dkr`|j|_t|t�rr|j|_|dk	r�t|t�s�||jkr�td|�d���|jj|�}|jdkr�|j||_|j|j|�\|_|_|_�nt|jt��r2|jj|jdd�\}
}}
|jdk�r|jj|_|jj|_|j�|_�n�|jdk�rf|jdk	�rf|jdk	�rf|j|j|_n�t|jttf��r�tj|j�|_n�t|j��r(t|j|j|�\|_|_|jj}|jj |_|�r�t!j"|jj �}||dk}|�s�|jj#�rt$j|�}nt$j%t&|��}t't(j)|||jj#d�|jd�|_t|jt*��rB|jj|_n�t|jtt
t+t$j,f��s�t-|jdd�dk�r�|j�p�t.t/|j��}
t0d	|
�d
���|jj1|j�|_t2|jd��o�t&|j�t&|j�k�s�t3|j�}d|��}d|_t|��t-|jd
d�dk	�r4t4|j��r|jj5d�|_nt6|j��r4|jj5d�|_dS)NzLevel z
 not in indexF)r;�)�codes�
categories�ordered)r@�ndimz
Grouper for 'z' not 1-dimensional�__len__z9Grouper result violates len(labels) == len(data)
result: �dtypezdatetime64[ns]ztimedelta64[ns]���)7r@r�_convert_grouperr2�all_grouperrFr r3rarbr7rErrr�_values�intrIrBZ_get_grouper_for_level�_codes�_group_indexrr?�result_index�list�tuple�com�asarray_tuplesafer
rrerd�
algorithmsZunique1drf�npZarange�lenrr�
from_codesr`r�ndarrayrDr\rUrC�map�hasattrrrZastyper)r8rFr2r3r@rr rarbr7r>rerd�tZgrper�errmsgr0r0r1r9�s�





$


zGrouping.__init__)rKcCsd|j�d�S)Nz	Grouping(rR)r@)r8r0r0r1rXszGrouping.__repr__cCs
t|j�S)N)�iter�indices)r8r0r0r1�__iter__szGrouping.__iter__cCs
t|j�S)N)rx�group_index)r8r0r0r1�ngroups%szGrouping.ngroupscsFt|jtj�r|jjStj|j|jd�\�}�fdd�tt	|��D�S)N)r cs i|]\}}tj�|k�|�qSr0)rwZflatnonzero)rN�i�category)rdr0r1�
<dictcomp>2sz$Grouping.indices.<locals>.<dictcomp>)
rEr2r�BaseGrouperr�rv�	factorizer �	enumerater)r8�uniquesr0)rdr1r�)s

zGrouping.indicescCs|jdkr|j�|jS)N)ro�_make_codes)r8r0r0r1rd7s
zGrouping.codescCs"|jdk	rt|j|j|j�S|jS)N)rlrr r�)r8r0r0r1rq=s
zGrouping.result_indexcCs&|jdkr|j�|jdk	s t�|jS)N)rpr�rB)r8r0r0r1r�Cs
zGrouping.group_indexcCsz|jdks|jdkrvt|jtj�r4|jj}|jj}n6|js@d}nd}t	j
|j|j|d�\}}t||j
d�}||_||_dS)Nrc)r �na_sentinel)r@rj)rorprEr2rr�Z
codes_inforqr7rvr�r rr@)r8rdr�r�r0r0r1r�Js
zGrouping._make_codescCs|jjtj|j|j��S)N)rF�groupbyrryrdr�)r8r0r0r1rJ]szGrouping.groups)NNNNTFFT)rVrYrZr[rrrr^r9r\rXr�rorwrzrpr]rnr�r	r�rdrqr�r�rrrJr0r0r0r1r`�s2
ur`TFz5Tuple[ops.BaseGrouper, List[Hashable], FrameOrSeries])r3rr ra�mutatedr;r7rKc	sn�j|�}	|dk	r�t|	t�rXt|�r8t|�dkr8|d}|dkr�t|�r�|	j|�}d}n�t|�r�t|�}
|
dkrz|d}n|
dkr�td��ntd��t|t�rʈj|�j	|kr�td|�d�j
|�����n|dks�|dkr�td��d}|	}t|t��r0|j�d	d
�\}}�|j
dk�r |g�fS||j
g�fSnt|tj��rH|g�fSt|t��s`|g}
d	}n|}
t|
�t|	�k}tdd�|
D��}td
d�|
D��}tdd�|
D��}|�r.|�r.|�r.|�r.|dk�r.t�t��r�t�fdd�|
D��}n&t�t��st�t�fdd�|
D��}|�s.tj|
�g}
t|ttf��r\|dk�rVdgt|�}
|}n|gt|
�}g}g}td��fdd�}td��fdd�}�xdtt|
|��D�]P\}\}}||��r�d|j	}}|j|�n�||��rP|�k�r |�r�j||d�d|�|}}}|j|�n.�j||d��rFd	d|df\}}}}nt|��n6t|t��r~|j
dk	�r~|j|j
�d \}}nd!\}}t |��r�t|��j!|k�r�tdt|��d�j!|�d���t|t"��s�t"|	|�||||||d�	n|}|j|��q�Wt|�dk�r t���r td��n2t|�dk�rR|jt"t#gdd�t$j%gt$j&d���tj|	|||d�}||�fS)"a�
    Create and return a BaseGrouper, which is an internal
    mapping of how to create the grouper indexers.
    This may be composed of multiple Grouping objects, indicating
    multiple groupers.

    Groupers are ultimately index mappings. They can originate as:
    index mappings, keys to columns, functions, or Groupers

    Groupers enable local references to axis, level, and sort, while
    the passed-in axis, level, and sort are 'global'.

    This routine tries to figure out what the passed-in references
    are and then creates a Grouping for each one, combined into
    a BaseGrouper.

    If observed and we have a categorical grouper, only show the observed
    values.

    If validate, then check for key/level overlaps.

    NrcrzNo group keys passed!z*multiple levels only valid with MultiIndexzlevel name z is not the name of the z2level > 0 or level < -1 only valid with MultiIndexF)r;css |]}t|�pt|t�VqdS)N)�callablerE�dict)rN�gr0r0r1rP�szget_grouper.<locals>.<genexpr>css|]}t|t�VqdS)N)rEr)rNr�r0r0r1rP�scss$|]}t|tttttjf�VqdS)N)rErrrsrrrwrz)rNr�r0r0r1rP�sc3s$|]}|�jkp|�jjkVqdS)N)�columnsrFrI)rNr�)r3r0r1rP�sc3s|]}|�jjkVqdS)N)rFrI)rNr�)r3r0r1rP�s)rKc
s@t|�s<�jd}y|j|�Wntttfk
r:dSXdS)NrcFTrj)�_is_label_likeZaxesZget_locrG�	TypeErrorr)r�items)r3r0r1�
is_in_axis�s
zget_grouper.<locals>.is_in_axisc
s<t|d�sdSy|�|jkStttfk
r6dSXdS)Nr@F)r|r@rG�
IndexErrorrC)�gpr)r3r0r1�	is_in_obj�s
zget_grouper.<locals>.is_in_objT)rzLength of grouper (z) and axis (z) must be same length)r3r@rr rarbr7rn)ri)r r�rj)FN)FN)'rHrErrrxr
Zget_level_valuesrCr\r@Z_get_axis_namerr?rrr�rr�anyr�allrrBrtrursr^r��zip�appendZ_check_label_or_level_ambiguityZ_is_level_referencerGr
�shaper`rrw�arrayZintp)r3rrrr rar�r;r7Z
group_axisZnlevelsr5r2�keysZmatch_axis_lengthZany_callableZany_groupersZ
any_arraylikeZall_in_columns_indexZlevelsZ	groupingsZ
exclusionsr�r�r�r�rbr@Zpingr0)r3r1r=bs�!
	










 




&
$r=)rKcCst|ttf�p|dk	ot|�S)N)rEr\rsr
)�valr0r0r1r�>sr�)rcCsrt|t�r|jSt|t�r:|jj|�r,|jS|j|�jSn4t|ttt	t
jf�rjt|�t|�krft
d��|S|SdS)Nz$Grouper and axis must be same length)rEr�r&rrF�equalsrmZreindexrrrrwrzrxrC)rr2r0r0r1rkBs

rk)NrNTFFTT)7r[Ztypingrrrrrr'ZnumpyrwZpandas._typingrZ
pandas.errorsrZpandas.util._decoratorsr	Zpandas.core.dtypes.commonr
rrr
rZpandas.core.dtypes.genericrZpandas.core.algorithms�corervZpandas.core.arraysrrZpandas.core.common�commonrtZpandas.core.framerZpandas.core.groupbyrZpandas.core.groupby.categoricalrrZpandas.core.indexes.apirrrZpandas.core.seriesrZpandas.io.formats.printingrrr`rnr^r=r�rkr0r0r0r1�<module>sDcaS