Provide user facing operators for doing the split part of the
split-apply-combine paradigm.
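As a rough illustration of what "split" means in split-apply-combine, the sketch below uses only the standard library, not pandas. The `rows` data mirrors the Animal/Speed table used in the `Grouper` examples; the variable names are illustrative assumptions and do not reflect how pandas implements grouping internally.

```python
from collections import defaultdict
from statistics import mean

# Illustrative data: (Animal, Speed) pairs, as in the Grouper docstring example.
rows = [
    ("Falcon", 100),
    ("Parrot", 5),
    ("Falcon", 200),
    ("Falcon", 300),
    ("Parrot", 15),
]

# Split: bucket each value under its grouping key.
groups = defaultdict(list)
for animal, speed in rows:
    groups[animal].append(speed)

# Apply and combine: reduce each bucket, then collect the results by key.
result = {animal: mean(speeds) for animal, speeds in groups.items()}
print(result)
```

A `Grouper` only describes the split step (which key, level, or frequency to bucket by); the reduction is chosen afterwards on the grouped object.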
� )�Dict�Hashable�List�Optional�TupleN)�
FrameOrSeries)�InvalidIndexError)�cache_readonly)�is_categorical_dtype�is_datetime64_dtype�is_list_like� is_scalar�is_timedelta64_dtype)� ABCSeries)�Categorical�ExtensionArray)� DataFrame)�ops)�recode_for_groupby�recode_from_groupby)�CategoricalIndex�Index�
MultiIndex)�Series)�pprint_thingc s� e Zd ZU dZdZeedf � fdd �Zddd�Ze dd� �Z
ded�dd�Zd e
ed�dd�Ze dd� �Zed�dd�Z� ZS )!�Groupera
A Grouper allows the user to specify a groupby instruction for an object.
This specification will select a column via the key parameter, or if the
level and/or axis parameters are given, a level of the index of the target
object.
If `axis` and/or `level` are passed as keywords to both `Grouper` and
`groupby`, the values passed to `Grouper` take precedence.
Parameters
----------
key : str, defaults to None
Groupby key, which selects the grouping column of the target.
level : name/number, defaults to None
The level for the target index.
freq : str / frequency object, defaults to None
This will group by the specified frequency if the target selection
(via key or level) is a datetime-like object. For full specification
of available frequencies, please see `here
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_.
axis : str, int, defaults to 0
Number/name of the axis.
sort : bool, default False
Whether to sort the resulting labels.
closed : {'left', 'right'}
Closed end of interval. Only when `freq` parameter is passed.
label : {'left', 'right'}
Interval boundary to use for labeling.
Only when `freq` parameter is passed.
convention : {'start', 'end', 'e', 's'}
Only used when the grouper is a PeriodIndex and the `freq` parameter is passed.
base : int, default 0
Only when `freq` parameter is passed.
For frequencies that evenly subdivide 1 day, the "origin" of the
aggregated intervals. For example, for '5min' frequency, base could
range from 0 through 4. Defaults to 0.
.. deprecated:: 1.1.0
The new arguments that you should use are 'offset' or 'origin'.
loffset : str, DateOffset, timedelta object
Only when `freq` parameter is passed.
.. deprecated:: 1.1.0
loffset only works for ``.resample(...)`` and not for
Grouper (:issue:`28302`).
However, loffset is also deprecated for ``.resample(...)``.
See: :class:`DataFrame.resample`
origin : {'epoch', 'start', 'start_day'}, Timestamp or str, default 'start_day'
The timestamp on which to adjust the grouping. The timezone of origin must
match the timezone of the index.
If a timestamp is not used, these values are also supported:
- 'epoch': `origin` is 1970-01-01
- 'start': `origin` is the first value of the timeseries
- 'start_day': `origin` is the first day at midnight of the timeseries
.. versionadded:: 1.1.0
offset : Timedelta or str, default is None
An offset timedelta added to the origin.
.. versionadded:: 1.1.0
Returns
-------
A specification for a groupby instruction
Examples
--------
Syntactic sugar for ``df.groupby('Animal')``
>>> df = pd.DataFrame(
... {
... "Animal": ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"],
... "Speed": [100, 5, 200, 300, 15],
... }
... )
>>> df
Animal Speed
0 Falcon 100
1 Parrot 5
2 Falcon 200
3 Falcon 300
4 Parrot 15
>>> df.groupby(pd.Grouper(key="Animal")).mean()
Speed
Animal
Falcon 200.0
Parrot 10.0
Specify a resample operation on the column 'Publish date'
>>> df = pd.DataFrame(
... {
... "Publish date": [
... pd.Timestamp("2000-01-02"),
... pd.Timestamp("2000-01-02"),
... pd.Timestamp("2000-01-09"),
... pd.Timestamp("2000-01-16")
... ],
... "ID": [0, 1, 2, 3],
... "Price": [10, 20, 30, 40]
... }
... )
>>> df
Publish date ID Price
0 2000-01-02 0 10
1 2000-01-02 1 20
2 2000-01-09 2 30
3 2000-01-16 3 40
>>> df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean()
ID Price
Publish date
2000-01-02 0.5 15.0
2000-01-09 2.0 30.0
2000-01-16 3.0 40.0
If you want to adjust the start of the bins based on a fixed timestamp:
>>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00'
>>> rng = pd.date_range(start, end, freq='7min')
>>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng)
>>> ts
2000-10-01 23:30:00 0
2000-10-01 23:37:00 3
2000-10-01 23:44:00 6
2000-10-01 23:51:00 9
2000-10-01 23:58:00 12
2000-10-02 00:05:00 15
2000-10-02 00:12:00 18
2000-10-02 00:19:00 21
2000-10-02 00:26:00 24
Freq: 7T, dtype: int64
>>> ts.groupby(pd.Grouper(freq='17min')).sum()
2000-10-01 23:14:00 0
2000-10-01 23:31:00 9
2000-10-01 23:48:00 21
2000-10-02 00:05:00 54
2000-10-02 00:22:00 24
Freq: 17T, dtype: int64
>>> ts.groupby(pd.Grouper(freq='17min', origin='epoch')).sum()
2000-10-01 23:18:00 0
2000-10-01 23:35:00 18
2000-10-01 23:52:00 27
2000-10-02 00:09:00 39
2000-10-02 00:26:00 24
Freq: 17T, dtype: int64
>>> ts.groupby(pd.Grouper(freq='17min', origin='2000-01-01')).sum()
2000-10-01 23:24:00 3
2000-10-01 23:41:00 15
2000-10-01 23:58:00 45
2000-10-02 00:15:00 45
Freq: 17T, dtype: int64
If you want to adjust the start of the bins with an `offset` Timedelta, the two
following lines are equivalent:
>>> ts.groupby(pd.Grouper(freq='17min', origin='start')).sum()
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17T, dtype: int64
>>> ts.groupby(pd.Grouper(freq='17min', offset='23h30min')).sum()
2000-10-01 23:30:00 9
2000-10-01 23:47:00 21
2000-10-02 00:04:00 54
2000-10-02 00:21:00 24
Freq: 17T, dtype: int64
To replace the deprecated `base` argument, use `offset` instead;
in this example it is equivalent to `base=2`:
>>> ts.groupby(pd.Grouper(freq='17min', offset='2min')).sum()
2000-10-01 23:16:00 0
2000-10-01 23:33:00 9
2000-10-01 23:50:00 36
2000-10-02 00:07:00 39
2000-10-02 00:24:00 24
Freq: 17T, dtype: int64
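The first bin label in each of the `origin` / `offset` examples above can be reproduced with plain datetime arithmetic. This is a minimal sketch, assuming left-closed bins anchored at `origin + offset`. The helper name `bin_start` is hypothetical and this is not the code pandas runs; it only mirrors the arithmetic.

```python
from datetime import datetime, timedelta


def bin_start(ts, freq, origin, offset=timedelta(0)):
    """Left edge of the freq-wide bin containing ts (hypothetical helper)."""
    anchor = origin + offset
    # timedelta // timedelta is floor division and yields an int:
    # the number of whole bins between the anchor and ts.
    steps = (ts - anchor) // freq
    return anchor + steps * freq


first = datetime(2000, 10, 1, 23, 30)  # first timestamp of `ts` in the examples
freq = timedelta(minutes=17)
start_day = datetime(2000, 10, 1)      # midnight of the first day

print(bin_start(first, freq, start_day))                        # 2000-10-01 23:14:00
print(bin_start(first, freq, datetime(1970, 1, 1)))             # 2000-10-01 23:18:00
print(bin_start(first, freq, datetime(2000, 1, 1)))             # 2000-10-01 23:24:00
print(bin_start(first, freq, first))                            # 2000-10-01 23:30:00
print(bin_start(first, freq, start_day, timedelta(minutes=2)))  # 2000-10-01 23:16:00
```

The five results correspond to the default `origin='start_day'`, `origin='epoch'`, `origin='2000-01-01'`, `origin='start'`, and `offset='2min'`. Each one matches the first index label shown in the corresponding output above.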
�key�level�freq�axis�sort.c sz |j d�d k rnddlm} | |kr&dnd}|j dd �d k rJtjdt|d� |j d d �d k rjtjd
t|d� |} t� j| �S )Nr r )�TimeGrouper� � �basez�'base' in .resample() and in Grouper() is deprecated.
The new arguments that you should use are 'offset' or 'origin'.
>>> df.resample(freq="3s", base=2)
becomes:
>>> df.resample(freq="3s", offset="2s")
)�
stacklevelZloffseta 'loffset' in .resample() and in Grouper() is deprecated.
>>> df.resample(freq="3s", loffset="8H")
becomes:
>>> from pandas.tseries.frequencies import to_offset
>>> df = df.resample(freq="3s").mean()
>>> df.index = df.index.to_timestamp() + to_offset("8H")
)�getZpandas.core.resampler! �warnings�warn�
FutureWarning�super�__new__)�cls�args�kwargsr! r% )� __class__� �=/tmp/pip-build-5_djhm0z/pandas/pandas/core/groupby/grouper.pyr+ � s zGrouper.__new__Nr FTc C sF || _ || _|| _|| _|| _d | _d | _d | _d | _d | _ || _
d S )N)r r r r r �grouper�obj�indexer�binner�_grouper�dropna)�selfr r r r r r7 r0 r0 r1 �__init__ s zGrouper.__init__c C s | j S )N)r2 )r8 r0 r0 r1 �ax s z
Grouper.ax)�validatec C sH | j |� t| j| jg| j| j| j|| jd�\| _}| _| j | j| jfS )z�
Parameters
----------
obj : the subject object
validate : boolean, default True
if True, validate the grouper
Returns
-------
a tuple of binner, grouper, obj (possibly sorted)
)r r r r; r7 )
�_set_grouper�get_grouperr3 r r r r r7 r2 r5 )r8 r3 r; �_r0 r0 r1 �_get_grouper" s
zGrouper._get_grouper)r3 r c C sd |dk st �| jdk r(| jdk r(td��| jdkr:| j| _| jdk r�| j}t| jdd�|krvt|t�rv| jj |j
�}n*||jkr�td|� d���t
|| |d�}nl|j| j�}| jdk �r| j}t|t�r�|j|�}t
|j|�|j| d�}n |d|jfk�rtd|� d ���| j�s|�rR|j �rR|jd
d� }| _|j |�}|j || jd�}|| _|| _| jS )
a%
Given an object and the specifications, set up the internal grouper
for this particular specification.
Parameters
----------
obj : Series or DataFrame
sort : bool, default False
whether the resulting grouper should be sorted
Nz2The Grouper cannot specify both a key and a level!�namezThe grouper name z
is not found)r@ r z
The level z
is not validZ mergesort)�kind)r )�AssertionErrorr r �
ValueErrorr6 r2 �getattr�
isinstancer Ztake�indexZ
_info_axis�KeyErrorr � _get_axisr r Z_get_level_numberZ_get_level_values�namesr@ r Zis_monotonicZargsortr4 r3 )r8 r3 r r r: r r4 r0 r0 r1 r< |