3
���h�i � @ s� d Z ddlmZ ddlZddlmZmZ ddlZddlZddl Z ddl
mZmZm
Z
mZmZmZmZmZmZmZmZmZmZmZ ddlZddlmZ ddlmZ ddlj j!Z"ddl#m$Z$m%Z%m&Z&m'Z' dd l(m)Z* dd
l+m,Z, ddl-m.Z.m/Z/m0Z0m1Z1 ddl2m3Z3 dd
l4m5Z5m6Z6m7Z7m8Z8m9Z9m:Z:m;Z;m<Z< ddl=m>Z>m?Z? ddl@mAZA ddlBjCjDZDddlEmFZFmGZG ddlHmIZImJZJmKZK ddlLjCjMZNddlOmPZP ddlQmRZR ddlSmTZTmUZU ddlVmWZWmXZXmYZY ddlZm[Z[ ddl\m]Z] ddl^m_Z_ dZ`eadddd�ZbdZcdZdd Zed!ZfG d"d#� d#eJ�Zged$d%� �Zheeee eegef eeegef eeef f ZiG d&d'� d'eJeKee% �Zjed(eRd)�ZkG d*d+� d+eje% �Zle1el�d2eReei emd.eneneneneneneneld/�d0d1��ZodS )3a
Provide the groupby split-apply-combine paradigm. Define the GroupBy
class providing the base-class of operations.
The SeriesGroupBy and DataFrameGroupBy sub-class
(defined in pandas.core.groupby.generic)
expose these user-facing objects to provide specific functionality.
� )�contextmanagerN)�partial�wraps)�Callable�Dict� FrozenSet�Generic�Hashable�Iterable�List�Mapping�Optional�Sequence�Tuple�Type�TypeVar�Union)�option_context)� Timestamp)�F�
FrameOrSeries�FrameOrSeriesUnion�Scalar)�function)�AbstractMethodError)�Appender�Substitution�cache_readonly�doc)�maybe_cast_result)�ensure_float�
is_bool_dtype�is_datetime64_dtype�is_extension_array_dtype�is_integer_dtype�is_numeric_dtype�is_object_dtype� is_scalar)�isna�notna)�nanops)�Categorical�
DatetimeArray)� DataError�PandasObject�SelectionMixin)� DataFrame)�NDFrame)�base�ops)�CategoricalIndex�Index�
MultiIndex)�Series)�get_group_index_sorter)�maybe_use_numbazV
See Also
--------
Series.%(name)s
DataFrame.%(name)s
a�
Apply function `func` group-wise and combine the results together.
The function passed to `apply` must take a {input} as its first
argument and return a DataFrame, Series or scalar. `apply` will
then take care of combining the results back together into a single
dataframe or series. `apply` is therefore a highly flexible
grouping method.
While `apply` is a very flexible method, its downside is that
using it can be quite a bit slower than using more specific methods
like `agg` or `transform`. Pandas offers a wide range of method that will
be much faster than using `apply` for their specific purposes, so try to
use them before reaching for `apply`.
Parameters
----------
func : callable
A callable that takes a {input} as its first argument, and
returns a dataframe, a series or a scalar. In addition the
callable may take positional and keyword arguments.
args, kwargs : tuple and dict
Optional positional and keyword arguments to pass to `func`.
Returns
-------
applied : Series or DataFrame
See Also
--------
pipe : Apply function to the full GroupBy object instead of to each
group.
aggregate : Apply aggregate function to the GroupBy object.
transform : Apply function column-by-column to the GroupBy object.
Series.apply : Apply a function to a Series.
DataFrame.apply : Apply a function to each row or column of a DataFrame.
a�
>>> df = pd.DataFrame({'A': 'a a b'.split(),
'B': [1,2,3],
'C': [4,6, 5]})
>>> g = df.groupby('A')
Notice that ``g`` has two groups, ``a`` and ``b``.
Calling `apply` in various ways, we can get different grouping results:
Example 1: below the function passed to `apply` takes a DataFrame as
its argument and returns a DataFrame. `apply` combines the result for
each group together into a new DataFrame:
>>> g[['B', 'C']].apply(lambda x: x / x.sum())
B C
0 0.333333 0.4
1 0.666667 0.6
2 1.000000 1.0
Example 2: The function passed to `apply` takes a DataFrame as
its argument and returns a Series. `apply` combines the result for
each group together into a new DataFrame:
>>> g[['B', 'C']].apply(lambda x: x.max() - x.min())
B C
A
a 1 2
b 0 0
Example 3: The function passed to `apply` takes a DataFrame as
its argument and returns a scalar. `apply` combines the result for
each group together into a Series, including setting the index as
appropriate:
>>> g.apply(lambda x: x.C.max() - x.B.min())
A
a 5
b 2
dtype: int64
a�
>>> s = pd.Series([0, 1, 2], index='a a b'.split())
>>> g = s.groupby(s.index)
From ``s`` above we can see that ``g`` has two groups, ``a`` and ``b``.
Calling `apply` in various ways, we can get different grouping results:
Example 1: The function passed to `apply` takes a Series as
its argument and returns a Series. `apply` combines the result for
each group together into a new Series:
>>> g.apply(lambda x: x*2 if x.name == 'b' else x/2)
0 0.0
1 0.5
2 4.0
dtype: float64
Example 2: The function passed to `apply` takes a Series as
its argument and returns a scalar. `apply` combines the result for
each group together into a Series, including setting the index as
appropriate:
>>> g.apply(lambda x: x.max() - x.min())
a 1
b 0
dtype: int64
Notes
-----
In the current implementation `apply` calls `func` twice on the
first group to decide whether it can take a fast or slow code
path. This can lead to unexpected behavior if `func` has
side-effects, as they will take effect twice for the first
group.
Examples
--------
{examples}
)�template�dataframe_examplesZseries_examplesa�
Compute {fname} of group values.
Parameters
----------
numeric_only : bool, default {no}
Include only float, int, boolean columns. If None, will attempt to use
everything, then use only numeric data.
min_count : int, default {mc}
The required number of valid values to perform the operation. If fewer
than ``min_count`` non-NA values are present the result will be NA.
Returns
-------
Series or DataFrame
Computed {fname} of values within each group.
as
Apply a function `func` with arguments to this %(klass)s object and return
the function's result.
%(versionadded)s
Use `.pipe` when you want to improve readability by chaining together
functions that expect Series, DataFrames, GroupBy or Resampler objects.
Instead of writing
>>> h(g(f(df.groupby('group')), arg1=a), arg2=b, arg3=c) # doctest: +SKIP
You can write
>>> (df.groupby('group')
... .pipe(f)
... .pipe(g, arg1=a)
... .pipe(h, arg2=b, arg3=c)) # doctest: +SKIP
which is much more readable.
Parameters
----------
func : callable or tuple of (callable, str)
Function to apply to this %(klass)s object or, alternatively,
a `(callable, data_keyword)` tuple where `data_keyword` is a
string indicating the keyword of `callable` that expects the
%(klass)s object.
args : iterable, optional
Positional arguments passed into `func`.
kwargs : dict, optional
A dictionary of keyword arguments passed into `func`.
Returns
-------
object : the return type of `func`.
See Also
--------
Series.pipe : Apply a function with arguments to a series.
DataFrame.pipe: Apply a function with arguments to a dataframe.
apply : Apply function to each group instead of to the
full %(klass)s object.
Notes
-----
See more `here
<https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#piping-function-calls>`_
Examples
--------
%(examples)s
a�
Call function producing a like-indexed %(klass)s on each group and
return a %(klass)s having the same indexes as the original object
filled with the transformed values
Parameters
----------
f : function
Function to apply to each group.
Can also accept a Numba JIT function with
``engine='numba'`` specified.
If the ``'numba'`` engine is chosen, the function must be
a user defined function with ``values`` and ``index`` as the
first and second arguments respectively in the function signature.
Each group's index will be passed to the user defined function
and optionally available for use.
.. versionchanged:: 1.1.0
*args
Positional arguments to pass to func
engine : str, default None
* ``'cython'`` : Runs the function through C-extensions from cython.
* ``'numba'`` : Runs the function through JIT compiled code from numba.
* ``None`` : Defaults to ``'cython'`` or globally setting ``compute.use_numba``
.. versionadded:: 1.1.0
engine_kwargs : dict, default None
* For ``'cython'`` engine, there are no accepted ``engine_kwargs``
* For ``'numba'`` engine, the engine can accept ``nopython``, ``nogil``
and ``parallel`` dictionary keys. The values must either be ``True`` or
``False``. The default ``engine_kwargs`` for the ``'numba'`` engine is
``{'nopython': True, 'nogil': False, 'parallel': False}`` and will be
applied to the function
.. versionadded:: 1.1.0
**kwargs
Keyword arguments to be passed into func.
Returns
-------
%(klass)s
See Also
--------
%(klass)s.groupby.apply
%(klass)s.groupby.aggregate
%(klass)s.transform
Notes
-----
Each group is endowed the attribute 'name' in case you need to know
which group you are working on.
The current implementation imposes three requirements on f:
* f must return a value that either has the same shape as the input
subframe or can be broadcast to the shape of the input subframe.
For example, if `f` returns a scalar it will be broadcast to have the
same shape as the input subframe.
* if this is a DataFrame, f must support application column-by-column
in the subframe. If f also supports application to the entire subframe,
then a fast path is used starting from the second chunk.
* f must not mutate groups. Mutation is not supported and may
produce unexpected results.
When using ``engine='numba'``, there will be no "fall back" behavior internally.
The group data and group index will be passed as numpy arrays to the JITed
user defined function, and no alternative execution attempts will be tried.
Examples
--------
>>> df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
... 'foo', 'bar'],
... 'B' : ['one', 'one', 'two', 'three',
... 'two', 'two'],
... 'C' : [1, 5, 5, 2, 5, 5],
... 'D' : [2.0, 5., 8., 1., 2., 9.]})
>>> grouped = df.groupby('A')
>>> grouped.transform(lambda x: (x - x.mean()) / x.std())
C D
0 -1.154701 -0.577350
1 0.577350 0.000000
2 0.577350 1.154701
3 -1.154701 -1.000000
4 0.577350 -0.577350
5 0.577350 1.000000
Broadcast result of the transformation
>>> grouped.transform(lambda x: x.max() - x.min())
C D
0 4 6.0
1 3 8.0
2 4 6.0
3 3 8.0
4 4 6.0
5 3 8.0
aO
Aggregate using one or more operations over the specified axis.
Parameters
----------
func : function, str, list or dict
Function to use for aggregating the data. If a function, must either
work when passed a {klass} or when passed to {klass}.apply.
Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. ``[np.sum, 'mean']``
- dict of axis labels -> functions, function names or list of such.
Can also accept a Numba JIT function with
``engine='numba'`` specified.
If the ``'numba'`` engine is chosen, the function must be
a user defined function with ``values`` and ``index`` as the
first and second arguments respectively in the function signature.
Each group's index will be passed to the user defined function
and optionally available for use.
.. versionchanged:: 1.1.0
*args
Positional arguments to pass to func
engine : str, default None
* ``'cython'`` : Runs the function through C-extensions from cython.
* ``'numba'`` : Runs the function through JIT compiled code from numba.
* ``None`` : Defaults to ``'cython'`` or globally setting ``compute.use_numba``
.. versionadded:: 1.1.0
engine_kwargs : dict, default None
* For ``'cython'`` engine, there are no accepted ``engine_kwargs``
* For ``'numba'`` engine, the engine can accept ``nopython``, ``nogil``
and ``parallel`` dictionary keys. The values must either be ``True`` or
``False``. The default ``engine_kwargs`` for the ``'numba'`` engine is
``{{'nopython': True, 'nogil': False, 'parallel': False}}`` and will be
applied to the function
.. versionadded:: 1.1.0
**kwargs
Keyword arguments to be passed into func.
Returns
-------
{klass}
See Also
--------
{klass}.groupby.apply
{klass}.groupby.transform
{klass}.aggregate
Notes
-----
When using ``engine='numba'``, there will be no "fall back" behavior internally.
The group data and group index will be passed as numpy arrays to the JITed
user defined function, and no alternative execution attempts will be tried.
{examples}
c @ s. e Zd ZdZdd� Zdd� Zed�dd�Zd S )
�GroupByPlotzE
Class implementing the .plot attribute for groupby objects.
c C s
|| _ d S )N)�_groupby)�self�groupby� r@ �=/tmp/pip-build-5_djhm0z/pandas/pandas/core/groupby/groupby.py�__init__� s zGroupByPlot.__init__c s � �fdd�}d|_ | jj|�S )Nc s | j � ��S )N)�plot)r> |