3
���h# � @ s� d Z ddlZddlmZ ddlZddljjZddl m
Z
mZmZ ddl
mZmZmZmZ dZed�dd �Zd
ded
feeee ed�d
d�Zdefed�dd�Zeed�dd�Zded
feeed�dd�ZdS )z"
data hash pandas / numpy objects
� N)�Optional)�is_categorical_dtype�is_extension_array_dtype�is_list_like)�ABCDataFrame�
ABCIndexClass�
ABCMultiIndex� ABCSeriesZ0123456789123456)� num_itemsc
C s� yt | �}W n tk
r, tjg tjd�S X tj|g| �} tjd�}tj|�tjd� }xBt| �D ]6\}}|| }||N }||9 }|tjd| | �7 }qdW |d |ks�t d��|tjd�7 }|S )z�
Parameters
----------
arrays : generator
num_items : int
Should be the same as CPython's tupleobject.c
)�dtypeiCB ixV4 iXB � zFed in wrong num_itemsi�| )
�next�
StopIteration�np�array�uint64� itertools�chainZ
zeros_like� enumerate�AssertionError)�arraysr
�firstZmult�out�i�aZ inverse_i� r �:/tmp/pip-build-5_djhm0z/pandas/pandas/core/util/hashing.py�_combine_hash_arrays s
r T�utf8)�index�encoding�hash_key�
categorizec
sx ddl m} �dkrt�t�t�r8|t����ddd�S t�t�rpt�j��� �j ddd�}||�ddd�}�nt�t
�r�t�j��� �j ddd�}|rȇ ���fd d
�dD �}tj|g|�}t
|d�}||�jddd�}n�t�t��rbdd
� �j� D �} t�j�}
|�rD� ���fd
d
�dD �}|
d7 }
tj| |�}dd
� |D �} t
| |
�}||�jddd�}ntdt��� ���|S )aX
Return a data hash of the Index/Series/DataFrame.
Parameters
----------
index : bool, default True
Include the index in the hash (if Series/DataFrame).
encoding : str, default 'utf8'
Encoding for data & key when strings.
hash_key : str, default _default_hash_key
Hash_key for string key to encode.
categorize : bool, default True
Whether to first categorize object arrays before hashing. This is more
efficient when the array contains duplicate values.
Returns
-------
Series of uint64, same length as the object
r )�SeriesNr F)r �copy)r$ )r r r$ c 3 s$ | ]}t �jd ��� d�jV qdS )F)r r r! r" |