Current File : //usr/local/lib64/python3.6/site-packages/pandas/core/util/__pycache__/hashing.cpython-36.pyc
3

���h#�@s�dZddlZddlmZddlZddljjZddl	m
Z
mZmZddl
mZmZmZmZdZed�dd	�Zd
ded
feeeeed�d
d�Zdefed�dd�Zeed�dd�Zded
feeed�dd�ZdS)z"
data hash pandas / numpy objects
�N)�Optional)�is_categorical_dtype�is_extension_array_dtype�is_list_like)�ABCDataFrame�
ABCIndexClass�
ABCMultiIndex�	ABCSeriesZ0123456789123456)�	num_itemsc
Cs�yt|�}Wn tk
r,tjgtjd�SXtj|g|�}tjd�}tj|�tjd�}xBt|�D]6\}}||}||N}||9}|tjd||�7}qdW|d|ks�t	d��|tjd�7}|S)z�
    Parameters
    ----------
    arrays : generator
    num_items : int

    Should match the tuple hash in CPython's tupleobject.c
    )�dtypeiCBixV4iXB�zFed in wrong num_itemsi�|)
�next�
StopIteration�np�array�uint64�	itertools�chainZ
zeros_like�	enumerate�AssertionError)�arraysr
�firstZmult�out�i�aZ	inverse_i�r�:/tmp/pip-build-5_djhm0z/pandas/pandas/core/util/hashing.py�_combine_hash_arrayss	
rT�utf8)�index�encoding�hash_key�
categorizec
sxddlm}�dkrt�t�t�r8|t����ddd�St�t�rpt�j����j	ddd�}||�ddd�}�nt�t
�r�t�j����j	ddd�}|rȇ���fd	d
�dD�}tj|g|�}t
|d�}||�jddd�}n�t�t��rbdd
��j�D�}	t�j�}
|�rD����fd
d
�dD�}|
d7}
tj|	|�}dd
�|D�}	t
|	|
�}||�jddd�}ntdt������|S)aX
    Return a data hash of the Index/Series/DataFrame.

    Parameters
    ----------
    index : bool, default True
        Include the index in the hash (if Series/DataFrame).
    encoding : str, default 'utf8'
        Encoding for data & key when strings.
    hash_key : str, default _default_hash_key
        Hash_key for string key to encode.
    categorize : bool, default True
        Whether to first categorize object arrays before hashing. This is more
        efficient when the array contains duplicate values.

    Returns
    -------
    Series of uint64, same length as the object
    r)�SeriesNrF)r�copy)r$)rrr$c3s$|]}t�jd���d�jVqdS)F)rr r!r"N)�hash_pandas_objectr�_values)�.0�_)r"r r!�objrr�	<genexpr>esz%hash_pandas_object.<locals>.<genexpr>�css|]\}}t|j�VqdS)N)�
hash_arrayr&)r'r(Zseriesrrrr*tsc3s$|]}t�jd���d�jVqdS)F)rr r!r"N)r%rr&)r'r()r"r r!r)rrr*xsrcss|]
}|VqdS)Nr)r'�xrrrr*�szUnexpected type for hashing )N)N)�pandasr#�_default_hash_key�
isinstancer�hash_tuplesrr,r&�astyper	rrrrr�items�len�columns�	TypeError�type)
r)rr r!r"r#�hZ
index_iterr�hashesr
Zindex_hash_generatorZ_hashesr)r"r r!r)rr%7s>







r%)r!cs�d}t�t�r�g�d}nt��s*td��ddlm�m}t�t�sN|j�����fdd�t	�j
�D����fdd	��D�}t|t���}|r�|d}|S)
a
    Hash a MultiIndex / list-of-tuples efficiently

    Parameters
    ----------
    vals : MultiIndex, list-of-tuples, or single tuple
    encoding : str, default 'utf8'
    hash_key : str, default _default_hash_key

    Returns
    -------
    ndarray of hashed values
    FTz'must be convertible to a list-of-tuplesr)�Categorical�
MultiIndexcs(g|] }��j|�j|ddd��qS)FT)�ordered�fastpath)�codesZlevels)r'�level)r:�valsrr�
<listcomp>�szhash_tuples.<locals>.<listcomp>c3s|]}t|��d�VqdS))r r!N)�_hash_categorical)r'�cat)r r!rrr*�szhash_tuples.<locals>.<genexpr>)
r0�tuplerr6r.r:r;r�from_tuples�rangeZnlevelsrr4)r@r r!Zis_tupler;r9r8r)r:r r!r@rr1�s 


r1)r r!cCsltj|jj�}t|||dd�}|j�}t|�r<|j|j�}ntj	t|�dd�}|j
�rhtjtj�j
||<|S)a
    Hash a Categorical by hashing its categories, and then mapping the codes
    to the hashes

    Parameters
    ----------
    c : Categorical
    encoding : str
    hash_key : str

    Returns
    -------
    ndarray of hashed values, same size as len(c)
    F)r"r)r)rZasarray�
categoriesr&r,Zisnar4Ztaker>�zeros�anyZiinfor�max)�cr r!�values�hashed�mask�resultrrrrB�s	rB)r r!r"cCs�t|d�std��|j}t|�r,t|||�St|�rF|j�\}}|j}tj|tj	�rtt
tj|��dt
tj|��St
|t�r�|jd�}n�t|jtjtjf�r�|jd�jddd�}n�t|jtj�r�|jdkr�|jd	|jj���jd�}n�|�r2d
dlm}m}m}||dd�\}	}
||	||
�dd
d�}t|||�Sytj|||�}Wn0tk
�rttj|jt�jt�||�}YnX||d?N}|tjd�9}||d?N}|tjd�9}||d?N}|S)a9
    Given a 1d array, return an array of deterministic integers.

    Parameters
    ----------
    vals : ndarray, Categorical
    encoding : str, default 'utf8'
        Encoding for data & key when strings.
    hash_key : str, default _default_hash_key
        Hash_key for string key to encode.
    categorize : bool, default True
        Whether to first categorize object arrays before hashing. This is more
        efficient when the array contains duplicate values.

    Returns
    -------
    1-D uint64 NumPy array of hash values, same length as vals
    rzmust pass a ndarray-like��u8�i8F)r$��ur)r:�Index�	factorize)�sortT)r<r=�l�e�9��z�l�b&�&�&	�) �hasattrr6rrrBrZ_values_for_factorizerZ
issubdtypeZ
complex128r,�real�imagr0�boolr2�
issubclassr7Z
datetime64Ztimedelta64�view�number�itemsizer.r:rUrV�hashingZhash_object_array�str�objectr)r@r r!r"rr(r:rUrVr>rGrCrrrr,�s@
 
r,)�__doc__rZtypingrZnumpyrZpandas._libs.hashingZ_libsrcZpandas.core.dtypes.commonrrrZpandas.core.dtypes.genericrrrr	r/�intrr^rdr%r1rBr,rrrr�<module>s"R+(
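
The docstring of _combine_hash_arrays above says only that the combination step should follow CPython's tupleobject.c. As a rough illustration, the sketch below applies the same mixing scheme to plain Python integers for a single row, where the compiled module works on whole NumPy uint64 arrays. The names combine_hashes and MASK64 are hypothetical and do not appear in the module. The constants are assumed to be those of the older CPython tuple hash, and the masking emulates uint64 wraparound.

```python
MASK64 = (1 << 64) - 1  # hypothetical helper: emulates uint64 wraparound


def combine_hashes(hashes):
    """Combine per-column 64-bit hashes of one row into a single 64-bit hash.

    Illustrative scalar sketch only; the real routine operates on arrays and
    returns an empty array when it is given no input at all.
    """
    num_items = len(hashes)
    mult = 1000003      # initial multiplier (assumed, older CPython tuple hash)
    out = 0x345678      # initial accumulator (assumed, older CPython tuple hash)
    for i, h in enumerate(hashes):
        inverse_i = num_items - i
        out ^= h                        # mix in the next hash
        out = (out * mult) & MASK64     # multiply, keeping 64 bits
        mult = (mult + 82520 + inverse_i + inverse_i) & MASK64
    return (out + 97531) & MASK64


# The result depends on the order of the inputs, as a tuple hash should.
row_hash = combine_hashes([1, 2])
swapped_hash = combine_hashes([2, 1])
```

Because the multiplier changes at every step, swapping two column hashes changes the combined value, which is why hashing a DataFrame row is sensitive to column order.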