Supported pandas API

The following tables show which pandas APIs are implemented or not implemented in pandas API on Spark. Some pandas APIs are implemented without all of their parameters, so the third column lists the missing parameters for each API.

  • ‘Y’ in the second column means the API is implemented, including all of its parameters.

  • ‘N’ means the API is not implemented yet.

  • ‘P’ means the API is partially implemented; some parameters are missing.
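The legend can be made concrete: for a partially implemented ('P') API, the missing parameters are the ones that the pandas function accepts but its pandas-on-Spark counterpart does not. The following is a minimal sketch of that comparison using Python's standard inspect module. `missing_parameters` and `support_status` are hypothetical helpers (not part of pandas or PySpark), the two toy functions stand in for real methods, and the published tables may well have been produced differently.

```python
import inspect
from typing import Callable, List, Optional


def missing_parameters(reference: Callable, candidate: Callable) -> List[str]:
    """Return, sorted, the named parameters of `reference` that `candidate` lacks.

    Names are compared literally, so a parameter that `candidate` would only
    swallow through **kwargs is still reported as missing.
    """
    ref_params = inspect.signature(reference).parameters
    cand_params = inspect.signature(candidate).parameters
    # *args/**kwargs are not named options, and `self` is not a user parameter.
    variadic = {inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD}
    return sorted(
        name
        for name, param in ref_params.items()
        if name != "self" and param.kind not in variadic and name not in cand_params
    )


def support_status(reference: Callable, candidate: Optional[Callable]) -> str:
    """Classify an API with the table's legend: 'N', 'P', or 'Y'."""
    if candidate is None:
        return "N"  # no counterpart exists at all
    return "P" if missing_parameters(reference, candidate) else "Y"


# Hypothetical stand-ins: a pandas-shaped method and a narrower port of it.
def pandas_style_add(self, other, axis="columns", level=None, fill_value=None):
    ...


def spark_style_add(self, other):
    ...


print(missing_parameters(pandas_style_add, spark_style_add))  # ['axis', 'fill_value', 'level']
print(support_status(pandas_style_add, spark_style_add))  # P
```

With both libraries installed, the same helper could be pointed at real methods, for example `missing_parameters(pandas.DataFrame.add, pyspark.pandas.DataFrame.add)`; the result depends on the installed versions and is not shown here.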

All APIs in the lists below compute data with distributed execution, except those that require local execution by design. For example, DataFrame.to_numpy() must collect the data to the driver side.
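Because APIs such as to_numpy() materialize every row on the driver, it can help to bound the collection explicitly. The sketch below uses a hypothetical helper, `collect_bounded`, that fetches at most `max_rows + 1` rows with head() before calling to_numpy(), and refuses to return anything if the extra row shows the data is larger. It only assumes an object with head() and to_numpy(), which both pandas and pandas-on-Spark DataFrames provide; it runs here on a plain pandas DataFrame so that no Spark cluster is needed, and the 10,000-row default is an arbitrary choice.

```python
import numpy as np
import pandas as pd


def collect_bounded(frame, max_rows: int = 10_000) -> np.ndarray:
    """Convert `frame` to a NumPy array only if it has at most `max_rows` rows."""
    # head() limits how many rows are converted; the one extra row reveals
    # whether the full frame is larger than the limit.
    values = frame.head(max_rows + 1).to_numpy()
    if values.shape[0] > max_rows:
        raise ValueError(f"refusing to collect more than {max_rows} rows to the driver")
    return values


small = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})
print(collect_bounded(small, max_rows=5).shape)  # (3, 2)
```

The design trades one extra row of transfer for the guarantee that no more than `max_rows + 1` rows are ever converted on the driver, regardless of how large the source data is.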

If you need a pandas API or parameter that is not implemented yet, you can create an Apache Spark JIRA ticket to request it or to contribute it yourself.

This API list is based on the pandas 2.0.0 pre-release.

CategoricalIndex API

API

Implemented

Missing parameters

add_categories()

Y

all()

Y

any()

Y

append()

Y

argmax()

P

axis , skipna

argmin()

P

axis , skipna

argsort

N

as_ordered()

Y

as_unordered()

Y

asof()

Y

asof_locs

N

astype()

P

copy

copy()

P

dtype , names

delete()

Y

difference()

Y

drop()

P

errors

drop_duplicates()

Y

droplevel()

Y

dropna()

Y

duplicated

N

equals()

Y

factorize()

P

use_na_sentinel

fillna()

P

downcast

format

N

get_indexer

N

get_indexer_for

N

get_indexer_non_unique

N

get_level_values()

Y

get_loc

N

get_slice_bound

N

get_value

N

groupby

N

holds_integer()

Y

identical()

Y

insert()

Y

intersection()

P

sort

is_

N

is_boolean()

Y

is_categorical()

Y

is_dtype_equal

N

is_floating()

Y

is_integer()

Y

is_interval()

Y

is_mixed

N

is_numeric()

Y

is_object()

Y

is_type_compatible()

Y

isin()

P

level

isna()

Y

isnull()

Y

item()

Y

join

N

map()

Y

max()

Y

memory_usage

N

min()

Y

notna()

Y

notnull()

Y

nunique()

Y

putmask

N

ravel

N

reindex

N

remove_categories()

Y

remove_unused_categories()

Y

rename()

Y

rename_categories()

Y

reorder_categories()

Y

repeat()

P

axis

searchsorted

N

set_categories()

Y

set_names()

Y

set_value

N

shift()

P

freq

slice_indexer

N

slice_locs

N

sort()

Y

sort_values()

P

key , na_position

sortlevel

N

symmetric_difference()

Y

take()

P

allow_fill , axis , fill_value

take_nd

N

to_flat_index

N

to_frame()

Y

to_list()

Y

to_native_types

N

to_numpy()

P

na_value

to_series()

P

index

tolist()

Y

transpose()

Y

union()

Y

unique()

Y

value_counts()

Y

view()

Y

where

N

DataFrame API

API

Implemented

Missing parameters

abs()

Y

add()

P

axis , fill_value , level

add_prefix()

Y

add_suffix()

Y

agg()

P

axis

aggregate()

P

axis

align()

P

broadcast_axis , fill_axis , fill_value , level , limit and more. See the pandas.DataFrame.align and pyspark.pandas.DataFrame.align documentation for details.

all()

P

level

any()

P

level , skipna

append()

Y

apply()

P

raw , result_type

applymap()

P

na_action

asfreq

N

asof

N

assign()

Y

astype()

P

copy , errors

at_time()

Y

backfill()

P

downcast

between_time()

P

inclusive

bfill()

P

downcast

bool()

Y

boxplot()

P

ax , backend , by , column , figsize and more. See the pandas.DataFrame.boxplot and pyspark.pandas.DataFrame.boxplot documentation for details.

clip()

P

axis , inplace

combine

N

combine_first()

Y

compare

N

convert_dtypes

N

copy()

Y

corr()

P

numeric_only

corrwith()

P

numeric_only

count()

P

level

cov()

P

numeric_only

cummax()

P

axis

cummin()

P

axis

cumprod()

P

axis

cumsum()

P

axis

describe()

P

datetime_is_numeric , exclude , include

diff()

Y

div()

P

axis , fill_value , level

divide()

P

axis , fill_value , level

dot()

Y

drop()

P

errors , inplace , level

drop_duplicates()

Y

droplevel()

Y

dropna()

Y

duplicated()

Y

eq()

P

axis , level

equals()

Y

eval()

Y

ewm()

P

adjust , axis , method , times

expanding()

P

axis , center , method

explode()

Y

ffill()

P

downcast

fillna()

P

downcast

filter()

Y

first()

Y

first_valid_index()

Y

floordiv()

P

axis , fill_value , level

ge()

P

axis , level

get()

Y

groupby()

P

group_keys , level , observed , sort , squeeze

gt()

P

axis , level

head()

Y

hist()

P

ax , backend , by , column , data and more. See the pandas.DataFrame.hist and pyspark.pandas.DataFrame.hist documentation for details.

idxmax()

P

numeric_only , skipna

idxmin()

P

numeric_only , skipna

infer_objects

N

info()

P

memory_usage , show_counts

insert()

Y

interpolate()

P

axis , downcast , inplace

isetitem

N

isin()

Y

isna()

Y

isnull()

Y

items()

Y

iteritems()

Y

iterrows()

Y

itertuples()

Y

join()

P

other , sort , validate

keys()

Y

kurt()

P

level

kurtosis()

P

level

last()

Y

last_valid_index()

Y

le()

P

axis , level

lookup

N

lt()

P

axis , level

mad()

P

level , skipna

mask()

P

axis , errors , inplace , level , try_cast

max()

P

level

mean()

P

level

median()

P

level

melt()

P

col_level , ignore_index

memory_usage

N

merge()

P

copy , indicator , sort , validate

min()

P

level

mod()

P

axis , fill_value , level

mode()

Y

mul()

P

axis , fill_value , level

multiply()

P

axis , fill_value , level

ne()

P

axis , level

nlargest()

Y

notna()

Y

notnull()

Y

nsmallest()

Y

nunique()

Y

pad()

P

downcast

pct_change()

P

fill_method , freq , limit

pipe()

Y

pivot()

Y

pivot_table()

P

dropna , margins , margins_name , observed , sort

pop()

Y

pow()

P

axis , fill_value , level

prod()

P

level

product()

P

level

quantile()

P

interpolation , method

query()

Y

radd()

P

axis , fill_value , level

rank()

P

axis , na_option , pct

rdiv()

P

axis , fill_value , level

reindex()

P

level , limit , method , tolerance

reindex_like()

P

limit , method , tolerance

rename()

P

copy

rename_axis()

Y

reorder_levels

N

replace()

Y

resample()

P

axis , base , convention , group_keys , kind and more. See the pandas.DataFrame.resample and pyspark.pandas.DataFrame.resample documentation for details.

reset_index()

P

allow_duplicates , names

rfloordiv()

P

axis , fill_value , level

rmod()

P

axis , fill_value , level

rmul()

P

axis , fill_value , level

rolling()

P

axis , center , closed , method , on and more. See the pandas.DataFrame.rolling and pyspark.pandas.DataFrame.rolling documentation for details.

round()

Y

rpow()

P

axis , fill_value , level

rsub()

P

axis , fill_value , level

rtruediv()

P

axis , fill_value , level

sample()

P

axis , weights

select_dtypes()

Y

sem()

P

level

set_axis

N

set_flags

N

set_index()

P

verify_integrity

shift()

P

axis , freq

skew()

P

level

slice_shift

N

sort_index()

P

key , sort_remaining

sort_values()

P

axis , key , kind

squeeze()

Y

stack()

P

dropna , level

std()

P

level

sub()

P

axis , fill_value , level

subtract()

P

axis , fill_value , level

sum()

P

level

swapaxes()

P

axis1 , axis2

swaplevel()

Y

tail()

Y

take()

P

is_copy

to_clipboard()

Y

to_csv()

P

chunksize , compression , decimal , doublequote , encoding and more. See the pandas.DataFrame.to_csv and pyspark.pandas.DataFrame.to_csv documentation for details.

to_dict()

Y

to_excel()

P

storage_options

to_feather

N

to_gbq

N

to_hdf

N

to_html()

P

encoding

to_json()

P

date_format , date_unit , default_handler , double_precision , force_ascii and more. See the pandas.DataFrame.to_json and pyspark.pandas.DataFrame.to_json documentation for details.

to_latex()

P

caption , label , position

to_markdown()

P

index , storage_options

to_numpy()

P

copy , dtype , na_value

to_orc()

P

engine , engine_kwargs , index

to_parquet()

P

engine , index , storage_options

to_period

N

to_pickle

N

to_records()

Y

to_sql

N

to_stata

N

to_string()

P

encoding , max_colwidth , min_rows

to_timestamp

N

to_xarray

N

to_xml

N

transform()

Y

transpose()

P

copy

truediv()

P

axis , fill_value , level

truncate()

Y

tshift

N

tz_convert

N

tz_localize

N

unstack()

P

fill_value , level

update()

P

errors , filter_func

value_counts

N

var()

P

level , skipna

where()

P

errors , inplace , level , try_cast

xs()

P

drop_level

DatetimeIndex API

API

Implemented

Missing parameters

all()

Y

any()

Y

append()

Y

argmax()

P

axis , skipna

argmin()

P

axis , skipna

argsort

N

asof()

Y

asof_locs

N

astype()

P

copy

ceil()

Y

copy()

P

dtype , names

day_name()

Y

delete()

Y

difference()

Y

drop()

P

errors

drop_duplicates()

Y

droplevel()

Y

dropna()

Y

duplicated

N

equals()

Y

factorize()

P

use_na_sentinel

fillna()

P

downcast

floor()

Y

format

N

get_indexer

N

get_indexer_for

N

get_indexer_non_unique

N

get_level_values()

Y

get_loc

N

get_slice_bound

N

get_value

N

groupby

N

holds_integer()

Y

identical()

Y

indexer_at_time()

Y

indexer_between_time()

Y

insert()

Y

intersection()

P

sort

is_

N

is_boolean()

Y

is_categorical()

Y

is_floating()

Y

is_integer()

Y

is_interval()

Y

is_mixed

N

is_numeric()

Y

is_object()

Y

is_type_compatible()

Y

isin()

P

level

isna()

Y

isnull()

Y

isocalendar

N

item()

Y

join

N

map()

Y

max()

P

axis , skipna

mean

N

memory_usage

N

min()

P

axis , skipna

month_name()

Y

normalize()

Y

notna()

Y

notnull()

Y

nunique()

Y

putmask

N

ravel

N

reindex

N

rename()

Y

repeat()

P

axis

round()

Y

searchsorted

N

set_names()

Y

set_value

N

shift()

P

freq

slice_indexer

N

slice_locs

N

snap

N

sort()

Y

sort_values()

P

key , na_position

sortlevel

N

std

N

strftime()

Y

symmetric_difference()

Y

take()

P

allow_fill , axis , fill_value

to_flat_index

N

to_frame()

Y

to_julian_date

N

to_list()

Y

to_native_types

N

to_numpy()

P

na_value

to_period

N

to_perioddelta

N

to_pydatetime

N

to_series()

P

index , keep_tz

tolist()

Y

transpose()

Y

tz_convert

N

tz_localize

N

union()

Y

union_many

N

unique()

Y

value_counts()

Y

view()

Y

where

N

Float64Index API

API

Implemented

Missing parameters

all()

Y

any()

Y

append()

Y

argmax()

P

axis , skipna

argmin()

P

axis , skipna

argsort

N

asof()

Y

asof_locs

N

astype()

P

copy

copy()

P

dtype , names

delete()

Y

difference()

Y

drop()

P

errors

drop_duplicates()

Y

droplevel()

Y

dropna()

Y

duplicated

N

equals()

Y

factorize()

P

use_na_sentinel

fillna()

P

downcast

format

N

get_indexer

N

get_indexer_for

N

get_indexer_non_unique

N

get_level_values()

Y

get_loc

N

get_slice_bound

N

get_value

N

groupby

N

holds_integer()

Y

identical()

Y

insert()

Y

intersection()

P

sort

is_

N

is_boolean()

Y

is_categorical()

Y

is_floating()

Y

is_integer()

Y

is_interval()

Y

is_mixed

N

is_numeric()

Y

is_object()

Y

is_type_compatible()

Y

isin()

P

level

isna()

Y

isnull()

Y

item()

Y

join

N

map()

Y

max()

P

axis , skipna

memory_usage

N

min()

P

axis , skipna

notna()

Y

notnull()

Y

nunique()

Y

putmask

N

ravel

N

reindex

N

rename()

Y

repeat()

P

axis

searchsorted

N

set_names()

Y

set_value

N

shift()

P

freq

slice_indexer

N

slice_locs

N

sort()

Y

sort_values()

P

key , na_position

sortlevel

N

symmetric_difference()

Y

take()

P

allow_fill , axis , fill_value

to_flat_index

N

to_frame()

Y

to_list()

Y

to_native_types

N

to_numpy()

P

na_value

to_series()

P

index

tolist()

Y

transpose()

Y

union()

Y

unique()

Y

value_counts()

Y

view()

Y

where

N

Index API

API

Implemented

Missing parameters

all()

Y

any()

Y

append()

Y

argmax()

P

axis , skipna

argmin()

P

axis , skipna

argsort

N

asof()

Y

asof_locs

N

astype()

P

copy

copy()

P

dtype , names

delete()

Y

difference()

Y

drop()

P

errors

drop_duplicates()

Y

droplevel()

Y

dropna()

Y

duplicated

N

equals()

Y

factorize()

P

use_na_sentinel

fillna()

P

downcast

format

N

get_indexer

N

get_indexer_for

N

get_indexer_non_unique

N

get_level_values()

Y

get_loc

N

get_slice_bound

N

get_value

N

groupby

N

holds_integer()

Y

identical()

Y

insert()

Y

intersection()

P

sort

is_

N

is_boolean()

Y

is_categorical()

Y

is_floating()

Y

is_integer()

Y

is_interval()

Y

is_mixed

N

is_numeric()

Y

is_object()

Y

is_type_compatible()

Y

isin()

P

level

isna()

Y

isnull()

Y

item()

Y

join

N

map()

Y

max()

P

axis , skipna

memory_usage

N

min()

P

axis , skipna

notna()

Y

notnull()

Y

nunique()

Y

putmask

N

ravel

N

reindex

N

rename()

Y

repeat()

P

axis

searchsorted

N

set_names()

Y

set_value

N

shift()

P

freq

slice_indexer

N

slice_locs

N

sort()

Y

sort_values()

P

key , na_position

sortlevel

N

symmetric_difference()

Y

take()

P

allow_fill , axis , fill_value

to_flat_index

N

to_frame()

Y

to_list()

Y

to_native_types

N

to_numpy()

P

na_value

to_series()

P

index

tolist()

Y

transpose()

Y

union()

Y

unique()

Y

value_counts()

Y

view()

Y

where

N

Int64Index API

API

Implemented

Missing parameters

all()

Y

any()

Y

append()

Y

argmax()

P

axis , skipna

argmin()

P

axis , skipna

argsort

N

asof()

Y

asof_locs

N

astype()

P

copy

copy()

P

dtype , names

delete()

Y

difference()

Y

drop()

P

errors

drop_duplicates()

Y

droplevel()

Y

dropna()

Y

duplicated

N

equals()

Y

factorize()

P

use_na_sentinel

fillna()

P

downcast

format

N

get_indexer

N

get_indexer_for

N

get_indexer_non_unique

N

get_level_values()

Y

get_loc

N

get_slice_bound

N

get_value

N

groupby

N

holds_integer()

Y

identical()

Y

insert()

Y

intersection()

P

sort

is_

N

is_boolean()

Y

is_categorical()

Y

is_floating()

Y

is_integer()

Y

is_interval()

Y

is_mixed

N

is_numeric()

Y

is_object()

Y

is_type_compatible()

Y

isin()

P

level

isna()

Y

isnull()

Y

item()

Y

join

N

map()

Y

max()

P

axis , skipna

memory_usage

N

min()

P

axis , skipna

notna()

Y

notnull()

Y

nunique()

Y

putmask

N

ravel

N

reindex

N

rename()

Y

repeat()

P

axis

searchsorted

N

set_names()

Y

set_value

N

shift()

P

freq

slice_indexer

N

slice_locs

N

sort()

Y

sort_values()

P

key , na_position

sortlevel

N

symmetric_difference()

Y

take()

P

allow_fill , axis , fill_value

to_flat_index

N

to_frame()

Y

to_list()

Y

to_native_types

N

to_numpy()

P

na_value

to_series()

P

index

tolist()

Y

transpose()

Y

union()

Y

unique()

Y

value_counts()

Y

view()

Y

where

N

MultiIndex API

API

Implemented

Missing parameters

all()

Y

any()

Y

append()

Y

argmax()

P

axis , skipna

argmin()

P

axis , skipna

argsort

N

asof()

Y

asof_locs

N

astype()

P

copy

copy()

P

codes , dtype , levels , name , names

delete()

Y

difference()

Y

drop()

P

errors

drop_duplicates()

Y

droplevel()

Y

dropna()

Y

duplicated

N

equal_levels()

Y

equals()

Y

factorize()

P

use_na_sentinel

fillna()

P

downcast

format

N

get_indexer

N

get_indexer_for

N

get_indexer_non_unique

N

get_level_values()

Y

get_loc

N

get_loc_level

N

get_locs

N

get_slice_bound

N

get_value

N

groupby

N

holds_integer()

Y

identical()

Y

insert()

Y

intersection()

P

sort

is_

N

is_boolean()

Y

is_categorical()

Y

is_floating()

Y

is_integer()

Y

is_interval()

Y

is_lexsorted

N

is_mixed

N

is_numeric()

Y

is_object()

Y

is_type_compatible()

Y

isin()

P

level

isna()

Y

isnull()

Y

item()

Y

join

N

map()

Y

max()

P

axis , skipna

memory_usage

N

min()

P

axis , skipna

notna()

Y

notnull()

Y

nunique()

Y

putmask

N

ravel

N

reindex

N

remove_unused_levels

N

rename()

P

level , names

reorder_levels

N

repeat()

P

axis

searchsorted

N

set_codes

N

set_levels

N

set_names()

Y

set_value

N

shift()

P

freq

slice_indexer

N

slice_locs

N

sort()

Y

sort_values()

P

key , na_position

sortlevel

N

swaplevel()

Y

symmetric_difference()

Y

take()

P

allow_fill , axis , fill_value

to_flat_index

N

to_frame()

P

allow_duplicates

to_list()

Y

to_native_types

N

to_numpy()

P

na_value

to_series()

P

index

tolist()

Y

transpose()

Y

truncate

N

union()

Y

unique()

Y

value_counts()

Y

view()

Y

where

N

Series API

API

Implemented

Missing parameters

abs()

Y

add()

P

axis , fill_value , level

add_prefix()

Y

add_suffix()

Y

agg()

P

axis

aggregate()

P

axis

align()

P

broadcast_axis , fill_axis , fill_value , level , limit and more. See the pandas.Series.align and pyspark.pandas.Series.align documentation for details.

all()

P

bool_only , level

any()

P

bool_only , level , skipna

append()

Y

apply()

P

convert_dtype

argmax()

Y

argmin()

Y

argsort()

P

axis , kind , order

asfreq

N

asof()

P

subset

astype()

P

copy , errors

at_time()

Y

autocorr()

Y

backfill()

P

downcast

between()

Y

between_time()

P

inclusive

bfill()

P

downcast

bool()

Y

clip()

P

axis

combine

N

combine_first()

Y

compare()

P

align_axis , result_names

convert_dtypes

N

copy()

Y

corr()

Y

count()

P

level

cov()

Y

cummax()

P

axis

cummin()

P

axis

cumprod()

P

axis

cumsum()

P

axis

describe()

P

datetime_is_numeric , exclude , include

diff()

Y

div()

P

axis , fill_value , level

divide()

P

axis , fill_value , level

divmod()

P

axis , fill_value , level

dot()

Y

drop()

P

axis , errors

drop_duplicates()

Y

droplevel()

P

axis

dropna()

P

how

duplicated()

Y

eq()

P

axis , fill_value , level

equals()

Y

ewm()

P

adjust , axis , method , times

expanding()

P

axis , center , method

explode()

P

ignore_index

factorize()

P

use_na_sentinel

ffill()

P

downcast

fillna()

P

downcast

filter()

Y

first()

Y

first_valid_index()

Y

floordiv()

P

axis , fill_value , level

ge()

P

axis , fill_value , level

get()

Y

groupby()

P

group_keys , level , observed , sort , squeeze

gt()

P

axis , fill_value , level

head()

Y

hist()

P

ax , backend , by , figsize , grid and more. See the pandas.Series.hist and pyspark.pandas.Series.hist documentation for details.

idxmax()

P

axis

idxmin()

P

axis

infer_objects

N

info

N

interpolate()

P

axis , downcast , inplace

isin()

Y

isna()

Y

isnull()

Y

item()

Y

items()

Y

iteritems()

Y

keys()

Y

kurt()

P

level

kurtosis()

P

level

last()

Y

last_valid_index()

Y

le()

P

axis , fill_value , level

lt()

P

axis , fill_value , level

mad()

P

axis , level , skipna

map()

Y

mask()

P

axis , errors , inplace , level , try_cast

max()

P

level

mean()

P

level

median()

P

level

memory_usage

N

min()

P

level

mod()

P

axis , fill_value , level

mode()

Y

mul()

P

axis , fill_value , level

multiply()

P

axis , fill_value , level

ne()

P

axis , fill_value , level

nlargest()

P

keep

notna()

Y

notnull()

Y

nsmallest()

P

keep

nunique()

Y

pad()

P

downcast

pct_change()

P

fill_method , freq , limit

pipe()

Y

pop()

Y

pow()

P

axis , fill_value , level

prod()

P

level

product()

P

level

quantile()

P

interpolation

radd()

P

axis , fill_value , level

rank()

P

axis , na_option , pct

ravel

N

rdiv()

P

axis , fill_value , level

rdivmod()

P

axis , fill_value , level

reindex()

Y

reindex_like()

P

copy , limit , method , tolerance

rename()

P

axis , copy , errors , inplace , level

rename_axis()

Y

reorder_levels

N

repeat()

P

axis

replace()

P

inplace , limit , method

resample()

P

axis , base , convention , group_keys , kind and more. See the pandas.Series.resample and pyspark.pandas.Series.resample documentation for details.

reset_index()

P

allow_duplicates

rfloordiv()

P

axis , fill_value , level

rmod()

P

axis , fill_value , level

rmul()

P

axis , fill_value , level

rolling()

P

axis , center , closed , method , on and more. See the pandas.Series.rolling and pyspark.pandas.Series.rolling documentation for details.

round()

Y

rpow()

P

axis , fill_value , level

rsub()

P

axis , fill_value , level

rtruediv()

P

axis , fill_value , level

sample()

P

axis , weights

searchsorted()

P

sorter

sem()

P

level

set_axis

N

set_flags

N

shift()

P

axis , freq

skew()

P

level

slice_shift

N

sort_index()

P

key , sort_remaining

sort_values()

P

axis , key , kind

squeeze()

Y

std()

P

level

sub()

P

axis , fill_value , level

subtract()

P

axis , fill_value , level

sum()

P

level

swapaxes()

P

axis1 , axis2

swaplevel()

Y

tail()

Y

take()

P

axis , is_copy

to_clipboard()

Y

to_csv()

P

chunksize , compression , decimal , doublequote , encoding and more. See the pandas.Series.to_csv and pyspark.pandas.Series.to_csv documentation for details.

to_dict()

Y

to_excel()

P

storage_options

to_frame()

Y

to_hdf

N

to_json()

P

date_format , date_unit , default_handler , double_precision , force_ascii and more. See the pandas.Series.to_json and pyspark.pandas.Series.to_json documentation for details.

to_latex()

P

caption , label , position

to_list()

Y

to_markdown()

P

index , storage_options

to_numpy()

P

copy , dtype , na_value

to_period

N

to_pickle

N

to_sql

N

to_string()

P

min_rows

to_timestamp

N

to_xarray

N

tolist()

Y

transform()

Y

transpose()

Y

truediv()

P

axis , fill_value , level

truncate()

Y

tshift

N

tz_convert

N

tz_localize

N

unique()

Y

unstack()

P

fill_value

update()

Y

value_counts()

Y

var()

P

level , skipna

view

N

where()

P

axis , errors , inplace , level , try_cast

xs()

P

axis , drop_level

TimedeltaIndex API

API

Implemented

Missing parameters

all()

Y

any()

Y

append()

Y

argmax()

P

axis , skipna

argmin()

P

axis , skipna

argsort

N

asof()

Y

asof_locs

N

astype()

P

copy

ceil

N

copy()

P

dtype , names

delete()

Y

difference()

Y

drop()

P

errors

drop_duplicates()

Y

droplevel()

Y

dropna()

Y

duplicated

N

equals()

Y

factorize()

P

use_na_sentinel

fillna()

P

downcast

floor

N

format

N

get_indexer

N

get_indexer_for

N

get_indexer_non_unique

N

get_level_values()

Y

get_loc

N

get_slice_bound

N

get_value

N

groupby

N

holds_integer()

Y

identical()

Y

insert()

Y

intersection()

P

sort

is_

N

is_boolean()

Y

is_categorical()

Y

is_floating()

Y

is_integer()

Y

is_interval()

Y

is_mixed

N

is_numeric()

Y

is_object()

Y

is_type_compatible()

Y

isin()

P

level

isna()

Y

isnull()

Y

item()

Y

join

N

map()

Y

max()

P

axis , skipna

mean

N

median

N

memory_usage

N

min()

P

axis , skipna

notna()

Y

notnull()

Y

nunique()

Y

putmask

N

ravel

N

reindex

N

rename()

Y

repeat()

P

axis

round

N

searchsorted

N

set_names()

Y

set_value

N

shift()

P

freq

slice_indexer

N

slice_locs

N

sort()

Y

sort_values()

P

key , na_position

sortlevel

N

std

N

sum

N

symmetric_difference()

Y

take()

P

allow_fill , axis , fill_value

to_flat_index

N

to_frame()

Y

to_list()

Y

to_native_types

N

to_numpy()

P

na_value

to_pytimedelta

N

to_series()

P

index

tolist()

Y

total_seconds

N

transpose()

Y

union()

Y

unique()

Y

value_counts()

Y

view()

Y

where

N

General Function API

API

Implemented

Missing parameters

array

N

bdate_range

N

concat()

P

copy , keys , levels , names , verify_integrity

crosstab

N

cut

N

date_range()

P

inclusive

eval

N

factorize

N

from_dummies

N

get_dummies()

Y

infer_freq

N

interval_range

N

isna()

Y

isnull()

Y

json_normalize

N

lreshape

N

melt()

P

col_level , ignore_index

merge()

P

copy , indicator , left , sort , validate

merge_asof()

Y

merge_ordered

N

notna()

Y

notnull()

Y

period_range

N

pivot

N

pivot_table

N

qcut

N

read_clipboard()

Y

read_csv()

P

cache_dates , chunksize , compression , converters , date_parser and more. See the pandas.read_csv and pyspark.pandas.read_csv documentation for details.

read_excel()

P

decimal , na_filter , storage_options

read_feather

N

read_fwf

N

read_gbq

N

read_hdf

N

read_html()

P

extract_links

read_json()

P

chunksize , compression , convert_axes , convert_dates , date_unit and more. See the pandas.read_json and pyspark.pandas.read_json documentation for details.

read_orc()

Y

read_parquet()

P

engine , storage_options , use_nullable_dtypes

read_pickle

N

read_sas

N

read_spss

N

read_sql()

P

chunksize , coerce_float , params , parse_dates

read_sql_query()

P

chunksize , coerce_float , dtype , params , parse_dates

read_sql_table()

P

chunksize , coerce_float , parse_dates

read_stata

N

read_table()

P

cache_dates , chunksize , comment , compression , converters and more. See the pandas.read_table and pyspark.pandas.read_table documentation for details.

read_xml

N

set_eng_float_format

N

show_versions

N

test

N

timedelta_range()

Y

to_datetime()

P

cache , dayfirst , exact , utc , yearfirst

to_numeric()

P

downcast

to_pickle

N

to_timedelta()

Y

unique

N

value_counts

N

wide_to_long

N
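Some top-level functions marked 'N' in the table above have a method counterpart that the DataFrame API table lists as implemented or partially implemented; for example, pivot_table is 'N' here, while DataFrame.pivot_table() is 'P'. The sketch below, run with plain pandas on made-up sales data, shows that in pandas the method form gives the same result as the top-level function, which suggests trying the method form when porting such code to pandas API on Spark. Whether every argument carries over is an assumption to verify against the pyspark.pandas.DataFrame.pivot_table documentation, since the DataFrame table lists dropna, margins, margins_name, observed, and sort as missing.

```python
import pandas as pd

# Hypothetical data used only for illustration.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "product": ["a", "b", "a", "b"],
    "sales": [10, 20, 30, 40],
})

# Method form: the variant listed as 'P' (partially implemented) for DataFrame.
table = df.pivot_table(values="sales", index="region", columns="product", aggfunc="sum")

# Top-level form: listed as 'N' in pandas API on Spark, but available in pandas.
same = table.equals(
    pd.pivot_table(df, values="sales", index="region", columns="product", aggfunc="sum")
)

print(table)
print(same)  # True
```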

Expanding API

API

Implemented

Missing parameters

agg

N

aggregate

N

apply

N

corr

N

count()

P

numeric_only

cov

N

kurt()

P

numeric_only

max()

P

engine , engine_kwargs , numeric_only

mean()

P

engine , engine_kwargs , numeric_only

median

N

min()

P

engine , engine_kwargs , numeric_only

quantile()

P

interpolation , numeric_only

rank

N

sem

N

skew()

P

numeric_only

std()

P

ddof , engine , engine_kwargs , numeric_only

sum()

P

engine , engine_kwargs , numeric_only

validate

N

var()

P

ddof , engine , engine_kwargs , numeric_only

ExpandingGroupby API

API

Implemented

Missing parameters

agg

N

aggregate

N

apply

N

corr

N

count()

P

numeric_only

cov

N

kurt()

P

numeric_only

max()

P

engine , engine_kwargs , numeric_only

mean()

P

engine , engine_kwargs , numeric_only

median

N

min()

P

engine , engine_kwargs , numeric_only

quantile()

P

interpolation , numeric_only

rank

N

sem

N

skew()

P

numeric_only

std()

P

ddof , engine , engine_kwargs , numeric_only

sum()

P

engine , engine_kwargs , numeric_only

validate

N

var()

P

ddof , engine , engine_kwargs , numeric_only

Rolling API

API

Implemented

Missing parameters

agg

N

aggregate

N

apply

N

corr

N

count()

P

numeric_only

cov

N

kurt()

P

numeric_only

max()

P

engine , engine_kwargs , numeric_only

mean()

P

engine , engine_kwargs , numeric_only

median

N

min()

P

engine , engine_kwargs , numeric_only

quantile()

P

interpolation , numeric_only

rank

N

sem

N

skew()

P

numeric_only

std()

P

ddof , engine , engine_kwargs , numeric_only

sum()

P

engine , engine_kwargs , numeric_only

validate

N

var()

P

ddof , engine , engine_kwargs , numeric_only

RollingGroupby API

API

Implemented

Missing parameters

agg

N

aggregate

N

apply

N

corr

N

count()

P

numeric_only

cov

N

kurt()

P

numeric_only

max()

P

engine , engine_kwargs , numeric_only

mean()

P

engine , engine_kwargs , numeric_only

median

N

min()

P

engine , engine_kwargs , numeric_only

quantile()

P

interpolation , numeric_only

rank

N

sem

N

skew()

P

numeric_only

std()

P

ddof , engine , engine_kwargs , numeric_only

sum()

P

engine , engine_kwargs , numeric_only

validate

N

var()

P

ddof , engine , engine_kwargs , numeric_only

Window API

API

Implemented

Missing parameters

agg

N

aggregate

N

mean

N

std

N

sum

N

validate

N

var

N

DataFrameGroupBy API

API

Implemented

Missing parameters

agg()

P

engine , engine_kwargs , func

aggregate()

P

engine , engine_kwargs , func

all()

Y

any()

P

skipna

apply()

Y

backfill()

Y

bfill()

Y

boxplot

N

count()

Y

cumcount()

Y

cummax()

P

axis , numeric_only

cummin()

P

axis , numeric_only

cumprod()

P

axis

cumsum()

P

axis

describe()

Y

diff()

P

axis

ewm()

Y

expanding()

Y

ffill()

Y

filter()

P

dropna

first()

Y

get_group()

P

obj

head()

Y

idxmax()

P

axis , numeric_only

idxmin()

P

axis , numeric_only

last()

Y

max()

P

engine , engine_kwargs

mean()

P

engine , engine_kwargs

median()

Y

min()

P

engine , engine_kwargs

ngroup

N

nunique()

Y

ohlc

N

pad()

Y

pct_change

N

pipe

N

prod()

Y

quantile()

P

interpolation , numeric_only

rank()

P

axis , na_option , pct

resample

N

rolling()

Y

sample

N

sem()

P

numeric_only

shift()

P

axis , freq

size()

Y

std()

P

engine , engine_kwargs , numeric_only

sum()

P

engine , engine_kwargs

tail()

Y

transform()

P

engine , engine_kwargs

value_counts

N

var()

P

engine , engine_kwargs , numeric_only

GroupBy API

API

Implemented

Missing parameters

agg()

P

func

aggregate()

P

func

all()

Y

any()

P

skipna

apply()

Y

backfill()

Y

bfill()

Y

count()

Y

cumcount()

Y

cummax()

P

axis , numeric_only

cummin()

P

axis , numeric_only

cumprod()

P

axis

cumsum()

P

axis

describe

N

diff()

P

axis

ewm()

Y

expanding()

Y

ffill()

Y

first()

Y

get_group()

P

obj

head()

Y

last()

Y

max()

P

engine , engine_kwargs

mean()

P

engine , engine_kwargs

median()

Y

min()

P

engine , engine_kwargs

ngroup

N

ohlc

N

pad()

Y

pct_change

N

pipe

N

prod()

Y

quantile()

P

interpolation , numeric_only

rank()

P

axis , na_option , pct

resample

N

rolling()

Y

sample

N

sem()

P

numeric_only

shift()

P

axis , freq

size()

Y

std()

P

engine , engine_kwargs , numeric_only

sum()

P

engine , engine_kwargs

tail()

Y

var()

P

engine , engine_kwargs , numeric_only

SeriesGroupBy API

API

Implemented

Missing parameters

agg()

P

engine , engine_kwargs , func

aggregate()

P

engine , engine_kwargs , func

all()

Y

any()

P

skipna

apply()

Y

backfill()

Y

bfill()

Y

count()

Y

cumcount()

Y

cummax()

P

axis , numeric_only

cummin()

P

axis , numeric_only

cumprod()

P

axis

cumsum()

P

axis

describe

N

diff()

P

axis

ewm()

Y

expanding()

Y

ffill()

Y

filter()

P

dropna

first()

Y

get_group()

P

obj

head()

Y

last()

Y

max()

P

engine , engine_kwargs

mean()

P

engine , engine_kwargs

median()

Y

min()

P

engine , engine_kwargs

ngroup

N

nlargest()

P

keep

nsmallest()

P

keep

nunique()

Y

ohlc

N

pad()

Y

pct_change

N

pipe

N

prod()

Y

quantile()

P

interpolation , numeric_only

rank()

P

axis , na_option , pct

resample

N

rolling()

Y

sample

N

sem()

P

numeric_only

shift()

P

axis , freq

size()

Y

std()

P

engine , engine_kwargs , numeric_only

sum()

P

engine , engine_kwargs

tail()

Y

transform()

P

engine , engine_kwargs

value_counts()

P

bins , normalize

var()

P

engine , engine_kwargs , numeric_only