pyspark.pandas.DataFrame.set_index

DataFrame.set_index(keys: Union[Any, Tuple[Any, ...], List[Union[Any, Tuple[Any, ...]]]], drop: bool = True, append: bool = False, inplace: bool = False) → Optional[pyspark.pandas.frame.DataFrame]

Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The new index can either replace the existing index or extend it.

Parameters

keys (label or array-like or list of labels/arrays): This parameter can be either a single column key, a single array of the same length as the calling DataFrame, or a list containing an arbitrary combination of column keys and arrays. Here, “array” encompasses Series, Index and np.ndarray.
drop (bool, default True): Delete columns to be used as the new index.
append (bool, default False): Whether to append columns to the existing index.
inplace (bool, default False): Modify the DataFrame in place (do not create a new object).

Returns

DataFrame or None: The DataFrame with changed row labels, or None if inplace=True.

See also

DataFrame.reset_index: Opposite of set_index.

Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'month': [1, 4, 7, 10],
...                    'year': [2012, 2014, 2013, 2014],
...                    'sale': [55, 40, 84, 31]},
...                   columns=['month', 'year', 'sale'])
>>> df
   month  year  sale
0      1  2012    55
1      4  2014    40
2      7  2013    84
3     10  2014    31

Set the index to become the ‘month’ column:

>>> df.set_index('month')  
       year  sale
month
1      2012    55
4      2014    40
7      2013    84
10     2014    31

Create a MultiIndex using columns ‘year’ and ‘month’:

>>> df.set_index(['year', 'month'])  
            sale
year  month
2012  1     55
2014  4     40
2013  7     84
2014  10    31
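
The drop, append and inplace parameters, and the relationship to reset_index, can be illustrated with a minimal sketch. It uses plain pandas as a stand-in so that it runs without a Spark session; that pandas-on-Spark gives the same results for column keys is an assumption here, and the variable names (kept, appended, inplace_df, restored) are only illustrative.

```python
import pandas as pd  # stand-in for pyspark.pandas (assumption: same results for column keys)

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# drop=False uses 'month' as the index but also keeps it as a regular column.
kept = df.set_index('month', drop=False)

# append=True adds 'year' as a second index level next to the existing
# (unnamed) default index instead of replacing it.
appended = df.set_index('year', append=True)

# inplace=True modifies the DataFrame itself; the call returns None.
inplace_df = df.copy()
result = inplace_df.set_index('month', inplace=True)

# reset_index moves the index back into a column, undoing set_index.
restored = df.set_index('month').reset_index()

print(list(kept.columns))          # column labels after drop=False
print(list(appended.index.names))  # index level names after append=True
print(result)                      # return value of the inplace call
print(restored.equals(df))         # whether the round trip restores df
```

If the assumption holds, replacing the import with `import pyspark.pandas as ps` and `pd.DataFrame` with `ps.DataFrame` should be the only change needed to try the same calls on a pandas-on-Spark DataFrame.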
