pyspark.pandas.Index.sort_values
Index.sort_values(return_indexer: bool = False, ascending: bool = True) → Union[pyspark.pandas.indexes.base.Index, Tuple[pyspark.pandas.indexes.base.Index, pyspark.pandas.indexes.base.Index]]
Return a sorted copy of the index, and optionally return the indices that sorted the index itself.
Note
pandas does not support this method when the index contains NaN values and raises an unexpected TypeError; pandas-on-Spark instead treats NaN as the smallest value. This method also returns the indexer as a pandas-on-Spark Index, whereas pandas returns it as a NumPy array, because the indexer in pandas-on-Spark may not fit in memory.
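The NaN rule in the note can be illustrated without Spark. The following is a minimal pure-Python sketch, not the pandas-on-Spark implementation; `sort_nan_smallest` is a hypothetical helper, and placing NaN last in descending order is an assumption that simply follows from treating NaN as the smallest value.

```python
import math


def sort_nan_smallest(values, ascending=True):
    """Sort floats, ordering NaN below every other value (hypothetical helper)."""

    def key(v):
        # NaN maps to (0, 0.0); every other value maps to (1, v),
        # so NaN always compares lower than any real number.
        return (0, 0.0) if math.isnan(v) else (1, v)

    return sorted(values, key=key, reverse=not ascending)


asc = sort_nan_smallest([10.0, float("nan"), 1.0])
desc = sort_nan_smallest([10.0, float("nan"), 1.0], ascending=False)
print(asc)   # NaN is placed first
print(desc)  # NaN is placed last
```

The explicit key is needed because NaN compares false against every number, so a plain `sorted` call gives no guarantee about where it ends up.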
- Parameters
- return_indexer : bool, default False
Should the indices that would sort the index be returned.
- ascending : bool, default True
Should the index values be sorted in an ascending order.
- Returns
- sorted_index : ps.Index or ps.MultiIndex
Sorted copy of the index.
- indexer : ps.Index
The indices that the index itself was sorted by. Returned only when return_indexer is True.
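The relationship between the two return values can be sketched in plain Python: position `i` of the sorted result holds the element found at position `indexer[i]` of the original. This is an illustration of the semantics only, under the assumption that the indexer holds integer positions; `sort_with_indexer` is a hypothetical helper, not part of pandas-on-Spark.

```python
def sort_with_indexer(values, ascending=True):
    """Return (sorted_values, indexer) for a list (hypothetical helper)."""
    # Order the positions 0..n-1 by the value stored at each position.
    indexer = sorted(
        range(len(values)), key=lambda i: values[i], reverse=not ascending
    )
    # Gathering the original values at those positions gives the sorted copy.
    sorted_values = [values[i] for i in indexer]
    return sorted_values, indexer


values = [10, 100, 1, 1000]
sorted_values, indexer = sort_with_indexer(values, ascending=False)
print(sorted_values)  # [1000, 100, 10, 1]
print(indexer)        # [3, 1, 0, 2]
```

Here `indexer[0]` is 3 because the largest value, 1000, sits at position 3 of the original list.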
See also
Series.sort_values
Sort values of a Series.
DataFrame.sort_values
Sort values in a DataFrame.
Examples
>>> import pyspark.pandas as ps
>>> idx = ps.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')
Sort values in ascending order (default behavior).
>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')
Sort values in descending order.
>>> idx.sort_values(ascending=False)
Int64Index([1000, 100, 10, 1], dtype='int64')
Sort values in descending order, and also get the indices idx was sorted by.
>>> idx.sort_values(ascending=False, return_indexer=True)
(Int64Index([1000, 100, 10, 1], dtype='int64'), Int64Index([3, 1, 0, 2], dtype='int64'))
Support for MultiIndex.
>>> psidx = ps.MultiIndex.from_tuples([('a', 'x', 1), ('c', 'y', 2), ('b', 'z', 3)])
>>> psidx
MultiIndex([('a', 'x', 1),
            ('c', 'y', 2),
            ('b', 'z', 3)],
           )

>>> psidx.sort_values()
MultiIndex([('a', 'x', 1),
            ('b', 'z', 3),
            ('c', 'y', 2)],
           )

>>> psidx.sort_values(ascending=False)
MultiIndex([('c', 'y', 2),
            ('b', 'z', 3),
            ('a', 'x', 1)],
           )

>>> psidx.sort_values(ascending=False, return_indexer=True)
(MultiIndex([('c', 'y', 2),
            ('b', 'z', 3),
            ('a', 'x', 1)],
           ), Int64Index([1, 2, 0], dtype='int64'))
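The MultiIndex outputs above are consistent with comparing entries level by level, which is also how Python orders tuples. A minimal pure-Python sketch under that assumption (it does not use pandas-on-Spark and is not its implementation):

```python
tuples = [("a", "x", 1), ("c", "y", 2), ("b", "z", 3)]

# Tuples compare lexicographically: first element, then second, then third.
indexer = sorted(range(len(tuples)), key=lambda i: tuples[i], reverse=True)
ordered = [tuples[i] for i in indexer]

print(ordered)  # [('c', 'y', 2), ('b', 'z', 3), ('a', 'x', 1)]
print(indexer)  # [1, 2, 0]
```

In this data the first level alone decides the order, since 'a', 'b', and 'c' are all distinct.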