pyspark.pandas.DataFrame.nsmallest

DataFrame.nsmallest(n: int, columns: Union[Any, Tuple[Any, …], List[Union[Any, Tuple[Any, …]]]]) → pyspark.pandas.frame.DataFrame[source]

Return the first n rows ordered by columns in ascending order.

Return the first n rows with the smallest values in columns, in ascending order. The columns that are not specified are returned as well, but not used for ordering.

This method is equivalent to df.sort_values(columns, ascending=True).head(n). In pandas the former can be more performant; in pandas-on-Spark, thanks to Spark’s lazy evaluation and query optimizer, the two have the same performance.
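The equivalence can be checked directly. A minimal sketch using plain pandas (whose nsmallest API pandas-on-Spark mirrors), with the same frame as the examples below:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 5, 6, 7, np.nan],
                   'Y': [6, 7, 8, 9, 10, 11, 12]})

# nsmallest selects the n rows with the smallest values in the given column...
a = df.nsmallest(n=3, columns='X')
# ...which matches sorting ascending and taking the first n rows.
b = df.sort_values('X', ascending=True).head(3)

print(a.equals(b))
```

Note that sort_values places NaN last by default, so the NaN row never appears in either three-row selection here.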

Parameters
n : int

Number of items to retrieve.

columns : list or str

Column name or names to order by.

Returns
DataFrame

See also

DataFrame.nlargest

Return the first n rows ordered by columns in descending order.

DataFrame.sort_values

Sort DataFrame by the values.

DataFrame.head

Return the first n rows without re-ordering.

Examples

>>> import numpy as np
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'X': [1, 2, 3, 5, 6, 7, np.nan],
...                    'Y': [6, 7, 8, 9, 10, 11, 12]})
>>> df
     X   Y
0  1.0   6
1  2.0   7
2  3.0   8
3  5.0   9
4  6.0  10
5  7.0  11
6  NaN  12

In the following example, we will use nsmallest to select the three rows having the smallest values in column “X”.

>>> df.nsmallest(n=3, columns='X') 
     X   Y
0  1.0   6
1  2.0   7
2  3.0   8

To order by the smallest values in column “Y” and then “X”, we can pass a list of column names, as in the next example.

>>> df.nsmallest(n=3, columns=['Y', 'X']) 
     X   Y
0  1.0   6
1  2.0   7
2  3.0   8
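The example above has no ties in “Y”, so the effect of the second column is not visible. A short hedged sketch using plain pandas (whose nsmallest pandas-on-Spark mirrors) with a hypothetical frame containing duplicates shows how later columns break ties in earlier ones:

```python
import pandas as pd

# Hypothetical frame: 'X' has tied values, 'Y' breaks the ties.
ties = pd.DataFrame({'X': [1, 1, 2, 2],
                     'Y': [9, 3, 8, 4]})

# Rows are ranked by 'X' first; among the two rows with X == 1,
# the one with the smaller 'Y' (index 1) comes first.
result = ties.nsmallest(n=2, columns=['X', 'Y'])
print(result)
```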