pyspark.sql.DataFrame.sortWithinPartitions

DataFrame.sortWithinPartitions(*cols: Union[str, pyspark.sql.column.Column, List[Union[str, pyspark.sql.column.Column]]], **kwargs: Any) → pyspark.sql.dataframe.DataFrame[source]

Returns a new DataFrame with each partition sorted by the specified column(s).

New in version 1.6.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
cols : str, list or Column, optional

list of Column or column names to sort by.

Returns
DataFrame

A new DataFrame with rows sorted within each partition.

Other Parameters
ascending : bool or list, optional, default True

Sort ascending vs. descending. Specify a list of booleans for multiple sort orders, in which case the length of the list must equal the number of columns in cols (see the second snippet in the Examples below).

Examples

>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
>>> df.sortWithinPartitions("age", ascending=False)
DataFrame[age: bigint, name: string]
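
The sketch below illustrates that the sort is applied within each partition rather than globally, assuming a local SparkSession bound to spark as above. The four-row DataFrame and the pid column name are illustrative, and the rows each partition receives depend on hash partitioning, so the output is not reproduced here.

>>> from pyspark.sql import functions as sf
>>> df2 = spark.createDataFrame(
...     [(5, "Bob"), (2, "Alice"), (7, "Carol"), (1, "Dave")],
...     schema=["age", "name"],
... )
>>> parted = df2.repartition(2, "name")
>>> sorted_within = parted.sortWithinPartitions("age")
>>> sorted_within.withColumn("pid", sf.spark_partition_id()).show()  # doctest: +SKIP

Within each pid, rows appear in ascending age order, but the DataFrame as a whole is not globally sorted. Multiple sort orders can be requested by passing a list of columns together with a matching list of booleans:

>>> df.sortWithinPartitions(["age", "name"], ascending=[False, True])
DataFrame[age: bigint, name: string]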