
DataFrame.rank(method: str = 'average', ascending: bool = True, numeric_only: Optional[bool] = None) → pyspark.pandas.frame.DataFrame

Compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values.


Note: the current implementation of rank uses Spark’s Window without a partition specification. This moves all data into a single partition on a single machine and could cause serious performance degradation. Avoid this method with very large datasets.

method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
  • average: average rank of group

  • min: lowest rank in group

  • max: highest rank in group

  • first: ranks assigned in the order the values appear in the array

  • dense: like ‘min’, but rank always increases by 1 between groups
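
The tie-breaking rules listed above can be illustrated without Spark. The following is a minimal pure-Python sketch, not the pandas-on-Spark implementation; the helper name `rank_values` is hypothetical and exists only for this illustration. It ranks a flat list in ascending order and resolves ties according to `method`.

```python
from typing import List, Sequence


def rank_values(values: Sequence[float], method: str = "average") -> List[float]:
    """Rank values from 1 to n in ascending order, resolving ties by `method`.

    Hypothetical helper for illustration only; not part of pyspark.pandas.
    """
    if method not in {"average", "min", "max", "first", "dense"}:
        raise ValueError(f"unknown method: {method!r}")

    # Positions sorted by value; sorted() is stable, so ties keep input order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)

    start = 0  # index into `order` where the current tie group begins
    dense = 0  # number of distinct values seen so far
    while start < len(order):
        # Extend `end` to cover every position holding the same value.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        dense += 1
        lowest, highest = start + 1, end + 1  # 1-based ordinal ranks of the group
        for offset, pos in enumerate(order[start:end + 1]):
            if method == "average":
                ranks[pos] = (lowest + highest) / 2
            elif method == "min":
                ranks[pos] = float(lowest)
            elif method == "max":
                ranks[pos] = float(highest)
            elif method == "first":
                ranks[pos] = float(lowest + offset)
            else:  # dense
                ranks[pos] = float(dense)
        start = end + 1
    return ranks


column_a = [1, 2, 2, 3]
for name in ("average", "min", "max", "first", "dense"):
    print(name, rank_values(column_a, name))
```

For the tied pair of 2s, `average` yields 2.5 for both, `min` yields 2.0, `max` yields 3.0, `first` yields 2.0 and 3.0 in input order, and `dense` yields 2.0 with the following value ranked 3.0 instead of 4.0.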

ascending : bool, default True

If False, rank from high (1) to low (N).
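
As a rough illustration of descending ranks with the default ‘average’ tie handling, here is a small pure-Python sketch. The function name `rank_descending` is hypothetical and the counting approach is chosen for clarity, not efficiency; it is not how pandas-on-Spark computes ranks.

```python
from typing import List, Sequence


def rank_descending(values: Sequence[float]) -> List[float]:
    """Average ranks where the largest value receives rank 1.

    Hypothetical helper for illustration only; not part of pyspark.pandas.
    """
    ranks = []
    for v in values:
        greater = sum(1 for other in values if other > v)
        equal = sum(1 for other in values if other == v)
        # Tied values occupy ranks greater+1 .. greater+equal; take their mean.
        ranks.append(greater + (equal + 1) / 2)
    return ranks


print(rank_descending([1, 2, 2, 3]))  # [4.0, 2.5, 2.5, 1.0]
print(rank_descending([4, 3, 2, 1]))  # [1.0, 2.0, 3.0, 4.0]
```

The largest value gets rank 1 and the smallest gets rank N, while tied values still share the mean of the ranks they occupy.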

numeric_only : bool, optional

For DataFrame objects, rank only numeric columns if set to True.

ranks : same type as caller


>>> df = ps.DataFrame({'A': [1, 2, 2, 3], 'B': [4, 3, 2, 1]}, columns=['A', 'B'])
>>> df
   A  B
0  1  4
1  2  3
2  2  2
3  3  1
>>> df.rank().sort_index()
     A    B
0  1.0  4.0
1  2.5  3.0
2  2.5  2.0
3  4.0  1.0

If method is set to ‘min’, each tied value gets the lowest rank in its group.

>>> df.rank(method='min').sort_index()
     A    B
0  1.0  4.0
1  2.0  3.0
2  2.0  2.0
3  4.0  1.0

If method is set to ‘max’, each tied value gets the highest rank in its group.

>>> df.rank(method='max').sort_index()
     A    B
0  1.0  4.0
1  3.0  3.0
2  3.0  2.0
3  4.0  1.0

If method is set to ‘dense’, ranks increase by 1 between groups, leaving no gaps.

>>> df.rank(method='dense').sort_index()
     A    B
0  1.0  4.0
1  2.0  3.0
2  2.0  2.0
3  3.0  1.0

If numeric_only is set to True, only numeric columns are ranked.

>>> df = ps.DataFrame({'A': [1, 2, 2, 3], 'B': ['a', 'b', 'd', 'c']}, columns=['A', 'B'])
>>> df
   A  B
0  1  a
1  2  b
2  2  d
3  3  c
>>> df.rank(numeric_only=True)
     A
0  1.0
1  2.5
2  2.5
3  4.0