pyspark.pandas.Series.rank
- Series.rank(method='average', ascending=True, numeric_only=False)
- Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values.
- Note
- The current implementation of rank uses Spark’s Window without a partition specification. This moves all data into a single partition on a single machine and could cause serious performance degradation. Avoid this method with very large datasets.
- Parameters
- method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}
- average: average rank of group 
- min: lowest rank in group 
- max: highest rank in group 
- first: ranks assigned in order they appear in the array 
- dense: like ‘min’, but rank always increases by 1 between groups 
 
- ascending : bool, default True
- If False, rank from high (1) to low (N).
- numeric_only : bool, default False
- For DataFrame objects, rank only numeric columns if set to True.
- Changed in version 4.0.0: The default value of numeric_only is now False.
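
The tie-breaking rules described for method and the effect of ascending can be sketched in plain Python. This is only an illustration of the ranking semantics, not how pyspark.pandas computes ranks: the helper name rank_values is hypothetical, it runs on an in-memory list rather than a Spark Window, and it assumes the input has no missing values.

```python
from itertools import groupby


def rank_values(values, method="average", ascending=True):
    """Return 1-based ranks for `values`, in input order (hypothetical helper)."""
    # Sort positions by value. Python's sort is stable (also with reverse=True),
    # so tied values keep their original order, which 'first' relies on.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)

    ranks = [0.0] * len(values)
    position = 0  # how many elements have been ranked so far
    dense = 0     # how many distinct groups have been seen so far

    # Consecutive equal values in sorted order form one tie group.
    for _, group in groupby(order, key=lambda i: values[i]):
        group = list(group)
        lowest = position + 1            # rank of the first element in the group
        highest = position + len(group)  # rank of the last element in the group
        dense += 1
        for offset, i in enumerate(group):
            if method == "average":
                ranks[i] = (lowest + highest) / 2
            elif method == "min":
                ranks[i] = float(lowest)
            elif method == "max":
                ranks[i] = float(highest)
            elif method == "first":
                ranks[i] = float(lowest + offset)
            elif method == "dense":
                ranks[i] = float(dense)
            else:
                raise ValueError(f"unknown method: {method!r}")
        position = highest
    return ranks


data = [1, 2, 2, 3]
print(rank_values(data))                    # [1.0, 2.5, 2.5, 4.0]
print(rank_values(data, method="dense"))    # [1.0, 2.0, 2.0, 3.0]
print(rank_values(data, ascending=False))   # [4.0, 2.5, 2.5, 1.0]
```

With ascending=False the largest value receives rank 1, and tied values still share a rank according to the chosen method.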
 
- Returns
- ranks : same type as caller
 
- Examples

>>> s = ps.Series([1, 2, 2, 3], name='A')
>>> s
0    1
1    2
2    2
3    3
Name: A, dtype: int64

>>> s.rank()
0    1.0
1    2.5
2    2.5
3    4.0
Name: A, dtype: float64

If method is set to ‘min’, tied values get the lowest rank in the group.

>>> s.rank(method='min')
0    1.0
1    2.0
2    2.0
3    4.0
Name: A, dtype: float64

If method is set to ‘max’, tied values get the highest rank in the group.

>>> s.rank(method='max')
0    1.0
1    3.0
2    3.0
3    4.0
Name: A, dtype: float64

If method is set to ‘first’, ranks are assigned in order of appearance, without grouping ties.

>>> s.rank(method='first')
0    1.0
1    2.0
2    3.0
3    4.0
Name: A, dtype: float64

If method is set to ‘dense’, no gaps are left between groups.

>>> s.rank(method='dense')
0    1.0
1    2.0
2    2.0
3    3.0
Name: A, dtype: float64

If numeric_only is set to True, only a numeric Series is ranked; otherwise an empty Series is returned.

>>> s = ps.Series(['a', 'b', 'c'], name='A', index=['x', 'y', 'z'])
>>> s
x    a
y    b
z    c
Name: A, dtype: object