pyspark.RDD.sortBy
RDD.sortBy(keyfunc, ascending=True, numPartitions=None)    [source]

Sorts this RDD by the given keyfunc.

New in version 1.1.0.

Parameters
keyfunc : function
    a function to compute the sort key for each element

ascending : bool, optional, default True
    sort the keys in ascending (True) or descending (False) order

numPartitions : int, optional
    the number of partitions in the new RDD
 
Returns
RDD
    a new RDD sorted by the given keyfunc

Examples

>>> tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[0]).collect()
[('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[1]).collect()
[('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
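As a rough local sketch (not Spark itself), the key-based ordering that sortBy applies can be mimicked with Python's built-in sorted: keyfunc plays the role of sorted's key argument, and ascending=False corresponds to reverse=True. This assumes no SparkContext is available and is only meant to illustrate the ordering semantics.

```python
# Local illustration of sortBy's ordering semantics using sorted();
# this does not use Spark and makes no claims about partitioning.
tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]

# Sort by the first element of each pair, as in the doctest above.
# Digits sort before letters in ASCII, hence '1' and '2' come first.
by_key = sorted(tmp, key=lambda x: x[0])
print(by_key)  # [('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]

# Descending order, analogous to sortBy(keyfunc, ascending=False).
by_value_desc = sorted(tmp, key=lambda x: x[1], reverse=True)
print(by_value_desc)  # [('2', 5), ('d', 4), ('1', 3), ('b', 2), ('a', 1)]
```

Note that sortBy performs a distributed sort across partitions, so unlike this local sketch its cost depends on numPartitions and the shuffle it triggers.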