pyspark.RDD.coalesce
RDD.coalesce(numPartitions, shuffle=False)
 Return a new RDD that is reduced into numPartitions partitions.
New in version 1.0.0.
- Parameters
 - numPartitions : int
 the number of partitions in the new RDD
- shuffle : bool, optional, default False
 whether to add a shuffle step
- Returns
 RDD
 a new RDD reduced to numPartitions partitions
See also
RDD.repartition
Examples
>>> sc.parallelize([1, 2, 3, 4, 5], 3).glom().collect()
[[1], [2, 3], [4, 5]]
>>> sc.parallelize([1, 2, 3, 4, 5], 3).coalesce(1).glom().collect()
[[1, 2, 3, 4, 5]]
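To build intuition for what a shuffle-free coalesce does, here is a minimal pure-Python sketch (not Spark's actual implementation, which also accounts for data locality and executor placement): adjacent partitions are merged into the requested number of groups, and the partition count is never increased without a shuffle. The helper name `coalesce_sketch` is invented for illustration.

```python
from typing import List

def coalesce_sketch(partitions: List[list], num_partitions: int) -> List[list]:
    """Illustrative only: merge adjacent partitions into num_partitions
    contiguous groups, roughly mimicking coalesce(shuffle=False)."""
    if num_partitions >= len(partitions):
        # Without a shuffle, coalesce cannot increase the partition count.
        return partitions
    groups: List[list] = [[] for _ in range(num_partitions)]
    for i, part in enumerate(partitions):
        # Assign each source partition to a contiguous target group.
        groups[i * num_partitions // len(partitions)].extend(part)
    return groups

# Same layout as the glom() example above: 3 partitions down to 1.
parts = [[1], [2, 3], [4, 5]]
print(coalesce_sketch(parts, 1))  # [[1, 2, 3, 4, 5]]
```

Because no shuffle step is involved, data only moves between partitions on the same worker in real Spark; pass `shuffle=True` (or use `RDD.repartition`) when you need to rebalance or increase the partition count.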