org.apache.spark.graphx
Class PartitionStrategy.EdgePartition2D$

Object
  extended by org.apache.spark.graphx.PartitionStrategy.EdgePartition2D$
All Implemented Interfaces:
java.io.Serializable, PartitionStrategy, scala.Equals, scala.Product
Enclosing interface:
PartitionStrategy

public static class PartitionStrategy.EdgePartition2D$
extends Object
implements PartitionStrategy, scala.Product, scala.Serializable

Assigns edges to partitions using a 2D partitioning of the sparse edge adjacency matrix, guaranteeing a 2 * sqrt(numParts) - 1 bound on vertex replication.

Suppose we have a graph with 12 vertices that we want to partition over 9 machines. We can use the following sparse matrix representation:

       __________________________________
  v0   | P0 *     | P1       | P2    *  |
  v1   |  ****    |  *       |          |
  v2   |  ******* |      **  |  ****    |
  v3   |  *****   |  *  *    |       *  |
       ----------------------------------
  v4   | P3 *     | P4 ***   | P5 **  * |
  v5   |  *  *    |  *       |          |
  v6   |       *  |      **  |  ****    |
  v7   |  * * *   |  *  *    |       *  |
       ----------------------------------
  v8   | P6   *   | P7    *  | P8  *   *|
  v9   |     *    |  *    *  |          |
  v10  |       *  |      **  |  *  *    |
  v11  | * <-E    |  ***     |       ** |
       ----------------------------------
 

The edge denoted by E connects v11 with v1 and is assigned to processor P6. To get the processor number we divide the matrix into sqrt(numParts) by sqrt(numParts) blocks. Notice that edges adjacent to v11 can only be in the first column of blocks (P0, P3, P6) or the last row of blocks (P6, P7, P8). As a consequence we can guarantee that v11 will need to be replicated to at most 2 * sqrt(numParts) - 1 machines.

Notice that P0 has many edges and as a consequence this partitioning would lead to poor work balance. To improve balance we first multiply each vertex id by a large prime to shuffle the vertex locations.

One of the limitations of this approach is that the number of machines must either be a perfect square. We partially address this limitation by computing the machine assignment to the next largest perfect square and then mapping back down to the actual number of machines. Unfortunately, this can also lead to work imbalance and so it is suggested that a perfect square is used.

See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from interface org.apache.spark.graphx.PartitionStrategy
PartitionStrategy.CanonicalRandomVertexCut$, PartitionStrategy.EdgePartition1D$, PartitionStrategy.EdgePartition2D$, PartitionStrategy.RandomVertexCut$
 
Field Summary
static PartitionStrategy.EdgePartition2D$ MODULE$
          Static reference to the singleton instance of this Scala object.
 
Constructor Summary
PartitionStrategy.EdgePartition2D$()
           
 
Method Summary
 int getPartition(long src, long dst, int numParts)
          Returns the partition number for a given edge.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface scala.Product
productArity, productElement, productIterator, productPrefix
 
Methods inherited from interface scala.Equals
canEqual, equals
 

Field Detail

MODULE$

public static final PartitionStrategy.EdgePartition2D$ MODULE$
Static reference to the singleton instance of this Scala object.

Constructor Detail

PartitionStrategy.EdgePartition2D$

public PartitionStrategy.EdgePartition2D$()
Method Detail

getPartition

public int getPartition(long src,
                        long dst,
                        int numParts)
Description copied from interface: PartitionStrategy
Returns the partition number for a given edge.

Specified by:
getPartition in interface PartitionStrategy