Class RandomRDDs (pyspark.mllib.random)

Static generator methods for creating RDDs of i.i.d. samples drawn from a given distribution (uniform, standard normal, or Poisson).

Static Methods
 
uniformRDD(sc, size, numPartitions=None, seed=None)
Generates an RDD comprised of i.i.d. samples from the uniform distribution U(0.0, 1.0).

normalRDD(sc, size, numPartitions=None, seed=None)
Generates an RDD comprised of i.i.d. samples from the standard normal distribution.

poissonRDD(sc, mean, size, numPartitions=None, seed=None)
Generates an RDD comprised of i.i.d. samples from the Poisson distribution with the input mean.

uniformVectorRDD(sc, numRows, numCols, numPartitions=None, seed=None)
Generates an RDD of numRows vectors, each containing numCols i.i.d. samples drawn from the uniform distribution U(0.0, 1.0).

normalVectorRDD(sc, numRows, numCols, numPartitions=None, seed=None)
Generates an RDD of numRows vectors, each containing numCols i.i.d. samples drawn from the standard normal distribution.

poissonVectorRDD(sc, mean, numRows, numCols, numPartitions=None, seed=None)
Generates an RDD of numRows vectors, each containing numCols i.i.d. samples drawn from the Poisson distribution with the input mean.

Method Details

uniformRDD(sc, size, numPartitions=None, seed=None)
Static Method

Generates an RDD comprised of i.i.d. samples from the uniform distribution U(0.0, 1.0).

To transform the distribution in the generated RDD from U(0.0, 1.0) to U(a, b), use RandomRDDs.uniformRDD(sc, n, p, seed).map(lambda v: a + (b - a) * v), where n is the size and p is the number of partitions.

>>> x = RandomRDDs.uniformRDD(sc, 100).collect()
>>> len(x)
100
>>> max(x) <= 1.0 and min(x) >= 0.0
True
>>> RandomRDDs.uniformRDD(sc, 100, 4).getNumPartitions()
4
>>> parts = RandomRDDs.uniformRDD(sc, 100, seed=4).getNumPartitions()
>>> parts == sc.defaultParallelism
True
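
The rescaling to U(a, b) described above is an element-wise affine map. The sketch below checks the same lambda locally on plain Python floats, with random.Random standing in for the values an RDD produced by uniformRDD would contain; the helper name to_uniform_ab and the bounds a = -5.0, b = 5.0 are illustrative assumptions, not part of the PySpark API.

```python
import random


def to_uniform_ab(v, a, b):
    """Map a sample v from U(0.0, 1.0) onto U(a, b) (hypothetical helper)."""
    return a + (b - a) * v


# Local stand-in for RandomRDDs.uniformRDD(sc, 1000, seed=42).collect();
# Python's random.Random is used here, not Spark's generator.
rng = random.Random(42)
samples = [rng.random() for _ in range(1000)]

a, b = -5.0, 5.0
scaled = [to_uniform_ab(v, a, b) for v in samples]

# Every transformed value falls inside [a, b]
print(min(scaled) >= a and max(scaled) <= b)  # True
```

With a real RDD, the same function would be applied through .map(lambda v: to_uniform_ab(v, a, b)).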

normalRDD(sc, size, numPartitions=None, seed=None)
Static Method

Generates an RDD comprised of i.i.d. samples from the standard normal distribution.

To transform the distribution in the generated RDD from standard normal to some other normal N(mean, sigma^2), use RandomRDDs.normalRDD(sc, n, p, seed).map(lambda v: mean + sigma * v), where n is the size and p is the number of partitions.

>>> x = RandomRDDs.normalRDD(sc, 1000, seed=1)
>>> stats = x.stats()
>>> stats.count()
1000
>>> abs(stats.mean() - 0.0) < 0.1
True
>>> abs(stats.stdev() - 1.0) < 0.1
True
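
Shifting by mean and scaling by sigma moves the sample mean and standard deviation exactly as the affine map predicts. The local sketch below illustrates this with random.Random.gauss standing in for normalRDD; the helper to_normal and the target parameters mean = 10.0, sigma = 2.0 are assumptions chosen for illustration.

```python
import random
import statistics


def to_normal(v, mean, sigma):
    """Shift and scale a standard normal sample v to N(mean, sigma^2) (hypothetical helper)."""
    return mean + sigma * v


# Local stand-in for RandomRDDs.normalRDD(sc, 1000, seed=1).collect();
# random.Random.gauss replaces Spark's generator here.
rng = random.Random(1)
standard = [rng.gauss(0.0, 1.0) for _ in range(1000)]

mean, sigma = 10.0, 2.0
shifted = [to_normal(v, mean, sigma) for v in standard]

# The sample mean lands near 10.0 and the sample standard deviation near 2.0
print(statistics.mean(shifted), statistics.stdev(shifted))
```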

poissonRDD(sc, mean, size, numPartitions=None, seed=None)
Static Method

Generates an RDD comprised of i.i.d. samples from the Poisson distribution with the input mean.

>>> mean = 100.0
>>> x = RandomRDDs.poissonRDD(sc, mean, 1000, seed=1)
>>> stats = x.stats()
>>> stats.count()
1000
>>> abs(stats.mean() - mean) < 0.5
True
>>> from math import sqrt
>>> abs(stats.stdev() - sqrt(mean)) < 0.5
True
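
The doctest compares the sample standard deviation with sqrt(mean) because a Poisson distribution's variance equals its mean. The local sketch below makes that relationship visible with a hand-written sampler; Knuth's multiplication method is used here only for brevity and is not necessarily the algorithm Spark uses, and poisson_knuth is a hypothetical helper.

```python
import math
import random
import statistics


def poisson_knuth(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method.

    Suitable for moderate lam only: math.exp(-lam) underflows to 0.0
    for lam above roughly 745, and the loop runs about lam times per draw.
    """
    threshold = math.exp(-lam)
    k = 0
    p = 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


mean = 100.0
rng = random.Random(1)
samples = [poisson_knuth(mean, rng) for _ in range(1000)]

# Variance equals the mean for a Poisson distribution, so the sample
# standard deviation should sit close to sqrt(mean) = 10.
print(statistics.mean(samples), statistics.stdev(samples), math.sqrt(mean))
```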

uniformVectorRDD(sc, numRows, numCols, numPartitions=None, seed=None)
Static Method

Generates an RDD of numRows vectors, each containing numCols i.i.d. samples drawn from the uniform distribution U(0.0, 1.0).

>>> import numpy as np
>>> mat = np.matrix(RandomRDDs.uniformVectorRDD(sc, 10, 10).collect())
>>> mat.shape
(10, 10)
>>> mat.max() <= 1.0 and mat.min() >= 0.0
True
>>> RandomRDDs.uniformVectorRDD(sc, 10, 10, 4).getNumPartitions()
4
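
The last doctest asks for 4 partitions over 10 rows, which cannot be split evenly. One common way to spread numRows rows across numPartitions partitions so that sizes differ by at most one is sketched below; the helper partition_sizes and its contiguous-range rule are assumptions for illustration, not taken from Spark's source.

```python
def partition_sizes(num_rows, num_partitions):
    """Split num_rows into num_partitions contiguous chunks whose sizes differ by at most 1.

    Chunk i covers rows [i * n // p, (i + 1) * n // p) (hypothetical scheme).
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return [
        (i + 1) * num_rows // num_partitions - i * num_rows // num_partitions
        for i in range(num_partitions)
    ]


sizes = partition_sizes(10, 4)
print(sizes)  # [2, 3, 2, 3]
print(sum(sizes))  # 10
```

Integer division keeps the chunk boundaries exact, so the sizes always add up to num_rows without a separate remainder step.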

normalVectorRDD(sc, numRows, numCols, numPartitions=None, seed=None)
Static Method

Generates an RDD of numRows vectors, each containing numCols i.i.d. samples drawn from the standard normal distribution.

>>> import numpy as np
>>> mat = np.matrix(RandomRDDs.normalVectorRDD(sc, 100, 100, seed=1).collect())
>>> mat.shape
(100, 100)
>>> abs(mat.mean() - 0.0) < 0.1
True
>>> abs(mat.std() - 1.0) < 0.1
True
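
Passing seed makes the generated data reproducible. A distributed generator cannot share one random stream across executors, so one natural design derives an independent seed per partition from the master seed, which also lets a single partition be recomputed on its own. The sketch below models that idea locally; partition_seeds, partition_rows, generate, and the seeding scheme itself are hypothetical and do not describe Spark's internals.

```python
import random


def partition_seeds(seed, num_partitions):
    """Derive one 64-bit seed per partition from a master seed (hypothetical scheme)."""
    master = random.Random(seed)
    return [master.getrandbits(64) for _ in range(num_partitions)]


def partition_rows(part_seed, count, num_cols):
    """Generate `count` rows of standard normal values using only this partition's seed."""
    rng = random.Random(part_seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_cols)] for _ in range(count)]


def generate(num_rows, num_cols, num_partitions, seed):
    """Local stand-in for normalVectorRDD(...).collect(): concatenate all partitions."""
    rows = []
    for i, part_seed in enumerate(partition_seeds(seed, num_partitions)):
        # Partition i holds rows [i * n // p, (i + 1) * n // p)
        count = (i + 1) * num_rows // num_partitions - i * num_rows // num_partitions
        rows.extend(partition_rows(part_seed, count, num_cols))
    return rows


rows = generate(100, 100, 4, seed=1)

# Same seed and partitioning: identical output on every run.
print(generate(100, 100, 4, seed=1) == rows)  # True

# Partition 2 (rows 50..74) can be rebuilt from its own seed alone.
seeds = partition_seeds(1, 4)
print(partition_rows(seeds[2], 25, 100) == rows[50:75])  # True
```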

poissonVectorRDD(sc, mean, numRows, numCols, numPartitions=None, seed=None)
Static Method

Generates an RDD of numRows vectors, each containing numCols i.i.d. samples drawn from the Poisson distribution with the input mean.

>>> import numpy as np
>>> mean = 100.0
>>> rdd = RandomRDDs.poissonVectorRDD(sc, mean, 100, 100, seed=1)
>>> mat = np.matrix(rdd.collect())
>>> mat.shape
(100, 100)
>>> abs(mat.mean() - mean) < 0.5
True
>>> from math import sqrt
>>> abs(mat.std() - sqrt(mean)) < 0.5
True
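
The doctests in this class check sample means against fixed tolerances. How strict each check is depends on the standard error of the sample mean, sigma / sqrt(n). The sketch below computes how many standard errors each mean tolerance spans, using only the sizes, means, and tolerances that appear in the doctests above; tolerance_in_standard_errors is a hypothetical helper.

```python
import math


def tolerance_in_standard_errors(tolerance, sigma, n):
    """How many standard errors of the sample mean a +/- tolerance spans."""
    standard_error = sigma / math.sqrt(n)
    return tolerance / standard_error


# (doctest, tolerance on the mean, population sigma, number of samples)
checks = [
    ("normalRDD, 1000 samples", 0.1, 1.0, 1000),
    ("poissonRDD, mean 100, 1000 samples", 0.5, math.sqrt(100.0), 1000),
    ("normalVectorRDD, 100 x 100", 0.1, 1.0, 100 * 100),
    ("poissonVectorRDD, mean 100, 100 x 100", 0.5, math.sqrt(100.0), 100 * 100),
]

for name, tol, sigma, n in checks:
    print(f"{name}: {tolerance_in_standard_errors(tol, sigma, n):.2f} standard errors")
```

The four checks allow about 3.2, 1.6, 10, and 5 standard errors respectively. The poissonRDD mean check is the loosest; under a normal approximation roughly one seed in nine would miss it, so the fixed seed is what keeps that doctest stable.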