Vectors

class pyspark.mllib.linalg.Vectors

Factory methods for working with vectors.

Notes

Dense vectors are simply represented as NumPy array objects, so there is no need to convert them for use in MLlib. For sparse vectors, the factory methods in this class create an MLlib-compatible type, or users can pass in SciPy’s scipy.sparse column vectors.

Methods

dense(*elements)

Create a dense vector of 64-bit floats from a Python list or from individual numbers.

fromML(vec)

Convert a vector from the new mllib-local representation (pyspark.ml.linalg).

norm(vector, p)

Find the norm of the given vector.

parse(s)

Parse a string representation back into a Vector.

sparse(size, *args)

Create a sparse vector, using either a dictionary, a list of (index, value) pairs, or two separate arrays of indices and values (sorted by index).

squared_distance(v1, v2)

Squared distance between two vectors.

stringify(vector)

Convert a vector into a string that Vectors.parse() can read back.

zeros(size)

Create a dense vector of the given size, filled with zeros.

Methods Documentation

static dense(*elements: Union[float, bytes, numpy.ndarray, Iterable[float]]) → pyspark.mllib.linalg.DenseVector

Create a dense vector of 64-bit floats from a Python list or from individual numbers.

Examples

>>> Vectors.dense([1, 2, 3])
DenseVector([1.0, 2.0, 3.0])
>>> Vectors.dense(1.0, 2.0)
DenseVector([1.0, 2.0])
static fromML(vec: pyspark.ml.linalg.DenseVector) → pyspark.mllib.linalg.DenseVector

Convert a vector from the new mllib-local representation. This does NOT copy the data; it copies references.

New in version 2.0.0.

Parameters
vec : pyspark.ml.linalg.Vector
Returns
pyspark.mllib.linalg.Vector
static norm(vector: pyspark.mllib.linalg.Vector, p: NormType) → numpy.float64

Find the norm of the given vector, where p is the order of the norm (for example 1, 2, or float('inf')).

static parse(s: str) → pyspark.mllib.linalg.Vector

Parse a string representation back into a Vector.

Examples

>>> Vectors.parse('[2,1,2 ]')
DenseVector([2.0, 1.0, 2.0])
>>> Vectors.parse(' ( 100,  [0],  [2])')
SparseVector(100, {0: 2.0})
static sparse(size: int, *args: Union[bytes, Tuple[int, float], Iterable[float], Iterable[Tuple[int, float]], Dict[int, float]]) → pyspark.mllib.linalg.SparseVector

Create a sparse vector, using either a dictionary, a list of (index, value) pairs, or two separate arrays of indices and values (sorted by index).

Parameters
size : int

Size of the vector.

args

Non-zero entries, as a dictionary, list of tuples, or two sorted lists containing indices and values.

Examples

>>> Vectors.sparse(4, {1: 1.0, 3: 5.5})
SparseVector(4, {1: 1.0, 3: 5.5})
>>> Vectors.sparse(4, [(1, 1.0), (3, 5.5)])
SparseVector(4, {1: 1.0, 3: 5.5})
>>> Vectors.sparse(4, [1, 3], [1.0, 5.5])
SparseVector(4, {1: 1.0, 3: 5.5})
static squared_distance(v1: pyspark.mllib.linalg.Vector, v2: pyspark.mllib.linalg.Vector) → numpy.float64

Squared distance between two vectors. v1 and v2 can be of type SparseVector, DenseVector, np.ndarray or array.array.

Examples

>>> a = Vectors.sparse(4, [(0, 1), (3, 4)])
>>> b = Vectors.dense([2, 5, 4, 1])
>>> Vectors.squared_distance(a, b)
51.0
static stringify(vector: pyspark.mllib.linalg.Vector) → str

Convert a vector into a string that Vectors.parse() can read back.

Examples

>>> Vectors.stringify(Vectors.sparse(2, [1], [1.0]))
'(2,[1],[1.0])'
>>> Vectors.stringify(Vectors.dense([0.0, 1.0]))
'[0.0,1.0]'
static zeros(size: int) → pyspark.mllib.linalg.DenseVector

Create a dense vector of 64-bit floats of the given size, filled with zeros.