Normalizer#
class pyspark.mllib.feature.Normalizer(p=2.0)#
Normalizes samples individually to unit L^p norm.

For any 1 <= p < float('inf'), normalizes samples using sum(abs(vector)^p)^(1/p) as norm (see the worked example below the parameters). For p = float('inf'), max(abs(vector)) will be used as norm for normalization.

New in version 1.2.0.

Parameters
p : float, optional
    Normalization in L^p space, p = 2 by default.
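To make the norm definition concrete, here is a minimal sketch that recomputes the p = 1 case from the examples below by hand. The helper name lp_norm is illustrative only and is not part of the PySpark API.

>>> # Hand-rolled L^p norm: sum(abs(x)^p)^(1/p) for finite p
>>> def lp_norm(values, p):
...     return sum(abs(x) ** p for x in values) ** (1.0 / p)
>>> v = [0.0, 1.0, 2.0]
>>> norm = lp_norm(v, 1)          # |0| + |1| + |2| = 3
>>> [x / norm for x in v]         # matches Normalizer(1).transform(v) below
[0.0, 0.3333333333333333, 0.6666666666666666]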
 
Examples

>>> from pyspark.mllib.linalg import Vectors
>>> v = Vectors.dense(range(3))
>>> nor = Normalizer(1)
>>> nor.transform(v)
DenseVector([0.0, 0.3333, 0.6667])

>>> rdd = sc.parallelize([v])
>>> nor.transform(rdd).collect()
[DenseVector([0.0, 0.3333, 0.6667])]

>>> nor2 = Normalizer(float("inf"))
>>> nor2.transform(v)
DenseVector([0.0, 0.5, 1.0])

Methods

transform(vector)
    Applies unit length normalization on a vector.

Methods Documentation

transform(vector)#
Applies unit length normalization on a vector.

New in version 1.2.0.

Parameters
vector : pyspark.mllib.linalg.Vector or pyspark.RDD
    Vector or RDD of vectors to be normalized.
 
Returns

pyspark.mllib.linalg.Vector or pyspark.RDD
    Normalized vector(s). If the norm of the input is zero, it will return the input vector.
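As a quick check of the zero-norm behavior noted above, this minimal sketch (assuming a running SparkContext, as in the examples) transforms an all-zero vector and gets the input back unchanged.

>>> from pyspark.mllib.linalg import Vectors
>>> zero = Vectors.dense([0.0, 0.0, 0.0])
>>> Normalizer(2).transform(zero)   # norm is zero, so the input vector is returned as-is
DenseVector([0.0, 0.0, 0.0])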