pyspark.pandas.Series.dot

Series.dot(other: Union[Series, pyspark.pandas.frame.DataFrame]) → Union[int, float, bool, str, bytes, decimal.Decimal, datetime.date, datetime.datetime, None, pyspark.pandas.series.Series][source]

Compute the dot product between the Series and the columns of other.

This method computes the dot product between the Series and another Series, or between the Series and each column of a DataFrame.

It can also be called using self @ other in Python >= 3.5.

Note

This API behaves differently from pandas when the indexes of the two Series are not aligned. Matching pandas exactly would require reading the whole data to, for example, count the mismatched labels. pandas raises an exception in that case; pandas-on-Spark instead proceeds permissively, treating mismatched labels as NaN and ignoring them.

>>> pdf1 = pd.Series([1, 2, 3], index=[0, 1, 2])
>>> pdf2 = pd.Series([1, 2, 3], index=[0, 1, 3])
>>> pdf1.dot(pdf2)
Traceback (most recent call last):
...
ValueError: matrices are not aligned
>>> psdf1 = ps.Series([1, 2, 3], index=[0, 1, 2])
>>> psdf2 = ps.Series([1, 2, 3], index=[0, 1, 3])
>>> psdf1.dot(psdf2)  
5
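In plain pandas terms, the permissive pandas-on-Spark result above can be reproduced by multiplying label-wise (which aligns on the union of the indexes and leaves NaN where a label exists in only one Series) and summing, since Series.sum skips NaN by default. This is an illustrative sketch of the semantics, not how pandas-on-Spark computes it internally:

```python
import pandas as pd

pdf1 = pd.Series([1, 2, 3], index=[0, 1, 2])
pdf2 = pd.Series([1, 2, 3], index=[0, 1, 3])

# Label-wise multiply aligns on the union of indexes {0, 1, 2, 3};
# labels 2 and 3 appear in only one Series, so they become NaN.
aligned = pdf1.mul(pdf2)

# sum() skips NaN, leaving 1*1 + 2*2 = 5, matching pandas-on-Spark.
result = aligned.sum()
```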
Parameters
other : Series, DataFrame

The other object with whose columns to compute the dot product.

Returns
scalar, Series

The dot product of the Series and other if other is a Series, or a Series of the dot products of the Series and each column of other if other is a DataFrame.

Notes

The Series and other have to share the same index, whether other is a Series or a DataFrame.

Examples

>>> s = ps.Series([0, 1, 2, 3])
>>> s.dot(s)
14
>>> s @ s
14
>>> psdf = ps.DataFrame({'x': [0, 1, 2, 3], 'y': [0, -1, -2, -3]})
>>> psdf
   x  y
0  0  0
1  1 -1
2  2 -2
3  3 -3
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     s.dot(psdf)
...
x    14
y   -14
dtype: int64
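The DataFrame case mirrors plain pandas semantics: the result is a Series whose entries are the dot products of the Series with each column of the DataFrame, keyed by the column labels. A small pandas-only sketch of the same computation (no Spark session or option_context needed):

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3])
pdf = pd.DataFrame({'x': [0, 1, 2, 3], 'y': [0, -1, -2, -3]})

# One dot product per column:
#   x -> 0*0 + 1*1 + 2*2 + 3*3 = 14
#   y -> 0*0 + 1*(-1) + 2*(-2) + 3*(-3) = -14
out = s.dot(pdf)
```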