pyspark.pandas.Series.str.split
str.split(pat: Optional[str] = None, n: int = -1, expand: bool = False) → Union[pyspark.pandas.series.Series, pyspark.pandas.frame.DataFrame]
- Split strings around given separator/delimiter.
- Splits the string in the Series from the beginning, at the specified delimiter string. Equivalent to str.split().
- Parameters
- pat : str, optional
- String or regular expression to split on. If not specified, split on whitespace. 
- n : int, default -1 (all)
- Limit on the number of splits in the output. None, 0 and -1 are all interpreted as "return all splits".
- expand : bool, default False
- Expand the split strings into separate columns.
- If True, n must be a positive integer, and a DataFrame is returned, expanding dimensionality.
- If False, a Series containing lists of strings is returned.
 
 
- Returns
- Series, DataFrame
- Type matches caller unless expand=True (see Notes). 
 
- See also
- str.rsplit
- Splits string around given separator/delimiter, starting from the right. 
- str.join
- Join lists contained as elements in the Series/Index with passed delimiter. 
- Notes
- The handling of the n keyword depends on the number of found splits:
- If found splits > n, make first n splits only
- If found splits <= n, make all splits
- If for a certain row the number of found splits < n, append None for padding up to n if expand=True
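- The rules above can be modelled for a single row in plain Python. This is only an illustrative sketch, not the library's implementation: split_row is a hypothetical helper, it handles only the default whitespace delimiter, and raising ValueError when expand=True is combined with a non-positive n is this sketch's own choice. The padded width of n + 1 follows the Notes on expand=True.

```python
from typing import List, Optional


def split_row(value: Optional[str], n: int = -1,
              expand: bool = False) -> Optional[List[Optional[str]]]:
    """Hypothetical single-row model of the n / expand rules (whitespace only)."""
    if expand and n <= 0:
        # Assumption of this sketch: reject non-positive n when expanding.
        raise ValueError("expand=True requires a positive n")
    if value is None:
        # Missing values propagate: None, or a full row of None when expanding.
        return [None] * (n + 1) if expand else None
    # Python's str.split treats maxsplit=-1 as "no limit"; 0 is mapped to -1
    # so that it also means "return all splits".
    parts: List[Optional[str]] = list(value.split(None, n if n > 0 else -1))
    if expand:
        # Pad short rows with None so every row has n + 1 entries.
        parts += [None] * (n + 1 - len(parts))
    return parts


print(split_row("this is a regular sentence", n=2))
# ['this', 'is', 'a regular sentence']
print(split_row("a b", n=4, expand=True))
# ['a', 'b', None, None, None]
```

- Padding to a fixed width is what allows every row to fit the same set of columns, regardless of how many splits a particular row actually produced.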
- If using expand=True, Series callers return DataFrame objects with n + 1 columns.
- Note
- Even if n is much larger than found splits, the number of columns does NOT shrink, unlike in pandas.
- Examples
- >>> import numpy as np
  >>> import pyspark.pandas as ps
  >>> s = ps.Series(["this is a regular sentence",
  ...                "https://docs.python.org/3/tutorial/index.html",
  ...                np.nan])
- In the default setting, the string is split by whitespace.
- >>> s.str.split()
  0                   [this, is, a, regular, sentence]
  1    [https://docs.python.org/3/tutorial/index.html]
  2                                               None
  dtype: object
- Without the n parameter, the outputs of rsplit and split are identical.
- >>> s.str.rsplit()
  0                   [this, is, a, regular, sentence]
  1    [https://docs.python.org/3/tutorial/index.html]
  2                                               None
  dtype: object
- The n parameter can be used to limit the number of splits on the delimiter. The outputs of split and rsplit are different.
- >>> s.str.split(n=2)
  0                     [this, is, a regular sentence]
  1    [https://docs.python.org/3/tutorial/index.html]
  2                                               None
  dtype: object
- >>> s.str.rsplit(n=2)
  0                     [this is a, regular, sentence]
  1    [https://docs.python.org/3/tutorial/index.html]
  2                                               None
  dtype: object
- The pat parameter can be used to split by other characters.
- >>> s.str.split(pat="/")
  0                         [this is a regular sentence]
  1    [https:, , docs.python.org, 3, tutorial, index...
  2                                                 None
  dtype: object
- When using expand=True, the split elements will expand out into separate columns. If NaN is present, it is propagated throughout the columns during the split.
- >>> s.str.split(n=4, expand=True)
                                                 0     1     2        3         4
  0                                           this    is     a  regular  sentence
  1  https://docs.python.org/3/tutorial/index.html  None  None     None      None
  2                                           None  None  None     None      None
- For slightly more complex use cases like splitting the html document name from a url, a combination of parameter settings can be used.
- >>> s.str.rsplit("/", n=1, expand=True)
                                      0           1
  0          this is a regular sentence        None
  1  https://docs.python.org/3/tutorial  index.html
  2                                None        None
- Remember to escape special characters when explicitly using regular expressions.
- >>> s = ps.Series(["1+1=2"])
  >>> s.str.split(r"\+|=", n=2, expand=True)
     0  1  2
  0  1  1  2
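- When the delimiters are literal strings rather than a hand-written pattern, the escaping can be delegated to Python's re.escape. The sketch below builds an alternation from a list of literal delimiters and checks it with the standard library's re.split. It assumes that pat is interpreted with Python-style regular expression syntax; the delimiters list and the variable names are illustrative only.

```python
import re

# Literal delimiters to split on; "+" is a regex metacharacter.
delimiters = ["+", "="]

# re.escape neutralizes metacharacters so each delimiter matches literally.
pattern = "|".join(re.escape(d) for d in delimiters)

# Checked here with the standard library only; maxsplit=2 plays the role of n=2.
parts = re.split(pattern, "1+1=2", maxsplit=2)
print(parts)  # ['1', '1', '2']
```

- Under the same assumption, the resulting pattern string could be passed as pat in place of a manually escaped expression.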