pyspark.sql.functions.sentences

pyspark.sql.functions.sentences(string: ColumnOrName, language: Optional[ColumnOrName] = None, country: Optional[ColumnOrName] = None) → pyspark.sql.column.Column

Splits a string into an array of sentences, where each sentence is an array of words. The ‘language’ and ‘country’ arguments are optional; if they are omitted, the default locale is used.

New in version 3.2.0.

Parameters
string : Column or str

the string to be split into sentences

language : Column or str, optional

the language of the locale (for example, "en")

country : Column or str, optional

the country of the locale (for example, "US")

Examples

>>> from pyspark.sql.functions import lit, sentences
>>> df = spark.createDataFrame([["This is an example sentence."]], ["string"])
>>> df.select(sentences(df.string, lit("en"), lit("US"))).show(truncate=False)
+-----------------------------------+
|sentences(string, en, US)          |
+-----------------------------------+
|[[This, is, an, example, sentence]]|
+-----------------------------------+
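
The nested result (an outer array of sentences, each an inner array of words with punctuation dropped) can be easier to picture outside Spark. The following is a rough pure-Python sketch of that shape only. The helper name `sentences_shape_sketch` and its naive regex rules are assumptions made for illustration; they are not how Spark segments text, and Spark's locale-aware behaviour may differ on abbreviations, numbers, and non-English input.

```python
import re


def sentences_shape_sketch(text):
    """Hypothetical stand-in that mimics the nested result shape.

    Assumption: a sentence ends at '.', '!' or '?', and a word is a run of
    word characters or apostrophes. This is NOT Spark's segmentation logic.
    """
    result = []
    # Split on sentence-ending punctuation; the punctuation itself is dropped.
    for chunk in re.split(r"[.!?]+", text):
        words = re.findall(r"[\w']+", chunk)
        if words:  # skip empty fragments, e.g. the one after the final period
            result.append(words)
    return result


print(sentences_shape_sketch("This is an example sentence."))
print(sentences_shape_sketch("Hello world. How are you?"))
```

For the single-sentence input, the sketch yields one inner list, matching the one-element outer array in the table above. The two-sentence input yields two inner lists.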