pyspark.sql.functions.sentences
- pyspark.sql.functions.sentences(string, language=None, country=None)
- Splits a string into arrays of sentences, where each sentence is an array of words. The language and country arguments are optional.
  When they are omitted:
  1. If both are omitted, Locale.ROOT, i.e. locale(language='', country=''), is used. Locale.ROOT is regarded as the base locale of all locales and serves as the language- and country-neutral locale for locale-sensitive operations.
  2. If only the country is omitted, locale(language, country='') is used.
  When they are null:
  1. If both are null, Locale.US, i.e. locale(language='en', country='US'), is used.
  2. If the language is null and the country is not null, Locale.US, i.e. locale(language='en', country='US'), is used.
  3. If the language is not null and the country is null, locale(language) is used.
  4. If neither is null, locale(language, country) is used.
- New in version 3.2.0.
- Changed in version 3.4.0: Supports Spark Connect.
- Changed in version 4.0.0: Supports sentences(string, language).
- Parameters
- Returns
- Column
- arrays of split sentences. 
 
- Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("This is an example sentence.", )], ["s"])
>>> df.select("*", sf.sentences(df.s, sf.lit("en"), sf.lit("US"))).show(truncate=False)
+----------------------------+-----------------------------------+
|s                           |sentences(s, en, US)               |
+----------------------------+-----------------------------------+
|This is an example sentence.|[[This, is, an, example, sentence]]|
+----------------------------+-----------------------------------+

>>> df.select("*", sf.sentences(df.s, sf.lit("en"))).show(truncate=False)
+----------------------------+-----------------------------------+
|s                           |sentences(s, en, )                 |
+----------------------------+-----------------------------------+
|This is an example sentence.|[[This, is, an, example, sentence]]|
+----------------------------+-----------------------------------+

>>> df.select("*", sf.sentences(df.s)).show(truncate=False)
+----------------------------+-----------------------------------+
|s                           |sentences(s, , )                   |
+----------------------------+-----------------------------------+
|This is an example sentence.|[[This, is, an, example, sentence]]|
+----------------------------+-----------------------------------+
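
The omitted/null locale rules in the description can be easier to follow as a small decision function. The sketch below is plain Python and does not call Spark. `resolve_locale` is a hypothetical helper introduced only for illustration, not part of PySpark, and it is not Spark's implementation. It assumes an omitted argument behaves like the empty string, as the description states, and it represents a language-only locale as a pair with an empty country.

```python
from typing import Optional, Tuple


def resolve_locale(language: Optional[str] = "",
                   country: Optional[str] = "") -> Tuple[str, str]:
    """Hypothetical helper: return the (language, country) pair that the
    documented rules select. Omitted arguments default to "" and None
    stands in for a SQL null."""
    if language is None:
        # Null language: Locale.US, whether or not the country is null.
        return ("en", "US")
    if country is None:
        # Non-null language, null country: language-only locale.
        return (language, "")
    # Neither is null (this also covers omitted arguments, which are "").
    return (language, country)


print(resolve_locale())            # ('', '')     both omitted: Locale.ROOT
print(resolve_locale("fr"))        # ('fr', '')   country omitted
print(resolve_locale(None, None))  # ('en', 'US') both null: Locale.US
print(resolve_locale(None, "FR"))  # ('en', 'US') null language: Locale.US
print(resolve_locale("fr", None))  # ('fr', '')   null country
print(resolve_locale("fr", "FR"))  # ('fr', 'FR') neither null
```

Note the asymmetry this makes visible: omitting both arguments selects the neutral Locale.ROOT, while passing null for both selects Locale.US.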