Class DataFrameNaFunctions

Object
org.apache.spark.sql.api.DataFrameNaFunctions<Dataset>
org.apache.spark.sql.DataFrameNaFunctions

public final class DataFrameNaFunctions extends DataFrameNaFunctions<Dataset>
Functionality for working with missing data in DataFrames.

Since:
1.3.1
  • Method Summary

    Modifier and Type
    Method
    Description
    drop()
    Returns a new DataFrame that drops rows containing any null or NaN values.
    drop(int minNonNulls)
    Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values.
    drop(int minNonNulls, String[] cols)
    Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.
    drop(int minNonNulls, scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.
    drop(String how)
    Returns a new DataFrame that drops rows containing null or NaN values.
    drop(String[] cols)
    Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.
    drop(String how, String[] cols)
    Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.
    drop(String how, scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.
    drop(scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.
    fill(boolean value)
    Returns a new DataFrame that replaces null values in boolean columns with value.
    fill(boolean value, String[] cols)
    Returns a new DataFrame that replaces null values in specified boolean columns.
    fill(boolean value, scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that replaces null values in specified boolean columns.
    fill(double value)
    Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
    fill(double value, String[] cols)
    Returns a new DataFrame that replaces null or NaN values in specified numeric columns.
    fill(double value, scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that replaces null or NaN values in specified numeric columns.
    fill(long value)
    Returns a new DataFrame that replaces null or NaN values in numeric columns with value.
    fill(long value, String[] cols)
    Returns a new DataFrame that replaces null or NaN values in specified numeric columns.
    fill(long value, scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that replaces null or NaN values in specified numeric columns.
    fill(String value)
    Returns a new DataFrame that replaces null values in string columns with value.
    fill(String value, String[] cols)
    Returns a new DataFrame that replaces null values in specified string columns.
    fill(String value, scala.collection.immutable.Seq<String> cols)
    (Scala-specific) Returns a new DataFrame that replaces null values in specified string columns.
    fill(Map<String,Object> valueMap)
    Returns a new DataFrame that replaces null values in the columns named by the keys of valueMap.
    fill(scala.collection.immutable.Map<String,Object> valueMap)
    (Scala-specific) Returns a new DataFrame that replaces null values in the columns named by the keys of valueMap.
    <T> Dataset<Row>
    replace(String[] cols, Map<T,T> replacement)
    Replaces values matching keys in replacement map with the corresponding values.
    <T> Dataset<Row>
    replace(String col, Map<T,T> replacement)
    Replaces values matching keys in replacement map with the corresponding values.
    <T> Dataset<Row>
    replace(String col, scala.collection.immutable.Map<T,T> replacement)
    (Scala-specific) Replaces values matching keys in replacement map.
    <T> Dataset<Row>
    replace(scala.collection.immutable.Seq<String> cols, scala.collection.immutable.Map<T,T> replacement)
    (Scala-specific) Replaces values matching keys in replacement map.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • drop

      public Dataset<Row> drop()
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that drops rows containing any null or NaN values.

      Overrides:
      drop in class DataFrameNaFunctions<Dataset>
      Returns:
      a new DataFrame without the dropped rows
    • drop

      public Dataset<Row> drop(String[] cols)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.

      Overrides:
      drop in class DataFrameNaFunctions<Dataset>
      Parameters:
      cols - names of the columns to check for null or NaN values
      Returns:
      a new DataFrame without the dropped rows
    • drop

      public Dataset<Row> drop(scala.collection.immutable.Seq<String> cols)
      Description copied from class: DataFrameNaFunctions
      (Scala-specific) Returns a new DataFrame that drops rows containing any null or NaN values in the specified columns.

      Overrides:
      drop in class DataFrameNaFunctions<Dataset>
      Parameters:
      cols - names of the columns to check for null or NaN values
      Returns:
      a new DataFrame without the dropped rows
    • drop

      public Dataset<Row> drop(String how, String[] cols)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.

      If how is "any", then drop rows containing any null or NaN values in the specified columns. If how is "all", then drop rows only if every specified column is null or NaN for that row.
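The "any" and "all" rules can be modeled in plain Java without Spark. This is a minimal sketch of the rule as described above, not Spark's implementation; the class `DropHowSketch` and its methods are hypothetical names, and rejecting other values of `how` with an exception is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the "any"/"all" rule; not Spark's implementation.
final class DropHowSketch {
    // A cell counts as missing when it is null or a floating-point NaN.
    static boolean isMissing(Object cell) {
        if (cell == null) return true;
        if (cell instanceof Double) return ((Double) cell).isNaN();
        if (cell instanceof Float) return ((Float) cell).isNaN();
        return false;
    }

    // Returns true when a row, restricted to the checked cells, would be dropped.
    static boolean shouldDrop(String how, List<Object> cells) {
        long missing = cells.stream().filter(DropHowSketch::isMissing).count();
        switch (how) {
            case "any": return missing > 0;             // at least one missing cell
            case "all": return missing == cells.size(); // every checked cell missing
            default:
                // Assumption of this sketch: other values are rejected.
                throw new IllegalArgumentException("how must be \"any\" or \"all\": " + how);
        }
    }

    public static void main(String[] args) {
        List<Object> partial = Arrays.asList("alice", null, 1.5);
        List<Object> empty = Arrays.asList(null, Double.NaN);
        System.out.println(shouldDrop("any", partial)); // true
        System.out.println(shouldDrop("all", partial)); // false
        System.out.println(shouldDrop("all", empty));   // true
    }
}
```

A row with some missing cells is dropped under "any" but kept under "all"; only a row whose checked cells are all missing is dropped under both.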

      Overrides:
      drop in class DataFrameNaFunctions<Dataset>
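      Parameters:
      how - either "any" (drop a row if any specified column is null or NaN) or "all" (drop a row only if every specified column is null or NaN)
      cols - names of the columns to check for null or NaN values
      Returns:
      a new DataFrame without the dropped rows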
    • drop

      public Dataset<Row> drop(int minNonNulls, String[] cols)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.
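The threshold rule can be modeled in plain Java as follows. This is a sketch of the rule as worded above, not Spark's implementation; `MinNonNullsSketch`, `isPresent`, and `keepRow` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the minNonNulls threshold; not Spark's implementation.
final class MinNonNullsSketch {
    // A cell is usable when it is neither null nor a floating-point NaN.
    static boolean isPresent(Object cell) {
        if (cell == null) return false;
        if (cell instanceof Double) return !((Double) cell).isNaN();
        if (cell instanceof Float) return !((Float) cell).isNaN();
        return true;
    }

    // A row is kept when it has at least minNonNulls usable cells among those checked.
    static boolean keepRow(int minNonNulls, List<Object> cells) {
        long present = cells.stream().filter(MinNonNullsSketch::isPresent).count();
        return present >= minNonNulls;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.asList("alice", null, Double.NaN, 42L);
        System.out.println(keepRow(2, row)); // true: "alice" and 42 are usable
        System.out.println(keepRow(3, row)); // false: only two usable values
    }
}
```

By these definitions, setting the threshold to the number of checked columns gives the same result as how = "any", and a threshold of 1 gives the same result as how = "all".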

      Overrides:
      drop in class DataFrameNaFunctions<Dataset>
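      Parameters:
      minNonNulls - minimum number of non-null and non-NaN values a row must contain in the specified columns to be kept
      cols - names of the columns to check for null or NaN values
      Returns:
      a new DataFrame without the dropped rows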
    • drop

      public Dataset<Row> drop(String how)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that drops rows containing null or NaN values.

      If how is "any", then drop rows containing any null or NaN values. If how is "all", then drop rows only if every column is null or NaN for that row.

      Overrides:
      drop in class DataFrameNaFunctions<Dataset>
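      Parameters:
      how - either "any" (drop a row if any column is null or NaN) or "all" (drop a row only if every column is null or NaN)
      Returns:
      a new DataFrame without the dropped rows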
    • drop

      public Dataset<Row> drop(String how, scala.collection.immutable.Seq<String> cols)
      Description copied from class: DataFrameNaFunctions
      (Scala-specific) Returns a new DataFrame that drops rows containing null or NaN values in the specified columns.

      If how is "any", then drop rows containing any null or NaN values in the specified columns. If how is "all", then drop rows only if every specified column is null or NaN for that row.

      Overrides:
      drop in class DataFrameNaFunctions<Dataset>
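      Parameters:
      how - either "any" (drop a row if any specified column is null or NaN) or "all" (drop a row only if every specified column is null or NaN)
      cols - names of the columns to check for null or NaN values
      Returns:
      a new DataFrame without the dropped rows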
    • drop

      public Dataset<Row> drop(int minNonNulls)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values.

      Overrides:
      drop in class DataFrameNaFunctions<Dataset>
      Parameters:
      minNonNulls - minimum number of non-null and non-NaN values a row must contain to be kept
      Returns:
      a new DataFrame without the dropped rows
    • drop

      public Dataset<Row> drop(int minNonNulls, scala.collection.immutable.Seq<String> cols)
      Description copied from class: DataFrameNaFunctions
      (Scala-specific) Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null and non-NaN values in the specified columns.

      Overrides:
      drop in class DataFrameNaFunctions<Dataset>
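      Parameters:
      minNonNulls - minimum number of non-null and non-NaN values a row must contain in the specified columns to be kept
      cols - names of the columns to check for null or NaN values
      Returns:
      a new DataFrame without the dropped rows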
    • fill

      public Dataset<Row> fill(long value)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that replaces null or NaN values in numeric columns with value.

      Specified by:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      value - replacement for null or NaN values in numeric columns
      Returns:
      a new DataFrame with the missing values replaced
    • fill

      public Dataset<Row> fill(double value)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that replaces null or NaN values in numeric columns with value.

      Specified by:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      value - replacement for null or NaN values in numeric columns
      Returns:
      a new DataFrame with the missing values replaced
    • fill

      public Dataset<Row> fill(String value)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that replaces null values in string columns with value.

      Specified by:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      value - replacement for null values in string columns
      Returns:
      a new DataFrame with the missing values replaced
    • fill

      public Dataset<Row> fill(long value, scala.collection.immutable.Seq<String> cols)
      Description copied from class: DataFrameNaFunctions
      (Scala-specific) Returns a new DataFrame that replaces null or NaN values in specified numeric columns. If a specified column is not a numeric column, it is ignored.

      Specified by:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      value - replacement for null or NaN values
      cols - names of the columns to fill; columns that are not numeric are ignored
      Returns:
      a new DataFrame with the missing values replaced
    • fill

      public Dataset<Row> fill(double value, scala.collection.immutable.Seq<String> cols)
      Description copied from class: DataFrameNaFunctions
      (Scala-specific) Returns a new DataFrame that replaces null or NaN values in specified numeric columns. If a specified column is not a numeric column, it is ignored.

      Specified by:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      value - replacement for null or NaN values
      cols - names of the columns to fill; columns that are not numeric are ignored
      Returns:
      a new DataFrame with the missing values replaced
    • fill

      public Dataset<Row> fill(String value, scala.collection.immutable.Seq<String> cols)
      Description copied from class: DataFrameNaFunctions
      (Scala-specific) Returns a new DataFrame that replaces null values in specified string columns. If a specified column is not a string column, it is ignored.

      Specified by:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      value - replacement for null values
      cols - names of the columns to fill; columns that are not string columns are ignored
      Returns:
      a new DataFrame with the missing values replaced
    • fill

      public Dataset<Row> fill(boolean value)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that replaces null values in boolean columns with value.

      Specified by:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      value - replacement for null values in boolean columns
      Returns:
      a new DataFrame with the missing values replaced
    • fill

      public Dataset<Row> fill(boolean value, scala.collection.immutable.Seq<String> cols)
      Description copied from class: DataFrameNaFunctions
      (Scala-specific) Returns a new DataFrame that replaces null values in specified boolean columns. If a specified column is not a boolean column, it is ignored.

      Specified by:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      value - replacement for null values
      cols - names of the columns to fill; columns that are not boolean columns are ignored
      Returns:
      a new DataFrame with the missing values replaced
    • fill

      public Dataset<Row> fill(long value, String[] cols)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that replaces null or NaN values in specified numeric columns. If a specified column is not a numeric column, it is ignored.

      Overrides:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      value - replacement for null or NaN values
      cols - names of the columns to fill; columns that are not numeric are ignored
      Returns:
      a new DataFrame with the missing values replaced
    • fill

      public Dataset<Row> fill(double value, String[] cols)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that replaces null or NaN values in specified numeric columns. If a specified column is not a numeric column, it is ignored.
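The "non-numeric columns are ignored" rule can be modeled on a single row in plain Java. This is a sketch only, not Spark's implementation: `FillNumericSketch`, `ColType`, and the map-based row and schema are hypothetical stand-ins for a real DataFrame, and silently skipping a column name that is missing from the schema is an assumption of the sketch rather than documented Spark behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of fill(double value, String[] cols) on one row; not Spark's implementation.
final class FillNumericSketch {
    enum ColType { NUMERIC, STRING, BOOLEAN }

    static Map<String, Object> fill(double value, List<String> cols,
                                    Map<String, ColType> schema, Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>(row); // the input row is left unchanged
        for (String col : cols) {
            // Non-numeric columns are ignored (so are unknown names, by assumption).
            if (schema.get(col) != ColType.NUMERIC) continue;
            Object cell = out.get(col);
            boolean missing = cell == null
                    || (cell instanceof Double && ((Double) cell).isNaN());
            if (missing) out.put(col, value);
        }
        return out;
    }
}
```

With a schema of one numeric column "age" and one string column "name", filling both with 0.0 changes a null "age" to 0.0 and leaves a null "name" untouched.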

      Overrides:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      value - replacement for null or NaN values
      cols - names of the columns to fill; columns that are not numeric are ignored
      Returns:
      a new DataFrame with the missing values replaced
    • fill

      public Dataset<Row> fill(String value, String[] cols)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that replaces null values in specified string columns. If a specified column is not a string column, it is ignored.

      Overrides:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      value - replacement for null values
      cols - names of the columns to fill; columns that are not string columns are ignored
      Returns:
      a new DataFrame with the missing values replaced
    • fill

      public Dataset<Row> fill(boolean value, String[] cols)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that replaces null values in specified boolean columns. If a specified column is not a boolean column, it is ignored.

      Overrides:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      value - replacement for null values
      cols - names of the columns to fill; columns that are not boolean columns are ignored
      Returns:
      a new DataFrame with the missing values replaced
    • fill

      public Dataset<Row> fill(Map<String,Object> valueMap)
      Description copied from class: DataFrameNaFunctions
      Returns a new DataFrame that replaces null values.

      The key of the map is the column name, and the value of the map is the replacement value. The value must be one of the following types: Integer, Long, Float, Double, String, Boolean. Replacement values are cast to the column data type.

      For example, the following replaces null values in column "A" with string "unknown", and null values in column "B" with numeric value 1.0.

      
         import com.google.common.collect.ImmutableMap;
         df.na().fill(ImmutableMap.of("A", "unknown", "B", 1.0));
       

      Overrides:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      valueMap - map from column name to the replacement value for that column
      Returns:
      a new DataFrame with the null values replaced
    • fill

      public Dataset<Row> fill(scala.collection.immutable.Map<String,Object> valueMap)
      Description copied from class: DataFrameNaFunctions
      (Scala-specific) Returns a new DataFrame that replaces null values.

      The key of the map is the column name, and the value of the map is the replacement value. The value must be one of the following types: Int, Long, Float, Double, String, Boolean. Replacement values are cast to the column data type.

      For example, the following replaces null values in column "A" with string "unknown", and null values in column "B" with numeric value 1.0.

      
         df.na.fill(Map(
           "A" -> "unknown",
           "B" -> 1.0
         ))
       

      Overrides:
      fill in class DataFrameNaFunctions<Dataset>
      Parameters:
      valueMap - map from column name to the replacement value for that column
      Returns:
      a new DataFrame with the null values replaced
    • replace

      public <T> Dataset<Row> replace(String col, scala.collection.immutable.Map<T,T> replacement)
      Description copied from class: DataFrameNaFunctions
      (Scala-specific) Replaces values matching keys in replacement map.

      
         // Replaces all occurrences of 1.0 with 2.0 in column "height".
         df.na.replace("height", Map(1.0 -> 2.0));
      
         // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "name".
         df.na.replace("name", Map("UNKNOWN" -> "unnamed"));
      
         // Replaces all occurrences of "UNKNOWN" with "unnamed" in all string columns.
         df.na.replace("*", Map("UNKNOWN" -> "unnamed"));
       

      Specified by:
      replace in class DataFrameNaFunctions<Dataset>
      Parameters:
      col - name of the column to apply the value replacement. If col is "*", replacement is applied on all string, numeric or boolean columns.
      replacement - value replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. The map value can have nulls.

      Returns:
      a new DataFrame with the matching values replaced
    • replace

      public <T> Dataset<Row> replace(scala.collection.immutable.Seq<String> cols, scala.collection.immutable.Map<T,T> replacement)
      Description copied from class: DataFrameNaFunctions
      (Scala-specific) Replaces values matching keys in replacement map.

      
         // Replaces all occurrences of 1.0 with 2.0 in columns "height" and "weight".
         df.na.replace("height" :: "weight" :: Nil, Map(1.0 -> 2.0));
      
         // Replaces all occurrences of "UNKNOWN" with "unnamed" in columns "firstname" and "lastname".
         df.na.replace("firstname" :: "lastname" :: Nil, Map("UNKNOWN" -> "unnamed"));
       

      Specified by:
      replace in class DataFrameNaFunctions<Dataset>
      Parameters:
      cols - list of columns to apply the value replacement. If col is "*", replacement is applied on all string, numeric or boolean columns.
      replacement - value replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. The map value can have nulls.

      Returns:
      a new DataFrame with the matching values replaced
    • replace

      public <T> Dataset<Row> replace(String col, Map<T,T> replacement)
      Description copied from class: DataFrameNaFunctions
      Replaces values matching keys in replacement map with the corresponding values.
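The matching rule can be modeled on a single column in plain Java. This is a sketch, not Spark's implementation: `ReplaceSketch` is a hypothetical name, a list stands in for a column, and leaving null cells untouched is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of value replacement on one column; not Spark's implementation.
final class ReplaceSketch {
    static <T> List<T> replace(List<T> column, Map<T, T> replacement) {
        List<T> out = new ArrayList<>(column.size());
        for (T cell : column) {
            // containsKey is used rather than a null check on get(),
            // because a map value may itself be null.
            boolean matches = cell != null && replacement.containsKey(cell);
            out.add(matches ? replacement.get(cell) : cell);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> names = new HashMap<>();
        names.put("UNKNOWN", "unnamed");
        System.out.println(replace(Arrays.asList("UNKNOWN", "alice"), names)); // [unnamed, alice]
    }
}
```

Keys and values share the type parameter `T`, mirroring the `Map<T,T>` in the method signature.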

      
         import com.google.common.collect.ImmutableMap;
      
         // Replaces all occurrences of 1.0 with 2.0 in column "height".
         df.na().replace("height", ImmutableMap.of(1.0, 2.0));
      
         // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "name".
         df.na().replace("name", ImmutableMap.of("UNKNOWN", "unnamed"));
      
         // Replaces all occurrences of "UNKNOWN" with "unnamed" in all string columns.
         df.na().replace("*", ImmutableMap.of("UNKNOWN", "unnamed"));
       

      Overrides:
      replace in class DataFrameNaFunctions<Dataset>
      Parameters:
      col - name of the column to apply the value replacement. If col is "*", replacement is applied on all string, numeric or boolean columns.
      replacement - value replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. The map value can have nulls.

      Returns:
      a new DataFrame with the matching values replaced
    • replace

      public <T> Dataset<Row> replace(String[] cols, Map<T,T> replacement)
      Description copied from class: DataFrameNaFunctions
      Replaces values matching keys in replacement map with the corresponding values.

      
         import com.google.common.collect.ImmutableMap;
      
         // Replaces all occurrences of 1.0 with 2.0 in columns "height" and "weight".
         df.na().replace(new String[] {"height", "weight"}, ImmutableMap.of(1.0, 2.0));
      
         // Replaces all occurrences of "UNKNOWN" with "unnamed" in columns "firstname" and "lastname".
         df.na().replace(new String[] {"firstname", "lastname"}, ImmutableMap.of("UNKNOWN", "unnamed"));
       

      Overrides:
      replace in class DataFrameNaFunctions<Dataset>
      Parameters:
      cols - list of columns to apply the value replacement. If col is "*", replacement is applied on all string, numeric or boolean columns.
      replacement - value replacement map. Key and value of replacement map must have the same type, and can only be doubles, strings or booleans. The map value can have nulls.

      Returns:
      a new DataFrame with the matching values replaced