org.apache.spark.sql
Class DataFrameNaFunctions

Object
  extended by org.apache.spark.sql.DataFrameNaFunctions

public final class DataFrameNaFunctions
extends Object

:: Experimental :: Functionality for working with missing data in DataFrames.

Since:
1.3.1

Method Summary
 DataFrame drop()
          Returns a new DataFrame that drops rows containing any null values.
 DataFrame drop(int minNonNulls)
          Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null values.
 DataFrame drop(int minNonNulls, scala.collection.Seq<String> cols)
          (Scala-specific) Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null values in the specified columns.
 DataFrame drop(int minNonNulls, String[] cols)
          Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null values in the specified columns.
 DataFrame drop(scala.collection.Seq<String> cols)
          (Scala-specific) Returns a new DataFrame that drops rows containing any null values in the specified columns.
 DataFrame drop(String how)
          Returns a new DataFrame that drops rows containing null values.
 DataFrame drop(String[] cols)
          Returns a new DataFrame that drops rows containing any null values in the specified columns.
 DataFrame drop(String how, scala.collection.Seq<String> cols)
          (Scala-specific) Returns a new DataFrame that drops rows containing null values in the specified columns.
 DataFrame drop(String how, String[] cols)
          Returns a new DataFrame that drops rows containing null values in the specified columns.
 DataFrame fill(double value)
          Returns a new DataFrame that replaces null values in numeric columns with value.
 DataFrame fill(double value, scala.collection.Seq<String> cols)
          (Scala-specific) Returns a new DataFrame that replaces null values in specified numeric columns.
 DataFrame fill(double value, String[] cols)
          Returns a new DataFrame that replaces null values in specified numeric columns.
 DataFrame fill(java.util.Map<String,Object> valueMap)
          Returns a new DataFrame that replaces null values.
 DataFrame fill(scala.collection.immutable.Map<String,Object> valueMap)
          (Scala-specific) Returns a new DataFrame that replaces null values.
 DataFrame fill(String value)
          Returns a new DataFrame that replaces null values in string columns with value.
 DataFrame fill(String value, scala.collection.Seq<String> cols)
          (Scala-specific) Returns a new DataFrame that replaces null values in specified string columns.
 DataFrame fill(String value, String[] cols)
          Returns a new DataFrame that replaces null values in specified string columns.
<T> DataFrame
replace(scala.collection.Seq<String> cols, scala.collection.immutable.Map<T,T> replacement)
          (Scala-specific) Replaces values matching keys in replacement map with the corresponding values.
<T> DataFrame
replace(String[] cols, java.util.Map<T,T> replacement)
          Replaces values matching keys in replacement map with the corresponding values.
<T> DataFrame
replace(String col, java.util.Map<T,T> replacement)
          Replaces values matching keys in replacement map with the corresponding values.
<T> DataFrame
replace(String col, scala.collection.immutable.Map<T,T> replacement)
          (Scala-specific) Replaces values matching keys in replacement map with the corresponding values.
 
Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

drop

public DataFrame drop()
Returns a new DataFrame that drops rows containing any null values.

Returns:
a new DataFrame without the rows that contain at least one null value
Since:
1.3.1

drop

public DataFrame drop(String how)
Returns a new DataFrame that drops rows containing null values.

If how is "any", then drop rows containing any null values. If how is "all", then drop rows only if every column is null for that row.
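
The "any" and "all" rules above can be illustrated with a small plain-Java sketch. It does not call Spark: rows are modelled as Object[] arrays with null for missing cells, and the class name NaDropHowSketch and its drop method are hypothetical names chosen for illustration. Rejecting any other value of how is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: models the "any"/"all" rule on in-memory rows, not on a Spark DataFrame.
final class NaDropHowSketch {
    // Returns the rows that survive; each row is an Object[] whose missing cells are null.
    static List<Object[]> drop(String how, List<Object[]> rows) {
        // Assumption of this sketch: only "any" and "all" are accepted.
        if (!how.equals("any") && !how.equals("all")) {
            throw new IllegalArgumentException("how must be \"any\" or \"all\": " + how);
        }
        List<Object[]> kept = new ArrayList<>();
        for (Object[] row : rows) {
            int nulls = 0;
            for (Object cell : row) {
                if (cell == null) {
                    nulls++;
                }
            }
            // "any": drop when at least one cell is null; "all": drop only when every cell is null.
            boolean dropRow = how.equals("any") ? nulls > 0 : nulls == row.length;
            if (!dropRow) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

A row such as {null, 2.0} is dropped under "any" but kept under "all", because only one of its two cells is null.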

Parameters:
how - either "any" or "all", as described above
Returns:
a new DataFrame without the rows selected by how
Since:
1.3.1

drop

public DataFrame drop(String[] cols)
Returns a new DataFrame that drops rows containing any null values in the specified columns.

Parameters:
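cols - names of the columns to check for null values
Returns:
a new DataFrame without the rows that contain a null value in any of the specified columns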
Since:
1.3.1

drop

public DataFrame drop(scala.collection.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that drops rows containing any null values in the specified columns.

Parameters:
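cols - names of the columns to check for null values
Returns:
a new DataFrame without the rows that contain a null value in any of the specified columns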
Since:
1.3.1

drop

public DataFrame drop(String how,
                      String[] cols)
Returns a new DataFrame that drops rows containing null values in the specified columns.

If how is "any", then drop rows containing any null values in the specified columns. If how is "all", then drop rows only if every specified column is null for that row.

Parameters:
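how - either "any" or "all", as described above
cols - names of the columns to check for null values
Returns:
a new DataFrame without the rows selected by how in the specified columns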
Since:
1.3.1

drop

public DataFrame drop(String how,
                      scala.collection.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that drops rows containing null values in the specified columns.

If how is "any", then drop rows containing any null values in the specified columns. If how is "all", then drop rows only if every specified column is null for that row.

Parameters:
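how - either "any" or "all", as described above
cols - names of the columns to check for null values
Returns:
a new DataFrame without the rows selected by how in the specified columns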
Since:
1.3.1

drop

public DataFrame drop(int minNonNulls)
Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null values.

Parameters:
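minNonNulls - minimum number of non-null values a row must contain to be kept
Returns:
a new DataFrame containing only the rows with at least minNonNulls non-null values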
Since:
1.3.1

drop

public DataFrame drop(int minNonNulls,
                      String[] cols)
Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null values in the specified columns.
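
The threshold rule can be sketched in plain Java as follows. This is only a model of the rule stated above, not Spark code: rows are modelled as maps from column name to value, and the class name NaDropThresholdSketch and its drop method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: models the minNonNulls rule on in-memory rows, not on a Spark DataFrame.
final class NaDropThresholdSketch {
    // Keeps a row only if at least minNonNulls of the listed columns hold a non-null value.
    static List<Map<String, Object>> drop(int minNonNulls, List<String> cols,
                                          List<Map<String, Object>> rows) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            int nonNulls = 0;
            for (String col : cols) {
                if (row.get(col) != null) {
                    nonNulls++;
                }
            }
            if (nonNulls >= minNonNulls) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

By these definitions, setting minNonNulls to the number of listed columns drops every row with any null in those columns, and setting it to 1 drops only rows in which all of those columns are null.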

Parameters:
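minNonNulls - minimum number of non-null values a row must contain in the specified columns to be kept
cols - names of the columns in which non-null values are counted
Returns:
a new DataFrame containing only the rows with at least minNonNulls non-null values in the specified columns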
Since:
1.3.1

drop

public DataFrame drop(int minNonNulls,
                      scala.collection.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that drops rows containing fewer than minNonNulls non-null values in the specified columns.

Parameters:
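minNonNulls - minimum number of non-null values a row must contain in the specified columns to be kept
cols - names of the columns in which non-null values are counted
Returns:
a new DataFrame containing only the rows with at least minNonNulls non-null values in the specified columns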
Since:
1.3.1

fill

public DataFrame fill(double value)
Returns a new DataFrame that replaces null values in numeric columns with value.

Parameters:
value - replacement for null values in numeric columns
Returns:
a new DataFrame with null values in numeric columns replaced by value
Since:
1.3.1

fill

public DataFrame fill(String value)
Returns a new DataFrame that replaces null values in string columns with value.

Parameters:
value - replacement for null values in string columns
Returns:
a new DataFrame with null values in string columns replaced by value
Since:
1.3.1

fill

public DataFrame fill(double value,
                      String[] cols)
Returns a new DataFrame that replaces null values in specified numeric columns. If a specified column is not a numeric column, it is ignored.
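
The "ignored if not numeric" rule can be illustrated with a plain-Java sketch. It does not call Spark: the schema is modelled as a map from column name to Java class, rows are maps from column name to value, and the class name NaFillSketch is hypothetical. The sketch stores the double as-is and does not model any conversion to the column's own numeric type. Skipping columns that are absent from the schema is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: models fill(double, cols) on in-memory rows, not on a Spark DataFrame.
final class NaFillSketch {
    // Returns copies of the rows in which nulls in the listed numeric columns are replaced by value.
    static List<Map<String, Object>> fill(double value, List<String> cols,
                                          Map<String, Class<?>> schema,
                                          List<Map<String, Object>> rows) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            // Copy the row so the input is left untouched, mirroring "returns a new DataFrame".
            Map<String, Object> copy = new LinkedHashMap<>(row);
            for (String col : cols) {
                Class<?> type = schema.get(col);
                // Non-numeric columns (and, in this sketch, unknown columns) are skipped.
                boolean numeric = type != null && Number.class.isAssignableFrom(type);
                if (numeric && copy.get(col) == null) {
                    copy.put(col, value);
                }
            }
            out.add(copy);
        }
        return out;
    }
}
```

Listing a string column such as "name" alongside a numeric column leaves the string column's nulls in place, which matches the rule that non-numeric columns are ignored.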

Parameters:
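value - replacement for null values in the specified numeric columns
cols - names of the columns to fill; non-numeric columns are ignored
Returns:
a new DataFrame with null values in the specified numeric columns replaced by value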
Since:
1.3.1

fill

public DataFrame fill(double value,
                      scala.collection.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that replaces null values in specified numeric columns. If a specified column is not a numeric column, it is ignored.

Parameters:
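value - replacement for null values in the specified numeric columns
cols - names of the columns to fill; non-numeric columns are ignored
Returns:
a new DataFrame with null values in the specified numeric columns replaced by value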
Since:
1.3.1

fill

public DataFrame fill(String value,
                      String[] cols)
Returns a new DataFrame that replaces null values in specified string columns. If a specified column is not a string column, it is ignored.

Parameters:
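value - replacement for null values in the specified string columns
cols - names of the columns to fill; non-string columns are ignored
Returns:
a new DataFrame with null values in the specified string columns replaced by value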
Since:
1.3.1

fill

public DataFrame fill(String value,
                      scala.collection.Seq<String> cols)
(Scala-specific) Returns a new DataFrame that replaces null values in specified string columns. If a specified column is not a string column, it is ignored.

Parameters:
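value - replacement for null values in the specified string columns
cols - names of the columns to fill; non-string columns are ignored
Returns:
a new DataFrame with null values in the specified string columns replaced by value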
Since:
1.3.1

fill

public DataFrame fill(java.util.Map<String,Object> valueMap)
Returns a new DataFrame that replaces null values.

The key of the map is the column name, and the value of the map is the replacement value. The value must be one of the following types: Integer, Long, Float, Double, String.

For example, the following replaces null values in column "A" with the string "unknown", and null values in column "B" with the numeric value 1.0.


   import com.google.common.collect.ImmutableMap;
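   // From Java, na is called as a method; the type witness yields a Map<String, Object>.
   df.na().fill(ImmutableMap.<String, Object>of("A", "unknown", "B", 1.0));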
 

Parameters:
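valueMap - map from column name to the replacement for null values in that column
Returns:
a new DataFrame with null values replaced as specified by valueMap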
Since:
1.3.1

fill

public DataFrame fill(scala.collection.immutable.Map<String,Object> valueMap)
(Scala-specific) Returns a new DataFrame that replaces null values.

The key of the map is the column name, and the value of the map is the replacement value. The value must be one of the following types: Int, Long, Float, Double, String.

For example, the following replaces null values in column "A" with the string "unknown", and null values in column "B" with the numeric value 1.0.


   df.na.fill(Map(
     "A" -> "unknown",
     "B" -> 1.0
   ))
 

Parameters:
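valueMap - map from column name to the replacement for null values in that column
Returns:
a new DataFrame with null values replaced as specified by valueMap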
Since:
1.3.1

replace

public <T> DataFrame replace(String col,
                             java.util.Map<T,T> replacement)
Replaces values matching keys in replacement map with the corresponding values. Key and value of replacement map must have the same type, and can only be doubles or strings. If col is "*", then the replacement is applied to all string columns (for a map of strings) or all numeric columns (for a map of doubles).


   import com.google.common.collect.ImmutableMap;

   // Replaces all occurrences of 1.0 with 2.0 in column "height".
   df.na().replace("height", ImmutableMap.of(1.0, 2.0));

   // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "name".
   df.na().replace("name", ImmutableMap.of("UNKNOWN", "unnamed"));

   // Replaces all occurrences of "UNKNOWN" with "unnamed" in all string columns.
   df.na().replace("*", ImmutableMap.of("UNKNOWN", "unnamed"));
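
The replacement rule itself can be modelled in plain Java, as a rough sketch. It does not call Spark: rows are maps from column name to value, and the class name NaReplaceSketch is hypothetical. As a simplification, cells are compared with equals, so a Double key matches only Double cells and a String key only String cells. Spark's handling of other numeric column types is not modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: models replace(col, replacement) on in-memory rows, not on a Spark DataFrame.
final class NaReplaceSketch {
    // Returns copies of the rows in which cells equal to a key are swapped for the mapped value.
    static <T> List<Map<String, Object>> replace(String col, Map<T, T> replacement,
                                                 List<Map<String, Object>> rows) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> copy = new LinkedHashMap<>(row);
            for (Map.Entry<String, Object> cell : copy.entrySet()) {
                // "*" selects every column; otherwise only the named column is touched.
                boolean selected = col.equals("*") || col.equals(cell.getKey());
                Object current = cell.getValue();
                if (selected && current != null && replacement.containsKey(current)) {
                    cell.setValue(replacement.get(current));
                }
            }
            out.add(copy);
        }
        return out;
    }
}
```

With a string map and "*", numeric cells are left alone in this model because no string key is equal to a Double value.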
 

Parameters:
col - name of the column to apply the value replacement
replacement - value replacement map, as explained above

Returns:
a new DataFrame with the matching values replaced
Since:
1.3.1

replace

public <T> DataFrame replace(String[] cols,
                             java.util.Map<T,T> replacement)
Replaces values matching keys in replacement map with the corresponding values. Key and value of replacement map must have the same type, and can only be doubles or strings.


   import com.google.common.collect.ImmutableMap;

   // Replaces all occurrences of 1.0 with 2.0 in columns "height" and "weight".
   df.na().replace(new String[] {"height", "weight"}, ImmutableMap.of(1.0, 2.0));

   // Replaces all occurrences of "UNKNOWN" with "unnamed" in columns "firstname" and "lastname".
   df.na().replace(new String[] {"firstname", "lastname"}, ImmutableMap.of("UNKNOWN", "unnamed"));
 

Parameters:
cols - list of columns to apply the value replacement
replacement - value replacement map, as explained above

Returns:
a new DataFrame with the matching values replaced
Since:
1.3.1

replace

public <T> DataFrame replace(String col,
                             scala.collection.immutable.Map<T,T> replacement)
(Scala-specific) Replaces values matching keys in replacement map with the corresponding values. Key and value of replacement map must have the same type, and can only be doubles or strings. If col is "*", then the replacement is applied to all string columns (for a map of strings) or all numeric columns (for a map of doubles).


   // Replaces all occurrences of 1.0 with 2.0 in column "height".
   df.na.replace("height", Map(1.0 -> 2.0))

   // Replaces all occurrences of "UNKNOWN" with "unnamed" in column "name".
   df.na.replace("name", Map("UNKNOWN" -> "unnamed"))

   // Replaces all occurrences of "UNKNOWN" with "unnamed" in all string columns.
   df.na.replace("*", Map("UNKNOWN" -> "unnamed"))
 

Parameters:
col - name of the column to apply the value replacement
replacement - value replacement map, as explained above

Returns:
a new DataFrame with the matching values replaced
Since:
1.3.1

replace

public <T> DataFrame replace(scala.collection.Seq<String> cols,
                             scala.collection.immutable.Map<T,T> replacement)
(Scala-specific) Replaces values matching keys in replacement map with the corresponding values. Key and value of replacement map must have the same type, and can only be doubles or strings.


   // Replaces all occurrences of 1.0 with 2.0 in columns "height" and "weight".
   df.na.replace("height" :: "weight" :: Nil, Map(1.0 -> 2.0))

   // Replaces all occurrences of "UNKNOWN" with "unnamed" in columns "firstname" and "lastname".
   df.na.replace("firstname" :: "lastname" :: Nil, Map("UNKNOWN" -> "unnamed"))
 

Parameters:
cols - list of columns to apply the value replacement
replacement - value replacement map, as explained above

Returns:
a new DataFrame with the matching values replaced
Since:
1.3.1