pyspark.pandas.DataFrame.select_dtypes¶
-
DataFrame.
select_dtypes
(include: Union[str, List[str], None] = None, exclude: Union[str, List[str], None] = None) → pyspark.pandas.frame.DataFrame[source]¶ Return a subset of the DataFrame’s columns based on the column dtypes.
- Parameters
- include, excludescalar or list-like
A selection of dtypes or strings to be included/excluded. At least one of these parameters must be supplied. It also takes Spark SQL DDL type strings, for instance, ‘string’ and ‘date’.
- Returns
- DataFrame
The subset of the frame including the dtypes in
include
and excluding the dtypes inexclude
.
- Raises
- ValueError
If both of
include
andexclude
are empty>>> df = ps.DataFrame({'a': [1, 2] * 3, ... 'b': [True, False] * 3, ... 'c': [1.0, 2.0] * 3}) >>> df.select_dtypes() Traceback (most recent call last): ... ValueError: at least one of include or exclude must be nonempty
If
include
andexclude
have overlapping elements>>> df = ps.DataFrame({'a': [1, 2] * 3, ... 'b': [True, False] * 3, ... 'c': [1.0, 2.0] * 3}) >>> df.select_dtypes(include='a', exclude='a') Traceback (most recent call last): ... ValueError: include and exclude overlap on {'a'}
Notes
To select datetimes, use
np.datetime64
,'datetime'
or'datetime64'
Examples
>>> df = ps.DataFrame({'a': [1, 2] * 3, ... 'b': [True, False] * 3, ... 'c': [1.0, 2.0] * 3, ... 'd': ['a', 'b'] * 3}, columns=['a', 'b', 'c', 'd']) >>> df a b c d 0 1 True 1.0 a 1 2 False 2.0 b 2 1 True 1.0 a 3 2 False 2.0 b 4 1 True 1.0 a 5 2 False 2.0 b
>>> df.select_dtypes(include='bool') b 0 True 1 False 2 True 3 False 4 True 5 False
>>> df.select_dtypes(include=['float64'], exclude=['int']) c 0 1.0 1 2.0 2 1.0 3 2.0 4 1.0 5 2.0
>>> df.select_dtypes(include=['int'], exclude=['float64']) a 0 1 1 2 2 1 3 2 4 1 5 2
>>> df.select_dtypes(exclude=['int']) b c d 0 True 1.0 a 1 False 2.0 b 2 True 1.0 a 3 False 2.0 b 4 True 1.0 a 5 False 2.0 b
Spark SQL DDL type strings can be used as well.
>>> df.select_dtypes(exclude=['string']) a b c 0 1 True 1.0 1 2 False 2.0 2 1 True 1.0 3 2 False 2.0 4 1 True 1.0 5 2 False 2.0