pyspark.pandas.Series.to_hdf
- Series.to_hdf(path_or_buf, key, mode='a', complevel=None, complib=None, append=False, format=None, index=True, min_itemsize=None, nan_rep=None, dropna=None, data_columns=None, errors='strict', encoding='UTF-8')
Write the contained data to an HDF5 file using HDFStore.
Note
This method should only be used if the resulting DataFrame is expected to be small, as all the data is loaded into the driver’s memory.
New in version 4.0.0.
- Parameters
- path_or_buf : str or pandas.HDFStore
File path or HDFStore object.
- key : str
Identifier for the group in the store.
- mode : {‘a’, ‘w’, ‘r+’}, default ‘a’
Mode to open file:
‘w’: write, a new file is created (an existing file with the same name would be deleted).
‘a’: append, an existing file is opened for reading and writing, and if the file does not exist it is created.
‘r+’: similar to ‘a’, but the file must already exist.
- complevel : {0-9}, default None
Specifies a compression level for data. A value of 0 or None disables compression.
- complib : {‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’}, default ‘zlib’
Specifies the compression library to be used. These additional compressors for Blosc are supported (default if no compressor specified: ‘blosc:blosclz’): {‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’}. Specifying a compression library which is not available raises a ValueError.
- append : bool, default False
For Table formats, append the input data to the existing table.
- format : {‘fixed’, ‘table’, None}, default ‘fixed’
Possible values:
‘fixed’: Fixed format. Fast writing/reading. Not appendable or searchable.
‘table’: Table format. Writes as a PyTables Table structure, which may perform worse but allows more flexible operations like searching / selecting subsets of the data. (A combined example using format=‘table’ follows this parameter list.)
If None, pd.get_option(‘io.hdf.default_format’) is checked, followed by fallback to ‘fixed’.
- index : bool, default True
Write the Series index as a column.
- min_itemsize : dict or int, optional
Map column names to minimum string sizes for columns.
- nan_rep : Any, optional
How to represent null values as str. Not allowed with append=True.
- dropna : bool, default False, optional
Remove missing values.
- data_columns : list of columns or True, optional
List of columns to create as indexed data columns for on-disk queries, or True to use all columns. By default only the axes of the object are indexed. Applicable only to format=’table’.
- errors : str, default ‘strict’
Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
- encoding : str, default ‘UTF-8’
The character encoding to use when writing.
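A minimal sketch combining several of the options above (compression via complevel/complib, the appendable ‘table’ format, and append=True). It assumes pandas-on-Spark is imported as ps and that the optional PyTables dependency is installed; the file name compressed.h5 is only illustrative:
>>> import pyspark.pandas as ps
>>> s = ps.Series([1.0, 2.0, 3.0], name='x')
>>> s.to_hdf('compressed.h5', key='s', mode='w',
...          format='table', complevel=9, complib='blosc')
>>> ps.Series([4.0, 5.0], name='x').to_hdf('compressed.h5', key='s',
...                                        format='table', append=True)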
See also
DataFrame.to_orc
Write a DataFrame to the binary orc format.
DataFrame.to_parquet
Write a DataFrame to the binary parquet format.
DataFrame.to_csv
Write out to a csv file.
Examples
>>> df = ps.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
...                   index=['a', 'b', 'c'])
>>> df.to_hdf('data.h5', key='df', mode='w')
We can add another object to the same file:
>>> s = ps.Series([1, 2, 3, 4])
>>> s.to_hdf('data.h5', key='s')
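The objects can be read back with plain pandas on the driver; this sketch assumes pandas-on-Spark itself does not expose a read_hdf counterpart, so pandas.read_hdf is used directly:
>>> import pandas as pd
>>> pd.read_hdf('data.h5', 'df')
   A  B
a  1  4
b  2  5
c  3  6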