pyspark.sql.Catalog.refreshByPath

Catalog.refreshByPath(path)

Invalidates and refreshes all the cached data (and the associated metadata) for any DataFrame that contains the given data source path.

New in version 2.2.0.

Parameters
    path : str
        The data source path whose cached data should be refreshed.
 
Examples

The example below caches a table, and then removes the data.

>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="refreshByPath") as d:
...     _ = spark.sql("DROP TABLE IF EXISTS tbl1")
...     _ = spark.sql(
...         "CREATE TABLE tbl1 (col STRING) USING TEXT LOCATION '{}'".format(d))
...     _ = spark.sql("INSERT INTO tbl1 SELECT 'abc'")
...     spark.catalog.cacheTable("tbl1")
...     spark.table("tbl1").show()
+---+
|col|
+---+
|abc|
+---+

Because the table is cached, it computes from the cached data as below.

>>> spark.table("tbl1").count()
1

After refreshing the table by path, it shows 0 because the data does not exist anymore.

>>> spark.catalog.refreshByPath(d)
>>> spark.table("tbl1").count()
0

>>> _ = spark.sql("DROP TABLE tbl1")
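
When several data source paths change at once, one possible pattern is to call refreshByPath once per path. The sketch below is a hypothetical helper, not part of PySpark: the name refresh_paths and its de-duplication behavior are assumptions made for illustration. It relies only on the documented refreshByPath(path) method of the catalog object it is given.

```python
from typing import Iterable, List


def refresh_paths(catalog, paths: Iterable[str]) -> List[str]:
    """Hypothetical helper: refresh cached data for each distinct path.

    `catalog` is expected to expose refreshByPath(path), as
    pyspark.sql.Catalog does. Duplicate paths are skipped so each
    path is refreshed at most once, in first-seen order.
    """
    refreshed: List[str] = []
    seen = set()
    for path in paths:
        if path in seen:
            continue  # already refreshed this path
        seen.add(path)
        catalog.refreshByPath(path)  # invalidate and refresh cache for this path
        refreshed.append(path)
    return refreshed
```

With an active session this might be called as refresh_paths(spark.catalog, [d]), where d is the table location from the example above. The helper returns the list of paths it actually refreshed.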