Package pyspark :: Module sql :: Class LocalHiveContext

Class LocalHiveContext

SQLContext --+    
             |    
   HiveContext --+
                 |
                LocalHiveContext

Starts up an instance of Hive where metadata is stored locally.

An in-process metadata store is created, with its data stored in ./metadata. Warehouse data is stored in ./warehouse.

>>> import os
>>> # sc is an existing SparkContext
>>> hiveCtx = LocalHiveContext(sc)
>>> # Drop any leftover table from an earlier run; ignore the error if none exists
>>> try:
...     suppress = hiveCtx.sql("DROP TABLE src")
... except Exception:
...     pass
>>> # Sample key/value file shipped with the Spark distribution
>>> kv1 = os.path.join(os.environ["SPARK_HOME"],
...        'examples/src/main/resources/kv1.txt')
>>> suppress = hiveCtx.sql(
...     "CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>>> suppress = hiveCtx.sql("LOAD DATA LOCAL INPATH '%s' INTO TABLE src"
...        % kv1)
>>> results = hiveCtx.sql("FROM src SELECT value"
...      ).map(lambda r: int(r.value.split('_')[1]))
>>> num = results.count()
>>> reduce_sum = results.reduce(lambda x, y: x + y)
>>> num
500
>>> reduce_sum
130091
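
The map and reduce steps in the example above can be tried without a Spark installation. The following is a minimal sketch in plain Python, assuming that the value column holds strings of the form val_<number>; the Row namedtuple and the three sample rows are hypothetical stand-ins for the rows returned by hiveCtx.sql, not data taken from kv1.txt.

```python
from collections import namedtuple
from functools import reduce

# Hypothetical stand-in for the rows returned by hiveCtx.sql(...)
Row = namedtuple("Row", ["key", "value"])

# Assumed sample data: each value looks like "val_<number>"
sample_rows = [Row(238, "val_238"), Row(86, "val_86"), Row(311, "val_311")]


def value_suffix(row):
    """Return the integer after the underscore, as the lambda in the example does."""
    return int(row.value.split("_")[1])


# Equivalent of .map(...), .count() and .reduce(...) on a local list
results = [value_suffix(r) for r in sample_rows]
num = len(results)
reduce_sum = reduce(lambda x, y: x + y, results)
```

With the real table, the same logic runs on the cluster through the map, count and reduce calls shown in the example.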
Instance Methods
 
__init__(self, sparkContext, sqlContext=None)
Create a new HiveContext.

Inherited from HiveContext: hiveql, hql

Inherited from SQLContext: applySchema, cacheTable, inferSchema, jsonFile, jsonRDD, parquetFile, registerFunction, registerRDDAsTable, sql, table, uncacheTable

Method Details

__init__(self, sparkContext, sqlContext=None)
(Constructor)

Create a new HiveContext.

Parameters:
  • sparkContext - The SparkContext to wrap.
  • sqlContext - An optional JVM Scala HiveContext. If set, we do not instantiate a new HiveContext in the JVM; instead, we make all calls to this object.
Overrides: SQLContext.__init__
(inherited documentation)
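
The optional second constructor argument follows a wrap-or-create pattern: when an existing context is passed in, calls are forwarded to it and no new one is built. The following is a minimal sketch of that pattern in plain Python, not PySpark's implementation. FakeJvmContext and ContextWrapper are hypothetical names, and creating the backing object lazily on first use is an assumption made for illustration.

```python
class FakeJvmContext(object):
    """Hypothetical stand-in for a JVM-side HiveContext."""

    instances = 0  # counts how many backing contexts have been created

    def __init__(self):
        FakeJvmContext.instances += 1

    def sql(self, query):
        return "ran: " + query


class ContextWrapper(object):
    """Hypothetical wrapper that reuses a supplied context or creates one."""

    def __init__(self, existing=None):
        self._ctx = existing

    @property
    def ctx(self):
        # Assumption: create the backing context only if none was supplied
        if self._ctx is None:
            self._ctx = FakeJvmContext()
        return self._ctx

    def sql(self, query):
        # Every call is forwarded to the backing context
        return self.ctx.sql(query)


shared = FakeJvmContext()
wrapper = ContextWrapper(existing=shared)
reply = wrapper.sql("SELECT 1")
created = FakeJvmContext.instances
```

Here wrapper forwards to shared, so only one backing context exists after the call. Constructing ContextWrapper() with no argument would create a second one on first use.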