Spark2 #

This documentation is a guide for using Table Store in Spark2.

Version #

Table Store supports Spark 2.4 and later. It is highly recommended to use the latest available version, which includes many improvements.

Preparing Table Store Jar File #

Download flink-table-store-spark2-0.3.0.jar.

You can also manually build the bundled jar from the source code.

To build from source code, either download the source of a release or clone the git repository.

Build the bundled jar with the following command:

mvn clean install -DskipTests

You can find the bundled jar in ./flink-table-store-spark2/target/flink-table-store-spark2-0.3.0.jar.

Quick Start #

If you are using HDFS, make sure that the environment variable HADOOP_HOME or HADOOP_CONF_DIR is set.
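
For example, you can point Spark at your Hadoop configuration before starting the shell (the path below is illustrative; substitute the location of your own Hadoop configuration files):

export HADOOP_CONF_DIR=/etc/hadoop/conf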

Step 1: Prepare Test Data

Table Store currently only supports reading tables through Spark2. To create a Table Store table with records, please follow our Flink quick start guide.

After completing the guide, all table files should be stored under the path /tmp/table_store, or under the warehouse path you’ve specified.

Step 2: Specify Table Store Jar File

You can append the path to the Table Store jar file to the --jars argument when starting spark-shell.

spark-shell ... --jars /path/to/flink-table-store-spark2-0.3.0.jar

Alternatively, you can copy flink-table-store-spark2-0.3.0.jar into the jars directory of your Spark installation.

Step 3: Query Table

Table Store with Spark 2.4 does not support DDL. You can use the Dataset reader and register the Dataset as a temporary view. In the Spark shell:

// Load the Table Store table as a Dataset using the "tablestore" format.
val dataset = spark.read.format("tablestore").load("file:/tmp/table_store/default.db/word_count")
// Register the Dataset as a temporary view so it can be queried with SQL.
dataset.createOrReplaceTempView("word_count")
spark.sql("SELECT * FROM word_count").show()
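
Since the result is an ordinary Spark Dataset, you can also query it with the DataFrame API instead of SQL. A minimal sketch, assuming the word_count table from the Flink quick start guide has the columns word and cnt:

// Project and filter with the DataFrame API. The column names assume the
// schema created by the Flink quick start guide (word STRING, cnt BIGINT).
dataset.select("word", "cnt").filter("cnt > 0").show()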