This documentation is for an unreleased version of Apache Flink Table Store. We recommend you use the latest stable version.
This documentation is a guide for using Table Store in Spark 2.
Table Store supports Spark 2.4 and above. Using the latest Spark 2.4.x release is recommended, as it includes many improvements.
Preparing Table Store Jar File
You are using an unreleased version of Table Store, so you need to manually build the bundled jar from the source code.
To build from source code, either download the source of a release or clone the git repository.
Build the bundled jar with the following command:
mvn clean install -DskipTests
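As a sketch, the full build-from-source sequence when cloning the git repository looks like the following (the repository URL is assumed to be the Apache GitHub mirror):

```shell
# Clone the Table Store source (URL assumed; a downloaded release source
# archive works the same way once extracted)
git clone https://github.com/apache/flink-table-store.git
cd flink-table-store

# Build the project, skipping tests to save time
mvn clean install -DskipTests
```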
You can find the bundled jar in
Quick Start
If you are using HDFS, make sure that the environment variable
Step 1: Prepare Test Data
Table Store currently only supports reading tables through Spark 2. To create a Table Store table with records, please follow our Flink quick start guide.
After completing the guide, all table files should be stored under the path /tmp/table_store, or the warehouse path you've specified.
Step 2: Specify Table Store Jar File
You can append the path to the Table Store jar file to the --jars argument when starting spark-shell:

spark-shell ... --jars /path/to/flink-table-store-spark-2-0.4-SNAPSHOT.jar
Alternatively, you can copy the bundled jar to spark/jars in your Spark installation directory.
Step 3: Query Table
Table Store with Spark 2.4 does not support DDL. You can use the Dataset reader and register the Dataset as a temporary table. In spark-shell:
val dataset = spark.read.format("tablestore").load("file:/tmp/table_store/default.db/word_count")
dataset.createOrReplaceTempView("word_count")
spark.sql("SELECT * FROM word_count").show()
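Once the view is registered, any ordinary Spark SQL query can run against it. As a sketch, assuming the word_count table from the Flink quick start has columns `word` and `cnt` (adjust names to match your table's actual schema):

```scala
// Ad-hoc SQL on the registered temporary view; the threshold and
// column names here are illustrative, not part of the Table Store API.
spark
  .sql("SELECT word, cnt FROM word_count WHERE cnt > 100 ORDER BY cnt DESC LIMIT 10")
  .show()

// The same Dataset can also be queried with the DataFrame API directly,
// without going through the temporary view.
import spark.implicits._
dataset.filter($"cnt" > 100).orderBy($"cnt".desc).show(10)
```

Both forms read the same table files; use whichever style fits your workflow.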