Reading & Writing Hive Tables

This documentation is for an older version of Apache Flink. We recommend the latest stable version.

Using the HiveCatalog and Flink’s connector to Hive, Flink can read from and write to Hive tables as an alternative to Hive’s batch engine. Be sure to follow the instructions to include the correct dependencies in your application.

- Reading From Hive
- Writing To Hive
- Limitations

Reading From Hive

Assume Hive contains a single table in its default database, named mytable, that contains several rows.

hive> show databases;
OK
default
Time taken: 0.841 seconds, Fetched: 1 row(s)

hive> show tables;
OK
Time taken: 0.087 seconds

hive> CREATE TABLE mytable(name string, value double);
OK
Time taken: 0.127 seconds

hive> SELECT * FROM mytable;
OK
Tom   4.72
John  8.0
Tom   24.2
Bob   3.14
Bob   4.72
Tom   34.9
Mary  4.79
Tiff  2.72
Bill  4.33
Mary  77.7
Time taken: 0.097 seconds, Fetched: 10 row(s)
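The transcript above does not show how the rows got into mytable. One way to load them, sketched here on the assumption that the Hive installation is version 0.14 or later (where INSERT ... VALUES is available), is a single statement in the Hive CLI:

```sql
-- Populate the example table with the ten rows shown in the SELECT output.
-- Assumes Hive 0.14+; on older versions, load the data from a file instead.
INSERT INTO TABLE mytable VALUES
  ('Tom', 4.72),
  ('John', 8.0),
  ('Tom', 24.2),
  ('Bob', 3.14),
  ('Bob', 4.72),
  ('Tom', 34.9),
  ('Mary', 4.79),
  ('Tiff', 2.72),
  ('Bill', 4.33),
  ('Mary', 77.7);
```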

With the data ready, you can connect to an existing Hive installation and begin querying.

Flink SQL> show catalogs;
myhive
default_catalog

# ------ Set the current catalog to 'myhive' if you haven't set it in the yaml file ------

Flink SQL> use catalog myhive;

# ------ See all registered databases in catalog 'myhive' ------

Flink SQL> show databases;
default

# ------ See the previously registered table 'mytable' ------

Flink SQL> show tables;
mytable

# ------ The table schema that Flink sees is the same one we created in Hive: two columns, name as string and value as double ------
Flink SQL> describe mytable;
root
 |-- name: name
 |-- type: STRING
 |-- name: value
 |-- type: DOUBLE


Flink SQL> SELECT * FROM mytable;

   name      value
__________ __________

    Tom      4.72
    John     8.0
    Tom      24.2
    Bob      3.14
    Bob      4.72
    Tom      34.9
    Mary     4.79
    Tiff     2.72
    Bill     4.33
    Mary     77.7
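Queries are not limited to SELECT *. As a minimal sketch, assuming the session is still using the myhive catalog, an aggregation over the same table could look like the following. The backticks are a precaution in case the parser treats the column names as SQL keywords, and the alias total is illustrative only.

```sql
-- Sum the value column per name, reading directly from the Hive table.
-- Backticks quote the identifiers in case they clash with SQL keywords.
SELECT `name`, SUM(`value`) AS total
FROM mytable
GROUP BY `name`;
```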

Writing To Hive

Similarly, data can be written into Hive using an INSERT INTO statement.

Flink SQL> INSERT INTO mytable (name, value) VALUES ('Tom', 4.72);
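INSERT INTO can also take its rows from a query rather than a VALUES list. The sketch below assumes a second, hypothetical table named mytable_staging that has the same two columns as mytable. It does not exist in the example above and would need to be created first.

```sql
-- Copy selected rows from a hypothetical staging table into mytable.
-- mytable_staging is an assumed name; it must have a string column and
-- a double column, in that order, to match mytable's schema.
INSERT INTO mytable
SELECT `name`, `value`
FROM mytable_staging
WHERE `value` > 0;
```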

Limitations

The following are the major limitations of the Hive connector. We are actively working to close these gaps.

- INSERT OVERWRITE is not supported.
- Inserting into partitioned tables is not supported.
- ACID tables are not supported.
- Bucketed tables are not supported.
- Some data types are not supported. See the limitations for details.
- Only a limited number of table storage formats have been tested, namely text, SequenceFile, ORC, and Parquet.
- Views are not supported.