Hive Basics: Exporting Data
After analyzing data in Hive, we sometimes need to export the results to a local file for other uses. Hive offers several ways to export results, and this post walks through each of these methods in detail.
1. Export to a local directory
hive (default)> insert overwrite local directory '/opt/datas/hive_exp_emp'
> select * from default.emp ;
Query ID = hive_20190217171414_39007610-8451-4c28-86a1-928f6f3aec5c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1550060164760_0005, Tracking URL = http://node1:8088/proxy/application_1550060164760_0005/
Kill Command = /opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib/hadoop/bin/hadoop job -kill job_1550060164760_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-02-17 17:15:01,010 Stage-1 map = 0%, reduce = 0%
2019-02-17 17:15:26,922 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.91 sec
MapReduce Total cumulative CPU time: 2 seconds 910 msec
Ended Job = job_1550060164760_0005
Copying data to local directory /opt/datas/hive_exp_emp
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.91 sec HDFS Read: 4776 HDFS Write: 661 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 910 msec
OK
empno ename job mgr hiredate sal comm deptno
Time taken: 76.089 seconds
Check the file saved in the local directory:
[root@node1 datas]# cd /opt/datas/hive_exp_emp
[root@node1 hive_exp_emp]# ll
total 4
-rw-r--r-- 1 hive hive 661 Feb 17 17:15 000000_0
[root@node1 hive_exp_emp]# cat 000000_0
7369SMITHCLERK79021980-12-17800.0\N20
7499ALLENSALESMAN76981981-2-201600.0300.030
7521WARDSALESMAN76981981-2-221250.0500.030
7566JONESMANAGER78391981-4-22975.0\N20
7654MARTINSALESMAN76981981-9-281250.01400.030
7698BLAKEMANAGER78391981-5-12850.0\N30
7782CLARKMANAGER78391981-6-92450.0\N10
7788SCOTTANALYST75661987-4-193000.0\N20
7839KINGPRESIDENT\N1981-11-175000.0\N10
7844TURNERSALESMAN76981981-9-81500.00.030
7876ADAMSCLERK77881987-5-231100.0\N20
7900JAMESCLERK76981981-12-3950.0\N30
7902FORDANALYST75661981-12-33000.0\N20
7934MILLERCLERK77821982-1-231300.0\N10
As you can see, the exported file is named 000000_0, and the data is separated by Hive's default field delimiter, the non-printing \001 (Ctrl-A) character, which is why the fields appear to run together in the cat output.
The export above does not specify a format. We can use the same format syntax as in CREATE TABLE to define how the exported data is delimited:
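To confirm the delimiter, you can ask cat to display non-printing characters (a quick check that was not part of the original session); \001 then shows up as ^A between the fields:
cat -A 000000_0 | head -3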
insert overwrite local directory '/opt/datas/hive_exp_emp2'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY '\n'
select * from default.emp ;
The execution output is as follows:
hive (default)> insert overwrite local directory '/opt/datas/hive_exp_emp2'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY '\n'
> select * from default.emp ;
Query ID = hive_20190217172424_622c001a-7ca4-4a49-94dc-468cdd6a0475
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1550060164760_0006, Tracking URL = http://node1:8088/proxy/application_1550060164760_0006/
Kill Command = /opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib/hadoop/bin/hadoop job -kill job_1550060164760_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-02-17 17:24:43,307 Stage-1 map = 0%, reduce = 0%
2019-02-17 17:25:38,391 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.55 sec
MapReduce Total cumulative CPU time: 1 seconds 550 msec
Ended Job = job_1550060164760_0006
Copying data to local directory /opt/datas/hive_exp_emp2
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.55 sec HDFS Read: 4814 HDFS Write: 661 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 550 msec
OK
empno ename job mgr hiredate sal comm deptno
Time taken: 95.151 seconds
Check the result:
[root@node1 hive_exp_emp]# cd /opt/datas/hive_exp_emp2
[root@node1 hive_exp_emp2]# ls
000000_0
[root@node1 hive_exp_emp2]# cat 000000_0
7369 SMITH CLERK 7902 1980-12-17 800.0 \N 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 \N 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 \N 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 \N 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 \N 20
7839 KING PRESIDENT \N 1981-11-17 5000.0 \N 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 \N 20
7900 JAMES CLERK 7698 1981-12-3 950.0 \N 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 \N 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 \N 10
As you can see, the exported query result is now separated by tab characters.
2. Export from the command line
Another approach is to run the hive command directly with the -e option and redirect the query output to a local file.
[root@node1 datas]# sudo -u hive hive -e "select * from default.emp ;" > /opt/datas/exp_res.txt
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/jars/hive-common-1.1.0-cdh5.15.2.jar!/hive-log4j.properties
OK
Time taken: 3.743 seconds, Fetched: 14 row(s)
[root@node1 datas]# ls
dept.txt emp.txt exp_res.txt hive_exp_emp hive_exp_emp2
[root@node1 datas]# cat exp_res.txt
empno ename job mgr hiredate sal comm deptno
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
As you can see, this command-line approach also exports the contents of a Hive table to a local file.
3. Export to HDFS
The third approach saves the query result in HDFS; simply drop the LOCAL keyword.
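Because the Hive CLI separates output columns with tabs, the same approach can also produce a CSV file. The following is a minimal sketch; the output path is illustrative, and hive.cli.print.header is set to false so the header row is not written into the file:
sudo -u hive hive -e "set hive.cli.print.header=false; select * from default.emp ;" | sed 's/\t/,/g' > /opt/datas/exp_res.csv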
hive (default)> insert overwrite directory '/user/hive/user1/hive_exp_emp'
> select * from default.emp ;
Query ID = hive_20190217173939_19c69a15-2435-409f-b4d4-b12fa12def18
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1550060164760_0007, Tracking URL = http://node1:8088/proxy/application_1550060164760_0007/
Kill Command = /opt/cloudera/parcels/CDH-5.15.2-1.cdh5.15.2.p0.3/lib/hadoop/bin/hadoop job -kill job_1550060164760_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-02-17 17:40:09,266 Stage-1 map = 0%, reduce = 0%
2019-02-17 17:40:34,981 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.51 sec
MapReduce Total cumulative CPU time: 2 seconds 510 msec
Ended Job = job_1550060164760_0007
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://node1:8020/user/hive/user1/hive_exp_emp/.hive-staging_hive_2019-02-17_17-39-22_500_7423280990058179421-1/-ext-10000
Moving data to: /user/hive/user1/hive_exp_emp
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.51 sec HDFS Read: 4693 HDFS Write: 661 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 510 msec
OK
empno ename job mgr hiredate sal comm deptno
Time taken: 75.012 seconds
hive (default)> dfs -ls /user/hive/user1/hive_exp_emp ;
Found 1 items
-rwxrwxr-t 3 hive hive 661 2019-02-17 17:40 /user/hive/user1/hive_exp_emp/000000_0
To view the exported file, download it from HDFS to the local file system (or read it in place, as shown in the sketch after this listing):
[root@node1 datas]# sudo -u hive hdfs dfs -get /user/hive/user1/hive_exp_emp/000000_0
[root@node1 datas]# ls
000000_0 dept.txt emp.txt exp_res.txt hive_exp_emp hive_exp_emp2
[root@node1 datas]# cat 000000_0
7369SMITHCLERK79021980-12-17800.0\N20
7499ALLENSALESMAN76981981-2-201600.0300.030
7521WARDSALESMAN76981981-2-221250.0500.030
7566JONESMANAGER78391981-4-22975.0\N20
7654MARTINSALESMAN76981981-9-281250.01400.030
7698BLAKEMANAGER78391981-5-12850.0\N30
7782CLARKMANAGER78391981-6-92450.0\N10
7788SCOTTANALYST75661987-4-193000.0\N20
7839KINGPRESIDENT\N1981-11-175000.0\N10
7844TURNERSALESMAN76981981-9-81500.00.030
7876ADAMSCLERK77881987-5-231100.0\N20
7900JAMESCLERK76981981-12-3950.0\N30
7902FORDANALYST75661981-12-33000.0\N20
7934MILLERCLERK77821982-1-231300.0\N10
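Alternatively, the exported file can be read in place without downloading it. A minimal sketch using the same HDFS path as above:
sudo -u hive hdfs dfs -cat /user/hive/user1/hive_exp_emp/000000_0 | head -3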
4. Using the Sqoop tool
In addition, the Sqoop tool can export data from Hive into a relational database; it can also import relational data into Hive or HDFS. Sqoop will be covered in detail in a later post.
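As a preview, exporting the emp table from the Hive warehouse into MySQL with Sqoop would look roughly like the sketch below. The JDBC URL, database name, and warehouse path are placeholder assumptions; the target table must already exist in MySQL, and --input-fields-terminated-by must match the actual field delimiter of the exported files (\001 for a table stored with Hive's defaults):
sqoop export \
  --connect jdbc:mysql://node1:3306/testdb \
  --username root -P \
  --table emp \
  --export-dir /user/hive/warehouse/emp \
  --input-fields-terminated-by '\001' \
  --input-null-string '\\N' --input-null-non-string '\\N' \
  -m 1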