Introduction to Hive

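Hive is built on top of HDFS and MapReduce: the data is stored on HDFS, and the computation is carried out by MapReduce jobs. Hive itself mainly acts as a "translator", and it also manages the table metadata in the metastore. The driver is the component that translates SQL statements into MapReduce jobs and runs them.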
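
To make the "translation" idea concrete, here is a toy Python model of how a grouped query such as select country, count(*) from t group by country can be split into map, shuffle and reduce phases. It is only a sketch under simplifying assumptions: the rows, the table t and the function names are hypothetical, and a real Hive plan involves much more (operator trees, serialization, possibly several MapReduce stages).

```python
from collections import defaultdict
from itertools import chain

# Hypothetical rows of a table with columns (id, name, ip, country).
ROWS = [
    (1, "alice", "10.0.0.1", "CN"),
    (2, "bob", "10.0.0.2", "US"),
    (3, "carol", "10.0.0.3", "CN"),
]

def map_phase(row):
    """Map: emit (key, 1) for each row, keyed by the GROUP BY column."""
    _id, _name, _ip, country = row
    yield (country, 1)

def shuffle(pairs):
    """Shuffle: group emitted values by key (Hadoop does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate all values of one key, here count(*)."""
    return (key, sum(values))

def run_query(rows):
    """Toy equivalent of: select country, count(*) from t group by country."""
    mapped = chain.from_iterable(map_phase(r) for r in rows)
    grouped = shuffle(mapped)
    return sorted(reduce_phase(k, vs) for k, vs in grouped.items())

print(run_query(ROWS))  # [('CN', 2), ('US', 1)]
```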

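Before using Hive, configure MySQL to store the metadata. By default Hive uses an embedded Derby database as its metastore, and that database (the metastore_db directory) is created in whichever directory Hive was started from. Tables created after starting Hive from different directories therefore end up in different metastores and cannot see one another. To avoid such mistakes when several people work with Hive, the metadata is kept centrally in MySQL.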
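
The problem with a per-directory metastore can be simulated with Python's built-in sqlite3 module standing in for both Derby and MySQL. The single-table schema tbls, the file name metastore.db and the helper functions are hypothetical; the real metastore schema (with tables such as TBLS and DBS) is much richer.

```python
import os
import sqlite3
import tempfile

def open_metastore(directory):
    """Open (or create) a toy metastore file inside `directory`.

    Stand-in for the embedded Derby metastore, which lives in the
    directory Hive was started from.
    """
    conn = sqlite3.connect(os.path.join(directory, "metastore.db"))
    conn.execute("CREATE TABLE IF NOT EXISTS tbls (tbl_name TEXT PRIMARY KEY)")
    return conn

def create_table(conn, name):
    """Record a new table in the metastore."""
    conn.execute("INSERT INTO tbls (tbl_name) VALUES (?)", (name,))
    conn.commit()

def show_tables(conn):
    return [row[0] for row in conn.execute("SELECT tbl_name FROM tbls ORDER BY tbl_name")]

def demo():
    with tempfile.TemporaryDirectory() as dir_a, \
         tempfile.TemporaryDirectory() as dir_b, \
         tempfile.TemporaryDirectory() as shared_dir:
        # Two sessions started from different directories: separate metastores.
        a = open_metastore(dir_a)
        b = open_metastore(dir_b)
        create_table(a, "page_view")
        separate = (show_tables(a), show_tables(b))
        a.close()
        b.close()

        # Two sessions pointing at one shared metastore (MySQL in practice).
        s1 = open_metastore(shared_dir)
        s2 = open_metastore(shared_dir)
        create_table(s1, "page_view")
        shared = show_tables(s2)
        s1.close()
        s2.close()
    return separate, shared

print(demo())  # ((['page_view'], []), ['page_view'])
```

The first pair shows a table created in one directory's metastore being invisible from another; with one shared database, the table created through one connection is visible through the other, which is what a MySQL-backed metastore provides.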

Common Hive statements:

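1. Create a table. After the column list you usually also specify the row format, such as the character that separates the fields (if omitted, Hive's default field delimiter is \001, Ctrl-A):

create table page_view(viewTime int, userid bigint)
row format delimited
fields terminated by '\t';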
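
The row format clause only tells Hive how to split each line of the data file when the table is read. The sketch below roughly mimics that for the page_view schema; the helper name parse_line is hypothetical, and Python int stands in for both Hive int and bigint. It reflects Hive's usual behavior for delimited text: missing trailing fields and unparsable values become NULL (None here), and extra fields are ignored.

```python
# Hypothetical schema mirroring: create table page_view(viewTime int, userid bigint)
SCHEMA = [("viewTime", int), ("userid", int)]
FIELD_DELIM = "\t"  # fields terminated by '\t'

def parse_line(line, schema=SCHEMA, delim=FIELD_DELIM):
    """Split one text line into typed columns, roughly like Hive's delimited text format.

    Missing trailing fields and values that fail conversion become None (NULL);
    fields beyond the schema are ignored.
    """
    fields = line.rstrip("\n").split(delim)
    row = {}
    for i, (name, cast) in enumerate(schema):
        raw = fields[i] if i < len(fields) else None
        try:
            row[name] = cast(raw) if raw is not None else None
        except ValueError:
            row[name] = None
    return row

print(parse_line("1700000000\t42\n"))  # {'viewTime': 1700000000, 'userid': 42}
print(parse_line("abc\t7"))            # {'viewTime': None, 'userid': 7}
print(parse_line("5"))                 # {'viewTime': 5, 'userid': None}
```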

2. create table tab_ip_seq(id int, name string, ip string, country string);

3. Insert the result of a query into a table, replacing the table's existing data:

insert overwrite table tab_ip_seq select * from tab_ext;
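
Because a Hive table's data is simply the set of files in its directory, insert overwrite can be pictured as "delete the old files, then write the query result". The toy model below assumes a local directory in place of HDFS; the file name 000000_0 and the helper functions are illustrative only.

```python
import os
import tempfile

def insert_overwrite(table_dir, rows, delim="\t"):
    """Toy `insert overwrite`: remove every existing data file, then write the new rows."""
    for name in os.listdir(table_dir):
        os.remove(os.path.join(table_dir, name))
    with open(os.path.join(table_dir, "000000_0"), "w") as f:
        for row in rows:
            f.write(delim.join(str(v) for v in row) + "\n")

def read_table(table_dir, delim="\t"):
    """Read all data files of the table back as lists of string fields."""
    rows = []
    for name in sorted(os.listdir(table_dir)):
        with open(os.path.join(table_dir, name)) as f:
            rows.extend(line.rstrip("\n").split(delim) for line in f)
    return rows

def demo():
    with tempfile.TemporaryDirectory() as table_dir:
        insert_overwrite(table_dir, [(1, "old_row")])
        insert_overwrite(table_dir, [(2, "bob"), (3, "carol")])
        return read_table(table_dir)

print(demo())  # [['2', 'bob'], ['3', 'carol']]
```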

4. Drop a table (for a regular, managed table this deletes both its metadata and its data):

drop table table_name;

5. Load a file from the local filesystem into a table:

load data local inpath 'file_path' into table table_name;
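
With local, Hive copies the file from the local filesystem into the table's directory; without local, the path refers to HDFS and the file is moved there instead. In neither case is the content parsed or checked at load time. A minimal local-filesystem sketch, with hypothetical paths and a hypothetical helper name:

```python
import os
import shutil
import tempfile

def load_data_local(local_file, table_dir):
    """Toy `load data local inpath ... into table ...`.

    The file is copied unchanged into the table's directory; nothing is
    parsed or validated (the schema is applied only when the table is read).
    """
    os.makedirs(table_dir, exist_ok=True)
    return shutil.copy(local_file, table_dir)  # returns the path of the copy

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        local_file = os.path.join(tmp, "data.txt")
        with open(local_file, "w") as f:
            f.write("1\talice\nnot-a-number\tbob\n")  # bad value is accepted as-is
        table_dir = os.path.join(tmp, "warehouse", "page_view")
        dest = load_data_local(local_file, table_dir)
        with open(dest) as f:
            copied = f.read()
        # The local file still exists because it was copied, not moved.
        return os.path.exists(local_file), os.listdir(table_dir), copied
```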

6. Count the rows of a table; Hive computes the count by launching a MapReduce job:

select count(*) from table_name;
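
For a global count(*), Hive typically lets each mapper count the rows of its own input split and sends those partial counts to a single reducer. The toy version below assumes the splits are already given as lists of lines; the data and function names are hypothetical.

```python
# Hypothetical input: the table's data divided into splits, one per mapper.
SPLITS = [
    ["1\talice", "2\tbob"],
    ["3\tcarol"],
    ["4\tdave", "5\terin", "6\tfrank"],
]

def mapper(split_lines):
    """Each mapper counts the rows in its own split (map-side partial aggregation)."""
    return sum(1 for _ in split_lines)

def reducer(partial_counts):
    """A single reducer adds the partial counts into the final result."""
    return sum(partial_counts)

def count_star(splits):
    return reducer(mapper(s) for s in splits)

print(count_star(SPLITS))  # 6
```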

Creating an external table:

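create external table table_name (id int, name string, rongliang string, price double)
row format delimited fields terminated by '\t'
location 'hdfs_directory_path';

The location is a directory on HDFS (the default filesystem), not a path on the local machine; the files in that directory are read as the table's data.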

Dropping an external table: drop table table_name only deletes the table's metadata; the data files in its location are kept.
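
The difference between dropping a managed and an external table can be modeled with a dictionary standing in for the metastore and local directories standing in for HDFS. The table names, paths and the layout of each metastore entry are hypothetical.

```python
import os
import shutil
import tempfile

def drop_table(metastore, name):
    """Toy `drop table`: metadata is always removed, data only for managed tables."""
    entry = metastore.pop(name)
    if not entry["external"]:
        shutil.rmtree(entry["location"])

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        managed_dir = os.path.join(tmp, "warehouse", "tab_managed")
        external_dir = os.path.join(tmp, "data", "tab_ext")
        for d in (managed_dir, external_dir):
            os.makedirs(d)
            with open(os.path.join(d, "part.txt"), "w") as f:
                f.write("1\tx\n")
        metastore = {
            "tab_managed": {"location": managed_dir, "external": False},
            "tab_ext": {"location": external_dir, "external": True},
        }
        drop_table(metastore, "tab_managed")
        drop_table(metastore, "tab_ext")
        # Both tables are gone from the metastore; only the external data survives.
        return metastore, os.path.exists(managed_dir), os.path.exists(external_dir)

print(demo())  # ({}, False, True)
```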

Partitioning:

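create table table_name(id int, name string, ip string, country string)
partitioned by (part_flag string)
row format delimited fields terminated by ',';

The partition column (part_flag) must not repeat a column already declared in the column list: it is an additional column whose value comes from the partition's directory name rather than from the data files.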
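
Each partition is stored as a subdirectory named part_flag=<value> under the table's directory, and data is loaded into a specific partition by adding a clause such as partition (part_flag='part1') after into table. The sketch below models this on the local filesystem; the table name tab_ip_part, the partition values and the helpers are hypothetical. A query that filters on the partition column only needs to read the matching directory.

```python
import os
import tempfile

def partition_dir(table_dir, part_col, value):
    """Each partition value maps to its own subdirectory named <col>=<value>."""
    return os.path.join(table_dir, f"{part_col}={value}")

def load_into_partition(table_dir, part_col, value, lines):
    """Append comma-delimited lines to one partition's data file."""
    d = partition_dir(table_dir, part_col, value)
    os.makedirs(d, exist_ok=True)
    with open(os.path.join(d, "data.txt"), "a") as f:
        f.writelines(line + "\n" for line in lines)

def select_where(table_dir, part_col, value):
    """A filter on the partition column reads only that partition's directory."""
    d = partition_dir(table_dir, part_col, value)
    if not os.path.isdir(d):
        return []
    rows = []
    for name in sorted(os.listdir(d)):
        with open(os.path.join(d, name)) as f:
            for line in f:
                # The partition value is appended as an extra column taken from the path.
                rows.append(line.rstrip("\n").split(",") + [value])
    return rows

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        table_dir = os.path.join(tmp, "tab_ip_part")
        load_into_partition(table_dir, "part_flag", "part1", ["1,alice,10.0.0.1,CN"])
        load_into_partition(table_dir, "part_flag", "part2", ["2,bob,10.0.0.2,US"])
        return sorted(os.listdir(table_dir)), select_where(table_dir, "part_flag", "part2")

print(demo())
```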