02. Hive Features and Basic Operations

Ways to access Hive

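1. Hive CLI: provided hive has been added to the environment variables, type hive and press Enter from any directory on the node.
2. Start the hiveserver2 service
Run the following command on the node to start the service: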
hive --service hiveserver2

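After the command is entered, the first window appears to hang while the service runs in the foreground. Open a new window to make the connection.

Enter the beeline shell: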
cd /export/servers/hive-1.1.0-cdh5.14.0/bin/
beeline
Connect to the hiveserver2 service:
!connect jdbc:hive2://node01:10000
Enter the username root and the password 123456.
0: jdbc:hive2://node01:10000> show databases;

Passing options to Hive
hive -e 'command'
hive -e 'show databases;'
hive -f filename (the file contains the commands to run)
hive -f test.sql

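Basic database operations (run inside hive)
Create a database and specify its HDFS storage location (every command in hive must end with ; otherwise hive treats the command as unfinished):
create database myhive2 location '/myhive2';
1. Creating, dropping, altering, and listing databases
Create: create database [if not exists] myhive;
Drop: drop database myhive; (this works only when the database contains no tables; to drop a database together with its tables, append cascade: drop database myhive cascade;)
Alter: a database's name and storage location cannot be modified (only its properties can, with alter database ... set dbproperties)
List: show databases;
View basic database information: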
desc database myhive2;
View more detailed database information:
desc database extended myhive2;
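Switch databases:
use database_name;
In HDFS, Hive databases, tables, and partitions all appear as directories.
Default database path: /user/hive/warehouse
Custom database path: create database myhive2 location '/myhive2';
Basic table operations (create, drop, alter, query)
Create a basic table (internal table):
create table tableName (column_name column_type, column_name column_type)
ROW FORMAT DELIMITED FIELDS TERMINATED BY char;
char is the separator between fields in the data: '\t', ',', '|', or another character.
Create an external table:
create EXTERNAL table tableName (column_name column_type, column_name column_type) LOCATION 'hdfs_path';
An external table normally specifies where its data is stored, using LOCATION; if LOCATION is omitted, the data goes under the database's default directory.
Differences between internal and external tables:
Dropping an internal table deletes both the table's metadata and its data.
Dropping an external table deletes only the metadata; the data itself is not deleted.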
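The difference in drop behaviour can be modelled with a few lines of Python. This is only a toy model on the local filesystem, not how Hive is implemented: the ToyMetastore class and its methods are hypothetical names invented for the illustration, and the table names student and techer are borrowed from the examples in these notes.

```python
import os
import shutil
import tempfile


class ToyMetastore:
    """Hypothetical stand-in for the metastore: table name -> (location, is_external)."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name, location, external=False):
        os.makedirs(location, exist_ok=True)
        self.tables[name] = (location, external)

    def drop_table(self, name):
        # The metadata entry is always removed.
        location, external = self.tables.pop(name)
        # Only an internal (managed) table also loses its data directory.
        if not external:
            shutil.rmtree(location)


root = tempfile.mkdtemp()
metastore = ToyMetastore()
for name, external in (("student", False), ("techer", True)):
    location = os.path.join(root, name)
    metastore.create_table(name, location, external)
    with open(os.path.join(location, name + ".csv"), "w") as handle:
        handle.write("01\tdemo\n")
    metastore.drop_table(name)
    print(name, "data still on disk:", os.path.exists(location))
```

In this model the student directory disappears after the drop while the techer directory survives, which mirrors the internal and external cases described above.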
Drop a table
drop table tablename;
Alter a table
alter table tablename rename to new_tablename;

Query tables
show tables;
desc tablename;

Hive data types
Primitive data types
INT BIGINT FLOAT DOUBLE DECIMAL STRING VARCHAR CHAR BINARY TIMESTAMP DATE INTERVAL
Complex data types
ARRAY MAP STRUCT UNIONTYPE
create table new_table as select * from old_table; (copies the table structure and the data)
create table new_table like old_table; (copies the table structure only, not the data)
Loading data
Load data from the local Linux filesystem into hive:
load data local inpath 'data_path' into table table_name;
Load data from HDFS into hive, overwriting the existing data:
load data inpath 'data_path' overwrite into table table_name;
External table
create external table techer (t_id string,t_name string) row format delimited fields terminated by '\t';
Load data
load data local inpath '/export/servers/hivedatas/techer.csv' into table techer;
View the table's data in HDFS
hadoop fs -ls /user/hive/warehouse/myhive.db/techer
Query in hive
select * from techer;
Drop the table techer
drop table techer;
Check again
hadoop fs -ls /user/hive/warehouse/myhive.db/techer (the data still exists)
Internal table
create table student(t_id string,t_name string) row format delimited fields terminated by '\t';
Load data
load data local inpath '/export/servers/hivedatas/student.csv' into table student;
View the table's data in HDFS
hadoop fs -ls /user/hive/warehouse/myhive.db/student
Query in hive
select * from student;
Drop the table student
drop table student;
Check again
hadoop fs -ls /user/hive/warehouse/myhive.db/student (the data no longer exists)
Partitioned tables
A common partitioning rule in enterprises: partition by day (one partition per day)
Statements for creating partitioned tables
create table score(s_id string,c_id string,s_score int) partitioned by (month string) row format delimited fields terminated by '\t';
create table score2 (s_id string,c_id string,s_score int) partitioned by (year string,month string,day string) row format delimited fields terminated by '\t';
Loading data
load data local inpath '/opt/hive/score.csv' into table score partition (month='201806');
load data local inpath '/opt/hive/score.csv' into table score2 partition(year='2018',month='06',day='02');
Important:
A partition column must never duplicate a column already defined in the table.
Purpose:
Data is divided into partitions, so a query that filters on the partition column does not scan unrelated data, which speeds up the query.
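Because each partition is a directory named column=value, skipping unrelated data amounts to skipping directories. The Python sketch below models that layout. The helper names partition_path and prune are hypothetical, and the table directory /user/hive/warehouse/myhive.db/score2 is an assumption based on the paths used for the other tables in these notes.

```python
import posixpath


def partition_path(table_dir, spec):
    """Build the directory for one partition from ordered (column, value) pairs."""
    return posixpath.join(table_dir, *[f"{col}={val}" for col, val in spec])


def prune(partition_dirs, column, value):
    """Keep only the partition directories whose path contains column=value."""
    wanted = f"{column}={value}"
    return [path for path in partition_dirs if wanted in path.split("/")]


# Assumed table location; it is not stated in the notes for score2.
table_dir = "/user/hive/warehouse/myhive.db/score2"
partitions = [
    partition_path(table_dir, [("year", "2018"), ("month", "06"), ("day", day)])
    for day in ("01", "02", "03")
]

print(partitions[1])
# /user/hive/warehouse/myhive.db/score2/year=2018/month=06/day=02
print(prune(partitions, "day", "02"))
```

A query with day='02' in its where clause only needs the single directory returned by prune; the other two are never read.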
Bucketed tables
Bucketing adds an extra structure on top of an existing table structure.
Enable hive's bucketing feature
set hive.enforce.bucketing=true;
Set the number of buckets (reducers)
set mapreduce.job.reduces=3;
Create the bucketed table
create table course (c_id string,c_name string,t_id string) clustered by(c_id) into 3 buckets row format delimited fields terminated by '\t';
Create a plain table
create table course_common (c_id string,c_name string,t_id string) row format delimited fields terminated by '\t';
Load data into the plain table
load data local inpath '/export/servers/hivedatas/course.csv' into table course_common;
Query the plain table and insert the result into the bucketed table
insert overwrite table course select * from course_common cluster by(c_id);
Verify the data in each bucket
hadoop fs -cat /user/hive/warehouse/course/000000_0
03 英语 03
hadoop fs -cat /user/hive/warehouse/course/000001_0
01 语文 02
hadoop fs -cat /user/hive/warehouse/course/000002_0
02 数学 01
Important:
The bucketing column must be one of the table's columns.
Bucketing logic:
Hash the value of the bucketing column and take the remainder after dividing by the number of buckets; the remainder is the number of the bucket that the row is placed in.
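The hash-then-remainder rule can be checked by hand with a short Python sketch. It assumes a Java-style hash for string columns (multiply by 31 and add each byte); the actual hash function depends on the Hive version and the column type, so treat this as an illustration only. The function names hive_string_hash and bucket_for are hypothetical.

```python
def hive_string_hash(value):
    """Java-style 31-based hash over the UTF-8 bytes, as a signed 32-bit integer (assumed)."""
    h = 0
    for byte in value.encode("utf-8"):
        signed = byte - 256 if byte > 127 else byte  # Java bytes are signed
        h = (31 * h + signed) & 0xFFFFFFFF           # wrap around like a 32-bit int
    return h - 0x100000000 if h >= 0x80000000 else h


def bucket_for(value, num_buckets):
    """Clear the sign bit, then take the remainder by the number of buckets."""
    return (hive_string_hash(value) & 0x7FFFFFFF) % num_buckets


for c_id in ("01", "02", "03"):
    print(c_id, "-> bucket", bucket_for(c_id, 3))
# 01 -> bucket 1
# 02 -> bucket 2
# 03 -> bucket 0
```

Under this assumption the result agrees with the file listing above: c_id 03 is in 000000_0, 01 in 000001_0, and 02 in 000002_0.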
