Hadoop(Day09) -- Hive
I.Preface:
https://en.wikipedia.org/wiki/Apache_Hive
II.Installation:
1.Transfer the tar.gz to /usr/local
2.Extract it: tar xzvf <hive tar.gz>
3.vi /etc/profile (add HIVE_HOME and put its bin on the PATH; see the sketch after step 4)
4.source /etc/profile
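A minimal sketch of the /etc/profile additions (the extract path is an assumption; match it to your tarball):
export HIVE_HOME=/usr/local/hive   # assumed name of the extracted directory
export PATH=$PATH:$HIVE_HOME/bin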
5.mysql -uroot -proot (log in as MySQL root, then create the hive user for the metastore:)
CREATE USER 'hive' IDENTIFIED BY 'hive';
grant all privileges on *.* to 'hive'@'%' identified by 'hive' with grant option;
FLUSH PRIVILEGES;
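For step 8 (schematool) to reach this database, conf/hive-site.xml must carry the JDBC settings, and the MySQL JDBC driver jar must sit in $HIVE_HOME/lib. A minimal sketch (the hostname hadoop and database name hive are assumptions):
<configuration>
  <!-- JDBC connection to the MySQL metastore -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>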
6.start-all.sh
7.hdfs dfs -mkdir /usr
hdfs dfs -mkdir /usr/hive/
hdfs dfs -mkdir -p /usr/hive/warehouse
hdfs dfs -chmod g+w /usr/hive/warehouse
hdfs dfs -mkdir /tmp
hdfs dfs -chmod g+w /tmp
8.schematool -initSchema -dbType mysql
On success, schematool prints "schemaTool completed".
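A quick smoke test, assuming hive is now on the PATH:
hive -e "show databases;"   # should print at least the default database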
III.Table:
1.Partition:
create table student_p(Sno int, Sname string, Sex string, Sage int, Sdept string) partitioned by(part string) row format delimited fields terminated by ',' stored as textfile;
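Loading a file into one partition and then querying it, as a sketch (the local path and the partition value '2019' are assumptions):
load data local inpath '/root/student.txt' into table student_p partition(part='2019');
show partitions student_p;
select * from student_p where part='2019';   -- scans only the part=2019 directory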
2.Bucket: ***a row's bucket number is hash(the clustered-by value) mod the number of buckets; for an int column hash(x) is x itself, so it is simply the value mod the bucket count***
for example, a users table:
id(int)  name(string)
0        Nat
2        Joe
3        Kay
4        Ann
loaded into a bucketed table bucketed_users with 4 buckets, each row lands in bucket (id mod 4):
in 000000_0 we find "0 Nat" and "4 Ann", since 0 mod 4 = 0 and 4 mod 4 = 0;
000001_0 stays empty, since no id leaves remainder 1;
"2 Joe" lands in 000002_0 and "3 Kay" in 000003_0.
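You can preview the assignment with hash() and pmod(), both built-in Hive functions; the users table here is the hypothetical one from the example above:
select id, pmod(hash(id), 4) as bucket from users;   -- Nat->0, Joe->2, Kay->3, Ann->0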
Command:
create table stu_buck(sno int, sname string, sex string, sage int, sdept string)
clustered by(sno) sorted by(sno desc) into 4 buckets
row format delimited fields terminated by ',';
set hive.enforce.bucketing = true;   -- one reducer per bucket, so output files match buckets
insert into table stu_buck
select sno, sname, sex, sage, sdept from student distribute by(sno) sort by(sno desc);   -- sort direction matches the DDL's sorted by(sno desc)
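One payoff of bucketing is cheap sampling; this query reads only the first of the four bucket files:
select * from stu_buck tablesample(bucket 1 out of 4 on sno);   -- rows where pmod(hash(sno), 4) = 0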
IV.Beeline:
1.hiveserver2
2.(1) beeline -u jdbc:hive2://hadoop:10000 -n root
(2) beeline --> !connect jdbc:hive2://hadoop:10000
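A sketch of a full session against the hadoop host used in these notes:
hiveserver2 &                                  # keep HiveServer2 running in the background
beeline -u jdbc:hive2://hadoop:10000 -n root   # then, at the beeline prompt:
show databases;
!quit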
V.Group by:
select * from test;
select id, sum(count) from test group by id;
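A self-contained sketch (the sample rows are assumptions, and insert ... values needs Hive 0.14+); note that every non-aggregated column in the select list must appear in the group by:
create table if not exists test(id int, count int);
insert into table test values (1, 10), (1, 20), (2, 5);
select id, sum(count) from test group by id;
-- returns: 1  30
--          2  5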