Hadoop(Day09) -- Hive

I. Preface:

https://en.wikipedia.org/wiki/Apache_Hive

II. Installation:

1.Copy the Hive tar.gz to /usr/local

2.tar xzvf 

3.vi /etc/profile

4.source /etc/profile
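Steps 3-4 put Hive on the PATH; a minimal sketch of the lines to append to /etc/profile (the install directory name is an assumption -- adjust it to wherever your tarball unpacked):

```shell
# Hypothetical install path; match it to your extracted Hive directory.
export HIVE_HOME=/usr/local/apache-hive-1.2.1-bin
export PATH=$PATH:$HIVE_HOME/bin
```

After editing, `source /etc/profile` (step 4) applies the change to the current shell.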

5.mysql -uroot -proot

CREATE USER 'hive' IDENTIFIED BY 'hive';

grant all privileges on *.* to 'hive'@'%' identified by 'hive' with grant option;

FLUSH PRIVILEGES;
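Before step 8, schematool needs to know how to reach this MySQL metastore. A minimal hive-site.xml sketch, assuming MySQL on localhost and the hive/hive credentials created above; place the file under $HIVE_HOME/conf:

```shell
# Writes a minimal metastore config; URL, host and warehouse dir are
# assumptions -- adjust to your environment.
cat > hive-site.xml <<'EOF'
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/usr/hive/warehouse</value>
  </property>
</configuration>
EOF
```

The MySQL JDBC driver jar must also be copied into $HIVE_HOME/lib, or schematool cannot connect.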

6.start-all.sh

7.hdfs dfs -mkdir -p /usr/hive/warehouse
hdfs dfs -chmod g+w /usr/hive/warehouse

hdfs dfs -mkdir /tmp
hdfs dfs -chmod g+w /tmp

8.schematool -initSchema -dbType mysql

Success: the metastore tables are created in MySQL.


III. Table:

1.Partition:

create table student_p(Sno int,Sname string,Sex string,Sage int,Sdept string)
partitioned by(part string)
row format delimited fields terminated by ','
stored as textfile;
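Partitioning only changes the storage layout: each value of part becomes its own subdirectory under the table's warehouse directory, e.g. /usr/hive/warehouse/student_p/part=.../. A local sketch of that layout (the year values are hypothetical):

```shell
# Each partition value maps to one subdirectory of the table directory;
# mimicked here with plain local dirs.
mkdir -p student_p/part=2017 student_p/part=2018
ls -d student_p/part=*
```

Queries that filter on part then read only the matching subdirectories instead of scanning the whole table.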

2.Bucket:

***A row's bucket is determined by hashing the value of the clustered by() column modulo the number of buckets; for an int key the hash is the value itself***

for example, a user table:

id(int)  name(string)
0        Nat
2        Joe
3        Kay
4        Ann

loaded into a bucketed table bucketed_users with 4 buckets: we will find in bucket file 000000_0:

0 Nat
4 Ann

since 0 mod 4 = 0 and 4 mod 4 = 0; likewise 2 Joe lands in 000002_0 and 3 Kay in 000003_0, so there is no data in 000001_0.
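The assignment above can be sketched directly, since an int key hashes to itself and the bucket is just the key mod the bucket count:

```shell
# bucket = id mod 4 for a table with 4 buckets (int key hashes to itself)
for id in 0 2 3 4; do
  echo "id=$id -> bucket $((id % 4))"
done
```

ids 0 and 4 both print bucket 0, and no id maps to bucket 1, matching the empty 000001_0 file.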


Command:

create table stu_buck(sno int,sname string,sex string,sage int,sdept string)
clustered by(sno) sorted by(sno DESC) into 4 buckets
row format delimited fields terminated by ',';

set hive.enforce.bucketing = true;
insert into table stu_buck
select sno,sname,sex,sage,sdept from student distribute by(sno) sort by(sno asc);


IV. Beeline:

1.Start the server: hiveserver2

2.(1) beeline -u jdbc:hive2://hadoop:10000 -n root

(2) beeline --> !connect jdbc:hive2://hadoop:10000


V. Group by:

select * from test;

select id, sum(count) from test group by id;
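What that query computes can be sketched with awk over hypothetical (id, count) rows -- sum the second column per distinct first column:

```shell
# group-by-sum sketch: accumulate count per id, then print one row per id
printf '1 10\n2 5\n1 3\n2 2\n' |
  awk '{s[$1] += $2} END {for (k in s) print k, s[k]}' |
  sort
# -> 1 13
#    2 7
```

In Hive the same aggregation runs as a MapReduce job, with rows shuffled by id so each reducer sums one group.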

