Hive query - joining two tables on three join conditions combined with OR
Problem description:
I am getting the error
"FAILED: Error in semantic analysis: Line 1:101 OR not supported in JOIN currently dob"
when running the query below:
Insert Overwrite Local Directory './Insurance_Risk/Merged_Data' Select f.name,s.age,f.gender,f.loc,f.marital_status,f.habits1,f.habits2,s.employement_status,s.occupation_class,s.occupation_subclass,s.occupation from sample_member_detail s Join fb_member_detail f
On s.email=f.email or
s.dob=f.dob
or (f.name=s.name and f.loc = s.loc and f.occupation=s.occupation)
where s.email is not null and f.email is not null;
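Can anyone tell me whether the "OR" operator can be used in a Hive join condition? If not, what query would give the same result as the one above? I have two tables and I want to join them when any one of the three conditions holds, i.e. the conditions combined with OR. Please help.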
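Answer
Sorry, Hive only supports equi-joins. You can try selecting from the full Cartesian product of the two tables and applying the conditions in the WHERE clause instead (you have to be in non-strict mode for this):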
Select f.name,s.age,f.gender,f.loc,f.marital_status,f.habits1,f.habits2,s.employement_status,s.occupation_class,s.occupation_subclass,s.occupation
from sample_member_detail s join fb_member_detail f
where (s.email=f.email
or s.dob=f.dob
or (f.name=s.name and f.loc = s.loc and f.occupation=s.occupation))
and s.email is not null and f.email is not null;
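For the non-strict mode mentioned above, a minimal sketch of the session setting might look like the following. It assumes a Hive release where strict mode is controlled by the `hive.mapred.mode` property; other releases may govern Cartesian products through a different setting, so check this against your version.

```sql
-- Assumption: strict mode is controlled by hive.mapred.mode in this Hive release.
-- In strict mode Hive rejects a join that has no ON clause (a Cartesian product).
SET hive.mapred.mode=nonstrict;

-- The JOIN ... WHERE query above can then be run in the same session.
```

Keep in mind that a Cartesian product pairs every row of one table with every row of the other before the WHERE clause filters them, so this approach can be slow on large tables.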
Answer
You can also use UNION to get the same result:
INSERT OVERWRITE LOCAL DIRECTORY './Insurance_Risk/Merged_Data'
-- You can only UNION on subqueries
-- Plain UNION (which removes duplicate rows) is accepted from Hive 1.2.0;
-- earlier releases only support UNION ALL.
SELECT * FROM (
    -- Rows that match on email
    SELECT f.name,
           s.age,
           f.gender,
           f.loc,
           f.marital_status,
           f.habits1,
           f.habits2,
           s.employement_status,
           s.occupation_class,
           s.occupation_subclass,
           s.occupation
    FROM sample_member_detail s
    JOIN fb_member_detail f
    ON s.email=f.email
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL
  UNION
    -- Rows that match on date of birth
    SELECT f.name,
           s.age,
           f.gender,
           f.loc,
           f.marital_status,
           f.habits1,
           f.habits2,
           s.employement_status,
           s.occupation_class,
           s.occupation_subclass,
           s.occupation
    FROM sample_member_detail s
    JOIN fb_member_detail f
    ON s.dob=f.dob
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL
  UNION
    -- Rows that match on name, location and occupation together
    SELECT f.name,
           s.age,
           f.gender,
           f.loc,
           f.marital_status,
           f.habits1,
           f.habits2,
           s.employement_status,
           s.occupation_class,
           s.occupation_subclass,
           s.occupation
    FROM sample_member_detail s
    JOIN fb_member_detail f
    ON f.name=s.name AND f.loc = s.loc AND f.occupation=s.occupation
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL
) subquery;
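You would have to add a DISTINCT on the outer query to get the same result. Otherwise you will get duplicates for rows that satisfy more than one condition. – 2014-05-02 18:34:06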
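The suggestion in this comment could be sketched roughly as follows: the three equi-joins are combined with UNION ALL, which keeps duplicates, and an outer SELECT DISTINCT then collapses pairs that matched more than one condition. This is only an illustration using the table and column names from the question, not a tested replacement. Note that DISTINCT works on the selected columns, so two different row pairs that happen to produce identical output values would also be collapsed into one row.

```sql
-- Sketch only: same three joins as the answer above, combined with UNION ALL
-- and de-duplicated by an outer SELECT DISTINCT.
SELECT DISTINCT name, age, gender, loc, marital_status, habits1, habits2,
       employement_status, occupation_class, occupation_subclass, occupation
FROM (
    -- Pairs matching on email
    SELECT f.name, s.age, f.gender, f.loc, f.marital_status, f.habits1, f.habits2,
           s.employement_status, s.occupation_class, s.occupation_subclass, s.occupation
    FROM sample_member_detail s
    JOIN fb_member_detail f ON s.email = f.email
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL
  UNION ALL
    -- Pairs matching on date of birth
    SELECT f.name, s.age, f.gender, f.loc, f.marital_status, f.habits1, f.habits2,
           s.employement_status, s.occupation_class, s.occupation_subclass, s.occupation
    FROM sample_member_detail s
    JOIN fb_member_detail f ON s.dob = f.dob
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL
  UNION ALL
    -- Pairs matching on name, location and occupation together
    SELECT f.name, s.age, f.gender, f.loc, f.marital_status, f.habits1, f.habits2,
           s.employement_status, s.occupation_class, s.occupation_subclass, s.occupation
    FROM sample_member_detail s
    JOIN fb_member_detail f
      ON f.name = s.name AND f.loc = s.loc AND f.occupation = s.occupation
    WHERE s.email IS NOT NULL AND f.email IS NOT NULL
) subquery;
```

To write the result to the local directory as in the question, the same INSERT OVERWRITE LOCAL DIRECTORY line can be placed in front of the outer SELECT.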