Hive user-defined functions (UDF)

Category: Articles • 2024-06-02 08:19:34

1: Write the UDF. In Eclipse, add the dependency hive-exec-xxxx.jar to the project's build path.

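Keep the dependency version as close to production as possible. Do not build against one version in development, another in testing, and yet another in production; if something goes wrong, all you can do is laugh it off.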

2: Compile the Java file to produce the class file.

/usr/java/jdk1.7.0_67-cloudera/bin/javac -cp /home/cloudreajiaxiu/cm-5.13.0/share/cmf/common_jars/hive-exec-1.1.0-cdh5.4.0.jar source/AddRepayNum.java -d classes/

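-d specifies the directory where the compiled class files are written. With the JDK 7 javac used here, that directory must already exist.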

If the javac command feels like a hassle, you can compile in Eclipse instead.

Write a main method that calls evaluate.

Run the main method; the corresponding class file should then appear in the project's bin directory.
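
The main-method approach can be sketched roughly as follows. The article does not show the real AddRepayNum code, so the class name AddRepayNumSketch and the "add one to a repayment count" logic are purely hypothetical. A real simple UDF would normally extend org.apache.hadoop.hive.ql.exec.UDF from hive-exec-xxxx.jar; the superclass is left out here so the sketch compiles without that jar.

```java
// Hypothetical stand-in for the author's AddRepayNum class; the real logic
// is not shown in the article. An actual Hive UDF would extend
// org.apache.hadoop.hive.ql.exec.UDF (from hive-exec-xxxx.jar); the
// superclass is omitted so this file compiles on its own.
public class AddRepayNumSketch {

    // Hive's simple UDF API looks up a method named evaluate by reflection.
    // Assumed behaviour: increment a repayment count and pass NULL through.
    public Integer evaluate(Integer repayNum) {
        if (repayNum == null) {
            return null; // SQL NULL in, SQL NULL out
        }
        return repayNum + 1;
    }

    // Local smoke test: call evaluate directly, no Hive session required.
    public static void main(String[] args) {
        AddRepayNumSketch udf = new AddRepayNumSketch();
        System.out.println(udf.evaluate(3));
        System.out.println(udf.evaluate(null));
    }
}
```

Calling evaluate from main like this exercises the logic, including the NULL case, before the class is packaged and registered in Hive.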

3: Package

jar cvfM target/com.lizhifeng.hive.jar -C ./classes/ .

4: Test the UDF

CREATE temporary FUNCTION add_repay_num AS 'com.lizhifeng.hive.AddRepayNum' USING JAR '/root/com.lizhifeng.hive.jar';

5: Once testing passes, upload the jar to HDFS

hadoop fs -put /root/com.lizhifeng.hive.jar /hivefunction/

6: Create the permanent function

CREATE FUNCTION add_repay_num AS 'com.lizhifeng.hive.AddRepayNum' USING JAR 'hdfs://nameservice1/hivefunction/com.lizhifeng.hive.jar';

I ran into a really bizarre problem

The same function works fine for root and for other users, but the hadoop user cannot use it!

The Java environment is the same for all of them.

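Hive's built-in functions live under the package org.apache.hadoop.hive.ql.udf. I wanted to see how the min and avg functions are implemented, and ended up completely baffled.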
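
Part of the confusion is probably that min and avg are aggregate functions, so they are not written as a single evaluate method. As far as I know, Hive implements them in the generic subpackage (GenericUDAFMin, GenericUDAFAverage) with considerably more machinery than shown here. The plain-Java sketch below only illustrates the underlying idea for an average: keep a partial state (sum and count), update it per row, and merge partial states. The class name AvgSketch is made up, and the method names only loosely echo Hive's older UDAF evaluator convention; none of this is Hive's actual code.

```java
// Plain-Java illustration of partial aggregation for an average.
// NOT Hive's implementation; the class and its layout are illustrative only.
public class AvgSketch {
    private long count;  // number of non-NULL rows seen so far
    private double sum;  // running total of those rows

    // Called once per input row; NULL values are skipped, as SQL AVG does.
    public void iterate(Double value) {
        if (value != null) {
            sum += value;
            count++;
        }
    }

    // Fold in another partial result, for example one built by another task.
    public void merge(AvgSketch other) {
        sum += other.sum;
        count += other.count;
    }

    // Final answer; null when no non-NULL rows were seen.
    public Double terminate() {
        return count == 0 ? null : sum / count;
    }

    public static void main(String[] args) {
        AvgSketch a = new AvgSketch();
        a.iterate(1.0);
        a.iterate(2.0);
        AvgSketch b = new AvgSketch();
        b.iterate(6.0);
        b.iterate(null); // ignored
        a.merge(b);
        System.out.println(a.terminate());
    }
}
```

Carrying sum and count separately, rather than a running average, is what lets partial results from different tasks be merged correctly.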

The @Description annotation holds the function's documentation.

Count the number of columns in a file

head -1 192836473812121274201711133372_contract_state.csv | awk -F'^' '{print NF}'

Count the number of lines in a file

wc -l 192836473812121274201711133372_contract_state.csv | awk '{print $1}'

Reposted from: https://my.oschina.net/qidis/blog/1576054