Uploading and Querying Files in HDFS with the Java API
Setting up a Hadoop 2.7.3 cluster on Ubuntu: https://blog.****.net/qq_38038143/article/details/83050840
Configuring Eclipse + Hadoop on Ubuntu: https://blog.****.net/qq_38038143/article/details/83412196
Environment: a Hadoop cluster with 4 DataNodes.
1. Create the project
Note: the following is done in Eclipse on Ubuntu.
Project layout:
PutFile.java: uploads a local file to HDFS
Code:
package pack1;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * @author: Gu Yongtao
 * @Description: HDFS
 * @date: 2018-10-24
 * FileName: PutFile.java
 */
public class PutFile {
    public static void main(String[] args) throws IOException, URISyntaxException {
        Configuration conf = new Configuration();
        URI uri = new URI("hdfs://master:9000");
        FileSystem fs = FileSystem.get(uri, conf);
        // Local path to upload
        Path src = new Path("/home/hadoop/file");
        // Destination in HDFS
        Path dst = new Path("/");
        fs.copyFromLocalFile(src, dst);
        // conf was never given fs.defaultFS, so conf.get("fs.defaultFS") would
        // return the default "file:///"; read the URI from the FileSystem instead
        System.out.println("Upload to " + fs.getUri());
        // Equivalent to: hdfs dfs -ls /
        FileStatus[] files = fs.listStatus(dst);
        for (FileStatus file : files) {
            System.out.println(file.getPath());
        }
    }
}
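FileSystem.get selects the filesystem implementation from the URI's scheme, and the host and port locate the NameNode's RPC endpoint. A minimal, standalone look at how java.net.URI decomposes the address used above (plain JDK only, no cluster required):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class InspectUri {
    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI("hdfs://master:9000");
        // The scheme ("hdfs") selects the FileSystem implementation;
        // host and port identify the NameNode to connect to.
        System.out.println(uri.getScheme() + " " + uri.getHost() + " " + uri.getPort());
    }
}
```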
TextFileDetail.java: prints detailed information about a file
Code:
package pack1;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.text.SimpleDateFormat;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * @author: Gu Yongtao
 * @Description:
 * @date: 2018-10-26
 * @Filename: TextFileDetail.java
 */
public class TextFileDetail {
    public static void main(String[] args) throws IOException, URISyntaxException {
        FileSystem fileSystem = FileSystem.get(new URI("hdfs://master:9000"), new Configuration());
        Path fpPath = new Path("/file/english.txt");
        FileStatus fileStatus = fileSystem.getFileStatus(fpPath);
        /*
         * Locate the file's blocks in the HDFS cluster:
         * FileSystem.getFileBlockLocations(FileStatus file, long start, long len)
         * returns the locations of the blocks covering the byte range
         * [start, start + len) of the given file.
         */
        BlockLocation[] blockLocations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
        // Print the hosts holding each block
        for (int i = 0; i < blockLocations.length; i++) {
            String[] hosts = blockLocations[i].getHosts();
            if (hosts.length >= 2) { // block is replicated
                System.out.println("--------"+"block_"+i+"_location's replications:"+"---------");
                for (int j = 0; j < hosts.length; j++) {
                    System.out.println("replication"+(j+1)+": "+hosts[j]);
                }
                System.out.println("------------------------------");
            } else { // single copy
                System.out.println("block_"+i+"_location: "+hosts[0]);
            }
        }
        // Formatter for timestamps
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        // Access time, as epoch milliseconds (long)
        long accessTime = fileStatus.getAccessTime();
        System.out.println("access: "+formatter.format(accessTime));
        // Modification time, as epoch milliseconds (long)
        long modificationTime = fileStatus.getModificationTime();
        System.out.println("modification: "+formatter.format(modificationTime));
        // HDFS block size in bytes (not the file size)
        long blockSize = fileStatus.getBlockSize();
        System.out.println("blockSize: "+blockSize);
        // File length in bytes
        long len = fileStatus.getLen();
        System.out.println("length: "+len);
        // Owning group
        String group = fileStatus.getGroup();
        System.out.println("group: "+group);
        // Owner
        String owner = fileStatus.getOwner();
        System.out.println("owner: "+owner);
        // Replication factor
        short replication = fileStatus.getReplication();
        System.out.println("replication: "+replication);
    }
}
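getAccessTime() and getModificationTime() return epoch milliseconds, which is why the SimpleDateFormat above is needed. A minimal, self-contained illustration of the same formatting (the timestamp is an arbitrary example value; the time zone is pinned to UTC so the output is deterministic):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class FormatTimestamp {
    public static void main(String[] args) {
        // Same pattern as in TextFileDetail.java
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        formatter.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for reproducible output
        // Example epoch-millisecond value, as returned by getAccessTime()
        System.out.println(formatter.format(new Date(1540556141000L)));
    }
}
```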
log4j.properties: Hadoop logging configuration (controls warnings, debug output, etc.)
Code:
# Configure logging for testing: optionally with log file
# Available levels: debug > info > error
# debug: shows debug, info, and error messages
# info: shows info and error messages
# error: shows error messages only
#log4j.rootLogger=debug,appender1
#log4j.rootLogger=info,appender1
log4j.rootLogger=error,appender1
# Log to the console
log4j.appender.appender1=org.apache.log4j.ConsoleAppender
# Use TTCCLayout
log4j.appender.appender1.layout=org.apache.log4j.TTCCLayout
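TTCCLayout's output is fairly dense; a PatternLayout variant gives more readable, timestamped lines. A sketch (adjust the conversion pattern to taste):

```properties
log4j.rootLogger=error,appender1
log4j.appender.appender1=org.apache.log4j.ConsoleAppender
log4j.appender.appender1.layout=org.apache.log4j.PatternLayout
# ISO-8601 timestamp, level, logger name, message
log4j.appender.appender1.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n
```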
2. Run
Check HDFS first: there is no /file directory yet.
Run PutFile.java:
Result:
Run TextFileDetail.java:
Result:
--------block_0_location's replications:---------
replication1: slave4
replication2: slave6
replication3: slave5
------------------------------
--------block_1_location's replications:---------
replication1: slave6
replication2: slave5
replication3: slave
------------------------------
--------block_2_location's replications:---------
replication1: slave4
replication2: slave5
replication3: slave
------------------------------
--------block_3_location's replications:---------
replication1: slave6
replication2: slave5
replication3: slave4
------------------------------
--------block_4_location's replications:---------
replication1: slave4
replication2: slave
replication3: slave5
------------------------------
access: 2018-10-26 20:15:41
modification: 2018-10-26 20:18:27
blockSize: 134217728
length: 664734060
group: supergroup
owner: hadoop
replication: 3
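The file is split into five blocks (block_0 through block_4), which is consistent with the reported length and block size: ceil(664734060 / 134217728) = 5. A quick check of the arithmetic:

```java
public class BlockCount {
    public static void main(String[] args) {
        // Values reported by TextFileDetail.java
        long length = 664734060L;     // file length in bytes
        long blockSize = 134217728L;  // 128 MB, the Hadoop 2.x default block size
        long blocks = (length + blockSize - 1) / blockSize; // ceiling division
        System.out.println(blocks);
    }
}
```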
Verify in the browser:
URL: http://master:50070
Click english.txt.
The screenshot above shows that Block 0 has replicas on slave4, slave5, and slave6.
This matches the output of TextFileDetail.java:
--------block_0_location's replications:---------
replication1: slave4
replication2: slave6
replication3: slave5
------------------------------