FileSystem
Firstly:
FileSystem---Method---->(FSDataInputStream)open();
open: returns the input stream.
FSDataInputStream----extends---DataInputStream,
DataInputStream----extends---FilterInputStream----extends---InputStream.
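The chain above runs through standard JDK classes, so you can check it without a Hadoop cluster. The sketch below uses a hypothetical DemoFSDataInputStream as a stand-in for Hadoop's FSDataInputStream. It only mirrors the class chain and none of the HDFS behavior. It also reads an int through the inherited DataInputStream API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class StreamHierarchyDemo {
    // Hypothetical stand-in for Hadoop's FSDataInputStream: it only mirrors
    // the inheritance chain and does not talk to HDFS.
    static class DemoFSDataInputStream extends DataInputStream {
        DemoFSDataInputStream(InputStream in) {
            super(in);
        }
    }

    // True if the object sits on the DataInputStream -> FilterInputStream -> InputStream chain.
    static boolean isInputStreamChain(Object o) {
        return o instanceof DataInputStream
            && o instanceof FilterInputStream
            && o instanceof InputStream;
    }

    // Reads a big-endian int using the method inherited from DataInputStream.
    static int readFirstInt(byte[] data) {
        try (DemoFSDataInputStream in =
                 new DemoFSDataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(new byte[0]);
        System.out.println(isInputStreamChain(new DemoFSDataInputStream(raw))); // true
        System.out.println(readFirstInt(new byte[] {0, 0, 0, 42}));            // 42
    }
}
```

Client code only sees the DataInputStream methods, which is why the returned stream can be passed to anything that expects a plain InputStream.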
Anatomy of a File Read
1. The client opens the file by calling open() on the FileSystem object, which for HDFS is an instance
of DistributedFileSystem.
2. DistributedFileSystem calls the namenode, using RPC, to determine the locations of the first few blocks in the file. For each block, the namenode returns the addresses of the datanodes that have a copy of that block.
The DistributedFileSystem then returns an FSDataInputStream to the client for it to read data from. FSDataInputStream wraps a DFSInputStream, which manages the datanode and namenode I/O.
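Here is a toy model of step 2. It assumes the namenode's metadata can be pictured as a map from file path to an ordered list of blocks, each with its replica datanodes. The names BlockLocation, NAMESPACE and getBlockLocations, the file path and the datanode names are all hypothetical. The real lookup is an RPC from DistributedFileSystem to the namenode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BlockLocationLookup {
    // A block and the datanodes that hold a replica of it (hypothetical names).
    record BlockLocation(long blockId, List<String> datanodes) {}

    // Toy stand-in for the namenode's metadata: file path -> ordered blocks.
    static final Map<String, List<BlockLocation>> NAMESPACE = Map.of(
        "/data/log.txt", List.of(
            new BlockLocation(1, List.of("dn1", "dn2", "dn3")),
            new BlockLocation(2, List.of("dn2", "dn4", "dn5")),
            new BlockLocation(3, List.of("dn1", "dn4", "dn6"))));

    // Returns locations for at most `count` blocks starting at block index `first`,
    // like asking for "the first few blocks" of a file.
    static List<BlockLocation> getBlockLocations(String path, int first, int count) {
        List<BlockLocation> blocks = NAMESPACE.get(path);
        if (blocks == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        int end = Math.min(blocks.size(), first + count);
        return new ArrayList<>(blocks.subList(Math.min(first, end), end));
    }

    public static void main(String[] args) {
        for (BlockLocation b : getBlockLocations("/data/log.txt", 0, 2)) {
            System.out.println("block " + b.blockId() + " -> " + b.datanodes());
        }
    }
}
```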
3. The client then calls read() on the stream. DFSInputStream, which holds the datanode addresses
(sorted by proximity to the client) for the first few blocks in the file, then connects to the first (closest)
datanode for the first block in the file.
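One way to sketch "closest" is with a simplified rack-aware distance:

- 0 for the same host
- 2 for another host on the same rack
- 4 for a different rack

Hadoop's network topology uses a similar tree-based distance. The "/rack/host" strings and the class and method names here are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ClosestDatanode {
    // Simplified rack-aware distance between two locations written as "/rack/host"
    // (assumed format): same host = 0, same rack = 2, different rack = 4.
    static int distance(String a, String b) {
        if (a.equals(b)) {
            return 0;
        }
        String rackA = a.substring(0, a.lastIndexOf('/'));
        String rackB = b.substring(0, b.lastIndexOf('/'));
        return rackA.equals(rackB) ? 2 : 4;
    }

    // Orders replica locations so the one closest to the client comes first;
    // the reader then connects to the head of this list.
    static List<String> sortByDistance(String client, List<String> replicas) {
        List<String> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingInt(dn -> distance(client, dn)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("/r2/dn4", "/r1/dn2", "/r1/dn1");
        System.out.println(sortByDistance("/r1/dn1", replicas)); // [/r1/dn1, /r1/dn2, /r2/dn4]
    }
}
```

The same ordering can be reused in step 5, when the reader picks the best datanode for the next block.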
4. Data is streamed from the datanode back to the client, which calls read() repeatedly
on the stream. When the end of the block is reached, DFSInputStream closes the
connection to the datanode.
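Steps 3 and 4 boil down to a read loop: keep calling read() until the block is exhausted, then close the connection. In this sketch a ByteArrayInputStream subclass, the hypothetical DatanodeConnection, plays the role of the datanode connection. It also counts how many times it was closed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BlockStreamRead {
    // Number of datanode connections closed so far (for illustration only).
    static int closedConnections = 0;

    // Hypothetical "connection" to a datanode that serves one block's bytes.
    static class DatanodeConnection extends ByteArrayInputStream {
        DatanodeConnection(byte[] block) {
            super(block);
        }

        @Override
        public void close() throws IOException {
            closedConnections++;
            super.close();
        }
    }

    // Calls read() repeatedly until the end of the block (-1), then closes the
    // connection via try-with-resources.
    static byte[] readBlock(InputStream conn, int bufferSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufferSize];
        try (InputStream in = conn) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] block = "hello hdfs block".getBytes(StandardCharsets.UTF_8);
        byte[] data = readBlock(new DatanodeConnection(block), 4);
        System.out.println(new String(data, StandardCharsets.UTF_8) + " / closed=" + closedConnections);
    }
}
```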
5. DFSInputStream then finds the best datanode for the next block.
6. Blocks are read in order, with DFSInputStream opening new connections to datanodes as the client reads through the stream. It also calls the namenode to retrieve the datanode locations for the next batch of blocks as needed. When the client has
finished reading, it calls close() on the FSDataInputStream.
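Below is a sketch of in-order reading with batched location lookups:

- Block ids are plain integers.
- locateBlocks is a hypothetical stand-in for the namenode RPC.
- The batch size is a parameter of this toy, not an HDFS setting.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedBlockReader {
    // Counts simulated RPCs to the namenode (hypothetical toy).
    static int namenodeCalls = 0;

    // Toy namenode: returns the ids of up to `batch` blocks starting at `first`.
    static List<Integer> locateBlocks(int totalBlocks, int first, int batch) {
        namenodeCalls++;
        List<Integer> ids = new ArrayList<>();
        for (int i = first; i < Math.min(totalBlocks, first + batch); i++) {
            ids.add(i);
        }
        return ids;
    }

    // Reads blocks strictly in order, asking the namenode for the next batch
    // of locations only when the cached batch has been used up.
    static List<Integer> readAll(int totalBlocks, int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        List<Integer> readOrder = new ArrayList<>();
        List<Integer> cached = new ArrayList<>();
        int next = 0;
        while (next < totalBlocks) {
            if (cached.isEmpty()) {
                cached.addAll(locateBlocks(totalBlocks, next, batch));
            }
            int block = cached.remove(0); // here a real reader would connect to the best datanode
            readOrder.add(block);
            next++;
        }
        return readOrder;
    }

    public static void main(String[] args) {
        namenodeCalls = 0;
        System.out.println(readAll(5, 2) + " calls=" + namenodeCalls); // [0, 1, 2, 3, 4] calls=3
    }
}
```

Fetching locations in batches keeps the namenode out of the per-block hot path. The client still talks to it occasionally as it moves through a large file.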
In sum, the client contacts datanodes directly to retrieve data and is guided by the namenode to the best datanode for each block.
Secondly:
FileSystem---Method---->(FSDataOutputStream)create();
create: returns the output stream.
FSDataOutputStream----extends---DataOutputStream,
DataOutputStream----extends---FilterOutputStream----extends---OutputStream.
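The write side mirrors the read side. Here a hypothetical DemoFSDataOutputStream stands in for FSDataOutputStream. It writes through the inherited DataOutputStream methods into an in-memory buffer instead of HDFS.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class OutputStreamHierarchyDemo {
    // Hypothetical stand-in for Hadoop's FSDataOutputStream: only the class chain.
    static class DemoFSDataOutputStream extends DataOutputStream {
        DemoFSDataOutputStream(OutputStream out) {
            super(out);
        }
    }

    // True if the object sits on the DataOutputStream -> FilterOutputStream -> OutputStream chain.
    static boolean isOutputStreamChain(Object o) {
        return o instanceof DataOutputStream
            && o instanceof FilterOutputStream
            && o instanceof OutputStream;
    }

    // Writes an int and a modified-UTF-8 string, returning the bytes produced.
    static byte[] writeRecord(int id, String name) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DemoFSDataOutputStream out = new DemoFSDataOutputStream(sink)) {
            out.writeInt(id);   // 4 bytes, big-endian
            out.writeUTF(name); // 2-byte length prefix + encoded characters
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(isOutputStreamChain(new DemoFSDataOutputStream(new ByteArrayOutputStream()))); // true
        System.out.println(writeRecord(7, "hdfs").length); // 10
    }
}
```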
Anatomy of a File Write
1. The client creates the file by calling create() on DistributedFileSystem.
2. DistributedFileSystem makes an RPC call to the namenode to create a new file
in the filesystem's namespace, with no blocks associated with it. The namenode makes
a record of the new file.
The DistributedFileSystem then returns an FSDataOutputStream for the client to start writing
data to.
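Step 2 of the write path only creates a namespace entry. Here is a minimal sketch, assuming the namespace is an in-memory map: create() records the path with an empty block list. ToyNamespace is hypothetical. Refusing an existing path is an assumption of this toy, similar to creating with overwrite disabled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyNamespace {
    // File path -> block ids. A newly created file has no blocks yet.
    private final Map<String, List<Long>> files = new HashMap<>();

    // Mimics the create RPC: record the file with an empty block list.
    // Returns false if the path already exists (assumption of this toy).
    boolean create(String path) {
        if (files.containsKey(path)) {
            return false;
        }
        files.put(path, new ArrayList<>());
        return true;
    }

    int blockCount(String path) {
        List<Long> blocks = files.get(path);
        if (blocks == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return blocks.size();
    }

    public static void main(String[] args) {
        ToyNamespace nn = new ToyNamespace();
        System.out.println(nn.create("/out/part-0"));     // true
        System.out.println(nn.blockCount("/out/part-0")); // 0
        System.out.println(nn.create("/out/part-0"));     // false
    }
}
```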
3. Just as in the read case, FSDataOutputStream wraps a DFSOutputStream, which handles communication with the datanodes and namenode.
4. As the client writes data, DFSOutputStream splits it into packets, which it writes to an
internal queue called the data queue. The data queue is consumed by the DataStreamer,
which is responsible for asking the namenode to allocate new blocks by picking a list of
suitable datanodes to store the replicas. The list of datanodes forms a pipeline. Here we assume the replication level is three, so there are three nodes in the pipeline. The DataStreamer streams the packets to the first datanode in the pipeline, which stores each packet and forwards it to the second datanode in the pipeline. Similarly, the second datanode stores the packet and forwards it to the third datanode in the pipeline.
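The packet, data queue and pipeline flow of step 4 can be sketched as shown below. The sketch makes three simplifying assumptions:

- The packet size is an arbitrary parameter.
- The three datanodes dn1 to dn3 are hard-coded rather than chosen by the namenode.
- Forwarding is a direct method call instead of a network hop.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class WritePipeline {
    // A datanode in the pipeline: stores each packet, then forwards it downstream.
    static class Datanode {
        final String name;
        final Datanode next; // null for the last node in the pipeline
        final List<byte[]> stored = new ArrayList<>();

        Datanode(String name, Datanode next) {
            this.name = name;
            this.next = next;
        }

        void receive(byte[] packet) {
            stored.add(packet);       // store locally first
            if (next != null) {
                next.receive(packet); // then forward to the next datanode
            }
        }
    }

    // Splits client data into packets of at most packetSize bytes on a data queue.
    static Deque<byte[]> toPackets(byte[] data, int packetSize) {
        Deque<byte[]> dataQueue = new ArrayDeque<>();
        for (int off = 0; off < data.length; off += packetSize) {
            dataQueue.add(Arrays.copyOfRange(data, off, Math.min(data.length, off + packetSize)));
        }
        return dataQueue;
    }

    // Builds the pipeline dn1 -> dn2 -> dn3 (replication three) and lets a
    // DataStreamer-like loop drain the data queue into the first datanode.
    static List<Datanode> stream(byte[] data, int packetSize) {
        Datanode dn3 = new Datanode("dn3", null);
        Datanode dn2 = new Datanode("dn2", dn3);
        Datanode dn1 = new Datanode("dn1", dn2);
        Deque<byte[]> dataQueue = toPackets(data, packetSize);
        while (!dataQueue.isEmpty()) {
            dn1.receive(dataQueue.poll());
        }
        return List.of(dn1, dn2, dn3);
    }

    public static void main(String[] args) {
        for (Datanode dn : stream(new byte[10], 4)) {
            System.out.println(dn.name + " stored " + dn.stored.size() + " packets"); // 3 each
        }
    }
}
```

The client only pushes each packet once. Replication to the second and third datanodes is handled by the pipeline itself.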
5. DFSOutputStream also maintains an internal queue of packets that are waiting to be
acknowledged by datanodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the datanodes in the pipeline.
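This sketch shows how the data queue and the ack queue interact. It assumes each datanode acknowledges a packet individually. That is a simplification, because real HDFS passes acknowledgments back up the pipeline. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AckQueueDemo {
    // A sent packet plus the datanodes that have acknowledged it so far.
    static class Packet {
        final long seqno;
        final Set<String> acks = new HashSet<>();

        Packet(long seqno) {
            this.seqno = seqno;
        }
    }

    final List<String> pipeline;
    final Deque<Packet> dataQueue = new ArrayDeque<>();
    final Deque<Packet> ackQueue = new ArrayDeque<>();

    AckQueueDemo(List<String> pipeline) {
        this.pipeline = pipeline;
    }

    // Sending a packet moves it from the data queue to the ack queue.
    void sendNext() {
        Packet p = dataQueue.poll();
        if (p != null) {
            ackQueue.add(p);
        }
    }

    // Records one datanode's ack. Packets leave the ack queue from the head,
    // and only once every datanode in the pipeline has acknowledged them.
    void ack(long seqno, String datanode) {
        for (Packet p : ackQueue) {
            if (p.seqno == seqno) {
                p.acks.add(datanode);
            }
        }
        while (!ackQueue.isEmpty() && ackQueue.peek().acks.containsAll(pipeline)) {
            ackQueue.poll();
        }
    }

    public static void main(String[] args) {
        AckQueueDemo s = new AckQueueDemo(List.of("dn1", "dn2", "dn3"));
        s.dataQueue.add(new Packet(0));
        s.sendNext();
        s.ack(0, "dn3");
        s.ack(0, "dn2");
        System.out.println(s.ackQueue.size()); // 1: dn1 has not acknowledged yet
        s.ack(0, "dn1");
        System.out.println(s.ackQueue.size()); // 0
    }
}
```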
6. When the client has finished writing data, it calls close() on the stream.
7. This action flushes all the remaining packets to the datanode pipeline and waits for
acknowledgments before contacting the namenode to signal that the file is complete. The
namenode already knows which blocks the file is made up of (via DataStreamer
asking for block allocations), so it only has to wait for blocks to be minimally replicated
before returning successfully.
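The "wait until minimally replicated" check of step 7 can be modeled as follows. In this hypothetical CompleteFileDemo, the minimum replica count is a constructor parameter. In HDFS the corresponding setting is dfs.namenode.replication.min (default 1). Block ids and datanode names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CompleteFileDemo {
    final int minReplication;
    // Blocks allocated for the file, known from DataStreamer's allocation requests.
    final List<Long> fileBlocks = new ArrayList<>();
    // Block id -> datanodes that reported a finished replica.
    final Map<Long, Set<String>> replicas = new HashMap<>();

    CompleteFileDemo(int minReplication) {
        this.minReplication = minReplication;
    }

    // Records a newly allocated block for the file and returns its id.
    long allocateBlock() {
        long id = fileBlocks.size() + 1;
        fileBlocks.add(id);
        return id;
    }

    void reportReplica(long blockId, String datanode) {
        replicas.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
    }

    // The "file is complete" call: succeeds only when every block of the file
    // has at least minReplication reported replicas.
    boolean complete() {
        for (long id : fileBlocks) {
            if (replicas.getOrDefault(id, Set.of()).size() < minReplication) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CompleteFileDemo nn = new CompleteFileDemo(1);
        long b1 = nn.allocateBlock();
        long b2 = nn.allocateBlock();
        nn.reportReplica(b1, "dn1");
        System.out.println(nn.complete()); // false: block 2 has no replica yet
        nn.reportReplica(b2, "dn4");
        System.out.println(nn.complete()); // true
    }
}
```

Because the namenode already tracks the file's block list, completing the file needs no extra data from the client. It is just a check against replica reports.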
Reposted from: https://blog.51cto.com/8841087/1400225