FileSystem

Firstly:

FileSystem---Method---->(FSDataInputStream)open();

open: returns the input stream.

FSDataInputStream----extends---DataInputStream,

DataInputStream----extends---FilterInputStream----extends---InputStream.
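To make the inheritance chain concrete, here is a minimal sketch using only java.io. ToyFsDataInputStream and StreamHierarchySketch are hypothetical stand-ins, not Hadoop classes; the sketch only mirrors the idea that the stream returned by open() is a DataInputStream wrapped around another stream that does the real I/O.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for FSDataInputStream: extends DataInputStream
// and delegates the actual reading to the stream it wraps.
class ToyFsDataInputStream extends DataInputStream {
    ToyFsDataInputStream(InputStream wrapped) {
        super(wrapped);
    }
}

class StreamHierarchySketch {
    // Reads one big-endian int through the wrapper.
    static int readFirstInt(byte[] data) {
        try (ToyFsDataInputStream in =
                 new ToyFsDataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The wrapper can be used anywhere a plain InputStream is expected.
        InputStream asBase =
            new ToyFsDataInputStream(new ByteArrayInputStream(new byte[0]));
        System.out.println(asBase instanceof DataInputStream);
        System.out.println(readFirstInt(new byte[] {0, 0, 1, 2}));
    }
}
```

Because the wrapper is still an InputStream, client code can treat it like any other Java stream while the wrapped stream hides where the bytes come from.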

Anatomy of a File Read

1. The client opens the file by calling open() on the FileSystem object, which is an instance of DistributedFileSystem for HDFS.

2. DistributedFileSystem calls the namenode, using RPC, to determine the locations of the first few blocks in the file. For each block, the namenode returns the addresses of the datanodes that have a copy of that block. The DistributedFileSystem then returns an FSDataInputStream to the client for it to read data from. FSDataInputStream wraps a DFSInputStream, which manages the datanode and namenode I/O.

3. The client then calls read() on the stream. DFSInputStream, which has stored the datanode addresses for the first few blocks in the file, then connects to the first (closest) datanode for the first block in the file.

4. Data is streamed from the datanode back to the client, which calls read() repeatedly on the stream. When the end of the block is reached, DFSInputStream closes the connection to the datanode.

5. DFSInputStream then finds the best datanode for the next block.

6. Blocks are read in order, with the DFSInputStream opening new connections to datanodes as the client reads through the stream. It also calls the namenode to retrieve the datanode locations for the next batch of blocks as needed. When the client has finished reading, it calls close() on the FSDataInputStream.

In sum, the client contacts datanodes directly to retrieve data and is guided by the namenode to the best datanode for each block.
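The read steps above can be sketched as a small in-memory simulation. This is not Hadoop code: ReadPathSketch, ToyNamenode, Replica, the distance values, and the batch size are all assumptions made up for illustration. The sketch only shows the shape of the flow: block locations are fetched from the namenode in batches, and for each block the client reads from the closest datanode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReadPathSketch {
    // One replica of a block; "distance" is a made-up network cost to the client.
    static final class Replica {
        final String datanode;
        final int distance;
        Replica(String datanode, int distance) {
            this.datanode = datanode;
            this.distance = distance;
        }
    }

    // Toy namenode: knows the replicas of every block and returns them in batches.
    static final class ToyNamenode {
        private final List<List<Replica>> blocks;
        int rpcCalls = 0; // counts simulated RPC round trips

        ToyNamenode(List<List<Replica>> blocks) { this.blocks = blocks; }

        int blockCount() { return blocks.size(); }

        List<List<Replica>> getBlockLocations(int firstBlock, int batchSize) {
            rpcCalls++;
            int end = Math.min(blocks.size(), firstBlock + batchSize);
            return new ArrayList<>(blocks.subList(firstBlock, end));
        }
    }

    // Picks the replica with the smallest distance to the client.
    static String closest(List<Replica> replicas) {
        Replica best = replicas.get(0);
        for (Replica r : replicas) {
            if (r.distance < best.distance) best = r;
        }
        return best.datanode;
    }

    // Returns the datanode chosen for each block, in read order.
    static List<String> readAll(ToyNamenode namenode, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        List<String> chosen = new ArrayList<>();
        int next = 0;
        while (next < namenode.blockCount()) {
            // Ask the namenode only when the next batch of locations is needed.
            List<List<Replica>> batch = namenode.getBlockLocations(next, batchSize);
            for (List<Replica> replicas : batch) {
                chosen.add(closest(replicas)); // data itself would come from this datanode
            }
            next += batch.size();
        }
        return chosen;
    }

    // Three blocks with hypothetical replica placements.
    static ToyNamenode sample() {
        return new ToyNamenode(Arrays.asList(
            Arrays.asList(new Replica("dn1", 4), new Replica("dn2", 0), new Replica("dn3", 2)),
            Arrays.asList(new Replica("dn1", 4), new Replica("dn3", 2)),
            Arrays.asList(new Replica("dn2", 0), new Replica("dn4", 6))));
    }

    public static void main(String[] args) {
        ToyNamenode namenode = sample();
        System.out.println(readAll(namenode, 2));
        System.out.println("namenode RPCs: " + namenode.rpcCalls);
    }
}
```

Note that the toy namenode is only asked for locations, never for data, which is the point of the summary above: the namenode guides, the datanodes serve the bytes.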

Secondly:

FileSystem---Method---->(FSDataOutputStream)create();

create: returns the output stream.

FSDataOutputStream----extends---DataOutputStream,

DataOutputStream----extends---FilterOutputStream----extends---OutputStream.
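The output side can be sketched the same way with java.io only. ToyFsDataOutputStream and OutputHierarchySketch are hypothetical names; the getPos() method here simply reports the byte count that DataOutputStream already tracks, as a rough analogy to a stream that knows its current position in the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for FSDataOutputStream: extends DataOutputStream
// and delegates the actual writing to the stream it wraps.
class ToyFsDataOutputStream extends DataOutputStream {
    ToyFsDataOutputStream(OutputStream wrapped) {
        super(wrapped);
    }

    // Position = number of bytes written through this wrapper so far.
    long getPos() {
        return size();
    }
}

class OutputHierarchySketch {
    // Writes an int (4 bytes) and then one byte per character of text.
    static long positionAfterWriting(int value, String text) {
        try (ToyFsDataOutputStream out =
                 new ToyFsDataOutputStream(new ByteArrayOutputStream())) {
            out.writeInt(value);
            out.writeBytes(text);
            return out.getPos();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(positionAfterWriting(42, "abc"));
    }
}
```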

Anatomy of a File Write

1. The client creates the file by calling create() on DistributedFileSystem.

2. DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem's namespace, with no blocks associated with it. The namenode makes a record of the new file.

3. The DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to. Just as in the read case, FSDataOutputStream wraps a DFSOutputStream, which handles communication with the datanodes and namenode.

4. As the client writes data, DFSOutputStream splits it into packets, which it writes to an internal queue called the data queue. The data queue is consumed by the DataStreamer, which is responsible for asking the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas. The list of datanodes forms a pipeline, and here we will assume the replication level is three, so there are three nodes in the pipeline. The DataStreamer streams the packets to the first datanode in the pipeline, which stores each packet and forwards it to the second datanode in the pipeline. Similarly, the second datanode stores the packet and forwards it to the third datanode in the pipeline.

5. DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by datanodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the datanodes in the pipeline.

6. When the client has finished writing data, it calls close() on the stream.

7. This action flushes all the remaining packets to the datanode pipeline and waits for acknowledgements before contacting the namenode to signal that the file is complete. The namenode already knows which blocks the file is made up of (via the DataStreamer asking for block allocations), so it only has to wait for blocks to be minimally replicated before returning successfully.
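The interplay of the data queue, the ack queue, and the three-node pipeline can be sketched as a single-threaded simulation. Everything here is an assumption for illustration: WritePipelineSketch, ToyDatanode, the packet size, and the method names are made up, and the real client works asynchronously rather than in a simple loop like this.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class WritePipelineSketch {
    // Toy datanode: keeps every packet it receives.
    static final class ToyDatanode {
        final String name;
        final List<String> stored = new ArrayList<>();
        ToyDatanode(String name) { this.name = name; }
    }

    final Deque<String> dataQueue = new ArrayDeque<>(); // packets waiting to be sent
    final Deque<String> ackQueue = new ArrayDeque<>();  // packets sent but not yet acknowledged
    final List<ToyDatanode> pipeline = new ArrayList<>();

    WritePipelineSketch(String... datanodeNames) {
        for (String name : datanodeNames) {
            pipeline.add(new ToyDatanode(name));
        }
    }

    // Splits the data into fixed-size packets and puts them on the data queue.
    void write(String data, int packetSize) {
        if (packetSize < 1) throw new IllegalArgumentException("packetSize must be >= 1");
        for (int i = 0; i < data.length(); i += packetSize) {
            dataQueue.add(data.substring(i, Math.min(data.length(), i + packetSize)));
        }
    }

    // DataStreamer role: send the oldest queued packet down the pipeline.
    void streamOnePacket() {
        String packet = dataQueue.remove();
        ackQueue.add(packet); // it now waits for acknowledgements
        for (ToyDatanode dn : pipeline) {
            dn.stored.add(packet); // each node stores the packet, then forwards it on
        }
    }

    // Called once every datanode in the pipeline has acknowledged the oldest packet.
    void ackOldestPacket() {
        ackQueue.remove();
    }

    // close(): flush the remaining packets, then wait for all acknowledgements.
    void close() {
        while (!dataQueue.isEmpty()) streamOnePacket();
        while (!ackQueue.isEmpty()) ackOldestPacket();
    }

    public static void main(String[] args) {
        WritePipelineSketch out = new WritePipelineSketch("dn1", "dn2", "dn3");
        out.write("abcdefg", 3);
        out.streamOnePacket();
        System.out.println("data queue: " + out.dataQueue + ", ack queue: " + out.ackQueue);
        out.close();
        System.out.println("dn3 stored: " + out.pipeline.get(2).stored);
    }
}
```

Keeping sent packets on a separate ack queue until every node confirms them is what lets the client know, at close() time, exactly which packets are still unconfirmed.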

Reposted from: https://blog.51cto.com/8841087/1400225
