What does the create function in HDFS do?
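Once the client has learned through exists() that the namenode has no such file,
it creates the file by calling namenode.create. The details are as follows.
The call means: the client named clientName, running on clientMachine, is creating the file src.
clientMachine is used only to choose the target DataNodes.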
public LocatedBlock create(String src, String clientName, String clientMachine, boolean overwrite) throws IOException {
    // Call the namesystem's startFile; it returns the block and the target datanodes
    Object results[] = namesystem.startFile(new UTF8(src), new UTF8(clientName), new UTF8(clientMachine), overwrite);
    if (results == null) {
        throw new IOException("Cannot create file " + src + " on client " + clientName);
    } else {
        Block b = (Block) results[0]; // the newly allocated block
        DatanodeInfo targets[] = (DatanodeInfo[]) results[1]; // the chosen datanodes
        return new LocatedBlock(b, targets); // combine both into the return value
    }
}
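To make the exists-then-create sequence concrete, here is a minimal sketch of the client-side flow. It is not the real client code: FakeNameNode, upload and the string block id are hypothetical stand-ins invented for illustration, and the fake namenode registers the path immediately, which the real one does not.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class CreateFlowSketch {

    /** Hypothetical in-memory stand-in for the namenode; it only tracks paths. */
    static class FakeNameNode {
        private final Set<String> files = new HashSet<>();

        boolean exists(String src) {
            return files.contains(src);
        }

        /** As in create above, failure surfaces as an IOException, not as a null result. */
        String create(String src, String clientName, String clientMachine,
                      boolean overwrite) throws IOException {
            if (files.contains(src) && !overwrite) {
                throw new IOException("Cannot create file " + src + " on client " + clientName);
            }
            files.add(src);            // simplification: the path is registered immediately
            return "block-for-" + src; // placeholder for the LocatedBlock
        }
    }

    /** Client side: check first, then ask the namenode to create the file. */
    static String upload(FakeNameNode namenode, String src) throws IOException {
        if (namenode.exists(src)) {
            throw new IOException("File already exists: " + src);
        }
        return namenode.create(src, "DFS_CLIENT_xxxx", "Machine66", false);
    }

    public static void main(String[] args) throws IOException {
        FakeNameNode namenode = new FakeNameNode();
        System.out.println(upload(namenode, "/user/demo/a.txt"));
        try {
            upload(namenode, "/user/demo/a.txt"); // second attempt on the same path
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is the division of labor: the client checks, the namenode decides, and a refusal reaches the caller as an exception.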
====================================
Next, let's study namesystem.startFile. An analysis of the function follows:
public synchronized Object[] startFile(UTF8 src, UTF8 holder, UTF8 clientMachine, boolean overwrite) {
    // Background: the parameters include holder and clientMachine. For example:
    //   holder:        DFS_CLIENT_xxxx
    //   clientMachine: Machine66
    // In other words, one clientMachine can host several holders, and here
    // one holder on a clientMachine has issued an upload request.
    // While reading the code below, note where holder is used and where
    // clientMachine is used.
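    Object results[] = null;
    if (pendingCreates.get(src) == null) { // pendingCreates tracks files that are being created; null means src is not among them
        // Check on the server side that the path can be created. The author asks whether this
        // is needed after the client's exists() call; it is, because that call was a separate
        // request and the namespace may have changed since.
        boolean fileValid = dir.isValidToCreate(src);
        if (overwrite && ! fileValid) { // overwriting allowed: delete the old file first (in practice overwrite is currently always false)
            delete(src);
            fileValid = true;
        }
        if (fileValid) { // creation is allowed, carry on
            results = new Object[2]; // array that holds the return values
            // Get the array of replication targets,
            // chosen from clientMachine and the desired number of replicas
            DatanodeInfo targets[] = chooseTargets(this.desiredReplication, null, clientMachine);
            if (targets.length < this.minReplication) {
                LOG.warning("Target-length is " + targets.length +
                    ", below MIN_REPLICATION (" + this.minReplication+ ")");
                return null;
            } // fewer targets than the minimum replication: fail
            // Reserve space for this pending file
            pendingCreates.put(src, new Vector()); // marks this file as being created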
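            synchronized (leases) { // lease bookkeeping starts here
                Lease lease = (Lease) leases.get(holder); // look up the holder's lease
                if (lease == null) { // no lease yet
                    lease = new Lease(holder); // create one
                    leases.put(holder, lease); // register it in leases
                    sortedLeases.add(lease); // and in sortedLeases
                } else {
                    // The lease exists: renew its timestamp. It is removed from sortedLeases
                    // before the renewal and added back afterwards, so the sorted
                    // collection puts it at its new position.
                    sortedLeases.remove(lease);
                    lease.renew();
                    sortedLeases.add(lease);
                }
                lease.startedCreate(src); // the lease records the file name in its own creates set
            }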
            // Create next block
            results[0] = allocateBlock(src); // mainly records the Block that belongs to this file
            results[1] = targets; // the datanodes assigned to that block
        } else { // ! fileValid
            LOG.warning("Cannot start file because it is invalid. src=" + src);
        }
    } else {
        LOG.warning("Cannot start file because pendingCreates is non-null. src=" + src);
    }
    return results; // null tells create that the file could not be started
}
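The remove, renew, add sequence in the lease handling is easy to overlook, so here is a small self-contained sketch of why it matters. It is an illustration under assumptions, not the original Lease class: the comparison by last-update time with the holder name as tie-breaker, the explicit now parameter, and the names LeaseTable, touch and oldestHolder are all made up for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class LeaseSketch {

    /** Assumed ordering: oldest renewal first, holder name as tie-breaker. */
    static class Lease implements Comparable<Lease> {
        final String holder;
        long lastUpdate;

        Lease(String holder, long now) {
            this.holder = holder;
            this.lastUpdate = now;
        }

        void renew(long now) {
            lastUpdate = now;
        }

        @Override
        public int compareTo(Lease other) {
            int byTime = Long.compare(lastUpdate, other.lastUpdate);
            return byTime != 0 ? byTime : holder.compareTo(other.holder);
        }
    }

    static class LeaseTable {
        final Map<String, Lease> leases = new HashMap<>();
        final TreeSet<Lease> sortedLeases = new TreeSet<>();

        /** Creates the holder's lease or renews it, keeping the sorted set consistent. */
        void touch(String holder, long now) {
            Lease lease = leases.get(holder);
            if (lease == null) {
                lease = new Lease(holder, now);
                leases.put(holder, lease);
                sortedLeases.add(lease);
            } else {
                // Remove while the lease still has its old sort key; changing the key
                // of an element that sits inside a TreeSet would corrupt the ordering.
                sortedLeases.remove(lease);
                lease.renew(now);
                sortedLeases.add(lease);
            }
        }

        /** The lease that would expire first. */
        String oldestHolder() {
            return sortedLeases.first().holder;
        }
    }

    public static void main(String[] args) {
        LeaseTable table = new LeaseTable();
        table.touch("DFS_CLIENT_A", 1L);
        table.touch("DFS_CLIENT_B", 2L);
        table.touch("DFS_CLIENT_A", 3L); // A renews, so B becomes the oldest lease
        System.out.println(table.oldestHolder());
    }
}
```

Keeping the leases sorted by renewal time means a monitor only has to look at the front of the set to find expired holders.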
-------------------------------------------------------------
DatanodeInfo[] chooseTargets(int desiredReplicates, TreeSet forbiddenNodes, UTF8 clientMachine) {
    TreeSet alreadyChosen = new TreeSet(); // machines picked so far, initially empty
    // The author found this second collection wasteful. Note, however, that the two serve
    // different purposes: targets keeps the picks in selection order for the returned array,
    // while alreadyChosen is the TreeSet handed to chooseTarget as a forbidden set.
    Vector targets = new Vector();
    for (int i = 0; i < desiredReplicates; i++) { // one pick per desired replica
        DatanodeInfo target = chooseTarget(forbiddenNodes, alreadyChosen, clientMachine); // pick a single machine
        if (target != null) { // record the pick in both collections
            targets.add(target);
            alreadyChosen.add(target);
        } else {
            break; // calling chooseTarget again won't help
        }
    }
    return (DatanodeInfo[]) targets.toArray(new DatanodeInfo[targets.size()]); // return the picks as an array
}
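The loop above boils down to "pick one, exclude it, repeat until enough or nothing is left". A minimal sketch of that pattern follows, with host names as plain strings. The class name, the Function-based chooseOne parameter and the firstFree helper are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class ChooseTargetsSketch {

    /**
     * Picks up to `desired` distinct targets. chooseOne receives the set of hosts
     * already taken and returns a new host, or null when none is left.
     */
    static List<String> chooseTargets(int desired, Function<Set<String>, String> chooseOne) {
        Set<String> alreadyChosen = new HashSet<>(); // fast "is it taken?" lookups
        List<String> targets = new ArrayList<>();    // preserves selection order
        for (int i = 0; i < desired; i++) {
            String target = chooseOne.apply(alreadyChosen);
            if (target == null) {
                break; // asking again would not help
            }
            targets.add(target);
            alreadyChosen.add(target);
        }
        return targets;
    }

    /** Hypothetical single-pick policy: the first host that is not excluded. */
    static String firstFree(List<String> hosts, Set<String> excluded) {
        for (String host : hosts) {
            if (!excluded.contains(host)) {
                return host;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("dn1", "dn2");
        // Three replicas requested, but only two hosts exist.
        System.out.println(chooseTargets(3, taken -> firstFree(hosts, taken)));
    }
}
```

When fewer hosts are available than replicas requested, the result is simply shorter, which is why startFile compares the array length against minReplication afterwards.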
---------------
=======================
DatanodeInfo chooseTarget(TreeSet forbidden1, TreeSet forbidden2, UTF8 clientMachine) {
    //
    // Check if there are any available targets at all
    //
    int totalMachines = datanodeMap.size(); // number of datanodes currently known
    if (totalMachines == 0) { // none at all: nothing to choose from, return null
        LOG.warning("While choosing target, totalMachines is " + totalMachines);
        return null;
    }
    //
    // Build a map of forbidden hostnames from the two forbidden sets.
    //
    TreeSet forbiddenMachines = new TreeSet();
    if (forbidden1 != null) { // nodes forbidden from the start; null when called from startFile
        for (Iterator it = forbidden1.iterator(); it.hasNext(); ) {
            DatanodeInfo cur = (DatanodeInfo) it.next();
            forbiddenMachines.add(cur.getHost());
        }
    }
    if (forbidden2 != null) { // nodes already chosen, which must not be returned a second time
        for (Iterator it = forbidden2.iterator(); it.hasNext(); ) {
            DatanodeInfo cur = (DatanodeInfo) it.next();
            forbiddenMachines.add(cur.getHost());
        }
    }
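    //
    // Build list of machines we can actually choose from
    //
    Vector targetList = new Vector(); // all known nodes minus the forbidden ones
    for (Iterator it = datanodeMap.values().iterator(); it.hasNext(); ) {
        DatanodeInfo node = (DatanodeInfo) it.next();
        if (! forbiddenMachines.contains(node.getHost())) {
            targetList.add(node);
        }
    }
    // Collections.shuffle randomly permutes the candidates. The loops below return the
    // first node that fits, so without the shuffle the same nodes would be picked every time.
    // The author also ties this to the daisy-chain (pipeline) style in which DFSShell uploads data.
    Collections.shuffle(targetList);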
    //
    // Now pick one
    //
    if (targetList.size() > 0) {
        //
        // If the requester's machine is in the targetList,
        // and it's got the capacity, pick it.
        // That is: when clientMachine is itself a candidate datanode with room for more than
        // MIN_BLOCKS_FOR_WRITE blocks (five, by the author's reading), return it directly.
        // Presumably this favors locality, since writing to the local machine
        // is cheaper than writing to a remote host.
        //
        if (clientMachine != null && clientMachine.getLength() > 0) {
            for (Iterator it = targetList.iterator(); it.hasNext(); ) {
                DatanodeInfo node = (DatanodeInfo) it.next();
                if (clientMachine.equals(node.getHost())) {
                    if (node.getRemaining() > BLOCK_SIZE * MIN_BLOCKS_FOR_WRITE) {
                        return node;
                    }
                }
            }
        }
        //
        // Otherwise, choose node according to target capacity:
        // the first node with room for more than MIN_BLOCKS_FOR_WRITE blocks
        //
        for (Iterator it = targetList.iterator(); it.hasNext(); ) {
            DatanodeInfo node = (DatanodeInfo) it.next();
            if (node.getRemaining() > BLOCK_SIZE * MIN_BLOCKS_FOR_WRITE) {
                return node;
            }
        }
        //
        // That should do the trick. But we might not be able
        // to pick any node if the target was out of bytes. As
        // a last resort, pick the first valid one we can find,
        // meaning any node with more than one block of free space.
        //
        for (Iterator it = targetList.iterator(); it.hasNext(); ) {
            DatanodeInfo node = (DatanodeInfo) it.next();
            if (node.getRemaining() > BLOCK_SIZE) {
                return node;
            }
        }
        LOG.warning("Could not find any nodes with sufficient capacity");
        return null; // no candidate has enough space
    } else {
        LOG.warning("Zero targets found, forbidden1.size=" +
            ( forbidden1 != null ? forbidden1.size() : 0 ) +
            " forbidden2.size()=" +
            ( forbidden2 != null ? forbidden2.size() : 0 ));
        return null; // not a single candidate node is left
    }
}
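The three-tier policy in chooseTarget (local node first, then any roomy node, then any node that can hold one block) can be condensed into a short sketch. This is an illustration, not the original method: the Node class, the pick name and both constant values are assumptions chosen for the example, and the shuffle is left out so the result is deterministic.

```java
import java.util.Arrays;
import java.util.List;

public class ChooseTargetSketch {
    // Both values are assumptions for illustration, not taken from the source above.
    static final long BLOCK_SIZE = 32L * 1024 * 1024;
    static final int MIN_BLOCKS_FOR_WRITE = 5;

    /** Hypothetical stand-in for DatanodeInfo: a host name and its free bytes. */
    static class Node {
        final String host;
        final long remaining;

        Node(String host, long remaining) {
            this.host = host;
            this.remaining = remaining;
        }
    }

    static Node pick(List<Node> candidates, String clientMachine) {
        long comfortable = BLOCK_SIZE * MIN_BLOCKS_FOR_WRITE;
        // Tier 1: the requester's own machine, if it is a candidate with enough room.
        if (clientMachine != null && !clientMachine.isEmpty()) {
            for (Node node : candidates) {
                if (clientMachine.equals(node.host) && node.remaining > comfortable) {
                    return node;
                }
            }
        }
        // Tier 2: the first candidate with comfortable headroom.
        for (Node node : candidates) {
            if (node.remaining > comfortable) {
                return node;
            }
        }
        // Tier 3: last resort, any candidate that can hold more than one block.
        for (Node node : candidates) {
            if (node.remaining > BLOCK_SIZE) {
                return node;
            }
        }
        return null; // nobody has enough space
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
            new Node("dn1", 10 * BLOCK_SIZE),
            new Node("Machine66", 10 * BLOCK_SIZE));
        System.out.println(pick(nodes, "Machine66").host); // local node preferred
        System.out.println(pick(nodes, null).host);        // falls through to tier 2
    }
}
```

Each tier relaxes the previous one, so a nearly full cluster still yields a target as long as some node can take a single block.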