导航：首页 > 服务器 >

HBase Compaction算法之ExploringCompactionPolicy怎么用

发表于：2025-02-02 作者：千家信息网编辑

千家信息网最后更新 2025年02月02日，这篇文章主要为大家展示了"HBase Compaction算法之ExploringCompactionPolicy怎么用"，内容简而易懂，条理清晰，希望能够帮助大家解决疑惑，下面让小编带领大家一起研究

千家信息网最后更新 2025年02月02日HBase Compaction算法之ExploringCompactionPolicy怎么用

这篇文章主要为大家展示了"HBase Compaction算法之ExploringCompactionPolicy怎么用"，内容简而易懂，条理清晰，希望能够帮助大家解决疑惑，下面让小编带领大家一起研究并学习一下"HBase Compaction算法之ExploringCompactionPolicy怎么用"这篇文章吧。

在0.98版本中，默认的compaction算法换成了ExploringCompactionPolicy，之前是RatioBasedCompactionPolicy

ExploringCompactionPolicy继承RatioBasedCompactionPolicy，重写了applyCompactionPolicy方法，applyCompactionPolicy是对minor compaction的选择文件的策略算法。

applyCompactionPolicy方法内容：

public List applyCompactionPolicy(final List candidates,       boolean mightBeStuck, boolean mayUseOffPeak, int minFiles, int maxFiles) {    //此ratio为后面算法使用，可设置非高峰时间段的ratio（默认:5.0）从而合并更多的数据    final double currentRatio = mayUseOffPeak        ? comConf.getCompactionRatioOffPeak() : comConf.getCompactionRatio();    // Start off choosing nothing.    List bestSelection = new ArrayList(0);    List smallest = mightBeStuck ? new ArrayList(0) : null;    long bestSize = 0;    long smallestSize = Long.MAX_VALUE;    int opts = 0, optsInRatio = 0, bestStart = -1; // for debug logging    // Consider every starting place.    for (int start = 0; start < candidates.size(); start++) {      // Consider every different sub list permutation in between start and end with min files.      for (int currentEnd = start + minFiles - 1;          currentEnd < candidates.size(); currentEnd++) {        List potentialMatchFiles = candidates.subList(start, currentEnd + 1);        // Sanity checks        if (potentialMatchFiles.size() < minFiles) {          continue;        }        if (potentialMatchFiles.size() > maxFiles) {          continue;        }        // Compute the total size of files that will        // have to be read if this set of files is compacted.        long size = getTotalStoreSize(potentialMatchFiles);        // Store the smallest set of files.  This stored set of files will be used        // if it looks like the algorithm is stuck.        if (mightBeStuck && size < smallestSize) {          smallest = potentialMatchFiles;          smallestSize = size;        }        if (size > comConf.getMaxCompactSize()) {          continue;        }        ++opts;        if (size >= comConf.getMinCompactSize()            && !filesInRatio(potentialMatchFiles, currentRatio)) {          continue;        }        ++optsInRatio;        if (isBetterSelection(bestSelection, bestSize, potentialMatchFiles, size, mightBeStuck)) {          bestSelection = potentialMatchFiles;          bestSize = size;          bestStart = start;        }      }    }    if (bestSelection.size() == 0 && mightBeStuck) {      LOG.debug("Exploring compaction algorithm has selected " + smallest.size()          + " files of size "+ smallestSize + " because the store might be stuck");      return new ArrayList(smallest);    }    LOG.debug("Exploring compaction algorithm has selected " + bestSelection.size()        + " files of size " + bestSize + " starting at candidate #" + bestStart +        " after considering " + opts + " permutations with " + optsInRatio + " in ratio");    return new ArrayList(bestSelection);

从代码得知，主要算法如下：

从头到尾遍历文件，判断所有符合条件的组合
选择组合的文件数必须 >= minFiles（默认值：3）
选择组合的文件数必须 <= maxFiles（默认值：10）
计算组合的文件总大小size，size必须 <= maxCompactSize(通过hbase.hstore.compaction.max.size配置，默认值：LONG.MAX_VALUE，相当于没起作用，官方文档里面说只有觉得compaction经常发生并且没有多大的用时，可以修改这个值)
组合的文件大小 < minCompactSize 则是符合要求，如果 >= minCompactSize ，还需要判断filesInRatio

filesInRatio算法：FileSize(i) <= ( Sum(0,N,FileSize(_)) - FileSize(i) ) * Ratio，也就是说组合里面的所有单个文件大小都必须满足 singleFileSize <= (totalFileSize - singleFileSize) * currentRatio，此算法的意义是为了限制太大的compaction，选择出来的文件不至于有一个很大的，应该尽可能先合并一些小的大小相差不大的文件，代码如下

private boolean filesInRatio(final List files, final double currentRatio) {    if (files.size() < 2) {      return true;    }    long totalFileSize = getTotalStoreSize(files);    for (StoreFile file : files) {      long singleFileSize = file.getReader().length();      long sumAllOtherFileSizes = totalFileSize - singleFileSize;      if (singleFileSize > sumAllOtherFileSizes * currentRatio) {        return false;      }    }    return true;  }

寻找最有解，优先选择文件组合文件数多的，当文件数一样多时选择文件数小的，此目的是为了尽可能合并更多的文件并且产生的IO越少越好

private boolean isBetterSelection(List bestSelection,      long bestSize, List selection, long size, boolean mightBeStuck) {    if (mightBeStuck && bestSize > 0 && size > 0) {      // Keep the selection that removes most files for least size. That penaltizes adding      // large files to compaction, but not small files, so we don't become totally inefficient      // (might want to tweak that in future). Also, given the current order of looking at      // permutations, prefer earlier files and smaller selection if the difference is small.      final double REPLACE_IF_BETTER_BY = 1.05;      double thresholdQuality = ((double)bestSelection.size() / bestSize) * REPLACE_IF_BETTER_BY;      return thresholdQuality < ((double)selection.size() / size);    }    // Keep if this gets rid of more files.  Or the same number of files for less io.    return selection.size() > bestSelection.size()      || (selection.size() == bestSelection.size() && size < bestSize);  }

主要算法至此结束，下面说说其他细节及其优化部分：

步骤6的ratio默认值是1.2，但是打开了非高峰时间段的优化时，可以有不同的值，非高峰的ratio默认值是5.0，此优化目的是为了在业务低估时可以合并更多的数据，目前此优化只能是天的小说时间段，还不算灵活。

算法中关于mightBeStuck的逻辑部分，这个参数是用来表示是否有可能compaction会被卡住，它的状态是待选文件数 - 正在做compaction的文件数 + futureFiles(默认值是0，有正在做compaction的文件时是1) >= hbase.hstore.blockingStoreFiles （默认是10，此配置在flush中也会用到，以后分析flush的时候会补充），如果是true时：

选择文件算法还会去寻找一个最小解。在上文步骤4之前，会记录一个文件大小最小的组合
isBetterSelection部分，算法改为 (bestSelection.size() / bestSize) * 1.05 < selection.size() / size，通过文件大小和文件数的比值去选择一个合适的解
返回结果时，没有合适的最优解或返回一个最小解。

mightBeStuck的优化部分，相当于保证在很多的文件数的情况下，也可以选出一个最小解去做compaction，而不用再让文件继续增长下去直到有一个合适的组合出现。

此算法跟RatioBasedCompactionPolicy的区别，简单的说就是RatioBasedCompactionPolicy是简单的从头到尾遍历StoreFile列表，遇到一个符合Ratio条件的序列就选定执行Compaction。而ExploringCompactionPolicy则是从头到尾遍历的同时记录下当前最优，然后从中选择一个全局最优列表。

以上是"HBase Compaction算法之ExploringCompactionPolicy怎么用"这篇文章的所有内容，感谢各位的阅读！相信大家都有了一定的了解，希望分享的内容对大家有所帮助，如果还想学习更多知识，欢迎关注行业资讯频道！

很赞哦！