MultiCore
http://www.cs.otago.ac.nz/staffpriv/hzy/publications.php
The MultiCore work is led by Professor Zhiyi Huang at the University of Otago, New Zealand (link above).
The work was first published in TPDS [1]; PCAF then appeared in 2016 at the B-class conference ICPP (International Conference on Parallel Processing) [2], and was most recently (2018) extended into the A-class journal IEEE Transactions on Computers (TC) [3].
Background on manycore
Around 2011-2012, the author of [4] put a figure from an Intel research team on the presentation slides for the paper (download link), noting that GPUs and multicore CPUs represent far greater computing power, but that algorithms and data structures suited to multicore platforms need to be reconsidered: a "fundamental rethinking of algorithms and data structures".
https://arstechnica.com/information-technology/2011/06/research-intel-day-2011-a-brief-glimpse-of-the-clouds-future-according-to-intel/
Cayton [4] notes that a linear scan (brute-force search) is an approach that is comparatively well suited to parallelization.
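A minimal sketch of why that is, assuming Python with NumPy (my own illustration, not code from the paper): distance computations for different queries are independent, so a brute-force linear scan parallelizes simply by splitting the query set across cores.

```python
# Sketch: brute-force (linear scan) k-NN, parallelized by splitting queries across threads.
# NumPy's matrix kernels release the GIL, so thread-level parallelism is effective here.
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def knn_chunk(queries, refs, k):
    # Pairwise squared Euclidean distances: |q|^2 - 2 q.r + |r|^2
    d2 = ((queries ** 2).sum(1)[:, None]
          - 2.0 * queries @ refs.T
          + (refs ** 2).sum(1)[None, :])
    return np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest references per query


def parallel_knn(queries, refs, k=10, workers=8):
    chunks = np.array_split(queries, workers)        # each chunk is handled independently
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda c: knn_chunk(c, refs, k), chunks)
    return np.vstack(list(parts))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    refs = rng.random((56074, 128), dtype=np.float32)     # SIFT-like scale, as in the paper's datasets
    queries = rng.random((1000, 128), dtype=np.float32)
    print(parallel_knn(queries, refs, k=10).shape)         # (1000, 10)
```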
Problems with traditional AkNN
However, these approximate algorithms do not work efficiently on multicore systems [4], [5], due to memory latency and bandwidth issues (also known as the memory wall).
On the reasons
[5] notes that "most of the existing algorithms are hard to parallelize either due to the sequential nature of the algorithm or due to the inherent complexity of the algorithm", i.e. ① the algorithm is inherently sequential, or ② it is inherently too complex.
PCA-based filtering (PCAF)
PCA
Principal Component Analysis (PCA) is a popular dimensionality-reduction algorithm that can, in some settings, mitigate the curse of dimensionality. It uses an orthogonal transformation to convert a set of data values of possibly correlated variables into a set of data values of linearly uncorrelated variables, called principal components.
For a high-dimensional dataset whose dimensions are strongly correlated, most of the information in the dataset can be captured by a small number of principal components, which may span only a very low-dimensional space. Using the principal components, we can construct a projection of the dataset that has the same dimensionality as the number of retained components while still containing most of the information of the original dataset. In other words, if the distance between a query point and some reference point is small in the original space, the distance between them in the projected space will also be proportionally small.
In many practical applications, the feature dimensions are more or less correlated [6].
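To make this concrete, here is a minimal PCA sketch in Python/NumPy (my own illustration on synthetic correlated data; the dimensions are not the paper's): compute the principal components via an SVD, project onto the top few, and check that the nearest-neighbor ranking in the projected space matches the ranking in the original space.

```python
# Sketch: PCA via SVD, then compare nearest-neighbor rankings in the original
# space and in the low-dimensional principal-component projection.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data with strongly correlated features: 8 latent dimensions mapped into 64.
latent = rng.normal(size=(5000, 8))
data = latent @ rng.normal(size=(8, 64)) + 0.01 * rng.normal(size=(5000, 64))

# PCA: center the data, then take the top-d right singular vectors as principal components.
mean = data.mean(axis=0)
centered = data - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
d = 8                                    # number of principal components kept
proj = centered @ vt[:d].T               # orthogonal projection onto the top-d components

# For one query point, the distance ranking in the projected space should be
# almost identical to the ranking in the original 64-dimensional space.
q = 0
orig_dist = np.linalg.norm(centered - centered[q], axis=1)
proj_dist = np.linalg.norm(proj - proj[q], axis=1)
print(np.argsort(orig_dist)[:10])
print(np.argsort(proj_dist)[:10])        # expected: (nearly) the same ten indices
```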
PCAF
PCAF uses the values projected onto the principal components to estimate a ranking of the distances between the query point and the reference points. Note that it is a ranking of distances that is estimated, not the original distances themselves, because the distances computed from the projected values live in a lower-dimensional space and differ from the original distances. Since the principal components are the dominant factors determining the relationships among the original points, a ranking computed from the corresponding projections should closely resemble the distance ranking in the original space, provided the PCA dimensionality reduction has not discarded too much information.
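A rough sketch of the filtering idea as I read it, again in Python/NumPy (the candidate_ratio parameter and helper names are mine, not the paper's): rank the reference points by their distance to the query in the principal-component projection, keep only the best-ranked fraction as candidates, and compute exact distances for those candidates alone.

```python
# Sketch of PCA-based filtering for k-NN:
#   1) estimate a distance ranking from the principal-component projections,
#   2) keep only a small, best-ranked candidate fraction,
#   3) compute exact distances on the candidates alone.
import numpy as np


def pca_fit(refs, d):
    mean = refs.mean(axis=0)
    _, _, vt = np.linalg.svd(refs - mean, full_matrices=False)
    return mean, vt[:d]                            # mean and top-d principal components


def pcaf_knn(query, refs, mean, components, k=10, candidate_ratio=0.05):
    # Approximate ranking from the low-dimensional projection.
    q_proj = (query - mean) @ components.T
    r_proj = (refs - mean) @ components.T
    approx_dist = np.linalg.norm(r_proj - q_proj, axis=1)

    # Filtering step: keep only the best-ranked fraction of references as candidates.
    n_cand = max(k, int(candidate_ratio * len(refs)))
    candidates = np.argpartition(approx_dist, n_cand)[:n_cand]

    # Exact distances only on the surviving candidates.
    exact_dist = np.linalg.norm(refs[candidates] - query, axis=1)
    return candidates[np.argsort(exact_dist)[:k]]


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    latent = rng.normal(size=(20000, 8))           # correlated synthetic reference set
    refs = latent @ rng.normal(size=(8, 64))
    mean, comps = pca_fit(refs, d=8)
    print(pcaf_knn(refs[123] + 0.01, refs, mean, comps, k=5))   # 123 should be among the results
```

The candidate fraction is the obvious knob here: a larger fraction raises recall but increases the exact-distance work, so the filter only pays off when the projected ranking is already close to the true one.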
Assessment and questions
I read the introduction, method, and implementation sections in detail, and part of the experiments. My impressions and questions:
- The technique feels intuitively very simple; there is surely a rationale behind it, but is it really a breakthrough over the more elaborate traditional methods?
- The datasets are generally on the easy side: the largest is SIFT (128 dimensions, 56,074 reference points); the highest-dimensional is HAR (561 dimensions, 7,352 points).
- The way Figures 3 and 4 analyze memory latency is worth learning from.
Next steps
- Read PCAF[J], the journal version [3].
- Read the code and rerun the experiments; in particular, the datasets need to be extended.
- Survey the other methods in this area.
- Investigate the potential of nearest-neighbor graphs.
PCAF[J]
The journal version expands the experiments: on GIST1M, the speedup of BF is 28 on the Intel platform and 70 on the AMD platform.
References
- [1] Tang X., Huang Z., Eyers D., Mills S., and Guo M., "Scalable Multicore k-NN Search via Subspace Clustering for Filtering," IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 26, no. 12. DOI: 10.1109/TPDS.2014.2372755.
- [2] Feng H., Eyers D., Mills S., Wu Y., and Huang Z., "PCAF: Scalable, High Precision k-NN Search Using Principal Component Analysis Based Filtering," in Proceedings of the 45th International Conference on Parallel Processing (ICPP 2016), 2016.
- [3] Feng H., Eyers D., Mills S., Wu Y., and Huang Z., "Principal Component Analysis Based Filtering for Scalable, High Precision k-NN Search," IEEE Transactions on Computers, vol. 67, no. 2, 2018.
- [4] L. Cayton, "Accelerating nearest neighbour search on manycore systems," in Proc. IEEE Int. Parallel Distrib. Process. Symp. (IPDPS), 2012, pp. 402-413.
- [5] M. Al Hasan, H. Yildirim, and A. Chakraborty, "Sonnet: Efficient approximate nearest neighbor using multi-core," in Proc. IEEE 10th Int. Conf. Data Mining (ICDM), 2010, pp. 719-724.
- [6] K. Mikolajczyk and C. Schmid, "A performance evaluation of local descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1615-1630, 2005.