gStore 之 VS*-Tree
gStore:
gStore是由北京大学计算机所数据管理实验室研发面向RDF知识图谱的开源图数据库系统(通常称为Triple Store)。不同于传统基于关系数据库的知识图谱数据管理方法,gStore是一种原生基于图数据模型(Native Graph Model)的RDF数据管理系统,维持了原始RDF知识图谱的图结构;其数据模型是有标签、有向的多边图,每个顶点对应着一个主体或客体。我们将面向RDF的SPARQL查询,转换为面向RDF图的子图匹配查询,利用所提出的基于图结构的索引(VS*-tree)来加速查询的性能。
gStore 拥有以下特性:
- gStore从图数据库角度存储和检索RDF知识图谱数据;
- gStore支持W3C定义的SPARQL 1.1标准,包括含有Union,OPTIONAL,FILTER和聚集函数的查询;gStore支持有效的增删改操作;
- gStore单机可以支持1Billion(十亿)三元组规模的RDF知识图谱的数据管理任务;
我们组准备将用户商品评论数据使用gStore存储,在此基础上开发出适合用于检索用户特性的功能,这篇博客是对gStore中用于加速查询效率的VS*-tree树进行介绍,内容来自北大邹磊教授2014 VLDB论文《gStore:a graph-based SPARQL query engine》
(博客虽然大多英文,但稍有总结概述)
VS*-tree:
VS*-tree is 是一个作用于G*上的索引结构,用于高效地完成SPARQL查询,即找到Q*在G*上的matches;
Background:
Definition of match of Q* over G*:Q*= {v1… vn},A set of n distinct vertices {u1… un} in G*;
Solution:
There is a straightforward method consisting of two steps,its first step is a classical inclusion query and the second step is a multiway join process;
Problem need to be solved:reduce the search space for inclusion query
S-tree is proposed:
- A height-balanced tree similar to B+ tree;
- Each intermediate node is formed by ORing all child signatures in S-tree;
Problem need to be solved: S-tree cannot support the second step, which is a NP-hard;
VS*-tree(based on S-tree) is proposed:
VS*-tree is a multi-resolution summary graph based on S-tree that can be used to reduce the search space of subgraph query processing;
Properties:
- Each path from the root to any leaf node has the same length h, which is the height of the VS∗-tree;
- Each leaf node corresponds to a vertex in G∗ and has a pointer to it.
- … …
… …
The example of building “super-edge”:
Definition of summary match of Q* over G^I:
Function:
Summary matches can be used to reduce the search space for subgraph search over G*;
Problem need to be solved:
The vertex encoding strategy may lead to some vertex signatures having too many 1,these
vertices can match any query vertex signature;
Solution:
We partition all of u’s neighbors into n groups {g1,…, gn} and each group gi corresponds to one instance of vertex u, denoted as u[i];