The Effect of Edge Bundling and Seriation on Sensemaking of Biclusters in Bipartite Graphs
Authors
- Maoyuan Sun
- Jian Zhao
- Hao Wu
- Kurt Luther
- Chris North
- Naren Ramakrishnan
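Abstract
Exploring coordinated relationships (e.g., shared relationships between two sets of entities) is an important analytic task in many real-world applications, such as discovering similarly behaving genes in bioinformatics, detecting malware collusion in cybersecurity, and identifying product bundles in marketing. Coordinated relationships can be formalized as biclusters. To support visual exploration of biclusters, bipartite-graph-based visualizations have been proposed that use edge bundling to show biclusters. However, because biclusters may overlap, this approach suffers from edge crossings, and its impact on how users explore biclusters in bipartite graphs is not well understood. To address this, we propose a bicluster-based seriation technique that reduces edge crossings in bipartite graphs. We conducted a user experiment to study the effect of edge bundling and of the proposed technique on visualizing biclusters. We found that edge bundling helps users find more justified answers. In addition, we identified four key trade-offs that can inform the design of future bicluster visualizations. The results suggest that edge bundling is critical for exploring biclusters in bipartite graphs: it helps reduce low-level perceptual problems and supports high-level inference.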
Introduction
Coordinated relationship exploration
bicluster: a group of relationships between two sets of entities (e.g., persons and locations), where each entity in one set is related to all entities in the other
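To make the definition concrete, here is a minimal Python sketch (not from the paper) that checks whether two entity sets form a bicluster in a given relation. The function name `is_bicluster`, the edge-set layout, and the sample data are all hypothetical.

```python
def is_bicluster(edges, left, right):
    """Return True if every entity in `left` is related to every entity in `right`.

    `edges` is a set of (left_entity, right_entity) pairs. The name and the
    data layout are illustrative assumptions, not taken from the paper.
    """
    return bool(left) and bool(right) and all(
        (a, b) in edges for a in left for b in right
    )


# Hypothetical person-location relation.
edges = {
    ("Alice", "Paris"), ("Alice", "Rome"),
    ("Bob", "Paris"), ("Bob", "Rome"),
    ("Carol", "Rome"),
}

# Alice and Bob are both linked to Paris and Rome, so this is a bicluster.
complete = is_bicluster(edges, {"Alice", "Bob"}, {"Paris", "Rome"})
# Carol has no link to Paris, so this pair of sets is not a bicluster.
incomplete = is_bicluster(edges, {"Alice", "Carol"}, {"Paris", "Rome"})
```

In graph terms, the check asks whether the two sets induce a complete bipartite subgraph.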
Trade-off
- entity-centric
- relationship-centric
BiSet
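Contributions of this paper:
- Proposes a novel bicluster-based seriation technique
- Presents a detailed study design for the user experiment
- Identifies four key trade-offs
- Finds that edge bundling is critical for exploring biclusters in bipartite graphs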
Background
- Bicluster
  - CHARM
  - LCM
  - BiSet
- Seriation: an ordering, found heuristically, that better reveals patterns
  - Bertifier
  - BiVoc
  - Termite
- Related Evaluation
  - Matrix
  - Edge bundling
Seriation in Bipartite Graphs
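- Biadjacency matrices preparation
  - Build two biadjacency matrices, one from each entity list to the bicluster list
- Matrices fusion
  - Concatenate the two matrices
- Seriation on a fused matrix
  - Apply Correspondence Analysis to the fused matrix to obtain the seriated order
- Local order generation
  - Split the order by entity category, keeping the global order unchanged
- Visual mapping
  - A bicluster's position is the average position of the entities it links to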
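The five steps above can be sketched in pure Python. This is only an illustration under stated assumptions, not the paper's implementation: Correspondence Analysis is reduced to its first non-trivial axis, computed here with reciprocal averaging instead of a library SVD routine; positions are taken to be list indices; and the names `first_ca_axis`, `seriate`, and the toy data are hypothetical.

```python
def first_ca_axis(matrix, iterations=100):
    """Row scores on the first non-trivial correspondence-analysis axis.

    Reciprocal averaging: row scores are weighted means of column scores and
    vice versa; weighted centering removes the trivial constant solution.
    Assumes at least two columns and no all-zero row or column.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]
    total = sum(row_sums)
    col_scores = [float(j) for j in range(n_cols)]  # arbitrary non-constant start
    row_scores = [0.0] * n_rows
    for _ in range(iterations):
        row_scores = [
            sum(matrix[i][j] * col_scores[j] for j in range(n_cols)) / row_sums[i]
            for i in range(n_rows)
        ]
        col_scores = [
            sum(matrix[i][j] * row_scores[i] for i in range(n_rows)) / col_sums[j]
            for j in range(n_cols)
        ]
        mean = sum(w * s for w, s in zip(col_sums, col_scores)) / total
        col_scores = [s - mean for s in col_scores]
        norm = (sum(w * s * s for w, s in zip(col_sums, col_scores)) / total) ** 0.5
        col_scores = [s / norm for s in col_scores]
    return row_scores


def seriate(biclusters, left, right, iterations=100):
    """Return (left_order, right_order, bicluster_positions)."""
    names = list(biclusters)
    # Step 1: one biadjacency matrix (entities x biclusters) per entity list.
    left_matrix = [[int(e in biclusters[b][0]) for b in names] for e in left]
    right_matrix = [[int(e in biclusters[b][1]) for b in names] for e in right]
    # Step 2: fuse the two matrices by stacking their rows.
    fused = left_matrix + right_matrix
    # Step 3: seriate the fused matrix to get one global order over all entities.
    scores = first_ca_axis(fused, iterations)
    global_order = sorted(range(len(fused)), key=lambda i: scores[i])
    # Step 4: split into per-list orders without changing the global order.
    n_left = len(left)
    left_order = [left[i] for i in global_order if i < n_left]
    right_order = [right[i - n_left] for i in global_order if i >= n_left]
    # Step 5: place each bicluster at the mean position of its linked entities.
    left_pos = {e: k for k, e in enumerate(left_order)}
    right_pos = {e: k for k, e in enumerate(right_order)}
    positions = {}
    for name, (left_set, right_set) in biclusters.items():
        linked = [left_pos[e] for e in left_set] + [right_pos[e] for e in right_set]
        positions[name] = sum(linked) / len(linked)
    return left_order, right_order, positions


# Hypothetical chain of three overlapping biclusters; lists start shuffled.
BICLUSTERS = {
    "k1": ({"p1", "p2"}, {"x1", "x2"}),
    "k2": ({"p2", "p3"}, {"x2", "x3"}),
    "k3": ({"p3", "p4"}, {"x3", "x4"}),
}
LEFT = ["p3", "p1", "p4", "p2"]
RIGHT = ["x2", "x4", "x1", "x3"]

left_order, right_order, bicluster_pos = seriate(BICLUSTERS, LEFT, RIGHT)
```

On this toy chain the seriated lists should come out in chain order (p1 to p4 and x1 to x4, or both reversed, since the sign of a CA axis is arbitrary), with each bicluster sitting next to the entities it bundles.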
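User Experiment Design Rationale
Three questions:
- How do computed biclusters help users discover complex, domain-specific biclusters?
- Compared with a traditional view, does this approach improve users' performance in exploring biclusters?
- Are there any trade-offs?
User task design
- closed biclusters
  - produced by the algorithm
- merged biclusters
  - require domain knowledge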
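As a reminder of what "closed" means for the algorithm-produced biclusters, here is a brute-force Python sketch. It assumes the usual definition (a bicluster is closed if no entity can be added to either side without breaking completeness) and is not CHARM or LCM; the function name `closed_biclusters`, the `min_size` threshold, and the sample relation are hypothetical.

```python
from itertools import combinations


def closed_biclusters(edges, min_size=2):
    """Enumerate closed biclusters with at least `min_size` entities per side.

    Brute force over subsets of the left entities, so only usable on tiny
    examples. `edges` is a set of (left_entity, right_entity) pairs.
    """
    left = sorted({a for a, _ in edges})
    neighbors = {a: {b for x, b in edges if x == a} for a in left}
    found = set()
    for size in range(min_size, len(left) + 1):
        for group in combinations(left, size):
            # Right entities related to every member of the group.
            shared = set.intersection(*(neighbors[a] for a in group))
            if len(shared) < min_size:
                continue
            # Closure: all left entities related to every shared right entity.
            closure = frozenset(a for a in left if shared <= neighbors[a])
            found.add((closure, frozenset(shared)))
    return found


# Hypothetical relation with two overlapping closed biclusters.
relation = {
    ("p1", "x1"), ("p1", "x2"),
    ("p2", "x1"), ("p2", "x2"), ("p2", "x3"),
    ("p3", "x2"), ("p3", "x3"),
}
result = closed_biclusters(relation)
```

The two results overlap on p2 and x2, which is the kind of overlap that produces edge crossings once each bicluster is drawn as a bundle.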
Factors Affecting Task Complexity
- The entity and group level factor: entity number
- The bicluster level factors: size, overlap and number
- The chain and schema level factor: domain number
Evaluation
- Participants and apparatus
  - 20 graduate students (9 male, 11 female), aged 24-33, from different majors
  - 15.4-inch MacBook Pro
  - a mouse and a keyboard
- Synthetic data
- Task
  - Working experience based on companies that they worked for
  - Travel preference based on their travel history
  - Shopping style based on their shopping records
  - Learning interests based on the courses they have taken
Visualization and User Interaction
- Highlight Propagation
Data Collection
- interaction logs (timestamp, interaction type, target object type and target object ID)
  - mousing over or out of an entity or a bicluster
  - selecting or unselecting an entity or a bicluster
  - adding an entity to, or removing it from, the answers
- screen recording
- observations
- interviews
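The four interaction-log fields listed above map naturally onto a small record type. The sketch below is an assumption about how such a log could be represented and summarized: the class `LogEntry`, the event names, and the choice to count a mouse-over on an entity as one "visit" are all hypothetical and not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    timestamp: float   # seconds since the session started (assumed unit)
    interaction: str   # e.g. "mouse_over", "mouse_out", "select", "add_answer"
    target_type: str   # "entity" or "bicluster"
    target_id: str


def entity_visits(log):
    """Count mouse-over events per entity (assumed definition of a visit)."""
    return Counter(
        entry.target_id
        for entry in log
        if entry.interaction == "mouse_over" and entry.target_type == "entity"
    )


# Hypothetical log fragment.
log = [
    LogEntry(0.0, "mouse_over", "entity", "p1"),
    LogEntry(0.4, "mouse_out", "entity", "p1"),
    LogEntry(1.1, "mouse_over", "bicluster", "k1"),
    LogEntry(2.0, "mouse_over", "entity", "p1"),
    LogEntry(2.5, "select", "entity", "p2"),
    LogEntry(3.0, "mouse_over", "entity", "p2"),
]
visits = entity_visits(log)
```

A per-participant total of such counts is one plausible way to quantify how much of the graph a user had to inspect.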
Measures and metrics
- Variance of Findings
- Accuracy of Findings
- Connection Based Evidence
- Inference Based Evidence
- Exploration Cost
User Performance Results
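Under the same ordering condition, edge bundling significantly reduced entity visits and led to more justified answers.
Apart from reducing entity visits, seriation had no effect on answer accuracy or time spent.
Under random ordering, edge bundling made users more inclined to find closed biclusters, while merged biclusters were found less often.
Neither edge bundling nor seriation affected the time needed to find justified answers, which suggests that factors other than entity visits (such as layout) may affect time spent.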
Four Trade-Offs
- View Simplicity versus Task Complexity
- Similarity: Connection-Based versus Semantic-Driven
- Connectedness versus Coordinatedness
- Highlight Propagation Driven by: Entity versus Bundle
Reflections
Critical thinking
For high-density networks, edge bundling may be less effective than a matrix representation.
Creative thinking
Analyze and compare graphs of different scales.
How to apply it to our work
BiSet and seriation could be used to simplify bipartite graphs.
Consider whether the approach can be extended to multipartite graphs.