The Effect of Edge Bundling and Seriation on Sensemaking of Biclusters in Bipartite Graphs

Paper link
Video

Authors

  • Maoyuan Sun
  • Jian Zhao
  • Hao Wu
  • Kurt Luther
  • Chris North
  • Naren Ramakrishnan

Abstract

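Exploring coordinated relationships (e.g., relationships shared between two sets of entities) is an important analytics task in many real-world applications, such as discovering similarly behaved genes in bioinformatics, detecting malware collusion in cyber security, and identifying product bundles in marketing. Coordinated relationships can be formalized as biclusters. To support visual exploration of biclusters, bipartite-graph-based visualizations have been proposed, with edge bundling used to show biclusters. However, because biclusters may overlap, this approach suffers from edge crossings, and its impact on how users explore biclusters in bipartite graphs is not yet well understood. To address this, we propose a bicluster-based seriation technique that reduces edge crossings in bipartite graphs, and we conduct a user experiment to study the effect of edge bundling and the proposed technique on visualizing biclusters. We found that edge bundling helps users find more justified answers. Moreover, we identified four key trade-offs that can inform the design of future bicluster visualizations. The results suggest that edge bundling is critical for exploring biclusters in bipartite graphs: it helps reduce low-level perceptual problems and supports high-level inference.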

Introduction

Coordinated relationship exploration
bicluster: a group of relationships between two sets of entities (e.g., persons and locations), where each entity in one set is related to all entities in the other
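
To make the definition concrete, here is a minimal sketch that checks whether two entity sets form a bicluster, that is, whether every pair across the two sets is related. The function name `is_bicluster` and the person-location data are hypothetical and used only for illustration.

```python
from itertools import product

def is_bicluster(relations, left, right):
    """Return True if every entity in `left` is related to every entity in `right`."""
    return all((a, b) in relations for a, b in product(left, right))

# Hypothetical person-location relations (illustrative data only).
relations = {
    ("Alice", "Paris"), ("Alice", "Rome"),
    ("Bob", "Paris"), ("Bob", "Rome"),
    ("Carol", "Rome"),
}

# Alice and Bob are both related to Paris and Rome, so this is a bicluster.
full = is_bicluster(relations, {"Alice", "Bob"}, {"Paris", "Rome"})
# Carol is not related to Paris, so this pair of sets is not a bicluster.
partial = is_bicluster(relations, {"Alice", "Carol"}, {"Paris", "Rome"})
```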

Trade-off

  • entity-centric
  • relationship-centric

BiSet

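Contributions of the paper:

  • Proposes a novel bicluster-based seriation technique
  • Presents a detailed study design for the user experiment
  • Identifies four key trade-offs
  • Finds that edge bundling is critical for exploring biclusters in bipartite graphs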

Background

  • Bicluster
    • CHARM
    • LCM
  • BiSet
  • Seriation: an ordering of items that reveals patterns more clearly (found heuristically)
    • Bertifier
    • BiVoc
    • Termite
  • Related Evaluation
    • Matrix
    • Edge bundling

Seriation in Bipartite Graphs

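  • Biadjacency matrices preparation
    • Build two adjacency matrices, one from each entity list to the bicluster list
  • Matrices fusion
    • Join the two matrices into one
  • Seriation on a fused matrix
    • Run Correspondence Analysis on the fused matrix to obtain the seriated order
  • Local order generation
    • Split the order by category (entity list) while keeping the global order unchanged
  • Visual mapping
    • A bicluster's position is the average position of the entities it links to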
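
The five steps above can be sketched roughly as follows. This is not the authors' implementation: the toy data, the helper names (`biadjacency`, `ca_first_axis`), the choice to fuse by vertical stacking, and the use of list indices as positions are all assumptions made for illustration. Correspondence Analysis is computed here in its textbook form (SVD of the standardized residuals), ordering entities by their score on the first axis.

```python
import numpy as np

# Hypothetical toy data: two entity lists and three biclusters,
# each bicluster given as (left members, right members).
left = ["p1", "p2", "p3", "p4"]
right = ["l1", "l2", "l3"]
biclusters = [
    ({"p1", "p2"}, {"l1", "l2"}),
    ({"p3", "p4"}, {"l3"}),
    ({"p2", "p3"}, {"l2", "l3"}),
]

def biadjacency(entities, side):
    """Entity-by-bicluster 0/1 matrix for one entity list (side 0 = left, 1 = right)."""
    return np.array([[1.0 if e in bic[side] else 0.0 for bic in biclusters]
                     for e in entities])

def ca_first_axis(M):
    """Row and column scores on the first correspondence-analysis axis.

    Assumes every row and column of M has at least one non-zero entry.
    """
    P = M / M.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, _, Vt = np.linalg.svd(S, full_matrices=False)
    return U[:, 0] / np.sqrt(r), Vt[0] / np.sqrt(c)

# 1. Biadjacency matrices preparation
A_left, A_right = biadjacency(left, 0), biadjacency(right, 1)

# 2. Matrices fusion (assumed here: stack the two matrices vertically)
fused = np.vstack([A_left, A_right])

# 3. Seriation on the fused matrix: sort all entities by their first-axis score
row_scores, _ = ca_first_axis(fused)
entities = left + right
global_order = [entities[i] for i in np.argsort(row_scores, kind="stable")]

# 4. Local order generation: split by list, keeping the global relative order
left_order = [e for e in global_order if e in left]
right_order = [e for e in global_order if e in right]

# 5. Visual mapping: a bicluster sits at the mean position of its linked entities
rank = {e: i for order in (left_order, right_order) for i, e in enumerate(order)}
positions = [float(np.mean([rank[e] for e in bic[0] | bic[1]]))
             for bic in biclusters]
```

The sign of an SVD axis is arbitrary, so the resulting order may come out reversed. In this toy case the third bicluster overlaps both of the others and ends up positioned between them.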

User Experiment Design Rationale

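Three questions:

  • How do computed biclusters help users discover complex, domain-specific biclusters?
  • Compared with traditional views, does this approach improve users' performance in exploring biclusters?
  • Are there any trade-offs?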

User task design

  • closed biclusters
    • produced by the algorithm
  • merged biclusters
    • require domain knowledge
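
As a rough illustration of the first task type, the sketch below tests whether a bicluster is closed, assuming "closed" has its usual meaning from closed itemset mining (the family of algorithms such as CHARM and LCM): no entity can be added to either side without breaking the all-to-all relationship. The function name `is_closed` and the data are hypothetical.

```python
def is_closed(neighbors_of_left, left, right):
    """Check that (left, right) is a bicluster that cannot be extended on either side.

    `neighbors_of_left` maps each left-side entity to the set of right-side
    entities it is related to. `left` must be non-empty.
    """
    left, right = set(left), set(right)
    # Everything that all members of `left` have in common.
    shared_right = set.intersection(*(neighbors_of_left[a] for a in left))
    # Every left-side entity that is related to all of `right`.
    all_left = {a for a, nbrs in neighbors_of_left.items() if right <= nbrs}
    return shared_right == right and all_left == left

# Hypothetical person-location relations (illustrative data only).
neighbors = {
    "Alice": {"Paris", "Rome"},
    "Bob": {"Paris", "Rome"},
    "Carol": {"Rome"},
}

closed_a = is_closed(neighbors, {"Alice", "Bob"}, {"Paris", "Rome"})
# Not closed: Paris could be added on the right, and Carol on the left.
not_closed = is_closed(neighbors, {"Alice", "Bob"}, {"Rome"})
closed_b = is_closed(neighbors, {"Alice", "Bob", "Carol"}, {"Rome"})
```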

Factors Affecting Task Complexity

  • The entity and group level factor: entity number
  • The bicluster level factors: size, overlap and number
  • The chain and schema level factor: domain number

Evaluation

  • Participants and apparatus
    • 20 graduate students (9 male, 11 female), aged 24-33, from different majors
    • 15.4-inch MacBook Pro
    • a mouse and a keyboard
  • Synthetic data
  • Task
    • Working experience based on companies that they worked for
    • Travel preference based on their travel history
    • Shopping style based on their shopping records
    • Learning interests based on the courses they have taken

Visualization and User Interaction

  • Highlight Propagation

Data Collection

  • interaction logs (time stamp, interaction type, target object type and target object ID)
    • mousing over or out of an entity or a bicluster
    • selecting or unselecting an entity or a bicluster
    • adding or removing an entity to or from answers
  • screen recording
  • observations
  • interviews
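
The four log fields listed above map naturally onto a small record type. The sketch below is one possible shape, not the study's actual logging format: the class name `LogEvent`, the interaction labels, the time unit, and the choice to count a mouse-over as one "visit" are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class LogEvent:
    timestamp: float   # assumed unit: seconds since the task started
    interaction: str   # assumed labels, e.g. "mouse_over", "select", "add_to_answer"
    target_type: str   # "entity" or "bicluster"
    target_id: str

def count_visits(events, target_type="entity"):
    """Count mouse-over events per target of the given type (assumed visit definition)."""
    return Counter(e.target_id for e in events
                   if e.interaction == "mouse_over" and e.target_type == target_type)

# Hypothetical log excerpt (illustrative data only).
log = [
    LogEvent(0.8, "mouse_over", "entity", "p1"),
    LogEvent(1.1, "mouse_out", "entity", "p1"),
    LogEvent(2.4, "mouse_over", "bicluster", "b0"),
    LogEvent(3.0, "mouse_over", "entity", "p1"),
    LogEvent(3.9, "select", "entity", "p2"),
]

visits = count_visits(log)
```

A per-entity count like this is one way a measure such as entity visits could be derived from the logs.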

Measures and metrics

  • Variance of Findings
  • Accuracy of Findings
  • Connection Based Evidence
  • Inference Based Evidence
  • Exploration Cost

User Performance Results

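With the same ordering, edge bundling significantly reduced entity visits and led to more justified answers.
Apart from reducing entity visits, ordering had no effect on answer accuracy or time spent.
With random ordering, edge bundling made users more likely to find closed biclusters, while merged biclusters were found less often.
Neither edge bundling nor seriation affected the time needed to find justified answers, which suggests that factors other than entity visits (such as layout) may affect time spent.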

Four Trade-Offs

  • View Simplicity versus Task Complexity
  • Similarity: Connection-Based versus Semantic-Driven
  • Connectedness versus Coordinatedness
  • Highlight Propagation Driven by: Entity versus Bundle

Reflections

Critical thinking
For high-density networks, edge bundling may be less effective than a matrix representation.

Creative thinking
Analyze and compare graphs of different scales.

How to apply it to our work
BiSet and seriation could be used to simplify bipartite graphs.
Consider whether the approach can be extended to multipartite graphs.