Where Is Fault Localization Headed? Paper Reading: Revisiting the practical use of automated software fault localization techniques
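Preface
In this post I read a fault localization paper: [IWPD 2017] Revisiting the practical use of automated software fault localization techniques.
While reading the fault localization paper [Rui Abreu] [IJCAI 2018] Leveraging Qualitative Reasoning to Improve SFL, I found this article in its references. I am also eager to understand why SFL still cannot be put to industrial use and remains at the research stage, and what shortcomings current SFL still has.
1 Basic information
Citation: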
Ang A, Perez A, van Deursen A, et al. Revisiting the Practical Use of Automated Software Fault Localization Techniques[C]//Software Reliability Engineering Workshops (ISSREW), 2017 IEEE International Symposium on. IEEE, 2017: 175-182.
Download link:
https://repositorio.inesctec.pt/bitstream/123456789/6190/1/P-00N-84Y.pdf
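As follows (there is nothing worth adding, so I will keep it simple):
That said, I think I have read this paper before.
No matter, I will just read it again.
2 Paper content
First, the current situation, namely the problem that fault localization has not become practical: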
In the last two decades, a great amount of effort has been put in researching automated debugging techniques to support developers in the debugging process. However, in a widely cited user study published in 2011, Parnin and Orso found that research in automated debugging techniques made assumptions that do not hold in practice, and suggested four research directions to remedy this: absolute evaluation metrics, result comprehension, ecosystems, and user studies.
Then the authors' own work and conclusions:
In this study, we revisit the research directions proposed by the authors, offering an overview of the progress that the research community has made in addressing them since 2011. We observe that new absolute evaluation metrics and result comprehension techniques have been proposed, while research in ecosystems and user studies remains mostly unexplored. We analyze what is hard about these unexplored directions and propose avenues for further research in the area of fault localization
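In short: over the past two decades, a great deal of effort has gone into automated debugging techniques. However, in a widely cited user study published in 2011 (the ISSTA 2011 paper), Parnin and Orso found that the assumptions underlying automated debugging techniques do not hold in practice, and they proposed four research directions to remedy this: absolute rather than relative evaluation metrics, result comprehension, ecosystems, and user studies.
The paper then revisits these directions (to revisit here means something like looking back in order to learn something new) and gives an overview of the progress made since 2011. The authors find that new work has been proposed for the first two directions, while the last two remain mostly unexplored. They analyze what makes these unexplored directions hard and propose avenues for further research in fault localization.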
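3 Q&A: the Q part (Questions)
3.1 Q1
I would like to know more details about the ISSTA 2011 paper (Parnin and Orso).
3.2 Q2
The authors say that the last two directions from the ISSTA 2011 paper, ecosystems and user studies, have not yet been studied well.
I want to know what reasons the authors give and what directions they propose.
3.3 Q3
I want to know how far the first two directions have progressed.
4 Q&A: the A part (Answers)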
4.1 A1
While advanced fault localization techniques have proven to be able to pinpoint faults in code, many studies have ignored their practical effectiveness [7]. This issue was raised in 2011 in a study by Parnin and Orso [1], in which they perform a preliminary user study and show evidence that many assumptions made by advanced fault localization techniques do not hold in practice. For example, many studies adopt a metric that is relative to the size of the codebase to evaluate the performance of a debugging technique. If a faulty statement is assigned a rank of 83, while the total lines of code amounts to 4408, then the evaluation metric suggests that the developer has to inspect 1.8% of the codebase, which appears as a positive result. However, Parnin and Orso observed in their user experiment that developers were not able to translate the results into a successful debugging activity [1].
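This passage is quite pointed.
In short: although advanced fault localization techniques have been shown to be able to pinpoint faults in code, many studies ignore their practical effectiveness. The passage uses the evaluation metric as an example. Having to inspect only 1.8% of the codebase sounds like a positive result, but with 4408 lines it still means working through 83 ranked statements, and in the user experiment developers could not turn such results into successful debugging. In industrial practice this is not a positive result.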
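The arithmetic behind the quoted example can be checked with a few lines. This is only a sketch of the relative-versus-absolute contrast: the function name `relative_rank` is made up for illustration, and the numbers 83 and 4408 come from the quoted passage.

```python
def relative_rank(rank: int, total_lines: int) -> float:
    """Fraction of the codebase a developer inspects up to and including the fault."""
    if total_lines <= 0 or not 1 <= rank <= total_lines:
        raise ValueError("rank must lie between 1 and total_lines")
    return rank / total_lines


rank, total_lines = 83, 4408  # numbers from the quoted passage

# Relative view: looks small.
print(f"relative: {relative_rank(rank, total_lines) * 100:.2f}% of the codebase")

# Absolute view: the same result expressed as statements to read.
print(f"absolute: {rank} ranked statements, {rank - 1} of them before the fault")
```

The relative figure comes out at about 1.88%, which the paper reports as 1.8%. The absolute figure is what a developer actually experiences: a list of 83 statements to go through.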
4.2 A2
On Ecosystems:
1)As mentioned in Section IV-C, Gouveia et al. [27] have developed the GZoltar toolset, which is available as an Eclipse plug-in
2)Another tool that was created as a response to Parnin and Orso’s findings is AVA.
The authors' suggestion:
We observe that the SFL research community has not yet put a lot of effort in creating tools that can be used by developers. Therefore, we suggest that more effort should be spent on developing tools that facilitate automated debugging techniques
This feels a bit... empty.
On User Studies:
1)Xie et al. [45] reproduced a similar user study to Parnin and Orso’s work [1] that differs in the size of involved participants and debugging tasks, namely 207 participants and 17 debugging tasks.
2)Kochhar et al. [36] performed a user study by means of a survey involving 386 practitioners from more than 30 countries.
3)Wang et al. [35] evaluated IR-based fault localization techniques by means of an analytical study and one involving human subjects.
4)Xia et al. [44] reproduced a similar user study to the work of Parnin and Orso as well as Xie et al…
The authors' suggestion:
To summarize, we observe that the research community has performed a couple of user studies to understand the users’ needs. However, besides the mentioned user studies, almost no study evaluates their technique with a user study, which
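In other words: the authors suggest that every fault localization technique be evaluated with a user study.
I do not think that is very realistic; the workload is too large.
Still, it is work that is hard to avoid on the road to industrial adoption. It can be done; it just depends on which lab has the determination to do it.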
4.3 A3
On Absolute Evaluation Metrics, the paper lists the following:
1)wasted effort
2)accuracy at n (acc@n)
3)Wasted effort at n (wef@n)
4)Mean Average Precision (MAP)
5)Mean Reciprocal Rank (MRR)
6)the NCP metric, the number of candidate patches
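To make a few of these metrics concrete, here is a minimal sketch using their common definitions from the fault localization and information retrieval literature. The paper's exact definitions may differ, ties in the ranking are ignored, and wef@n and NCP are left out. All function names and the toy data are assumptions for illustration.

```python
from statistics import mean


def acc_at_n(first_ranks, n):
    """Number of bugs whose first faulty element is ranked within the top n."""
    return sum(1 for r in first_ranks if r <= n)


def wasted_effort(first_rank):
    """Non-faulty elements inspected before reaching the first faulty one."""
    return first_rank - 1


def mean_reciprocal_rank(first_ranks):
    """Average of 1/rank of the first faulty element across bugs."""
    return mean(1 / r for r in first_ranks)


def average_precision(faulty_ranks):
    """Mean of precision@k taken at the rank k of each faulty element."""
    ranks = sorted(faulty_ranks)
    return mean((i + 1) / r for i, r in enumerate(ranks))


def mean_average_precision(per_bug_ranks):
    """Average precision averaged over all bugs."""
    return mean(average_precision(ranks) for ranks in per_bug_ranks)


# Toy data: for each bug, the 1-based ranks of its faulty elements.
per_bug_ranks = [[1], [3, 10], [83]]
first_ranks = [min(ranks) for ranks in per_bug_ranks]

print("acc@1:", acc_at_n(first_ranks, 1))
print("acc@5:", acc_at_n(first_ranks, 5))
print("wasted effort:", [wasted_effort(r) for r in first_ranks])
print("MRR:", round(mean_reciprocal_rank(first_ranks), 4))
print("MAP:", round(mean_average_precision(per_bug_ranks), 4))
```

Unlike the percentage-of-codebase metric, none of these values shrink just because the program is large, which is the point of calling them absolute.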
On Result Comprehension, the paper lists the following:
1)Gouveia et al. [27] implemented GZoltar
2)Wang and Liu [42] presented an automated debugging technique using disparities of dynamic invariants, named FDDI
3)As mentioned in Section IV-B, Wen et al. [43] performed fault localization based on software changes, resulting in a list of suspicious change hunks.
4)Wang and Huang [41] proposed the use of weighted control flow subgraphs (WCFSs) to provide contextual information on the suspicious components in the diagnosis report.
5)Li et al. [39] proposed an SFL technique, named Swift, that involves the developer in the fault localization process.
6)Zuddas et al. [34] proposed a prototype tool, called MIMIC, that identifies potential causes of a failure.
7)Pastore and Mariani [29] proposed AVA, a fault localization technique that generates an explanation about why diagnosed components are considered suspicious.
5 Paper structure
I. INTRODUCTION
II. BACKGROUND
III. PARNIN AND ORSO’S STUDY
IV. IMPACT OF PARNIN AND ORSO’S STUDY
V. RESEARCH IMPLICATIONS
VI. CONCLUSION
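6 Summary
1)A top-conference paper does not come out of nowhere; there really is a lot of accumulated groundwork behind it. (Though not necessarily, of course.)
This paper feels like it paved the way for [IJCAI 2018] Leveraging Qualitative Reasoning to Improve SFL.
6.1 Strengths
1)The first paragraph is really well written: well argued, well supported, with solid fundamentals.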
Software systems are complex and error-prone, likely to expose failures to the end user. When a failure occurs, the developer has to debug the system to eliminate the failure. This debugging process can be described in three phases: fault localization, fault understanding, and fault correction [1]. This process is time-consuming and can account for 30% to 90% of the software development cycle [2]–[4].
[1] C. Parnin and A. Orso, “Are automated debugging techniques actually helping programmers?” in Proceedings of the 2011 International Symposium on Software Testing and Analysis. ACM, 2011, pp. 199–209.
[2] J. Robbins, Debugging Applications for Microsoft .NET and Microsoft Windows. Microsoft Press, 2003.
[3] B. Beizer, Software testing techniques. Dreamtech Press, 2003.
[4] T. Britton, L. Jeng, G. Carver, P. Cheak, and T. Katzenellenbogen, “Reversible debugging software,” Judge Bus. School, Univ. Cambridge, Cambridge, UK, Tech. Rep, 2013.
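In short: the debugging process has three phases: fault localization, fault understanding, and fault correction.
The process is very time-consuming and accounts for roughly 30% to 90% of the software development cycle.
2)The classification of fault localization techniques is quite bold: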
Today’s most important fault localization techniques can be grouped into four categories: slice-based, spectrum-based, model-based, and information retrieval-based techniques. The first three techniques are discussed because most research has been performed on them compared to other techniques [5]. We discuss information retrieval-based fault localization techniques because they are inherently designed to work on natural languages, which can be useful in providing more context to developers when using SFL techniques in practice.
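In short: today's mainstream fault localization techniques fall into roughly four categories: slice-based, spectrum-based, model-based, and information retrieval-based.
A very bold classification.
Static slicing: irrelevant components are removed, yielding a reduced, executable form of the program. (This shrinks the search domain.)
Dynamic slicing: an improvement over static slicing that constructs the slice from the execution information for a specific input. (In dynamic slicing, a slice is constructed based on the execution information of a program for a specific input)
Spectrum-based: so the program spectrum was introduced in 1997, although it can also be traced back to 1987.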
A spectrum was first introduced by Reps et al. [6]. A program spectrum consists of execution information from a perspective of interest.
[6] T. Reps, T. Ball, M. Das, and J. Larus, “The use of program profiling for software maintenance with applications to the year 2000 problem,” in Software Engineering - ESEC/FSE’97. Springer, 1997, pp. 432–449.
The paper also points out that Tarantula is a popular SFL technique:
A popular SBFL technique to compute the suspiciousness score of each system component is Tarantula, proposed by Jones et al. [11]. Tarantula was developed to visualize fault localization results based on suspiciousness scores to improve the developer’s ability to locate faults.
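As a rough illustration of how a spectrum turns into suspiciousness scores, here is a sketch of the commonly cited Tarantula formula: the failed-coverage ratio divided by the sum of the failed and passed coverage ratios. The toy spectrum, the statement names `s1` to `s3`, and the helper names are assumptions for illustration and do not come from the paper.

```python
def tarantula(failed_cov, passed_cov, total_failed, total_passed):
    """Suspiciousness of one component from its pass/fail coverage counts."""
    fail_ratio = failed_cov / total_failed if total_failed else 0.0
    pass_ratio = passed_cov / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def rank_components(spectrum):
    """spectrum: list of (covered components, test passed?) pairs, one per test."""
    total_failed = sum(1 for _, passed in spectrum if not passed)
    total_passed = sum(1 for _, passed in spectrum if passed)
    components = sorted(set().union(*(cov for cov, _ in spectrum)))
    scores = {}
    for comp in components:
        failed_cov = sum(1 for cov, passed in spectrum if comp in cov and not passed)
        passed_cov = sum(1 for cov, passed in spectrum if comp in cov and passed)
        scores[comp] = tarantula(failed_cov, passed_cov, total_failed, total_passed)
    # Most suspicious first; ties broken by name to keep the order stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical spectrum: two passing tests and one failing test.
spectrum = [
    ({"s1", "s2"}, True),
    ({"s1", "s2", "s3"}, True),
    ({"s1", "s3"}, False),
]

for component, score in rank_components(spectrum):
    print(component, round(score, 3))
```

In this toy run `s3` ranks first because it is covered by the failing test and by only one of the two passing tests, while `s2`, which no failing test touches, gets a score of 0.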
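Model-based: first used for hardware and later for software. Whatever does not conform to the model is considered faulty. In software, however, the model is generated from the source code based on the semantics of the programming language, so the model itself may be wrong.
Therefore, the expected results of a test case and its execution are used together with the generated model to diagnose bugs. (Test cases are still indispensable.) As follows: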
In 1999, Mateis et al. [15] performed the first study where MBD is applied to Java, an imperative programming language. As opposed to models for physical systems, software programs written in an imperative language seldom come with a complete and up-to-date behavioral model. Therefore, for software systems, the model is generated from source code based on the semantics of the programming language. However, this model can be faulty as the source code is likely to contain bugs. Hence, expected results from a test case and its execution are used together with the generated model to diagnose bugs [16].
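My understanding of the model-based approach still needs work.
Information retrieval-based: I feel the authors do not explain this part very clearly, quite possibly because I did not know much about IR-based SFL to begin with. Here is a brief introduction (from the paper):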
Information retrieval (IR) has been most apparent in web search engines but has recently been applied to SFL. The purpose of IR is to retrieve relevant documents given a query [17]. In IR-based SFL (IRBSFL), bug reports are used as a search query and source code represents the document collection. To retrieve relevant documents, IRBSFL techniques make use of retrieval models, that essentially return documents that are most similar to the search query. Specifically, retrieval models define how documents and queries are characterized such that, ultimately, the representation of a document and query can be compared to find the most relevant documents. The five generic retrieval models that are used to perform SFL are [18]: Vector Space Model (VSM) [19] , Smoothed Unigram Model (SUM) [18], Latent Dirichlet Allocation (LDA) [20], [21], Latent Semantic Indexing (LSI) [22], [23], Cluster Based Document Model (CBDM) [18].
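In short: in IR-based SFL, the bug report serves as the search query and the source code serves as the document collection. To retrieve relevant documents (source files), IR-based SFL techniques use retrieval models that return the source files most similar to the query.
I roughly get it. It feels a bit like reuse? Similarity?
Interdisciplinary?
3)The recommendations at the end are quite sound: (I do not think anyone is likely to take them on right now, but this is indeed the future trend and a step on the road to industrializing fault localization.)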
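The IR-based setup described above (bug report as query, source files as documents) can be sketched with a small Vector Space Model: TF-IDF weights plus cosine similarity. This is a simplified stand-in written with the standard library only, not the retrieval model of any cited tool. The file names, file contents, and bug report text are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """docs: name -> text. Returns (name -> sparse TF-IDF vector, idf table)."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(counts)
    doc_freq = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF so that terms present in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = {name: {t: tf * idf[t] for t, tf in c.items()} for name, c in counts.items()}
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(bug_report, source_files):
    """Rank source files by similarity to the bug report, most similar first."""
    vectors, idf = tfidf_vectors(source_files)
    query_counts = Counter(tokenize(bug_report))
    # Terms that never occur in the source collection carry no weight.
    query = {t: tf * idf[t] for t, tf in query_counts.items() if t in idf}
    scores = {name: cosine(query, vec) for name, vec in vectors.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical document collection and bug report.
source_files = {
    "Parser.java": "class Parser parse token stream syntax error",
    "Cache.java": "class Cache evict entry null pointer lookup",
    "Logger.java": "class Logger write log message file",
}
bug_report = "Null pointer when cache lookup misses an entry"

ranked = rank_files(bug_report, source_files)
for name, score in ranked:
    print(name, round(score, 3))
```

Here `Cache.java` comes out on top because it shares the terms null, pointer, cache, lookup, and entry with the bug report, while the other two files share none. Real IR-based techniques add steps such as identifier splitting and stemming, which this sketch omits.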
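6.2 Weaknesses
1)Honestly, I admire the authors: the survey is really thorough and misses basically no important paper.
Very impressive.
But the last sentence of the following passage feels too weak in tone.
It hardly highlights how this work differs from the related work.
Perhaps a different phrasing would work better?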
Souza et al. [24] presented a fault localization survey, where they addressed the shortcomings of current SBFL techniques to be applied in industry. The authors do this by addressing five aspects of fault localization: techniques, faults, benchmarks, testing information, and practical use. Although Souza et al. focused on the practicality of SBFL, which is similar to this survey, we also survey studies that propose SFL ecosystems.
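Reflections
I did not plan to write this much, since I had read this paper before, but I could not help it: a lot of this is worth writing down, so it got long.
By the end I was a bit tired; I had been studying for a bit over an hour straight.
That's it for now.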