ai @ wvu: Paper

Testing large-scale systems is expensive in terms of both time and money. Running simulations early in the process is a proven method of ﬁnding the design faults likely to lead to critical system failures, but determining the exact cause of those errors is still time-consuming and requires access to a limited number of domain experts. It is desirable to ﬁnd an automated method that explores the large number of combinations and is able to isolate likely fault points. Treatment learning is a subset of minimal contrast-set learning that, rather than classifying data into distinct categories, focuses on ﬁnding the unique factors that lead to a particular classiﬁcation. That is, they ﬁnd the smallest change to the data that causes the largest change in the class distribution. These treatments, when imposed, are able to identify the settings most likely to cause a mission-critical failure. This research benchmarks two treatment learning methods against standard optimization techniques across three complex systems, including two pro jects from the Robust Software Engineering (RSE) group within the National Aeronautics and Space Administration (NASA) Ames Research Center. It is shown that these treatment learners are both faster than traditional methods and show demonstrably better results.

New draft: Finding Robust Solutions in Requirements Models

Gregory Gay , Tim Menzies , Omid Jalali , Gregory Mundy, Beau Gilkerson, Martin Feather, and James Kiper

Solutions to non-linear requirements engineering problems may be “brittle”; i.e. small changes may dramatically alter solution effectiveness. Therefore, it is not enough to merely generate solutions to requirements problems- we must also assess the robustness of that solution. This paper introduces the KEYS2 algorithm that can generate decision ordering diagrams. Once generated, these diagrams can assess solution robustness in linear time. In experiments with real-world requirements engineering models, we show that KEYS2 can generate decision ordering diagrams in O(N 2 ). When assessed in terms of terms of (a) reducing inference times, (b) increasing solution quality, and (c) decreasing the variance of the generated solution, KEYS2 out-performs other search algorithms (simulated annealing, ASTAR, MaxWalkSat).

New draft: Controlling Randomized Unit Testing With Genetic Algorithms

James H. Andrews and Tim Menzies and Felix C. H. Li

Randomized testing is an effective method for testing software units. Thoroughness of randomized unit testing varies widely according to the settings of certain parameters, such as the relative frequencies with which methods are called. In this paper, we describe Nighthawk, a system which uses a genetic algorithm (GA) to ﬁnd parameters for randomized unit testing that optimize test coverage. Designing GAs is somewhat of a black art. We therefore use a feature subset selection (FSS) tool to assess the size and content of the representations within the GA. Using that tool, we can prune back 90% of our GA’s mutators while still achieving most of the coverage found using all the mutators. Our pruned GA achieves almost the same results as the full system, but in only 10% of the time. These results suggest that FSS for mutator pruning could signiﬁcantly optimize meta-heuristic search-based software engineering tools.

Tuesday, August 11, 2009

Paper accepted to ICSM 2009

On the use of Relevance Feedback in IR-based Concept Location
Gregory Gay, Sonia Haiduc, Andrian Marcus, Tim Menzies

Concept location is a critical activity during software evolution as it produces the location where a change is to start in response to a modification request, such as, a bug report or a new feature request. Lexical based concept location techniques rely on matching the text embedded in the source code to queries formulated by the developers. The efficiency of such techniques is strongly dependent on the ability of the developer to write good queries. We propose an approach to augment information retrieval (IR) based concept location via an explicit relevance feedback (RF) mechanism. RF is a two-part process in which the developer judges existing results returned by a search and the IR system uses this information to perform a new search, returning more relevant information to the user. A set of case studies performed on open source software systems reveals the impact of RF on the IR based concept location.

Note: ICSM has a 21.6% acceptance rate.

Paper accepted to ISSRE'09

Cost Curve Evaluation of Fault Prediction Models

Yue Jiang, Bojan Cukic, Tim Menzies

Prediction of fault prone software components is one of the most researched problems in software engineering. Many statistical techniques have been proposed but there is no consensus on the methodology to select the "best model" for the specific project. In this paper, we introduce and discuss the merits of cost curve analysis of fault prediction models. Cost curves allow software quality engineers to introduce project-specific cost of module misclassification into model evaluation. Classifying a software module as fault-prone implies the application of some verification activities, thus adding to the development cost. Misclassifying a module as fault free carries the risk of system failure, also associated with cost implications. Through the analysis of sixteen projects from public repositories, we observe that software quality does not necessarily benefit from the prediction of fault prone components. The inclusion of misclassification cost in model evaluation may indicate that even the "best" models achieve performance no better than trivial classification. Our results support a recommendation favoring the use of cost curves in practice with the hope they will become a standard tool for software quality model performance evaluation.

(Short) Paper accepted to ASE'09

Assessing the Relative Merits of Agile vs Traditional Software Development.

Bryan Lemon, Aaron Riesbeck, Tim Menzies, Justin Price, Joseph D’Alessandro, Rikard Carlsson, Tomi Priﬁti, Fayola Peters, Hiuhua Lu, Dan Port

We implemented Boehm-Turner’s model of agile and plan-based software development. That tool is augmented with an AI search engine to ﬁnd the key factors that predict for the success of agile or traditional plan-based software developments. According to our simulations and AI search engine: (1) in no case did agile methods perform worse than plan-based approaches; (2) in some cases, agile performed best. Hence, we recommend that the default development practice for organizations be an agile method. The simplicity of this style of analysis begs the question: why is so much time wasted on evidence-less debates on software process when a simple combination of simulation plus automatic search can mature the dialogue much faster?

Paper accepted to ASE'09

Understanding the Value of Software Engineering Technologies .

Phillip Green II, Tim Menzies, Steven Williams, Oussama El-Rawas

SEESAW combines AI search tools, a Monte Carlo simulator, and some software process models. We show here that, when selecting technologies for a software project, SEESAW out-performs a variety of other search engines. SEESAW’s recommendations are greatly affected by the business context of its use. For example, the automatic defect reduction tools explored by the ASE community are only relevant to a subset of software projects, and only according to certain value criteria. Therefore, when arguing for the value of a particular technology, that argument should include a description of the value function of the target user community.

Note: ASE has a 17% acceptance rate.