ai @ wvu: August 2009

Monday, August 24, 2009

Oussama El-Rawas

Known by his colleagues as simply "Ous" (pronounced like "moose" without the "m"), Oussama did his undergraduate degree at the American University of Beirut in Computer and Communications Engineering. Shortly after, he enrolled at WVU in the Electrical Engineering graduate program, where he conducted research in software engineering models, search-based AI, and software project management. In addition to his master's thesis, his research produced several papers that have been accepted at conferences such as ASE, ICSE, and ICSP, among others.

Currently Oussama is commencing his PhD studies, working as a GTA and continuing his research work with Dr. Tim Menzies. When not occupied with his work, Oussama enjoys reading technology news, researching open source software and its use, listening to various genres of metal music, and spending time with his beautiful wife.

Creating the New Generation of Forensic Interpretation Models

In their report, "Strengthening Forensic Science in the United States: A Path Forward", the National Academy of Sciences (NAS) not only laid out the challenges facing the forensic science community but also put forward recommendations for the improvements required to alleviate the problems, disparities, and lack of mandatory standards currently facing the community.

Highlighted in this report is the concern that faulty forensic science can contribute to the wrongful convictions of innocent people. On the other side of the coin, someone who is guilty of a crime can be wrongfully acquitted.

Take for instance a hypothetical case of a suspect being connected to a crime scene by trace evidence such as glass. It would be very easy to challenge the forensic analysis by simply requesting that it be done by at least five (5) different forensic labs. One can be almost 100% certain that the evidence would have to be thrown out because of the inconsistency of the respective results.

It is for this reason that we have taken up the NAS's challenge to develop standard forensic interpretation methods that convincingly "demonstrate a connection between evidence and a specific individual or source". It is our hope that this project will not only supply the analyst with an interpretation of the data, but also a measure of what minimal changes in the data can result in a different interpretation.

For this project we studied current methods of forensic interpretation of glass evidence. In this subfield of forensics we have so far identified at least three separate branches of research leading to three different 'best' models: the 1995 Evett model, the 1996 Walsh model, and the 2002 Koons model. To date our research has shown that these models are 'broken'. There are two root causes for the broken mathematical models:

'Brittle' interpretations where small input changes can lead to large output changes.
Inappropriate assumptions about the distribution of data.

To deal with these two issues, our project proposes the use of (a) clustering algorithms (meso and ant), and (b) treatment learning. The clustering algorithms will allow us to reason about crime scene data without the knowledge of standard statistical distributions, while treatment learning offers the analyst a measure of how strongly an interpretation should be believed.

Our project also includes a plotting tool (CLIFF) for visualizing data. The software features four (4) of the current models used in the forensic community: the 1995 Evett model, the 1996 Walsh model, the Seheult model, and the Grove model. For each model, data can be generated randomly and plotted. Other features of CLIFF include:

the ability to perform dimensionality reduction by applying the Fastmap algorithm
and the ability to determine whether the results gained from a particular region of space are important and well supported.
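
For readers unfamiliar with Fastmap, here is a minimal Python sketch of how a single Fastmap axis is usually computed: pick two distant pivot points, then place every point along the line between them using the cosine rule. This is not CLIFF's code. The function names (`choose_pivots`, `fastmap_axis`), the Euclidean distance, and the fixed random seed are all assumptions made for illustration, and the full algorithm repeats this step on residual distances to get further axes.

```python
import math
import random


def euclidean(p, q):
    """Straight-line distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def choose_pivots(points, dist, rng):
    """Pivot heuristic: start anywhere, then jump to the farthest point twice."""
    start = rng.choice(points)
    a = max(points, key=lambda p: dist(start, p))
    b = max(points, key=lambda p: dist(a, p))
    return a, b


def fastmap_axis(points, dist=euclidean, seed=1):
    """Return one coordinate per point: its position along the pivot line."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    a, b = choose_pivots(points, dist, rng)
    d_ab = dist(a, b)
    if d_ab == 0:
        # All points coincide, so there is nothing to spread out.
        return [0.0 for _ in points]
    # Cosine rule: x = (d(a,p)^2 + d(a,b)^2 - d(b,p)^2) / (2 * d(a,b))
    return [
        (dist(a, p) ** 2 + d_ab ** 2 - dist(b, p) ** 2) / (2 * d_ab)
        for p in points
    ]
```

Only the distance function is needed, so the same sketch would work for any data where a sensible `dist` can be defined.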

In the end, it must be made clear that our purpose is not to determine innocence or guilt, or give a 100% guarantee of a match/non-match. Instead our goal is to aid in the decision-making process of forensic interpretation. We want to provide the analyst with a standard, dependable tool/model which reports to them an interpretation of the crime scene data as well as the treatment learner's analysis of what minimal 'treatment' could change the interpretation. What happens after this is in the hands of the analyst.

Ultimately we want to be able to rely on the results presented in a court of law, results based on correct models accepted and used by the entire forensic community. So, if a suspect demands that tests be done by five (5) different forensic labs, there should be no dramatic differences in the results.

Friday, August 14, 2009

Adam Nelson

Adam is a Computer Science graduate student at WVU who studies data mining as well as other applications for machine learning. His interests off-campus include (but are not limited to) playing MMOs, going to see movies frequently, sleeping, eating, etc.

Wednesday, August 12, 2009

MILL student spends summer at NASA AMES

Greg Gay, a master's student at the MILL, spent summer '09 working at NASA AMES (Silicon Valley, California) with model-based simulation experts.

He was testing some tools developed at the MILL on flight control problems at NASA. His results, available on-line, showed that some models are better analyzed with non-continuous contrast set learning than traditional continuous methods.

Here are some photos from his time at NASA:

Tuesday, August 11, 2009

The Diagnosis of Mission-Critical Failures

About Us

PhD Students

Ekrem Kocaguneli

Fayola Peters

Masters Students

Projects

Forensics:

Creating the New Generation of Forensic Interpretation Models

Software

Software Models

POM2:

A software development process model.

Model Optimization

Cliff:

Visualization tool for finding "brittle" regions inside a model.

KEYS and KEYS2:

Exploits the natural key variables in a model to find the optimal input settings.

Toolkits

Ourmine:

A toolkit used for the teaching and researching of advanced Data Mining concepts.

Treatment Learners

TAR3:

Generates treatments to manipulate the class distribution of a dataset using lift and support.

TAR4.1:

Generates treatments to manipulate the class distribution of a dataset using Bayesian probabilities.
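
As a rough illustration of what a treatment learner scores, the sketch below computes two measures for a candidate treatment, usually called lift and support. It is not TAR3's or TAR4.1's code. It assumes that each row carries a numeric class score, that lift is the mean score of the rows matching the treatment divided by the mean score of all rows, and that support is the fraction of rows that match; the real tools may define and weight these differently. The names `select`, `lift_and_support`, and `score_key` are hypothetical.

```python
def select(rows, treatment):
    """Keep only the rows whose attributes match every pair in the treatment."""
    return [
        row for row in rows
        if all(row.get(attr) == value for attr, value in treatment.items())
    ]


def lift_and_support(rows, treatment, score_key="score"):
    """Return (lift, support) for a treatment, under the assumptions above."""
    chosen = select(rows, treatment)
    if not rows or not chosen:
        return 0.0, 0.0
    baseline = sum(row[score_key] for row in rows) / len(rows)
    treated = sum(row[score_key] for row in chosen) / len(chosen)
    # Assumed definition: how much the treatment improves the mean class score.
    lift = treated / baseline if baseline else 0.0
    # Assumed definition: how much of the data the treatment still covers.
    support = len(chosen) / len(rows)
    return lift, support
```

A treatment with high lift but very low support rests on a handful of rows, which is why the two numbers are read together.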

Data for AI + SE experiments

The MILL is very active in running the PROMISE repository for repeatable experiments in SE.

Any data used in our experiments is stored there and is freely available for others to use.

Share and enjoy

Paper accepted to ICSM 2009

On the use of Relevance Feedback in IR-based Concept Location
Gregory Gay, Sonia Haiduc, Andrian Marcus, Tim Menzies

Concept location is a critical activity during software evolution as it produces the location where a change is to start in response to a modification request, such as a bug report or a new feature request. Lexical-based concept location techniques rely on matching the text embedded in the source code to queries formulated by the developers. The efficiency of such techniques is strongly dependent on the ability of the developer to write good queries. We propose an approach to augment information retrieval (IR) based concept location via an explicit relevance feedback (RF) mechanism. RF is a two-part process in which the developer judges existing results returned by a search and the IR system uses this information to perform a new search, returning more relevant information to the user. A set of case studies performed on open source software systems reveals the impact of RF on the IR-based concept location.
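
To make the two-part RF loop concrete, here is a small Python sketch of one classic explicit relevance feedback update, the Rocchio formula. The abstract does not say which update rule the paper uses, so treat Rocchio as an assumption here. The function name `rocchio`, the term-weight dictionaries, and the weights `alpha`, `beta`, and `gamma` are illustrative choices only.

```python
from collections import defaultdict


def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move a query toward documents judged relevant and away from the rest.

    `query` and every document are dicts mapping a term to a weight.
    """
    updated = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    # Add the average relevant document, subtract the average irrelevant one.
    for docs, factor in ((relevant, beta), (irrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += factor * weight / len(docs)
    # Terms pushed to zero or below are dropped from the new query.
    return {term: weight for term, weight in updated.items() if weight > 0}
```

In a concept location setting, the developer's judgments on the first round of results would supply `relevant` and `irrelevant`, and the returned dictionary would be the query for the next search.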

Note: ICSM has a 21.6% acceptance rate.

Todos: Aug 11 '09

I'm not doing meetings this week (writing subjects). But in the meantime:

All of you:

I want to run the weekly MILL meeting Tuesday 11am. Please advise if you can't make it.
Please make sure I've given you an account on the MILL blog
Add your page with a description and picture of you! See GregG's page for an example: http://ai-at-wvu.blogspot.com/2009/08/gregory-gay_11.html

Fayola, Zach:

in "software" can you write a post with screen snaps of the tool, and write a small blurb on it.
what were your goals while i was away? plz email me and advise

Ous:

looking forward to the paper
when are you back from honeymoon? can we organize a teleconf with australia week of aug14.

Phil and Ous and Andres: you need to read Jacky Keung's stuff

http://nicta.com.au/people/keungj
In particular, his recent TSE article http://www.jackykeung.com/media/papers/TSE-0109-0506-1.pdf

Greg:

looking forward to the paper
can you add to "software" pages on tar3, tar4, keys, and your keys experimental toolkit

Bryan:

looking forward to the paper
can you add to "pom2" pages notes on ourmine

AdamN:

looking forward to the paper
can you add to "software" pages notes on ourmine

AdamB: call me. tell me where we are up to.

New version of Cliff

Fayola and Zach coded an amazing new version of the forensics tool, which Timm used to write an NIJ proposal. So cross fingers!

Gregory Gay

Computer Science researcher by day, gaming journalist by night - is there anything that Greg can’t do? Hint, the answer is lots. Greg is a pop-culture junkie with an entertainment addiction. He’s also likely to quote Oscar Wilde at any given moment. Greg loves quirky Japanese video games, esoteric role-playing games, Tetris, film noir, Lovecraftian horror, and Irish whiskey. He isn’t wild about Russian literature or DC crossover events, both of which result in confusing, mind-shattering crises.

Greg is a master's student at WVU. When not researching the latest information retrieval and parametric testing methods, he writes about video games for 4 Color Rebellion. He used to head up WVU's ACM chapter.

For more info, see his personal site, twitter, or facebook.

People

Here are the people at the Modeling Intelligence Lab

Director:

Tim Menzies

Ph.D. students

Masters students

Teaching

The MILL services the following subjects:

cs472/cs572: data mining and advanced data mining (offered in the fall).
cs473/cs573: artificial intelligence and advanced AI (offered in the spring).
700-level special topic: search-based software engineering
700-level special topic: agent-oriented software development

Employment

For graduate students interested in positions at the Modeling Intelligence Lab, we offer the following advice:

Before you buy, try renting.

That is, before taking on work here at the MILL:

either take a MILL-taught graduate subject
or come along to the MILL meetings, learn who is doing what, then see if you can add anything to that project.

After that, you (and we) can better understand what you do best and we (and you) can make the right decision about your employment.

Paper accepted to ISSRE'09

Cost Curve Evaluation of Fault Prediction Models

Yue Jiang, Bojan Cukic, Tim Menzies

Prediction of fault prone software components is one of the most researched problems in software engineering. Many statistical techniques have been proposed but there is no consensus on the methodology to select the "best model" for the specific project. In this paper, we introduce and discuss the merits of cost curve analysis of fault prediction models. Cost curves allow software quality engineers to introduce project-specific cost of module misclassification into model evaluation. Classifying a software module as fault-prone implies the application of some verification activities, thus adding to the development cost. Misclassifying a module as fault free carries the risk of system failure, also associated with cost implications. Through the analysis of sixteen projects from public repositories, we observe that software quality does not necessarily benefit from the prediction of fault prone components. The inclusion of misclassification cost in model evaluation may indicate that even the "best" models achieve performance no better than trivial classification. Our results support a recommendation favoring the use of cost curves in practice with the hope they will become a standard tool for software quality model performance evaluation.
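
The comparison against "trivial classification" can be written down in a few lines. The sketch below follows the standard cost curve formulation, in which a model's normalized expected cost is a weighted mix of its false negative and false positive rates. It is offered as an illustration, not as the paper's code. The function names and the parameters `p_faulty`, `cost_fn`, and `cost_fp` are hypothetical.

```python
def probability_cost(p_faulty, cost_fn, cost_fp):
    """Fold class frequency and misclassification costs into one x-axis value.

    p_faulty: assumed fraction of modules that are faulty.
    cost_fn:  assumed cost of calling a faulty module fault-free.
    cost_fp:  assumed cost of calling a fault-free module faulty.
    """
    positive = p_faulty * cost_fn
    negative = (1 - p_faulty) * cost_fp
    return positive / (positive + negative)


def normalized_expected_cost(fnr, fpr, pc):
    """Cost of a model at probability cost `pc`, scaled into the range 0..1."""
    return fnr * pc + fpr * (1 - pc)


def beats_trivial(fnr, fpr, pc):
    """True if the model is cheaper than both trivial classifiers.

    Labelling everything fault-free costs `pc`;
    labelling everything faulty costs `1 - pc`.
    """
    return normalized_expected_cost(fnr, fpr, pc) < min(pc, 1 - pc)
```

With these definitions, a model with a 30% false negative rate and a 20% false positive rate beats the trivial classifiers when `pc` is 0.5, but not when `pc` is 0.1, where simply labelling every module fault-free is cheaper.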

(Short) Paper accepted to ASE'09

Assessing the Relative Merits of Agile vs Traditional Software Development.

Bryan Lemon, Aaron Riesbeck, Tim Menzies, Justin Price, Joseph D’Alessandro, Rikard Carlsson, Tomi Prifiti, Fayola Peters, Hiuhua Lu, Dan Port

We implemented Boehm-Turner’s model of agile and plan-based software development. That tool is augmented with an AI search engine to find the key factors that predict for the success of agile or traditional plan-based software developments. According to our simulations and AI search engine: (1) in no case did agile methods perform worse than plan-based approaches; (2) in some cases, agile performed best. Hence, we recommend that the default development practice for organizations be an agile method. The simplicity of this style of analysis begs the question: why is so much time wasted on evidence-less debates on software process when a simple combination of simulation plus automatic search can mature the dialogue much faster?

Paper accepted to ASE'09

Understanding the Value of Software Engineering Technologies.

Phillip Green II, Tim Menzies, Steven Williams, Oussama El-Rawas

SEESAW combines AI search tools, a Monte Carlo simulator, and some software process models. We show here that, when selecting technologies for a software project, SEESAW out-performs a variety of other search engines. SEESAW’s recommendations are greatly affected by the business context of its use. For example, the automatic defect reduction tools explored by the ASE community are only relevant to a subset of software projects, and only according to certain value criteria. Therefore, when arguing for the value of a particular technology, that argument should include a description of the value function of the target user community.

Note: ASE has a 17% acceptance rate.

Getting started at the MILL

Get room access to 1011.
Get an account on wisp.
Get an account on stuff.
Get an account on this blog.
1. Find a picture of yourself. Upload it to the MILL photostream.
2. Write 200 words about yourself.
3. Add both to a page named YourFirstNameX where "X" is the first initial of your last name. When adding your picture, use the web-based URL of your pic from the photostream.
4. Make sure you add YourFirstNameX to the labels of that post (and any future post that mentions you).
Get server stack access. (Join the ai group @ csee -- CC TimM)

Andres Orrego

Andres Orrego is Director of Innovations at Global Science Technology and a Ph.D. student in the WVU Modeling Intelligence Lab.

Tim Menzies

Assoc. Prof. Tim Menzies (tim@menzies.us) has been working on advanced modeling and AI since 1986. He received his PhD from the University of New South Wales, Sydney, Australia and is the author of over 164 refereed papers.

A former research chair for NASA, Dr. Menzies is now an associate professor at West Virginia University's Lane Department of Computer Science and Electrical Engineering and director of the Modeling Intelligence Lab.

For more information, visit his web page at http://menzies.us.

Treatment learning outperforms other methods

Greg went to NASA AMES this summer and found that treatment learning (tar3 and tar4) beat the heck out of a range of standard methods (simulated annealing and gradient descent) for optimizing NASA simulators.

Meanwhile, he wowed the hell out of the NASA folks. Maybe we can send grad students there every year?

For more, see his on-line talk: The Diagnosis of Mission-Critical Failures.

Saturday, August 1, 2009

MILL photostream

fans of alan

Name: a.turing

Magic word: 1ihateapples!