Monday, August 24, 2009
Oussama El-Rawas
Known to his colleagues simply as "Ous" (pronounced like "moose" without the "m"), Oussama did his undergrad at the American University of Beirut in Computer and Communications Engineering. Shortly after, he enrolled in the Electrical Engineering graduate program at WVU, where he conducted his research in software engineering models, search-based AI, and software project management. In addition to his Masters thesis, his research produced several papers accepted at conferences such as ASE, ICSE, and ICSP, among others.
Currently Oussama is commencing his PhD studies, working as a GTA, and continuing his research work with Dr. Tim Menzies. When not occupied with his work, Oussama enjoys reading technology news, researching open source software and its use, listening to various genres of metal music, and spending time with his beautiful wife.
Creating the New Generation of Forensic Interpretation Models
Highlighted in the NAS report is the concern that faulty forensic science can contribute to the wrongful conviction of innocent people. On the other side of the coin, someone who is guilty of a crime can be wrongfully acquitted.
Take, for instance, a hypothetical case of a suspect being connected to a crime scene by trace evidence such as glass. It would be very easy to challenge the forensic analysis simply by requesting that it be repeated by at least five (5) different forensic labs. One can be almost certain that the evidence would have to be thrown out because of the inconsistency of the respective results.
It is for this reason that we have taken up the NAS's challenge to develop standard forensic interpretation methods that convincingly "demonstrate a connection between evidence and a specific individual or source". It is our hope that this project will not only supply the analyst with an interpretation of the data, but also a measure of what minimal changes in the data could result in a different interpretation.
For this project we studied current methods of forensic interpretation of glass evidence. In this subfield of forensics we have so far identified at least three separate branches of research, leading to three different 'best' models: the 1995 Evett model, the 1996 Walsh model, and the 2002 Koons model. To date, our research has shown that these models are 'broken'. There are two root causes for the broken mathematical models:
- 'Brittle' interpretations where small input changes can lead to large output changes.
- Inappropriate assumptions about the distribution of data.
To deal with these two issues, our project proposes the use of (a) clustering algorithms (meso and ant), and (b) treatment learning. The clustering algorithms will allow us to reason about crime scene data without the knowledge of standard statistical distributions, while treatment learning offers the analyst a measure of how strongly an interpretation should be believed.
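To make point (a) more concrete, the sketch below groups glass measurements without fitting any presumed distribution. It uses a plain k-means stand-in, not the meso or ant clusterers themselves, and the refractive-index values are invented for the example:

```python
import numpy as np

def kmeans_1d(points, k=2, iters=50, seed=0):
    """Tiny 1-D k-means: group measurements without assuming any distribution."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # assign every fragment to its nearest cluster center
        labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
        # move each center to the mean of its members (keep it if it has none)
        centers = np.array([points[labels == j].mean() if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# Invented refractive indices: fragments recovered from the suspect plus
# fragments from the broken window at the scene.
fragments = np.array([1.5191, 1.5192, 1.5190, 1.5230, 1.5231, 1.5229])
labels, centers = kmeans_1d(fragments, k=2)
print(labels, centers)   # two tight groups emerge with no normality assumption
```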
Our project also includes a plotting tool, CLIFF, which offers visualization of the data. The software features four (4) of the current models used in the forensic community, namely the 1995 Evett model, the 1996 Walsh model, the Seheult model, and the Grove model. For each model, data can be generated randomly and plotted. Other features of CLIFF include:
- the ability to perform dimensionality reduction by applying the Fastmap algorithm (sketched below);
- the ability to determine whether the results obtained from a particular region of the data space are important and well supported.
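The Fastmap item above can be illustrated with a short sketch. This is a generic Faloutsos-and-Lin style projection over plain Euclidean distances, written as an assumption about the general technique rather than a copy of CLIFF's code:

```python
import numpy as np

def fastmap(data, k=2, seed=0):
    """Project rows of a numeric table onto k Fastmap axes (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    proj = np.zeros((n, k))                          # coordinates found so far

    def dist2(i, j, axis):
        """Squared distance with the first `axis` projections factored out."""
        raw = np.sum((data[i] - data[j]) ** 2)
        done = np.sum((proj[i, :axis] - proj[j, :axis]) ** 2)
        return max(raw - done, 0.0)

    for axis in range(k):
        # pivot heuristic: pick a random row, jump to the farthest row, repeat once
        a = int(rng.integers(n))
        b = max(range(n), key=lambda j: dist2(a, j, axis))
        a = max(range(n), key=lambda j: dist2(b, j, axis))
        dab2 = dist2(a, b, axis)
        if dab2 == 0:
            break                                    # nothing left to spread out
        for i in range(n):
            proj[i, axis] = (dist2(a, i, axis) + dab2 - dist2(b, i, axis)) / (2 * dab2 ** 0.5)
    return proj

# Example: squash a 4-feature table down to 2 axes for plotting
table = np.random.default_rng(1).normal(size=(100, 4))
print(fastmap(table, k=2)[:3])
```

Each axis picks two far-apart pivot rows, projects every row onto the line between them, and then repeats on the residual distances.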
In the end, it must be made clear that our purpose is not to determine innocence or guilt, or to give a 100% guarantee of a match or non-match. Instead, our goal is to aid the decision-making process of forensic interpretation. We want to provide the analyst with a standard, dependable tool/model that reports an interpretation of the crime scene data as well as the treatment learner's analysis of what minimal 'treatment' could change that interpretation. What happens after this is in the hands of the analyst.
Ultimately, we want results presented in a court of law to be based on correct models accepted and used by the entire forensic community. So, if a suspect demands that tests be done by five (5) different forensic labs, there should be no dramatic differences in the results.
Friday, August 14, 2009
Adam Nelson
Wednesday, August 12, 2009
MILL student spends summer at NASA AMES
He was testing some tools developed at the MILL on flight control problems at NASA. His results, available on-line, showed that some models are better analyzed with non-continuous contrast set learning than with traditional continuous methods.
Here are some photos from his time at NASA:
Tuesday, August 11, 2009
Software
Software Models
POM2:
- A software development process model.
Model Optimization
Cliff:
- Visualization tool for finding "brittle" regions inside a model.
KEYS and KEYS2:
- Exploits the natural key variables in a model to find the optimal input settings.
Toolkits
Ourmine:
- A toolkit used for the teaching and researching of advanced Data Mining concepts.
Treatment Learners
TAR3:
- Generates treatments to manipulate the class distribution of a dataset using lift and support (a scoring sketch appears at the end of this post).
TAR4.1:
- Generates treatments to manipulate the class distribution of a dataset using Bayesian probabilities.
Data for AI + SE experiments
Any data used in our experiments is stored there and is freely available for others to use.
Share and enjoy
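As promised in the TAR3 entry above, here is a minimal sketch of the lift-and-support scoring that treatment learners build on. The data layout, class weights, and helper names are assumptions made for illustration; this is not TAR3's or TAR4.1's actual code:

```python
def lift_and_support(rows, treatment, class_weights, class_key="class"):
    """rows: list of dicts; treatment: dict of attribute -> required value."""
    def score(subset):
        n = len(subset)
        if n == 0:
            return 0.0
        counts = {}
        for r in subset:
            counts[r[class_key]] = counts.get(r[class_key], 0) + 1
        # weighted sum of class frequencies (higher weight = more desirable class)
        return sum(class_weights.get(c, 0) * k / n for c, k in counts.items())

    treated = [r for r in rows
               if all(r.get(a) == v for a, v in treatment.items())]
    baseline = score(rows)
    lift = score(treated) / baseline if baseline else 0.0
    support = len(treated) / len(rows) if rows else 0.0
    return lift, support

# Toy usage: prefer "low_defects" outcomes
rows = [{"lang": "c",    "review": "yes", "class": "low_defects"},
        {"lang": "c",    "review": "no",  "class": "high_defects"},
        {"lang": "java", "review": "yes", "class": "low_defects"},
        {"lang": "java", "review": "no",  "class": "high_defects"}]
print(lift_and_support(rows, {"review": "yes"},
                       {"low_defects": 2, "high_defects": 1}))
```

A treatment learner searches over many candidate constraint sets like this one, keeping those with high lift and enough support.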
Paper accepted to ICSM 2009
Gregory Gay, Sonia Haiduc, Andrian Marcus, Tim Menzies
Concept location is a critical activity during software evolution as it produces the location where a change is to start in response to a modification request, such as a bug report or a new feature request. Lexical-based concept location techniques rely on matching the text embedded in the source code to queries formulated by the developers. The efficiency of such techniques is strongly dependent on the ability of the developer to write good queries. We propose an approach to augment information retrieval (IR) based concept location via an explicit relevance feedback (RF) mechanism. RF is a two-part process in which the developer judges existing results returned by a search and the IR system uses this information to perform a new search, returning more relevant information to the user. A set of case studies performed on open source software systems reveals the impact of RF on the IR based concept location.
Note: ICSM has a 21.6% acceptance rate.
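For readers unfamiliar with relevance feedback, the sketch below shows the two-part RF loop using a classic Rocchio update over TF-IDF vectors. The paper's actual IR engine, feedback formula, corpus, and parameter values may differ; everything named here is an illustrative assumption:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented "corpus": one entry per method in some imaginary system.
methods = ["draw the toolbar icon",
           "parse the bug report text",
           "save user preferences to disk",
           "render the toolbar menu"]
vec = TfidfVectorizer()
docs = vec.fit_transform(methods).toarray()
query = vec.transform(["toolbar drawing bug"]).toarray()

# Round 1: rank methods, then let the developer mark hits relevant/irrelevant.
ranked = cosine_similarity(query, docs)[0].argsort()[::-1]
relevant, irrelevant = [ranked[0]], [ranked[1]]      # the developer's judgments

# Rocchio update; alpha/beta/gamma are conventional defaults, not the paper's.
alpha, beta, gamma = 1.0, 0.75, 0.15
query2 = (alpha * query
          + beta * docs[relevant].mean(axis=0)
          - gamma * docs[irrelevant].mean(axis=0))

# Round 2: re-rank with the reformulated query.
print(cosine_similarity(query2, docs)[0].argsort()[::-1])
```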
Todos: Aug 11 '09
All of you:
- I want to run the weekly MILL meeting Tuesday 11am. Please advise if you can't make it.
- Please make sure I've given you an account on the MILL blog.
- Add your page with a description and picture of you! See GregG's page for an example : http://ai-at-wvu.blogspot.com/2009/08/gregory-gay_11.html
- in "software" can write a post with screen snaps of the tool. and write a small blurb on it.
- What were your goals while I was away? Please email me and advise.
- looking forward to the paper
- When are you back from your honeymoon? Can we organize a teleconf with Australia the week of Aug 14?
- http://nicta.com.au/people/keungj
- In particular, his recent TSE article http://www.jackykeung.com/media/papers/TSE-0109-0506-1.pdf
- looking forward to the paper
- can you add to "software" pages on tar3, tar4, keys, and your keys experimental toolkit
- looking forward to the paper
- can you add to "pom2" pages notes on ourmine
- looking forward to the paper
- can you add to "software" pages notes on ourmine
New version of Cliff
Gregory Gay
Greg is a master's student at WVU. When not researching the latest information retrieval and parametric testing methods, he writes about video games for 4 Color Rebellion. He used to head up WVU's ACM chapter.
For more info, see his personal site, twitter, or facebook.
People
Director:
Ph.D. students
Masters students
Teaching
- cs472/cs572: data mining and advanced data mining (offered in the fall).
- cs473/cs573: artificial intelligence and advanced AI (offered in the spring).
- 700-level special topic: search-based software engineering
- 700-level special topic: agent-oriented software development
Employment
- Before you buy, try renting.
- either take a MILL-taught graduate subject
- or come along to the MILL meetings, learn who is doing what, then see if you can add anything to that project.
Paper accepted to ISSRE'09
Yue Jiang, Bojan Cukic, Tim Menzies
Prediction of fault prone software components is one of the most researched problems in software engineering. Many statistical techniques have been proposed but there is no consensus on the methodology to select the "best model" for the specific project. In this paper, we introduce and discuss the merits of cost curve analysis of fault prediction models. Cost curves allow software quality engineers to introduce project-specific cost of module misclassification into model evaluation. Classifying a software module as fault-prone implies the application of some verification activities, thus adding to the development cost. Misclassifying a module as fault free carries the risk of system failure, also associated with cost implications. Through the analysis of sixteen projects from public repositories, we observe that software quality does not necessarily benefit from the prediction of fault prone components. The inclusion of misclassification cost in model evaluation may indicate that even the "best" models achieve performance no better than trivial classification. Our results support a recommendation favoring the use of cost curves in practice with the hope they will become a standard tool for software quality model performance evaluation.
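The cost-curve idea the abstract refers to can be sketched in a few lines: a classifier with fixed false-positive and false-negative rates is a straight line in (probability-cost, normalized-expected-cost) space, and the trivial "always fault-free" / "always fault-prone" predictors form the envelope min(PC, 1-PC). A minimal sketch, assuming made-up error rates rather than the paper's sixteen-project data:

```python
import numpy as np

def normalized_expected_cost(fpr, fnr, pc_plus):
    """NEC of a classifier at probability-cost value pc_plus in [0, 1]."""
    return fnr * pc_plus + fpr * (1.0 - pc_plus)

pc = np.linspace(0, 1, 11)
model = normalized_expected_cost(fpr=0.30, fnr=0.25, pc_plus=pc)  # invented rates
trivial = np.minimum(pc, 1 - pc)     # lower envelope of the two trivial classifiers

# Where the model line sits above the trivial envelope, predicting fault-prone
# modules costs more than trivial classification -- the paper's cautionary point.
for p, m, t in zip(pc, model, trivial):
    flag = "worse than trivial" if m > t else "useful"
    print(f"PC(+)={p:.1f}  model={m:.3f}  trivial={t:.3f}  {flag}")
```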
(Short) Paper accepted to ASE'09
Bryan Lemon, Aaron Riesbeck, Tim Menzies, Justin Price, Joseph D’Alessandro, Rikard Carlsson, Tomi Prifiti, Fayola Peters, Hiuhua Lu, Dan Port
We implemented Boehm-Turner’s model of agile and plan-based software development. That tool is augmented with an AI search engine to find the key factors that predict for the success of agile or traditional plan-based software developments. According to our simulations and AI search engine: (1) in no case did agile methods perform worse than plan-based approaches; (2) in some cases, agile performed best. Hence, we recommend that the default development practice for organizations be an agile method. The simplicity of this style of analysis begs the question: why is so much time wasted on evidence-less debates on software process when a simple combination of simulation plus automatic search can mature the dialogue much faster?
Paper accepted to ASE'09
Understanding the Value of Software Engineering Technologies.
Phillip Green II, Tim Menzies, Steven Williams, Oussama El-Rawas
SEESAW combines AI search tools, a Monte Carlo simulator, and some software process models. We show here that, when selecting technologies for a software project, SEESAW out-performs a variety of other search engines. SEESAW’s recommendations are greatly affected by the business context of its use. For example, the automatic defect reduction tools explored by the ASE community are only relevant to a subset of software projects, and only according to certain value criteria. Therefore, when arguing for the value of a particular technology, that argument should include a description of the value function of the target user community.
Note: ASE has a 17% acceptance rate.
Getting started at the MILL
- Get room access to 1011.
- Get an account on wisp.
- Get an account on stuff.
- Get an account on this blog.
- Find a picture of yourself. Upload it to the MILL photostream.
- Write 200 words about yourself.
- Add both to a page named YourFirstNameX where "X" is the first initial of your last name. When adding your picture, use the web-based URL of your pic from the photostream.
- Make sure you add YourFirstNameX to the labels of that post (and any future post that mentions you).
- Get server stack access. (Join the ai group @ csee -- CC TimM)
Andres Orrego
Tim Menzies
A former research chair for NASA, Dr. Menzies is now an associate professor at West Virginia University's Lane Department of Computer Science and Electrical Engineering and director of the Modeling Intelligence Lab.
For more information, visit his web page at http://menzies.us.
Treatment learning outperforms other methods
Meanwhile, he wowed the hell out of the NASA folks. Maybe we can send grad students there every year?
For more, see his on-line talk: The Diagnosis of Mission-Critical Failures.