1. For the first release of a new project, how can we learn quality prediction models from cross-project data?
Burak proposed using an NN-filter to support cross-project defect prediction and reported promising results (JASE 2010). Fayola is now working on this as well and doing much better (mostly on small test sets). There is much here worth investigating further.
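As a concrete reminder of the idea, here is a minimal sketch of a Burak-style NN-filter: for each test instance, keep its k nearest cross-project training instances and pool them (without duplicates) into the filtered training set. The function name, the toy data, and k=2 are illustrative only; the original work uses k=10 over static code metrics.

```python
import math

def nn_filter(train, test, k=2):
    """NN-filter sketch: for each test instance, select its k nearest
    cross-project training instances (Euclidean distance), then pool
    the selections, skipping duplicates, to form the filtered set."""
    selected = []
    for t in test:
        ranked = sorted(train, key=lambda x: math.dist(x, t))
        for x in ranked[:k]:
            if x not in selected:
                selected.append(x)
    return selected

# Toy usage: only the two instances near the test point survive.
filtered = nn_filter([(0.0, 0.0), (1.0, 1.0), (10.0, 10.0)],
                     [(0.5, 0.5)], k=2)
# → [(0.0, 0.0), (1.0, 1.0)]
```

In practice the metrics should be normalized first, so that no single measure dominates the distance.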
1.1 can we improve cross-learning performance?
-- are there more effective ways to understand the training and test data, like TEAK?
1.2 can we generalize cross-learning strategies to other data?
-- i.e., beyond small test sets and static code measures.
1.3 can we make cross-learning algorithms more scalable?
-- perhaps reducing the dimensionality can help?
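On 1.3, one cheap way to get low dimensionality is a random projection of the feature space before any instance filtering, so that distance computations scale with the reduced dimension. This is only a sketch of that option; `random_projection` and its parameters are illustrative, not part of any cited method.

```python
import random

def random_projection(data, out_dim, seed=0):
    """Random-projection sketch: map each feature vector down to
    out_dim dimensions via Gaussian random vectors, to cut the cost
    of distance-based cross-learning on high-dimensional data."""
    rng = random.Random(seed)
    in_dim = len(data[0])
    # One Gaussian random direction per output dimension.
    proj = [[rng.gauss(0, 1) for _ in range(in_dim)]
            for _ in range(out_dim)]
    # Project every vector: dot product with each random direction.
    return [[sum(p * x for p, x in zip(row, vec)) for row in proj]
            for vec in data]

# Toy usage: 10-dimensional vectors reduced to 3 dimensions.
reduced = random_projection([[1.0] * 10, [0.0] * 10], out_dim=3)
```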
2. For the following releases, we have to:
2.1 handle the data shift (concept drift) problem caused by changes in the development environment, to produce better prediction results.
-- I'm currently working in this direction; see my post from last week's group meeting.
2.2 determine whether cross-project data can further improve prediction performance even when local data is available. If cross-project data helps, how should we make use of it?
Transductive transfer learning fits research question 1, and inductive transfer learning fits research question 2.
-- transductive transfer learning: plenty of labeled data in the source domain, but no labeled data available in the target domain.
-- inductive transfer learning: plenty of labeled data in the source domain plus a small amount of labeled data in the target domain.
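The difference between the two settings can be made concrete with a toy 1-NN learner: in the transductive case only source labels are usable, while in the inductive case a few target labels supplement them. The data and function names below are illustrative, not taken from any of the cited work.

```python
import math

def predict_1nn(labeled, x):
    """Label x with the label of its nearest labeled example."""
    feat, lab = min(labeled, key=lambda fl: math.dist(fl[0], x))
    return lab

# Transductive setting: only source-domain labels are available.
source = [((0.0, 0.0), 0), ((1.0, 1.0), 1)]
target_point = (0.9, 0.8)
pred_transductive = predict_1nn(source, target_point)  # → 1

# Inductive setting: a few labeled target instances are added,
# and the nearby target example now decides the label.
target_labeled = [((0.8, 0.9), 0)]
pred_inductive = predict_1nn(source + target_labeled, target_point)  # → 0
```

The point of the sketch is that the same learner can flip its prediction once even a little target-domain labeled data is included, which is exactly why question 2.2 matters.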
Goal: 1-2 publications on this topic.
-- FSE conference, JESE, or somewhere better.
All comments are welcome.