Current Research Projects

Transfer Learning in Software Engineering

2013 -- 2017

NSF, SHF: Medium: Collaborative, #1302216

The goal of the research is to enable software engineers to find software development best practices from past empirical data. The increasing availability of software development project data, plus new machine learning techniques, makes it possible for researchers to study the generalizability of results across projects using the concept of transfer learning. Using data from real software projects, the project will determine and validate best practices in three areas: predicting software development effort; isolating software defects; and identifying effective code inspection practices.

This research will deliver new data mining technologies in the form of transfer learning techniques and tools that overcome current limitations in the state of the art to provide accurate learning within and across projects. It will design new empirical studies that apply transfer learning to empirical data collected from industrial software projects. It will build an on-line model analysis service, making the techniques and tools available to other researchers who are investigating the validity of principles for best practice.

The broader impacts of the research will be to make empirical software engineering research results more transferable to practice, and to improve the research processes for the empirical software engineering community. By providing a means to test principles about software development, this work stands to transform empirical software engineering research and enable software managers to rely on scientifically obtained facts and conclusions rather than anecdotal evidence and one-off studies. Given the immense importance and cost of software in commercial and critical systems, the research has long-term economic impacts.

Model Construction Work for NASA Software Effort Estimation

2014, 2015

Dr Menzies and JPL will work on the following tasks:


Human Health and Obesity

2013, 2014

Background: Collecting data can be cumbersome and expensive. A lack of relevant, accurate, and timely data for research to inform policy may negatively impact public health. The aim of this study was to test whether the careful removal of items from two community nutrition surveys, guided by a data mining technique called feature selection, can (a) identify a reduced dataset while (b) not damaging the signal inside that data.

Methods: While there are many potentially important variables in any domain, the most useful set may be only a small subset. Using feature selection in the initial phase of data collection to identify the most influential variables may greatly reduce the amount of data needed, thereby reducing cost.

Case Studies: The Nutrition Environment Measures Surveys for stores (NEMS-S) and restaurants (NEMS-R) were completed on retail food outlets in counties in West Virginia between May and November of 2011. From that data, a reduced dataset was identified for each outlet type using feature selection.
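
To illustrate the general idea of feature selection on survey items, the following is a minimal sketch only. It does not reproduce the method used in the study, which is not specified here. It assumes a simple filter approach (ranking items by absolute Pearson correlation with an outcome) and uses synthetic data; the names `select_items`, `make_synthetic_survey`, and `item_0` through `item_4` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_items(rows, target, keep):
    """Rank survey items by |correlation| with the target; return the top `keep` names and all scores."""
    names = list(rows[0].keys())
    scores = {
        name: abs(pearson([row[name] for row in rows], target))
        for name in names
    }
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:keep], scores


def make_synthetic_survey(n_rows=200, seed=1):
    """Hypothetical data: five survey items, of which only two drive the outcome."""
    rng = random.Random(seed)
    rows, target = [], []
    for _ in range(n_rows):
        row = {f"item_{i}": rng.random() for i in range(5)}
        rows.append(row)
        # Assumption for illustration: the outcome depends on item_0 and item_3 plus small noise.
        noise = 0.1 * rng.gauss(0, 1)
        target.append(2.0 * row["item_0"] - 1.5 * row["item_3"] + noise)
    return rows, target


rows, target = make_synthetic_survey()
kept, scores = select_items(rows, target, keep=2)
print("items kept:", kept)
for name in sorted(scores):
    print(f"{name}: {scores[name]:.2f}")
```

In this sketch the three uninformative items score near zero and would be candidates for removal from the survey. A real analysis would also need to check that a model built on the reduced item set preserves the signal of the full dataset, which is part (b) of the aim stated above.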