Research article review: Defect prediction for noisy imbalanced datasets

“Improving Software Defect Prediction in Noisy Imbalanced Datasets”, a paper authored by Shi Haoxiang [pictured] et al. and published in volume 13 of the journal Applied Sciences (19th September 2023), proposes a method for dealing with noisy or imbalanced datasets when training AI-based software defect prediction models.

The authors provide an overview of the issues associated with using noisy and imbalanced datasets for software defect prediction and conclude that their proposed method provides “significant improvement” over existing approaches.

The paper begins with an overview of the conventional approaches to dealing with such datasets when training software defect prediction models, before proposing a novel method: “US-PONR” (Undersampling, Propensity-Score-Matching-Based Oversampling and Noise Reduction). 

The researchers found that US-PONR outperformed benchmark software defect prediction methods: “The results indicate that no matter which dataset or method was used, the introduction of noise always reduced a method’s prediction performance. However, under most noise ratios of most datasets, the prediction performance of US-PONR was the best, while its decrease in performance as the noise ratio rose was minimal.”
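
To make the terms "noise ratio" and "undersampling" concrete, here is a minimal Python sketch. It is not US-PONR and does not reproduce the paper's experimental setup: it assumes noise means flipping a fixed fraction of binary defect labels, and it uses plain random undersampling of the majority class. The function names `inject_label_noise` and `random_undersample` are hypothetical, chosen only for this illustration.

```python
import random


def inject_label_noise(labels, noise_ratio, seed=0):
    """Flip the binary label (0/1) of a randomly chosen fraction of samples.

    Assumption: "noise ratio" is the share of labels that are wrong.
    """
    rng = random.Random(seed)
    n_flip = round(len(labels) * noise_ratio)
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


def random_undersample(labels, seed=0):
    """Return sorted indices keeping all minority samples and an
    equally sized random subset of the majority class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return sorted(minority + rng.sample(majority, len(minority)))


# Toy dataset: 10 defective modules (1) and 90 clean modules (0).
labels = [1] * 10 + [0] * 90
noisy = inject_label_noise(labels, noise_ratio=0.2)
balanced_idx = random_undersample(noisy)
```

In a benchmark of the kind the quote describes, one would presumably repeat this for several noise ratios, train each prediction method on the noisy, rebalanced data, and compare how quickly performance degrades.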

[Image Source: ResearchGate]