MSR 2021
Mon 17 - Wed 19 May 2021
co-located with ICSE 2021
Wed 19 May 2021 10:11 - 10:15 at MSR Room 2 - Dependencies and OSS Chair(s): Luca Pascarella

A high imbalance exists between technical debt and non-technical debt source code comments. This imbalance affects Self-Admitted Technical Debt (SATD) detection performance, and the existing literature lacks empirical evidence on the choice of balancing technique. In this work, we evaluate the impact of multiple balancing techniques, including Data level, Classifier level, and Hybrid, for SATD detection in Within-Project and Cross-Project setups. Our results show that the Data level balancing technique SMOTE or Classifier level Ensemble approaches with Random Forest or XGBoost are reasonable choices, depending on whether the goal is to maximize Precision, Recall, F1, or AUC-ROC. We compared our best-performing model with the previous SATD detection benchmark (a cost-sensitive Convolutional Neural Network). Interestingly, the top-performing XGBoost with SMOTE sampling improved the Within-Project F1 score by 10% but fell short in the Cross-Project setup by 9%. This supports the higher generalization capability of deep learning in Cross-Project SATD detection, yet when working within individual projects, classical machine learning algorithms can deliver better performance. We also evaluate and quantify the impact of duplicate source code comments on SATD detection performance. Finally, we employ SHAP and discuss the interpreted SATD features. We have included the replication package and shared a web-based SATD prediction tool with the balancing techniques in this study.
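
As a rough illustration of the Data level balancing idea named in the abstract, the sketch below implements the core SMOTE step in plain Python: each synthetic minority sample is an interpolation between a minority sample and one of its k nearest minority neighbours. This is not the authors' pipeline (their replication package is the authoritative source). The function names `smote` and `balance`, the default `k=5`, and the toy 2-D data are assumptions made for illustration; real SATD features would come from vectorised source code comments.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating each randomly picked
    minority sample toward one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes are the same size.
    Returns new lists; the inputs are left unchanged."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max((len(X) - len(minority)) - len(minority), 0)
    new = smote(minority, n_new, k=k, seed=seed)
    return X + new, y + [minority_label] * len(new)


# Hypothetical toy data: six majority (non-SATD) and two minority (SATD) points.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3],
     [1.0, 1.0], [1.2, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance(X, y)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In practice one would more likely use an existing implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn, and apply oversampling to the training split only so that evaluation data keeps its original class distribution.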

Conference Day
Wed 19 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:00 - 10:50
Dependencies and OSS
Technical Papers / Registered Reports at MSR Room 2
Chair(s): Luca Pascarella (Delft University of Technology)
10:01
3m
Talk
Identifying Critical Projects via PageRank and Truck Factor
Technical Papers
Rolf-Helge Pfeiffer (IT University of Copenhagen)
Pre-print
10:04
4m
Talk
Revisiting Dockerfiles in Open Source Software Over Time
Technical Papers
Kalvin Eng (University of Alberta), Abram Hindle (University of Alberta)
Pre-print
10:08
3m
Talk
Does the First-Response Matter for Future Contributions? A Study of First Contributions
Registered Reports
Noppadol Assavakamhaenghan (Nara Institute of Science and Technology), Supatsara Wattanakriengkrai (Nara Institute of Science and Technology), Naomichi Shimada (Nara Institute of Science and Technology), Raula Gaikovina Kula (NAIST), Takashi Ishio (Nara Institute of Science and Technology), Kenichi Matsumoto (Nara Institute of Science and Technology)
Pre-print
10:11
4m
Talk
Data Balancing Improves Self-Admitted Technical Debt Detection
Technical Papers
Murali Sridharan (University of Oulu), Leevi Rantala (University of Oulu), Maëlick Claes (University of Oulu), Mika Mäntylä (University of Oulu)
Pre-print
10:15
35m
Live Q&A
Discussions and Q&A
Technical Papers


Information for Participants
Info for MSR Room 2: