Escaping the Time Pit: Pitfalls and Guidelines for Using Time-Based Git Data
Many software engineering research papers rely on time-based data (e.g., commit timestamps, issue report creation/update/close dates, release dates). Like most real-world data, however, time-based data is often dirty. To date, there are no studies that quantify how frequently such data is used by the software engineering research community, or that investigate the sources of dirty time-based data and quantify how often it occurs. Depending on the research task and method used, including such dirty data could affect the research results. This paper presents the first survey of papers that utilize time-based data, published in the Mining Software Repositories (MSR) conference series. Out of the 690 technical track and data papers published in MSR 2004–2020, we found that 35% utilized time-based data. We also used the Boa and Software Heritage infrastructures to help identify and quantify several sources of dirty commit timestamp data. Finally, we provide guidelines and best practices for researchers utilizing time-based data sources.
Tue 18 May | Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
03:10 - 04:00 | Time series data | Data Showcase / Technical Papers at MSR Room 2 | Chair(s): Shane McIntosh (University of Waterloo)
03:11 | 3m Talk | AndroCT: Ten Years of App Call Traces in Android | Data Showcase | Pre-print, Media Attached
03:14 | 4m Talk | Mining Workflows for Anomalous Data Transfers | Technical Papers | Huy Tu (North Carolina State University, USA), George Papadimitriou (University of Southern California), Mariam Kiran (ESnet, LBNL), Cong Wang (Renaissance Computing Institute), Anirban Mandal (Renaissance Computing Institute), Ewa Deelman (University of Southern California), Tim Menzies (North Carolina State University, USA) | Pre-print
03:18 | 4m Talk | Escaping the Time Pit: Pitfalls and Guidelines for Using Time-Based Git Data | Technical Papers | Samuel W. Flint (University of Nebraska-Lincoln), Jigyasa Chauhan (University of Nebraska-Lincoln), Robert Dyer (University of Nebraska-Lincoln) | Pre-print, Media Attached
03:22 | 4m Paper | On the Naturalness and Localness of Software Logs | Technical Papers | Pre-print
03:26 | 4m Talk | How Do Software Developers Use GitHub Actions to Automate Their Workflows? | Technical Papers | Timothy Kinsman (University of Adelaide), Mairieli Wessel (University of Sao Paulo), Marco Gerosa (Northern Arizona University, USA), Christoph Treude (University of Adelaide) | Pre-print
03:30 | 30m Live Q&A | Discussions and Q&A | Technical Papers