MSR 2021
Mon 17 - Wed 19 May 2021
co-located with ICSE 2021
Tue 18 May 2021 03:18 - 03:22 at MSR Room 2 - Time series data, Chair(s): Shane McIntosh

Many software engineering research papers rely on time-based data (e.g., commit timestamps, issue report creation/update/close dates, release dates). Like most real-world data, however, time-based data is often dirty. To date, no studies have quantified how frequently the software engineering research community uses such data, investigated the sources of dirty time-based data, or quantified how often such data is dirty. Depending on the research task and method used, including such dirty data could affect the research results. This paper presents the first survey of papers published in the Mining Software Repositories (MSR) conference series that utilize time-based data. Of the 690 technical track and data papers published in MSR 2004–2020, we found that 35% utilized time-based data. We also used the Boa and Software Heritage infrastructures to help identify and quantify several sources of dirty commit timestamp data. Finally, we provide guidelines and best practices for researchers utilizing time-based data sources.

Conference Day
Tue 18 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

03:10 - 04:00
Time series data (Data Showcase / Technical Papers) at MSR Room 2
Chair(s): Shane McIntosh (University of Waterloo)
03:11
3m
Talk
AndroCT: Ten Years of App Call Traces in Android
Data Showcase
Wen Li, Xiaoqin Fu (Washington State University), Haipeng Cai (Washington State University, USA)
Pre-print Media Attached
03:14
4m
Talk
Mining Workflows for Anomalous Data Transfers
Technical Papers
Huy Tu (North Carolina State University, USA), George Papadimitriou (University of Southern California), Mariam Kiran (ESnet, LBNL), Cong Wang (Renaissance Computing Institute), Anirban Mandal (Renaissance Computing Institute), Ewa Deelman (University of Southern California), Tim Menzies (North Carolina State University, USA)
Pre-print
03:18
4m
Talk
Escaping the Time Pit: Pitfalls and Guidelines for Using Time-Based Git Data
Technical Papers
Samuel W. Flint (University of Nebraska-Lincoln), Jigyasa Chauhan (University of Nebraska-Lincoln), Robert Dyer (University of Nebraska-Lincoln)
Pre-print Media Attached
03:22
4m
Paper
On the Naturalness and Localness of Software Logs
Technical Papers
Sina Gholamian (University of Waterloo), Paul A. S. Ward (University of Waterloo)
Pre-print
03:26
4m
Talk
How Do Software Developers Use GitHub Actions to Automate Their Workflows?
Technical Papers
Timothy Kinsman (University of Adelaide), Mairieli Wessel (University of Sao Paulo), Marco Gerosa (Northern Arizona University, USA), Christoph Treude (University of Adelaide)
Pre-print
03:30
30m
Live Q&A
Discussions and Q&A
Technical Papers

