The International Conference on Mining Software Repositories (MSR) has hosted a mining challenge since 2006. With this challenge, we call upon everyone interested to apply their tools to a common dataset. The challenge is for researchers and practitioners to bravely use their mining tools and approaches on a dare.
Mon 17 May (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
| 02:00 - 02:50 | Opening/Awards | MSR Room 1 |
| 02:50 - 03:10 | Break / Discussion Rooms | MSR Room 1 |
| 03:10 - 04:00 | Welcome Event | MSR Room 1 | The MSR welcoming sessions will feature informal networking opportunities for newcomers to meet each other, learn about the MSR conference series, and interact with some established MSR veterans. All are welcome! |
| 10:00 - 10:50 | Resources for MSR Research | Technical Papers / Data Showcase at MSR Room 1 | Chair(s): Felipe Ebert (Eindhoven University of Technology) |
| 10:01 | 3m Talk | PSIMiner: A Tool for Mining Rich Abstract Syntax Trees from Code | Technical Papers | Egor Spirin JetBrains Research; National Research University Higher School of Economics, Egor Bogomolov JetBrains Research, Vladimir Kovalenko JetBrains Research, Timofey Bryksin JetBrains Research, Saint Petersburg State University |
| 10:04 | 3m Talk | Mining DEV for social and technical insights about software development | Technical Papers | Maria Papoutsoglou Aristotle University of Thessaloniki, Johannes Wachs Vienna University of Economics and Business & Complexity Science Hub Vienna, Georgia Kapitsaki University of Cyprus |
| 10:07 | 3m Talk | TNM: A Tool for Mining of Socio-Technical Data from Git Repositories | Technical Papers | Nikolai Sviridov ITMO University, Mikhail Evtikhiev JetBrains Research, Vladimir Kovalenko JetBrains Research |
| 10:10 | 3m Talk | Identifying Versions of Libraries used in Stack Overflow Code Snippets | Technical Papers | Ahmed Zerouali Vrije Universiteit Brussel, Camilo Velázquez-Rodríguez Vrije Universiteit Brussel, Coen De Roover Vrije Universiteit Brussel |
| 10:13 | 3m Talk | Sampling Projects in GitHub for MSR Studies | Data Showcase | Ozren Dabic Software Institute, Università della Svizzera italiana (USI), Switzerland, Emad Aghajani Software Institute, USI Università della Svizzera italiana, Gabriele Bavota Software Institute, USI Università della Svizzera italiana |
| 10:16 | 3m Talk | gambit – An Open Source Name Disambiguation Tool for Version Control Systems | Technical Papers | Christoph Gote Chair of Systems Design, ETH Zurich, Christian Zingg Chair of Systems Design, ETH Zurich |
| 10:19 | 31m Live Q&A | Discussions and Q&A | Technical Papers |
| 10:00 - 10:50 | Testing and code review | Technical Papers / Data Showcase / Registered Reports at MSR Room 2 | Chair(s): Jürgen Cito (TU Wien and Facebook) |
| 10:01 | 3m Talk | A Traceability Dataset for Open Source Systems | Data Showcase | Mouna Hammoudi Johannes Kepler University Linz, Christoph Mayr-Dorn Johannes Kepler University, Linz, Atif Mashkoor Johannes Kepler University Linz, Alexander Egyed Johannes Kepler University |
| 10:04 | 4m Talk | How Java Programmers Test Exceptional Behavior | Technical Papers | Diego Marcilio USI Università della Svizzera italiana, Carlo A. Furia Università della Svizzera italiana (USI) |
| 10:08 | 4m Talk | An Exploratory Study of Log Placement Recommendation in an Enterprise System | Technical Papers | Jeanderson Cândido Delft University of Technology, Jan Haesen Adyen N.V., Maurício Aniche Delft University of Technology, Arie van Deursen Delft University of Technology, Netherlands |
| 10:12 | 3m Talk | Does Code Review Promote Conformance? A Study of OpenStack Patches | Technical Papers | Panyawut Sri-iesaranusorn Nara Institute of Science and Technology, Raula Gaikovina Kula NAIST, Takashi Ishio Nara Institute of Science and Technology |
| 10:15 | 4m Talk | A Replication Study on the Usability of Code Vocabulary in Predicting Flaky Tests | Technical Papers | Guillaume Haben University of Luxembourg, Sarra Habchi University of Luxembourg, Luxembourg, Mike Papadakis University of Luxembourg, Luxembourg, Maxime Cordy University of Luxembourg, Luxembourg, Yves Le Traon University of Luxembourg, Luxembourg |
| 10:19 | 3m Talk | On the Use of Mutation in Injecting Test Order-Dependency | Registered Reports | Sarra Habchi University of Luxembourg, Luxembourg, Maxime Cordy University of Luxembourg, Luxembourg, Mike Papadakis University of Luxembourg, Luxembourg, Yves Le Traon University of Luxembourg, Luxembourg |
| 10:22 | 28m Live Q&A | Discussions and Q&A | Technical Papers |
| 10:50 - 11:10 | Break / Discussion Rooms | MSR Room 1 |
| 11:10 - 12:00 | Welcome Event | MSR Room 1 | The MSR welcoming sessions will feature informal networking opportunities for newcomers to meet each other, learn about the MSR conference series, and interact with some established MSR veterans. All are welcome! |
| 17:50 - 18:10 | Break / Discussion Rooms | MSR Room 1 |
| 18:10 - 19:00 | |
Tue 18 May
| 02:00 - 02:50 | |
| 02:50 - 03:10 | Break / Discussion Rooms | MSR Room 1 |
| 03:10 - 04:00 | Time series data | Data Showcase / Technical Papers at MSR Room 2 | Chair(s): Shane McIntosh (University of Waterloo) |
| 03:11 | 3m Talk | AndroCT: Ten Years of App Call Traces in Android | Data Showcase |
| 03:14 | 4m Talk | Mining Workflows for Anomalous Data Transfers | Technical Papers | Huy Tu North Carolina State University, USA, George Papadimitriou University of Southern California, Mariam Kiran ESnet, LBNL, Cong Wang Renaissance Computing Institute, Anirban Mandal Renaissance Computing Institute, Ewa Deelman University of Southern California, Tim Menzies North Carolina State University, USA |
| 03:18 | 4m Talk | Escaping the Time Pit: Pitfalls and Guidelines for Using Time-Based Git Data | Technical Papers | Samuel W. Flint University of Nebraska-Lincoln, Jigyasa Chauhan University of Nebraska-Lincoln, Robert Dyer University of Nebraska-Lincoln |
| 03:22 | 4m Paper | On the Naturalness and Localness of Software Logs | Technical Papers |
| 03:26 | 4m Talk | How Do Software Developers Use GitHub Actions to Automate Their Workflows? | Technical Papers | Timothy Kinsman University of Adelaide, Mairieli Wessel University of Sao Paulo, Marco Gerosa Northern Arizona University, USA, Christoph Treude University of Adelaide |
| 03:30 | 30m Live Q&A | Discussions and Q&A | Technical Papers |
| 10:00 - 10:50 | ML and Deep Learning | Technical Papers / Data Showcase / Registered Reports at MSR Room 2 | Chair(s): Hongyu Zhang (The University of Newcastle) |
| 10:01 | 4m Talk | Fast and Memory-Efficient Neural Code Completion | Technical Papers | Alexey Svyatkovskiy Microsoft, Sebastian Lee University of Oxford, Anna Hadjitofi Alan Turing Institute, Maik Riechert Microsoft Research, Juliana Franco Microsoft Research, Miltiadis Allamanis Microsoft Research, UK |
| 10:05 | 4m Research paper | Comparative Study of Feature Reduction Techniques in Software Change Prediction | Technical Papers | Ruchika Malhotra Delhi Technological University, Ritvik Kapoor Delhi Technological University, Deepti Aggarwal Delhi Technological University, Priya Garg Delhi Technological University |
| 10:09 | 4m Talk | An Empirical Study on the Usage of BERT Models for Code Completion | Technical Papers | Matteo Ciniselli Università della Svizzera Italiana, Nathan Cooper William & Mary, Luca Pascarella Delft University of Technology, Denys Poshyvanyk College of William & Mary, Massimiliano Di Penta University of Sannio, Italy, Gabriele Bavota Software Institute, USI Università della Svizzera italiana |
| 10:13 | 3m Talk | ManyTypes4Py: A benchmark Python dataset for machine learning-based type inference | Data Showcase | Amir Mir Delft University of Technology, Evaldas Latoskinas Delft University of Technology, Georgios Gousios Facebook & Delft University of Technology |
| 10:16 | 3m Talk | KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle | Data Showcase | Luigi Quaranta University of Bari, Italy, Fabio Calefato University of Bari, Filippo Lanubile University of Bari |
| 10:19 | 3m Talk | Exploring the relationship between performance metrics and cost saving potential of defect prediction models | Registered Reports | Steffen Herbold University of Göttingen |
| 10:22 | 28m Live Q&A | Discussions and Q&A | Technical Papers |
| 10:50 - 11:10 | Break / Discussion Rooms | MSR Room 1 |
| 11:10 - 12:00 | |
| 11:10 | 50m Tutorial | PyDriller 1.0 -- Ready to grow together | Tutorials |
| 17:00 - 17:50 | Hackathon | Technical Papers / Hackathon at MSR Room 1 | Chair(s): Jim Herbsleb (Carnegie Mellon University), Audris Mockus (The University of Tennessee), Alexander Nolte (University of Tartu) |
| 17:01 | 2m | Welcome by the MSR Hackathon Co-Chairs | Hackathon | Jim Herbsleb Carnegie Mellon University, Audris Mockus The University of Tennessee, Alexander Nolte University of Tartu |
| 17:03 | 3m Talk | An Exploratory Study of Project Activity Changepoints in Open Source Software Evolution | Hackathon |
| 17:06 | 3m Paper | The Diversity-Innovation Paradox in Open-Source Software | Hackathon | Mengchen Sam Yong Carnegie Mellon University, Pittsburgh, Pennsylvania, United States, Lavinia Francesca Paganini Federal University of Pernambuco, Huilian Sophie Qiu Carnegie Mellon University, Pittsburgh, Pennsylvania, United States, José Bayoán Santiago Calderón University of Virginia, USA |
| 17:09 | 4m Talk | The Secret Life of Hackathon Code | Technical Papers | Ahmed Samir Imam Mahmoud University of Tartu, Tapajit Dey Lero - The Irish Software Research Centre and University of Limerick, Alexander Nolte University of Tartu, Audris Mockus The University of Tennessee, Jim Herbsleb Carnegie Mellon University |
| 17:13 | 3m Talk | Tracing Vulnerable Code Lineage | Hackathon | David Reid University of Tennessee, Kalvin Eng University of Alberta, Chris Bogart Carnegie Mellon University, Adam Tutko University of Tennessee - Knoxville |
| 17:16 | 3m Talk | Building the Collaboration Graph of Open-Source Software Ecosystem | Hackathon |
| 17:19 | 1m Talk | The Secret Life of Hackathon Code | Hackathon | Ahmed Samir Imam Mahmoud University of Tartu, Tapajit Dey Lero - The Irish Software Research Centre and University of Limerick |
| 17:20 | 30m Live Q&A | Discussions and Q&A | Technical Papers |
| 17:00 - 17:50 | |
| 17:01 | 4m Talk | What Code Is Deliberately Excluded from Test Coverage and Why? | Technical Papers | Andre Hora UFMG |
| 17:05 | 3m Talk | AndroR2: A Dataset of Manually-Reproduced Bug Reports for Android apps | Data Showcase | Tyler Wendland University of Minnesota, Jingyang Sun University of British Columbia, Junayed Mahmud George Mason University, S M Hasan Mansur George Mason University, Steven Huang University of British Columbia, Kevin Moran George Mason University, Julia Rubin University of British Columbia, Canada, Mattia Fazzini University of Minnesota |
| 17:08 | 3m Talk | Apache Software Foundation Incubator Project Sustainability Dataset | Data Showcase | Likang Yin University of California, Davis, Zhiyuan Zhang University of California, Davis, Qi Xuan Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China, Vladimir Filkov University of California at Davis, USA |
| 17:11 | 4m Talk | Leveraging Models to Reduce Test Cases in Software Repositories | Technical Papers |
| 17:15 | 4m Talk | Which contributions count? Analysis of attribution in open source | Technical Papers | Jean-Gabriel Young University of Vermont, amanda casari Open Source Programs Office, Google, Katie McLaughlin Open Source Programs Office, Google, Milo Trujillo University of Vermont, Laurent Hébert-Dufresne University of Vermont, James P. Bagrow University of Vermont |
| 17:19 | 4m Talk | On Improving Deep Learning Trace Analysis with System Call Arguments | Technical Papers | Quentin Fournier Polytechnique Montréal, Daniel Aloise Polytechnique Montréal, Seyed Vahid Azhari Ciena, François Tetreault Ciena |
| 17:23 | 27m Live Q&A | Discussions and Q&A | Technical Papers |
| 17:50 - 18:10 | Break / Discussion Rooms | MSR Room 1 |
| 18:10 - 19:00 | |
| 18:10 | 50m Tutorial | Crafting your next MSR paper: suggestions from my (good and bad) experiences | Tutorials | Massimiliano Di Penta University of Sannio, Italy |
Wed 19 May
| 02:00 - 02:50 | |
| 02:01 | 4m Talk | Automatic Part-of-Speech Tagging for Security Vulnerability Descriptions | Technical Papers | Sofonias Yitagesu Tianjin University, Xiaowang Zhang Tianjin University, Zhiyong Feng Tianjin University, Xiaohong Li Tianjin University, Zhenchang Xing Australian National University |
| 02:05 | 4m Talk | Attention-based model for predicting question relatedness on Stack Overflow | Technical Papers | Jiayan Pei South China University of Technology, Yimin Wu South China University of Technology, Research Institute of SCUT in Yangjiang, Zishan Qin South China University of Technology, Yao Cong South China University of Technology, Jingtao Guan Research Institute of SCUT in Yangjiang |
| 02:09 | 4m Talk | Characterising the Knowledge about Primitive Variables in Java Code Comments | Technical Papers | Mahfouth Alghamdi The University of Adelaide, Shinpei Hayashi Tokyo Institute of Technology, Takashi Kobayashi Tokyo Institute of Technology, Christoph Treude University of Adelaide |
| 02:13 | 4m Talk | Googling for Software Development: What Developers Search For and What They Find | Technical Papers | Andre Hora UFMG |
| 02:17 | 3m Talk | Evaluating Pre-Trained Models for User Feedback Analysis in Software Engineering: A Study on Classification of App-Reviews | Registered Reports | Mohammad Abdul Hadi University of British Columbia, Fatemeh Hendijani Fard University of British Columbia |
| 02:20 | 3m Talk | Cross-status Communication and Project Outcomes in OSS Development–A Language Style Matching Perspective | Registered Reports | Yisi Han Nanjing University, Zhendong Wang University of California, Irvine, Yang Feng State Key Laboratory for Novel Software Technology, Nanjing University, Zhihong Zhao Nanjing Tech University, Yi Wang Beijing University of Posts and Telecommunications |
| 02:23 | 27m Live Q&A | Discussions and Q&A | Technical Papers |
| 02:50 - 03:10 | Break / Discussion Rooms | MSR Room 1 |
| 03:10 - 04:00 | |
| 03:10 | 50m Tutorial | Elasticsearch Full-Text Search Internals | Tutorials | Philipp Krenn Elastic |
| 10:00 - 10:50 | Datasets | Data Showcase / Technical Papers at MSR Room 1 | Chair(s): Sridhar Chimalakonda (Indian Institute of Technology Tirupati) |
| 10:01 | 3m Talk | AndroidCompass: A Dataset of Android Compatibility Checks in Code Repositories | Data Showcase | Sebastian Nielebock Otto-von-Guericke University Magdeburg, Germany, Paul Blockhaus Otto-von-Guericke-University Magdeburg, Germany, Jacob Krüger Otto von Guericke University Magdeburg, Frank Ortmeier Otto-von-Guericke-University Magdeburg, Faculty of Computer Science, Chair of Software Engineering |
| 10:04 | 3m Talk | GE526: A Dataset of Open Source Game Engines | Data Showcase | Dheeraj Vagavolu Indian Institute of Technology Tirupati, Vartika Agrahari Indian Institute of Technology Tirupati, Sridhar Chimalakonda Indian Institute of Technology Tirupati, Akhila Sri Manasa Venigalla IIT Tirupati, India |
| 10:07 | 3m Talk | Andromeda: A Dataset of Ansible Galaxy Roles and Their Evolution | Data Showcase | Ruben Opdebeeck Vrije Universiteit Brussel, Ahmed Zerouali Vrije Universiteit Brussel, Coen De Roover Vrije Universiteit Brussel |
| 10:10 | 3m Talk | The Wonderless Dataset for Serverless Computing | Data Showcase |
| 10:13 | 3m Talk | DUETS: A Dataset of Reproducible Pairs of Java Library-Clients | Data Showcase | Thomas Durieux KTH Royal Institute of Technology, Sweden, César Soto-Valero KTH Royal Institute of Technology, Benoit Baudry KTH Royal Institute of Technology |
| 10:16 | 3m Talk | EQBENCH: A Dataset of Equivalent and Non-equivalent Program Pairs | Data Showcase | Sahar Badihi University of British Columbia, Canada, Yi Li Nanyang Technological University, Julia Rubin University of British Columbia, Canada |
| 10:19 | 31m Live Q&A | Discussions and Q&A | Technical Papers |
| 10:50 - 11:10 | Break / Discussion Rooms | MSR Room 1 |
| 11:10 - 12:00 | Mini-Keynotes | Keynotes / Technical Papers at MSR Room 1 | Chair(s): Kelly Blincoe (University of Auckland) |
| 11:10 | 12m Keynote | Code review at speed: How can we use data to help developers do code review faster? | Keynotes | Patanamon Thongtanunam The University of Melbourne |
| 11:22 | 12m Keynote | To Sustain a Smart, Dependent and Social Software Ecosystem | Keynotes | Raula Gaikovina Kula NAIST |
| 11:34 | 8m Keynote | Measure what matters – but don’t be creepy: The ethics of using data about people | Keynotes |
| 11:42 | 18m | Discussions and Q&A | Keynotes |
| 17:00 - 17:50 | Energy, logging, and APIs | Technical Papers at MSR Room 1 | Chair(s): Akond Rahman (Tennessee Tech University) |
| 17:01 | 3m Talk | S3M: Siamese Stack (Trace) Similarity Measure | Technical Papers | Aleksandr Khvorov JetBrains, ITMO University, Roman Vasiliev JetBrains, George Chernishev Saint-Petersburg State University, Irving Muller Rodrigues Polytechnique Montreal, Montreal, Canada, Dmitrij Koznov Saint-Petersburg State University, Nikita Povarov JetBrains |
| 17:04 | 4m Talk | Mining the ROS ecosystem for Green Architectural Tactics in Robotics and an Empirical Evaluation | Technical Papers | Ivano Malavolta Vrije Universiteit Amsterdam, Katerina Chinnappan Vrije Universiteit Amsterdam, Stan Swanborn Vrije Universiteit Amsterdam, The Netherlands, Grace Lewis Carnegie Mellon Software Engineering Institute, Patricia Lago Vrije Universiteit Amsterdam |
| 17:08 | 4m Talk | Mining Energy-Related Practices in Robotics Software | Technical Papers | Michel Albonico UTFPR, Ivano Malavolta Vrije Universiteit Amsterdam, Gustavo Pinto Federal University of Pará, Emitzá Guzmán Vrije Universiteit Amsterdam, Katerina Chinnappan Vrije Universiteit Amsterdam, Patricia Lago Vrije Universiteit Amsterdam |
| 17:12 | 3m Talk | Mining API Interactions to Analyze Software Revisions for the Evolution of Energy Consumption | Technical Papers | Andreas Schuler University of Applied Sciences Upper Austria, Gabriele Anderst-Kotsis Johannes Kepler University, Linz, Austria |
| 17:15 | 4m Talk | Can I Solve it? Identifying the APIs required to complete OSS tasks | Technical Papers | Fabio Marcos De Abreu Santos Northern Arizona University, USA, Igor Scaliante Wiese Federal University of Technology – Paraná - UTFPR, Bianca Trinkenreich Northern Arizona University, Igor Steinmacher Northern Arizona University, USA, Anita Sarma Oregon State University, Marco Gerosa Northern Arizona University, USA |
| 17:19 | 31m Live Q&A | Discussions and Q&A | Technical Papers |
| 17:00 - 17:50 | Change Management and Analysis | Technical Papers / Registered Reports at MSR Room 2 | Chair(s): Sarah Nadi (University of Alberta) |
| 17:01 | 4m Talk | Studying the Change Histories of Stack Overflow and GitHub Snippets | Technical Papers |
| 17:05 | 4m Talk | Learning Off-By-One Mistakes: An Empirical Study | Technical Papers | Hendrig Sellik Delft University of Technology, Onno van Paridon Adyen N.V., Georgios Gousios Facebook & Delft University of Technology, Maurício Aniche Delft University of Technology |
| 17:09 | 4m Talk | Predicting Design Impactful Changes in Modern Code Review: A Large-Scale Empirical Study | Technical Papers | Anderson Uchôa Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Caio Barbosa Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Daniel Coutinho Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Willian Oizumi Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Wesley Assunção Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Silvia Regina Vergilio Federal University of Paraná, Juliana Alves Pereira PUC-Rio, Anderson Oliveira PUC-Rio, Alessandro Garcia PUC-Rio |
| 17:13 | 4m Talk | Rollback Edit Inconsistencies in Developer Forum | Technical Papers | Saikat Mondal University of Saskatchewan, Gias Uddin University of Calgary, Canada, Chanchal K. Roy University of Saskatchewan |
| 17:17 | 3m Talk | Assessing the Exposure of Software Changes: The DiPiDi Approach | Registered Reports |
| 17:20 | 4m Talk | On the Use of Dependabot Security Pull Requests | Technical Papers | Mahmoud Alfadel Concordia University, Diego Elias Costa Concordia University, Canada, Emad Shihab Concordia University, Mouafak Mkhallalati Concordia University |
| 17:24 | 26m Live Q&A | Discussions and Q&A | Technical Papers |
| 17:50 - 18:10 | Break / Discussion Rooms | MSR Room 1 |
| 18:10 - 19:00 | Closing | MSR Room 1 |
| 18:11 | 23m Awards | MIP Award 2011 | MIP Award |
| 18:34 | 15m Live Q&A | Discussions and Q&A | Technical Papers |
| 18:49 | 8m | MSR 2022 | Technical Papers |
| 18:57 | 3m | Closing by the General Chair and the Program Co-Chair | Technical Papers |
Call for Mining Challenge Papers
This year, the mining challenge is about ManySStuBs4J, a dataset of fixes to simple bugs in Java. The dataset focuses on “SStuBs”, i.e., simple stupid bugs: bugs that occur in a single statement and whose fix is confined to that statement.
The dataset was collected to facilitate empirical studies of such bugs and of related program repair techniques. Where possible, the included fixes are classified into one of 16 syntactic templates, such as accidentally swapped method arguments, incorrect operator usage, or wrong variable usage. For each bug in the challenge dataset, we include its provenance (GitHub project, commit SHA), the diff between the buggy and fixed versions of the file, and an annotation of which (if any) of the 16 SStuBs patterns it matches.
In this challenge, participants can use either of two variants of the dataset. The small version contains 25,539 SStuB changes mined from 100 Java projects on GitHub that use Maven; these projects can be built and tested (if they contain tests) in an automated fashion. The large version contains 153,652 SStuBs mined from 1,000 popular open-source Java projects on GitHub; not all of these projects use Maven, so it may not be feasible to build and test them.
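Because a SStuB and its fix are confined to one statement, the diff of a typical record should show one removed line and one added line. The Python sketch below is a heuristic of our own, not part of the dataset's tooling: it assumes the diff is available as a plain unified-diff string, splits it into removed and added lines, and checks for that one-for-one shape. A statement that spans several lines will fail the check even though it is still a single-statement fix.

```python
from __future__ import annotations


def split_patch(patch: str) -> tuple[list[str], list[str]]:
    """Split a unified diff into (removed, added) source lines, without the +/- prefix."""
    removed: list[str] = []
    added: list[str] = []
    for line in patch.splitlines():
        # Skip file headers ("--- a/...", "+++ b/...") and hunk headers ("@@ ... @@").
        # Caveat: this also skips removed/added lines whose code itself starts with "--" or "++".
        if line.startswith(("---", "+++", "@@")):
            continue
        if line.startswith("-"):
            removed.append(line[1:].strip())
        elif line.startswith("+"):
            added.append(line[1:].strip())
        # Context lines (leading space) are ignored.
    return removed, added


def is_one_line_change(patch: str) -> bool:
    """Heuristic check: exactly one line removed and exactly one line added."""
    removed, added = split_patch(patch)
    return len(removed) == 1 and len(added) == 1


if __name__ == "__main__":
    example = (
        "@@ -10,3 +10,3 @@\n"
        " int a = 1;\n"
        "-if (x > y) {\n"
        "+if (x >= y) {\n"
        " return a;\n"
    )
    print(split_patch(example))  # (['if (x > y) {'], ['if (x >= y) {'])
    print(is_one_line_change(example))  # True
```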
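Either variant can be explored with a few lines of Python. The sketch below assumes that a dataset file holds a single JSON array of bug records and that the pattern annotation lives in a field named `bugType`; both the file layout and the field name are assumptions to check against the documentation page before use. It counts how many records match each pattern.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Iterable

# Assumed name of the field holding the pattern annotation; verify it against
# the ManySStuBs4J documentation before relying on it.
PATTERN_FIELD = "bugType"


def load_records(path: str) -> list[dict]:
    """Load a dataset file assumed to contain a single JSON array of records."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def count_patterns(records: Iterable[dict], field: str = PATTERN_FIELD) -> Counter:
    """Count records per pattern label; records without a label count as 'UNLABELLED'."""
    return Counter(record.get(field) or "UNLABELLED" for record in records)
```

A typical call would be `count_patterns(load_records("sstubs.json")).most_common(5)`, where `sstubs.json` is a placeholder for whichever file of the small or large variant you downloaded.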
The challenge is open-ended: participants can choose the research questions that they find most interesting. Our suggestions include:
- Bug detection: What methods are most effective for locating SStuBs?
- Program repair: What methods are most effective for proposing repairs to SStuBs? Repair could be studied as a separate step or combined with detection.
- Why SStuBs occur: What context do SStuBs appear in? What are common root causes? Are there characteristics of the software project, the development team, or the individual source file that make SStuBs more likely to appear?
- What encourages fixing SStuBs: What factors characterize SStuBs that are more quickly found and fixed?
- Testing: How is testing related to SStuBs? Do projects with more or better unit tests have fewer or easier-to-fix SStuBs?
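For the testing question, one simple starting point is a rank correlation between a per-project measure of testing and the number of SStuBs per project. The sketch below computes Spearman's rho from scratch, as the Pearson correlation of average ranks so that ties are handled. Which testing measure to use, and whether to normalise SStuB counts by project size, are choices left to participants; a correlation alone would not show that tests prevent SStuBs.

```python
from __future__ import annotations

from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)
```

The two input lists must be aligned by project, e.g. `spearman(tests_per_project, sstubs_per_project)`, where both names are hypothetical lists you would build yourself.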
These are just some of the questions that could be answered using the ManySStuBs4J dataset. Participants may combine the SStuBs data with other code or metadata: for example, data about project popularity or contributor experience. We will not provide such data, but participants are encouraged to “bring their own data” (BYOD) by joining SStuBs data with data from other public, readily available, sources such as GHTorrent or GitHub. We ask the participants to carefully consider any ethical implications that stem from using other sources of data, such as the use of personally identifiable information.
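Joining SStuBs with outside data usually happens at the project level. The sketch below aggregates records per project and attaches project-level attributes from a mapping that participants build themselves from another source, for example repository metadata from GitHub. The project field name `projectName` and the shape of the metadata mapping are assumptions, and project identifiers in the two sources may need normalising before they match. Keeping the join at the project level also helps avoid pulling in personally identifiable information about contributors.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Assumed name of the project field in each SStuB record; check the documentation.
PROJECT_FIELD = "projectName"


def join_with_metadata(
    records: Iterable[dict],
    metadata: dict[str, dict],
    field: str = PROJECT_FIELD,
) -> tuple[list[dict], list[str]]:
    """Attach project-level attributes to per-project SStuB counts.

    `metadata` maps a project identifier to a dict of attributes gathered
    from an external source (hypothetical here). Returns the joined rows,
    sorted by project, and the projects for which no metadata was found.
    """
    counts = Counter(record[field] for record in records if field in record)
    joined: list[dict] = []
    missing: list[str] = []
    for project, n_sstubs in sorted(counts.items()):
        extra = metadata.get(project)
        if extra is None:
            missing.append(project)  # report unmatched projects rather than silently drop them
            continue
        joined.append({"project": project, "sstubs": n_sstubs, **extra})
    return joined, missing
```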
How to Participate in the Challenge
First, familiarize yourself with the ManySStuBs4J dataset:
- Read the MSR 2020 paper about ManySStuBs4J.
- Study the download page of ManySStuBs4J, which includes the most recent version and links to download the dataset as well as the documentation page.
- Create a new issue if you have problems with the dataset or want to suggest ideas for improvements.
Finally, use the dataset to answer your research questions, report your findings in a four-page challenge paper (see information below), submit your abstract before January 19, 2021, and your final paper before January 26, 2021. If your paper is accepted, present your results at MSR 2021 in Madrid, Spain!
Join us on Slack for informal discussions among participants at http://msr2021challenge.slack.com/
Submission
A challenge paper should describe the results of your work by providing an introduction to the problem you address and why it is worth studying, the version of the dataset you used, the approach and tools you used, your results and their implications, and conclusions. Make sure your report highlights the contributions and the importance of your work. See also our open science policy regarding the publication of software and additional data you used for the challenge.
Challenge papers must not exceed 4 pages, plus 1 additional page containing only references, and must conform to the MSR 2021 format and submission guidelines. Each submission will be reviewed by at least three members of the program committee. Submissions should follow the IEEE Conference Proceedings Formatting Guidelines, with the title in 24pt font and the full text in 10pt type. LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without the compsoc or compsocconf option.
IMPORTANT: The mining challenge track of MSR 2021 follows the double-blind submission model. Submissions should not reveal the identity of the authors in any way. This means that authors should:
- leave out author names and affiliations from the body and metadata of the submitted pdf
- ensure that any citations to related work by themselves are written in the third person, for example “the prior work of XYZ” as opposed to “our prior work [2]”
- not refer to their personal, lab or university website; similarly, care should be taken with personal accounts on GitHub, BitBucket, Google Drive, etc.
- not upload unblinded versions of their paper to archival websites during bidding and reviewing; however, uploading unblinded versions prior to submission is allowed and sometimes unavoidable (e.g., a thesis)
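Before submitting, it can help to scan the LaTeX source for first-person self-citations of the kind the guidelines ask authors to avoid. The regular expression below is a rough, hypothetical self-check that flags phrases such as “our prior work”; it does not detect every way a submission can reveal its authors, so a careful manual read remains necessary.

```python
from __future__ import annotations

import re

# Hypothetical pattern for first-person self-citations ("our prior work",
# "our previous studies", ...). Extend it with your own phrasing; it is a
# rough aid, not a complete anonymity check.
SELF_CITATION = re.compile(
    r"\bour\s+(?:prior|previous|earlier|own)\s+(?:works?|papers?|stud(?:y|ies)|tools?)\b",
    re.IGNORECASE,
)


def flag_self_citations(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, stripped line) for each line matching SELF_CITATION."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if SELF_CITATION.search(line)
    ]
```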
Authors having further questions on double blind reviewing are encouraged to contact the Mining Challenge Chairs via email.
Papers must be submitted electronically through HotCRP, should not have been published elsewhere, and should not be under review or submitted for review elsewhere for the duration of consideration. ACM plagiarism policy and procedures shall be followed for cases of double submission. The submission must also comply with the IEEE Policy on Authorship.
Upon notification of acceptance, all authors of accepted papers will be asked to complete a copyright form and will receive further instructions for preparing their camera ready versions. At least one author of each accepted paper is expected to register and present the results at MSR 2021. All accepted contributions will be published in the electronic conference proceedings.
This year’s mining challenge can be cited as:
title={MSR Mining Challenge: The Life of Simple, Stupid Bugs (SStubS)},
author={Karampatsis, Rafael-Michael and Allamanis, Miltiadis and Sutton, Charles},
year={2021},
booktitle={Proceedings of the International Conference on Mining Software Repositories (MSR 2021)},
}
The dataset itself can be cited as:
@inproceedings{sstubs,
title={How Often Do Single-Statement Bugs Occur? The ManySStuBs4J Dataset},
author={Karampatsis, Rafael-Michael and Sutton, Charles},
year={2020},
booktitle={Proceedings of the International Conference on Mining Software Repositories (MSR 2020)},
preprint={https://arxiv.org/abs/1905.13334}
}
Open Science Policy
Openness in science is key to fostering progress via transparency, reproducibility and replicability. Our steering principle is that all research output should be accessible to the public and that empirical studies should be reproducible. In particular, we actively support the adoption of open data and open source principles. To increase reproducibility and replicability, we encourage all contributing authors to disclose:
- the source code of the software they used to retrieve and analyze the data
- the (anonymized and curated) empirical data they retrieved in addition to the ManySStuBs4J dataset
- a document with instructions for other researchers describing how to reproduce or replicate the results
Already upon submission, authors can privately share their anonymized data and software on archives such as Zenodo or Figshare. Zenodo accepts up to 50 GB per dataset (more upon request). There is no need to use Dropbox or Google Drive. After acceptance, data and software should be made public so that they receive a DOI and become citable. Zenodo and Figshare accounts can easily be linked with GitHub repositories to automatically archive software releases. In the unlikely case that authors need to upload terabytes of data, Archive.org may be used.
We recognise that anonymising artifacts such as source code is more difficult than preserving anonymity in a paper. We ask authors to take a best-effort approach to avoid revealing their identities. We will also ask reviewers to avoid trying to identify authors by looking at commit histories and other information that is not easily anonymised. Authors wanting to share GitHub repositories may want to look into https://anonymous.4open.science/, an open source tool that helps quickly anonymise a repository for double-blind review.
We encourage authors to self-archive pre- and postprints of their papers in open, preserved repositories such as arXiv.org. This is legal and allowed by all major publishers including ACM and IEEE and it lets anybody in the world reach your paper. Note that you are usually not allowed to self-archive the PDF of the published article (that is, the publisher proof or the Digital Library version).
Please note that the success of the open science initiative depends on the willingness (and possibilities) of authors to disclose their data and that all submissions will undergo the same review process independent of whether or not they disclose their analysis code or data. We encourage authors who cannot disclose industrial or otherwise non-public data, for instance due to non-disclosure agreements, to provide an explicit (short) statement in the paper.
Best Mining Challenge Paper Award
As mentioned above, all submissions will undergo the same review process independent of whether or not they disclose their analysis code or data. However, only accepted papers for which code and data are available on preserved archives, as described in the open science policy, will be considered by the program committee for the best mining challenge paper award.
Best Student Presentation Award
As in previous years, there will be a public vote during the conference to select the best mining challenge presentation. This award often goes to authors of compelling work who present an engaging story to the audience. Only students can compete for this award.
Call for Mining Challenge Proposals
One of the secret ingredients behind the success of the International Conference on Mining Software Repositories (MSR) is its annual Mining Challenge, in which MSR participants can showcase their techniques, tools and creativity on a common data set. In true MSR fashion, this data set is a “real” data set contributed by researchers in the community, solicited through an open call. There are many benefits to sharing a data set for the MSR Mining Challenge. The selected challenge proposal explaining the data set will appear in the MSR 2021 proceedings, and the challenge papers using the data set will be required to cite either the challenge proposal or an existing paper by its authors describing the selected data set. Furthermore, the authors of the data set will become the official 2021 Mining Challenge Chairs, responsible for the reviewing process (e.g., composing a Challenge PC, managing the submissions and review assignments, etc.). Finally, it is not uncommon for challenge data sets to feature in MSR and other publications well after the conference is finished! If you would like to compete for a chance to have your data set featured in the 2021 MSR Mining Challenge and all the benefits that come with it, please submit a 1-page proposal with up to 3 pages of appendix at https://msr2021.hotcrp.com/, containing the following information:
- Title of data set.
- What does your data set contain?
- How large is it?
- How accessible is it and how can the data be obtained?
- How representative is it?
- Does it require specialized tools to mine it?
- What would challenge participants need to work with the data set?
- What kind of questions do you expect challenge participants to answer?
- A link to a small sample of the data (e.g., via Dropbox, GitHub, etc.).
Proposals must conform to the IEEE Conference Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type; LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without including the compsoc or compsocconf options). For more information, see https://www.ieee.org/conferences/publishing/templates.html. Once the winning proposal is selected, its authors will become the MSR 2021 Challenge Chairs and will be responsible for choosing the MSR 2021 Mining Challenge program committee. The main deadline for this is September 15th, 2020, at which time the Challenge CFP along with the PC will be announced, and the full challenge data set will be publicly released. By making the challenge data set available by early fall, we hope that many student teams will be able to use the challenge data set for their graduate class projects.
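For LaTeX users, a proposal skeleton might look like the sketch below. It is a hypothetical outline, not an official template: the section names simply mirror the information requested above, and the title, author, and affiliation are placeholders. It uses the required `10pt,conference` options and omits `compsoc`/`compsocconf`.

```latex
% Hypothetical skeleton for a 1-page Mining Challenge proposal
% (plus up to 3 pages of appendix). Section names are illustrative only.
\documentclass[10pt,conference]{IEEEtran}
\begin{document}

\title{Title of Data Set}  % placeholder
\author{\IEEEauthorblockN{Author Name}
\IEEEauthorblockA{Affiliation}}
\maketitle

\section{Contents and Size}
% What does the data set contain? How large is it?

\section{Access and Tooling}
% How can the data be obtained? Does it require specialized tools?
% Link to a small sample of the data (e.g., Dropbox, GitHub).

\section{Representativeness}
% How representative is the data set?

\section{Expected Challenge Questions}
% What would participants need, and which questions could they answer?

\appendix
% Up to 3 pages of additional material.

\end{document}
```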
Timeline:
- Deadline for proposals: August 14th, 2020
- Proposal accepted and all submitters notified: August 21st, 2020
- Challenge CFP: September 16th, 2020
- Challenge PC formed: September 16th, 2020
- Challenge data made available: September 14th, 2020
- Challenge papers deadline: February 19th, 2021 (tentative)