MSR 2021
Mon 17 - Wed 19 May 2021
co-located with ICSE 2021
Tue 18 May 2021 10:16 - 10:19 at MSR Room 2 - ML and Deep Learning Chair(s): Hongyu Zhang

Computational notebooks have become the tool of choice for many data scientists and practitioners for performing analyses and disseminating results. Despite their increasing popularity, the research community cannot yet count on a large, curated dataset of computational notebooks. In this paper, we fill this gap by introducing KGTorrent, a dataset of Python Jupyter notebooks with rich metadata retrieved from Kaggle, a platform hosting data science competitions for learners and practitioners at any level of expertise. We describe how we built KGTorrent and provide instructions on how to use it and how to refresh the collection to keep it up to date. Our vision is that the research community will use KGTorrent to study how data scientists, especially practitioners, use Jupyter Notebook in the wild and to identify potential shortcomings that can inform the design of its future extensions.

Tue 18 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:00 - 10:50
ML and Deep Learning (Technical Papers / Data Showcase / Registered Reports) at MSR Room 2
Chair(s): Hongyu Zhang The University of Newcastle
10:01
4m
Talk
Fast and Memory-Efficient Neural Code Completion
Technical Papers
Alexey Svyatkovskiy Microsoft, Sebastian Lee University of Oxford, Anna Hadjitofi Alan Turing Institute, Maik Riechert Microsoft Research, Juliana Vicente Franco Microsoft Research, Miltiadis Allamanis Microsoft Research, UK
Pre-print Media Attached
10:05
4m
Research paper
Comparative Study of Feature Reduction Techniques in Software Change Prediction
Technical Papers
Ruchika Malhotra Delhi Technological University, Ritvik Kapoor Delhi Technological University, Deepti Aggarwal Delhi Technological University, Priya Garg Delhi Technological University
Pre-print
10:09
4m
Talk
An Empirical Study on the Usage of BERT Models for Code Completion
Technical Papers
Matteo Ciniselli Università della Svizzera Italiana, Nathan Cooper William & Mary, Luca Pascarella Università della Svizzera italiana (USI), Denys Poshyvanyk College of William & Mary, Massimiliano Di Penta University of Sannio, Italy, Gabriele Bavota Software Institute, USI Università della Svizzera italiana
Pre-print
10:13
3m
Talk
ManyTypes4Py: A benchmark Python dataset for machine learning-based type inference
Data Showcase
Amir Mir Delft University of Technology, Evaldas Latoskinas Delft University of Technology, Georgios Gousios Facebook & Delft University of Technology
Pre-print
10:16
3m
Talk
KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle
Data Showcase
Luigi Quaranta University of Bari, Italy, Fabio Calefato University of Bari, Filippo Lanubile University of Bari
10:19
3m
Talk
Exploring the relationship between performance metrics and cost saving potential of defect prediction models
Registered Reports
Steffen Herbold University of Göttingen
Pre-print
10:22
28m
Live Q&A
Discussions and Q&A
Technical Papers


Information for Participants
Tue 18 May 2021 10:00 - 10:50 at MSR Room 2 - ML and Deep Learning Chair(s): Hongyu Zhang