An Exploratory Study of Log Placement Recommendation in an Enterprise System (MSR 2021 - Technical Papers)

Who

Jeanderson Cândido, Jan Haesen, Maurício Aniche, Arie van Deursen

Track

MSR 2021 Technical Papers

When

Mon 17 May 2021 10:08 - 10:12 at MSR Room 2 - Testing and code review Chair(s): Jürgen Cito

Abstract

Logging is a development practice that plays an important role in the operations and monitoring of complex systems. Developers place log statements in the source code and use log data to understand how the system behaves in production. Unfortunately, anticipating where to log during development is challenging. Previous studies show the feasibility of leveraging machine learning to recommend log placement despite the data imbalance that arises because logging code is only a small fraction of the overall code base. However, it remains unknown how those techniques apply to an industry setting, and little is known about the effect of imbalanced data and sampling techniques. In this paper, we study the log placement problem in the code base of Adyen, a large-scale payment company. We analyze 34,526 Java files and 309,527 methods that sum up to more than 2M SLOC. We systematically measure the effectiveness of five models based on code metrics, explore the effect of sampling techniques, understand which features the models consider relevant for the prediction, and evaluate whether we can exploit 388,086 methods from 29 Apache projects to learn where to log in an industry setting. Our best performing model achieves a balanced accuracy of 79%, a precision of 81%, and a recall of 60%. While sampling techniques improve recall, they penalize precision at a prohibitive cost. Experiments with open-source data yield under-performing models over Adyen’s test set; nevertheless, they are useful due to their low rate of false positives. Our supporting scripts and tools are available to the community.
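
For readers less familiar with the metrics named in the abstract, the following is a minimal sketch of how precision, recall, and balanced accuracy relate to a binary confusion matrix (method logged vs. not logged). It is not the authors' evaluation code. The class `ConfusionCounts` and all counts are hypothetical, chosen only to illustrate why plain accuracy is misleading when logged methods are a small minority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical confusion-matrix counts for a 'should this method log?' classifier."""
    tp: int  # logged methods correctly predicted as logged
    fp: int  # unlogged methods wrongly predicted as logged
    tn: int  # unlogged methods correctly predicted as unlogged
    fn: int  # logged methods wrongly predicted as unlogged


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of raising when a class has no samples or no predictions.
    return numerator / denominator if denominator else 0.0


def precision(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    return _ratio(c.tn, c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def balanced_accuracy(c: ConfusionCounts) -> float:
    # Mean of the per-class recalls, so the majority class cannot dominate.
    return (recall(c) + specificity(c)) / 2


# Hypothetical data: 1,000 methods, of which only 100 contain a log statement.
model = ConfusionCounts(tp=60, fp=15, tn=885, fn=40)
never_log = ConfusionCounts(tp=0, fp=0, tn=900, fn=100)  # trivial baseline

for name, counts in (("model", model), ("never_log", never_log)):
    print(
        f"{name}: accuracy={accuracy(counts):.3f} "
        f"balanced_accuracy={balanced_accuracy(counts):.3f} "
        f"precision={precision(counts):.3f} recall={recall(counts):.3f}"
    )
```

In this made-up example the trivial "never log" baseline still reaches 90% plain accuracy but only 50% balanced accuracy, which is one reason balanced accuracy is a common choice for imbalanced problems such as log placement.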

Link to Preprint

https://arxiv.org/abs/2103.01755

Jeanderson Cândido

Delft University of Technology

Netherlands

Jan Haesen

Adyen N.V.

Maurício Aniche

Delft University of Technology

Netherlands

Arie van Deursen

Delft University of Technology

Netherlands

Session Program

Mon 17 May
Displayed time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:00 - 10:50	Testing and code review (Technical Papers / Data Showcase / Registered Reports) at MSR Room 2. Chair(s): Jürgen Cito, TU Wien and Facebook

10:01 3m Talk	A Traceability Dataset for Open Source Systems (Data Showcase). Mouna Hammoudi, Johannes Kepler University Linz; Christoph Mayr-Dorn, Johannes Kepler University Linz; Atif Mashkoor, Johannes Kepler University Linz; Alexander Egyed, Johannes Kepler University Linz
10:04 4m Talk	How Java Programmers Test Exceptional Behavior (Technical Papers). Diego Marcilio, Università della Svizzera italiana (USI); Carlo A. Furia, Università della Svizzera italiana (USI)
10:08 4m Talk	An Exploratory Study of Log Placement Recommendation in an Enterprise System (Technical Papers). Jeanderson Cândido, Delft University of Technology; Jan Haesen, Adyen N.V.; Maurício Aniche, Delft University of Technology; Arie van Deursen, Delft University of Technology
10:12 3m Talk	Does Code Review Promote Conformance? A Study of OpenStack Patches (Technical Papers). Panyawut Sri-iesaranusorn, Nara Institute of Science and Technology; Raula Gaikovina Kula, Nara Institute of Science and Technology; Takashi Ishio, Nara Institute of Science and Technology
10:15 4m Talk	A Replication Study on the Usability of Code Vocabulary in Predicting Flaky Tests (Technical Papers). Guillaume Haben, University of Luxembourg; Sarra Habchi, University of Luxembourg; Mike Papadakis, University of Luxembourg; Maxime Cordy, University of Luxembourg; Yves Le Traon, University of Luxembourg
10:19 3m Talk	On the Use of Mutation in Injecting Test Order-Dependency (Registered Reports). Sarra Habchi, University of Luxembourg; Maxime Cordy, University of Luxembourg; Mike Papadakis, University of Luxembourg; Yves Le Traon, University of Luxembourg
10:22 28m Live Q&A	Discussions and Q&A (Technical Papers)

Information for Participants

Mon 17 May 2021 10:00 - 10:50 at MSR Room 2 - Testing and code review Chair(s): Jürgen Cito
