Characterising the Knowledge about Primitive Variables in Java Code Comments (MSR 2021 - Technical Papers)

Who

Mahfouth Alghamdi, Shinpei Hayashi, Takashi Kobayashi, Christoph Treude

Track

MSR 2021 Technical Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 19 May 2021 02:09 - 02:13 at MSR Room 2, NLP session. Chair(s): Chunyang Chen

Abstract

Primitive types are fundamental components available in any programming language and serve as the building blocks of data manipulation. Understanding the role of these types in source code is essential for writing software. The most convenient way to express the functionality of these variables is to describe them in code comments. Little work has examined how often these variables are documented in code comments and what types of knowledge the comments provide about variables of primitive types. In this paper, we present an approach for detecting primitive variables and their descriptions in comments using lexical matching and semantic matching. We evaluate both approaches by comparing their recall, precision, and F-score against 600 manually annotated variables from a sample of GitHub projects. In terms of F-score, the semantic approach outperformed lexical matching (0.986 vs. 0.942). We then create a taxonomy of the types of knowledge contained in these comments about variables of primitive types. Our study shows that developers usually documented the purpose (69.16%) and concept (72.75%) of variables of numeric data types, and did so more often than for variables of type String (61.14% for purpose, 55.46% for concept). Our findings characterise the current state of practice in documenting primitive variables and point to areas that are often poorly documented, such as the meaning of boolean variables or the purpose of fields and local variables.
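The abstract names lexical matching and the F-score without showing either. As a rough, hypothetical illustration (not the authors' implementation, which also includes a semantic matcher), the Python sketch below finds simple declarations of Java primitive variables (plus String, which the study also covers) with a regular expression, treats a comment as lexically matching a variable when it contains the identifier or all of its camelCase/snake_case words, and computes the F-score as the usual F1, the harmonic mean of precision and recall. The function names (`find_declarations`, `split_identifier`, `lexically_matches`, `f_score`) and the matching rules are assumptions. A regex misses array and generic declarations and can fire on text inside comments or strings, so a practical tool would use a Java parser.

```python
import re

# Hypothetical sketch for illustration only; NOT the paper's implementation.
# Java's eight primitive types, plus String, which the study also examines.
TYPES = ("byte", "short", "int", "long", "float", "double", "char", "boolean", "String")

# A type keyword followed by an identifier, e.g. "int maxRetryCount".
# Assumption: arrays and generics are not handled, and only the first
# name in "int a, b;" is captured.
DECL_RE = re.compile(r"\b(" + "|".join(TYPES) + r")\s+([A-Za-z_$][\w$]*)")


def find_declarations(java_source: str) -> list[tuple[str, str]]:
    """Return (type, identifier) pairs for simple declarations."""
    return DECL_RE.findall(java_source)


def split_identifier(name: str) -> list[str]:
    """Split a camelCase or snake_case identifier into lower-case words."""
    # Acronym run (HTTP), capitalised or lower-case word, or digit run.
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [w.lower() for w in words]


def lexically_matches(name: str, comment: str) -> bool:
    """True if the comment contains the identifier or every word of it."""
    comment_words = set(re.findall(r"[a-z0-9]+", comment.lower()))
    if name.lower() in comment_words:
        return True
    words = split_identifier(name)
    # An identifier with no words (e.g. "_") is never considered matched.
    return bool(words) and all(w in comment_words for w in words)


def f_score(tp: int, fp: int, fn: int) -> float:
    """F1: harmonic mean of precision and recall (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `lexically_matches("maxRetryCount", "max retry count before giving up")` returns `True`, whereas a comment reading "maximum number of retries" does not match lexically. Abbreviations and paraphrases of this kind are where a semantic matcher can help.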

Link to Preprint

https://arxiv.org/abs/2103.12291

Mahfouth Alghamdi

The University of Adelaide

Australia

Shinpei Hayashi

Tokyo Institute of Technology

Japan

Takashi Kobayashi

Tokyo Institute of Technology

Japan

Christoph Treude

University of Adelaide

Australia

Session Program

Wed 19 May
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

02:00 - 02:50 NLP (Registered Reports / Technical Papers) at MSR Room 2. Chair(s): Chunyang Chen (Monash University)

02:01 4m Talk: Automatic Part-of-Speech Tagging for Security Vulnerability Descriptions (Technical Papers). Sofonias Yitagesu (Tianjin University), Xiaowang Zhang (Tianjin University), Zhiyong Feng (Tianjin University), Xiaohong Li (Tianjin University), Zhenchang Xing (Australian National University). Pre-print available.
02:05 4m Talk: Attention-based model for predicting question relatedness on Stack Overflow (Technical Papers). Jiayan Pei (South China University of Technology), Yimin Wu (South China University of Technology; Research Institute of SCUT in Yangjiang), Zishan Qin (South China University of Technology), Yao Cong (South China University of Technology), Jingtao Guan (Research Institute of SCUT in Yangjiang). Pre-print available.
02:09 4m Talk: Characterising the Knowledge about Primitive Variables in Java Code Comments (Technical Papers). Mahfouth Alghamdi (The University of Adelaide), Shinpei Hayashi (Tokyo Institute of Technology), Takashi Kobayashi (Tokyo Institute of Technology), Christoph Treude (University of Adelaide). Pre-print available.
02:13 4m Talk: Googling for Software Development: What Developers Search For and What They Find (Technical Papers). Andre Hora (UFMG). Pre-print available.
02:17 3m Talk: Evaluating Pre-Trained Models for User Feedback Analysis in Software Engineering: A Study on Classification of App-Reviews (Registered Reports). Mohammad Abdul Hadi (University of British Columbia), Fatemeh Hendijani Fard (University of British Columbia). Pre-print available.
02:20 3m Talk: Cross-status Communication and Project Outcomes in OSS Development–A Language Style Matching Perspective (Registered Reports). Yisi Han (Nanjing University), Zhendong Wang (University of California, Irvine), Yang Feng (State Key Laboratory for Novel Software Technology, Nanjing University), Zhihong Zhao (Nanjing Tech University), Yi Wang (Beijing University of Posts and Telecommunications). Pre-print available.
02:23 27m Live Q&A: Discussions and Q&A (Technical Papers)
