Evaluating Pre-Trained Models for User Feedback Analysis in Software Engineering: A Study on Classification of App-Reviews
Context: Mobile app reviews written by users on app stores or social media are a significant resource for app developers. Analyzing app reviews has proved useful in many areas of software engineering (e.g., requirements engineering, testing). Automatic classification of app reviews requires extensive effort to manually curate a labeled dataset. When the classification purpose changes (e.g., identifying bugs versus usability issues or sentiment), new datasets must be labeled, which limits the extensibility of the developed models to new desired classes/tasks in practice. Recent pre-trained neural language models (PTMs) are trained on large corpora in an unsupervised manner and have found success in solving similar Natural Language Processing problems. However, the applicability of PTMs has not been explored for app review classification.
Objective: We investigate the benefits of PTMs for app review classification compared to existing models, as well as the transferability of PTMs across multiple settings.
Method: We empirically study the accuracy and time efficiency of PTMs compared to prior approaches, using six datasets from the literature. In addition, we investigate the performance of PTMs trained on app reviews (i.e., domain-specific PTMs). We set up different studies to evaluate PTMs in multiple settings: binary vs. multi-class classification, zero-shot classification (when new labels are introduced to the model), multi-task learning, and classification of reviews from different sources. The datasets are manually labeled app review datasets from the Google Play Store, the Apple App Store, and Twitter. In all cases, we will report Micro and Macro Precision, Recall, and F1-scores, along with the time required for training and prediction with each model.
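The micro- and macro-averaged metrics named above can be computed, for example, with scikit-learn's `precision_recall_fscore_support`; this is a minimal sketch of the evaluation step, and the review classes shown are illustrative placeholders, not the actual labels used in the study's datasets:

```python
from sklearn.metrics import precision_recall_fscore_support

# Illustrative ground-truth and predicted labels for a multi-class
# app-review classification task (placeholder classes, not the
# study's actual label set).
y_true = ["bug", "feature", "bug", "rating", "feature", "bug"]
y_pred = ["bug", "bug", "bug", "rating", "feature", "feature"]

# Micro averaging pools all individual decisions before computing the
# metric; macro averaging computes the metric per class and then takes
# the unweighted mean, so rare classes count as much as frequent ones.
micro = precision_recall_fscore_support(y_true, y_pred, average="micro")
macro = precision_recall_fscore_support(y_true, y_pred, average="macro")

print("micro P/R/F1:", micro[:3])  # 4 of 6 reviews correct -> 0.667 each
print("macro P/R/F1:", macro[:3])  # mean of per-class scores -> 0.722 each
```

Reporting both averages matters for app-review data because class distributions are typically skewed: micro scores are dominated by the frequent classes, while macro scores reveal how a model handles the rare ones.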
Wed 19 May (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)

02:00 - 02:50

02:01 (4m Talk) Automatic Part-of-Speech Tagging for Security Vulnerability Descriptions. Technical Papers.
Sofonias Yitagesu (Tianjin University), Xiaowang Zhang (Tianjin University), Zhiyong Feng (Tianjin University), Xiaohong Li (Tianjin University), Zhenchang Xing (Australian National University). Pre-print.

02:05 (4m Talk) Attention-based model for predicting question relatedness on Stack Overflow. Technical Papers.
Jiayan Pei (South China University of Technology), Yimin Wu (South China University of Technology; Research Institute of SCUT in Yangjiang), Zishan Qin (South China University of Technology), Yao Cong (South China University of Technology), Jingtao Guan (Research Institute of SCUT in Yangjiang). Pre-print.

02:09 (4m Talk) Characterising the Knowledge about Primitive Variables in Java Code Comments. Technical Papers.
Mahfouth Alghamdi (The University of Adelaide), Shinpei Hayashi (Tokyo Institute of Technology), Takashi Kobayashi (Tokyo Institute of Technology), Christoph Treude (University of Adelaide). Pre-print.

02:13 (4m Talk) Googling for Software Development: What Developers Search For and What They Find. Technical Papers.
Andre Hora (UFMG). Pre-print. Media attached.

02:17 (3m Talk) Evaluating Pre-Trained Models for User Feedback Analysis in Software Engineering: A Study on Classification of App-Reviews. Registered Reports.
Mohammad Abdul Hadi (University of British Columbia), Fatemeh Hendijani Fard (University of British Columbia). Pre-print.

02:20 (3m Talk) Cross-status Communication and Project Outcomes in OSS Development: A Language Style Matching Perspective. Registered Reports.
Yisi Han (Nanjing University), Zhendong Wang (University of California, Irvine), Yang Feng (State Key Laboratory for Novel Software Technology, Nanjing University), Zhihong Zhao (Nanjing Tech University), Yi Wang (Beijing University of Posts and Telecommunications). Pre-print.

02:23 (27m Live Q&A) Discussions and Q&A. Technical Papers.