PSIMiner: A Tool for Mining Rich Abstract Syntax Trees from Code
The application of machine learning algorithms to source code has grown in recent years. Since these algorithms are quite sensitive to input data, it is not surprising that researchers experiment with input representations. Today, abstract syntax trees (ASTs) are a popular starting point for representing code.
Abstract syntax trees have long been used in various software engineering domains, and in IDEs in particular. The APIs of modern IDEs make it possible to manipulate ASTs, traverse them, resolve references between code elements, and so on. Such algorithms can enrich an AST with new data and may therefore be useful in ML-based code analysis.
In this work, we present PSIMiner, a tool for processing PSI (Program Structure Interface) trees from the IntelliJ Platform. A PSI tree contains the code's syntax tree as well as functions for working with it. We use our tool to infer the types of identifiers in Java ASTs and to extend the code2seq model for the method name prediction problem.
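To make the idea of a "rich" AST more concrete, here is a minimal sketch in Python. It does not use PSIMiner or the IntelliJ Platform API. The `Node` class, the `symbol_table` lookup (a stand-in for IDE reference resolution), and the `token:type` leaf label are all hypothetical, chosen only for illustration. The sketch annotates identifier leaves with types and then builds a path between two leaves through their lowest common ancestor, the kind of leaf-to-leaf path that code2seq-style models take as input.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Toy AST node; a hypothetical stand-in for a PSI element."""
    kind: str                            # syntactic node type, e.g. "MethodCall"
    token: Optional[str] = None          # identifier text (leaves only)
    resolved_type: Optional[str] = None  # filled in by the enrichment step
    children: List["Node"] = field(default_factory=list)


Path = Tuple[Node, ...]


def enrich_with_types(node: Node, symbol_table: Dict[str, str]) -> None:
    """Attach a type to every identifier found in symbol_table.

    The dictionary lookup is an assumption that replaces real type inference.
    """
    if node.kind == "Identifier" and node.token in symbol_table:
        node.resolved_type = symbol_table[node.token]
    for child in node.children:
        enrich_with_types(child, symbol_table)


def leaves_with_paths(node: Node, prefix: Path = ()) -> Iterator[Tuple[Node, Path]]:
    """Yield each leaf together with its root-to-leaf path."""
    path = prefix + (node,)
    if not node.children:
        yield node, path
    else:
        for child in node.children:
            yield from leaves_with_paths(child, path)


def path_context(a_path: Path, b_path: Path) -> List[str]:
    """Node kinds from leaf a up to the lowest common ancestor, then down to leaf b.

    Both paths must start at the same root and end at different leaves.
    """
    shared = 0
    while (shared < min(len(a_path), len(b_path))
           and a_path[shared] is b_path[shared]):
        shared += 1
    up = [n.kind for n in reversed(a_path[shared:])]
    ancestor = a_path[shared - 1].kind
    down = [n.kind for n in b_path[shared:]]
    return up + [ancestor] + down


def typed_label(leaf: Node) -> str:
    """Leaf label with the type appended when one was resolved (illustrative format)."""
    return f"{leaf.token}:{leaf.resolved_type}" if leaf.resolved_type else str(leaf.token)


def build_example() -> Node:
    """Toy tree roughly corresponding to `int count = items.size();`."""
    return Node("LocalVariable", children=[
        Node("Identifier", "count"),
        Node("MethodCall", children=[
            Node("Identifier", "items"),
            Node("Identifier", "size"),
        ]),
    ])


if __name__ == "__main__":
    root = build_example()
    enrich_with_types(root, {"count": "int", "items": "List"})
    leaves = list(leaves_with_paths(root))
    print([typed_label(leaf) for leaf, _ in leaves])
    print(path_context(leaves[0][1], leaves[1][1]))
```

In this toy example, the leaves `count` and `items` receive the labels `count:int` and `items:List`, and `size` stays untyped because it is not in the lookup table. How the inferred types are actually fed into the extended code2seq model is not specified in this abstract, so the label format here should be read as a placeholder.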
Mon 17 May
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
10:00 - 10:50 | Resources for MSR Research | Technical Papers / Data Showcase at MSR Room 1 | Chair(s): Felipe Ebert (Eindhoven University of Technology)
10:01 | 3m | Talk | PSIMiner: A Tool for Mining Rich Abstract Syntax Trees from Code | Technical Papers | Egor Spirin (JetBrains Research; National Research University Higher School of Economics), Egor Bogomolov (JetBrains Research), Vladimir Kovalenko (JetBrains Research), Timofey Bryksin (JetBrains Research, Saint Petersburg State University)
10:04 | 3m | Talk | Mining DEV for social and technical insights about software development | Technical Papers | Maria Papoutsoglou (Aristotle University of Thessaloniki), Johannes Wachs (Vienna University of Economics and Business & Complexity Science Hub Vienna), Georgia Kapitsaki (University of Cyprus)
10:07 | 3m | Talk | TNM: A Tool for Mining of Socio-Technical Data from Git Repositories | Technical Papers | Nikolai Sviridov (ITMO University), Mikhail Evtikhiev (JetBrains Research), Vladimir Kovalenko (JetBrains Research)
10:10 | 3m | Talk | Identifying Versions of Libraries used in Stack Overflow Code Snippets | Technical Papers | Ahmed Zerouali (Vrije Universiteit Brussel), Camilo Velázquez-Rodríguez (Vrije Universiteit Brussel), Coen De Roover (Vrije Universiteit Brussel)
10:13 | 3m | Talk | Sampling Projects in GitHub for MSR Studies | Data Showcase | Ozren Dabic (Software Institute, Università della Svizzera italiana (USI), Switzerland), Emad Aghajani (Software Institute, USI Università della Svizzera italiana), Gabriele Bavota (Software Institute, USI Università della Svizzera italiana)
10:16 | 3m | Talk | gambit – An Open Source Name Disambiguation Tool for Version Control Systems | Technical Papers | Christoph Gote (Chair of Systems Design, ETH Zurich), Christian Zingg (Chair of Systems Design, ETH Zurich)
10:19 | 31m | Live Q&A | Discussions and Q&A | Technical Papers