PSIMiner: A Tool for Mining Rich Abstract Syntax Trees from Code
The application of machine learning algorithms to source code has grown in recent years. Because these algorithms are quite sensitive to their input data, it is not surprising that researchers experiment with input representations. Abstract syntax trees (ASTs) are currently a popular starting point for representing code.
Abstract syntax trees have long been used in various software engineering domains, and in IDEs in particular. The APIs of modern IDEs make it possible to manipulate ASTs, traverse them, resolve references between code elements, and so on. Such algorithms can enrich an AST with new data and may therefore be useful in ML-based code analysis.
In this work, we present PSIMiner, a tool for processing PSI trees from the IntelliJ Platform. A PSI tree contains the code’s syntax tree as well as functions for working with it. We use our tool to infer the types of identifiers in Java ASTs and to extend the code2seq model for the method name prediction problem.
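To make the idea of enriching an AST with type information concrete, here is a minimal sketch. It is not PSIMiner's implementation and does not use the IntelliJ PSI API: the `Node` class, the node kinds, and the single-scope resolver `enrich_with_types` are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    """A toy AST node; `resolved_type` is the extra data added by enrichment."""
    kind: str
    token: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    resolved_type: Optional[str] = None


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def enrich_with_types(root: Node) -> Node:
    """Annotate identifier nodes with the type found in their declaration.

    Assumption for this sketch: a single flat scope, with declarations
    shaped as LocalVariable(Type, Identifier). A real IDE resolver also
    handles nested scopes, fields, imports, and generics.
    """
    declared: Dict[str, str] = {}
    # First pass: collect declared names and their types.
    for node in walk(root):
        if node.kind == "LocalVariable":
            type_node, name_node = node.children[0], node.children[1]
            declared[name_node.token] = type_node.token
    # Second pass: attach the resolved type to every known identifier.
    for node in walk(root):
        if node.kind == "Identifier" and node.token in declared:
            node.resolved_type = declared[node.token]
    return root


# Toy tree for the Java fragment: int count; count = count + 1;
tree = Node("Block", children=[
    Node("LocalVariable", children=[
        Node("Type", "int"),
        Node("Identifier", "count"),
    ]),
    Node("Assignment", children=[
        Node("Identifier", "count"),
        Node("BinaryExpression", "+", children=[
            Node("Identifier", "count"),
            Node("Literal", "1"),
        ]),
    ]),
])

enrich_with_types(tree)
typed = [(n.token, n.resolved_type) for n in walk(tree) if n.kind == "Identifier"]
```

After enrichment, every `count` identifier carries the type `int`, so a downstream model could consume the type alongside the token. The two-pass design keeps the sketch short; it stands in for the reference resolution that an IDE would perform.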
Mon 17 May
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
10:00 - 10:50
|PSIMiner: A Tool for Mining Rich Abstract Syntax Trees from Code|
Egor Spirin JetBrains Research; National Research University Higher School of Economics, Egor Bogomolov JetBrains Research, Vladimir Kovalenko JetBrains Research, Timofey Bryksin JetBrains Research, Saint Petersburg State University (Pre-print)
|Mining DEV for social and technical insights about software development|
Maria Papoutsoglou Aristotle University of Thessaloniki, Johannes Wachs Vienna University of Economics and Business & Complexity Science Hub Vienna, Georgia Kapitsaki University of Cyprus (Pre-print)
|TNM: A Tool for Mining of Socio-Technical Data from Git Repositories|
Nikolai Sviridov ITMO University, Mikhail Evtikhiev JetBrains Research, Vladimir Kovalenko JetBrains Research (Pre-print)
|Identifying Versions of Libraries used in Stack Overflow Code Snippets|
Ahmed Zerouali Vrije Universiteit Brussel, Camilo Velázquez-Rodríguez Vrije Universiteit Brussel, Coen De Roover Vrije Universiteit Brussel (Pre-print, Media Attached)
|Sampling Projects in GitHub for MSR Studies|
Ozren Dabic Software Institute, Università della Svizzera italiana (USI), Switzerland, Emad Aghajani Software Institute, USI Università della Svizzera italiana, Gabriele Bavota Software Institute, USI Università della Svizzera italiana (Pre-print)
|gambit – An Open Source Name Disambiguation Tool for Version Control Systems|
Christoph Gote Chair of Systems Design, ETH Zurich, Christian Zingg Chair of Systems Design, ETH Zurich (Pre-print, Media Attached)
|Discussions and Q&A|