Denchmark: A Bug Benchmark of Deep Learning-related Software
A growing interest in deep learning (DL) has instigated a concomitant rise in DL-related software (DLSW). Therefore, the quality of DLSW has emerged as a vital issue. At the same time, researchers have found DLSW to be more complex than traditional SW and more difficult to debug owing to the black-box nature of DL. These studies indicate the necessity of automatic debugging techniques for DLSW. Although several validated automatic debugging techniques exist for general SW, none exist for DLSW, in part because there is no standard bug benchmark against which to validate them. In this study, we introduce a novel bug benchmark for DLSW, Denchmark, consisting of 4,577 bug reports from 193 popular DLSW projects, collected through a systematic dataset construction process. These DLSW projects are further classified into eight categories: framework, platform, engine, compiler, tool, library, DL-based application, and others. Every bug report in Denchmark contains rich textual information and links to its bug-fixing commits, along with buggy entities at three levels of granularity: file, method, and line. Our dataset aims to provide a valuable starting point for automatic debugging techniques for DLSW.
Wed 19 May