MSR 2021
Mon 17 - Wed 19 May 2021
co-located with ICSE 2021
Tue 18 May 2021 03:14 - 03:18 at MSR Room 2 - Time series data Chair(s): Shane McIntosh

Modern scientific workflows are data-driven and are often executed on distributed, heterogeneous, high-performance computing infrastructures. Anomalies and failures in workflow execution cause loss of scientific productivity and inefficient use of the infrastructure. Hence, detecting, diagnosing, and mitigating these anomalies is immensely important for reliable and performant scientific workflows. Since these workflows rely heavily on high-performance network transfers with strict QoS constraints, accurately detecting anomalous network performance is crucial to ensure reliable and efficient workflow execution. To address this challenge, we have developed X-FLASH, a network anomaly detection tool for faulty TCP workflow transfers. X-FLASH incorporates novel hyperparameter tuning and data mining approaches that improve the performance of machine learning algorithms in classifying anomalous TCP packets. X-FLASH uses XGBoost as an ensemble model and couples it with a sequential optimizer, FLASH, borrowed from search-based software engineering, to learn the optimal model parameters. X-FLASH found configurations that outperformed the existing approach by up to 28%, 29%, and 40% (relative) in F-measure, G-score, and recall, respectively, in fewer than 30 evaluations. Given (1) the large improvement and (2) the simplicity of the tuning, we recommend that future research include a tuning study as a new standard, at least in the area of scientific workflow anomaly detection.
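
As a rough illustration of the sequential tuning idea described in the abstract, the Python sketch below evaluates a few random configurations, then repeatedly uses a cheap surrogate to pick the next configuration to evaluate, stopping at a budget of 30 evaluations. It is not the authors' X-FLASH implementation. The function `toy_score` is a hypothetical stand-in for "train the classifier and measure F-measure", the two-parameter candidate pool is invented for the example, and a nearest-neighbour lookup is swapped in as the surrogate model purely to keep the sketch dependency-free.

```python
import random


def toy_score(config):
    """Hypothetical stand-in for training a model with `config` and
    returning a validation score. Its peak is at depth=6, rate=0.1."""
    depth, rate = config
    return 1.0 - 0.02 * abs(depth - 6) - 2.0 * abs(rate - 0.1)


def predict(config, evaluated):
    """Surrogate (assumption: 1-nearest-neighbour, chosen for simplicity).
    Returns the score of the closest configuration evaluated so far."""
    def dist(a, b):
        return abs(a[0] - b[0]) / 10.0 + abs(a[1] - b[1])

    nearest = min(evaluated, key=lambda c: dist(c, config))
    return evaluated[nearest]


def sequential_tune(pool, score_fn, n_init=10, budget=30, seed=0):
    """Evaluate n_init random configs, then spend the remaining budget
    on the configs the surrogate predicts to be best."""
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)

    # Initial random sample: the only "blind" evaluations.
    evaluated = {c: score_fn(c) for c in pool[:n_init]}
    remaining = pool[n_init:]

    # Sequential phase: one real evaluation per iteration.
    while remaining and len(evaluated) < budget:
        guess = max(remaining, key=lambda c: predict(c, evaluated))
        remaining.remove(guess)
        evaluated[guess] = score_fn(guess)

    best = max(evaluated, key=evaluated.get)
    return best, evaluated[best], len(evaluated)


# Hypothetical candidate pool: (tree depth, learning rate) pairs.
pool = [(d, r) for d in range(2, 12) for r in (0.01, 0.05, 0.1, 0.2, 0.3)]
best, best_score, n_evals = sequential_tune(pool, toy_score)
print(best, round(best_score, 3), n_evals)
```

In a real setting, `toy_score` would be replaced by a function that trains the model with the given configuration and returns the metric of interest on held-out data. The budget caps how many such expensive training runs are performed.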

Tue 18 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

03:10 - 04:00
Time series data
Data Showcase / Technical Papers at MSR Room 2
Chair(s): Shane McIntosh University of Waterloo
03:11
3m
Talk
AndroCT: Ten Years of App Call Traces in Android
Data Showcase
Wen Li, Xiaoqin Fu Washington State University, Haipeng Cai Washington State University, USA
Pre-print Media Attached
03:14
4m
Talk
Mining Workflows for Anomalous Data Transfers
Technical Papers
Huy Tu North Carolina State University, USA, George Papadimitriou University of Southern California, Mariam Kiran ESnet, LBNL, Cong Wang Renaissance Computing Institute, Anirban Mandal Renaissance Computing Institute, Ewa Deelman University of Southern California, Tim Menzies North Carolina State University, USA
Pre-print
03:18
4m
Talk
Escaping the Time Pit: Pitfalls and Guidelines for Using Time-Based Git Data
Technical Papers
Samuel W. Flint University of Nebraska-Lincoln, Jigyasa Chauhan University of Nebraska-Lincoln, Robert Dyer University of Nebraska-Lincoln
Pre-print Media Attached
03:22
4m
Paper
On the Naturalness and Localness of Software Logs
Technical Papers
Sina Gholamian University of Waterloo, Paul A. S. Ward University of Waterloo
Pre-print
03:26
4m
Talk
How Do Software Developers Use GitHub Actions to Automate Their Workflows?
Technical Papers
Timothy Kinsman University of Adelaide, Mairieli Wessel University of Sao Paulo, Marco Gerosa Northern Arizona University, USA, Christoph Treude University of Adelaide
Pre-print
03:30
30m
Live Q&A
Discussions and Q&A
Technical Papers

