Predicting Vulnerability Inducing Function Versions Using Node Embeddings and Graph Neural Networks - Wireshark

Published: 19 January 2022 | Version 2 | DOI: 10.17632/ymtf9znmfz.2
Sefa Eren Şahin


## Wireshark Vulnerability Prediction Dataset

This dataset was constructed by a team of researchers at the Istanbul Technical University Faculty of Computer and Informatics and was used in the paper "Predicting Vulnerability Inducing Function Versions Using Node Embeddings and Graph Neural Networks". Please see the GitHub repository for more details on usage.

This dataset consists of two main parts:

* AST dumps, which can be used as inputs for any machine learning model (ast_input)
* Wireshark file changes and bugs (file_changes_and_bugs)

### ast_input

The ast_input folder contains three files:

* This file is a compressed version of the AST dumps in Python pickle format. Use the Python pickle library to unpickle and use the data.
* node_embeddings_by_kind.pkl: Embedding vectors corresponding to AST node kinds, in Python pickle format.
* token_id_vocabulary.pkl: Map of token ids to their corresponding tokens, in Python pickle format.

### file_changes_and_bugs

The file_changes_and_bugs folder consists of five files:

* wireshark_file_changes.csv: List of file changes made in the Wireshark repository. File changes are basically commit-file pairs.
* wireshark_cve_bug_matching.csv: Maps CVE entries to bug ids in the Wireshark bug repository. This is scraped from
* additional_bugs.csv: Additional security-related bugs that our team manually identified by investigating security advisories and bug reports.
* wireshark_bug_commit_matching.csv: Maps security bugs (vulnerabilities) to commits in the Wireshark source code repository.
* wireshark_bug_inducing_file_changes.csv: Maps vulnerabilities to Wireshark source files, recording the commit in which each vulnerability was induced and the commit in which it was fixed.
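The files listed above can be opened with the Python standard library alone. The following is a minimal sketch, not the loading code used in the paper. It assumes the archive has been extracted into a local folder named `dataset` (a hypothetical path) and that the CSV files begin with a header row; the helper names `load_pickle` and `read_csv_rows` are illustrative only. The compressed AST dump is not handled here because its file name and compression format are not stated in this description.

```python
import csv
import pickle
from pathlib import Path


def load_pickle(path):
    """Unpickle a single file. Unpickling can run arbitrary code,
    so only load files obtained from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def read_csv_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row.
    Assumption: the first row of the file holds the column names."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Assumption: the dataset was extracted into ./dataset (hypothetical path).
    root = Path("dataset")

    vocab_path = root / "ast_input" / "token_id_vocabulary.pkl"
    if vocab_path.exists():
        vocabulary = load_pickle(vocab_path)
        print(type(vocabulary).__name__, len(vocabulary))

    changes_path = root / "file_changes_and_bugs" / "wireshark_file_changes.csv"
    if changes_path.exists():
        rows = read_csv_rows(changes_path)
        # Print the column names rather than assuming them.
        print(len(rows), list(rows[0].keys()) if rows else [])
```

Printing the column names first is a deliberate choice, since the column layout of the CSV files is not documented here. The same two helpers apply to `node_embeddings_by_kind.pkl` and to the other CSV files in file_changes_and_bugs.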



Istanbul Teknik Universitesi Bilgisayar Muhendisligi Bolumu


Software Security, Natural Language Processing, Machine Learning, Software Development, Open Source Software, Deep Learning, Graph Convolutional Network