Graph datasets for clustering

Name: Graph datasets for clustering
Creator: Xianbin Lu
Published: 2024-06-20T08:52:02.121Z
Keywords: Clustering

doi:10.17632/fzjyprkh3h.2

Published: 20 June 2024 | Version 2 | DOI: 10.17632/fzjyprkh3h.2

Contributor: Xianbin Lu

Description

The CORA dataset is a citation network of 2708 scientific papers in seven categories. Each paper is a node, and each of the 5429 citation links is a directed edge from the citing paper to the cited paper. Each paper is represented by a 1433-dimensional binary feature vector, where each value (0 or 1) indicates the absence or presence of a specific word from a predefined dictionary.

CITE is a citation network of 3327 academic papers from six research categories: Agents, Artificial Intelligence (AI), Databases (DB), Information Retrieval (IR), Machine Learning (ML), and Human-Computer Interaction (HCI). Each paper is represented by a 3703-dimensional word vector indicating the absence or presence of specific words from a predefined dictionary. The dataset includes 4732 citation links between papers.

The DBLP dataset is a co-authorship network derived from the DBLP computer science bibliography. Each node corresponds to an author, and an edge between two nodes indicates that the two authors have co-authored at least one paper. It contains 4058 nodes and 3528 edges, and each author is represented by a 334-dimensional feature vector that describes their research areas.

The ACM dataset is a paper network derived from the ACM database. It contains 3025 papers in three categories: database, wireless communication, and data mining. Each paper is represented by a 1870-dimensional vector based on the research area of the article. Two papers are connected by an edge if they share an author.
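The statistics quoted above can be collected into a small table for sanity-checking a loaded graph. The following is a minimal Python sketch, not part of the dataset itself: the names `GraphStats`, `DATASETS`, `average_degree`, and `to_undirected_adjacency` are hypothetical, the file format of the published data is not assumed, and values the description does not state (the ACM edge count, the number of DBLP classes) are left as `None`. The average degree treats every edge as undirected, which is an assumption commonly made before clustering citation graphs.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class GraphStats:
    """Figures copied from the dataset description; None means 'not stated'."""
    name: str
    nodes: int
    edges: Optional[int]
    feature_dim: int
    classes: Optional[int]


DATASETS = [
    GraphStats("CORA", nodes=2708, edges=5429, feature_dim=1433, classes=7),
    GraphStats("CITE", nodes=3327, edges=4732, feature_dim=3703, classes=6),
    GraphStats("DBLP", nodes=4058, edges=3528, feature_dim=334, classes=None),
    GraphStats("ACM", nodes=3025, edges=None, feature_dim=1870, classes=3),
]


def average_degree(stats: GraphStats) -> Optional[float]:
    """Average degree 2E/N, assuming each edge is treated as undirected."""
    if stats.edges is None:
        return None
    return 2 * stats.edges / stats.nodes


def to_undirected_adjacency(
    edges: Iterable[Tuple[int, int]]
) -> Dict[int, Set[int]]:
    """Symmetrize a directed edge list into an adjacency map.

    Self-loops are dropped and duplicate or reciprocal edges are merged.
    """
    adjacency: Dict[int, Set[int]] = {}
    for source, target in edges:
        if source == target:
            continue
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


if __name__ == "__main__":
    for stats in DATASETS:
        degree = average_degree(stats)
        shown = "n/a" if degree is None else f"{degree:.2f}"
        print(f"{stats.name}: {stats.nodes} nodes, avg degree {shown}")

    # Toy citation list: paper 0 cites 1, paper 1 cites 0, paper 2 cites 0.
    toy = to_undirected_adjacency([(0, 1), (1, 0), (2, 0)])
    print(toy)
```

Comparing the node count, feature dimension, and edge count of a loaded graph against this table is a quick way to confirm that the files were parsed as described. Note that symmetrizing a directed citation list can yield fewer undirected edges than the stated link count when reciprocal or duplicate citations are present.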
