Synthetic dynamic attributed networks with ground-truth information

Published: 13 March 2023 | Version 2 | DOI: 10.17632/fkz6mbpr2z.2
Renny Marquez


1. Synthetic network 1: graphs were built with 200 nodes and 20 snapshots. In the initialization, the nodes are divided into two groups of 100 nodes each. Then, at a randomly selected time period, 40 % of the nodes are chosen to migrate to a new community in Dataset 1, and 80 % in Dataset 2. Given the node memberships, edges are drawn according to a stochastic block model, where nodes within the same community are connected with probability 0.3 and edges between communities are drawn with probability 0.1. For the generation of the attributes, we propose 4 cases using a univariate or multivariate normal distribution with a standard deviation of 0.1. In Case 1, a three-variate normal distribution is used to generate three attributes, where only the first attribute contributes to identifying the groups. In Case 2, only one meaningful attribute is used. In Case 3, there are three irrelevant random attributes. In Case 4, there is one irrelevant random attribute. These data correspond to Data17 (change of 40 %), with R0M0 (one relevant attribute), R0M1 (three attributes with one relevant), R1M0 (one irrelevant attribute), and R1M1 (three irrelevant attributes). Data18 (change of 80 %) has a similar interpretation for attributes.

2. Synthetic network 2: we generated 10 timestamps with communities that grow and shrink. We designed 3 datasets with 3 groups of 80, 90 and 100 nodes. The network structure is generated by a degree-corrected stochastic block model. Three types of attributed networks were proposed, with different original link probabilities, to assess strongly assortative structures, weakly assortative structures, and disassortative structures. These data correspond to Data107PV0.21, Data108PV0.1, and Data109PV0.19.

3. Synthetic network 3: we designed four datasets with different evolution of links over time, consisting of attributed networks with 100 nodes and 60 time points. Intra- and inter-cluster edges are selected from a truncated Gaussian distribution in the range [0; 1]. Changes in the structure of the networks are generated between time t = 20 and t = 21, and between t = 40 and t = 41. Attributes were generated similarly to Synthetic network 1. These data correspond to Data19, Data20, Data21, and Data22, with a similar interpretation for attributes as in Synthetic network 1.

4. Synthetic network 4: we used the synthetic benchmark DANCer to create 12 attributed graphs with undirected edges that can change over time, where nodes are grouped into densely connected sets that are relatively homogeneous according to the attributes. The number of nodes starts at 1000, the number of communities at 10, and the number of edges at 5000. These values increase over time, ending with a maximum of 4814 nodes, 15 communities and 21908 edges among the built networks. These data correspond to Data122 to Data125, Data128 to Data131, and Data135 to Data138.

For all datasets, 10 seeds were used. All data files were generated using Matlab.
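To make the construction of Synthetic network 1 more concrete, the following is a minimal Python sketch of how such snapshots and attributes could be generated. It is not the Matlab code used to produce the published files. The function names (sbm_snapshot, generate_dynamic_network, generate_attributes) are hypothetical, and several details are assumptions not stated in the description: the migrating nodes are placed in a single new third community, the change persists for all later snapshots, relevant attributes have a mean equal to the community index, irrelevant attributes have mean 0, and the attribute components are independent.

```python
import random


def sbm_snapshot(labels, p_in=0.3, p_out=0.1, rng=random):
    """Draw one undirected stochastic block model snapshot.

    Returns a set of edges (i, j) with i < j. Pairs in the same community
    are linked with probability p_in, other pairs with probability p_out.
    """
    n = len(labels)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.add((i, j))
    return edges


def generate_dynamic_network(n=200, T=20, frac_moving=0.4, seed=0):
    """Generate T snapshots with one community change at a random time.

    Returns (snapshots, memberships, change_time), where memberships[t]
    is the ground-truth community label of every node at snapshot t.
    """
    rng = random.Random(seed)
    # Two initial groups of equal size.
    labels = [0] * (n // 2) + [1] * (n - n // 2)
    # Randomly selected snapshot at which the migration takes effect.
    change_time = rng.randrange(1, T)
    movers = rng.sample(range(n), int(round(frac_moving * n)))
    snapshots, memberships = [], []
    for t in range(T):
        if t == change_time:
            for v in movers:
                labels[v] = 2  # assumption: movers form one new community
        memberships.append(list(labels))
        snapshots.append(sbm_snapshot(labels, rng=rng))
    return snapshots, memberships, change_time


def generate_attributes(labels, n_attrs=3, relevant=True, sigma=0.1, rng=random):
    """Draw normally distributed node attributes with standard deviation sigma.

    If relevant is True, only the first attribute depends on the community
    (assumed mean: the community index). All other attributes, and all
    attributes when relevant is False, are noise with assumed mean 0.
    """
    rows = []
    for c in labels:
        row = []
        for k in range(n_attrs):
            mean = float(c) if (relevant and k == 0) else 0.0
            row.append(rng.gauss(mean, sigma))
        rows.append(row)
    return rows
```

Under these assumptions, calling generate_dynamic_network(frac_moving=0.4) mimics the 40 % setting and frac_moving=0.8 the 80 % setting, while generate_attributes with n_attrs set to 1 or 3 and relevant set to True or False covers the four attribute cases.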


Steps to reproduce

The described datasets were used in the paper "Dynamic community detection including node attributes". The code to reproduce the data will be available in future versions. Synthetic network 1 was based on Sheikholeslami and Giannakis (2018), doi:10.1109/TSP.2018.2871383. Synthetic network 2 was based on Tang et al. (2020), doi:10.1007/s00180-019-00909-8, with the addition of time steps. Synthetic network 3 was based on Al-sharoa et al. (2019), doi:10.1109/TBME.2018.2854676, with the addition of attributes. Synthetic network 4 was built with the synthetic benchmark proposed by Largeron et al. (2017), doi:10.1007/s10115-017-1028-2.


Data Science, Clustering, Social Networks