Synthetic mobile device data
Description
This repository contains two synthetic mobile device datasets: one of GPS location records ("input_case1_v2.csv") and one of cellular location records ("input_case2_v2.csv"). Both are stored as CSV files, each with 12 data fields that are explained in the "Dictionary.docx" file. The two datasets contain the location records of 582 individual mobile devices over one month. The GPS dataset ("input_case1_v2.csv") contains 668,939 location records, and the cellular dataset ("input_case2_v2.csv") contains 61,390 location records.

The two datasets were generated from two real-world data sources using the synthetic mobile data generation method developed by Chen et al. (2014). The first source is mobile app data, which comes from people using location-aware mobile apps. The mobile app data encompasses both GPS and cellular data, covers the month of March in 2019 in the central Puget Sound region, and includes 582 individual mobile device users. The second source is household travel survey data. It covers the month of March in 2017 in the central Puget Sound region and includes 582 survey respondents. The 582 mobile device users and the 582 survey respondents are randomly linked. The locations visited in the household travel survey are treated as the ground-truth stays.

Four pieces of information from the mobile app data are preserved in the synthetic location records: the number of location records, the anonymized user ID, the timestamp of each location record, and the location accuracy associated with each record. If the timestamp of a location record falls within the duration of a ground-truth stay, the location record is associated with that stay. The latitudes and longitudes of the synthetic location records are generated such that their spatial distribution is the same as that of the mobile app data for a given user on a given day.
The spatial distribution is measured by the distance and angle from a location record to its corresponding stay. The method used to infer stays from the mobile app data is described in Wang et al. (2019) and builds on the method developed in Chen et al. (2014). Synthetic location records that are not associated with any ground-truth stay are placed at random deviates from points evenly distributed along the straight line connecting the previous and the next stays, as described in Chen et al. (2014).
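
The placement rules described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the dictionary keys ("start", "end", "lat", "lon"), the flat-earth offset approximation, and the Gaussian deviation controlled by the hypothetical noise_m parameter are all assumptions made for illustration. The (distance, angle) offsets are supplied by the caller here; in the datasets they would follow the per-user, per-day distribution observed in the mobile app data.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used for a flat-earth approximation


def find_stay(timestamp, stays):
    """Return the index of the stay whose duration contains timestamp, else None."""
    for i, stay in enumerate(stays):
        if stay["start"] <= timestamp <= stay["end"]:
            return i
    return None


def offset_point(lat, lon, distance_m, angle_rad):
    """Shift (lat, lon) by distance_m metres at angle_rad, measured clockwise from north."""
    dlat = distance_m * math.cos(angle_rad) / EARTH_RADIUS_M
    dlon = distance_m * math.sin(angle_rad) / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def points_on_segment(prev_stay, next_stay, n):
    """Return n points evenly spaced on the straight line between two stays."""
    return [
        (prev_stay["lat"] + (next_stay["lat"] - prev_stay["lat"]) * k / (n + 1),
         prev_stay["lon"] + (next_stay["lon"] - prev_stay["lon"]) * k / (n + 1))
        for k in range(1, n + 1)
    ]


def synthesize(timestamps, stays, offsets, noise_m=50.0, rng=None):
    """Place one synthetic location per timestamp.

    stays must be non-empty, sorted by time, and non-overlapping.
    offsets[i] is the (distance_m, angle_rad) used if record i falls in a stay.
    """
    rng = rng or random.Random(0)
    result = [None] * len(timestamps)
    gaps = {}  # gap index -> indices of records that fall between stays

    for i, t in enumerate(timestamps):
        s = find_stay(t, stays)
        if s is not None:
            # Record is associated with a stay: offset it by distance and angle.
            dist, angle = offsets[i]
            result[i] = offset_point(stays[s]["lat"], stays[s]["lon"], dist, angle)
        else:
            # Number of stays that ended before t identifies the gap.
            g = sum(1 for stay in stays if stay["end"] < t)
            gaps.setdefault(g, []).append(i)

    for g, idxs in gaps.items():
        # Before the first or after the last stay, fall back to the nearest stay.
        prev_stay = stays[max(g - 1, 0)]
        next_stay = stays[min(g, len(stays) - 1)]
        anchors = points_on_segment(prev_stay, next_stay, len(idxs))
        for i, (lat, lon) in zip(idxs, anchors):
            # Assumed deviation: Gaussian distance, uniformly random direction.
            dist = abs(rng.gauss(0.0, noise_m))
            result[i] = offset_point(lat, lon, dist, rng.uniform(0.0, 2 * math.pi))
    return result
```

Calling synthesize with a list of record timestamps, a list of stay dictionaries, and one offset per record returns one (latitude, longitude) pair per record. Setting noise_m to 0 places unassociated records exactly on the connecting line, which makes the sketch easy to check by hand.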