Published: 12 December 2022 | Version 1 | DOI: 10.17632/hjppt63s9b.1
This repository contains the contributions of the article "ROI: A method for identifying organizations receiving personal data". The datasets are distributed as follows:

Privacy Policies dataset
This dataset ["Policies_urls.csv"] contains 142 privacy policy URLs, each with its corresponding organization. The URLs were obtained with the two methods (Selenium and Google) described in the article, which is why some URLs are duplicated.

300 Domain Holders
This dataset ["300_domain_holders.xlsx"] contains three sheets, one for each dataset used for the validations described in the article: Fortune 500, PII_receivers_1 (for the technique's evaluation), and PII_receivers_2 (for ROI's evaluation).

Recipient Domains
This dataset ["Domains_receiving_PII.csv"] contains 40,493 dataflows corresponding to 1,112 unique domains, along with the type of personal data that each domain received from an Android app.
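As a rough illustration of how the CSV files might be inspected, the following Python sketch counts dataflows per recipient domain and drops repeated policy URLs. The column names ("domain", "data_type", "url", "organization") and the sample rows are assumptions made for this example only; the actual headers in "Domains_receiving_PII.csv" and "Policies_urls.csv" are not stated in this description and should be checked before use.

```python
import csv
import io
from collections import Counter


def count_dataflows_per_domain(csv_file, domain_column="domain"):
    """Count rows (dataflows) per recipient domain.

    `domain_column` is a hypothetical header name; adjust it to the real file.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[domain_column].strip().lower() for row in reader)


def drop_duplicate_urls(rows, url_column="url"):
    """Keep the first row for each URL, preserving the original order.

    `url_column` is a hypothetical header name; adjust it to the real file.
    """
    seen = set()
    unique = []
    for row in rows:
        url = row[url_column].strip()
        if url not in seen:
            seen.add(url)
            unique.append(row)
    return unique


# Made-up sample data standing in for the real files.
sample_flows = io.StringIO(
    "domain,data_type\n"
    "tracker.example,advertising_id\n"
    "tracker.example,location\n"
    "cdn.example,advertising_id\n"
)
flows = count_dataflows_per_domain(sample_flows)
print(sum(flows.values()), "dataflows across", len(flows), "unique domains")

sample_policies = [
    {"organization": "Example Org", "url": "https://example.org/privacy"},
    {"organization": "Example Org", "url": "https://example.org/privacy"},
    {"organization": "Other Org", "url": "https://other.example/policy"},
]
unique_policies = drop_duplicate_urls(sample_policies)
print(len(unique_policies), "unique policy URLs")
```

To run this against the published files, the in-memory samples could be replaced with `open("Domains_receiving_PII.csv", newline="")` and rows read via `csv.DictReader`, once the real column names are known.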



Natural Language Processing, Machine Learning, Applied Computer Science, Research Article, Privacy, Personal Data