AOL Dataset for Browsing History and Topics of Interest
This record provides the datasets for the paper "The Privacy-Utility Trade-off in the Topics API".
The dataset-generating code and the experimental results can be found at 10.5281/zenodo.11032231 (github.com/nunesgh/topics-api-analysis).
Files
AOL-treated.csv: This dataset can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies. It contains singletons (individuals with only one domain in their browsing histories) and one outlier (a user with 150,802 domain visits in three months), which are dropped in some analyses.
AOL-treated-unique-domains.csv: Auxiliary dataset containing all the unique domains from AOL-treated.csv.
Citizen-Lab-Classification.csv: Auxiliary dataset containing the Citizen Lab Classification data, as of commit ebd0ee8, treated for inconsistencies and filtered according to Mozilla's Public Suffix List, as of commit 5e6ac3a, extended by the discontinued TLDs: .bg.ac.yu, .ac.yu, .cg.yu, .co.yu, .edu.yu, .gov.yu, .net.yu, .org.yu, .yu, .or.tp, .tp, and .an.
AOL-treated-Citizen-Lab-Classification-domain-match.csv: Auxiliary dataset containing domains matched from AOL-treated-unique-domains.csv with domains and respective topics from Citizen-Lab-Classification.csv.
Google-Topics-Classification-v1.txt: Auxiliary dataset containing the Google Topics API taxonomy v1 data as provided by Google with the Chrome browser.
AOL-treated-Google-Topics-Classification-v1-domain-match.csv: Auxiliary dataset containing domains matched from AOL-treated-unique-domains.csv with domains and respective topics from Google-Topics-Classification-v1.txt.
AOL-reduced-Citizen-Lab-Classification.csv: This dataset can be used for analyses of browsing history vulnerability and utility, as enabled by third-party cookies, and for analyses of topics of interest vulnerability and utility, as enabled by the Topics API. It contains the singletons and the outlier, which are dropped in some analyses. This dataset can be used for analyses that include the (data-dependent) randomness of trimming down or filling up each individual's top-s set of topics so that each set has s topics. Privacy results for Generalization and utility results for Generalization, Bounded Noise, and Differential Privacy are expected to vary slightly with each run of the analyses over this dataset.
AOL-reduced-Google-Topics-Classification-v1.csv: This dataset can be used for analyses of browsing his...