Datasets accompanying the paper “Virtual Research Environments Ethnography: a Preliminary Study”, a systematic mapping study on the literature about science gateways, Virtual Research Environments, and Virtual Laboratories. For legal reasons we cannot share the original datasets obtained by querying the databases, since they include copyrighted data. We can, however, share the two datasets derived from the query results and the two topic modelling datasets.
The dataset “main_dataset.csv” consists of the merged query results from the ACM Digital Library, IEEE Xplore, ScienceDirect, Scopus, and SpringerLink databases. It is structured into six columns: (i) doi; (ii) title; (iii) content_type; (iv) publication_year; (v) keyword_search; (vi) DB. The ‘doi’, ‘title’, and ‘publication_year’ labels are self-describing, and are used for the DOIs, titles, and publication years (in the yyyy format) respectively. The ‘content_type’ label refers to the normalised resource types: (a) Article; (b) Book; (c) Book Chapter; (d) Chapter; (e) Chapter ReferenceWorkEntry; (f) Conference Paper; (g) Conference Review; (h) Early Access Articles; (i) Editorial; (j) Erratum; (k) Letter; (l) Magazines; (m) Masters Thesis; (n) Note; (o) Ph.D. Thesis; (p) Retracted; (q) Review; (r) Short Survey; (s) Standards. Types (c) and (d) refer to the same type of entry (they are used in different databases), while (e) is used in the Springer database and, as far as we observed, refers mainly to encyclopaedic entries. The ‘keyword_search’ label identifies the keyword group used for formulating the query: (a) science gateway | scientific gateway; (b) virtual laboratory | Vlab; or (c) virtual research environment.
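The column layout described above can be explored with a few lines of Python. The following is only a minimal sketch, not part of the published datasets: it assumes the file is comma-separated and UTF-8 encoded with a header row, and the function names (`read_main_dataset`, `count_by`, `summarise_file`) are hypothetical helpers introduced here for illustration. It tolerates a header written either as ‘publication year’ or ‘publication_year’, and it merges ‘Chapter’ into ‘Book Chapter’, since the two labels denote the same type of entry.

```python
import csv
from collections import Counter


def read_main_dataset(lines):
    """Read rows from an iterable of CSV lines (for example, an open file).

    Assumption: the first line is a header naming the six columns
    doi, title, content_type, publication_year, keyword_search, DB.
    """
    rows = []
    for raw in csv.DictReader(lines):
        row = {}
        for key, value in raw.items():
            if key is None:
                # Extra unnamed fields in a malformed row are ignored.
                continue
            # Accept 'publication year' as well as 'publication_year'.
            name = key.strip().replace(" ", "_")
            row[name] = (value or "").strip()
        # 'Book Chapter' and 'Chapter' describe the same type of entry,
        # so fold them into a single label before counting.
        if row.get("content_type") == "Chapter":
            row["content_type"] = "Book Chapter"
        rows.append(row)
    return rows


def count_by(rows, column):
    """Count how many rows carry each distinct value of the given column."""
    return Counter(row.get(column, "") for row in rows)


def summarise_file(path):
    """Print per-type and per-keyword-group counts for a CSV file on disk."""
    # Assumption: UTF-8 encoding; adjust if the file uses another one.
    with open(path, newline="", encoding="utf-8") as handle:
        rows = read_main_dataset(handle)
    for column in ("content_type", "keyword_search"):
        print(column, count_by(rows, column).most_common())
```

Calling `summarise_file("main_dataset.csv")` from the directory containing the file would print the number of entries per resource type and per keyword group, which is a quick way to check that the file matches the description.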
The ‘DB’ label indicates the provenance of each entry from one of the five databases we selected for our study: (a) ACM; (b) IEEE; (c) ScienceDirect; (d) scopus; and (e) Springer, identifying the ACM Digital Library, IEEE Xplore, ScienceDirect, Scopus, and SpringerLink respectively. The dataset “filtered_dataset.csv” consists of the deduplicated and filtered entries (journal articles and conference papers from 2010 onward, with a DOI assigned) from “main_dataset.csv”, which we used as the final dataset for answering our research questions. It is structured into ten columns: (i) doi; (ii) title; (iii) venue; (iv) publication_...