DataCite Commons: Dissimilarity Between Scientific Fields

Datasets and supporting material used in the manuscript

"Using text analysis to quantify the similarity and evolution of scientific disciplines", by L. Dias, M. Gerlach, J. Scharloth and E. G. Altmann, available at https://arxiv.org/abs/1706.08671

There are four types of information:

1. Classification

One file (classification.csv)

Provides the classification of scientific fields into domains, disciplines, and specialties, according to the ISI-Web-of-Science/OECD classification.

2. Divergences

Seven ".csv" files named D_level_dimension.csv

Each file gives the divergence between pairs of scientific fields, as discussed in the manuscript (e.g., Fig. 1). The files correspond to combinations of three dimensions (experts, citations, and language) and three levels of classification of scientific fields (domains, disciplines, and specialties).

The first row and the first column in each file indicate the number of the scientific field; see the file "classification.csv" for details.
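The layout described above (field numbers in the first row and first column, divergences in the remaining cells) can be read with a short sketch like the following. It assumes comma-separated values and keeps the field numbers as strings, since their exact format is not stated here. The function name and the toy values are hypothetical and are not taken from the dataset.

```python
import csv
import io


def read_divergence_matrix(handle):
    """Read a table whose first row and first column hold field numbers.

    Returns a dict mapping (row_field, column_field) to the divergence value.
    Field numbers are kept as strings (assumption: their format is unknown).
    """
    rows = [row for row in csv.reader(handle) if row]
    # Header row: skip the corner cell, keep the column field numbers.
    column_fields = [cell.strip() for cell in rows[0][1:]]
    matrix = {}
    for row in rows[1:]:
        row_field = row[0].strip()
        for column_field, value in zip(column_fields, row[1:]):
            matrix[(row_field, column_field)] = float(value)
    return matrix


# Toy data with made-up values, standing in for one D_level_dimension.csv file.
toy_file = io.StringIO(
    ",1,2,3\n"
    "1,0.0,0.2,0.5\n"
    "2,0.2,0.0,0.4\n"
    "3,0.5,0.4,0.0\n"
)
D = read_divergence_matrix(toy_file)
print(D[("1", "3")])
```

To read a real file, the toy `io.StringIO` object would be replaced by a handle opened with `open(path, newline="")`.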

3. Temporal evolution

One file (D_over_time.csv)

The language divergence D_i,j between two disciplines i and j, computed for different years (y in [1991-2014]). The first two columns indicate the codes of disciplines i and j; see the file classification.csv mentioned in point 1 above. The first row indicates the year. The entries of the table are D_i,j. The entry "nan" indicates that in that year the corpora of disciplines i and j were not large enough for the computation of D_i,j (fewer than 20,000 types); see Materials and Methods of the paper. The results in this table were used in Fig. 4 of the paper.
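The table layout described above can be turned into one time series per pair of disciplines with a sketch like the one below. It assumes comma-separated values, a header row whose first two cells label the code columns, and missing values written literally as "nan". The function name and the toy values are hypothetical.

```python
import csv
import io
import math


def read_divergence_over_time(handle):
    """Return {(i, j): {year: D_ij}}, skipping years marked as "nan"."""
    rows = [row for row in csv.reader(handle) if row]
    # Header row: the first two cells label the code columns, the rest are years.
    years = [int(float(cell)) for cell in rows[0][2:]]
    series = {}
    for row in rows[1:]:
        i, j = row[0].strip(), row[1].strip()
        values = [float(cell) for cell in row[2:]]  # float("nan") parses as NaN
        series[(i, j)] = {
            year: value
            for year, value in zip(years, values)
            if not math.isnan(value)
        }
    return series


# Toy data with made-up values, standing in for D_over_time.csv.
toy_file = io.StringIO(
    "i,j,1991,1992,1993\n"
    "1,2,0.30,nan,0.28\n"
    "1,3,0.45,0.44,0.41\n"
)
series = read_divergence_over_time(toy_file)
print(series[("1", "2")])
```

Dropping the "nan" years at load time is one possible choice; keeping them as NaN may be preferable when all pairs need to share the same year axis.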

4. List of words

The list of contractions was obtained from the Wikipedia List of English Contractions (http://en.wikipedia.org/wiki/Wikipedia:List_of_English_contractions).

The list of stop words was constructed by combining the lists found in NLTK (http://www.nltk.org/), Gensim (http://radimrehurek.com/gensim/index.html), Mallet (http://mallet.cs.umass.edu/) and the Python machine learning toolkit scikit-learn (http://scikit-learn.org).

List of Contractions:

"she'll": 'she will', "shouldn't've": 'should not have', "she'll've": 'she will have', "don't": 'do not', "should've": 'should have', "won't": 'will not', "who'll've": 'who will have', "he's": 'he is', "when's": 'when is', "we've": 'we have', "he'd": 'he had', "ma'am": 'madam', "y'all're": 'you all are', "he'd've": 'he would ha...
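A mapping of this form can be applied to text with a regular expression, as in the minimal sketch below. It uses only a few of the entries shown above, and it is not necessarily how the authors applied the list. The function name is hypothetical, and the replacement text is inserted in lower case, as it appears in the mapping.

```python
import re

# A small subset of the contraction list above, for illustration only.
CONTRACTIONS = {
    "she'll": "she will",
    "shouldn't've": "should not have",
    "don't": "do not",
    "won't": "will not",
    "we've": "we have",
}

# Try longer contractions first so that "shouldn't've" is matched as a whole.
_alternatives = sorted(CONTRACTIONS, key=len, reverse=True)
_pattern = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in _alternatives) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text):
    """Replace each known contraction with its expanded (lower-case) form."""
    return _pattern.sub(lambda match: CONTRACTIONS[match.group(0).lower()], text)


expanded = expand_contractions("We've seen that they don't agree.")
print(expanded)
```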

Content published 2017 in Zenodo

Dataset

https://doi.org/10.5281/zenodo.816302

Laercio Dias	Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
Eduardo G. Altmann	School of Mathematics, The University of Sydney
