Datasets and supporting material used in the manuscript
"Using text analysis to quantify the similarity and evolution of scientific disciplines", by L. Dias, M. Gerlach, J. Scharloth and E. G. Altmann, available at https://arxiv.org/abs/1706.08671
There are four types of information:
1. Classification
One file (classification.csv)
Provides the classification of scientific fields into domains, disciplines, and specialties, according to the ISI-Web-of-Science/OECD classification.
2. Divergences
Seven ".csv" files named D_level_dimension.csv
Each file gives the divergence between pairs of scientific fields, as discussed in the manuscript (e.g., Fig. 1). The files correspond to combinations of three dimensions (experts, citations, and language) and three levels of classification of scientific fields (domains, disciplines, and specialties).
The first row and the first column of each file indicate the number of the scientific field; see the file "classification.csv" for details.
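As an illustration of the layout described above, the following is a minimal sketch of how one of the divergence files might be read in Python using only the standard library. It assumes the files are comma-delimited and that the top-left cell is an empty corner cell; neither is stated here, so check this against the actual files. The function name read_divergence_matrix is hypothetical, and the numbers in the sample are placeholders, not values from the dataset.

```python
import csv
import io


def read_divergence_matrix(handle):
    """Return (field_ids, matrix), where matrix[(row_id, col_id)] is a divergence."""
    rows = list(csv.reader(handle))
    # First row: an (assumed) corner cell followed by the column field numbers.
    field_ids = [cell.strip() for cell in rows[0][1:]]
    matrix = {}
    for row in rows[1:]:
        if not row:
            continue  # tolerate blank lines
        row_id = row[0].strip()  # first column: the row's field number
        for col_id, cell in zip(field_ids, row[1:]):
            matrix[(row_id, col_id)] = float(cell)
    return field_ids, matrix


# Placeholder values, only to show the layout assumed in this sketch.
sample = io.StringIO(",1,2\n1,0.0,0.25\n2,0.25,0.0\n")
field_ids, divergences = read_divergence_matrix(sample)
```

With a real file, the io.StringIO object would be replaced by an open file handle, e.g. open(path, newline=""). The field numbers are kept as strings so that they can be matched against the codes in classification.csv without assuming a numeric format.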
3. Temporal evolution
One file (D_over_time.csv)
The language divergence D_i,j between two disciplines i and j, computed for different years (y in [1991-2014]). The first two columns give the codes of disciplines i and j; see the file classification.csv mentioned in point 1 above. The first row gives the year. The entries of the table are the values of D_i,j. The entry "nan" indicates that, in that year, the corpora of disciplines i and j were not large enough for the computation of D_i,j (fewer than 20,000 types); see Materials and Methods of the paper. The results in this table were used in Fig. 4 of the paper.
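The table layout described above can be turned into one time series per pair of disciplines. The sketch below is one possible way to do this with the Python standard library. It assumes a comma-delimited file whose header cells from the third column onward are plain integer years; the function name read_divergence_over_time is hypothetical, and the sample values are placeholders, not data from the file.

```python
import csv
import io
import math


def read_divergence_over_time(handle):
    """Map (i, j) -> {year: D_ij}, leaving out years recorded as "nan"."""
    reader = csv.reader(handle)
    header = next(reader)
    # Columns 0 and 1 hold the discipline codes; the remaining cells are years.
    years = [int(cell) for cell in header[2:]]
    series = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        i, j = row[0].strip(), row[1].strip()
        values = [float(cell) for cell in row[2:]]  # float("nan") parses the "nan" entries
        series[(i, j)] = {
            year: value
            for year, value in zip(years, values)
            if not math.isnan(value)
        }
    return series


# Placeholder values, only to show the layout assumed in this sketch.
sample = io.StringIO("i,j,1991,1992,1993\n1,2,nan,0.5,0.25\n")
time_series = read_divergence_over_time(sample)
```

Dropping the "nan" entries, rather than keeping them as floats, makes it explicit which years had too little text for D_i,j to be computed.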
4. List of words
The list of contractions was obtained from the Wikipedia List of English Contractions (http://en.wikipedia.org/wiki/Wikipedia:List_of_English_contractions).
The list of stop words was constructed by merging the lists found in NLTK (http://www.nltk.org/), Gensim (http://radimrehurek.com/gensim/index.html), Mallet (http://mallet.cs.umass.edu/) and the Python machine learning toolkit scikit-learn (http://scikit-learn.org).
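A contraction table of the kind listed under "List of Contractions" can be applied to text with a single regular-expression substitution. The sketch below shows one way to do so; it is not necessarily how the table was applied in the manuscript. It uses only a handful of entries copied from that list, and the function name expand_contractions is hypothetical. Note that the expansion is emitted in lower case, which is an assumption about the desired preprocessing.

```python
import re

# A few entries copied from the contraction list; the full table is much longer.
CONTRACTIONS = {
    "she'll": "she will",
    "don't": "do not",
    "should've": "should have",
    "won't": "will not",
    "we've": "we have",
}

# Longest keys first, so that a longer contraction (e.g. "she'll've") would be
# preferred over its prefix ("she'll") if both were in the table.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(key) for key in sorted(CONTRACTIONS, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text):
    """Replace every known contraction in text by its lower-case expansion."""
    return _PATTERN.sub(lambda match: CONTRACTIONS[match.group(0).lower()], text)


expanded = expand_contractions("We've seen that they don't agree")
```

Stop words could then be removed from the expanded text by filtering its tokens against the merged stop word list.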
List of Contractions:
"she'll": 'she will', "shouldn't've": 'should not have', "she'll've": 'she will have', "don't": 'do not', "should've": 'should have', "won't": 'will not', "who'll've": 'who will have', "he's": 'he is', "when's": 'when is', "we've": 'we have', "he'd": 'he had', "ma'am": 'madam', "y'all're": 'you all are', "he'd've": 'he would ha...