DataCite Commons: Raw frequency data: Thoughts on "Reliable" Learner's Vocabularies for Classical and Literary Chinese

This dataset contains the raw frequency counts (classical_chinese_learners_vocabularies_raw_frequencies.zip) used in the article Thoughts on “Reliable” Learner’s Vocabularies for Classical and Literary Chinese. The counts cover three corpora:

Corpus I – Michael Loewe’s (1993) Early Chinese Texts
Corpus II – Official Histories (zhengshi 正史)
Corpus III – Six Novels (xiaoshuo 小說), as defined in Hsia 1968

The download includes one folder per corpus, structured as follows:

xx_corpus.csv > list of texts and sources / versions used, token and type counts
xx_freq_1-1.csv > unigram / character frequencies and counts
xx_freq_1-4.csv > 1- to 4-character word frequencies and counts ("words" according to Hanyu da cidian 漢語大詞典 (Luo 1986–1994))
xx_freq_2-4.csv > 2- to 4-character words

Additionally, pca_zhengshi_vs_loewe_vs_xiaoshuo.html is an interactive version of the Principal Component Analysis (PCA) presented in the article, in which texts from the three corpora are represented using the 1,000 most frequent 1–4 character combinations from the dataset.
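The file naming scheme described above can be used to sort the contents of the download by kind. The following is a minimal sketch, not part of the dataset: it assumes the "xx" in the file names stands for a per-corpus prefix, and the prefixes used in the usage note ("loewe", "zhengshi") are hypothetical. The column layout of the CSV files is not documented here, so the sketch only classifies file names and does not parse their contents.

```python
import re
import zipfile

# Suffix -> description, taken from the dataset description.
FILE_KINDS = {
    "corpus": "list of texts, sources / versions used, token and type counts",
    "freq_1-1": "unigram / character frequencies and counts",
    "freq_1-4": "1- to 4-character word frequencies and counts",
    "freq_2-4": "2- to 4-character word frequencies and counts",
}

# Assumption: file names look like "<prefix>_corpus.csv" or "<prefix>_freq_N-M.csv".
_PATTERN = re.compile(r"^(?P<prefix>.+?)_(?P<kind>corpus|freq_\d-\d)\.csv$")


def classify(filename):
    """Return (prefix, kind, description), or None if the name does not fit the scheme."""
    name = filename.rsplit("/", 1)[-1]  # drop any folder part
    match = _PATTERN.match(name)
    if match is None or match.group("kind") not in FILE_KINDS:
        return None
    kind = match.group("kind")
    return match.group("prefix"), kind, FILE_KINDS[kind]


def classify_archive(zip_path):
    """Classify every member of the downloaded zip archive that fits the scheme."""
    with zipfile.ZipFile(zip_path) as archive:
        results = {}
        for member in archive.namelist():
            info = classify(member)
            if info is not None:
                results[member] = info
        return results
```

For example, classify("loewe/loewe_freq_1-4.csv") would return the prefix "loewe", the kind "freq_1-4", and the matching description, while the PCA file pca_zhengshi_vs_loewe_vs_xiaoshuo.html is not a CSV and yields None.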

Version 1, published 2021 in Zenodo

Dataset, Chinese

https://doi.org/10.5281/zenodo.5638881
