DataCite Commons: 100 Days of Tweet IDs and Most Frequent Terms in Tweets from_user_id_str 25073877

This is an Excel workbook containing two sheets. The first sheet contains 503 rows, one per Tweet ID string, for Tweets from the account with from_user_id_str 25073877, together with the following metadata:

created_at time
user_lang
in_reply_to_user_id_str
from_user_id_str
in_reply_to_status_id_str
source
user_followers_count
user_friends_count

Tweet texts, URLs and other metadata such as profile_image_url, status_url and entities_str have not been included.

An attempt was made to remove duplicated entries, but some duplicates might remain, so further data refining might be required prior to analysis.
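As an illustration of the further refining mentioned above, the following is a minimal sketch of one way to drop repeated rows once the first sheet has been loaded into a list of dictionaries. The column name id_str, the function name drop_duplicate_tweets and the sample rows are assumptions made for this example; they are not taken from the workbook, and this is not necessarily how the original de-duplication was attempted.

```python
def drop_duplicate_tweets(rows, key="id_str"):
    """Keep the first row seen for each Tweet ID, preserving row order.

    `key` is an assumed column name; adjust it to the sheet's actual header.
    """
    seen = set()
    unique = []
    for row in rows:
        # Compare IDs as trimmed strings, since the sheet stores them as strings.
        tweet_id = str(row[key]).strip()
        if tweet_id in seen:
            continue
        seen.add(tweet_id)
        unique.append(row)
    return unique


# Hypothetical sample rows, not real data from the workbook.
rows = [
    {"id_str": "1001", "source": "web"},
    {"id_str": "1002", "source": "mobile"},
    {"id_str": "1001 ", "source": "web"},
]
unique_rows = drop_duplicate_tweets(rows)
print(len(rows) - len(unique_rows), "duplicate row(s) removed")
```

Keeping the IDs as strings rather than converting them to numbers avoids any loss of precision on long identifiers.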

The second sheet contains 400 rows corresponding to the most frequent terms in the texts of the dataset's Tweets. The text analysis was performed with the Terms Tool from Voyant Tools by Stéfan Sinclair & Geoffrey Rockwell (2017). An edited English stop words list was applied to remove Twitter-specific terms such as t.co, https, user names, etc. The analysed Tweets contained emojis and other special characters; due to character encoding, these are reflected in the terms list as character combinations.
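For readers who want to approximate this kind of term count outside Voyant Tools, the sketch below counts word frequencies after removing URLs, user names and stop words. It is not the Voyant Terms Tool and will not reproduce its results exactly; the stop word set, the function name most_frequent_terms and the sample texts are all assumptions for illustration (the Tweet texts themselves are not part of this dataset). Unlike the published terms list, this tokenizer simply discards emojis and other non-ASCII characters.

```python
import re
from collections import Counter

# Illustrative stop words only; the edited list used for the dataset is not reproduced here.
STOP_WORDS = {"the", "a", "and", "to", "of", "in", "is", "for", "https", "http", "rt", "amp"}


def most_frequent_terms(texts, stop_words=STOP_WORDS, limit=400):
    """Return up to `limit` (term, count) pairs, most frequent first."""
    counts = Counter()
    for text in texts:
        text = text.lower()
        text = re.sub(r"https?://\S+", " ", text)  # drop links such as t.co URLs
        text = re.sub(r"@\w+", " ", text)          # drop user names
        for token in re.findall(r"[a-z0-9']+", text):
            if token not in stop_words:
                counts[token] += 1
    return counts.most_common(limit)


# Hypothetical sample texts, not taken from the archived Tweets.
texts = [
    "Great meeting today https://t.co/abc",
    "A great day for the team @example",
    "Meeting the team today",
]
terms = most_frequent_terms(texts)
for term, count in terms:
    print(term, count)
```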

Motivations to Share this Data

Archived Tweets can provide interesting insights for the study of contemporary history of media, politics, diplomacy, etc. The queried account is a public account widely agreed to be of exceptional national and international public interest. Though they provide public access to tweeted content in real time, Twitter's Web and mobile clients are not suited to Tweet corpus analysis. For anyone researching social media, access to the data is essential in order to perform, review and reproduce studies.

Archiving Tweets of public interest due to their historic significance is a means to both preserve and enable reproducible study of this form of rapid online communication that otherwise can very likely become unretrievable as time passes. Due to Twitter's current business model and API limits, to date collecting in real time is the only relatively reliable method to archive Tweets at a small scale.

So far, Twitter data analysis and visualisation have been done without researchers providing access to the source data that would allow reproducibility. It is appreciated that an Excel workbook is far from ideal as a file format, but due to the small scale the intention is to make this data hu...

Dataset published 2017 in figshare Academic Research System

Dataset, Political science, Computer and information sciences

https://doi.org/10.6084/m9.figshare.4955231

100 Days of Tweet IDs and Most Frequent Terms in Tweets from_user_id_str 25073877
