In France, the Levothyrox case is a pharmacovigilance case that received extensive media coverage in August 2017, following the change, in March of the same year, of the drug's pharmaceutical formulation (modification of the type and content of excipients at the initial request of the French national agency for drug safety, ANSM). Many patients who were already treated and stabilized with Levothyrox experienced adverse effects related to fluctuating thyroid function (3 million French patients with thyroid pathologies). This dataset presents patient posts collected from the Thyroid and Endocrine Problems sub-forum of the French health forum Doctissimo using a web scraping algorithm. The main idea is to determine whether the results of processing the data exchanged on the Doctissimo forum can be used in pharmacovigilance, and whether they would have allowed a faster reaction from the public authorities and the laboratories responsible.
Doctissimo was chosen because it is the health forum most used in France by drug-consuming patients (ranking first with 61\% of users). Other sources of information could have been chosen, such as Twitter, which is the website most used worldwide by drug-consuming patients: Twitter brings together 52\% of these patients, against 27\% for all discussion forums combined. However, access to Twitter data is paid, which is why Doctissimo was selected. The extraction was performed on the ``Thyroide et Problemes Endocriniens (Thyroid and Endocrine Problems)'' sub-forum with the keyword ``levothyrox''. Restricting the extraction to this sub-forum and this keyword limited the amount of extracted data: Doctissimo detects automated activity and blocks the scraping task once 8,000 discussion threads have been extracted. We collected the messages written between 2000 and 2020, which resulted in a total of 110,260 comments written by 7,650 subjects. For each comment, we extracted the date, the pseudonym of the author, the comment's text, and its URL.
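As an illustration of the record structure described above, the following minimal Python sketch models one comment with the four extracted fields and counts comments and distinct authors within the 2000 to 2020 collection window. The class name, field names, helper functions, and sample URLs are hypothetical and chosen for illustration only; they are not the dataset's actual schema or the scraping code that was used.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ForumPost:
    """One scraped comment. Field names are illustrative assumptions,
    not the actual column names of the dataset."""
    posted_on: date   # date of the comment
    pseudonym: str    # forum pseudonym of the author
    text: str         # body of the comment
    url: str          # link to the comment or its thread


def in_collection_window(post, first_year=2000, last_year=2020):
    """Return True if the post falls inside the collection period."""
    return first_year <= post.posted_on.year <= last_year


def summarize(posts):
    """Count comments and distinct authors inside the collection window."""
    kept = [p for p in posts if in_collection_window(p)]
    return {
        "comments": len(kept),
        "subjects": len({p.pseudonym for p in kept}),
    }


# Made-up sample records, for demonstration only.
sample = [
    ForumPost(date(2017, 8, 20), "user_a", "Fatigue since the new formula", "https://example.org/t/1"),
    ForumPost(date(2017, 9, 2), "user_a", "TSH is unstable again", "https://example.org/t/1"),
    ForumPost(date(2018, 1, 5), "user_b", "Same symptoms here", "https://example.org/t/2"),
    ForumPost(date(2021, 3, 1), "user_c", "Outside the window", "https://example.org/t/3"),
]
summary = summarize(sample)
```

Applied to the full dataset, a summary of this kind would be expected to yield the comment and subject totals reported above, assuming each subject is identified by a unique pseudonym.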