This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Research Protocols, is properly cited. The complete bibliographic information, a link to the original publication on https://www.researchprotocols.org, as well as this copyright and license information must be included.
Cannabis use has increased in Canada since its legalization in 2018, including among pregnant women, who may be motivated to use cannabis to reduce symptoms of nausea and vomiting. However, a growing body of research suggests that cannabis use during pregnancy may harm the developing fetus. At the same time, patients increasingly seek medical advice from online sources, but these platforms may also spread anecdotal descriptions or misinformation. Given the possible disconnect between online messaging and evidence-based research about the effects of cannabis use during pregnancy, advice taken from social media could affect the health of mothers and their babies.
This study aims to quantify the volume and tone of English-language Twitter posts related to cannabis use in pregnancy from January 2012 to December 2021.
Following published frameworks for scoping reviews, we will collect publicly available Twitter posts that mention cannabis use during pregnancy. We will use the Twitter Application Programming Interface for Academic Research to extract data from these tweets, including public metrics such as the number of likes, retweets, and quotes, and we will characterize health effect mentions, sentiment, location, and users’ interests. These data will be used to quantify how cannabis use during pregnancy is discussed on Twitter and to build a qualitative profile of supportive and opposing posters.
The CHEO Research Ethics Board reviewed our project and granted an exemption in May 2021. As of December 2021, we have gained approval to use the Twitter Application Programming Interface for Academic Research and have developed a preliminary search strategy that returns over 3 million unique tweets posted between 2012 and 2021.
Understanding how Twitter is being used to discuss cannabis use during pregnancy will help public health agencies and health care providers assess the messaging patients may be receiving and develop communication strategies to counter misinformation, especially in geographical regions where legalization is recent or imminent. Most importantly, we foresee that our findings will assist expectant families in making informed choices about where they access advice about using cannabis during pregnancy.
Open Science Framework 10.17605/OSF.IO/BW8DA; www.osf.io/6fb2e
PRR1-10.2196/34421
Recreational cannabis use has increased in Canada since its legalization in 2018, including among pregnant women [
Pregnant patients increasingly seek medical and health advice on online platforms, especially for emerging topics like cannabis use [
Given the possible disconnect between online messaging and evidence-based research about the effects of cannabis use during pregnancy, advice taken from social media may contain inaccuracies that could affect the health of mothers and their babies. Here, we propose a systematic search of Twitter to quantify the volume and tone of posts on the platform related to cannabis use in pregnancy. Twitter is a global platform, and our findings may be relevant in Canada, the United States, and other jurisdictions where access to and availability of cannabis are increasing due to legalization. We will assess regional correlations in these data to determine whether changes in the legal status of nonmedical cannabis affect online messaging about its use during pregnancy in Canada and in US states that have legalized recreational cannabis.
With reference to Arksey and O’Malley’s [
Step 1: Identifying the research question
Step 2: Identifying relevant Twitter posts
Step 3: Selecting eligible Twitter posts
Step 4: Charting the data
Step 5: Collating, summarizing, and reporting the results
Past research from Cavazos-Rehg et al [
How is cannabis use during pregnancy discussed on Twitter regarding the volume, tone, content, and authors/users?
Our search strategy will follow an iterative approach according to our population, concept, and context of interest (
Population: Twitter posts containing information relevant to pregnancy or pregnant individuals
Concept: Discussion or mention of cannabis use in relation to pregnancy or the developing fetus
Context: All English-language Twitter posts (tweets) made from January 2012 to December 2021. Geographical analyses will be restricted to Canada and US states where recreational cannabis use is legal.
Pregnancy-related keywords: Pregnancy, pregnant, baby, fetus, fetal, prenatal, perinatal, womb, preggo, “pregnant life,” “baby bump,” “mom to be,” “mommy to be,” “baby on the way,” “preggers,” “pregnant af”
Cannabis-related keywords: cannabis, weed, pot, marijuana, marihuana, MJ, ganja, purp, bud, keef, kief, dope, “mary jane,” thc, cbd, cannamom, opiate, mdma, ecstasy, mmj, medical marijuana, blunt, bong, budder, hash, hemp, indica, kush, reefer, sativa
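To illustrate how two keyword groups like these can be combined into a single full-archive search query, the sketch below joins the terms within each group with OR and places the two groups side by side, which the Twitter API v2 query syntax treats as AND. It is a minimal sketch, not the authors’ actual query: the keyword subsets, the `build_query` and `quote_term` helper names, and the 1024-character limit are assumptions for illustration; `lang:en` reflects the English-language restriction above.

```python
# Illustrative subsets of the keyword lists above; the full lists would be used in practice.
PREGNANCY_TERMS = ["pregnant", "prenatal", "baby bump", "mom to be"]
CANNABIS_TERMS = ["cannabis", "weed", "marijuana", "mary jane"]

# Assumed maximum query length for Academic Research full-archive search.
MAX_QUERY_LENGTH = 1024


def quote_term(term: str) -> str:
    """Wrap multi-word phrases in double quotes so they are matched as exact phrases."""
    return f'"{term}"' if " " in term else term


def build_query(pregnancy_terms, cannabis_terms, max_length=MAX_QUERY_LENGTH):
    """OR the terms within each group; the space between the two groups acts as AND."""
    pregnancy = " OR ".join(quote_term(t) for t in pregnancy_terms)
    cannabis = " OR ".join(quote_term(t) for t in cannabis_terms)
    query = f"({pregnancy}) ({cannabis}) lang:en"
    if len(query) > max_length:
        raise ValueError(f"query has {len(query)} characters; split the keyword lists")
    return query


print(build_query(PREGNANCY_TERMS, CANNABIS_TERMS))
# (pregnant OR prenatal OR "baby bump" OR "mom to be") (cannabis OR weed OR marijuana OR "mary jane") lang:en
```

If the full lists exceed the length limit, one option is to split one of the groups across several queries and merge the results afterward.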
Following the Twitter Archive search, we will preprocess the corpora to filter out content unrelated to
Note that most irrelevant tweets are pruned by Sentence-BERT during the preprocessing phase (
Overview of the proposed data collection methodology, preprocessing, and analytical process for tweets about cannabis use during pregnancy. SBERT: Sentence Bidirectional Encoder Representations from Transformers.
Cannabis during pregnancy
Kids, children, and youth smoking cannabis
Smoking cannabis while pregnant
Medical cannabis for people
The effects of cannabis on pregnant women
Legalization of cannabis
Smoking or consuming drugs during pregnancy
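The protocol does not detail how Sentence-BERT prunes irrelevant tweets during preprocessing. One plausible approach, offered as a sketch rather than the authors’ procedure, embeds a set of reference phrases (such as those listed above) together with the tweets and keeps only tweets whose highest cosine similarity to any reference phrase meets a threshold. The toy three-dimensional vectors stand in for real sentence embeddings (which could come from `SentenceTransformer.encode` in the sentence-transformers library), and the 0.5 threshold and the `filter_relevant` name are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_relevant(tweet_vecs, seed_vecs, threshold=0.5):
    """Return indices of tweets whose best match among the seed phrases meets the threshold."""
    keep = []
    for i, tweet in enumerate(tweet_vecs):
        best = max(cosine(tweet, seed) for seed in seed_vecs)
        if best >= threshold:
            keep.append(i)
    return keep


# Toy vectors standing in for embeddings of two reference phrases and three tweets.
seeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
tweets = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.1, 0.8, 0.2]]
print(filter_relevant(tweets, seeds))  # [0, 2]
```

In practice the threshold would need to be tuned against manually labeled tweets, since a value that is too low retains off-topic uses of ambiguous keywords.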
Data charting will include an automated analysis of all tweets returned by our search. A manual analysis will then be conducted on the smaller subset of tweets included during the process outlined in Step 2.
Using the Twitter API for Academic Research [
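As an illustration of the automated extraction step, the sketch below flattens one tweet object in the Twitter API v2 response shape into a record holding the public metrics used later in the analysis. The `public_metrics` field names (`like_count`, `retweet_count`, `quote_count`, `reply_count`) follow the v2 tweet object; the `flatten_tweet` helper and the example tweet are hypothetical and made up for illustration.

```python
def flatten_tweet(tweet: dict) -> dict:
    """Turn one Twitter API v2 tweet object into a flat record for analysis."""
    metrics = tweet.get("public_metrics", {})
    return {
        "id": tweet["id"],
        "created_at": tweet.get("created_at"),
        "author_id": tweet.get("author_id"),
        "likes": metrics.get("like_count", 0),
        "retweets": metrics.get("retweet_count", 0),
        "quotes": metrics.get("quote_count", 0),
        "replies": metrics.get("reply_count", 0),
        "text": tweet.get("text", ""),
    }


# A made-up tweet object in the v2 response shape
# (as returned when requesting tweet.fields=created_at,author_id,public_metrics).
example = {
    "id": "1",
    "text": "example text",
    "created_at": "2019-05-01T12:00:00.000Z",
    "author_id": "42",
    "public_metrics": {"retweet_count": 2, "reply_count": 0, "like_count": 5, "quote_count": 1},
}
print(flatten_tweet(example)["likes"])  # 5
```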
Three independent reviewers will manually review the smaller subset of randomly sampled tweets. We will verify the number of favorites and retweets each tweet has received against the automatic data collection via the API. We will use publicly available user lists to determine the category of organization or individual user that posted the tweet (government or public health agency, obstetrical society/network, university, hospital, news outlet, cannabis industry source, or other individual) [
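Because three reviewers will work independently on the same randomly sampled subset, the sample should be reproducible. A minimal sketch, assuming tweet IDs are available as strings: drawing the sample from a seeded random number generator over the sorted, deduplicated IDs yields the same subset for anyone who reruns it. The `sample_for_review` name, the seed, and the sample size are hypothetical.

```python
import random


def sample_for_review(tweet_ids, n, seed=2021):
    """Draw a reproducible simple random sample of tweet IDs without replacement.

    Sorting the unique IDs first makes the sample independent of input order,
    so rerunning with the same seed always returns the same subset.
    """
    ids = sorted(set(tweet_ids))
    rng = random.Random(seed)
    return rng.sample(ids, min(n, len(ids)))


subset = sample_for_review([str(i) for i in range(1000)], n=5)
print(subset)
```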
Separately, we will extract census or population-level data on birth rates and maternal and infant mortality rates across the study period in Canada and the United States. Twitter data have been shown to be a good proxy for inferring health-related statistics, including teenage birth rates [
We will first report the total number of tweets returned by the search and temporal trends in tweet volume over the study period. Next, we will report the number of tweets sampled in the automated and manual analyses. From the automated analysis, we will report the number and percentage of returned posts that discuss cannabis use during pregnancy positively or negatively, as determined by our sentiment analysis. We will then calculate the standardized mean difference in the number of favorites and retweets received by positive and negative tweets, and compute odds ratios for positive posts originating from each category of organization or individual user and for positive posts mentioning health effects. We will also calculate the number of times each health effect was mentioned as a percentage of all health effect mentions. These statistics will be presented in tables.
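The summary statistics described above can be sketched as follows. The protocol does not specify which standardized mean difference will be used; this sketch assumes Cohen’s d with a pooled sample standard deviation, one common choice. The odds ratio is computed from a 2×2 table of tweet counts, and the health effect shares are simple percentages. All function names and numbers are hypothetical and for illustration only.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (Cohen's d) with a pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table of tweet counts.

                          positive   negative
        category of user      a          b
        all other users       c          d
    """
    if b * c == 0:
        raise ValueError("b or c is zero, so the odds ratio is undefined; consider a continuity correction")
    return (a * d) / (b * c)


def mention_shares(counts):
    """Each health effect's mentions as a percentage of all health effect mentions."""
    total = sum(counts.values())
    return {effect: 100 * n / total for effect, n in counts.items()}


# Made-up numbers for illustration only.
likes_positive = [1, 2, 3]
likes_negative = [2, 3, 4]
print(cohens_d(likes_positive, likes_negative))  # -1.0
print(odds_ratio(10, 20, 5, 40))  # 4.0
print(mention_shares({"effect A": 3, "effect B": 1}))  # {'effect A': 75.0, 'effect B': 25.0}
```

Engagement counts such as likes are typically highly skewed, so a variant on log-transformed counts may be worth considering; that choice is left open here.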
The location-based component of our analysis will be restricted to tweets that include location data and originate from Canada or from US states where nonmedical cannabis is legal, as these are the only English-speaking regions that have legalized the sale of nonmedical cannabis. If any other regions (eg, New Zealand or the United Kingdom) legalize cannabis before our analysis is conducted, this restriction will be expanded to include them. We will match location data from these jurisdictions to each tweet’s timestamp to calculate the proportion of tweets originating from each of our predefined geographical regions for each week of the search period. Next, we will plot the volume of posts over time for each region as a line graph, with a marker indicating when that region legalized cannabis. We will plot the percentage of positive posts over time in the same way. We will then use a repeated cross-sectional design to determine whether trends in cannabis messaging on Twitter correlate with population-level vital statistics, namely birth rates and maternal and infant mortality rates.
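The weekly regional proportions described above can be computed by binning tweets into ISO weeks and dividing each region’s count by the week’s total. This is a minimal sketch under two assumptions not stated in the protocol: the denominator is all located tweets in that week (with tweets outside the study regions labeled "other"), and timestamps are API-style UTC strings. The `weekly_region_shares` name and the example data are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def weekly_region_shares(located_tweets):
    """Share of located tweets per ISO week that comes from each region.

    `located_tweets` holds (created_at, region) pairs, where created_at is a UTC
    timestamp such as "2018-10-17T12:00:00.000Z" and region is a label such as
    "Canada", "Colorado", or "other". Tweets without usable location data are
    assumed to have been dropped already.
    """
    per_week = defaultdict(Counter)
    for created_at, region in located_tweets:
        # Older Python versions' fromisoformat() does not accept a trailing "Z".
        timestamp = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
        iso_year, iso_week, _ = timestamp.isocalendar()
        per_week[(iso_year, iso_week)][region] += 1
    shares = {}
    for week, counts in sorted(per_week.items()):
        total = sum(counts.values())
        shares[week] = {region: n / total for region, n in counts.items()}
    return shares


# Made-up tweets around Canada's legalization date (October 17, 2018).
example_tweets = [
    ("2018-10-17T12:00:00.000Z", "Canada"),
    ("2018-10-18T09:30:00.000Z", "Canada"),
    ("2018-10-19T08:00:00.000Z", "other"),
    ("2018-10-23T10:00:00.000Z", "Canada"),
]
for week, shares in weekly_region_shares(example_tweets).items():
    print(week, shares)
```

The resulting week-by-region series is the kind of input a line graph with a legalization marker, or a correlation with vital statistics, would be built on.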
In addition to these numerical analyses, we will develop qualitative profiles of influential accounts. These profiles will include elements such as the user’s background (eg, political leaning, socioeconomic status, or education/interests); their Twitter following; whether Twitter has verified their account as “authentic, notable, and active” [
This study was exempted from ethics review because it will collect and synthesize publicly available data and therefore does not require ethical approval.
Using our data collection method, combining the search_all_tweets function from Tweepy [
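If the keyword lists are split across several searches (for example, to keep each query under a length limit, which is an assumption here), the same tweet can be returned more than once, so results need deduplication before unique tweets are counted. A minimal sketch, assuming each batch is a list of tweet dictionaries in the raw v2 JSON shape (Tweepy’s `Tweet` objects expose the same fields through their `data` attribute); the `merge_unique` name is hypothetical.

```python
def merge_unique(*batches):
    """Merge batches of tweet objects, keeping the first copy of each tweet ID."""
    seen = set()
    merged = []
    for batch in batches:
        for tweet in batch:
            if tweet["id"] not in seen:
                seen.add(tweet["id"])
                merged.append(tweet)
    return merged


# Made-up results from two overlapping searches.
query_a = [{"id": "1"}, {"id": "2"}]
query_b = [{"id": "2"}, {"id": "3"}]
print([t["id"] for t in merge_unique(query_a, query_b)])  # ['1', '2', '3']
```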
Geographic distribution of geotagged tweets containing pregnancy- and cannabis-related keywords posted between January 1, 2012, and December 31, 2021.
Number of tweets per day related to cannabis in pregnancy, January 1, 2012, to December 31, 2021.
Frequency of cannabis-related keywords identified in tweets posted between January 1, 2012, and December 31, 2021.
Keyword | Count |
weed | 1,047,115 |
dope | 688,153 |
blunt | 556,865 |
pot | 399,444 |
keef | 356,605 |
marijuana | 183,409 |
bud | 161,328 |
bong | 116,876 |
kush | 99,916 |
thc | 44,970 |
hash | 44,906 |
cbd | 39,287 |
ecstasy | 33,989 |
hemp | 28,514 |
purp | 25,353 |
ganja | 24,641 |
indica | 8447 |
reefer | 6125 |
opiate | 4102 |
kief | 3092 |
mdma | 2459 |
mmj | 1386 |
budder | 643 |
marihuana | 637 |
cannamom | 40 |
medicalmarijuana | 13 |
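Keyword frequencies like those in the table can be tallied by matching each keyword against each tweet. The protocol does not state the counting rule; this sketch assumes case-insensitive, whole-word matching in which each tweet contributes at most once per keyword. The `keyword_counts` name and example tweets are hypothetical.

```python
import re
from collections import Counter


def keyword_counts(texts, keywords):
    """Count tweets containing each keyword as a whole word (case-insensitive).

    Each tweet contributes at most once per keyword.
    """
    patterns = {k: re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE) for k in keywords}
    counts = Counter()
    for text in texts:
        for keyword, pattern in patterns.items():
            if pattern.search(text):
                counts[keyword] += 1
    return counts


# Made-up tweets for illustration only.
texts = [
    "Smoking weed while pregnant?",
    "Pot roast for the baby shower",
    "No weed or THC during pregnancy",
]
counts = keyword_counts(texts, ["weed", "pot", "thc", "kush"])
print(counts["weed"], counts["pot"], counts["thc"], counts["kush"])  # 2 1 1 0
```

The "pot roast" match shows how ambiguous keywords can be counted in unrelated tweets, which is one reason raw keyword counts overstate relevant content.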
The semantic community detection algorithm detected 220 clusters among the 3,000,000 tweets in our corpus. We manually inspected the top 5 and bottom 5 tweets of each cluster and assigned the label that best described the topical context of those tweets. For example, we found 9 topical clusters related to
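To illustrate how a threshold-based semantic community detection step can produce clusters with identifiable "top" and "bottom" members, the sketch below is a simplified re-implementation of the general idea behind `util.community_detection` in the sentence-transformers library; it is not the library’s code and not necessarily the authors’ configuration. Each item’s neighbors above a cosine similarity threshold form a candidate community, ordered from most to least similar to that item; larger candidates are accepted first and overlapping ones are discarded. The toy two-dimensional vectors, the threshold, and the minimum size are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def community_detection(vectors, threshold=0.75, min_size=2):
    """Greedy, non-overlapping communities of items whose similarity meets a threshold."""
    n = len(vectors)
    sims = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    candidates = []
    for i in range(n):
        # Neighbors of item i (including i itself), most similar first.
        members = sorted(
            (j for j in range(n) if sims[i][j] >= threshold),
            key=lambda j: sims[i][j],
            reverse=True,
        )
        if len(members) >= min_size:
            candidates.append(members)
    # Accept larger candidate communities first; skip any that overlap an accepted one.
    candidates.sort(key=len, reverse=True)
    taken, communities = set(), []
    for members in candidates:
        if not taken.intersection(members):
            communities.append(members)
            taken.update(members)
    return communities


vectors = [[1.0, 0.0], [0.95, 0.05], [0.9, 0.1], [0.0, 1.0], [0.05, 0.95]]
print(community_detection(vectors, threshold=0.9, min_size=2))  # [[0, 1, 2], [3, 4]]
```

Because each community’s members are ordered by similarity to its central item, `members[:5]` and `members[-5:]` would give the kind of top and bottom tweets inspected for labeling.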
We expect to conclude this study in December 2022.
Topical contexts (clusters) identified from tweets collected about cannabis use during pregnancy.
Top 3 and bottom 3 tweets selected from the cluster “Cannabis exposure on infants.”a
No. | Paraphrased tweet |
1 | random |
2 | newborns test positive |
3 | |
45 | |
46 | pediatric doctor advises passing |
47 | expert on |
aItalicized words represent our set of query keywords.
This study will characterize how cannabis use in pregnancy is portrayed on Twitter, the content and origin of supportive posts, and how changes in legal status influence the volume and tone of posts related to cannabis in pregnancy. Our findings will help public health agencies, care providers, and other stakeholders develop policy strategies, and they will suggest future avenues for research. Our preliminary findings suggest that this work is feasible and that we have identified a sufficiently robust corpus of tweets for more detailed analyses.
Twitter is an extensive online platform to share news and opinions [
Besides Twitter, there are several online platforms used to share opinions, for instance, Facebook, Reddit, and Quora. To the best of our knowledge, only Facebook has been used to study people’s opinions on cannabis [
We will submit the final results of our review for publication in a peer-reviewed journal, present them at academic conferences, and share them through publicly available streams such as the professional and institutional social media accounts and webpages associated with the research team. The results will provide insight into how frequently and in what context Twitter is being used to discuss cannabis use during pregnancy. We anticipate that this knowledge will help public health agencies and health care providers assess the messaging patients may be receiving on Twitter and develop communication strategies to counter misinformation, especially in geographical regions where legalization is recent or imminent. Most importantly, we foresee that our findings will assist expectant families in making informed choices about where they access advice about using cannabis during pregnancy.
API: Application Programming Interface
BERT: Bidirectional Encoder Representations from Transformers
PRISMA-ScR: Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews
This work was supported by a Canadian Institutes of Health Research Team Grant awarded to DJC (funding reference CA3-170126). The authors would like to thank Roberto Ulloca for support with the Twitter Academic Application Programming Interface and Indira Sen for advice on state-of-the-art natural language processing techniques.
LC, LEN, MSQM, MK, and DJC conceptualized the study and designed the methodology. LC wrote the original draft. LC, LEN, MSQM, SR, MCW, MK, and DJC reviewed and edited the manuscript. DJC and MSQM supervised the study and acquired the funding. All authors read and approved the final version of this manuscript.
None declared.