The Opioid Epidemic and Ethical Data Warehousing


  • Data science and ML are being used to tackle the ongoing opioid epidemic.
  • Medical databases are rich with information but lack informed consent.
  • Heterogeneous data and ethical concerns pose significant challenges.
  • Social media mining is emerging as a tool, but it has ethical problems of its own.

The problem

The opioid epidemic remains a public health emergency for the U.S. [1], accounting for 50,000 deaths in 2019 [2] and accelerating during the Covid-19 pandemic [3].

Data science and machine learning methods have been used to analyze the issue, with promising results. Good data has been gathered directly from medical establishments, but ethical and practical issues may prevent the practice from becoming commonplace. Recent studies have turned to a relatively new type of data mining called Social Media Mining (SMM). However, SMM has ethical issues of its own.

Mining Medical Databases

Unprecedented efforts have been undertaken to collect and analyze big data gathered directly from multiple healthcare sites. However, this personal data collection is fraught with ethical problems. As an example, the Massachusetts Public Health Data Warehouse, which contains data on most adults living in the state, was assembled without the knowledge, input, or consent of most participants [4]. Previously distinct data systems were linked, with unsuspecting adults becoming subject to even more profiling and surveillance. In addition, the data collected was transformed into different formats for different machine learning models, which meant that multiple copies of private data existed [5], each with an uncertain lifespan.

Another major issue is the heterogeneity of the data gathered, which complicates analysis. For example, there is no standard terminology for encoding medical data: Drug A may have one code in Los Angeles and an entirely different code in Sacramento. In addition, a drug can come with different names and compositions, adding another layer of complexity to model building.
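To make the coding problem concrete, here is a minimal sketch of how site-specific drug codes and alternate names might be mapped onto a single canonical name before analysis. Everything in it is hypothetical: the site names, the codes, the alias table, and the helper names `harmonize` and `count_by_drug` are invented for illustration and do not come from any of the cited studies or from a real medical vocabulary.

```python
from collections import Counter

# Hypothetical lookup: (site, local code) -> canonical drug name.
# These codes are invented; a real project would use a curated vocabulary.
LOCAL_TO_CANONICAL = {
    ("los_angeles", "A-100"): "drug_a",
    ("sacramento", "77-X"): "drug_a",
    ("los_angeles", "B-200"): "drug_b",
}

# Hypothetical brand or alternate names that refer to the same drug.
NAME_ALIASES = {
    "brandname a": "drug_a",
    "drug a": "drug_a",
}


def harmonize(record):
    """Return the canonical drug name for a record, or None if unmapped."""
    key = (record.get("site"), record.get("code"))
    if key in LOCAL_TO_CANONICAL:
        return LOCAL_TO_CANONICAL[key]
    # Fall back to the free-text name, normalized for case and whitespace.
    name = (record.get("name") or "").strip().lower()
    return NAME_ALIASES.get(name)


def count_by_drug(records):
    """Count records per canonical drug and collect those that cannot be mapped."""
    counts = Counter()
    unmapped = []
    for record in records:
        drug = harmonize(record)
        if drug is None:
            unmapped.append(record)
        else:
            counts[drug] += 1
    return counts, unmapped
```

Keeping the unmapped records in a separate list, rather than silently dropping them, makes it visible how much of the data the mapping tables fail to cover.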

Social Media Mining

A relatively new field called Social Media Mining (SMM) combines social media and big data. SMM works in the same way as data mining, but is confined to platforms such as Facebook, Twitter and Instagram. For example, the Twitter Streaming API allows users to analyze Tweets that match certain keywords [6]. In addition, one of the largest social science datasets ever constructed, an exabyte (a billion gigabytes) of data consisting of 10 trillion summary statistics from Facebook, can be accessed by academic researchers through Social Science One [7].
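Keyword matching of this kind can be sketched in a few lines. The example below does not call the Twitter Streaming API or any other platform API. It only illustrates, under stated assumptions, how already-collected post text might be filtered against a keyword list. The keyword list and the function names `build_matcher` and `filter_posts` are hypothetical.

```python
import re

# Illustrative keyword list; a real study would curate brand names and slang terms.
KEYWORDS = ["heroin", "fentanyl", "oxycodone"]


def build_matcher(keywords):
    """Compile one case-insensitive pattern that matches any keyword as a whole word."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def filter_posts(posts, keywords=KEYWORDS):
    """Return only the posts whose text mentions at least one keyword."""
    matcher = build_matcher(keywords)
    return [post for post in posts if matcher.search(post)]
```

The word-boundary anchors keep a keyword such as "heroin" from matching inside the unrelated word "heroine", a simple guard against one common source of false positives.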

Researchers have derived a wealth of data on the opioid epidemic using SMM. For example:

  • Several studies have gathered Twitter data related to brand names, novel drugs, slang terms, medications, and mental health symptoms. One study showed a correlation between mentions of heroin and synthetic opioids on Twitter and opioid overdose deaths [8].
  • Deep learning models for emotion analysis of social media data have been combined with drug entities derived from cryptomarkets [9].
  • SMM has made it possible to shrink the time lag for statistics on opioid deaths. These statistics are generated from mortality data, which often has a reporting lag of over a year. Easily accessible and near-real-time social media data can shrink the time lag by months and improve public health surveillance efforts [8].
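As a rough illustration of how a relationship between social media mentions and overdose deaths might be quantified, the sketch below computes a Pearson correlation coefficient between two equal-length series, such as monthly mention counts and monthly death counts. This is an assumption about one possible approach, not the method used in the cited study [8], and the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length and at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient near 1 would indicate that the two series rise and fall together. A correlation alone does not show that one series predicts or causes the other.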

While SMM doesn't suffer from lack of explicit informed consent, there are some profound legal and ethical implications, including privacy considerations [9]. Social media mining has unclear boundaries between 'public' and 'private' spaces, as well as problems with ensuring the anonymity and privacy of subjects who may have unwittingly "chosen" to place their private information in the public domain [10].

It's clear from the research that data mining, in particular SMM, is a powerful tool to study trends in health-related epidemics such as the opioid crisis. But until meta-guidelines are created for ethics, researchers are entering a gray ethical and legal area and should collect, analyze, and store the data with an abundance of caution.


Image: Arbeck, CC BY 3.0, via Wikimedia Commons

[1] Renewal of Determination That A Public Health Emergency Exists

[2] Opioid Overuse Crisis

[3] Overdose Deaths Accelerating During Covid-19

[4] A qualitative study of big data and the opioid epidemic: recommenda…

[6] Social Media Mining: The Effects of Big Data In the Age of Social M…

[7] Unprecedented Facebook URLs Dataset now Available for Academic Rese…

[8] Understanding the evolving nature of the opioid epidemic and opioid…

[9] eDarkTrends: Harnessing Social Media Trends in Substance use di…

[10] Mining social media data: How are research sponsors and researchers…