Tens of thousands of data science research articles are published each year in academic and professional journals. Most are geared towards the usual suspects, like business and government applications of data science. However, some researchers are using data science to tackle subject areas that are a little unexpected.
ML Methods to Predict School Performance
Failing schools are a global problem. Despite many researchers’ use of ML methods to analyze educational indicators (measures of school performance), there is no consensus on how much impact socioeconomic factors have on poor student performance. A recent study by Brazilian researchers Joyce Maia and João Sato sought to address this gap by evaluating the factors that distinguish good schools from failing ones. As in the United States, the gap between rich and poor is increasing dramatically in Brazil, creating an urgent need to move towards access to quality education for all.
Maia and Sato generated different models based on Indice de Desenvolvimento da Educacao Basica (IDEB) data, a database created by the Brazilian Ministry of Education to monitor school performance. The non-linear model produced the most promising results, providing useful links between socioeconomic factors and educational indicators. The authors hope policymakers will use the study’s results to implement policy changes that reduce inequities throughout school systems.
Tracking Surgical Tools in the Operating Room
Image-based tracking of medical instruments like maneuverable operating room lights, endoscopic cameras, and insufflators (machines that blow powder or vapor into body cavities) is part of surgical data science. Historically, it’s been tough to generalize tracking methods from one operating room to the next, which means these vital instruments aren’t always available in the right place at the right time.
Researchers at the Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ) in Heidelberg, Germany, sifted through an array of challenging and graphic surgery images, using predictive analytics to assess exactly when these tools might be needed. Their goal was to use data science to give physicians the right assistance at the right time.
The result of the study was the Heidelberg Colorectal (HeiCo) data set, the first publicly available data set for “benchmarking of medical instrument detection and segmentation algorithms.” The data set contains 30 videos from three different surgery types, along with sensor data from various operating room medical devices.
Perspectives on ADHD from Twitter
Social media has the potential to shed light on medical conditions because people post about events as they happen, so researchers don’t have to rely on interviews conducted weeks or months later. Facebook group data and forum data have been used in the past to analyze health conditions, but a recent study analyzed Tweets to paint a picture of what it’s like to live with ADHD.
In a recent Journal of Data and Information Science article, researchers Michael Thelwall and colleagues highlighted a new iterative data science method called word association thematic analysis. The technique, developed to identify themes from sets of texts, was employed to analyze 58,893 ADHD-related Tweets collected with the query “My ADHD”. An additional 1,341,442 non-ADHD personal health-related Tweets were collected for comparison. The authors hope their analysis will complement existing tools, like personal interviews, for assessing health conditions.
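To give a feel for how comparing a target set of Tweets against a comparison set can surface candidate themes, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors’ implementation: the function names, the tokenizer, and the choice of a 2x2 chi-square statistic with a 3.84 cutoff (the 5% critical value for one degree of freedom) are all assumptions made for this example. It flags words that appear in a larger share of the target texts than of the comparison texts.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of distinct lowercase words in one text (hypothetical tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def doc_freq(texts):
    """Count, for each word, how many texts contain it at least once."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def overrepresented_words(target, comparison, min_chi2=3.84):
    """List (word, chi2) pairs for words found in a larger share of target texts.

    Assumption: no correction for multiple comparisons is applied here,
    which a real analysis over thousands of words would need.
    """
    target_counts = doc_freq(target)
    comparison_counts = doc_freq(comparison)
    n_target, n_comparison = len(target), len(comparison)
    results = []
    for word, a in target_counts.items():
        c = comparison_counts.get(word, 0)
        # Keep only words that are proportionally more common in the target set.
        if a / n_target <= c / n_comparison:
            continue
        chi2 = chi_square_2x2(a, n_target - a, c, n_comparison - c)
        if chi2 >= min_chi2:
            results.append((word, chi2))
    return sorted(results, key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Toy, made-up texts purely for demonstration.
    target = ["my adhd makes me forget things"] * 10
    comparison = ["my cold makes me tired"] * 10
    for word, score in overrepresented_words(target, comparison):
        print(word, round(score, 2))
```

In a workflow like the one described, a list such as this would only be a starting point: a researcher would still read Tweets containing each flagged word to decide what theme, if any, the word points to.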
Therapeutics for Rare Diseases
The use of ML to study rare diseases and “neglected” public health problems like suicide hasn’t been at the forefront of data science until now. Researchers at the Navarra Institute for Health Research in Pamplona, Spain, hypothesized that ML could be used to discover and design new therapeutics for rare and neglected diseases.
Together, rare diseases affect millions of people, yet drugs are few and far between. For example, hepatocellular carcinoma, a rare form of liver cancer, causes 62,000 deaths every year, but a disease-specific therapy isn’t yet available. The authors studied possible therapeutics for this disease and two others using database analysis, the interactions of 12,000 compounds with targets, and next-generation sequencing.
The same ML techniques were used to assess the “neglected” public health concern of suicide. Registry data from tens of thousands of Danish people was analyzed with classification trees and random forests. The results revealed that sex, general health, and psychiatric disorders were important factors in suicide risk.
Image: By Frp17580 - Own work, Public Domain, https://commons.wikimedia.org/w/index.php?curid=5244864