DSC Weekly Digest 12 October 2021

Facebook, Social Media, and Jumping Sharks

Announcements

Build statistical and analytical expertise as well as the management and leadership skills necessary to implement high-level, data-driven decisions in Northwestern's Online MS in Data Science. Earn your degree entirely online in classes that are led by industry experts who are redefining how data is used to boost efficiency and effectiveness in a wide range of fields. Learn more
Get to know TIBCO's enterprise analytics platform that allows data scientists and business users to collaborate on advanced analytics using massively scalable in-database and in-cluster processing. Click here for more info

Spooky Scary Data Science Skeletons

October is the spookiest time of the year, when the ghosts and witches are out in force, there’s a chill in the air as gray clouds gather, and pumpkin-flavored, well, just about everything seems ubiquitous. I blame a particular Seattle coffee chain for the last one, but there’s something about moving into Fall that focuses one’s mind on the spooky scary skeletons lurking underneath the bed.

In the realm of the data scientist, there are more than a few skeletons hiding in the closets as well. These are the things that keep analysts up at night, and no matter how well prepared you may be, these jump scares are enough to send anyone screaming.

Data Quality Demons. The business manager assured you that the data is great and has everything you could ever need. Yet when you pry the lid off the coffin and stare at the mouldering remains of software projects past, you get the creeping sensation that perhaps the manager was a bit … optimistic … in that assessment. Inconsistencies in spelling, arbitrary placeholders, lists of items stored as single strings, differing date and currency conventions, and data type errors can usually be dispelled with intelligent analysis software. The bigger demons come from cardinality misunderstandings, a failure to account for change in data over time, duplications with subsequent edits creating phantom information, and similar errors that can be difficult to catch and even harder to fix.
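
For the smaller demons, a few lines of code go a long way. The following is a minimal sketch in Python, not a complete cleaning pipeline: the placeholder tokens, the separator characters, and the function names are all assumptions made for illustration, and any real dataset will need its own list.

```python
import re

# Hypothetical placeholder tokens; every dataset tends to invent its own.
PLACEHOLDERS = {"", "n/a", "na", "none", "null", "tbd", "-", "unknown"}


def clean_value(raw):
    """Return None for placeholder values, otherwise the trimmed string."""
    text = raw.strip()
    return None if text.lower() in PLACEHOLDERS else text


def split_list(raw, seps=r"[;,|]"):
    """Split a list stored as a single string into cleaned, non-empty items."""
    items = (clean_value(part) for part in re.split(seps, raw))
    return [item for item in items if item is not None]


def find_duplicate_keys(rows, key):
    """Return normalized key values that appear in more than one row."""
    seen, dupes = set(), set()
    for row in rows:
        normalized = row[key].strip().lower()
        if normalized in seen:
            dupes.add(normalized)
        else:
            seen.add(normalized)
    return sorted(dupes)
```

Checks like these only surface the easy problems. The cardinality and change-over-time demons still call for someone who understands how the data was produced.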

Sparse Metadata Monsters. These are subtler issues, arising when data was collected primarily to facilitate fast transactions and so carries minimal metadata about those transactions. The missing pieces include dimensional units (length, currency, and count units; three books are not the same as three cars), the time over which a certain entity exists within the system, and the provenance of the data (who entered it, why did they enter it, how valid is it, where is the source of record for that data). This metadata often determines the reliability of the data.
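
One way to keep these monsters at bay is to make the metadata travel with the value. The sketch below is a hypothetical Python illustration rather than a recommended schema: the `Measurement` class and its field names are assumptions, chosen only to show units and provenance being checked instead of silently dropped.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Measurement:
    """A value that carries its unit and provenance (hypothetical fields)."""
    value: float
    unit: str         # e.g. "book", "car", "USD", "m"
    source: str       # system of record the value came from
    entered_by: str   # who entered it
    valid_from: date  # start of the period in which the value applies

    def __add__(self, other):
        if not isinstance(other, Measurement):
            return NotImplemented
        if self.unit != other.unit:
            # Three books plus three cars is not six of anything.
            raise ValueError(f"cannot add {self.unit!r} to {other.unit!r}")
        # A sum no longer has a single origin, so mark it as derived.
        return Measurement(
            self.value + other.value,
            self.unit,
            source="derived",
            entered_by="derived",
            valid_from=max(self.valid_from, other.valid_from),
        )
```

Adding two `Measurement` objects with the same unit returns a new one marked as derived, while mixing units raises a `ValueError` instead of quietly producing a meaningless number.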

Modeling Mayhem. A recent preprint article about COVID-19 vaccine efficacies in Wisconsin made a modeling assumption about the number of people who had been vaccinated in the state. It turned out that the number was off by a factor of 100, and what had seemed like a strong statistical case against the vaccine became instead a strong case for the vaccine. These kinds of modeling errors can break careers.

Bias Boggarts. Sampling by its very nature can be fraught with gotchas. Is the sample representative of the overall population? What hidden assumptions were made about the questions being asked or the means by which the information is gathered? For a long time, surveys were conducted over landlines, until a statistician realized that a growing number of people had abandoned them in favor of mobile phones, and those who were left were older, more conservative, and likely wealthier, skewing everything from product marketing to politics.
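
The landline problem can be made concrete with a small reweighting exercise. This is a minimal sketch of post-stratification in Python, a standard correction swapped in here for illustration rather than anything the surveys above are known to have used. The function names are hypothetical, and it assumes every population group has at least one respondent in the sample.

```python
def raw_mean(responses):
    """Unweighted mean of (group, value) survey responses."""
    return sum(value for _, value in responses) / len(responses)


def poststratified_mean(responses, population_shares):
    """Weight each group's mean by its population share, not its sample share.

    Assumes every group in population_shares appears in responses and that
    the shares sum to 1.
    """
    totals, counts = {}, {}
    for group, value in responses:
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    return sum(
        population_shares[group] * totals[group] / counts[group]
        for group in counts
    )
```

As an invented example, take ten responses where eight come from landline users (six answering yes) and two from mobile users (one answering yes). The raw mean is about 0.70, but if landline users are only 40% of the population the post-stratified estimate drops to about 0.60.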

Interpretation Imps. Having created a model and run the data, ultimately the question is how to interpret the results, and it is here that the imps of the perverse delight in ruining a data scientist’s day. Are the conclusions supported by the analysis? Is it possible that those who have commissioned the analysis will ignore all of the caveats about probabilities and will treat the results as absolute statements? (Yes.) Will people justify their own agendas based upon your conclusions, even when the conclusions do not support those agendas at all? Oh, definitely.

Data Science can be fun and exciting, but it can also be filled with deadly traps and snarling beasts. Sometimes the best that you can do is to be aware of all the goblins and ghoulies, and of course, read Data Science Central.

Goodnight, sleep tight … don’t let the bedbugs bite!

Kurt Cagle
Community Editor,
Data Science Central

To subscribe to the DSC Newsletter, go to Data Science Central and become a member today. It’s free!

Data Science Central Editorial Calendar

DSC is looking for editorial content specifically in these areas for October, with these topics having higher priority than other incoming articles.

AI-Enabled Hardware
Knowledge Graphs
Metaverse
JavaScript and AI
GANs and Simulations
ML in Weather Forecasting
UI, UX and AI
GNNs and LNNs
Digital Twins

DSC Featured Articles

Cloud Consulting Services: A New Paradigm Emerges
Divyesh Aegis on 11 Oct 2021

Sunglasses and Face Mask Won’t Fool Facial Recognition Systems Any …
Stephanie Glen on 11 Oct 2021

A window of opportunity for data democracy (Part II of III)
Alan Morrison on 10 Oct 2021

Human In The Loop: A Case for Human Augmented Interaction
ajit jaokar on 09 Oct 2021

PyHard A tool to assess dataset quality and identify hard-to-clas…
Kushal Mukherjee on 08 Oct 2021

Retail Predictive Analytics: Popular Use Cases
Ryan Williamson on 08 Oct 2021

How to build trust in AI
Nora Winkens on 08 Oct 2021

What Skills to Look for When Hiring a Python Developer
INEXTURE Solutions LLP on 08 Oct 2021

How to manage Supply Chain during COVID-19
Vinaksh on 07 Oct 2021

Data Quality Focused Data Pipelines
Indhu on 07 Oct 2021

Download book for data science beginners – Learn Data Science with R
Naryana Nemani on 07 Oct 2021

Efficiently Donating Excess COVID Vaccine Supplies
Saif Ahmed on 06 Oct 2021

How Artificial Intelligence Is Powering Search Engines
Edward Nick on 06 Oct 2021

DSC Weekly Digest 05 October 2021
Kurt A Cagle on 06 Oct 2021

How Digital Marketing Is Evolving and What the Benefits of Digital …
Yuri Filatov on 06 Oct 2021

Lessons to be Learned from the Facebook Outage
Vincent Granville on 06 Oct 2021

How to Set up a React JS Development Environment
Devstringx Technologies on 05 Oct 2021

Remote Work in the IT Industry: Stats, Benefits, and Best Practices
Tarun Nagar on 05 Oct 2021

How to Turn Your WordPress Site Into an App
Jason Camaya on 05 Oct 2021

Adding RDF Lists and Sequences To Sparql
Kurt A Cagle on 05 Oct 2021

Picture of the Week

Key Electronic Business XML (ebXML) elements

To make sure you keep getting these emails, please add [email protected] to your browser’s address book.

Join Data Science Central | Comprehensive Repository of Data Science and ML Resources

Follow us on Twitter: @DataScienceCtrl | @AnalyticBridge

This email, and all related content, is published by Data Science Central, a division of TechTarget, Inc.

275 Grove Street, Newton, Massachusetts, 02466 US

You are receiving this email because you are a member of TechTarget. When you access content from this email, your information may be shared with the sponsors or future sponsors of that content and with our Partners (see the up-to-date Partners List below), as described in our Privacy Policy. For additional assistance, please contact: [email protected]
