For any data science project, if you start with the wrong question, you are bound to end up with the wrong answer, and fail. Who should identify the right question? I believe data scientists should be involved in the process; otherwise, they will be held responsible for the failure.
CDC headquarters in Druid Hills, Georgia
We illustrate this problem in the context of vaccination: the CDC and other organizations want more people to be vaccinated in the US, and they have interesting data to make their point that vaccination is good and should be more widespread. Yet they are unable to convince a large minority of anti-vaxxers, despite all their efforts. Worse, it backfires, making these anti-vaxxers more convinced about not vaccinating, and spreading the word.
This discussion is not about whether vaccination is good or not, but about why public awareness campaigns failed to produce positive results, because of the following reasons: using the wrong definition, failing to identify a probable cause, the absence of marketing analytics, computation and data issues, issues with building trust, and a lack of proper user segmentation.
We will review each of these issues.
1. Using the Wrong Definition
The definition of anti-vaxxer is too broad: it seems to even apply to people who vaccinate their kids (at least partially), who were themselves vaccinated long ago against the worst 3 or 4 diseases, and who are not interested in flu shots, at least not for themselves. This is a large segment of the population. These people are not against vaccines in general, so the appellation anti-vaxxer is not correct, and is part of the problem.
The CDC vaccination awareness campaigns (deployed by public relations agencies) result in making these so-called "anti-vaxxers" appear as idiots. But many of them are well educated, in good health, not living in crowded cities, and not relying very much on official healthcare: you are upsetting these people. It does not matter what the intent is: if the target audience perceives the message as bullying, it will not resonate, and may even backfire. Every marketer knows that, and every marketer tests the results of all their campaigns (including with A/B testing) and adjusts messages and channel distribution accordingly.
Part of the problem is the fact that the CDC is a monopoly when it comes to collecting data about diseases and handling vaccination issues. Like most monopolies, they might not care about the impact of their messages. Just like electricity companies sending letters to heavy customers to tell them that they are bad neighbors and should reduce consumption, without knowing the cause (e.g. your house is much bigger than your neighbors' homes, with many kids).
2. Failure to Identify a Probable Cause
Those who want to attack the root cause of the anti-vaccination movement believe they have identified the reason: anti-vaxxers believe that vaccines cause autism.
Now, not only can't they identify real anti-vaxxers (see section 1), but they arbitrarily decide on a cause. There is no scientific evidence proving that anti-vaxxers believe in an autism link, in a conspiracy, or in hazards linked to mercury, or that their rejection is based on religion. It might be true for a small minority, but not for the majority. It would be easy for a data scientist to conduct a survey to identify the real causes of vaccination refusal. The causes could be:
As a result, disproving the myth that vaccines cause autism is a waste of time and money, since most anti-vaxxers don't believe in this theory in the first place. Data scientists and domain experts should try to understand the real causes.
And of course, there will always be 5% to 10% of the population that distrusts science and propaganda so much that trying to convince them to get vaccinated is a waste of time, and is perceived as an attack against their lifestyle. See section 6 in this article. And true autism believers, the minority of anti-vaxxers, can't be converted. Again, data science should easily prove this.
3. Absence of Marketing Analytics
It is doubtful that any A/B or multivariate testing was done to identify which messages and channel combinations work best, to convert anti-vaxxers. Likewise, no yield analysis was performed to measure the success of these campaigns. It looks like conversion rates were very small, and it backfired by possibly creating more adversarial people - in short, losing more pro-vaxxers than converting anti-vaxxers. The target market (US residents) was not segmented: the exact same message was broadcast over and over, unchanged, to all segments of the population, with no attempt at per-segment customization, or at avoiding the dangerous segments. In short, the yield over the baseline - that is, not running any campaigns at all - is negative.
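To make the idea of "yield over the baseline" concrete, here is a minimal sketch of a per-segment A/B comparison: a control group that sees no campaign versus a group exposed to it. The segment names and all counts are hypothetical, made up for illustration only, and the test is a standard two-proportion z-test under the normal approximation, not anything the CDC is known to use.

```python
import math


def two_proportion_z_test(conv_control, n_control, conv_campaign, n_campaign):
    """Compare vaccination rates with and without the campaign.

    Returns (lift, z, p_value), where lift = campaign rate - control rate
    and p_value is two-sided under the normal approximation.
    """
    rate_control = conv_control / n_control
    rate_campaign = conv_campaign / n_campaign
    lift = rate_campaign - rate_control
    # Pooled rate under the null hypothesis of "no campaign effect"
    pooled = (conv_control + conv_campaign) / (n_control + n_campaign)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_campaign))
    if se == 0:
        # Degenerate case: nobody (or everybody) converted in both groups
        return lift, 0.0, 1.0
    z = lift / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return lift, z, p_value


def yield_by_segment(segments):
    """Run the test separately for each population segment."""
    return {
        name: two_proportion_z_test(*groups["control"], *groups["campaign"])
        for name, groups in segments.items()
    }


# Hypothetical counts: (people vaccinated, people in group)
segments = {
    "segment_a": {"control": (120, 1000), "campaign": (150, 1000)},
    "segment_b": {"control": (80, 1000), "campaign": (60, 1000)},
}

for name, (lift, z, p) in yield_by_segment(segments).items():
    print(f"{name}: lift={lift:+.3f}, z={z:+.2f}, p={p:.3f}")
```

A negative lift in a segment is the backfire effect described above: in that segment, running the campaign did worse than not running it at all.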
This is compounded by Facebook paid-to-post pro-vaxxer trolls, who will copiously insult anyone who disagrees with their masters, but are easy to detect. Public relations agencies are known to use them (some may be bots).
It is surprising that life and health insurance companies do not offer a discount to people who are vaccinated. These companies (life insurance at least) are very good at market segmentation to optimize premiums based on health risks.
In the end, it leaves the anti-vaxxer with a sour feeling that she is being manipulated by some sort of vaccination cult. The fact that no one at the CDC really understands why educated, healthy people in NYC and Malibu are getting more and more reluctant to vaccinate makes you think that the CDC's workforce is not diverse. Many of its hires might be risk-averse, and most of their scientists need a security clearance. Nothing wrong with that, but it makes the workforce less diverse, and they are in a monopoly position. It reminds me of Philip Morris (cigarette manufacturer): they would only hire smokers who would say good things about cigarettes (the first face-to-face job interview question used to be: do you want a cigarette?).
4. Computation and Data Issues
How are statistics about vaccinated people computed? Is there a centralized database? What about parents who lie on their application form when enrolling a kid in a public school? And what about those with vaccinated kids who prefer to claim a religious exemption because they don't remember where, when, or for what their kids were vaccinated, maybe because they moved out of state? I'm not saying that the data is significantly inaccurate, but errors might impact the statistical significance of some results. It is important to know how accurate your data is, especially if it is provided by schools or comes from surveys.
When my wife enrolled last year in a university program, she was asked about her vaccinations. She never provided evidence of the required vaccinations, and the school eventually dropped the issue (after all, they'd rather get $24,000 in tuition, than argue about some paperwork). My wife might have been properly vaccinated, who knows, but at 50 years old, few people remember anything about past vaccinations. It's a fact of life.
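To see how reporting errors of this kind can move a headline number, here is a small sketch that corrects an observed vaccination rate for misclassification, using the standard Rogan-Gladen adjustment. The two accuracy parameters are assumptions that would have to be estimated from an audit of the records; the numbers below are invented for illustration.

```python
def corrected_rate(observed_rate, sensitivity, specificity):
    """Estimate the true vaccination rate from the recorded one.

    sensitivity: P(recorded as vaccinated | truly vaccinated)
    specificity: P(recorded as unvaccinated | truly unvaccinated)
    Both are assumed known here (hypothetical audit results).
    """
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("records are no better than chance; cannot correct")
    estimate = (observed_rate + specificity - 1) / denominator
    # Sampling noise can push the estimate slightly outside [0, 1]
    return min(1.0, max(0.0, estimate))


# Hypothetical: records show 87.5% vaccinated, but 5% of vaccinated kids
# are filed as exempt and 20% of unvaccinated kids are filed as vaccinated.
print(corrected_rate(0.875, sensitivity=0.95, specificity=0.80))
```

Even modest error rates shift the estimate by a few percentage points, which is enough to matter when comparing regions or years.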
Finally, is the increase in (say) measles cases entirely attributable to unvaccinated people, or are there other causes explaining this trend? Increased population density could be a factor, easy to rule out (or not) using statistical models. Virus mutation is another. If other causes are found, they must be addressed.
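As a rough sketch of what "ruling out" population density could look like, one can regress case rates on both the unvaccinated share and density, and check whether the vaccination coefficient survives once density is in the model. The county figures below are entirely made up, and ordinary least squares is a simplification: for real case counts, a Poisson or negative binomial model would be the more usual choice.

```python
def ols(rows, y):
    """Least-squares fit of y ~ intercept + predictors, via normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (b[i] - tail) / A[i][i]
    return beta


# Hypothetical counties: (unvaccinated share, density in 1000s per sq. mile)
counties = [(0.02, 1.0), (0.05, 3.0), (0.03, 8.0), (0.08, 2.0), (0.06, 9.0), (0.10, 5.0)]
cases_per_100k = [0.4, 1.1, 1.3, 1.5, 2.0, 2.2]  # made-up outcomes

intercept, b_unvaccinated, b_density = ols(counties, cases_per_100k)
print(f"unvaccinated share: {b_unvaccinated:.2f}, density: {b_density:.3f}")
```

If the density coefficient is large and the vaccination coefficient shrinks toward zero when density is added, density deserves a closer look; if not, it can be set aside.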
5. Issue with Building Trust
When an organization publishes statistics, building and keeping trust is paramount, or soon nobody will believe your reports anymore. Unemployment statistics from the BLS are a good example. Many people take them with a grain of salt: low unemployment rates mask low-paid jobs, college graduates with high student debt who are overqualified for the few jobs they can find, and people not fully employed. Again, it's a question of definition: how do you define "unemployed", and is the metric, as currently defined, of any use to measure the strength of the job market? Who should come up with a better metric?
In the case of the CDC, it would be great to see how good their past predictions have been. Looking at their website, I see similar issues. One of their featured articles this week was about binge drinking and alcoholism. First, the meaning of "binge drinking" varies by individual: many drinkers can drink 5 times more than the "heavy drinker" threshold defined by the CDC (14 glasses of wine a week) and be perfectly fine. A good chunk of these "spectacular drinkers" are never detected because they are never arrested and never cause any problems. They don't show up in the statistics, causing a bias. The article also has a chart (see below) that tries to convince you that the problem is worse for White people. However, the high 68% is because there are far more Whites than (say) American Indians. The issue is indeed far more severe for American Indians. Looking at the chart, you feel that it is misleading, just misinformation. And you wonder how and why they collect data about race.
Your next question is then: what about the claims regarding vaccination? Is the reality also distorted?
Example of misleading chart used by the CDC
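The problem with a chart like this one is that it reports each group's share of the total, not the rate within each group. A tiny sketch with invented numbers (these are not CDC figures, and the group names are placeholders) shows how the largest group can dominate the shares while a small group has a much higher per-capita rate.

```python
def shares(counts):
    """Each group's fraction of all cases (what a share chart shows)."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def per_capita_rates(counts, populations):
    """Cases divided by group size (what you need to compare groups)."""
    return {group: counts[group] / populations[group] for group in counts}


# Hypothetical case counts and group sizes
counts = {"group_a": 680, "group_b": 40, "group_c": 280}
populations = {"group_a": 70_000, "group_b": 1_000, "group_c": 29_000}

share = shares(counts)
rate = per_capita_rates(counts, populations)
for group in counts:
    print(f"{group}: share={share[group]:.0%}, rate per 1,000={1000 * rate[group]:.1f}")
```

In this made-up example, group_a accounts for most of the cases simply because it is the largest group, while group_b, barely visible in the share chart, has by far the highest rate.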
Finally, even scientific research in general is not as trusted as it used to be, given the increasing political and financial interests involved, and the rush to publish at any cost. Read my article on how to fix this.
6. Lack of Proper User Segmentation
There is a segment of anti-vaxxers that is not worth reaching out to: you won't convert them. Just like there is a proportion of the population that will never be able to read, no matter how much money is spent to train them. That's the baseline - the people that can't be converted no matter what, and especially when making the false assumption that they believe in the link between autism and vaccination.
Likewise, you will never be able to stop some people from rock climbing, or driving a car, or doing a barbecue, or playing with fireworks, despite the risks (higher risks than not being vaccinated). A better strategy might be to identify the population segments properly. Some people are open to receiving a limited number of vaccines, but the current propaganda aimed at anti-vaxxers, by bullying them, turns them off, or worse, makes them bolder. Maybe there is a way to better reach out to them, and offer reduced vaccination schedules (only the most useful vaccines, via trusted channels) to at least better protect the entire population.
Data science should help find better approaches to deal with this problem, with "domain expert" data scientists participating in executive meetings, and in the design of databases and metrics to be tracked, including the definitions of terms such as anti-vaxxer, and data collection processes.