
Cambridge Analytica’s wholesale scraping of Facebook user data is big news now, and people are “shocked” that personal data is being shared and traded on a massive scale on the internet. But the real issue with social media is not harm to individual users whose information was shared, but sophisticated and sometimes subtle mass manipulation of social and political behavior by bad actors, facilitated by deceit, fraud, and amplification of lies that spread easily through societal discourse on the internet.

We abandoned any pretense of privacy long ago when we accepted the free-service model of Google, Facebook, Twitter, and others (though the senators questioning Mark Zuckerberg seemed only dimly aware of Facebook's ad-based revenue model).


The controversy about Cambridge Analytica that landed Mark Zuckerberg before Congress actually began brewing over a year ago. It was a controversy not about privacy but about how Cambridge Analytica fed vast amounts of personal data, mostly from Facebook, into its so-called “psychographic” engine to influence behavior at the individual level.

Cambridge Analytica worked with researchers from Cambridge University who developed a Facebook app that offered a free personality test, then scooped up each user’s Facebook data plus that of all their friends (thus leveraging the actual users, who numbered fewer than a million, to harvest the data of more than 80 million people). Using this data, Cambridge Analytica classified each individual’s personality on the so-called “OCEAN” scale (Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism) and crafted individually targeted messages designed to appeal to each person’s personality.


The real danger revealed by the Cambridge Analytica scandal is that the information and social platforms of the internet, on which we increasingly spend our time and through which more and more of our personal and social connections flow, are being corrupted in the service of con men, political demagogues, and thieves. Russia’s troll farm, the Internet Research Agency, employs fake user accounts to post divisive messages, purchase political ads, spread fabricated images, and even organize political rallies. Until recently, the social media giants seemed indifferent to this problem; any serious attempt to stem the creation of fraudulent accounts would have depressed the growth of the user base, which is all-important in Silicon Valley. Yet analytic methods to detect fake accounts are available.

Detecting Fake Social Media Accounts


In 2015, Dr. Jen Golbeck, who teaches Network Analysis at Statistics.com, published an ingenious real-time method for identifying fake social media accounts.

She found that the numbers of followers (Twitter) or friends (Facebook) follow a well-known pattern called Benford’s Law, and that the pattern can be checked account by account, by looking at the friend counts of the people within a single user’s own network. Benford’s Law states that in a conforming data set, the first significant digit of a number is “1” about 30% of the time, roughly six and a half times as often as it is “9”. Golbeck and others identified a number of accounts whose networks did not follow this pattern, and found they were all fake Russian troll accounts.
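The 30% and “six and a half times” figures come straight from the standard Benford formula, P(d) = log10(1 + 1/d), for leading digits d = 1 through 9. The short Python sketch below simply evaluates that formula; the function name benford_expected is our own, not from Golbeck’s work.

```python
import math


def benford_expected():
    """Benford's Law: share of numbers whose first significant digit is d is log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


expected = benford_expected()
for d, p in expected.items():
    print(f"leading digit {d}: {p:.1%}")
print(f"1s vs 9s: {expected[1] / expected[9]:.1f}x")
```

The first line printed is 30.1% and the last is 4.6%, a ratio of about 6.6. The nine shares add up to exactly 1, because the logs telescope to log10(10).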


Comment by Peter Bruce on May 10, 2018 at 12:00pm

I saw this comment in my email:

Can someone explain the application of Benford's Law? What does "conforming data set" mean and how are the fake accounts and the significant digits related? Thanks!

Response: A conforming data set is one whose first-significant-digit frequency distribution follows Benford's Law. Actually, "conforming sources or types of data" might be a better way to put it. For example, river lengths follow the law: taken together, the lengths of rivers in general conform. Golbeck found that, in general, the "friend count" distributions on Facebook and Twitter follow the law as well, with the exception of a set of accounts that all turned out to be Russian trolls. In other words, because these were fake troll accounts rather than organically created ones, they did not follow Benford's Law.
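To make "conforming" concrete, here is a minimal Python sketch of an account-level check. It is an illustration under stated assumptions, not Golbeck's actual procedure: it takes the follower counts of one account's friends, tallies their leading digits, and compares them with Benford's expected shares using a Pearson chi-square goodness-of-fit test (one common choice; her own statistic may differ). The function names, the 5% threshold, and both sample accounts are hypothetical.

```python
import math
from collections import Counter

# Expected share of each leading digit 1-9 under Benford's Law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Chi-square critical value for 8 degrees of freedom (9 digits - 1) at the 5% level.
CHI2_CRIT_8DF_05 = 15.507


def first_digit(n):
    """First significant digit of a positive integer, e.g. 3487 -> 3."""
    return int(str(n)[0])


def benford_chi_square(counts):
    """Pearson chi-square statistic comparing the leading digits of counts to Benford's Law."""
    digits = [first_digit(c) for c in counts if c > 0]  # 0 has no significant digit
    if not digits:
        raise ValueError("need at least one positive count")
    n = len(digits)
    observed = Counter(digits)  # missing digits count as 0
    return sum((observed[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


def looks_nonconforming(friend_counts, crit=CHI2_CRIT_8DF_05):
    """Hypothetical flag: True if the leading digits deviate from Benford beyond crit."""
    return benford_chi_square(friend_counts) > crit


# Hypothetical account A: friends' follower counts spread evenly on a log scale
# from 10 to 100,000, a spread that conforms closely to Benford's Law.
organic = [int(10 ** (1 + 4 * (i + 0.5) / 1000)) for i in range(1000)]

# Hypothetical account B: friends' follower counts all bunched between 200 and 399,
# an artificial pattern chosen only for illustration.
fake = [200 + (i % 200) for i in range(300)]

print(looks_nonconforming(organic))  # False
print(looks_nonconforming(fake))     # True
```

Account B is flagged because none of its friends' counts starts with "1", where Benford expects about 30% of them. Note that at a 5% threshold roughly one genuine account in twenty would also be flagged by chance, so screening many accounts would call for a stricter cutoff or a follow-up review.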

© 2018 Data Science Central ®
