DSC Weekly 23 August 2022: Five Billion Person Graph – Grand Achievement or Wakeup Call?

  • Kurt Cagle 

  • Artificial intelligence and machine learning continue to create value streams across different industries and verticals. With significant challenges standing in the way of AI integration, the three-day Machine Learning & Artificial Intelligence summit features free talks and presentations from industry experts, in the form of live webinars, panel discussions, keynote presentations and webcam videos, to help you break through barriers and take your analytics strategy to the next level. Join live or on-demand to discover the advantages of incorporating advanced analytics into your business strategy, and learn exactly how to do it from the world’s leading BI experts.
  • The workplace is still reckoning with how the pandemic fueled emerging trends in remote work, work/life balance, productivity and generational attitudes towards what defines a successful career. Join the three-day Transforming the Future of Work summit to hear from leading HR visionaries and experts who will share how businesses can negotiate the lasting impacts of the changes Covid-19 wrought on the workplace. They will examine the full employee experience, from onboarding and recruitment, to culture, engagement and innovation.

A graph with five billion people (nearly everyone on the Internet) brings up both ethical questions and opportunities.

Five Billion Person Graph –
Grand Achievement or Wakeup Call?

Oracle was recently in the news as the target of a lawsuit brought by Dr. Johnny Ryan, a privacy advocate working for the Irish Council for Civil Liberties. The lawsuit charges that Oracle has collected dossiers on five billion people, primarily Internet users, and is selling this information to other companies.

To put this into perspective: there are only about 8 billion people on the planet. This means, in essence, that Oracle has managed to collect data on almost everyone who has used the Internet. Or, put another way, if you are reading this, you very likely have a record in Oracle’s database. Let that sink in for a second.

Without your permission, and likely without your awareness, Oracle has put together one of the largest person databases in history (and perhaps one of the largest graphs). This is a stunning achievement, even if it raises significant ethical and potentially legal questions.

One of the central problems faced by any large-scale data product is that people do not, in and of themselves, have meaningful digital identifiers. They have biometric information – fingerprints, retinal scans, facial scans and so forth – that in general are very tightly regulated, typically by governments. They may have device associations such as cell phones or laptops that generally do have identifiers, but these again require some form of consent for data to be used. They have genetic patterns which are generally very accurate but also very slow to sequence.

Yet one of the other things that uniquely identify a person is their network of associations with others. Even if you don’t know who a given person is, if you know their spouse, siblings, parents, and coworkers, you can infer their existence. It is this particular pattern that Oracle now has, which means that the identifiers in question are going to remain stable so long as the graph itself is kept up to date.

This is why this particular database is so valuable – it represents a way of uniquely identifying individuals with enough data points to make comprehensive dossiers on people possible for literally anyone who has so much as touched a keyboard. The particular graphs that connect people together can act like a fingerprint, are relatively stable, and have a high guarantee of being unique – even when detailed information about a given individual is missing.
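To make the "graph as fingerprint" idea concrete, here is a minimal sketch in Python. It is purely illustrative: the record IDs and associates are made up, and Jaccard similarity is an assumed stand-in for whatever matching any real system uses (nothing here is drawn from Oracle's actual methods). It shows how a profile with no name attached can still be tied to an existing record through the overlap in its associations alone.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of associates: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(unknown: set[str], known: dict[str, set[str]]) -> tuple[str, float]:
    """Return the known record whose associates overlap most with `unknown`."""
    return max(
        ((record_id, jaccard(unknown, associates))
         for record_id, associates in known.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical records: each ID maps to the people that record is linked to.
known_records = {
    "record-A": {"spouse-1", "sibling-1", "parent-1", "coworker-1", "coworker-2"},
    "record-B": {"spouse-2", "sibling-2", "parent-2", "coworker-1", "coworker-3"},
    "record-C": {"spouse-3", "parent-3", "coworker-2", "coworker-3"},
}

# An unnamed profile seen elsewhere: one associate is missing, one is new.
anonymous_profile = {"spouse-1", "sibling-1", "coworker-1", "coworker-2", "neighbor-9"}

match_id, score = best_match(anonymous_profile, known_records)
print(match_id, round(score, 2))
```

Even with incomplete and slightly different data, the anonymous profile lines up with exactly one record far more strongly than with the others, which is the property that makes such a graph usable as a stable identifier.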

Such a scheme meets the requirements necessary to assign unique identifiers to people without having to actually give people those identifiers, and that in turn means that once a given marketer has such an identifier for a person, they can share that information with other affiliates to create long-standing, stable dossiers. This is the holy grail for marketers, especially since, in most cases, marketers otherwise only collect shadows and snippets of information through inference or, if they’re really lucky, through some kind of buyer’s club or opt-in network. It is also precisely what acts such as the European GDPR and the California CCPA were intended to prevent.

Does this involve condemning Oracle? Perhaps, though it was pretty much inevitable that such graphs would be developed by somebody. Ethics very seldom get in the way of making a buck. If it hadn’t been Oracle, it would have been any of the other big data companies that dominate the landscape today, and I do not doubt that each of these organizations either has something similar in the works or will soon enough. In that regard, it’s a lot like human cloning: no organization necessarily wants to be the first to successfully clone a human being (and face all of the ethical condemnation for that act), but they will certainly race to be the second.

As to what happens now, we are likely facing a showdown between large-scale information brokers and the governments that putatively regulate them. The ethical quandary comes primarily from the fact that the Oracle graph was not an opt-in graph, and there’s no clear mechanism for it even to be an opt-out graph. Because it is a private graph, there is also no way of knowing when you are referenced for anything from political orientation to spending habits to health problems.

As this data becomes more widely spread, the potential for abuse grows, especially given that such graphs make profiling trivial to accomplish. Were you denied a loan because of your own credit history, or because you fit into a demographic where there was a higher incidence than normal of default (a practice known as redlining)? Were you passed over for a job because you have diabetes or are Jewish (or worse, were inferred to fit these categories when you don’t), even when neither of these was information shared with a hiring manager? These practices already occur, but a private but universal key system could make them far worse.

I’d argue that we don’t need another forum on AI Ethics, government-focused or otherwise, at least not yet; however, a framework for identity management, universally recognized and enforceable, is something that needed to happen yesterday. Taking a laissez-faire approach, in effect doing nothing, is ultimately no solution.

With private, corporate identity control comes classification (and both accidental and deliberate misclassification), and with it the ability to decline house and car loans, influence job selection, and trigger audits. It creates powerful, discriminatory roadblocks that are harder to fight because they are invisible.

Will such a framework help? Hard to say, though it may buy some time to develop a more equitable solution. Without it, AI simply becomes yet another intangible process that has far more influence over you than anyone would feel comfortable admitting.

In Media Res,

Kurt Cagle
Community Editor
Data Science Central

Data Science Central Editorial Calendar: September 2022 

Every month, I’ll update this section with the topics I’m especially looking for in the coming month. These are more likely to be featured in our spotlight area. If you are interested in tackling one or more of these topics, we have the budget for dedicated articles. Please contact Kurt Cagle for details.

  • Generative AI (GANs and NeRFs)
  • Gaming AI
  • Sustainability and Climate AI
  • Education and AI
  • Web 5
  • Metaverse Next Steps
  • Weather Report: State of Cloud
  • Ethical AI

If you are interested in posting something else, that’s fine too, but these are areas that we believe are hot right now. 
