
AI’s Ethical Dilemma – An Unexpectedly Urgent Problem

Summary:  In just the last 10 months, deep learning has been used to predict, based only on facial characteristics, who is a criminal and who is gay.  These are rigorous, peer-reviewed studies published in academic journals.  How should this knowledge be used, and how will the public react?


Were you as freaked out as I was when, earlier this month, two well-qualified data scientists from Stanford said they could predict whether someone is gay or straight from just their picture?

I am a solid technology optimist, but in just the last 10 months data scientists have used deep learning AI to predict not only sexual orientation but also whether someone is a criminal.

These two studies, conducted by well-meaning data scientists, are exactly the uses of AI that will cause the public to wonder how much leeway data science should be given for applications that potentially impact everyone in society.  In brief, those two studies are:

  1. Predicting Criminality: A peer-reviewed study out of China released last November that reported 89.51% accuracy in distinguishing criminals from non-criminals based on facial images alone. (Automated Inference on Criminality Using Face Images, Xiaolin Wu, McMaster Univ. and Xi Zhang, Shanghai Jiao Tong Univ., Nov. 21, 2016).


  2. Predicting Sexual Orientation: A peer-reviewed study from Stanford just this month reporting 91% accuracy in distinguishing between gay and heterosexual men (83% for women), also based solely on facial images.  (Deep Neural Networks Can Detect Sexual Orientation from Faces, Yilun Wang, Michal Kosinski, Stanford University, Sept. 12, 2017).


It should require no explanation why these applications of deep learning are socially controversial.

Are There Other Areas of AI About Which We Should Be Worried?

If you follow the popular press and read comments about the public’s concerns about AI, those concerns tend to cluster around two thoughts.

  1. AI enabled systems will develop ‘opinions’ that cause them to be bigoted or biased in some way not to our advantage.
  2. AI enabled systems will exceed our own grasp of knowledge of the world and take actions, once again not to our advantage.

For those of us here on the wizard’s side of the curtain, these are clearly not legitimate concerns.  AIs cannot ‘know’ more than the information we give them.  Their grasp of systems, or more broadly of ‘reality’, cannot exceed their training data.

Take for example the debacle of Microsoft’s early chatbot Tay.  In 2016 the version implemented in Japan was wildly successful.  In the US, some practical jokers (to give them the benefit of the doubt) started feeding Tay pro-Hitler and wildly permissive sexual comments, from which she ‘learned’ that this was the correct way to interpret the world.  Microsoft had to take Tay down within 16 hours of her introduction.  Tay was a victim of her training data, not a bot formulating an independent opinion.

Similarly, reinforcement learning combined with image recognition has produced AIs that win at Go and can regularly beat humans at a full range of Atari games (not to mention that this is the technology behind your self-driving car).  It is true that reinforcement learning can produce algorithms that perform better than humans in certain systems, but those AIs cannot reach out independently to learn beyond the realms we provide for training.  In fact, change their sensors or their actuators and they can’t adapt.  They have no imagination.


What Is the Real Source of Risk?

These commonly misunderstood memes are not a source of risk or ethical conflict.  Where ethical conflict actually arises is from the true and productive capabilities of deep learning in image and speech recognition.

In a sense this is like atomic energy in the 50’s.  There is the promise of so much good that can result from speech and image recognition AI systems, but now we see that they can also be weaponized.  A society that utilizes deep learning systems to identify and penalize individuals for their supposed criminality or their sexual orientation could almost instantly change the public’s opinion about the value of our most promising areas of innovation.

It is easy to project forward to other potential societal abuses such as making hiring decisions or allocating health care resources based on yet unachieved deep learning analysis of our DNA.


About These Specific Studies

The Criminality Study

This is a peer reviewed study from a major institution conducted by well qualified researchers.  The data science and techniques utilized appear sound.  Read our original analysis of this experiment here.

It was based on facial image recognition from 1,856 ID photos satisfying the following criteria: Chinese, male, between the ages of 18 and 55, no facial hair, no facial scars or other markings, and known to be convicted criminals of both violent and non-violent crimes.  These were compared to ID photos of 1,126 non-criminals with similar socio-economic profiles.

Wu and Zhang built four classifiers using supervised logistic regression, KNN, SVM, and a CNN, with all four techniques returning strong results, the strongest from the CNN and SVM versions.


Since it is widely understood that CNNs can sometimes be deceived and may focus on factors other than those intended, the researchers also introduced Gaussian noise into the images, observing only about a 3% fall-off in accuracy.
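As a rough illustration of that kind of robustness check (not the authors’ actual code), the sketch below trains a simple classifier on synthetic data and then re-scores it on copies of the test set corrupted with Gaussian noise.  All of the data, dimensions, and noise levels here are invented for the example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an image dataset: 600 samples, 64 "pixel" features,
# with labels carrying a simple linear signal.
rng = np.random.RandomState(0)
n, d = 600, 64
X = rng.randn(n, d)
w = rng.randn(d)
y = (X @ w > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score on clean inputs, then on the same inputs with Gaussian noise added,
# mirroring the robustness test described above.
clean_acc = clf.score(X_test, y_test)
noisy_acc = clf.score(X_test + rng.normal(0, 0.3, X_test.shape), y_test)
print(f"clean: {clean_acc:.3f}  noisy: {noisy_acc:.3f}")
```

A small gap between the two scores is the kind of evidence Wu and Zhang cite that the classifier is not latching onto fragile artifacts in the images.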

In summary, the study was conducted with rigor.  Wu and Zhang make no comment on the implications of the study beyond the data science.


The Sexual Orientation Study

Jokes abound about the general population’s ability to determine others’ sexual orientation by sight, but in this carefully controlled study Yilun Wang and Michal Kosinski created a control group showing that human judges could determine orientation by sight only 61% of the time for men and 54% for women.  That is not much better than a coin toss.

However, when shown five images of each individual, their deep neural net classifier could correctly classify men 91% of the time and women 83% of the time.

The paper shows a rigorous approach using 130,741 images of 36,630 men and 170,360 images of 38,593 women between the ages of 18 and 40, obtained from dating web sites, whose subjects self-reported their orientation as gay or heterosexual.  The study was limited to Caucasians located in the US.

The deep neural net utilized was VGG-Face, a network previously trained on 2.6 million faces to recognize unique individuals via 4,096 learned attributes.  The classifier itself was a simple logistic regression, with dimensionality reduction via singular value decomposition (SVD).
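A minimal sketch of that classifier design, using scikit-learn, with random vectors standing in for the 4,096-dimensional VGG-Face embeddings (in the real study these came from the pretrained network, and the labels were self-reported):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

# Random vectors standing in for VGG-Face embeddings; invented for illustration.
rng = np.random.RandomState(0)
n_samples, n_features = 500, 4096
X = rng.randn(n_samples, n_features)
y = rng.randint(0, 2, n_samples)

# SVD-based dimensionality reduction feeding a simple logistic regression,
# mirroring the classifier design described above.
model = make_pipeline(
    TruncatedSVD(n_components=100, random_state=0),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)
print(model.predict(X[:5]))
```

The point of the design is that the heavy lifting is done by the pretrained face network; the final classifier layered on top is deliberately simple.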

Unlike the Chinese team, Wang and Kosinski specifically conclude that their study “exposes a threat to the privacy and safety of gay men and women”.


Privacy versus Convenience

One of the great emerging conversations in this area is about privacy versus convenience.  There is a small but vocal minority of citizens who object to giving their data away on-line or to being tracked in the real world by camera, phone, or other mechanisms.  The fact, though, is that this huge amount of data provides a level of convenience for all of us never before imaginable.

Not only do our applications understand and show us only what is statistically likely to please us, but this also dramatically increases the effectiveness of advertising, reducing the cost of goods sold.  The great majority of us would miss this if it were gone.

But on three counts we need to have a deeper public conversation about this.


Correlation versus Causation:  Particularly in the criminality study there is no attempt to discriminate between correlation and causation.  We do not live in the Matrix.  We are not going to arrest people based on this correlation.  But it is fair to ask how our policing agencies will respond if they have this capability.


The Importance of Error Rates:  Statistical methods will always have error rates, which we on the practitioner side can quantify fairly readily.  If the error results in our seeing an ad we’re not interested in, no harm is done.  If a false positive causes us to be classified as a potential criminal or discriminated against for sexual orientation, that is quite another matter.
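To make the false-positive concern concrete, here is a back-of-the-envelope sketch.  The 91% figure echoes the Stanford study’s headline accuracy, but the 5% base rate and the population size are purely hypothetical assumptions for illustration:

```python
# Hypothetical numbers: a classifier with 91% sensitivity and 91% specificity
# applied to a population where the true base rate is only 5%.
population = 100_000
base_rate = 0.05
sensitivity = specificity = 0.91

positives = population * base_rate        # 5,000 people truly in the class
negatives = population - positives        # 95,000 people not in the class

true_pos = positives * sensitivity        # 4,550 correctly flagged
false_pos = negatives * (1 - specificity) # 8,550 wrongly flagged

precision = true_pos / (true_pos + false_pos)
print(f"flagged: {true_pos + false_pos:.0f}, "
      f"of whom only {precision:.0%} are true positives")
```

Under these assumed numbers, roughly two out of three people flagged would be false positives, which is exactly why a headline accuracy figure says so little about the harm a deployed system could do.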


How Pervasive Tracking Has Become:  Most of us know and accept that our browsing is tracked and that our phone also provides our physical location stored by data services.  Not many of us are equally aware of the vast amount of facial recognition video footage that is created daily by private and government sources. 

Private companies and police agencies are also using license plate scanners, including scanners mounted on Ubers, taxis, and other non-official vehicles, which average a dozen scans per day for each of the 250 million cars in the US.  That data is then sold to law enforcement as well as to private data marketers, providing a uniquely accurate picture of our physical travels during each day.


Where Do We Go From Here?

As with the scientists on the Manhattan Project, no one is suggesting that these rigorous studies should not have been conducted.  Like all scientific studies, they will need to be confirmed by other researchers.

What we are learning is that there are features hidden in our faces which our deep learning techniques are better at detecting than humans are.  In fact it is the accuracy and insight of deep learning that we value.

For the time being we need to have a conversation about how much we are tracked, particularly without our knowledge or agreement.  In the near future we may also need to have a conversation with our government about which applications of that data are acceptable and which are not, or we risk a public backlash that could derail the use of our best new techniques.


About the author:  Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist since 2001.  He can be reached at:

[email protected]


Tags: AI, deep learning, ethics, image recognition



Comment by Jesse Russell on February 12, 2020 at 11:24am

Thank you for this post. A big part of the difficulty with creating codes or rules for ethics is that in real life data science there are often multiple trade-offs. Not all objectives can be simultaneously maximized. Often different ideas of fairness or equity are in tension with each other. It can be hard to make the likelihood of a particular machine learning prediction equal for all groups, to make the likelihood of experiencing an outcome given a prediction the same for all groups, the likelihood of a false positive (or true positive) the same for all groups. And it is hard to solve each of those inequalities all at the same time with the same model. The ethical solution here is not exactly about getting it right, but is instead being clear on your mission and values, how you decided to make trade-offs, and how you made decisions around which goals might be less important than others. Ultimately, ethics in data science have to be an on-going, diverse, collaborative discussion about how values and ethics apply to messy real world situations.

Jesse Rio Russell, PhD

Big Picture Research and Consulting

Comment by Paul Bremner on March 12, 2018 at 11:10am

Great post, Bill.  I'm kind of stunned that this stuff is taking place, particularly here in the Bay Area (i.e. Stanford.)  You rightly say the following in regard to these two studies: "It should require no explanation why these applications of deep learning are socially controversial."

Apparently neither Stanford nor the American Psychological Association agrees with that comment (see below), since they put their imprimatur on the study regarding sexual orientation.  (It looks like the Institutional Review Board at Stanford is devoted to protecting human subjects in research.  I guess that doesn't include considering the broader ethical implications of this type of research.)

Graduate School of Business, Stanford University, Stanford, CA94305, USA
[email protected]
The study has been approved by the IRB at Stanford University

©American Psychological Association, 2017. This paper is not the copy of record
and may not exactly replicate the authoritative document published in the APA


Comment by Carenne Ludena on September 22, 2017 at 3:41am
I liked your article very much. It is the kind of serious treatment of a very complex reality that is needed. Thank you.
Comment by Suresh Babu on September 21, 2017 at 10:28am
Bill: Thank you! This is an excellent blog!
This is a huge area of concern because of how results about imprecise notions and human constructs like criminality, and the poorly understood spectrum of sexual orientation, are being presented by AI researchers.  The ethical challenges go beyond the usual ones in analysis, like data modeling, training, and error rates, to the basic question of whether the cause/effect has any relationship at all to the inference frame (e.g. facial analysis).
The human brain has been forged to be the most acute analytics engine for facial recognition and emotional inference, for obvious reasons. But even humans cannot do better than a random coin toss in these areas (as studies have shown).  Our social brain has been forged to supply higher orders of inference because face reading is just one point in one's psychological assessment (a tremor in someone's voice may have a different emotional content; body language can present a different inference; and so on).
Law and criminality are human constructs, not biological ones.  What passes for criminality in a dark age is not criminal behavior in an enlightened one.  Our faces do not change when laws are rewritten. This effort is not new.  In the late 19th/early 20th centuries, ideas of social Darwinism (fed by beliefs of supremacy) produced a lot of strange theories about intrinsic criminality and “racial” tendencies.  All of which were false.  But significant damage was done.  The problem with criminality studies is the very definition of criminal behavior.
In sexual orientation studies the analysis is based on self-reported behavior, and the basic question here is how reliable declarations of heterosexual orientation are, as is the mapping of the sexuality spectrum onto a binary classification.
This is just the framing of the analysis.  When it comes to uses, the ethical problems are indeed vast, where institutions can end up making inferences about “criminality” and “orientation” and unleash an AI engine to classify and judge people.
I’m waiting for the next study claiming to use AI to classify people as “crazy” and “normal”.
There is a very urgent need for folks to come together to discuss the ethical dilemmas as well as the existential crisis posed by AI.
