Bias in Facial Classification ML Models
Table of contents

  • Abstract
  • 1  Introduction
  • 2  Data
  • 3  Methods
  • 4  Results
  • 5  Conclusions
  • References

Authors

Patrick Connelly

Grace Cooper

Bhavana Jonnalagadda

Carl Klein

Piya (Leo) Ngamkam

Dhairya Veera

Published

December 18, 2023

Abstract

Bias in how facial classification machine learning (ML) models label faces is a growing problem; as such models become widespread, it is more important than ever to identify their weaknesses and the ways they may discriminate on attributes such as race, gender, or age. In this study, we run two widely used facial classification models (FairFace and DeepFace) on a popular face dataset (the UTKFace dataset), perform two-sample proportion hypothesis tests, and evaluate model output using common ML performance metrics in order to identify potential bias across these attributes. We found that DeepFace showed significant bias in age and race, with white males classified more accurately than other demographic groups; FairFace performed significantly better, with less detected bias, affirming its stated goal of being more “fair” (less biased) across these categories. These findings lead us to recommend further work on improving facial classification ML models so that they are equitable and fair to everyone they are applied to.
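For readers unfamiliar with the test named above, the following is a minimal sketch of a two-sample proportion hypothesis test in Python using statsmodels; the per-group counts are hypothetical placeholders, not results from this study.

```python
# Minimal sketch of a two-sample proportion z-test comparing
# classification accuracy between two demographic groups.
# The counts below are made up for illustration only.
from statsmodels.stats.proportion import proportions_ztest

correct = [940, 861]    # correctly classified images per group (placeholder)
totals = [1000, 1000]   # total images evaluated per group (placeholder)

# H0: both groups are classified with equal accuracy.
# H1: the accuracies differ (two-sided alternative).
z_stat, p_value = proportions_ztest(count=correct, nobs=totals,
                                    alternative="two-sided")
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the groups' accuracies differ significantly.")
else:
    print("Fail to reject H0: no significant difference detected.")
```

In practice, the correct/total counts for each group would come from comparing each model's predicted labels against the dataset's ground-truth annotations, with one such test per pair of demographic groups.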

Report PDF and Code Location

Links to download the PDF version of this report, to the GitHub source code for this report, and to the YouTube presentation are available as icons in the top navigation bar of this website.

1  Introduction