OpenAI's Misstep with Scarlett Johansson's Voice: A Case Study in Tone Deafness
Today’s weird news concerns OpenAI’s tone deafness with its new text-to-speech system. OpenAI approached Scarlett Johansson to request the use of her voice for its latest model, GPT-4o. The decision was influenced by OpenAI CEO Sam Altman’s fondness for the movie "Her," in which Johansson voices an AI companion to Joaquin Phoenix’s character.
Johansson, however, was not interested. During the negotiation, even before she could formally decline, OpenAI released the model with a voice strikingly like hers. The voice, called Sky, mimicked Johansson so well that it fooled her friends into believing it was her.
Understandably, Johansson lawyered up. OpenAI, recognizing the potential for a legal battle, quickly took down the voice. This raises the question: how did OpenAI not foresee the backlash? It is bizarre to release a voice that closely resembles that of someone who explicitly declined permission to use hers. The move not only disrespects Johansson but also amplifies the artistic community’s concerns about privacy, copyright, the use of their work, and consent.
These are crucial issues, especially when considering AI agents. If someone’s voice is used without permission in an inappropriate context, it veers dangerously close to the realm of deepfake pornography. That OpenAI put this voice out at all is shocking. There are countless voices available for these systems; choosing one that mirrors Johansson’s without her consent is an absurd oversight.
OpenAI is staffed with incredibly smart people, so how did this happen? Perhaps they believed they could get away with it, or perhaps they thought they were powerful enough to withstand any complaints. In effect, OpenAI stole from a very famous movie star. You can’t claim to love someone’s voice and then use a near-identical voice without explicit permission, regardless of how that voice was obtained.
It is entirely possible they used a different voice actress. But the choice of one who sounds almost identical to Scarlett Johansson only adds to the confusion and egregiousness of the act. Johansson’s voice is distinctive, much like those of Clint Eastwood, Christopher Walken, or Sylvester Stallone on the male side. Hiring an actor to mimic such a voice and training a model on the result would be just as identifiable, and just as problematic.
The legal issues here are ambiguous, and the moral issues are equally murky. The publicity fallout, however, is undeniable. OpenAI’s action has caused a public relations nightmare and highlighted significant ethical concerns within the AI community.
This incident with OpenAI’s Sky system demonstrates a significant lapse in judgment and sensitivity to broader implications. It serves as a stark reminder of the necessity for consent and ethical considerations in the development and deployment of AI technologies.
Kristian Hammond
Bill and Cathy Osborn Professor of Computer Science
Director of the Center for Advancing Safety of Machine Intelligence (CASMI)
Director of the Master of Science in Artificial Intelligence (MSAI) Program