Vocal Deepfakes: Why the Law Lacks Protection for Your Voice

By Max Kim

Generative artificial intelligence’s rapidly growing capabilities reveal how far America’s legal system lags behind modern innovation. Generative AI creates new content by analyzing and replicating patterns in previously created work or other information.[1] Google’s text-to-speech AI, Tacotron 2, uses neural networks “trained using only speech examples and corresponding text transcripts” to create “human-like speech.”[2] In theory, then, a user of Tacotron 2 could produce audio of anyone saying anything.[3] Recent events have exposed both the dangers of such a tool and unresolved questions about how the government should regulate its use.

In 2020, Voice Synthesis, a non-monetized YouTube channel, used Tacotron 2 to create vocal deepfakes of the musician Jay-Z covering songs and uploaded them to YouTube.[4] Jay-Z later filed DMCA takedown requests claiming copyright infringement in an unsuccessful attempt to have the videos removed.[5]

Why did Jay-Z’s claim fail? Although the governing law varies by jurisdiction, the key federal case law on voice imitation predates generative AI. In Midler v. Ford Motor Co., the defendant imitated the plaintiff’s voice in an advertisement without her consent.[6] The court observed that “[a] voice is as distinctive and personal as a face” and held that deliberately imitating a widely known singer’s distinctive voice “in order to sell a product” is a tort.[7] Four years later, Waits v. Frito-Lay, Inc. reaffirmed this rule in a case where the plaintiff’s voice was imitated for a commercial.[8] Both decisions, however, turned on commercial exploitation, which a non-monetized channel like Voice Synthesis does not obviously fit. Nor did Jay-Z’s dispute create new precedent: his takedown requests were incomplete, so Voice Synthesis’s controversial videos were reinstated.[9]

For an ordinary person, this gap is alarming. Existing legislation regulates only narrow, specific uses of generative AI, or “deepfakes.”[10] Outside imitation for monetary gain, there is a clear lack of legislation prohibiting the use of vocal generative AI.[11] Yet courts have already held that a person’s voice is as distinctive as their face.[12] It is not far-fetched to imagine a bad actor using Tacotron 2 or another generative AI tool to impersonate someone’s voice to commit identity theft or another crime. The government should adapt the law to meet this new age of generative AI.

[1] Generative AI, Nvidia, https://www.nvidia.com/en-us/glossary/data-science/generative-ai (last visited Sept. 25, 2023).

[2] Jonathan Shen and Ruoming Pang, Tacotron 2: Generating Human-Like Speech from Text, Google Research (Dec. 19, 2017), https://blog.research.google/2017/12/tacotron-2-generating-human-like-speech.html.   

[3] See id.

[4] Nick Statt, Jay Z Tries to Use Copyright Strikes to Remove Deepfaked Audio of Himself From YouTube, The Verge (Apr. 28, 2020, 6:38 PM), https://www.theverge.com/2020/4/28/21240488/jay-z-deepfakes-roc-nation-youtube-removed-ai-copyright-impersonation.

[5] Id.

[6] Midler v. Ford Motor Co., 849 F.2d 460, 461-62 (9th Cir. 1988).

[7] Id. at 463.

[8] Waits v. Frito-Lay, Inc., 978 F.2d 1093, 1112 (9th Cir. 1992).

[9] See Statt, supra note 4.

[10] See id.

[11] See Ivy Attenborough, Voice, Copyrighting and Deepfakes, IPWatchdog (Oct. 14, 2020, 7:15 AM), https://ipwatchdog.com/2020/10/14/voices-copyrighting-deepfakes/id=126232.

[12] See Midler, 849 F.2d at 463.