Jay-Z’s Roc Nation recently filed copyright strikes against a YouTube channel that posted audio deepfakes of the mogul rapping, among other things, the “To Be or Not to Be” soliloquy from Hamlet and Billy Joel’s “We Didn’t Start the Fire.” The videos were briefly taken down from YouTube over the weekend, but as The Verge reports, they’ve been reinstated due to an incomplete takedown request. A representative from Google told The Verge they could be pulled again, “pending additional information from the claimant.”
The channel, Vocal Synthesis, posts silly AI-generated deepfakes, usually pairing historical figures or entertainers with surprising source material. Like many of the channel’s other subjects, Jay-Z also performs the Navy SEAL copypasta meme and excerpts from the Book of Genesis. George W. Bush raps 50 Cent’s “In Da Club”; JFK recites the meme about Rick and Morty going over your head. It’s the same brand of comedy as the JibJab political parody videos that went viral in the early days of the internet.
On Tuesday, Andy Baio published a thorough overview of the channel on his blog, Waxy, and spoke with its anonymous owner about the takedown. Vocal Synthesis works by feeding speech and lyric samples into Tacotron 2, a free text-to-speech model from Google AI. Roc Nation, Jay-Z’s all-encompassing entertainment company, attached the following statement to its copyright claim: “This content unlawfully uses an AI to impersonate our client’s voice.”
But this raises questions about how, if at all, Vocal Synthesis violates porous copyright laws. The videos don’t purport to be anything other than deepfakes, and the captions include a disclaimer saying as much: “The voice in this video was entirely computer-generated using a text-to-speech model trained on the speech patterns of Jay-Z.” Since the channel isn’t replicating an existing Jay-Z track or passing itself off as him on a new, revenue-generating song, the law might see it as no different from Jay Pharoah impersonating him on Saturday Night Live.
As Baio puts it: “Vocal Synthesis is an anonymous and non-commercial project, not monetizing the channel with advertising and no clear financial benefit to the creator, and the impact on the market value of Jay-Z’s discography is non-existent.” Although fair use is difficult to parse and is decided case by case, Vocal Synthesis’ repurposing of popular songs and text excerpts may well be covered, since the results are clearly distinct from the originals.
Deepfakes are largely considered a sinister force, given their potential to convincingly graft the faces of women onto the bodies of adult film stars or to create politically defamatory material. But Vocal Synthesis and other innocuous uses of artificial intelligence could become increasingly common as entertainment, if they can survive the morass of copyright law. “I believe that there are a lot of potential positive uses of this technology, especially as it gets more advanced. It’s possible I’m wrong, but for now at least I’m not convinced that the potential negative uses will outweigh that,” the channel’s owner told Baio. As with any technology approaching its saturation point, deepfakes may become so commonplace that their reputation depends on how individual creators use them.