These Students Are Using Data Science to Predict Which Rap Songs Will Become Hits
"Drop that science." — Ol' Dirty Bastard
With each passing year, hip-hop becomes a little more omnipresent in popular culture. Had the genre never been born in the Bronx, in New York City, today's Billboard Hot 100 would look completely different. As with most popular genres, hip-hop has fallen into some established lyrical patterns — but how much do these patterns determine the success of a song? Is there a science to creating a blockbuster rap?
A new algorithm created by students in the master of information and data science program at the University of California, Berkeley, suggests there may be. Using applied machine learning techniques and data science principles, Tony Abraham, Nikhita Koul and Joe Morales have created a tool that can predict whether or not a rap song will become a hit based on its lyrical content. It currently has a 71% success rate.
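The article does not say how the 71% figure was measured. One common reading is plain accuracy: the share of songs for which the predicted label (hit or not) matches what actually happened. The sketch below shows that calculation under that assumption. The `accuracy` function and the sample labels are invented for illustration and are not the students' code or data.

```python
from typing import Sequence


def accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of songs whose predicted hit/non-hit label matches the real outcome."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one song to score")
    # Count the positions where the prediction agrees with the outcome.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Invented labels for seven songs: True means "hit".
predicted = [True, False, True, True, False, False, True]
actual = [True, False, False, True, False, True, True]
print(f"{accuracy(predicted, actual):.0%}")  # 5 of 7 correct, prints 71%
```

A single accuracy number hides which kind of mistake is being made (calling a flop a hit, or missing a real hit), so a fuller evaluation would likely report those two error types separately.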
The students found that the biggest factors influencing a song's success are its level of profanity and its lyrical themes. However, these need to be considered alongside the song's release date, as the genre's standards have changed dramatically over its more than 30-year history.
"What's a hit today may not have been a hit 20 or 30 years ago," Morales told Mic in a phone interview. "Most notably is the level of profanity in the songs. There's far more now than there was in the beginning."
Lyrical themes have also fluctuated over time, as the above chart shows. Rap songs preaching revolution were extremely popular in the early-to-mid-'80s but are far less so now.
Perhaps the most surprising takeaway is that rap songs with a wider range of lyrical themes actually have a better chance of becoming hits than songs that only discuss rappers' lifestyles. Performers who simply mention their cars, clothes, wants, desires, line of work and preferred recreational activities don't perform as well.
"I was surprised that popular songs covered any things beyond cars and women," Koul told Mic. "I was extremely happy to see these graphs. We do have all these very meaningful things to talk about. That was my learning — that popular rap goes way beyond the superficial."
The lyrics to any rap song, even one a user makes up on the spot, can be plugged into the algorithm to reveal the rap's hit potential. The algorithm determines how much of the song discusses the rapper's lifestyle and how much discusses other themes, such as politics and medicine (read: drugs). It then estimates the likelihood of the song charting based on these factors and the year of release.
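The article does not describe how the students' tool computes these measures. As a rough illustration only, the sketch below scores a lyric by the share of its words that fall into a few hand-written theme lists, then pairs those shares with the release year as inputs a classifier could use. The word lists, the `theme_shares` and `build_features` names, and the choice of simple word counting are all assumptions made for this example, not details of the Berkeley project.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, hand-picked word lists. The students' real theme
# vocabulary is not given in the article.
THEME_WORDS = {
    "lifestyle": {"car", "cars", "clothes", "money", "club"},
    "politics": {"vote", "government", "police", "revolution"},
    "medicine": {"pills", "syrup", "smoke", "dose"},
}


def tokenize(lyrics: str) -> List[str]:
    """Lowercase the lyrics and split them into words."""
    return re.findall(r"[a-z']+", lyrics.lower())


def theme_shares(lyrics: str) -> Dict[str, float]:
    """Return, for each theme, the fraction of all words that belong to it."""
    words = tokenize(lyrics)
    if not words:
        return {theme: 0.0 for theme in THEME_WORDS}
    counts = Counter()
    for word in words:
        for theme, vocab in THEME_WORDS.items():
            if word in vocab:
                counts[theme] += 1
    return {theme: counts[theme] / len(words) for theme in THEME_WORDS}


def build_features(lyrics: str, year: int) -> Dict[str, float]:
    """Combine theme shares with the release year into one feature record."""
    features = theme_shares(lyrics)
    features["year"] = float(year)
    return features


features = build_features("Money money cars and the police", 2015)
print(features)
```

In this toy lyric, three of the six words are on the lifestyle list and one is on the politics list. A real system would need a much richer way of detecting themes, and a model trained on charting history to turn such features into a hit probability.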
Its predictions can be nuanced. Kendrick Lamar's "The Blacker the Berry," for example, scores high on the revolution, lifestyle and philosophy scales, yet the algorithm correctly predicts that it didn't become a hit. The algorithm also successfully identifies Wiz Khalifa's "See You Again," one of 2015's songs of summer, as a certified hit, comparing it with "What's Your Fantasy (Remix)" by Ludacris, "Chubby Boy" by Mannie Fresh and "Slow Jamz" by Twista.
Morales, Abraham and Koul also used their databases to determine the most popular brands and locations rappers mention while discussing their lifestyles. New York, Los Angeles, Atlanta and Chicago top the list of locations, while Bentley, Apple, Chevy, Porsche and Twitter dominate the list of brands. "Trigger fingers" are really turning to "Twitter fingers," as Drake asserted on "Back to Back."
The tool's architects admit there are plenty of other factors that go into determining a hit, such as the name recognition of the artist and the quality of the beat. Given more time and resources, the team would have liked to build out the algorithm to factor in these elements. Even as it stands, its creators see practical uses for it.
"Our tool is not there to predict the success of songs by these big artists," Koul told Mic. "Our tool is there to determine for those just entering the mainstream to figure out the next Kanye West. What will be the next runaway song by somebody who is not known?"