THE REGEX KING

jeff sisson's blog

elvis-tools

26 Jun 2018

I’ve had the itch to read an Elvis book for a while. I can’t really explain this. I’m not an Elvis fan and didn’t really know anything about him, but for whatever reason in 2018 “Elvis” felt like a blind spot I wanted to attempt to correct. I saw “Last Train to Memphis” on the shelves at my local library and checked it out.

Elvis xmas

“Last Train to Memphis” rarely strays far from a chronological telling of Elvis' life, from his birth in Tupelo to his mom's death in Memphis (there’s a second book covering the second half of his life/career). Pretty early on in the book I realized that the prose was going to be completely stuffed with musical references that meant basically nothing to me as read on the page: country & western musicians, rhythm and blues singer/songwriters, Memphis-famous producers and DJs, etc. What’s nice is that these references aren’t even very Elvis-heavy; they’re a spring mix of songs and artists he was exposed to growing up, connections he made touring with Hank Snow, musicians hired to write songs for him, and musicians who re-recorded his songs with new twists.

I was initially tempted to search YouTube/Spotify for every unfamiliar song that got mentioned, but this started to feel Sisyphean. I also felt the presence of something I think of as the “non-fiction shot clock” hanging over my head: if I don’t keep reading a non-fiction book at a reasonable pace, pretty quickly the book becomes unfinishable.

So an idea started developing in the back of my mind as I was reading: what if I could just mine the text for the musical footnotes after I’d finished the book? This would allow me to carry on reading, confident in the knowledge that I’d catch up with all of the musical texture once I was done.

I managed to find a second-hand digital copy of the book which allowed me to process it as a plain text file, and set about trying to figure out the quickest path to extracting some of the musical annotations from the text. The heuristic I came up with for picking out songs was:

  • if it’s in double quotes
  • and the first word starts with a capital letter
  • and there’s something like 1 to 10 words

…it’s likely to be a song. Expressed as a regex this looked like:

grep -o -P "\"[A-Z](?:[A-Za-z0-9',]+[^A-Za-z0-9',\"]*){1,10}\""
# matches "Flip, Flop and Fly" or "Pins and Needles in My Heart,"

…to which I added further processing which favored quoted strings that had more than 50% capital-case letters. This ruleset is a little lossy (I’m sure it misses some songs) but it provided a good starting point, narrowing things down to a list of 500 or so song candidates, listed in the order they appeared in the text (thus loosely following the chronological framing of Elvis' life in the book).
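A minimal Python sketch of this kind of two-step filter, for anyone who wants to try it on their own text. It reuses the grep pattern above as-is and then applies a capitalization threshold. Two things here are assumptions rather than a record of what was actually run: the names `capitalized_share` and `song_candidates` are made up for illustration, and "capital-case" is read as "the share of words that start with a capital letter". It also assumes straight double quotes, as the grep pattern does.

```python
import re

# Same pattern as the grep command: a double-quoted string that starts with
# a capital letter and contains 1 to 10 word-like chunks.
QUOTED = re.compile(r'''"[A-Z](?:[A-Za-z0-9',]+[^A-Za-z0-9',"]*){1,10}"''')


def capitalized_share(quoted):
    """Fraction of words in a quoted string that start with a capital letter.

    This is one possible reading of the "50% capital-case" rule (assumption).
    """
    words = quoted.strip('"').split()
    if not words:
        return 0.0
    return sum(word[0].isupper() for word in words) / len(words)


def song_candidates(text, threshold=0.5):
    """Return quoted strings, in text order, whose capitalized share exceeds threshold."""
    return [
        match.group(0)
        for match in QUOTED.finditer(text)
        if capitalized_share(match.group(0)) > threshold
    ]


if __name__ == "__main__":
    sample = (
        'He sang "Flip, Flop and Fly" one night. '
        '"Well, that was the end of it" is what he said.'
    )
    for candidate in song_candidates(sample):
        print(candidate)
```

Because `finditer` walks the text left to right, the candidates come out in the order they appear in the book, which is what keeps the list loosely chronological.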

From there, I needed to further weed out false positives (lots of matches for “Elvis Presley” or headlines like “These Are the Cats Who Make Music for Elvis”), and also try to match song names with musicians' names:

“For You, My Love”

For this purpose I wrote a little utility which took each of the potential song names and looked nearby in the text for potential musician names, using something called named entity recognition (an automated way of picking out the names of people, places and organizations in text). From there, the tool presents all possible musician names for a given song and prompts you to choose the “correct” one. Many times the book would cite a song in a paragraph that described a long chain of artistic custody for who had written, recorded or licensed it, so this was no trivial task!
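The shape of such a utility can be sketched in a few lines of Python. To keep the example self-contained, it swaps real named entity recognition for a much cruder stand-in: any run of two or more capitalized words counts as a possible name. The function names, the `window` size and the prompt wording are all hypothetical, chosen only for illustration.

```python
import re

# Crude stand-in for named entity recognition (assumption): two or more
# consecutive capitalized words. A real NER model would do far better.
NAME = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def nearby_names(text, song, window=200):
    """Return possible musician names found within `window` characters of `song`."""
    position = text.find(song)
    if position == -1:
        return []
    before = text[max(0, position - window):position]
    after = text[position + len(song):position + len(song) + window]
    # Leave the song title out so its own capitalized words are not
    # mistaken for a name; " | " keeps the two sides from merging.
    context = before + " | " + after
    names = []
    for match in NAME.finditer(context):
        if match.group(0) not in names:
            names.append(match.group(0))
    return names


def choose(song, candidates, ask=input):
    """List the candidates and return the one the user picks, or None to skip."""
    for number, name in enumerate(candidates, start=1):
        print(f"{number}. {name}")
    answer = ask(f"Who goes with {song}? (number, or blank to skip) ").strip()
    if answer.isdecimal() and 1 <= int(answer) <= len(candidates):
        return candidates[int(answer) - 1]
    return None
```

The human stays in the loop on purpose: the code only narrows down who is mentioned near the song, and the person at the keyboard decides which of those names actually belongs to it.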

After doing this data entry-ish task, I ended up with 96 or so songs paired with artist names. Despite the fact that we live in an era where there are multiple, competing, corporate, infinite jukeboxes, music metadata is still famously messy. There’s nothing like an ISBN or URL for a given song recorded by a given artist that might allow you to find it right away on Spotify or YouTube. The closest database that even approximates something like this (with open licensing) is Discogs, but its catalog isn’t completist in the same way that something like Wikipedia is. So to turn this list of songs/artists into something that I could play as music, I turned to YouTube search.

I experience YouTube as the only entity that in any way satisfies the spirit that Napster had when it first launched. I look for music on YouTube and it’s mostly just there, no matter how rare. As a cultural institution, YouTube feels like a shaky foundation on which to build a multimedia Library of Alexandria, but it’s the Library of Alexandria we have.

An experience that’s bound to be familiar to anyone seeking out music on YouTube is the set of snap judgements you make when trying to assess which of the YouTube search results contains the specific version of the song you’re looking for. I can’t relate to people who wax nostalgic for the experience of shopping for records in a record store, because we have something infinitely more insane and interesting on the YouTube search results page. Obscure video naming conventions. A whole set of aesthetics around video thumbnails. Impenetrable uploader and commenter jargon. The reputational marks of the uploader (filming a spinning record, “lyrics video”, ORIGINAL and RARE).

elvis youtube picker

Though YouTube has an official API, I chose instead to write a tool which pays tribute to this messy process of picking a YouTube video: it scrapes the search results and shows the titles and thumbnails of potential videos, piping the URLs of any picked videos out to the command line for reuse elsewhere.
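The picking-and-piping half of a tool like this can be sketched in Python without touching the network. The sketch below is an illustration under assumptions, not the tool itself: it builds a search results URL for an artist/song pair, leaves the actual scraping out (the page markup is undocumented and changes), and assumes the results have already been gathered as hypothetical `(title, url)` pairs. Thumbnails are left out too. The function names are made up.

```python
import sys
from urllib.parse import urlencode


def search_url(artist, song):
    """Build the URL of a YouTube search results page for one artist/song pair."""
    query = urlencode({"search_query": f"{artist} {song}"})
    return "https://www.youtube.com/results?" + query


def pick(results, ask=input):
    """Show (title, url) pairs on stderr; print the chosen URL on stdout and return it."""
    for number, (title, _url) in enumerate(results, start=1):
        print(f"{number}. {title}", file=sys.stderr)
    print("Pick a video (number, or blank to skip): ",
          end="", file=sys.stderr, flush=True)
    answer = ask().strip()
    if answer.isdecimal() and 1 <= int(answer) <= len(results):
        url = results[int(answer) - 1][1]
        print(url)  # stdout carries only URLs, so it can be piped elsewhere
        return url
    return None
```

Writing the menu and prompt to stderr and only the chosen URL to stdout is what makes the output reusable: something like `python picker.py > urls.txt` (a hypothetical script name) would collect just the picked links.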

The results of my questionable labor:
