Monday, April 18, 2016

Name Chains

Some people's last names sound like first names, such as Larry David.  Some have first names that sound like last names, like Taylor Swift.  And some have both, like Campbell Scott and Wallace Shawn.

[Image caption: "FWIW"]
I downloaded a list of names of actresses and actors from IMDB and did some messing around with the data.  At first, I wanted to create chains of names, where the first name of each person matched the last name of the one that came before.  I think I'd get even better results if I could just extend my list to celebrities in general (not just IMDB actors and actresses), but so far I haven't found any better lists.

[Image caption: "On second thought, I'll take the check."]
There are 3.6 million people in these lists, and most of them are quite obscure.  I'm not great at knowing celebrity names, but I think even the most die-hard movie memorizer would have trouble recognizing the names of all the "Documentary Couple" people in When Harry Met Sally, one of whom was apparently Peter Pan, and all of whom are in this database.  Plus - and I feel bad about this - the likelihood that I'll recognize an actor or actress by name decreases the longer it has been since that person graced the screen.  I figure recognizability is important in the kind of name chains I want to make, because that's what makes a chain interesting compared to, say, a randomly generated list of names.

But how can I consider only the recognizable thespians?  I don't know a good way to do it, so I picked a bad way.  I created a "credit score" for each person, which does not at all reflect their quality as a performer, and honestly it's a pretty bad measure of fame, too.  But it was something.  It goes like this: for each credit they have in a TV episode or movie released in or after 1970, they get a number of points (1 point for a TV episode, 5 points for a direct-to-video movie, 10 points for a made-for-TV movie, and 20 points for a... standard(?)... movie).  Anyone with 30 or fewer points was discarded.  Maybe I could have linked their performances with ratings, or counted how far down the cast list they were, or discounted multiple appearances on the same TV show, but I didn't.  I did drop everyone for whom I couldn't parse out a first and last name, as well as people with the 'Jr.' suffix, because they won't fit well in a chain.
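Here's a minimal sketch of that scoring rule in Python, assuming the credits have already been boiled down to (person, credit_type, year) tuples.  The credit-type labels and the two-token name rule in split_name are hypothetical simplifications, not necessarily how the IMDB lists are actually parsed.

```python
from collections import defaultdict

# Points per credit, as described above. The credit-type labels are hypothetical.
POINTS = {"tv_episode": 1, "video_movie": 5, "tv_movie": 10, "movie": 20}
MIN_YEAR = 1970   # only credits released in or after 1970 count
THRESHOLD = 30    # anyone with 30 or fewer points is discarded


def split_name(full_name):
    """Return (first, last), or None if the name doesn't split cleanly.

    Simplifying assumption: a usable name is exactly two tokens,
    and a "Jr." suffix disqualifies it.
    """
    parts = full_name.split()
    if len(parts) != 2 or parts[1].rstrip(".") == "Jr":
        return None
    return parts[0], parts[1]


def credit_scores(credits):
    """credits: iterable of (person, credit_type, year) tuples."""
    scores = defaultdict(int)
    for person, kind, year in credits:
        # Credits with no year, or released before 1970, earn nothing.
        if year is not None and year >= MIN_YEAR:
            scores[person] += POINTS.get(kind, 0)
    # Keep only people strictly above the cutoff.
    return {person: s for person, s in scores.items() if s > THRESHOLD}
```

Because every credit adds points regardless of how prominent the role was, the score rewards sheer volume, which is why prolific performers float to the top.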

[Image caption: "I'm gonna play it safe. I'll wager $0."]
Most of the top credit scores are porn stars, simply because they make a lot of movies.  I'll let you guess who tops that list.  Yes, that's right.  Skipping those, the top score goes to Shakti Kapoor (12423), a prolific Bollywood actor.  Skipping Bollywood as well, we get to Frank Welker (10544), a prolific voice actor.  Skipping voice actors too... Alex Trebek (8144), whose score was bumped by thousands of episodes of "Jeopardy!".  It goes on like this for a while.  Like I said, it's not a great algorithm.

Now, with only around 428,000 names, and a slightly better (but still terrible) chance of each name being known by an average reader, let's look at some numbers.  Here are the basic high-fives:

Top Five First Names, Actors
Michael    4,420
David      4,223
John       4,030
Robert     2,480
Paul       2,217

Top Five First Names, Actresses
Anna       1,110
Sarah      1,105
Jennifer   1,075
Maria        981
Laura        916

Top Five Last Names
Smith      1,357
Lee        1,249
Jones      1,016
Johnson    1,000
Williams     979
For chaining, those would be great names to see switched.  Somebody named Smith Michael would be very handy.  Let's call his name "inverted".  Sadly, there's nobody with that name, but there are plenty of other inverted names.  I gave everyone an "inversion score": (F_L * L_F) / (F_F * L_L), where F_L is the number of people whose last name matches your first name, L_F is the number whose first name matches your last name, F_F is the number who share your first name, and L_L is the number who share your last name.  Low inversion scores include Jennifer Lopez (0.000004) and Chris Williams (0.000005).  Top score goes to King Jeff (4,884), but others include Rooney Mara (455) and Pruitt Vince (235), whose names definitely sound backward to me.
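A minimal sketch of the inversion score, assuming the cleaned-up data is a list of (first, last) tuples.  Since your own name counts toward F_F and L_L, the denominator is never zero; a score of zero just means nobody has your first name as a last name, or nobody has your last name as a first name.

```python
from collections import Counter


def inversion_scores(people):
    """people: list of (first, last) tuples. Returns {(first, last): score}."""
    first_counts = Counter(first for first, _ in people)
    last_counts = Counter(last for _, last in people)
    scores = {}
    for first, last in people:
        f_l = last_counts[first]   # people whose LAST name is my first name
        l_f = first_counts[last]   # people whose FIRST name is my last name
        f_f = first_counts[first]  # people sharing my first name (includes me)
        l_l = last_counts[last]    # people sharing my last name (includes me)
        scores[(first, last)] = (f_l * l_f) / (f_f * l_l)
    return scores
```

A Counter returns 0 for names it has never seen, so rare names simply drive the numerator to zero instead of raising an error.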

In this data set, there are around 285,000 "dead ends" - people whose last name matches nobody's first name.  Reedus, Gurira, or Cudlitz, for instance.  That means if I picked matching names randomly, there's a 2 in 3 chance at each link that the chain will end there.  If I picked each name optimizing for how many choices I'll have after that, it's going to be a very boring chain, full of Michael John, John Michael, Michael Paul, Paul John, John Paul, Paul Michael, and so on.  All real people, apparently.  It turns out this is a hint about how many branches there are in this system, and why my next idea won't work.
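The 2-in-3 figure is just the dead-end fraction: 285,000 / 428,000 ≈ 0.67, treating every person as equally likely to be picked next (the same simplification as above).  A minimal sketch of finding the dead ends, again assuming a list of (first, last) tuples:

```python
def dead_ends(people):
    """Return people whose last name is nobody's first name, plus their share of the list."""
    first_names = {first for first, _ in people}
    dead = [(first, last) for first, last in people if last not in first_names]
    return dead, len(dead) / len(people)
```

Building the set of first names once keeps each membership check constant-time, so the whole pass is linear in the number of people.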

[Image caption: "I opted to avoid a centipede metaphor."]
I tried to automate chain-generation.  I wrote a program (hereafter abbreviated as IWAP) that searches for the top-scoring chains/cycles of a given length.  2-cycles are interesting because that's basically two people whose names are switched.  My favorite among those is Keith David/David Keith.  The problem is that the program takes too long for lengths greater than 2.  There are a ton of possibilities, and even if I just pick a few random ones, that's not really what I'm looking for - I want good ones.
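For the 2-cycle case, the search is cheap: for each person, check whether the swapped name also exists.  This sketch assumes (first, last) tuples and a dict of credit scores, and ranks pairs by the product of the two scores.  That ranking rule is my assumption, not necessarily the scoring the original program used.

```python
def two_cycles(people, score):
    """Find pairs like Keith David / David Keith.

    people: iterable of (first, last) tuples
    score:  dict mapping (first, last) -> credit score
    """
    names = set(people)
    pairs = []
    for first, last in names:
        # Each swapped pair would be found twice; keep only one ordering.
        # first < last also skips self-pairs like "John John".
        if first < last and (last, first) in names:
            pairs.append(((first, last), (last, first)))
    # Assumed ranking: product of the two people's credit scores.
    pairs.sort(key=lambda p: score.get(p[0], 0) * score.get(p[1], 0), reverse=True)
    return pairs
```

Longer cycles are where this approach stops scaling: every extra link multiplies the candidates by the number of people who share the current connecting name.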

Perhaps there are still some things at which humans can beat computers.  I'll just try doing it by hand.

Ok, maybe not entirely by hand.  IWAP that takes a name and lists everyone with that as their first name.  IWAP for last names, too.  In each case, it sorts the list by yet another score, this one the product of the person's score and the number of choices I would have for the next link.  With these programs I can just explore the tree of possibilities manually.  Here are some chains I've produced so far:
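A minimal sketch of those two lookup tools, assuming (first, last) tuples and a dict of credit scores.  The function names are hypothetical.  The ranking is the one described above: a person's score times the number of choices available for the following link (or the preceding one, when extending a chain backward).

```python
from collections import defaultdict


def build_indexes(people):
    """Index (first, last) tuples by first name and by last name."""
    by_first = defaultdict(list)
    by_last = defaultdict(list)
    for first, last in people:
        by_first[first].append((first, last))
        by_last[last].append((first, last))
    return by_first, by_last


def next_links(name, by_first, score):
    """Everyone with `name` as a first name, best continuation first.

    Rank = credit score * number of people whose first name is this
    person's last name (the choices for the following link).
    """
    return sorted(
        by_first.get(name, []),
        key=lambda p: score.get(p, 0) * len(by_first.get(p[1], [])),
        reverse=True,
    )


def prev_links(name, by_last, score):
    """Everyone with `name` as a last name, for extending a chain backward.

    Rank = credit score * number of people whose last name is this
    person's first name (the choices for the link before it).
    """
    return sorted(
        by_last.get(name, []),
        key=lambda p: score.get(p, 0) * len(by_last.get(p[0], [])),
        reverse=True,
    )
```

Using .get() instead of indexing keeps lookups from quietly inserting empty lists into the defaultdicts.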

Michael Shannon Elizabeth Ashley Judd Nelson Franklin
Mia Sara Gilbert Gottfried John Oliver Platt
James Taylor Elizabeth Jordan Elizabeth Taylor James
Clark Gregg Henry Thomas (Ian) Nicholas Brendon
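To read these chains, treat each pair of adjacent words as one person: Mia Sara, then Sara Gilbert, then Gilbert Gottfried, and so on.  This small sketch splits a chain that way and reports any link that isn't in a list of (first, last) tuples.  It doesn't handle parenthesized middle names like "(Ian)", which would need to be stripped first.

```python
def chain_links(chain, people):
    """Split a chain into consecutive (first, last) links and report unknown ones."""
    names = chain.split()
    links = list(zip(names, names[1:]))  # each adjacent pair is one person
    known = set(people)
    missing = [link for link in links if link not in known]
    return links, missing
```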

One more idea before I wrap this up.  A simple 2-person chain can be used as part of a puzzle.  If I give you the first name of one person, and the last name of another, it could clue the name I didn't give you, which they share.  For instance, Meg Reynolds would clue Ryan (at least, given the particular set of people I narrowed it down to).  In theory, another pair of names could clue another "middle" name, and the middles could be put together, and so on.  In practice I think it would be very difficult to construct such a tree of names and have it be solvable by hand.  For the simplest 2-person case, IWAP that takes a name and finds unique clue pairs, but once again lots of human intervention is necessary to prune the possibilities.  Here are some examples for you to try:
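Here's one way to find unique clue pairs for a given middle name, sketched under the same (first, last) tuple assumption.  A pair (F, L) qualifies when "F M" and "M L" are both people and no other middle name M' also connects F to L.  The function name is hypothetical, and this is only my reading of "unique".

```python
from collections import defaultdict


def unique_clue_pairs(middle, people):
    """Return (first, last) clue pairs that point only to `middle`."""
    lasts_by_first = defaultdict(set)   # first name -> last names it appears with
    firsts_by_last = defaultdict(set)   # last name -> first names it appears with
    for first, last in people:
        lasts_by_first[first].add(last)
        firsts_by_last[last].add(first)

    clues = []
    for f in firsts_by_last.get(middle, ()):       # people named "f middle"
        for l in lasts_by_first.get(middle, ()):   # people named "middle l"
            # Every middle name that would connect f to l.
            bridges = lasts_by_first.get(f, set()) & firsts_by_last.get(l, set())
            if bridges == {middle}:
                clues.append((f, l))
    return sorted(clues)
```

The set intersection does the real work: it collects every name that could sit between the clue's first and last names, so a pair survives only when that intersection has exactly one member.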

Kurt Crowe
Clive Teale
Aaron Sorvino
Samuel Rathbone
Brion Woods
Gene Osbourne
Currie Norton
Clint Hesseman
Diane Stapleton
Peter Mewes
