
Making Name Mashups in R

Creating name mashups in R

About five years ago I was pregnant with my daughter at the same time a coworker of mine was also expecting his first kid, and we spent a lot of time at work thinking of ridiculous name suggestions for each other. Our preferred method was to create hybrid mashups of standard names, such as “Jimothy”, “Stangelina”, and “Vladimiranda”. Somehow this game has persevered, and we still enjoy texting each other when we think of a new one. So when I studied text mining and string manipulation using R last year, I hoped to one day use my new skills to write a name mashup algorithm. I’m very pleased to say that I finally did it!

stringr, rebus, babynames packages

library(stringr)
library(rebus)
library(babynames)
library(dplyr)
library(Hmisc)

I’m using most of the same packages as I did in my previous post on baby name pattern matching in R. The stringr package provides string functions that make pattern matching straightforward. The rebus package provides a simple way of building regular expressions, or regex (a sequence of characters that defines a search pattern). The babynames package contains a dataset of names given to babies born in the US from 1880 through 2015. I’m including the dplyr package for filtering purposes. Hmisc has a capitalize() function that I used to capitalize the first letter of all the returned mashup names.

Get a list of names to work with

I used the most recent data from the babynames package: a list of names given to babies born in the United States in 2015.

babynames_2015 <- filter(babynames, year == 2015)
all_names <- tolower(babynames_2015$name)
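
One detail worth knowing: babynames records each name once per sex, so a name given to both girls and boys in 2015 shows up twice in all_names. The functions later in this post call unique() on their results, so this is harmless, but deduplicating early keeps the search list smaller. Here is a minimal base R sketch, using a made-up data frame (toy_2015 is hypothetical) in place of the real dataset:

```r
# Toy stand-in for filter(babynames, year == 2015); the rows are made up.
toy_2015 <- data.frame(
  sex  = c("F", "M", "F"),
  name = c("Taylor", "Taylor", "Emma"),
  stringsAsFactors = FALSE
)

# Lowercase first, then drop repeats so each name is searched only once.
toy_names <- unique(tolower(toy_2015$name))
print(toy_names)
```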

Create the algorithm

The algorithm I wrote employs the following steps:

  1. Extract letter patterns from a given name
  2. Search the all_names list for names that contain the same pattern
  3. Combine the original name with the matching names to create mashups
  4. Return a list of the mashups for a given name’s possible letter combinations
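
Before getting into the real functions, the first three steps can be sketched in a few lines of base R. This is only an illustration under some assumptions: toy_names is a made-up list, and the sketch keeps everything after the first occurrence of the pattern rather than using the stringr and rebus code shown in the rest of the post.

```r
# Toy name list; the real post searches every 2015 name from babynames.
toy_names <- c("lorenzo", "lorelei", "emma", "gloria")
start_name <- "taylor"

# Step 1: take the last three letters of the starting name.
n <- 3
pattern <- substr(start_name, nchar(start_name) - n + 1, nchar(start_name))

# Step 2: keep the names that contain that letter pattern.
matches <- toy_names[grepl(pattern, toy_names, fixed = TRUE)]

# Step 3: keep what follows the pattern in each match and glue it on.
tails <- sub(paste0("^.*?", pattern), "", matches, perl = TRUE)
mashups <- paste0(start_name, tails[nchar(tails) > 0])
print(mashups)
```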

Step 1: pattern extraction functions

I wrote two pattern extraction functions: one to extract letters at the beginning of the name and one to extract letters at the end of the name. The n argument in both functions determines how many letters to use in the pattern. For the purposes of this exercise, I believe patterns of two or three letters will suffice to produce mashups, so the functions only handle n = 2 and n = 3.

#create first n letters pattern
first_n_pattern <- function(name1, n) {
  name1 = tolower(name1)
  if(n == 2) {
    str_extract(name1, pattern = START %R% LOWER %R% LOWER)
  } else if(n == 3) {
    str_extract(name1, pattern = START %R% LOWER %R% LOWER %R% LOWER)
  } 
}
#create last n letters pattern
last_n_pattern <- function(name1, n) {
  name1 = tolower(name1)
  if(n == 2) {
    str_extract(name1, pattern = LOWER %R% LOWER %R% END)
  } else if(n == 3) {
    str_extract(name1, pattern = LOWER %R% LOWER %R% LOWER %R% END)
  }
}
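
If you wanted patterns of any length, the two helpers above could be generalized. The following is a sketch in base R, not what the post actually uses: first_n() and last_n() are hypothetical names, and unlike the rebus patterns they take whatever characters they find rather than insisting on lowercase letters.

```r
# Hypothetical generalizations of the two helpers above, using base R only.
first_n <- function(name, n) {
  name <- tolower(name)
  # Return NA when the name is too short, as str_extract() does on no match.
  ifelse(nchar(name) >= n, substr(name, 1, n), NA_character_)
}

last_n <- function(name, n) {
  name <- tolower(name)
  len <- nchar(name)
  ifelse(len >= n, substr(name, len - n + 1, len), NA_character_)
}

first_n("Taylor", 3)  # "tay"
last_n("Taylor", 2)   # "or"
last_n("Al", 3)       # NA
```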

Steps 2 & 3: Find names that match the pattern & mash them up with original name

The pattern_names() function calls one of the pattern extraction functions (first_n_pattern() or last_n_pattern()) to create a pattern from the given name, then creates a list of names that match that pattern, and finally uses that list to create a new list of mashup names for the given pattern. It returns the list of mashup names for exactly one letter pattern.

#make names by calling the pattern functions
pattern_names <- function(startname, first_or_last, n) {
  if(first_or_last == "first") {
    pttrn = first_n_pattern(startname, n)
    match_names = str_subset(all_names, pattern = pttrn)
    names_splt = str_split(match_names, pattern = pttrn, simplify = TRUE)
    if(length(names_splt) > 0) {
      #keep the non-empty text that comes before the pattern
      name_start = names_splt[,1]
      name_start = name_start[str_length(name_start) > 0]
      return(capitalize(unique(str_c(name_start, startname))))
    }
  } else if(first_or_last == "last") {
    pttrn = last_n_pattern(startname, n)
    match_names = str_subset(all_names, pattern = pttrn)
    names_splt = str_split(match_names, pattern = pttrn, simplify = TRUE)
    if(length(names_splt) > 0) {
      #keep the non-empty text that comes after the pattern
      name_end = names_splt[,2]
      name_end = name_end[str_length(name_end) > 0]
      return(capitalize(unique(str_c(startname, name_end))))
    }
  }
}
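
To see what the split step is doing, it can help to look at a single name at a time. The sketch below uses base R and a hypothetical helper, split_at_first(), which splits around the first occurrence of the pattern only; the example names are just for illustration.

```r
# Hypothetical helper: split a name around the first occurrence of a pattern.
split_at_first <- function(name, pattern) {
  pos <- as.integer(regexpr(pattern, name, fixed = TRUE))
  if (pos == -1) return(NULL)  # pattern not found
  c(before = substr(name, 1, pos - 1),
    after  = substr(name, pos + nchar(pattern), nchar(name)))
}

# "first" case: the text before "tay" is prepended to the starting name.
parts <- split_at_first("nataya", "tay")
paste0(parts[["before"]], "taylor")   # "nataylor"

# "last" case: the text after "lor" is appended to the starting name.
parts <- split_at_first("lorenzo", "lor")
paste0("taylor", parts[["after"]])    # "taylorenzo"

# A match at the very start leaves nothing to prepend, so it gets filtered out.
split_at_first("tayla", "tay")[["before"]]  # ""
```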

Step 4: return list of mashups from multiple letter patterns

The final function, mashup_names(), returns a list of mashups from the letter combinations at the start and end of the given name by calling the pattern_names() function twice. Longer names are sliced into three-letter patterns, while three-letter names just use the first two and last two letters to find mashups.

mashup_names <- function(startname) {
  #start empty so names shorter than three letters fall through to the message
  results <- character(0)
  if(str_length(startname) > 3){
    results <- unique(c(pattern_names(startname, "last", 3), pattern_names(startname, "first", 3)))
  }
  else if(str_length(startname) > 2) {
    results <- unique(c(pattern_names(startname, "last", 2), pattern_names(startname, "first", 2)))
  }
  if(length(results) == 0) {
    return("Sorry, I didn't find any names to mash that up with! Try a different name?")
  }
  else {
    return(results)
  }
}
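
The length rule is easy to check on its own. Here is a small base R sketch with a hypothetical helper, choose_n(), that mirrors the two branches above. Returning NA for names shorter than three letters is my assumption for the sketch, since the rule only describes names of three letters or more.

```r
# Hypothetical helper: how many letters to use for a name of a given length.
choose_n <- function(name) {
  len <- nchar(name)
  if (len > 3) {
    3L           # four or more letters: three-letter patterns
  } else if (len == 3) {
    2L           # three-letter names: two-letter patterns
  } else {
    NA_integer_  # too short to build a pattern from
  }
}

choose_n("taylor")  # 3
choose_n("ava")     # 2
choose_n("al")      # NA
```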

That’s it! Time to try some examples!

Some examples

I could do this all day. Here are some examples:

mashup_names("taylor")
##  [1] "Taylory"      "Taylorelei"   "Tayloria"     "Taylorelai"  
##  [5] "Taylora"      "Tayloretta"   "Taylorence"   "Taylorena"   
##  [9] "Taylori"      "Taylorraine"  "Tayloren"     "Tayloralei"  
## [13] "Taylorie"     "Taylorna"     "Taylores"     "Tayloryn"    
## [17] "Tayloriana"   "Tayloralie"   "Tayloreal"    "Taylorencia" 
## [21] "Taylorali"    "Tayloralai"   "Taylorenza"   "Taylorah"    
## [25] "Taylore"      "Taylorin"     "Taylorielle"  "Tayloraine"  
## [29] "Taylorene"    "Taylorianna"  "Taylorraina"  "Taylorelle"  
## [33] "Tayloriann"   "Tayloralye"   "Tayloreli"    "Taylorel"    
## [37] "Taylorelie"   "Taylorianne"  "Taylorien"    "Tayloree"    
## [41] "Tayloreley"   "Tayloricely"  "Tayloreen"    "Taylorissa"  
## [45] "Taylorynn"    "Taylorentina" "Taylorious"   "Tayloralee"  
## [49] "Tayloran"     "Taylorilai"   "Tayloris"     "Taylorina"   
## [53] "Tayloriah"    "Taylorrie"    "Taylorryn"    "Taylorance"  
## [57] "Tayloralynn"  "Taylorayna"   "Taylordina"   "Tayloriel"   
## [61] "Taylorinda"   "Taylorenzo"   "Taylord"      "Taylorian"   
## [65] "Taylorne"     "Taylorencio"  "Taylorik"     "Taylorentino"
## [69] "Taylorenz"    "Taylorean"    "Taylorcan"    "Tayloreto"   
## [73] "Mataylor"     "Nataylor"     "Setaylor"     "Lataylor"    
## [77] "Dontaylor"    "Deontaylor"   "Davontaylor"  "Itaylor"     
## [81] "Devontaylor"  "Keontaylor"   "Altaylor"     "Javontaylor" 
## [85] "Lavontaylor"  "Montaylor"    "Temitaylor"

I love Taylorenzo!

mashup_names("kevin")
##  [1] "Kevina"     "Kevine"     "Kevinia"    "Kevinity"   "Kevincenza"
##  [6] "Kevinah"    "Kevingston" "Kevini"     "Kevinaya"   "Kevinisha" 
## [11] "Kevinee"    "Kevincent"  "Kevinna"    "Kevincy"    "Kevinnie"  
## [16] "Kevincenzo" "Kevince"    "Keving"     "Kevinny"    "Kevinh"    
## [21] "Kevinash"   "Kevino"     "Kevinson"   "Kevinci"    "Kevincente"
## [26] "Kevind"     "Kevinicio"  "Kevinicius" "Kevinn"     "Kevinay"   
## [31] "Kevinda"    "Kevincen"   "Kevinton"   "Kevinzent"  "Paraskevin"
## [36] "Markevin"

Kevincy is a great name!

Shiny app

I used this algorithm to make my first-ever Shiny app, so please feel free to try it with your own name. Enjoy!!