
Building the best baby name search with R

Searching for the perfect name

I have mentioned my fascination with names a few times before, and how it led me to create an algorithm for making mashup names. I’m back at it, and this time I’m being serious: instead of mashup names, I’m making a name search app for actual names that you might want to give your baby, pet, imaginary alien friend, fictional character, robot, or anything else you need to name.

Whenever I’ve searched for a name in the past, I’ve usually looked in books which, naturally, list names alphabetically by gender. But what if you want to search differently? What if you know you like girls’ names that end in “ina”? Or boys’ names that end in “den”? Or girls’ names that contain the letter pattern “rose”? You don’t want to have to search through every starting letter in a book looking for all of those! That’s why we need a better baby name search, and that’s why I’m here now. Let’s get to work!

R packages needed

library(stringr)
library(rebus)
library(babynames)
library(dplyr)
library(ggplot2)
library(ggthemes)

In addition to dplyr, which I use constantly, I need the stringr and rebus packages for R. stringr has string operation functions that are very useful for matching string patterns. rebus provides a simple way of building regular expressions, or regex (a sequence of characters that defines a search pattern).
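If the idea of an anchored pattern is new, here is a minimal base-R sketch of what those patterns do. In rebus, START and END are the regex anchors ^ and $, so START %R% "ja" is the pattern ^ja. The sample names and the matches_pattern() helper below are made up for illustration and are not part of the app.

```r
# Hypothetical sample of names; not taken from the babynames data.
sample_names <- c("Jaina", "Janina", "Marina", "Jaden", "Rosemary", "Primrose")

# Hypothetical helper: case-insensitive regex match on a character vector.
matches_pattern <- function(names, pattern) {
  grepl(pattern, tolower(names))
}

# "^" anchors the pattern to the start of the name (rebus: START)
starts_ja <- sample_names[matches_pattern(sample_names, "^ja")]
# "$" anchors the pattern to the end of the name (rebus: END)
ends_ina <- sample_names[matches_pattern(sample_names, "ina$")]
# No anchor: the pattern may appear anywhere in the name
has_rose <- sample_names[matches_pattern(sample_names, "rose")]

print(starts_ja)
print(ends_ina)
print(has_rose)
```

Lowercasing the names before matching is what lets a search for “rose” find both Rosemary and Primrose.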

The babynames package has helped me immensely in all my name searching quests. It provides data on names given to babies in the US, as reported by the Social Security Administration.

I’ll be using ggplot2 and ggthemes to make a time series chart of name popularity.

Write the algorithm

Here is my function, which I have called get_name_list(). The arguments are:

1. Starting letters
2. Ending letters
3. Contains pattern (i.e. “rose” anywhere in the name)
4. Gender (“F”, “M”, or “both”)
5. Start year
6. End year

get_name_list <- function(begins = "", ends = "", 
                          contains = "", gender, 
                          start_year, end_year) {
  # Treat NULL like an empty string, which places no restriction on the name
  start_pattern <- if (is.null(begins)) "" else tolower(begins)
  end_pattern <- if (is.null(ends)) "" else tolower(ends)
  # Newer stringr versions reject an empty pattern, so fall back to
  # START, which matches every name
  contains_pattern <- if (is.null(contains) || contains == "") START else
    tolower(contains)
  # "both" expands to both sexes; otherwise accept "f"/"m" in either case
  gender1 <- if (tolower(gender) == "both") c("F", "M") else toupper(gender)
 
  babynames %>% filter(str_detect(tolower(name), 
                                   pattern = START %R% start_pattern),
                       str_detect(tolower(name), 
                                   pattern = end_pattern %R% END),
                       str_detect(tolower(name), 
                                   pattern = contains_pattern),
                       sex %in% gender1,
                       year >= start_year,
                       year <= end_year) %>%
    group_by(name, sex) %>%      
    summarise(total = sum(n)) %>%
    arrange(desc(total))
}

I included starting and ending years, because the popularity of names changes a lot over the years. The most common names from 1881 might not be used that much now, and the most common names of today might not have ever been heard of in 1920!
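To see why the year window matters, here is a small sketch of the filter-then-total step on its own. It uses base R and a toy data frame with the same column names as babynames; the counts and the totals_in_window() helper are invented for illustration.

```r
# Toy data with the same column names as babynames (year, sex, name, n).
# The counts are invented for illustration only.
toy <- data.frame(
  year = c(1900, 1900, 1905, 2010, 2010, 2015),
  sex  = c("M", "M", "M", "M", "M", "M"),
  name = c("Walter", "Chester", "Walter", "Carter", "Walter", "Carter"),
  n    = c(100L, 40L, 120L, 300L, 10L, 350L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total babies per name within a year window,
# sorted from most to least common.
totals_in_window <- function(data, start_year, end_year) {
  in_window <- data[data$year >= start_year & data$year <= end_year, ]
  totals <- aggregate(n ~ name, data = in_window, FUN = sum)
  totals[order(-totals$n), ]
}

early <- totals_in_window(toy, 1900, 1910)
late <- totals_in_window(toy, 2000, 2017)

print(early)
print(late)
```

In this toy data the same name tops one window and sits at the bottom of the other, which is the effect the start and end years are there to capture.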

Test it a few times

Let’s try a few examples. What if you want a modern girl’s name that starts with “Ja” and ends in “ina”? Try this:

get_name_list(begins = "Ja", ends = "ina", gender = "F", 
              start_year = 2000, end_year = 2017)
## # A tibble: 15 x 3
## # Groups:   name [15]
##    name       sex   total
##    <chr>      <chr> <int>
##  1 Jaina      F      1419
##  2 Jasmina    F       486
##  3 Janina     F       401
##  4 Jalina     F       380
##  5 Jaylina    F       126
##  6 Jalaina    F       117
##  7 Jamina     F        47
##  8 Jazmina    F        25
##  9 Jacquelina F        22
## 10 Jaelina    F        10
## 11 Jazlina    F         6
## 12 Jacina     F         5
## 13 Janaina    F         5
## 14 Jatina     F         5
## 15 Javina     F         5

Cool name list! I really like Jazlina.

Or maybe you want an old-fashioned boy’s name that ends in “ter”?

get_name_list(ends = "ter", gender = "M", 
              start_year = 1900, end_year = 1910)
## # A tibble: 22 x 3
## # Groups:   name [22]
##    name      sex   total
##    <chr>     <chr> <int>
##  1 Walter    M     22593
##  2 Peter     M      5007
##  3 Lester    M      4917
##  4 Chester   M      4349
##  5 Sylvester M      1543
##  6 Buster    M       565
##  7 Foster    M       385
##  8 Porter    M       317
##  9 Carter    M       225
## 10 Webster   M       212
## # ... with 12 more rows

Now let’s test the “contains” argument. Maybe you like a certain grouping of letters together and don’t care if they’re at the start, middle, or end of the name? The “contains” argument’s got you! Here are all the girls’ names from the Generation X era that contain the pattern “ail”.

get_name_list(contains = "ail", gender = "F", 
              start_year = 1965, end_year = 1980)
## # A tibble: 57 x 3
## # Groups:   name [57]
##    name     sex   total
##    <chr>    <chr> <int>
##  1 Gail     F     17992
##  2 Abigail  F      9470
##  3 Aileen   F      3118
##  4 Maile    F       660
##  5 Laila    F       643
##  6 Hailey   F       497
##  7 Kaila    F       184
##  8 Nailah   F       168
##  9 Shaila   F       168
## 10 Abbigail F       150
## # ... with 47 more rows

Add a time series chart to track popularity

I like to see how popular names have been over time. When I was a kid the name “Heather” was pretty popular for girls, but now that I think about it, I can’t remember the last time I interacted with a person named Heather. Let’s make a bar chart to see how popular the name “Heather” has been through the years.

babynames %>%
  filter(name == "Heather" & sex == "F") %>%
  ggplot(aes(x = year, y = n)) +
  geom_col(fill = "magenta") +
  ggtitle("Popularity of the name Heather") +
  xlab("") +
  ylab("number of babies")+
  theme_wsj() +
  theme(
    panel.background = element_rect(fill = "gray95"),
    plot.background = element_rect(fill = "gray95"),
    strip.background = element_rect(fill = "gray95"),
    panel.grid.major.y = element_line( colour = "darkgray"),
    plot.title = element_text(size = 18)
  )

Wow, Heather really peaked in the 1970s and 1980s! I wonder how much impact the movie Heathers had on the name Heather? Or on the name Veronica? I’ll let you look that one up yourself with my shiny app.
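If you would rather check a peak numerically than read it off the chart, something like this should work. It’s a base-R sketch on made-up yearly counts; peak_year() is a hypothetical helper, and the numbers are not real babynames data.

```r
# Made-up yearly counts for one name; same column names as babynames (year, n).
toy_series <- data.frame(
  year = 1970:1975,
  n    = c(120L, 180L, 260L, 240L, 200L, 150L)
)

# Hypothetical helper: the row with the largest yearly count.
peak_year <- function(series) {
  series[which.max(series$n), c("year", "n")]
}

peak <- peak_year(toy_series)
print(peak)
```

Assuming the babynames data is loaded, the same helper could be pointed at the real series with peak_year(subset(babynames, name == "Heather" & sex == "F")).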

Check out the shiny app and use it to name your baby!

I turned all this into a shiny app so you can search for your own names. Please check it out here, and let me know what you think!