Baby name pattern matching in R

I was recently introduced to the stringr and rebus R packages when I took a string manipulation course from DataCamp. I thoroughly enjoyed pattern matching with regular expressions, and after a few examples using the babynames data package, I was hooked on name patterns.

stringr, rebus, babynames packages

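library(stringr)    # string functions such as str_detect() and str_subset()
library(rebus)      # readable regex building blocks: %R%, START, END, or(), ...
library(babynames)  # US baby names data
library(dplyr)      # filter()
library(pander)     # renders R objects as markdown tables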

The stringr package has functions for common string operations, including detecting and extracting patterns. The rebus package provides a simple, readable way of building regular expressions, or regex (a sequence of characters that defines a search pattern). The babynames package contains a dataset of names given to babies born in the US from 1880 through 2015. I’m including the dplyr package for filtering.

Matching patterns in names

I’ve always had a fascination with names. When I was a kid I found a book of baby names my parents had used and went through every page over and over looking for names I liked. When I was pregnant with my daughter I did a lot of name research, gathering details like popularity, origins, and how people who have that name liked it. I kept this data in a sortable spreadsheet. I wasn’t using R much at the time and certainly didn’t know about the babynames package (if it even existed in 2013 – 2014!). It’s probably best that I didn’t have access to this information at the time, or I would likely have done nothing else for months.

For this exercise I’m going to look at the list of baby names from 2015, the most recent year in the set.

# Keep only 2015, then split the names by sex
babynames_2015 <- filter(babynames, year == 2015)
boy_names <- filter(babynames_2015, sex == "M")$name
girl_names <- filter(babynames_2015, sex == "F")$name

Pattern searching on names can be so useful when you’re picking out a name. You may like names that start or end with a certain sound and want to explore new options that you may not have considered or even heard before.

Honestly, this was so much fun that I had a hard time narrowing down the examples to include here. Here are a few, along with their results. To save space, I’m limiting the search results to the top 75 names (by popularity) for each pattern.
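The code that counts the matches and trims each list isn't shown here. Below is a minimal sketch of how it could work, assuming the name vectors are already ordered from most to least popular. `show_matches()` and its arguments are hypothetical names, and base R's `grepl()` stands in for stringr's `str_subset()` so the sketch runs on its own.

```r
# Hypothetical helper (not the post's actual code): report how many names
# match a regex and return the top matches.
# Assumes name_vec is already sorted from most to least popular.
show_matches <- function(name_vec, pattern, label, top = 75) {
  # Keep the names where the pattern matches
  hits <- name_vec[grepl(pattern, name_vec, perl = TRUE)]
  cat(length(hits), "names match the", paste0("'", label, "'"), "pattern\n")
  # Because the input is in popularity order, head() keeps the most popular
  head(hits, top)
}

# Usage on a handful of names:
show_matches(c("Emma", "Juanita", "Olivia", "Margarita", "Lita"),
             "ita$", "ita", top = 2)
# prints: 3 names match the 'ita' pattern
# returns: "Juanita" "Margarita"
```

With the real data you would call something like `show_matches(girl_names, ita, "ita")`. A rebus pattern is a character string underneath, so it should pass straight through to `grepl()`.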

Girls’ names ending in “ita”

The first pattern I tried matching was girls’ names that ended with the same pattern as my daughter Lita’s name.

ita <- "ita" %R% END   # "ita" at the very end of the name

Let’s see what we get:

## 64  girls' names match the 'ita' pattern
Rita Evita Zita Adrita Melita
Anita Ishita Janita Advita Nishita
Margarita Lolita Marita Amita Shrita
Juanita Sarita Carmelita Eshita Aashrita
Nikita Samhita Gita Sita Aelita
Lupita Akshita Lanita Armita Ashmita
Anwita Amrita Ankita Asmita Brita
Vita Adelita Nandita Danita Carlita
Angelita Bonita Advaita Jovita Clarita
Anvita Estrellita Alita Sulamita Larita
Rosita Lita Lalita Ghita Nikkita
Benita Nita Nakita Markita Paulita
Anahita Rishita Teresita Marquita

Some pretty names!
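Under the hood, rebus pastes its pieces into an ordinary regular expression, and `END` is the `$` end-of-string anchor, so `ita` should be equivalent to the plain regex `"ita$"`. Here's a quick check in base R (`grepl()` standing in for stringr's `str_detect()`), on two names from the results above plus two I picked as non-matches, showing what the anchor buys you.

```r
# Plain-regex version of `ita` (assumed equivalent of "ita" %R% END)
ita_names <- c("Lita", "Margarita", "Vitaly", "Emma")
grepl("ita$", ita_names)  # anchored at the end of the name
# [1]  TRUE  TRUE FALSE FALSE
grepl("ita", ita_names)   # no anchor: "ita" anywhere, so Vitaly matches too
# [1]  TRUE  TRUE  TRUE FALSE
```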

Boys’ names rhyming with “Aiden”

I was thinking recently that there have been a lot of boys’ names rhyming with Aidan and Jaden. This one is more complex than the “ita” example because there are several ways to spell the “aden” sound. Here is the search expression I wrote to account for the different possible spellings.

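aden <- or("A", "a") %R%                  # an "a", capital or lowercase
  zero_or_more(char_class("iey")) %R%     # any run of i, e, y ("ai", "ay", "ae", ...)
  "d" %R% 
  or("a", "e") %R%                        # "dan" or "den"
  one_or_more("n") %R%                    # one or more n's (Jaydenn)
  END                                     # at the very end of the name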

Here’s what we get from boys’ names:

## 142  boys' names match the 'aden' pattern
Aiden Aden Jaydan Xayden Laiden
Jayden Zaiden Jaeden Zaeden Jaydenn
Brayden Adan Zaden Bladen Braedan
Ayden Braden Grayden Khayden Graden
Kayden Raiden Drayden Aeden Jakaiden
Kaiden Rayden Aiyden Kaydan Raeden
Hayden Aaden Aaiden Braydan Shayden
Kaden Braeden Xaiden Jakayden Xaden
Aidan Braiden Caeden Payden Kaydenn
Caden Aydan Layden Zaydan Khaden
Zayden Kaeden Haden Jaidan Jakaden
Jaden Haiden Taiden Khaiden Paiden
Cayden Aayden Aydenn Paden Trayden
Jaiden Aedan Kaidan Zaidan Vaden
Caiden Tayden Blayden Draiden Caidan

That’s a lot of different spelling options and starting sounds!
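For reference, here is my reading of the regex this rebus expression builds (`or()` becomes a non-capturing group and `char_class()` a bracket expression), checked with base R's `grepl(perl = TRUE)` rather than stringr. Because the pattern has no `START` anchor, the match can begin anywhere in the name, which is why prefixed spellings like Bladen and Jakaiden show up. The non-matching test names are my own picks.

```r
# Assumed plain-regex equivalent of the `aden` rebus pattern:
#   (?:A|a)   an "a" of either case
#   [iey]*    any run of i, e, or y
#   d(?:a|e)  "da" or "de"
#   n+$       one or more n's at the very end
aden_regex <- "(?:A|a)[iey]*d(?:a|e)n+$"
aden_names <- c("Aiden", "Jaydenn", "Bladen", "Adam", "Dean")
grepl(aden_regex, aden_names, perl = TRUE)
# [1]  TRUE  TRUE  TRUE FALSE FALSE
```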

Girls’ names starting with a J sound and ending in an N sound

Now let’s say you want to find a girl’s name that starts with the J sound and ends with an N sound. Let’s use this pattern:

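j_n <- or("J", "Gi", "Ge") %R%   # J, or a soft G spelled "Gi" or "Ge"
  one_or_more(WRD) %R%           # one or more word characters
  "n" %R% 
  optional("e") %R%              # allow a final "e" (Jasmine, Jane)
  END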

And the output:

## 320  girls' names match the 'j_n' pattern
Jasmine Jayden Jaylyn Jaiden Jasmyn
Josephine Joselyn Jaclyn Jacelyn Jaden
Jocelyn Jaylene Jaylen Justine Jayne
Jordyn Jaylynn Jailyn Joan Jacelynn
Jacqueline Jolene Jaqueline Josselyn Jensen
Jordan Jazlynn Jackeline Jaidyn Jaslene
June Joslyn Jaslyn Joanne Jesslyn
Jane Jasmin Geraldine Jacklyn Jean
Jayleen Jordynn Jordin Joann Jailyne
Jazmin Julianne Gillian Jhene Joselin
Jillian Jaylin Josslyn Jailynn Jazleen
Jazmine Jocelynn Joslynn Jalynn Joseline
Jazlyn Jacquelyn Jessalyn Jazzlynn Jackelyn
Jaelynn Jazmyn Jadyn Jasleen Jadelyn
Jaelyn Jazzlyn Jaslynn Jazmyne Jameson

So many to choose from! By the way, I have a search pattern that finds all the different spellings of Jasmine (including those that start with a “Y”), if anyone is interested!
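As a sanity check on the `j_n` pattern, here is the plain regex I'd expect rebus to build from it, tested with base R's `grepl(perl = TRUE)`. One caveat worth knowing: the "Gi"/"Ge" alternatives can't tell a soft G from a hard one, so a name like Gideon matches too. Since names are only capitalized at the start, the capital J or G mostly acts as a start anchor even without `START`. The test names beyond Jasmine and Geraldine are my own picks.

```r
# Assumed plain-regex equivalent of the `j_n` rebus pattern:
#   (?:J|Gi|Ge)  a capital J, or "Gi"/"Ge"
#   \\w+         one or more word characters
#   ne?$         an "n", optionally followed by "e", at the end
j_n_regex <- "(?:J|Gi|Ge)\\w+ne?$"
j_n_names <- c("Jasmine", "Gillian", "Geraldine", "Gideon", "Julia", "Gina")
grepl(j_n_regex, j_n_names, perl = TRUE)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
```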

Boys’ names ending in “ter”

Suppose you’re looking for a boy’s name, you like names ending in “ter”, and you want to see some options. This is an easy search pattern:

ter <- "ter" %R% 
  END

So here’s what we get:

## 52  boys' names match the 'ter' pattern
Carter Chester Sutter Master Slayter
Hunter Alister Winter Alter Caster
Karter Kharter Aleister Macallister Forester
Peter Sylvester Mister Daxter Hollister
Walter Jeter Jupiter Dieter Kester
Porter Baxter Coulter Winchester Richter
Dexter Cutter Kutter Wynter Silvester
Foster Slater Buster Jacarter Webster
Colter Allister Gunter Pieter Cotter
Lester Kolter Shooter Ritter Holter

I like a lot of those!
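The `END` anchor is what keeps this pattern from matching every name that merely contains “ter”. A small base R check (`grepl()` standing in for stringr), assuming `"ter" %R% END` is the plain regex `"ter$"`; Peterson and Terrence are my own examples of names that should be excluded.

```r
# "ter" %R% END, written as the plain regex "ter$"
ter_names <- c("Carter", "Peter", "Peterson", "Terrence")
grepl("ter$", ter_names)
# [1]  TRUE  TRUE FALSE FALSE
```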

Names that rhyme with “Cory”

Names that rhyme with Cory or Rory are common among both boys and girls. This time I’ll build the search pattern and run it on both boys’ and girls’ names.

ory <- START %R% 
  one_or_more(negated_char_class("aeiouAEIOU")) %R%   # leading non-vowels (C, Gl, Km, ...)
  or("o", "au") %R%                                  # "o" or "au"
  one_or_more("r") %R% 
  one_or_more(char_class("iey")) %R%                 # ending run of i, e, y (y, ie, ey, ee, ...)
  END

Ladies first:

## 53  girls' names match the 'ory' pattern
Kori Story Norie Korrie Torey
Rory Corrie Torrie Dori Coree
Tori Zori Jorie Cory Glorie
Nori Tory Jory Kauri Korri
Lori Rorie Corey Koree Lorrie
Cori Jori Korey Zorie Torrey
Rori Khori Gauri Khorie Dorie
Korie Stori Torri Lorie Joree
Glory Corie Kory Corri Laury
Laurie Storie Torie Rorey Maurie

And now, the gentlemen:

## 29  boys' names match the 'ory' pattern
Rory Jory Corry Kmauri Dmauri
Corey Torrey Glory Koree Maury
Cory Kori Nori Korie Nore
Korey Torey Tori Mauri Rauri
Kory Cori Corrie Torre Rhory
Tory Corie Correy Torry

I see some overlap, as well as some names exclusive to either boys or girls. Are girls’ names more likely than boys’ names to end in “ee”, “i”, or “ie”? It seems like they might be, but I’m not drawing any conclusions today.
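Spelled out as a plain regex (my reading of what rebus builds), the `ory` pattern shows why the results look the way they do. The leading run of non-vowels, anchored by `START`, rules out names that start with a vowel or have an extra syllable before the “or” (Aurora, Gregory), and because `[iey]+` accepts a single “e”, names like Nore and Torre get in too. Checked with base R's `grepl(perl = TRUE)`; Gregory and Aurora are my own test picks.

```r
# Assumed plain-regex equivalent of the `ory` rebus pattern:
#   ^[^aeiouAEIOU]+  starts with one or more non-vowels
#   (?:o|au)         then "o" or "au"
#   r+               one or more r's
#   [iey]+$          ends with a run of i, e, y
ory_regex <- "^[^aeiouAEIOU]+(?:o|au)r+[iey]+$"
ory_names <- c("Cory", "Glorie", "Kmauri", "Nore", "Gregory", "Aurora")
grepl(ory_regex, ory_names, perl = TRUE)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE
```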

I want to keep doing this

I could literally do this all day, and this post could easily have been three times as long!