I was recently introduced to the stringr and rebus R packages when I took a string manipulation course from DataCamp. I thoroughly enjoyed pattern matching with regular expressions, and after a few examples using the babynames data package, I was hooked on name patterns.
stringr, rebus, babynames packages
library(stringr)
library(rebus)
library(babynames)
library(dplyr)
library(pander)
The stringr package provides functions for detecting, extracting, and otherwise working with patterns in strings. The rebus package provides a simple way of building regular expressions, or regex (a sequence of characters that defines a search pattern). The babynames package contains a dataset of names given to babies born in the US from 1880 through 2015. I’m including the dplyr package for filtering purposes.
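To make the division of labor concrete, here is a minimal sketch of how the two packages fit together: rebus builds the pattern and stringr applies it. The sample names and variable names are made up for illustration and are not from the babynames data.

```r
library(stringr)
library(rebus)

# %R% concatenates pattern pieces; START and END are rebus's anchors
# for the beginning and end of a string.
starts_with_li <- START %R% "Li"
ends_with_a    <- "a" %R% END

# Made-up input, just for illustration.
sample_names <- c("Lita", "Liam", "Talia", "Lila")

# str_subset() keeps only the strings that match the pattern.
li_matches <- str_subset(sample_names, starts_with_li)
a_matches  <- str_subset(sample_names, ends_with_a)

# Hand-written regular expressions should behave the same way.
same_as_plain_regex <- identical(li_matches, str_subset(sample_names, "^Li"))
```

The rebus version is longer than `"^Li"`, but each piece is named, which helps once patterns get more complicated.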
Matching patterns in names
I’ve always had a fascination with names. When I was a kid I found a book of baby names my parents had used and went through every page over and over looking for names I liked. When I was pregnant with my daughter I did a lot of name research, gathering details like popularity, origins, and how people who have that name liked it. I kept this data in a sortable spreadsheet. I wasn’t using R much at the time and certainly didn’t know about the babynames package (if it even existed in 2013 – 2014!). It’s probably best that I didn’t have access to this information at the time, or I would likely have done nothing else for months.
For this exercise I’m going to look at the list of baby names from 2015, the most recent year in the set.
babynames_2015 <- filter(babynames, year == 2015)
boy_names <- filter(babynames_2015, sex == "M")$name
girl_names <- filter(babynames_2015, sex == "F")$name
Pattern searching on names can be so useful when you’re picking out a name. You may like names that start or end with a certain sound and want to explore new options that you may not have considered or even heard before.
Honestly, this was so much fun that I had a hard time narrowing down the examples to include here. Here are a few, along with their results. To save space, I’m limiting the search results to the top 75 names (by popularity) for each pattern.
Girls’ names ending in “ita”
The first pattern I tried matching was girls’ names that ended with the same pattern as my daughter Lita’s name.
ita <- "ita" %R% END
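The post does not show the call that applies the pattern, so here is a minimal sketch of one way it could be done. The helper `top_matches()` and the stand-in vector `sample_girl_names` are hypothetical; with the real data you would pass `girl_names` instead. It assumes the names are already ordered by popularity within a year and sex, so the first matches are the most popular ones.

```r
library(stringr)
library(rebus)

ita <- "ita" %R% END

# Hypothetical helper: keep the names that match `pattern`, then take the
# first `n`. Assumes `names` is already sorted from most to least popular.
top_matches <- function(names, pattern, n = 75) {
  head(str_subset(names, pattern), n)
}

# Stand-in for girl_names so the sketch runs without the babynames data.
sample_girl_names <- c("Emma", "Rita", "Italia", "Anita", "Lita", "Rital")

ita_names <- top_matches(sample_girl_names, ita)
length(ita_names)  # how many names matched
ita_names
```

Because the pattern is anchored with `END`, "Italia" and "Rital" are left out even though they contain the letters "ita".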
Let’s see what we get:
## 64 girls' names match the 'ita' pattern
Rita | Evita | Zita | Adrita | Melita |
Anita | Ishita | Janita | Advita | Nishita |
Margarita | Lolita | Marita | Amita | Shrita |
Juanita | Sarita | Carmelita | Eshita | Aashrita |
Nikita | Samhita | Gita | Sita | Aelita |
Lupita | Akshita | Lanita | Armita | Ashmita |
Anwita | Amrita | Ankita | Asmita | Brita |
Vita | Adelita | Nandita | Danita | Carlita |
Angelita | Bonita | Advaita | Jovita | Clarita |
Anvita | Estrellita | Alita | Sulamita | Larita |
Rosita | Lita | Lalita | Ghita | Nikkita |
Benita | Nita | Nakita | Markita | Paulita |
Anahita | Rishita | Teresita | Marquita | Rita |
Some pretty names!
Boys’ names rhyming with “Aiden”
I was thinking recently that there have been a lot of boys’ names rhyming with Aidan and Jaden. This one is more complex than the “ita” example because there are several ways to spell the sound “aden”. Here is the search expression I wrote, accounting for the different possible spellings.
aden <- or("A", "a") %R%
  zero_or_more(char_class("iey")) %R%
  "d" %R%
  or("a", "e") %R%
  one_or_more("n") %R%
  END
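A quick way to sanity-check a pattern like this is to run it against a handful of spellings you expect to match and a few you expect to miss. The sketch below does that with `str_detect()`; the `candidates` vector is hand-picked for illustration.

```r
library(stringr)
library(rebus)

aden <- or("A", "a") %R%
  zero_or_more(char_class("iey")) %R%
  "d" %R%
  or("a", "e") %R%
  one_or_more("n") %R%
  END

# Hand-picked spellings: the first four should match, the last three should not.
candidates <- c("Aiden", "Jayden", "Braedan", "Kaydenn",
                "Adam", "Aidyn", "Madden")

# str_detect() returns TRUE or FALSE for each name.
matches <- str_detect(candidates, aden)
setNames(matches, candidates)
```

"Adam" fails because it does not end in "n", "Madden" fails because of the doubled "d", and "Aidyn" fails because the pattern only allows "a" or "e" after the "d". Spellings ending in "yn" could be added by widening that `or()` if you wanted them.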
Here’s what we get from boys’ names:
## 142 boys' names match the 'aden' pattern
Aiden | Aden | Jaydan | Xayden | Laiden |
Jayden | Zaiden | Jaeden | Zaeden | Jaydenn |
Brayden | Adan | Zaden | Bladen | Braedan |
Ayden | Braden | Grayden | Khayden | Graden |
Kayden | Raiden | Drayden | Aeden | Jakaiden |
Kaiden | Rayden | Aiyden | Kaydan | Raeden |
Hayden | Aaden | Aaiden | Braydan | Shayden |
Kaden | Braeden | Xaiden | Jakayden | Xaden |
Aidan | Braiden | Caeden | Payden | Kaydenn |
Caden | Aydan | Layden | Zaydan | Khaden |
Zayden | Kaeden | Haden | Jaidan | Jakaden |
Jaden | Haiden | Taiden | Khaiden | Paiden |
Cayden | Aayden | Aydenn | Paden | Trayden |
Jaiden | Aedan | Kaidan | Zaidan | Vaden |
Caiden | Tayden | Blayden | Draiden | Caidan |
That’s a lot of different spelling options and starting sounds!
Girls’ names starting with a J sound and ending in an N sound
Now let’s say you want to find a girl’s name that starts with the J sound and ends with an N sound. Let’s use this pattern:
j_n <- or("J", "Gi", "Ge") %R%
  one_or_more(WRD) %R%
  "n" %R%
  optional("e") %R%
  END
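As a rough check of what this pattern accepts and rejects, the sketch below runs it over a few hand-picked names (the `candidates` vector is made up for illustration).

```r
library(stringr)
library(rebus)

j_n <- or("J", "Gi", "Ge") %R%
  one_or_more(WRD) %R%
  "n" %R%
  optional("e") %R%
  END

# Hand-picked names for illustration.
candidates <- c("Jasmine", "Gillian", "Geraldine", "Jan",
                "Jenna", "Gwen", "Gibson")

# TRUE where the name starts with J, Gi, or Ge and ends in "n" or "ne".
matches <- str_detect(candidates, j_n)
setNames(matches, candidates)
```

"Jenna" is rejected because it ends in "a", and "Gwen" because it does not start with "Gi" or "Ge". One caveat: spelling is only a proxy for sound, so a hard-G name such as "Gibson" also gets through.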
And the output:
## 320 girls' names match the 'j_n' pattern
Jasmine | Jayden | Jaylyn | Jaiden | Jasmyn |
Josephine | Joselyn | Jaclyn | Jacelyn | Jaden |
Jocelyn | Jaylene | Jaylen | Justine | Jayne |
Jordyn | Jaylynn | Jailyn | Joan | Jacelynn |
Jacqueline | Jolene | Jaqueline | Josselyn | Jensen |
Jordan | Jazlynn | Jackeline | Jaidyn | Jaslene |
June | Joslyn | Jaslyn | Joanne | Jesslyn |
Jane | Jasmin | Geraldine | Jacklyn | Jean |
Jayleen | Jordynn | Jordin | Joann | Jailyne |
Jazmin | Julianne | Gillian | Jhene | Joselin |
Jillian | Jaylin | Josslyn | Jailynn | Jazleen |
Jazmine | Jocelynn | Joslynn | Jalynn | Joseline |
Jazlyn | Jacquelyn | Jessalyn | Jazzlynn | Jackelyn |
Jaelynn | Jazmyn | Jadyn | Jasleen | Jadelyn |
Jaelyn | Jazzlyn | Jaslynn | Jazmyne | Jameson |
So many to choose from! Btw I have a search pattern to find all the different spellings of Jasmine (including those that start with a “Y”) if anyone is interested!
Boys’ names ending in “ter”
Suppose you were looking for a boy’s name and you like names ending in “ter” and want to see some options. This is an easy search pattern:
ter <- "ter" %R% END
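For a plain suffix like this one, stringr also offers `str_ends()` (available in stringr 1.4.0 and later), which checks the end of each string without an explicit anchor. The sketch below compares the two approaches on a made-up vector, `sample_boy_names`.

```r
library(stringr)
library(rebus)

ter <- "ter" %R% END

# Made-up stand-in for boy_names.
sample_boy_names <- c("Carter", "Terry", "Peter", "Terrence", "Walter")

# Approach 1: the rebus pattern with str_subset().
with_rebus <- str_subset(sample_boy_names, ter)

# Approach 2: str_ends() returns a logical vector, used here for indexing.
with_str_ends <- sample_boy_names[str_ends(sample_boy_names, "ter")]

same_result <- identical(with_rebus, with_str_ends)
```

Both approaches should pick out the same names. The rebus version is easier to extend when the ending has several possible spellings.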
So here’s what we get:
## 52 boys' names match the 'ter' pattern
Carter | Chester | Sutter | Master | Slayter |
Hunter | Alister | Winter | Alter | Caster |
Karter | Kharter | Aleister | Macallister | Forester |
Peter | Sylvester | Mister | Daxter | Hollister |
Walter | Jeter | Jupiter | Dieter | Kester |
Porter | Baxter | Coulter | Winchester | Richter |
Dexter | Cutter | Kutter | Wynter | Silvester |
Foster | Slater | Buster | Jacarter | Webster |
Colter | Allister | Gunter | Pieter | Cotter |
Lester | Kolter | Shooter | Ritter | Holter |
I like a lot of those!
Names that rhyme with “Cory”
Names that rhyme with Cory or Rory occur a lot among both boys and girls. This time I’ll build the search pattern and run it on both boys’ and girls’ names.
ory <- START %R%
  one_or_more(negated_char_class("aeiouAEIOU")) %R%
  or("o", "au") %R%
  one_or_more("r") %R%
  one_or_more(char_class("iey")) %R%
  END
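Because this pattern is anchored at both ends, it only accepts names made of one or more leading consonants followed directly by the "ory" sound. The sketch below checks that on a few hand-picked names (the `candidates` vector is made up for illustration).

```r
library(stringr)
library(rebus)

ory <- START %R%
  one_or_more(negated_char_class("aeiouAEIOU")) %R%
  or("o", "au") %R%
  one_or_more("r") %R%
  one_or_more(char_class("iey")) %R%
  END

# Hand-picked names: the first four should match, the last three should not.
candidates <- c("Rory", "Corey", "Laurie", "Story",
                "Ory", "Mallory", "Gregory")

matches <- str_detect(candidates, ory)
setNames(matches, candidates)
```

"Ory" fails because at least one leading non-vowel character is required, while "Mallory" and "Gregory" fail because a vowel appears before the "or". Dropping `START` and the consonant piece would let those longer names through.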
Ladies first:
## 53 girls' names match the 'ory' pattern
Kori | Story | Norie | Korrie | Torey |
Rory | Corrie | Torrie | Dori | Coree |
Tori | Zori | Jorie | Cory | Glorie |
Nori | Tory | Jory | Kauri | Korri |
Lori | Rorie | Corey | Koree | Lorrie |
Cori | Jori | Korey | Zorie | Torrey |
Rori | Khori | Gauri | Khorie | Dorie |
Korie | Stori | Torri | Lorie | Joree |
Glory | Corie | Kory | Corri | Laury |
Laurie | Storie | Torie | Rorey | Maurie |
And now gentlemen:
## 29 boys' names match the 'ory' pattern
Rory | Jory | Corry | Kmauri | Dmauri |
Corey | Torrey | Glory | Koree | Maury |
Cory | Kori | Nori | Korie | Nore |
Korey | Torey | Tori | Mauri | Rauri |
Kory | Cori | Corrie | Torre | Rhory |
Tory | Corie | Correy | Torry | NA |
I see some overlap and some names exclusive to either boys or girls. Are girls’ names more likely to end in “ee” or “i” or “ie” than boys’ names? It seems like they might be, but I’m not drawing any conclusions today.
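If you wanted to explore that question, one possible starting point is to extract the ending of each matched name and tabulate it. The sketch below does this for just the first row of each result table above, so it illustrates the technique and says nothing about the full data. The variable names are hypothetical.

```r
library(stringr)
library(rebus)

# First row of the girls' and boys' "ory" tables above.
girl_ory <- c("Kori", "Story", "Norie", "Korrie", "Torey")
boy_ory  <- c("Rory", "Jory", "Corry", "Kmauri", "Dmauri")

# Pattern for the trailing run of i, e, and y letters.
ending <- one_or_more(char_class("iey")) %R% END

# str_extract() returns the part of each name that matched the pattern.
girl_endings <- str_extract(girl_ory, ending)
boy_endings  <- str_extract(boy_ory, ending)

# Count how often each ending occurs.
table(girl_endings)
table(boy_endings)
```

Running the same steps on the full match vectors for each sex would give counts that could actually be compared.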
I want to keep doing this
I could literally do this all day and this post could easily have been three times as long!