Fun with Regex

Tags: R, python, regex

Author: Jun Ryu

Published: February 21, 2023

Some simple exercises with regular expressions

1. Using R

names.txt

library(stringr)  # provides str_detect() and str_match(), used throughout
names <- unlist(read.table("names.txt", sep = "\n"), use.names = FALSE)
names
 [1] "abc123"          "horribleTurtle"  "messsages"       "keep_it_simple" 
 [5] "hello world!"    "Zoran D Wang"    "myUsernameis210" "abc defg"       
 [9] "asml"            "john"            "Edward Lazowska" "123fionaFog"    
[13] "Red chihuahua5"  "1"               "CLEAN"           "+plus+"         
[17] "omaha poshy"     "0maha p0shy"     "OMAHA POSHY"    

(a)

Find all usernames that contain at least one numeric character.

names[str_detect(names, "[0-9]")]
[1] "abc123"          "myUsernameis210" "123fionaFog"     "Red chihuahua5" 
[5] "1"               "0maha p0shy"    

(b)

Find all usernames that are exactly four characters long and consist only of alphabetic characters.

names[str_detect(names, "^[a-zA-Z]{4}$")]
[1] "asml" "john"

(c)

Find all usernames that follow the conventional name format: the “given name” comes first and the “family name” last, with any other names in between. Names are separated by a single space, and each name must be an uppercase letter followed by one or more lowercase letters.

names[str_detect(names, "^(?:[A-Z][a-z]+ )+(?:[A-Z][a-z]+)$")]
[1] "Edward Lazowska"
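As a cross-check, the same name-format pattern can be sketched in Python's `re` module. This is only an illustration: the list is copied by hand from the printed output above, and the names `WORD`, `NAME_FORMAT`, and `RELAXED` are made up for this sketch.

```python
import re

# Usernames copied from the printed contents of names.txt above.
names = [
    "abc123", "horribleTurtle", "messsages", "keep_it_simple",
    "hello world!", "Zoran D Wang", "myUsernameis210", "abc defg",
    "asml", "john", "Edward Lazowska", "123fionaFog",
    "Red chihuahua5", "1", "CLEAN", "+plus+",
    "omaha poshy", "0maha p0shy", "OMAHA POSHY",
]

# One capitalised word: an uppercase letter, then one or more lowercase letters.
WORD = r"[A-Z][a-z]+"
# One or more "word + single space" units, then a final word with no trailing space.
NAME_FORMAT = re.compile(rf"(?:{WORD} )+{WORD}")

# fullmatch() plays the role of the ^...$ anchors in the R pattern.
conventional = [n for n in names if NAME_FORMAT.fullmatch(n)]
print(conventional)

# Hypothetical relaxation (not part of the exercise): allowing zero lowercase
# letters would admit a bare initial such as the "D" in "Zoran D Wang".
RELAXED = re.compile(r"(?:[A-Z][a-z]* )+[A-Z][a-z]*")
relaxed = [n for n in names if RELAXED.fullmatch(n)]
print(relaxed)
```

"Zoran D Wang" is rejected by the original pattern because the initial "D" has no lowercase letters after it, which is what the stated rule requires.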

cards.txt

cards <- unlist(read.table("cards.txt", sep = "\n"), use.names = FALSE)
cards
 [1] "5123456789101112"         "4789 0123 8910 1112"     
 [3] "4444 9321 1230 3"         "5315 4011 1721 51"       
 [5] "4987 9381 2457"           "4891 0870 8908 70987"    
 [7] "5234 4567 8910 1112"      "58907890782309171"       
 [9] "3008 9078 1891 7890"      "5192 9295 91828818"      
[11] "4182 2884 1232 9582 2182"

Note
  • A Mastercard number begins with a 5 and is exactly 16 digits long.
  • A Visa card number begins with a 4 and is between 13 and 16 digits long.

(a)

Write a regex pattern to match valid Mastercard numbers and print all the valid numbers, grouped into sets of 4 digits separated by a single space.

pat_a <- "^([5][0-9]{3})\\s*([0-9]{4})\\s*([0-9]{4})\\s*([0-9]{4})$"
apply(str_match(cards[str_detect(cards, pat_a)], pat_a)[,2:5], 1, paste, collapse = " ")
[1] "5123 4567 8910 1112" "5234 4567 8910 1112" "5192 9295 9182 8818"
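The capture-and-rejoin step may be easier to follow in a short Python sketch. It assumes the same pattern as `pat_a` and uses the card list copied from the output above; `MASTERCARD` and `format_mastercards` are names invented for this example.

```python
import re

# Card numbers copied from the printed contents of cards.txt above.
cards = [
    "5123456789101112", "4789 0123 8910 1112",
    "4444 9321 1230 3", "5315 4011 1721 51",
    "4987 9381 2457", "4891 0870 8908 70987",
    "5234 4567 8910 1112", "58907890782309171",
    "3008 9078 1891 7890", "5192 9295 91828818",
    "4182 2884 1232 9582 2182",
]

# Four capture groups of four digits each; the first must start with 5.
# \s* between groups tolerates missing or extra whitespace in the input.
MASTERCARD = re.compile(r"^(5[0-9]{3})\s*([0-9]{4})\s*([0-9]{4})\s*([0-9]{4})$")

def format_mastercards(numbers):
    """Return valid numbers regrouped as 'dddd dddd dddd dddd'."""
    formatted = []
    for number in numbers:
        m = MASTERCARD.match(number)
        if m:
            # groups() holds the four captured blocks, like columns 2:5
            # of the str_match() matrix in the R code.
            formatted.append(" ".join(m.groups()))
    return formatted

mastercards = format_mastercards(cards)
print(mastercards)
```

Because the groups capture only digits, the rejoined string is normalised no matter how the input was spaced, which is why "5192 9295 91828818" comes out as four clean blocks.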

(b)

Write a regex pattern to match valid Visa card numbers and print all the valid numbers, grouped into sets of 4 digits separated by a single space.

pat_b <- "^([4][0-9]{3})\\s*([0-9]{4})\\s*([0-9]{4})\\s*([0-9]{1,4})$"
apply(str_match(cards[str_detect(cards, pat_b)], pat_b)[,2:5], 1, paste, collapse = " ")
[1] "4789 0123 8910 1112" "4444 9321 1230 3"   
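A different way to reach the same result, sketched here in Python, is to strip the whitespace first and then check the digit count directly. This is not the approach used above, and it is slightly more permissive, since it accepts whitespace anywhere rather than only between blocks of four. The names `VISA_DIGITS` and `format_visa` are assumptions made for the sketch.

```python
import re

# Card numbers copied from the printed contents of cards.txt above.
cards = [
    "5123456789101112", "4789 0123 8910 1112",
    "4444 9321 1230 3", "5315 4011 1721 51",
    "4987 9381 2457", "4891 0870 8908 70987",
    "5234 4567 8910 1112", "58907890782309171",
    "3008 9078 1891 7890", "5192 9295 91828818",
    "4182 2884 1232 9582 2182",
]

# A leading 4 plus 12 to 15 further digits gives 13 to 16 digits in total.
VISA_DIGITS = re.compile(r"4[0-9]{12,15}")

def format_visa(number):
    """Return the number regrouped in blocks of 4, or None if it is not valid."""
    digits = "".join(number.split())  # drop all whitespace
    if not VISA_DIGITS.fullmatch(digits):
        return None
    # Slice into chunks of four; the last chunk may be shorter.
    return " ".join(digits[i:i + 4] for i in range(0, len(digits), 4))

visa = [f for f in map(format_visa, cards) if f is not None]
print(visa)
```

Separating "clean the input" from "validate the digits" keeps the regex short, at the cost of no longer checking where the spaces were.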

passwords.txt

passwords <- unlist(read.table("passwords.txt", sep = "\n"), use.names = FALSE)
passwords
 [1] "1234567"        "12345678"       "Strings78"      "appleO07"      
 [5] "1brownie"       "asdfjkl"        "90095"          "glhf1234"      
 [9] "789afk"         "alllowercase"   "ALLUPPERCASE"   "missingNumbers"

(a)

Write a regex pattern to identify the passwords that satisfy the requirements below.

  • Minimum 8 characters
  • Must contain at least one letter
  • Must contain at least one digit

passwords[str_detect(passwords, "^(?=.*[0-9])(?=.*[a-zA-Z]).{8,}$")]
[1] "Strings78" "appleO07"  "1brownie"  "glhf1234" 

(b)

Write a regex pattern to identify the passwords that satisfy the requirements below.

  • Minimum 8 characters
  • Must contain at least one uppercase character
  • Must contain at least one lowercase character
  • Must contain at least one digit

passwords[str_detect(passwords, "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).{8,}$")]
[1] "Strings78" "appleO07" 
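Both password patterns rely on lookaheads: each `(?=...)` checks a condition without consuming any characters, so several independent requirements can be stacked before the length check. The Python sketch below compares the lookahead patterns with a plain character-by-character check; the password list is copied from the output above, and `plain_check` is a helper invented for the comparison.

```python
import re
import string

# Passwords copied from the printed contents of passwords.txt above.
passwords = [
    "1234567", "12345678", "Strings78", "appleO07",
    "1brownie", "asdfjkl", "90095", "glhf1234",
    "789afk", "alllowercase", "ALLUPPERCASE", "missingNumbers",
]

# Each lookahead scans ahead from the start without moving the match position;
# .{8,} then enforces the minimum length.
LETTER_AND_DIGIT = re.compile(r"(?=.*[0-9])(?=.*[a-zA-Z]).{8,}")
UPPER_LOWER_DIGIT = re.compile(r"(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).{8,}")

def plain_check(pw, need_both_cases=False):
    """Same rules written without a regex, for comparison."""
    has_digit = any(c in string.digits for c in pw)
    has_lower = any(c in string.ascii_lowercase for c in pw)
    has_upper = any(c in string.ascii_uppercase for c in pw)
    if need_both_cases:
        letters_ok = has_lower and has_upper
    else:
        letters_ok = has_lower or has_upper
    return len(pw) >= 8 and has_digit and letters_ok

part_a = [p for p in passwords if LETTER_AND_DIGIT.fullmatch(p)]
part_b = [p for p in passwords if UPPER_LOWER_DIGIT.fullmatch(p)]
print(part_a)
print(part_b)
```

On this list the regex and the plain check agree for every password, which is a cheap way to gain confidence in a lookahead pattern before using it elsewhere.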

wordlists.RData

For each word list, write a regular expression pattern that matches all of the values in x and none of the values in y.

load("wordlists.RData")

(a)

all(str_detect(wordlists$Ranges$x, "^[a-f]+$"))
[1] TRUE
!any(str_detect(wordlists$Ranges$y, "^[a-f]+$"))
[1] TRUE

(b)

all(str_detect(wordlists$Backrefs$x, "([a-z]{3}).*\\1"))
[1] TRUE
!any(str_detect(wordlists$Backrefs$y, "([a-z]{3}).*\\1"))
[1] TRUE
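The backreference pattern matches any word in which some three-letter sequence appears again later. Since the contents of `wordlists.RData` are not shown here, the Python sketch below uses a few made-up sample words (they are not taken from the data) to show what `\1` captures.

```python
import re

# ([a-z]{3}) captures three lowercase letters; .* skips anything in between;
# \1 requires the exact same three letters to appear again.
REPEATED_TRIPLE = re.compile(r"([a-z]{3}).*\1")

# Hypothetical sample words, chosen only for illustration.
samples = ["murmur", "hotshots", "banana", "regex", "pattern"]

captured = {}
for word in samples:
    m = REPEATED_TRIPLE.search(word)
    # Store the repeated triple, or None when no triple repeats.
    captured[word] = m.group(1) if m else None

print(captured)
```

"banana" is a useful edge case: "ana" occurs twice, but the two occurrences overlap, and the second copy has to start after the first one ends, so the pattern does not match.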

(c)

all(str_detect(wordlists$Prime$x, "^(?!(xx+)\\1+$)"))
[1] TRUE
!any(str_detect(wordlists$Prime$y, "^(?!(xx+)\\1+$)"))
[1] TRUE
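The last pattern is a negative lookahead: `(xx+)\1+` can only span the whole string when its length splits into two or more equal blocks of at least two characters, that is, when the length is composite. Rejecting those leaves the non-composite lengths. The Python sketch below assumes the values are runs of the letter x (the actual lists are not shown here) and tests every length from 0 to 30; `NOT_COMPOSITE` and `passes` are names invented for the sketch.

```python
import re

# (xx+) picks a block of 2 or more x's; \1+ repeats that block at least once
# more; $ forces the repeats to cover the whole string. The negative lookahead
# (?!...) rejects exactly the strings where that is possible.
NOT_COMPOSITE = re.compile(r"^(?!(xx+)\1+$)")

def passes(n):
    """True when a run of n x's is accepted by the pattern."""
    return NOT_COMPOSITE.match("x" * n) is not None

accepted = [n for n in range(0, 31) if passes(n)]
print(accepted)
```

Lengths 0 and 1 are accepted as well, because they are not composite either. If the lists could contain the empty string or a single "x", the pattern would need an extra condition, such as requiring at least two characters.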