Some simple exercises with regular expressions …
1. Using R
names.txt
library(stringr)
names <- unlist(read.table("names.txt", sep = "\n"), use.names = FALSE)
names
[1] "abc123" "horribleTurtle" "messsages" "keep_it_simple"
[5] "hello world!" "Zoran D Wang" "myUsernameis210" "abc defg"
[9] "asml" "john" "Edward Lazowska" "123fionaFog"
[13] "Red chihuahua5" "1" "CLEAN" "+plus+"
[17] "omaha poshy" "0maha p0shy" "OMAHA POSHY"
(a)
Find all usernames that contain at least one numeric character.
names[str_detect(names, "[0-9]")]
[1] "abc123" "myUsernameis210" "123fionaFog" "Red chihuahua5"
[5] "1" "0maha p0shy"
(b)
Find all usernames that are exactly four characters long and consist only of alphabetic characters.
names[str_detect(names, "^[a-zA-Z]{4}$")]
(c)
Find all usernames that follow the conventional name format, i.e., the given name comes first and the family name last, with any other names in between. The names are separated by a single space, and each name must be an uppercase letter followed by one or more lowercase letters.
names[str_detect(names, "^(?:[A-Z][a-z]+ )+(?:[A-Z][a-z]+)$")]
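To see what the pattern in (c) accepts and rejects, here is a minimal sketch in base R. It uses `grepl()` with `perl = TRUE` instead of `str_detect()` so that it runs without extra packages. The vector `candidates` is a hand-picked subset of the names above, and `relaxed` is a hypothetical variant (not part of the exercise) that would also allow single-letter initials.

```r
# Hand-picked subset of the names listed above.
candidates <- c("Zoran D Wang", "Edward Lazowska", "Red chihuahua5",
                "john", "OMAHA POSHY")

# Pattern from (c): every name is one uppercase letter followed by
# one or more lowercase letters, separated by single spaces.
strict <- "^(?:[A-Z][a-z]+ )+(?:[A-Z][a-z]+)$"

# Hypothetical variant: [a-z]* also accepts a bare initial such as "D".
relaxed <- "^(?:[A-Z][a-z]* )+(?:[A-Z][a-z]*)$"

strict_hits  <- candidates[grepl(strict, candidates, perl = TRUE)]
relaxed_hits <- candidates[grepl(relaxed, candidates, perl = TRUE)]

print(strict_hits)
print(relaxed_hits)
```

Under the strict pattern, "Zoran D Wang" is rejected because the middle initial "D" has no lowercase letters after it. Only the relaxed variant lets it through.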
cards.txt
cards <- unlist(read.table("cards.txt", sep = "\n"), use.names = FALSE)
cards
[1] "5123456789101112" "4789 0123 8910 1112"
[3] "4444 9321 1230 3" "5315 4011 1721 51"
[5] "4987 9381 2457" "4891 0870 8908 70987"
[7] "5234 4567 8910 1112" "58907890782309171"
[9] "3008 9078 1891 7890" "5192 9295 91828818"
[11] "4182 2884 1232 9582 2182"
A Mastercard number begins with a 5 and is exactly 16 digits long.
A Visa card number begins with a 4 and is between 13 and 16 digits long.
(a)
Write a regex pattern that matches a valid Mastercard number, and print all the valid numbers grouped into sets of 4 digits separated by a single space.
pat_a <- "^([5][0-9]{3})\\s*([0-9]{4})\\s*([0-9]{4})\\s*([0-9]{4})$"
apply(str_match(cards[str_detect(cards, pat_a)], pat_a)[, 2:5], 1, paste, collapse = " ")
[1] "5123 4567 8910 1112" "5234 4567 8910 1112" "5192 9295 9182 8818"
(b)
Write a regex pattern that matches a valid Visa card number, and print all the valid numbers grouped into sets of 4 digits separated by a single space.
pat_b <- "^([4][0-9]{3})\\s*([0-9]{4})\\s*([0-9]{4})\\s*([0-9]{1,4})$"
apply(str_match(cards[str_detect(cards, pat_b)], pat_b)[, 2:5], 1, paste, collapse = " ")
[1] "4789 0123 8910 1112" "4444 9321 1230 3"
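An alternative to capturing four groups is to strip the whitespace first, validate the bare digit string, and then regroup it. The sketch below does this in base R. The helper names (`strip_spaces`, `group_by_four`, `is_mastercard`, `is_visa`) are hypothetical and not part of the original solution, and `cards` here is only a subset of the file shown above.

```r
# Hypothetical helpers, not from the original exercise.
strip_spaces  <- function(x) gsub("[[:space:]]+", "", x)

# Insert a space after every 4 characters, then drop a trailing space.
group_by_four <- function(x) trimws(gsub("(.{4})", "\\1 ", strip_spaces(x)))

# Validate the digit string once all whitespace has been removed.
is_mastercard <- function(x) grepl("^5[0-9]{15}$", strip_spaces(x))     # 5 + 15 digits
is_visa       <- function(x) grepl("^4[0-9]{12,15}$", strip_spaces(x))  # 4 + 12 to 15 digits

# Subset of the card numbers listed above.
cards <- c("5123456789101112", "4789 0123 8910 1112", "4444 9321 1230 3",
           "5315 4011 1721 51", "4987 9381 2457", "5192 9295 91828818")

mastercards <- group_by_four(cards[is_mastercard(cards)])
visas       <- group_by_four(cards[is_visa(cards)])

print(mastercards)
print(visas)
```

This variant is more permissive than `pat_a` and `pat_b`. It accepts whitespace anywhere in the number, whereas the original patterns only allow it between groups of four digits.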
passwords.txt
passwords <- unlist(read.table("passwords.txt", sep = "\n"), use.names = FALSE)
passwords
[1] "1234567" "12345678" "Strings78" "appleO07"
[5] "1brownie" "asdfjkl" "90095" "glhf1234"
[9] "789afk" "alllowercase" "ALLUPPERCASE" "missingNumbers"
(a)
Write a regex pattern to identify the passwords that satisfy the requirements below.
Minimum 8 characters
Must contain at least one letter
Must contain at least one digit
passwords[str_detect(passwords, "(?=.*[0-9])(?=.*[a-zA-Z]).{8}")]
[1] "Strings78" "appleO07" "1brownie" "glhf1234"
(b)
Write a regex pattern to identify the passwords that satisfy the requirements below.
Minimum 8 characters
Must contain at least one uppercase character
Must contain at least one lowercase character
Must contain at least one digit
passwords[str_detect(passwords, "(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).{8}")]
[1] "Strings78" "appleO07"
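Each lookahead in the pattern for (b) checks one requirement without consuming any characters. A rough way to convince yourself of this is to test the requirements one at a time and combine them with a logical AND. The sketch below does so in base R with `grepl()`. The anchored form with `{8,}` is an assumed, more explicit spelling of the same idea, not the pattern used above.

```r
passwords <- c("1234567", "12345678", "Strings78", "appleO07", "1brownie",
               "asdfjkl", "90095", "glhf1234", "789afk", "alllowercase",
               "ALLUPPERCASE", "missingNumbers")

# One test per requirement, combined with logical AND.
long_enough <- nchar(passwords) >= 8
has_digit   <- grepl("[0-9]", passwords)
has_lower   <- grepl("[a-z]", passwords)
has_upper   <- grepl("[A-Z]", passwords)
by_parts <- passwords[long_enough & has_digit & has_lower & has_upper]

# Single-pattern version; base R needs perl = TRUE for lookaheads.
# Assumed variant: anchored, with {8,} spelling out "at least 8".
by_regex <- passwords[grepl("^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).{8,}$",
                            passwords, perl = TRUE)]

print(by_parts)
print(by_regex)
```

Both approaches should select the same passwords. The step-by-step version is easier to debug, while the lookahead version keeps everything in one pattern.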
wordlists.RData
Write regular expression patterns which will match all of the values in x and none of the values in y.
(a)
all(str_detect(wordlists$Ranges$x, "^[a-f]+$")) == TRUE
any(str_detect(wordlists$Ranges$y, "^[a-f]+$")) == FALSE
(b)
all(str_detect(wordlists$Backrefs$x, "([a-z]{3}).*\\1")) == TRUE
any(str_detect(wordlists$Backrefs$y, "([a-z]{3}).*\\1")) == FALSE
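The backreference `\\1` in (b) requires that the same three letters captured by the group appear again later in the string, without overlapping the first occurrence. Since the contents of `wordlists.RData` are not shown here, the sketch below uses a few made-up words (an assumption, not the actual data) and base R's `grepl()` with `perl = TRUE`.

```r
# Made-up examples; the real wordlists$Backrefs data is not reproduced here.
words <- c("entente", "hotshots", "banana", "regex")

# ([a-z]{3}) captures three letters, .* skips anything,
# and \\1 must match the same three letters again.
repeats_trigram <- grepl("([a-z]{3}).*\\1", words, perl = TRUE)

print(setNames(repeats_trigram, words))
```

"entente" repeats "ent" and "hotshots" repeats "hot", so both match. "banana" does not match: its two occurrences of "ana" overlap, and the captured group and the backreference cannot share characters.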
(c)
all(str_detect(wordlists$Prime$x, "^(?!(xx+)\\1+$)")) == TRUE
any(str_detect(wordlists$Prime$y, "^(?!(xx+)\\1+$)")) == FALSE
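The pattern in (c) works by rejecting composite lengths. `(xx+)` captures a block of two or more `x`s, and `\\1+` requires that block to repeat at least once more and reach the end of the string. A string passes the negative lookahead only if its length cannot be split into equal blocks this way. The sketch below checks this in base R on strings of 1 to 12 `x`s. The test strings are generated here and are an assumption about what the `Prime` word list looks like.

```r
# Strings "x", "xx", ..., up to 12 x's (generated; not the real word list).
lengths_to_try <- 1:12
xs <- strrep("x", lengths_to_try)

# Negative lookahead: fail if the whole string is a block of >= 2 x's
# repeated two or more times, i.e. if its length is composite.
not_composite <- grepl("^(?!(xx+)\\1+$)", xs, perl = TRUE)

matching_lengths <- lengths_to_try[not_composite]
print(matching_lengths)
```

Note that a single `x` (length 1) also passes, even though 1 is not prime. The pattern only rules out composite lengths. If that case matters, a variant such as `^(?!(xx+)\\1+$)xx` would also require at least two characters.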