
Regular Expression Resources | Regex

Posted in command line, Ruby

Howdie Reader, or mum as I otherwise call her,

So it's week 4 of the Makers Academy pre-course and I am still working away at my Codewars kyu ranking.

Since the last challenge proved too much for me I decided to go for a simpler one. One with a ranking of 7 Kyu.

The one I chose was on regular expressions and this led me to discover some very useful resources.

A Quick reference for regular expression Terms:

Regular expression quick references REGEX

Here is another useful list of quick references (This one is from Ruby Kickstarter cheatsheet):

# .             any character except newline
# [ ]           any single character of set
# [^ ]          any single character NOT of set
# *             0 or more previous regular expression
# *?            0 or more previous regular expression (non-greedy)
# +             1 or more previous regular expression
# +?            1 or more previous regular expression (non-greedy)
# ?             0 or 1 previous regular expression
# |             alternation
# ( )           grouping regular expressions
# ^             beginning of a line or string
# $             end of a line or string
# {m,n}         at least m but at most n previous regular expression
# {m,n}?        at least m but at most n previous regular expression (non-greedy)
# \1-9          nth previous captured group
# \&            whole match
# \`            pre-match
# \'            post-match
# \+            highest group matched
# \A            beginning of a string
# \b            backspace(0x08)(inside[]only)
# \b            word boundary(outside[]only)
# \B            non-word boundary
# \d            digit, same as[0-9]
# \D            non-digit
# \S            non-whitespace character
# \s            whitespace character[ \t\n\r\f]
# \W            non-word character
# \w            word character[0-9A-Za-z_]
# \z            end of a string
# \Z            end of a string, or before newline at the end
# (?#)          comment
# (?:)          grouping without backreferences
# (?=)          zero-width positive look-ahead assertion
# (?!)          zero-width negative look-ahead assertion
# (?>)          nested anchored sub-regexp. stops backtracking.
# (?imx-imx)    turns on/off imx options for rest of regexp.
# (?imx-imx:re) turns on/off imx options, localized in group.
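To make a few of those entries less abstract, here is a small Ruby sketch of greedy versus non-greedy matching, the line and string anchors, and a look-ahead. The sample strings and variable names are made up for illustration and are not from the cheatsheet.

```ruby
# Sample text invented for this sketch.
text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as it can, so the match runs to the last ">".
greedy = text[/<.*>/]    # => "<b>bold</b> and <i>italic</i>"

# Non-greedy: .*? stops at the first ">" it can reach.
lazy = text[/<.*?>/]     # => "<b>"

# ^ and $ anchor to a line, while \A and \z anchor to the whole string.
lines = "first\nsecond"
line_anchor   = lines.match?(/^second$/)     # => true
string_anchor = lines.match?(/\Asecond\z/)   # => false

# (?=...) is a zero-width look-ahead: it checks what follows
# without making it part of the match.
before_px = "width: 100px"[/\d+(?=px)/]      # => "100"
```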

 

 

The site this was on also has a way to test regular expressions you are creating.

A series of Video Tutorials on Regular Expression.

From this set of videos I could clarify a bit more of the quick reference table. I already got that \d is any digit and \w is any word character, but I found the following information chunks very useful:

.           matches any character EXCEPT a line break

\b        a word boundary: the zero-width position just before or after a word (it does not match the space itself)

?          zero or 1 repetition of whatever code precedes it

*          zero or more reps of any code which precedes it

\d{5}  5 digits in a row

\d{1,5} 1-5 digits in a row.

Some characters require escaping, which means they are preceded by a \.
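Each of those chunks can be tried out in irb with String#scan, which returns every match as an array. This is only a sketch, and the sample strings and variable names are my own inventions.

```ruby
# \b is a word boundary, so only the stand-alone "cat" matches.
whole_word = "cat concat cats".scan(/\bcat\b/)       # => ["cat"]

# ? makes the preceding "u" optional (zero or one).
optional = "color colour".scan(/colou?r/)            # => ["color", "colour"]

# * allows zero or more of the preceding "b".
repeated = "ac abc abbbc".scan(/ab*c/)               # => ["ac", "abc", "abbbc"]

# \d{5} is exactly five digits in a row.
five = "Postcode 90210 or 123".scan(/\b\d{5}\b/)     # => ["90210"]

# \d{1,5} is between one and five digits in a row.
one_to_five = "7 and 12345 and 1234567".scan(/\b\d{1,5}\b/)  # => ["7", "12345"]

# An unescaped . matches any character, an escaped \. only a full stop.
unescaped = "3.14 and 3x14".scan(/3.14/)             # => ["3.14", "3x14"]
escaped   = "3.14 and 3x14".scan(/3\.14/)            # => ["3.14"]
```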

Examples:

1.

'Jenifer\s\w+\s'

Can be explained as;

'Jenifer

the combination of letters which matches exactly ‘Jenifer’

\s

followed by a whitespace character (a space, tab or newline)

\w+

followed by one or more word characters (could be surname ..)

\s'

followed by another whitespace character.
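In Ruby the same pattern would sit between slashes rather than quotes. Here is a quick sketch of it in action, where the sentences and the variable names are made up for the example.

```ruby
# The pattern from example 1, written as a Ruby regex literal.
name_pattern = /Jenifer\s\w+\s/

# String#[] with a regex returns the matched text, or nil if nothing matches.
found = "Hello Jenifer Smith how are you"[name_pattern]   # => "Jenifer Smith "

# Without whitespace after the second word the final \s has nothing to match.
missing = "Jenifer Smith"[name_pattern]                   # => nil
```

Note the trailing space in the first result: the final \s is part of the match.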

2.

\$\d*\.\d{2}

\$

an actual $ sign: the \ before it means we want the literal character instead of its usual meaning, the end of a line.

\d*

any number of digits (including none)

\.

a literal full stop

\d{2}

two digits

so $245.98 would be found with this code.
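As a rough sketch of pulling prices out of a chunk of text, String#scan can be given the pattern directly. The page_text string here is invented for the example and stands in for whatever text you have scraped.

```ruby
# The price pattern from example 2.
price_pattern = /\$\d*\.\d{2}/

# Hypothetical text standing in for the contents of a web page.
page_text = 'Coffee $3.50, laptop $245.98, sticker $.99, total 12.00'

# scan returns every non-overlapping match in order.
prices = page_text.scan(price_pattern)   # => ["$3.50", "$245.98", "$.99"]
```

"12.00" is skipped because it has no $ in front, and "$.99" is found because \d* allows zero digits. A price written with a thousands separator, like $1,000.00, would not match this pattern as it stands.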

.. I can search through any sites quickly for a price??

WOW this is beginning to look like it is a very powerful tool!!


It looks like it could be a key part of building a web-scraping tool.. which as a former growth marketer is something which is extremely exciting to me .. Even if my reader (mum) doesn’t quite get why that may be so useful!!

 

Here is the initial codewars task I was given, the one that sparked off this whole blog post:

Return the number (count) of vowels in the given string.

We will consider a, e, i, o, and u as vowels for this Kata.

And here is the solution:

def getCount(inputStr)
  inputStr.count("aeiou")
end

The .count method is used to count the number of occurrences of characters in a string. The parentheses, (), after it give the count method an argument. This argument is the set of characters to count: every character in the string that appears in the set adds one to the total.

I first tried passing a regular expression:

/[aeiou]/

which means any single character equal to a, e, i, o or u. irb gave me an error message stating there was no conversion of Regexp into String, because count expects a string listing the characters to count, not a regular expression. Wrapping the expression in quotes, "/[aeiou]/", makes the error go away, but then count treats the /, [ and ] as characters to count too, so it only gives the right answer when the input contains none of them. Passing plain "aeiou" avoids that.

The // mark the start and end of a regular expression.

The [] denote a “character group”: a set of characters, any one of which can match.

The () after count wrap the method's argument.
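Here is a small comparison of two ways to count vowels in Ruby, as a sketch rather than the official kata answer. String#count takes a plain set of characters, while String#scan takes a real regular expression. The method names below are my own and are not part of the kata.

```ruby
# Counts vowels by handing String#count a set of characters.
def count_with_set(input)
  input.count("aeiou")
end

# Counts vowels by collecting every regex match and taking the array's size.
def count_with_regex(input)
  input.scan(/[aeiou]/).size
end

set_result   = count_with_set("abracadabra")     # => 5
regex_result = count_with_regex("abracadabra")   # => 5

# A quoted "/[aeiou]/" is just a set of characters to String#count,
# so the slashes and square brackets get counted as well.
tricky     = "/[aeiou]/"
over_count = tricky.count("/[aeiou]/")   # => 9
real_count = tricky.count("aeiou")       # => 5
```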

Another Example

While perusing Stack Overflow, I found a useful explanation of a more complicated regular expression:

/([^.]*)\.(.*)/

I have re-worded the explanation a bit into my simpler beginners language:

// mark the start and end of a regular expression.

([^.]*)

() group part of the expression together and capture whatever it matches so that you can use it later.

[] denote the character group.

^ as the first character inside the square brackets reverses the group's meaning.

. is a fullstop (or period if American), and inside square brackets it is just a literal fullstop. So [^.] means match any character which is not a fullstop.

* is a quantifier, which means anything within the square brackets before it can be matched zero or more times.

\. is an escaped period. Outside square brackets, fullstops in regex have a special meaning: they match any character except a newline. The \ before it cancels this meaning out so that it literally means match a fullstop.

(.*) is a new sub-group. The . takes on its special meaning so that it matches any character and the * means it can be repeated as many times as it needs to be.

So what does this mean the whole expression is doing?

It finds any sequence of characters that aren't fullstops, followed by a single fullstop, followed then by any sequence of characters.

Another way to look at it is this:

Split this as:
[1] ([^.]*) : match any run of characters except . [ period ]
[2] \. : match a period
[3] (.*) : match whatever characters follow
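To see the two groups at work, here is a short Ruby sketch using String#match, which returns a MatchData object whose numbered entries are the captured groups. The file name and variable names are made up for the example.

```ruby
# The expression from Stack Overflow, unchanged.
split_pattern = /([^.]*)\.(.*)/

# Hypothetical file name used as sample input.
result = "archive.tar.gz".match(split_pattern)

before_dot = result[1]   # => "archive"  (everything up to the first fullstop)
after_dot  = result[2]   # => "tar.gz"   (everything after it, dots included)

# With no fullstop in the string there is nothing for \. to match.
no_match = "no_dot_here".match(split_pattern)   # => nil
```

The first group stops at the first fullstop because [^.] refuses to match one, which is why the second group ends up holding "tar.gz" rather than just "gz".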

 

For a full list of all the types of regular expression head over to Microsoft's site HERE.

So there you have it, my short but sweet experience with learning the basics of regular expressions, then using this to complete a Codewars kata challenge. And by ‘you’ I realise I am writing this for myself, as a way to solidify my learning. Sadly I know even my mum doesn’t read these .. because I can see no traffic coming from France ..

regular expression resource
This isn’t actually my mum I just wanted to show off a picture I took while living in Indonesia

 

2 thoughts on “Regular Expression Resources | Regex”

    1. YESSSS I knew I was right to always address these to you ! 🙂 Thanks for the support even if these are totally irrelevant to your life .. I knew I had a fan … :-/ (stay tuned for more badly spelt shout outs then mum ..)
