Power to Build

Home » Misc » Regex Part 2

Regex Part 2

I earlier posted about exciting world of Regex here.  Let’s try to learn some more in this followup post.

First if you want to play with Regex, here is a nice tool you can use:

http://regexr.com

2017-05-25 17_39_08-RegExr_ Learn, Build, & Test RegEx

Regex Tool online

By default it comes up with a sample pattern and the matches in the given sample text.  You can type in the pattern you are testing, the actual text in the middle box and if you chose Replace, you get to see what the output will look like.

Let’s use this tool to convert our RSVP list in free text to a spreadsheet list.

If you read my original post, we had 4 RSVPs. It looked somewhat like this:

RSVP:

Name:
Will you attend:
Will you be here for lunch:
How many:

Here is my session in Regexr testing this:

2017-05-25 17_44_49-regexr.com_3g1nb

Using Regexr to test/convert our RSVP text

The bottom portion shows the Replaced text – converted and cleaned, readied to be copied out to Excel. The spreadsheet will look like this:

2017-05-25 17_53_11-Microsoft Excel - Book3

Here is the actual regular expression used in this example:

RSVP[0-9]+:\n+Name:[ ]*(.+)\nWill you attend:(.*)\nWill you be here for lunch:(.*)\nHow many:.*([0-9]+).*[\n]*

Let’s go through the regular expression for arriving at this:

First line would have, RSVP:

In Regex any number can be represented by [0-9]+ **. That is a digit is represented by any character between 0 and 9. A whole number would then be one or more digits.

Thus, RSVP: can be written as,

RSVP[0-9]+:

Next text is actually on another line. This can represented by \n. You may know this is newline. Now we are on the second line.

Name:<name>

This can be matched by, Name: (.*)

That would be any character after Name: . If you notice, we have a space after Name:. What if the user didn’t add any space or added many spaces in between? We can rewrite the above as,

Name:[ ]*(.*)

That’s a space inside the square brackets. [ ]* represents 0 or more spaces. This way, we will match it, even if there is no space between the label “Name:” and the name that followis it. If used with a +, it will mean one or more spaces – at least one space must be there.

Next we have (.*). Dot (.) is any character and .* means zero or more characters. By enclosing a pattern within parantheses (), we make a “Capture Group”. This is what will be remembered (like in a variable) when this pattern is matched with a text. This can be returned and useful in Replace functionality. Each captured group will have a number associated with it – like $1*** for the first, $2 for the second etc. This is also called back reference. More on this later.

And we continue with the rest of the lines. The replace pattern will look like this:

$1\t$2\t$3\t\$4\n

That’s all captured variables ($1 – $4) separated by tab characters recognized by Excel as cell separators (we could also use Comma). End it with \n to push to the next line.

Voila! We have successfully converted a free-form text into a spreadsheet.

That was a rudimentary example. And if you noticed, I left out some complications – for e.g., what if they said 1 or 2 people will be coming and I want to take the maximum, meaning 2.

The fact is, Regex is never used alone in a tool like this. Often times, you will build this (and more regex) into a script/program to achieve the full desired results. Perl and Python are heavily invested in Regex. Other programming languages (including Java) do support Regex using some library API.

To be contd…


** As an aside, this is similar to the BNF form for writing language grammar (I’ve considered only positive whole numbers for this discussion):

2017-05-26 09_58_13-Document6 - Microsoft Word.png

*** Sometimes you will see \1 instead of $1.


Comments, please?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: