Text Parsing for Fun and Glory

Not available: Blood, sweat, tears, money, love, etc.

I’m sure that most people have their own thing-they-think-everyone-should-learn-regardless-of-whether-it’s-actually-at-all-useful-to-other-people crusade. Mine is text parsing. I’m the sort of person who frequently has to go prodding around massive databases or web pages of unstructured text, pulling out things that look like home addresses or phone numbers or Twitter handles, and because I’m a bit of a nerd (see: https://xkcd.com/208/), I don’t want to do it all by hand. One of the simplest ways of doing this that doesn’t involve any programming per se is regular expressions: weird little textual logic puzzles that can be implemented in Excel as easily as they can be implemented in Python.
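To give a flavour of what one of these logic puzzles looks like, here is a minimal Python sketch that pulls Twitter-style handles and phone numbers out of a blob of text. The patterns, the `extract` helper, and the sample string are all made up for illustration, and the patterns are deliberately loose: they assume handles of up to 15 word characters and phone numbers written as 3-3-4 digit groups, which real data will happily violate.

```python
import re

# Illustrative patterns only; real-world handles and phone numbers are messier.
# Handle: an "@" not preceded by a word character (so email addresses are
# skipped), followed by 1 to 15 letters, digits, or underscores.
HANDLE_RE = re.compile(r"(?<![\w@])@(\w{1,15})\b")
# Phone: three digits, three digits, four digits, separated by "-", "." or space.
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def extract(text):
    """Return the handles and phone numbers found in text (hypothetical helper)."""
    return {
        "handles": HANDLE_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


sample = "Ping @example_user or call 555-010-1234; email me@example.com instead."
print(extract(sample))
# {'handles': ['example_user'], 'phones': ['555-010-1234']}
```

Note that the email address in the sample is not mistaken for a handle, because the lookbehind refuses any "@" that follows a word character. Most of the fun (and the glory) is in tightening patterns like these until they stop matching things they shouldn’t.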

There are plenty of other simple little parsing techniques we could work on in addition to regular expressions; this is just a starting suggestion.
