Text Parsing for Fun and Glory
Alex Garnett, THATCamp Vancouver 2014 (http://vancouver2014.thatcamp.org)
Thu, 30 Jan 2014
http://vancouver2014.thatcamp.org/2014/01/30/text-parsing-for-fun-and-glory/

Not available: Blood, sweat, tears, money, love, etc.

I’m sure that most people have their own thing-they-think-everyone-should-learn-regardless-of-whether-it’s-actually-at-all-useful-to-other-people crusade. Mine is text parsing. I’m the sort of person who frequently has to go prodding around massive databases or web pages of unstructured text, pulling out things that look like home addresses, phone numbers, or Twitter handles. Because I’m a bit of a nerd (see: https://xkcd.com/208/), I don’t want to do it all by hand. One of the simplest ways of doing this without any programming per se is regular expressions: weird little textual logic puzzles that can be implemented in Excel as easily as in Python.
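To give a flavour of what one of these puzzles looks like in Python, here is a minimal sketch using the standard `re` module. The sample text, the function names, and the patterns themselves are all my own assumptions for illustration: the phone pattern only covers common North American formats, and the handle pattern assumes handles are at most 15 letters, digits, or underscores. Real data will need fussier patterns.

```python
import re

# Hypothetical sample text; every name and number here is invented.
SAMPLE = (
    "Call Pat at 604-555-0199 or (778) 555-0123, email pat@example.com, "
    "or ping @pat_parses and @thatcamp on Twitter."
)

# Rough phone pattern (an assumption, not a complete validator):
# optional parentheses around a 3-digit area code, then 3 digits and
# 4 digits, each optionally separated by a space, dot, or hyphen.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b")

# Rough handle pattern (also an assumption): an "@" that is not preceded
# by a word character or another "@", so email addresses are skipped,
# followed by 1 to 15 letters, digits, or underscores.
HANDLE_RE = re.compile(r"(?<![\w@])@(\w{1,15})\b")


def find_phones(text):
    """Return every substring that looks like a phone number."""
    return PHONE_RE.findall(text)


def find_handles(text):
    """Return every handle-like name, without the leading '@'."""
    return HANDLE_RE.findall(text)


if __name__ == "__main__":
    print(find_phones(SAMPLE))
    print(find_handles(SAMPLE))
```

The lookbehind in the handle pattern is what keeps `pat@example.com` from being mistaken for a handle; most of the puzzle-solving in regular expressions is deciding which near-misses like that you care about.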

There are plenty of other simple little parsing techniques we could work on in addition to regular expressions; this is just a starting suggestion.
