Best practices for Regular ExpressionsΒΆ

As your Regexs grow and form into beautiful pattern-matching powerhouses, it is important to take good care of them, and they have a tendency to grow unruly as they mature.

Here are some several safe practices for handling your Regexs so that they can have a long productive life without getting out of control:

  • Use whitespace and comments in your regular expressions. Whitespace is set to be ignored by default (use /s to match whitespace) so use spaces and newlines to break up sections freely. Regex comments look like (?# this ).
  • Never let a regex become too big to be easily understood. Split up big regex into smaller expressions. (Sensible splits won’t hurt them).
  • Maintain a Matches and Non-Matches
    • Reparse can use this to test your Regex to make sure they are matching properly
    • It helps maintainers see which regular expressions match what quickly
    • It helps show your intention with each expression, so that others can confidently improve or modify them
  • Maintain a description which talks about what you are trying to match with each regex, what you are not matching and why, and possibly a url where they might learn more about that specific format.
  • Having each regex author list his name can be a great boon. It gives them credit for their work, it encourages them to put forth their best effort, and is an easy way to name them. I often name the regex after the the author so I don’t have to come up with unique names for all my regexs, since that are often really similar.

For more information about maintaining a regex-safe environment visit:

http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html