Special

Clearance Sale!

We've been publishing for over five years now and it's time to clear out our inventory of back issues, so we're slashing prices!

RBD Magazines

Check out this amazing clearance sale on all our past issues. Missing some issues? This is a great time to complete your RBD collection. Save up to 40% off the regular price of our printed back-issue packages. These prices are good only until May 2008 and supplies are limited, so place your order today.

Article Preview



Beyond the Limits

Regular Expressions Overdrive

Issue: 1.1 (August/September 2002)
Author: Didier Barbas
Author Bio: Didier has been a dilettante programmer and linguist for more than 20 years. Unusual for a Frenchman, he speaks 11 languages, including Korean and PowerPC machine-language; he manages the Korean branch of a Dutch company that doesn't do banking, chemicals, or consumer products. Go figure!
Article Description: Advanced regular expressions.
Article Length (in bytes): 16,198
Starting Page Number: 46
RBD Number: 1016
Resource File(s):

1016a_sourcecode.sit (updated Friday, October 17, 2003 at 1:18 PM)
1016b_compiledapp.sit (updated Friday, October 17, 2003 at 1:18 PM)

Related Link(s): None
Known Limitations: None

Excerpt of article text...

This article assumes that you have already covered the basics of regular expressions (RegExes) and have at least read Matt Neuburg's article on page ## of this issue. Here we will focus on techniques that will make your coding (and your life) easier. These techniques answer real-life problems, some of them my own and some raised as questions on the REALbasic discussion lists. I will also show that regular expressions are not always the right tool: some tasks need extra help, and some are simply not suited to RegExes.

Just don't bother.

A discussion we had some time ago on one of the REALbasic discussion lists was about how to remove extra spaces from text. The pattern that comes to mind first for most people is [\t ]+, with each match replaced by a single space. In the discussion, it was argued that the correct pattern should be [\t ][\t ]+, so that RB's RegEx engine matches only runs of at least two tabs or spaces instead of also replacing every lone space with itself. It was noted, however, that on average-sized texts the speed difference was negligible, at least from a human being's standpoint (applied to this article, which has few double spaces, [\t ][\t ]+ is six times faster than [\t ]+). In any case, the whole discussion, while fascinating, was quite academic, since a) we are talking about microseconds or milliseconds, not seconds, and b) another fellow had come up with an example using replaceAll that was much faster. I tweaked it a little further, making it even faster by changing inStr to inStrB and by adding a line of code that first removes odd numbers of spaces:
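The author's own REALbasic code is not part of this excerpt. As a hypothetical illustration only, the sketch below uses Python's `re.sub` to contrast the two patterns discussed above, plus a plain string-replacement loop that stands in, by assumption, for the replaceAll approach (it is not the tweaked version the author describes). All function names are made up for this example. Note one behavioral difference the two patterns have: [\t ]+ also turns a lone tab into a space, while [\t ][\t ]+ leaves lone tabs untouched.

```python
import re
import timeit

# Hypothetical helper names: Python stand-ins for the REALbasic techniques.


def collapse_all_runs(text: str) -> str:
    """Replace every run of one or more tabs/spaces with a single space.

    Lone spaces match too and are replaced with themselves (wasted work),
    and a lone tab becomes a space.
    """
    return re.sub(r"[\t ]+", " ", text)


def collapse_long_runs(text: str) -> str:
    """Replace only runs of two or more tabs/spaces with a single space.

    Lone spaces and lone tabs are left untouched.
    """
    return re.sub(r"[\t ][\t ]+", " ", text)


def collapse_by_replace_loop(text: str) -> str:
    """Collapse runs of spaces without a regex engine.

    Assumption: only spaces (not tabs) need collapsing. Each pass turns a
    run of n spaces into ceil(n / 2) spaces, so the loop ends once no
    double space is left.
    """
    while "  " in text:
        text = text.replace("  ", " ")
    return text


if __name__ == "__main__":
    # Synthetic sample text with mostly single spaces and a few longer runs.
    sample = "Mostly single spaces,  a few   doubles. " * 2000
    for func in (collapse_all_runs, collapse_long_runs, collapse_by_replace_loop):
        seconds = timeit.timeit(lambda: func(sample), number=20)
        print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```

Timings depend on the machine, the language runtime, and how many runs of spaces the input contains, so this sketch implies no particular ranking; it only shows how such a comparison could be set up.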

...End of Excerpt. Please purchase the magazine to read the full article.

Article copyrighted by REALbasic Developer magazine. All rights reserved.

