Clean Regular Expressions In Swift

UPDATE: The initial version of this article recommended using the __conversion function. Embarrassingly, I had missed this post on the dev forums stating it was going to be removed in Swift 1.0. I have updated this article accordingly. Thanks to @frosty for pointing it out.

The Cocoa regular expression API was introduced in iOS 4 and OS X Lion. Until then, there was no Apple-provided way to do regular expressions in a Cocoa or CocoaTouch application. Sadly the API is shackled to the object-oriented nature of Cocoa, making it verbose and overcomplicated. It hasn’t seen much adoption.1

Well, Swift is here and it’s a brand new day. We have prefix, infix, and postfix operators galore, and a few neat tricks up Swift’s sleeve. Projects like ExSwift are making regexes simpler to use. I’m going to go over a few possible solutions, including ExSwift’s, and explain which one I like best and why.

• • • • •

First, let’s look at the problem. Suppose I want to find strings that match a command-line option for a parser. The regular expression to do that would be ^(?:-[a-z]|--[a-z]\S*)$. It matches strings with one dash and one letter, and strings with two dashes and one letter followed by any number of non-white-space characters. We want to know if a string matches this regex. How would we go about writing this in vanilla Cocoa-backed Swift?

let regex = NSRegularExpression(pattern:"^(?:-[a-z]|--[a-z]\\S*)$",
                                options:nil,
                                  error:nil)

let testString = "-v"
let match = regex.numberOfMatchesInString(testString,
               options:nil,
                 range:NSRange(location:0,
                                 length:countElements(testString)))

let answer = match > 0

This is no fun. All we wanted was to know if a string matched a pattern, and we have to worry about error-handling (at least in theory), and options both in the creation and the matching calls. These are all things you don’t worry about when writing a Ruby or Python program.
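For readers on a current Swift toolchain, here is a minimal sketch of the same check, written against the present-day Foundation API rather than the one available when this was written. The helper name `isOption` is made up for illustration. It also shows which inputs the pattern accepts and rejects.

```swift
import Foundation

// Hypothetical helper: reports whether `argument` looks like a command-line option.
func isOption(_ argument: String) -> Bool {
    // The initializer throws on an invalid pattern; this pattern is a fixed
    // literal, so `try!` is used for brevity.
    let regex = try! NSRegularExpression(pattern: "^(?:-[a-z]|--[a-z]\\S*)$",
                                         options: [])
    // NSRange counts UTF-16 code units, so it is derived from the string itself.
    let range = NSRange(argument.startIndex..., in: argument)
    return regex.numberOfMatches(in: argument, options: [], range: range) > 0
}

let samples = ["-v", "--verbose", "-verbose", "--", "-V"]
let results = samples.map(isOption)   // [true, true, false, false, false]
```

Even in this form, the caller still has to think about the throwing initializer, two option sets, and a range.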

So have Ruby and Python invented a magical potion that makes those questions irrelevant? Of course not. What Ruby and Python do is make the default situation trivial to use. We are programmers, and few of us deal with natural language in text. What we are concerned with is:

That’s not to say that the powerful API of NSRegularExpression is unnecessary, but having to specify everything when the defaults above are all that’s needed in 95% of use cases is less than optimal. Most developers who use regular expressions don’t ever worry about whether newlines match dots, or whether ^ and $ should be interpreted as within the specified range or within the entire passed string. So how does Swift help?
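To make those two rarely considered settings concrete, here is a small sketch in current Swift syntax. The `matchCount` helper and its parameter names are assumptions made for this illustration; the option names are the ones Foundation exposes today.

```swift
import Foundation

// Hypothetical helper: counts matches of `pattern` in `text`, optionally
// restricted to `range`. An invalid pattern simply counts as zero matches.
func matchCount(_ pattern: String,
                in text: String,
                options: NSRegularExpression.Options = [],
                matching: NSRegularExpression.MatchingOptions = [],
                range: NSRange? = nil) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else {
        return 0
    }
    let searchRange = range ?? NSRange(text.startIndex..., in: text)
    return regex.numberOfMatches(in: text, options: matching, range: searchRange)
}

// By default `.` does not match a newline; the option changes that.
let dotDefault = matchCount("a.b", in: "a\nb")                                    // 0
let dotAll = matchCount("a.b", in: "a\nb", options: [.dotMatchesLineSeparators])  // 1

// By default `^` anchors to the start of the searched range;
// `.withoutAnchoringBounds` makes it anchor only to the start of the whole string.
let tail = NSRange(location: 1, length: 2)
let anchoredToRange = matchCount("^ab", in: "xab", range: tail)                   // 1
let anchoredToString = matchCount("^ab", in: "xab",
                                  matching: [.withoutAnchoringBounds],
                                  range: tail)                                    // 0
```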

• • • • •

The simplest test we care about is whether a string matches a regular expression. Ruby has a nice, simple operator for that: =~. ExSwift implements it like so:

// Code rewritten to show all the steps
infix func =~(string:String, pattern:String) -> Bool {

    var options: NSRegularExpressionOptions = 
            NSRegularExpressionOptions.DotMatchesLineSeparators

    let regex = NSRegularExpression(pattern:pattern, 
                                 options: options, 
                                    error: nil)

    var matches = 0
    if let regex = regex {
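        // Note: string.length is not in the standard library; this assumes a
        // String extension that provides it, such as the one in ExSwift.
        matches = regex.numberOfMatchesInString(string,
                          options: nil, 
                            range: NSMakeRange(0, string.length))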
    }

    return matches > 0
}

Simple: take a string you want to match against, take a pattern you want it to match, and this function returns true or false. Our initial example boils down to:

let answer = "-v" =~ "^(?:-[a-z]|--[a-z]\\S*)$"
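As a rough sketch of how the same string-based operator might look in current Swift (this is not ExSwift's code; the precedence group and the use of `firstMatch` are choices made for this example):

```swift
import Foundation

// Custom operators must be declared before use; the precedence group
// chosen here is an assumption for the sketch.
infix operator =~ : ComparisonPrecedence

// Returns true when `string` contains at least one match for `pattern`.
// An invalid pattern yields false, mirroring the behaviour described above.
func =~ (string: String, pattern: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern,
                                               options: [.dotMatchesLineSeparators]) else {
        return false
    }
    let range = NSRange(string.startIndex..., in: string)
    return regex.firstMatch(in: string, options: [], range: range) != nil
}

let answer = "-v" =~ "^(?:-[a-z]|--[a-z]\\S*)$"   // true
```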

This is much easier to use than the original API, but I do have a reservation. You see, this method does not accept a regular expression as a parameter—it takes a string. That’s a problem if we do want to create a special regex with different options; we can’t then call string =~ regex. I feel the real signature should be:

infix func =~(string:String, regex:NSRegularExpression) -> Bool

But now we’re back to the same problem as before: we have to write a verbose NSRegularExpression initializer call for even the simplest cases.

• • • • •

If you look at Ruby, =~ doesn’t take a string, but a /-delimited token; our code would look like this (not accounting for slight differences in the pattern syntax):

"-v" =~ /^(?:-[a-z]|--[a-z]\S*)$/

Sadly, that’s language-level syntax and we don’t have a macro system in Swift to allow us to add that in. However, Swift supports prefix operators, which means we can achieve something similar:

prefix func /(pattern:String) -> NSRegularExpression {
    var options: NSRegularExpressionOptions = 
            NSRegularExpressionOptions.DotMatchesLineSeparators

    return NSRegularExpression(pattern:pattern, 
                                    options:options, 
                                      error:nil)
}

There is a subtle difference here from what we had before. Since we are not checking for errors, by returning the output of the NSRegularExpression initializer, we are returning an implicitly unwrapped optional. What this means is that we may get a runtime error if, for instance, we pass in an invalid regex (like ?!br|p)). The original implementation of =~ returns false for an invalid regex, but I feel this version is actually an improvement, because a. it breaks on incorrect semantics and b. the break is very likely to be caught early in the development cycle.
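The trade-off between failing silently and failing loudly can be sketched in current Swift, where the initializer throws instead of taking an error pointer. Both function names below are hypothetical, and `"(unclosed"` is just a stand-in for any invalid pattern.

```swift
import Foundation

// Lenient: swallow the error, so an invalid pattern becomes nil and a
// caller that treats nil as "no match" will never notice the mistake.
func lenientRegex(_ pattern: String) -> NSRegularExpression? {
    return try? NSRegularExpression(pattern: pattern, options: [])
}

// Strict: treat an invalid pattern as a programmer error and stop at once,
// which is the behaviour argued for above.
func strictRegex(_ pattern: String) -> NSRegularExpression {
    do {
        return try NSRegularExpression(pattern: pattern, options: [])
    } catch {
        fatalError("Invalid regular expression \(pattern): \(error)")
    }
}

let broken = lenientRegex("(unclosed")   // nil
let valid = strictRegex("^-[a-z]$")      // a usable regex
```

Calling `strictRegex("(unclosed")` would terminate the program, which is exactly the early, visible failure described above.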

With this prefix operator, we can change the signature of =~, and the implementation becomes:

infix func =~ (string: String, regex: NSRegularExpression) -> Bool {
    let matches = regex.numberOfMatchesInString(string, 
                               options: nil, 
                                 range: NSMakeRange(0, string.length))
    return matches > 0
}

Which means the original syntax gets tweaked to:

let answer = "-v" =~ /"^(?:-[a-z]|--[a-z]\\S*)$"

Much cleaner! But the best part is that we can now do the fancier things we were talking about before:

let regex = … //some non-default NSRegularExpression
let answer = "test string" =~ regex

So with one extra character to type, we move the API from String to NSRegularExpression and have something distinctly more powerful.2
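Here is a hedged sketch of the regex-based operator in current Swift. Because newer Swift versions use a bare `/` for regex literals in some language modes, this sketch swaps the prefix operator for a hypothetical named function, `rx`; the idea of moving the API onto NSRegularExpression is unchanged.

```swift
import Foundation

infix operator =~ : ComparisonPrecedence

// Matches a string against an already-compiled regular expression.
func =~ (string: String, regex: NSRegularExpression) -> Bool {
    let range = NSRange(string.startIndex..., in: string)
    return regex.firstMatch(in: string, options: [], range: range) != nil
}

// Hypothetical stand-in for the prefix operator: compiles a pattern with
// common defaults and traps on an invalid pattern.
func rx(_ pattern: String,
        options: NSRegularExpression.Options = [.dotMatchesLineSeparators]) -> NSRegularExpression {
    return try! NSRegularExpression(pattern: pattern, options: options)
}

// The simple case stays short...
let simple = "-v" =~ rx("^(?:-[a-z]|--[a-z]\\S*)$")   // true

// ...and a regex with non-default options works with the same operator.
let helpFlag = rx("^--help$", options: [.caseInsensitive])
let custom = "--HELP" =~ helpFlag                      // true
```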

• • • • •

Scala adopts a different approach, pointed out by @lightfiend: it adds an r method to String, so that you can compile a regex via "pattern".r. I prefer the prefix operator, but that, too, is a valid option. Either of these is an improvement over the Cocoa API for the most common use cases.
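A Scala-style property could be sketched in current Swift as a small extension. The property name `r` mirrors Scala; the default option and the decision to trap on an invalid pattern are assumptions carried over from the prefix operator above.

```swift
import Foundation

extension String {
    // Compiles the string as a regular expression, Scala-style.
    // Traps if the pattern is invalid.
    var r: NSRegularExpression {
        return try! NSRegularExpression(pattern: self,
                                        options: [.dotMatchesLineSeparators])
    }
}

let optionRegex = "^(?:-[a-z]|--[a-z]\\S*)$".r
let input = "--verbose"
let found = optionRegex.firstMatch(in: input,
                                   options: [],
                                   range: NSRange(input.startIndex..., in: input)) != nil   // true
```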

I don’t support terseness in all cases; I think the clarity of Cocoa is one of its strong points, despite the verbosity. However, there are a few choice situations where the defaults are well-understood and very common; in those cases it makes sense to provide operators that will make them easy to use. In the specific case of regular expressions, maybe it will let more people feel at ease trying this very powerful tool.







1 In fact, I have more than once seen developers manually parse strings that should have been trivially matchable with regexes because the API was so onerous.

2 If you want to really match the Ruby API, also define postfix func /(regex:NSRegularExpression) -> NSRegularExpression { return regex }. But fun though it is, that’s just silly…

 