Using Regular Expressions
A regular expression, also known as a regex or regexp, is a standard pattern-matching tool used in many scripting languages. It allows you to create filters that match patterns of text, rather than just single words or phrases.
Content filtering in SpamTitan uses Perl Compatible Regular Expressions (PCRE). For more information on regular expressions and PCRE, see www.regular-expressions.info and www.pcre.org.
Always test your regular expressions for both intended matches and false positives using a testing tool such as regexpal.com.
Regular Expression Examples
Example 1: match email from specific domains | |
---|---|
Match criteria: | Match any email address from the domains example.com and example.net. |
Regular Expression: | (\W|^)[\w.+\-]{0,25}@example\.(com|net)(\W|$) |
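Description: | The pattern requires a non-word character or the start of the string, then up to 25 characters that are letters, digits, underscores, periods, plus signs, or hyphens (the local part of the address), then @example. followed by com or net, and finally a non-word character or the end of the string. The period before com or net is escaped so that it matches only a literal period. |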
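One way to follow the testing advice above is to run the Example 1 pattern against strings that should and should not match. The sketch below uses Python's `re` module as a stand-in for PCRE, on the assumption that the two behave the same for the syntax in this pattern. The helper name `matches_domain` and all sample addresses are made up for illustration.

```python
import re

# Pattern copied from Example 1. Python's re module stands in for PCRE here;
# this pattern only uses syntax that both engines share.
DOMAIN_PATTERN = re.compile(r"(\W|^)[\w.+\-]{0,25}@example\.(com|net)(\W|$)")


def matches_domain(text: str) -> bool:
    """Return True if text contains an address at example.com or example.net."""
    return DOMAIN_PATTERN.search(text) is not None


# Hypothetical sample strings that are expected to match.
should_match = [
    "alice@example.com",
    "From: bob.smith+news@example.net",
    "<carol@example.com>",
]

# Hypothetical sample strings that are expected NOT to match.
should_not_match = [
    "alice@example.org",        # different top-level domain
    "alice@examplexcom",        # the escaped period must be a literal "."
    "alice@example.community",  # "com" is followed by a word character
    "alice@sub.example.com",    # "@" is not directly followed by "example"
]

# A false positive worth knowing about: the "." after "com" counts as a
# non-word character, so this longer host name still matches.
surprising_match = "alice@example.com.evil.test"

for sample in should_match:
    print("match   ", sample, matches_domain(sample))
for sample in should_not_match:
    print("no match", sample, matches_domain(sample))
print("surprise", surprising_match, matches_domain(surprising_match))
```

The last case shows why testing for false positives matters: a pattern can be correct for the intended addresses and still accept text that merely begins with them.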
Example 2: match string containing IP addresses in the 192.168.1.0/24 netblock | |
---|---|
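Match criteria: | Match any IP address in the 192.168.1.0/24 CIDR block (i.e. 192.168.1.0-192.168.1.255). |
Regular Expression: | (\W|^)192\.168\.1\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))(\W|$) |
Description: | The pattern matches 192.168.1. literally (each period is escaped), followed by a last octet from 0 to 255, written as alternatives for one digit, two digits, 100-199, 200-249, and 250-255. The groups on either side require a non-word character or the start or end of the string, so longer numbers such as 192.168.1.2555 are not matched. |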
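A netblock pattern like the one in Example 2 can be checked exhaustively, because a /24 contains only 256 addresses. The sketch below again uses Python's `re` module as a stand-in for PCRE and compares the pattern against the standard `ipaddress` module. The last-octet alternation here starts at `[0-9]` so that 192.168.1.0 is covered. The names `LAST_OCTET`, `NETBLOCK_PATTERN`, and `in_netblock` are illustrative only.

```python
import ipaddress
import re

# Last octet 0-255: one digit, two digits, 100-199, 200-249, 250-255.
LAST_OCTET = r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2([0-4][0-9]|5[0-5]))"
NETBLOCK_PATTERN = re.compile(r"(\W|^)192\.168\.1\." + LAST_OCTET + r"(\W|$)")

NETWORK = ipaddress.ip_network("192.168.1.0/24")


def in_netblock(text: str) -> bool:
    """Return True if text contains an address in 192.168.1.0/24."""
    return NETBLOCK_PATTERN.search(text) is not None


# Every address in the network should match the pattern.
missed = [str(addr) for addr in NETWORK if not in_netblock(str(addr))]
print("addresses the pattern misses:", missed)

# Hypothetical near misses that should NOT match.
near_misses = [
    "192.168.1.256",  # last octet out of range
    "192.168.2.1",    # different third octet
    "1192.168.1.1",   # extra leading digit
    "192x168.1.5",    # escaped period must be a literal "."
]
for sample in near_misses:
    print(sample, in_netblock(sample))
```

Comparing against `ipaddress` gives an independent definition of the netblock, which helps catch off-by-one mistakes at the ends of the range.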
Wildcards
Wildcards are an important part of regular expressions. They are different from the wildcards used in SQL or in file system paths. Below are some common ones you may use:
Wildcard | Description |
---|---|
. | matches any character, except a newline (includes spaces, tabs, letters, and numbers). |
\d | matches any digit (0-9). |
\s | matches any whitespace character (space, tab, newline, and similar). |
\w | matches any word character (a-z, A-Z, 0-9, and the underscore). |
[ ] | matches any single character within the brackets. For example, [btz] will match b, t, or z. [a-z0-9] will match a to z or 0 to 9. Do not separate the characters with commas: a comma inside the brackets is matched as a literal comma. |
\ | use as an "escape" to strip special properties of any character. For example, if you want a literal period, use "\." instead of ".". |
(|) | matches any of the patterns inside the parentheses. For example, (\.tw|\.jp) matches ".tw" or ".jp". |
* | matches zero or more of the preceding element or pattern. |
+ | matches one or more of the preceding element or pattern. |
? | matches zero or one of the preceding element or pattern. |
{n} | matches exactly n repetitions of the preceding element or pattern. |
{n,m} | matches between n and m repetitions of the preceding element or pattern. |
{n,} | matches at least n repetitions of the preceding element or pattern. |
^ | matches the beginning of a string. |
$ | matches the end of a string. |
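\b | a word boundary, which matches the position at the beginning or end of a word (the positions some other regex flavors write as \< and \>). It matches a position, not a character, so it is NOT a catchall for tabs, spaces, etc. between words. |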
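
The entries in the table above can be tried out on short sample strings. The following sketch uses Python's `re` module as a stand-in for PCRE, on the assumption that the two agree for these basic constructs. The sample strings and the helper `run_cases` are made up for illustration.

```python
import re

# Each case: (pattern, sample text, expected list from re.findall).
CASES = [
    (r"\d+", "order 66, item 7", ["66", "7"]),                # one or more digits
    (r"\w+", "user_1 ok", ["user_1", "ok"]),                  # \w covers digits and "_"
    (r"[btz]at", "bat cat tat zat", ["bat", "tat", "zat"]),   # character class
    (r"(\.tw|\.jp)", "a.tw b.jp c.us", [".tw", ".jp"]),       # alternation
    (r"colou?r", "color colour", ["color", "colour"]),        # zero or one "u"
    (r"a{2,3}", "a aa aaaa", ["aa", "aaa"]),                  # between 2 and 3 "a"
    (r"\bcat\b", "cat concat cats", ["cat"]),                 # whole word only
]


def run_cases(cases):
    """Return the cases whose findall result differs from the expectation."""
    failures = []
    for pattern, text, expected in cases:
        actual = re.findall(pattern, text)
        if actual != expected:
            failures.append((pattern, text, expected, actual))
    return failures


for pattern, text, expected in CASES:
    print(f"{pattern!r:18} on {text!r:22} -> {re.findall(pattern, text)}")
print("failures:", run_cases(CASES))
```

Keeping such cases in a small table makes it easy to add a new sample whenever a filter behaves unexpectedly.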