Introduction to Regular Expressions (RegEx)

Some examples of regular expressions from simple to complex:

International Banking

^[A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?$

This regular expression is designed to match SWIFT (Society for Worldwide Interbank Financial Telecommunication) codes, also known as BICs (Business Identifier Codes, formerly Bank Identifier Codes). Let’s break down the expression to understand how it works:

  1. ^: This anchors the match at the start of the string (or at the start of each line when multiline mode is enabled), so nothing may precede the code.

  2. [A-Z]{6}: This part matches exactly six letters from ‘A’ to ‘Z’. In SWIFT codes, the first four letters are the bank (institution) code and the next two are the ISO 3166-1 country code of the country where the bank is located.

  3. [A-Z0-9]{2}: This matches exactly two characters, which can be any letter from ‘A’ to ‘Z’ or any digit from ‘0’ to ‘9’. These two characters are the location code, which identifies where the institution is located within the country.

  4. ([A-Z0-9]{3})?: This is an optional group (as indicated by the ? at the end), which matches exactly three characters. These can be any letter from ‘A’ to ‘Z’ or any digit from ‘0’ to ‘9’. In a SWIFT code, these three characters are the branch code. An 8-character code without this group refers to the institution’s primary office. The ? makes the whole group optional, accommodating both 8-character and 11-character SWIFT codes.

  5. $: This anchors the match at the end of the string, so no characters may follow the 8 or 11 characters of the code.

In summary, this regular expression matches the standard format of SWIFT codes, which are either 8 or 11 characters long: a 4-letter bank code, a 2-letter country code, a 2-character location code, and an optional 3-character branch code.
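As a minimal sketch of how this pattern behaves in practice, the Python snippet below applies it, unchanged, to a few sample strings. The function name check_swift and the sample inputs are illustrative assumptions, not part of any banking library.

```python
import re

# The pattern exactly as given in the text.
SWIFT_PATTERN = re.compile(r"^[A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?$")


def check_swift(code: str) -> str:
    """Classify a candidate string against the pattern (hypothetical helper)."""
    match = SWIFT_PATTERN.match(code)
    if match is None:
        return "no match"
    # Group 1 is the optional three-character suffix; it is None
    # when the input is the 8-character form.
    if match.group(1) is not None:
        return "11-character code"
    return "8-character code"


# Sample inputs chosen for illustration only.
for candidate in ["DEUTDEFF", "DEUTDEFF500", "deutdeff", "DEUTDEFF5", "DEUT1EFF"]:
    print(f"{candidate!r}: {check_swift(candidate)}")
```

Note that the pattern only checks the shape of the string; it does not confirm that the code belongs to a real institution. In Python specifically, $ also matches just before a trailing newline, so re.fullmatch (or \Z in place of $) is the stricter choice when validating untrusted input.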

Government Identification Numbers

^(1|2)[0-9]{2}(0[1-9]|1[0-2])(0[1-9]|[1-2][0-9]|3[0-1])[0-9]{6}(0[1-9]|1[0-9]|2[0-9]|3[0-9]|4[0-2])$

This regular expression is structured to match French social security numbers (numéro de sécurité sociale), which have a specific format. Let’s break it down:

  1. ^: This signifies the beginning of the string, ensuring that the pattern matches from the start.

  2. (1|2): Matches either ‘1’ or ‘2’, indicating the individual’s gender (‘1’ for male, ‘2’ for female).

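  3. [0-9]{2}: Matches any two digits, representing the last two digits of the year of birth.

  4. (0[1-9]|1[0-2]): A choice pattern for a two-digit month of birth, ranging from ‘01’ to ‘12’.

  5. (0[1-9]|[1-2][0-9]|3[0-1]): Matches a two-digit value from ‘01’ to ‘31’. The actual number has no day-of-birth field: these two positions hold the department of birth, so this group rejects valid departments above 31 (for example ‘75’ for Paris) as well as Corsica’s ‘2A’ and ‘2B’.

  6. [0-9]{6}: Matches any sequence of six digits. In the actual number, these are the three-digit commune code of the place of birth followed by a three-digit order number that distinguishes people born in the same place and month.

  7. (0[1-9]|1[0-9]|2[0-9]|3[0-9]|4[0-2]): Matches a two-digit value from ‘01’ to ‘42’. In the actual number, the last two digits are a control key, computed as 97 minus the remainder of the first 13 digits divided by 97, so valid keys range from ‘01’ to ‘97’ and this group is stricter than the real format.

  8. $: Signifies the end of the string, ensuring the pattern matches only if it reaches the string’s end.

This regular expression checks that a string has 15 digits with a valid leading digit and month, but it only approximates the structure of French social security numbers (gender, year and month of birth, department, commune, order number, and control key): its restrictions on positions 6-7 and 14-15 are narrower than the real format, and it does not verify the control key.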
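To see which strings the pattern accepts, here is a small Python sketch that lays the same expression out piece by piece using re.VERBOSE, with each piece annotated by the character positions it covers. The helper name matches_pattern and the sample digit strings are made up for illustration; they are not real identification numbers.

```python
import re

# The pattern from the text, split across lines with re.VERBOSE so that
# each piece can be annotated. Whitespace and comments are ignored.
NUMBER_PATTERN = re.compile(
    r"""
    ^
    (1|2)                                  # position 1: '1' or '2'
    [0-9]{2}                               # positions 2-3: any two digits
    (0[1-9]|1[0-2])                        # positions 4-5: '01' to '12'
    (0[1-9]|[1-2][0-9]|3[0-1])             # positions 6-7: '01' to '31'
    [0-9]{6}                               # positions 8-13: any six digits
    (0[1-9]|1[0-9]|2[0-9]|3[0-9]|4[0-2])   # positions 14-15: '01' to '42'
    $
    """,
    re.VERBOSE,
)


def matches_pattern(candidate: str) -> bool:
    """Return True if the string fits the pattern (hypothetical helper)."""
    return NUMBER_PATTERN.match(candidate) is not None


# Made-up sample strings, one per case.
samples = {
    "185051512345601": "fits every piece",
    "385051512345601": "first digit is not 1 or 2",
    "185131512345601": "positions 4-5 are '13'",
    "185051512345643": "positions 14-15 are '43'",
    "18505151234560": "only 14 digits",
}
for text, note in samples.items():
    print(f"{text}: {matches_pattern(text)} ({note})")
```

Because every piece has a fixed width, the pattern can only ever match strings of exactly 15 digits.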

Postal Addresses

^(?:(?:(?:(?:(?:(?:(?:(?:[0-9]+-[0-9]+|[0-9]+)\s+(?:(?:(?:[0-9]+|[a-zA-Z]+)\s*)+)(?:(?:(?:[a-zA-Z]+,?\s*)+)(?:(?:[0-9]+|[a-zA-Z]+)\s*)+)?(?:[0-9]+-[0-9]+|[0-9]+))|(?:(?:(?:[a-zA-Z]+,?\s*)+)(?:(?:(?:[0-9]+|[a-zA-Z]+)\s*)+)(?:(?:[0-9]+-[0-9]+|[0-9]+)|[a-zA-Z]+)))|(?:(?:(?:[a-zA-Z]+,?\s*)+)(?:(?:[0-9]+-[0-9]+|[0-9]+)|[a-zA-Z]+)\s+(?:(?:[0-9]+-[0-9]+|[0-9]+)|[a-zA-Z]+))|(?:(?:(?:[0-9]+-[0-9]+|[0-9]+)\s+)(?:(?:[0-9]+-[0-9]+|[0-9]+)\s*)+(?:(?:[a-zA-Z]+,?\s*)+)?(?:(?:[0-9]+-[0-9]+|[0-9]+)|[a-zA-Z]+))))\s+(?:(?:[a-zA-Z]+,?\s*)+)?(?:[a-zA-Z]+)\s+(?:(?:[0-9]+-[0-9]+|[0-9]+)\s*)?|(?:[0-9]+-[0-9]+|[0-9]+)\s+(?:(?:(?:[0-9]+|[a-zA-Z]+)\s*)+)(?:(?:(?:[a-zA-Z]+,?\s*)+)(?:(?:[0-9]+|[a-zA-Z]+)\s*)+)?(?:[0-9]+-[0-9]+|[0-9]+))$

The regular expression provided is tailored to identify various U.S. street address formats. Let’s examine its structure:

  1. ^: This marks the beginning of the string, ensuring the pattern matches from the start.

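  2. (?: ... ): Non-capturing groups, nested many levels deep throughout the expression. They group alternatives and repeated parts without storing the matched text; a group is optional only where it is followed by ?. These groups cater to different parts and formats of street addresses.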

  3. [0-9]+: Matches sequences of digits, representing house numbers.

  4. [0-9]+-[0-9]+: Used for addresses with number ranges (e.g., “123-125 Main Street”).

  5. \s+: Denotes spaces between different parts of the address.

  6. [a-zA-Z]+: Matches sequences of letters, typically for street names.

  7. Additional Elements: The regex includes combinations of digits and letters to accommodate elements like apartment numbers or unit designations.

  8. The expression accounts for the variability in street address formats, including different permutations and combinations of these elements.

  9. $: Concludes with this symbol, indicating the end of the string.

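This regex is an attempt to cover street addresses, which inherently have many variations, in a single comprehensive pattern. The result is long, hard to read, and error-prone, and nested quantifiers such as (?:(?:[0-9]+|[a-zA-Z]+)\s*)+ can cause excessive backtracking on input that does not match.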
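The building blocks listed above are easier to follow in a much smaller pattern. The Python sketch below is a deliberately simplified illustration, not a replacement for the full expression: it handles only a house number or number range followed by a street name made of letters. The names SIMPLE_ADDRESS and parse_address, and the group names, are assumptions introduced for this example.

```python
import re
from typing import Optional, Tuple

# A simplified pattern built from the same pieces as the full expression:
# a number or number range, whitespace, then one or more words of letters.
SIMPLE_ADDRESS = re.compile(
    r"^(?P<number>[0-9]+-[0-9]+|[0-9]+)"        # house number or range
    r"\s+"                                      # separating whitespace
    r"(?P<street>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)$"  # street name words
)


def parse_address(text: str) -> Optional[Tuple[str, str]]:
    """Split a simple address into (number, street), or return None."""
    match = SIMPLE_ADDRESS.match(text)
    if match is None:
        return None
    return match.group("number"), match.group("street")


print(parse_address("123 Main Street"))
print(parse_address("123-125 Main Street"))
print(parse_address("Main Street"))    # no house number
print(parse_address("123 Main St."))   # the period is not covered
```

Even this small version shows the trade-off: each additional address format (unit numbers, punctuation, directional prefixes) needs another alternative, which is how patterns like the one above grow so large.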