Regex Cheatsheet
Regular expressions (regex for short) are useful for almost any text-related task. From scraping to analysis, most developers run into regex at some point.

Some areas to use Regex for:
- data validation
- data scraping
- data wrangling
- string parsing
- string replacement
- syntax highlighting
- packet sniffing
- file renaming
Basics
Surround every expression with /, like this: /abc/
Anchors
Regex | Explanation |
---|---|
^The | Finds strings which start with The |
end$ | Finds strings which end with end |
^The end$ | Finds this exact string |
roar | Finds strings which have roar in them |
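The anchors above can be tried out directly in JavaScript, whose regex literals use the /.../ notation from the Basics section. This is only a minimal sketch; the sample strings are made up for illustration.

```javascript
// Patterns are taken from the table above; the sample strings are illustrative.
const startsWithThe = /^The/;
const endsWithEnd = /end$/;
const exactlyTheEnd = /^The end$/;
const containsRoar = /roar/;

console.log(startsWithThe.test("The lion"));      // true
console.log(startsWithThe.test("See The lion"));  // false
console.log(endsWithEnd.test("This is the end")); // true
console.log(exactlyTheEnd.test("The end"));       // true
console.log(exactlyTheEnd.test("The very end"));  // false
console.log(containsRoar.test("uproarious"));     // true
```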
Quantifiers
Regex | Explanation |
---|---|
abc* | Finds strings which contain ab followed by 0 or more c |
abc+ | Finds strings which contain ab followed by 1 or more c |
abc? | Finds strings which contain ab followed by 0 or 1 c |
abc{2} | Finds strings which contain ab followed by 2 c |
abc{2,} | Finds strings which contain ab followed by 2 or more c |
abc{2,5} | Finds strings which contain ab followed by 2 to 5 c |
a(bc)* | Finds strings which contain a followed by 0 or more bc |
a(bc){2,5} | Finds strings which contain a followed by 2 to 5 bc |
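To see how the quantifiers differ, the following sketch runs each one against the same handful of strings. The patterns are wrapped in ^ and $ here (an addition for this example) so that the whole string has to fit; without the anchors a pattern like abc{2} would also be found inside a longer run of c. The sample strings are made up.

```javascript
// Sample strings are illustrative; the last one has six c characters.
const samples = ["ab", "abc", "abcc", "abcccccc"];

// Anchored versions of the patterns from the table above.
const patterns = {
  "abc*": /^abc*$/,
  "abc+": /^abc+$/,
  "abc?": /^abc?$/,
  "abc{2}": /^abc{2}$/,
  "abc{2,}": /^abc{2,}$/,
  "abc{2,5}": /^abc{2,5}$/,
};

// Collect, for each pattern, the samples it accepts.
const accepted = {};
for (const [label, regex] of Object.entries(patterns)) {
  accepted[label] = samples.filter((s) => regex.test(s));
  console.log(label, accepted[label]);
}

// A quantifier after a group applies to the whole group.
const groupRepeated = /^a(bc){2,5}$/;
console.log(groupRepeated.test("abcbc")); // true
console.log(groupRepeated.test("abc"));   // false
```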
OR-Operator
Regex does not have an AND-Operator, only an OR.
Regex | Explanation |
---|---|
a(b|c) | Finds strings which contain a followed by b or c (possible: ab, ac, abc, acb) |
a[bc] | Just like before, but this time without capturing b or c |
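The difference between the two rows shows up in the match result. A small sketch, assuming JavaScript's String.prototype.match and a made-up input string:

```javascript
// "xacx" is an illustrative input.
const withGroup = "xacx".match(/a(b|c)/);
const withClass = "xacx".match(/a[bc]/);

// Both find the same text ...
console.log(withGroup[0]); // "ac"
console.log(withClass[0]); // "ac"

// ... but only the parenthesized version captures what followed the a.
console.log(withGroup[1]);     // "c"
console.log(withClass.length); // 1 (full match only, no captured group)
```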
Character classes
Regex | Explanation |
---|---|
\d | Finds digits |
\w | finds word characters (alphanumeric and _) |
\s | finds a whitespace (including tabs and line breaks) |
. | finds any character except line breaks (should be used carefully, other classes are faster and more precise) |
\D | negation of \d |
\W | negation of \w |
\S | negation of \s |
You can also search for non-printable characters like \t, \n and \r. To search for special characters, escape them with \ like this: \:, \$, \., \{, \[ ...
An example of a combination: the expression \$\d matches a $ sign that is directly followed by a digit (the digit is part of the match).
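The classes and escapes can be checked quickly in JavaScript. This is a rough sketch on a made-up string; the g flag used here simply collects every match.

```javascript
// Illustrative input containing a tab, spaces, digits, a $ sign and a line break.
const text = "Total:\t3 items for $25.50\n";

const digits = text.match(/\d/g);
console.log(digits); // [ '3', '2', '5', '5', '0' ]

const whitespaceCount = text.match(/\s/g).length;
console.log(whitespaceCount); // 5 (one tab, three spaces, one line break)

const dollarDigit = text.match(/\$\d/)[0];
console.log(dollarDigit); // "$2"

// An unescaped . matches any character, an escaped \. only a literal dot.
console.log(/./.test("no dot here"));  // true
console.log(/\./.test("no dot here")); // false
```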
Flags
You can define flags at the end of an expression:
Regex | Explanation |
---|---|
g | (global) does not return after the first match, restarting the subsequent searches from the end of the previous match |
m | (multi-line) when enabled ^ and $ will match the start and end of a line, instead of the whole string |
i | (insensitive) makes the whole expression case-insensitive (for instance /aBc/i would match AbC) |
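The effect of the three flags is easiest to see side by side. A minimal sketch in JavaScript, using a made-up three-line string:

```javascript
// Three lines that differ only in case (illustrative input).
const text = "Cat\ncat\nCAT";

// No flags: case-sensitive, stops at the first match.
const firstOnly = text.match(/cat/);
console.log(firstOnly[0], firstOnly.index); // cat 4

// g + i: every match, regardless of case.
const allIgnoringCase = text.match(/cat/gi);
console.log(allIgnoringCase); // [ 'Cat', 'cat', 'CAT' ]

// Without m, ^ and $ refer to the whole string, so nothing matches.
const wholeString = text.match(/^cat$/gi);
console.log(wholeString); // null

// With m, ^ and $ match at the start and end of each line.
const perLine = text.match(/^cat$/gim);
console.log(perLine); // [ 'Cat', 'cat', 'CAT' ]
```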
Grouping and capturing
Regex | Explanation |
---|---|
a(bc) | parentheses create a capturing group (possible: abc) |
a(?:bc)* | using ?: disables the capturing group (possible: a) |
a(?<foo>bc) | using ?<foo> will give the group the name foo |
Naming a group lets you look up its match by name instead of by position: the result behaves like a dictionary whose keys are the names of the capturing groups.
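In JavaScript, for instance, the named matches end up in the groups property of the match result. A short sketch; the date string and the group names year and month are made up for illustration.

```javascript
// The group name "foo" comes from the table above.
const simple = "abc".match(/a(?<foo>bc)/);
console.log(simple.groups.foo); // "bc" (lookup by name)
console.log(simple[1]);         // "bc" (the same group by position)

// Hypothetical example: pulling year and month out of a made-up string.
const date = "Released 2021-07".match(/(?<year>\d{4})-(?<month>\d{2})/);
const { year, month } = date.groups;
console.log(year, month); // 2021 07
```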
Bracket expressions
Regex | Explanation |
---|---|
[abc] | finds strings that either have one a or b or c (equal to a|b|c) |
[a-c] | same as before (scope from a to c) |
[a-fA-F0-9] | finds strings that either have one a to f or A to F or 0 to 9 (hexadecimal btw.) |
[0-9]% | finds strings that have 0 to 9 followed by a % |
[^a-zA-Z] | finds any single character that is not a letter from a to z or from A to Z (in this case ^ is used as negation of the expression) |
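Important to note: Most special characters (such as ., *, + or $) lose their special meaning inside [ ] and don't need to be escaped there. In most flavors the exceptions are ], \, ^ (at the start) and - (between two characters).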
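A few of the bracket expressions from the table, sketched in JavaScript on made-up inputs. The hex check adds ^, $ and + (not part of the table) so that the whole string is tested.

```javascript
// Whole-string hex check built from the [a-fA-F0-9] class.
const isHex = /^[a-fA-F0-9]+$/;
console.log(isHex.test("c0ffee")); // true
console.log(isHex.test("coffee")); // false (o is not a hex digit)

// A digit followed by a percent sign.
const percents = "Up 5%, down 12%".match(/[0-9]%/g);
console.log(percents); // [ '5%', '2%' ]

// Negated class: every character that is not a letter.
const nonLetters = "abc-123".match(/[^a-zA-Z]/g);
console.log(nonLetters); // [ '-', '1', '2', '3' ]

// Inside [ ] the characters . + * are plain characters, not operators.
console.log(/[.+*]/.test("a+b")); // true
console.log(/[.]/.test("ab"));    // false (no literal dot in "ab")
```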
Greedy and Lazy match
The quantifiers (*, + and { }) are greedy operators, so they expand the match as far as they can through the provided text.
Given this string: This is a <div>simple div</div> test
The regex <.+> will find <div>simple div</div>. To find only the tags <div> and </div>, use ? to make the expression lazy: <.+?>
An even better expression (it avoids the . operator) would be: <[^<>]+> - Explanation: Matches one or more characters other than < and > between a < and a >.
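The three variants can be compared on the example string with a short JavaScript sketch (the g flag is added to collect all tags):

```javascript
const text = "This is a <div>simple div</div> test";

// Greedy: .+ runs to the last > in the string.
const greedy = text.match(/<.+>/)[0];
console.log(greedy); // "<div>simple div</div>"

// Lazy: .+? stops at the first > it can.
const lazy = text.match(/<.+?>/g);
console.log(lazy); // [ '<div>', '</div>' ]

// Negated class: cannot run past a < or > in the first place.
const negated = text.match(/<[^<>]+>/g);
console.log(negated); // [ '<div>', '</div>' ]
```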
Advanced stuff
Boundaries
Regex | Explanation |
---|---|
\babc\b | only searches for "whole words" (possible: abc, -abc/ - not possible: ab, abcc, babc) |
The \b operator works like the anchors ^ and $ in that it matches a position rather than a character: a position where one side is a word character (\w) and the other side is not (for example the beginning of the string or a space).
It also has a negation \B:
Regex | Explanation |
---|---|
\Babc\B | matches only if the pattern is fully surrounded by word characters (possible: babcd - not possible: ab, abc, abcc) |
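The possible and not possible cases listed in the two tables can be verified with a few lines of JavaScript:

```javascript
const wholeWord = /\babc\b/;
console.log(wholeWord.test("abc"));   // true
console.log(wholeWord.test("-abc/")); // true  (- and / are not word characters)
console.log(wholeWord.test("abcc"));  // false (c follows, no boundary after abc)
console.log(wholeWord.test("babc"));  // false (b precedes, no boundary before abc)

const insideWord = /\Babc\B/;
console.log(insideWord.test("babcd")); // true  (word characters on both sides)
console.log(insideWord.test("abc"));   // false
console.log(insideWord.test("abcc"));  // false (abc sits at the start of the string)
```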
Back-references
Regex | Explanation |
---|---|
([abc])\1 | using \1 it matches the same text that was matched by the first capturing group |
([abc])([de])\2\1 | use \2 (\3, \4, etc.) to identify the same text that was matched by the second (third, fourth, etc.) capturing group |
(?<foo>[abc])\k<foo> | the foo group is referenced later (\k<foo>). The result is the same as that of the first regex. |
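A brief JavaScript sketch of the three back-reference patterns; the test strings are made up.

```javascript
// \1 must repeat exactly what group 1 matched.
const doubled = /([abc])\1/;
console.log(doubled.test("aa")); // true
console.log(doubled.test("ab")); // false (b is not a repeat of a)

// \2 then \1: the two captured characters again, in reverse order.
const mirrored = /([abc])([de])\2\1/;
console.log(mirrored.test("adda")); // true
console.log(mirrored.test("adad")); // false

// The named variant behaves like the first pattern.
const doubledNamed = /(?<foo>[abc])\k<foo>/;
console.log(doubledNamed.test("cc")); // true
console.log(doubledNamed.test("cb")); // false
```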
Look-ahead and Look-behind
Regex | Explanation |
---|---|
d(?=r) | matches a d only if it is followed by r, but r will not be part of the overall regex match |
(?<=r)d | matches a d only if it is preceded by an r, but r will not be part of the overall regex match |
This can also be negated:
Regex | Explanation |
---|---|
d(?!r) | matches a d only if it is not followed by r; the following character will not be part of the overall regex match |
(?<!r)d | matches a d only if it is not preceded by an r; the preceding character will not be part of the overall regex match |
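To make the four variants concrete, the sketch below lists the positions at which each one finds a d in a made-up string. It assumes a JavaScript engine that supports look-behind and String.prototype.matchAll (both are comparatively recent additions to the language); positionsOf is a hypothetical helper written for this example.

```javascript
// Illustrative input; the d characters sit at indexes 0, 4 and 6.
const text = "dr rd da";

// Hypothetical helper: start index of every match of a global regex.
function positionsOf(regex, input) {
  return [...input.matchAll(regex)].map((m) => m.index);
}

const followedByR = positionsOf(/d(?=r)/g, text);
const precededByR = positionsOf(/(?<=r)d/g, text);
const notFollowedByR = positionsOf(/d(?!r)/g, text);
const notPrecededByR = positionsOf(/(?<!r)d/g, text);

console.log(followedByR);    // [ 0 ]
console.log(precededByR);    // [ 4 ]
console.log(notFollowedByR); // [ 4, 6 ]
console.log(notPrecededByR); // [ 0, 6 ]

// The match itself is only the d; the r is checked but not consumed.
console.log(text.match(/d(?=r)/)[0]); // "d"
```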
Useful expressions
Regex | Explanation |
---|---|
^(.*)(\r?\n\1)+$ | Finds duplicates in consecutive lines (unique, unique, duplicate, duplicate, unique ...) |
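<[^>]*> | Matches all HTML tags. Replacing the matches with an empty string removes them: <b>test</b> becomes test and <a href="https://www.google.de/">Google</a> becomes Google |
^(?:[\t ]*(?:\r?\n|\r))+ | Matches all empty lines (with the m flag). Replacing the matches with an empty string removes them |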
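These expressions only match; the removing happens when the matches are replaced. A possible way to apply them in JavaScript with String.prototype.replace is sketched below. The g and m flags and the sample inputs are assumptions of this example, not part of the table.

```javascript
// Collapse consecutive duplicate lines into one ($1 is the first captured line).
const lines = "unique\nduplicate\nduplicate\nother";
const deduplicated = lines.replace(/^(.*)(\r?\n\1)+$/gm, "$1");
console.log(JSON.stringify(deduplicated)); // "unique\nduplicate\nother"

// Strip HTML tags by replacing every tag with an empty string.
const html = '<b>test</b> and <a href="https://www.google.de/">Google</a>';
const plain = html.replace(/<[^>]*>/g, "");
console.log(plain); // "test and Google"

// Drop empty lines, including lines that contain only spaces or tabs.
const withBlanks = "first\n\n   \nsecond\n\nthird";
const compact = withBlanks.replace(/^(?:[\t ]*(?:\r?\n|\r))+/gm, "");
console.log(JSON.stringify(compact)); // "first\nsecond\nthird"
```

Keep in mind that the tag pattern is a quick text cleanup, not an HTML parser.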
Hint: This article is based on and extends this post I found during my research.