CopperSpice API
1.9.2
Provides pattern matching using regular expressions.
Public Methods | |
QRegularExpression () = default | |
QRegularExpression (const QRegularExpression &other) = default | |
QRegularExpression (const S &pattern, QPatternOptionFlags options=QPatternOption::NoPatternOption) | |
QRegularExpression (QRegularExpression &&other) = default | |
int | captureCount () const |
S | errorString () const |
QList< QRegularExpressionMatch< S > > | globalMatch (const S &str) const |
QList< QRegularExpressionMatch< S > > | globalMatch (const S &str, typename S::const_iterator offset, QMatchType matchType=QMatchType::NormalMatch, QMatchOptionFlags matchOptions=QMatchOption::NoMatchOption) const |
QList< QRegularExpressionMatch< S > > | globalMatch (QStringView< S > str) const |
QList< QRegularExpressionMatch< S > > | globalMatch (QStringView< S > str, typename S::const_iterator offset, QMatchType matchType=QMatchType::NormalMatch, QMatchOptionFlags matchOptions=QMatchOption::NoMatchOption) const |
bool | isValid () const |
QRegularExpressionMatch< S > | match (const S &str) const |
QRegularExpressionMatch< S > | match (const S &str, typename S::const_iterator offset, QMatchType matchType=QMatchType::NormalMatch, QMatchOptionFlags matchOptions=QMatchOption::NoMatchOption) const |
QRegularExpressionMatch< S > | match (QStringView< S > str) const |
QRegularExpressionMatch< S > | match (QStringView< S > str, typename S::const_iterator offset, QMatchType matchType=QMatchType::NormalMatch, QMatchOptionFlags matchOptions=QMatchOption::NoMatchOption) const |
QList< S > | namedCaptureGroups () const |
QRegularExpression & | operator= (const QRegularExpression &other) = default |
QRegularExpression & | operator= (QRegularExpression &&other) = default |
S | pattern () const |
int | patternErrorOffset () const |
QPatternOptionFlags | patternOptions () const |
void | setPattern (const S &pattern) |
void | setPatternOptions (QPatternOptionFlags options) |
void | swap (QRegularExpression &other) |
Static Public Methods | |
static S | escape (const S &str) |
Related Functions | |
These are not member functions | |
enum | QMatchOption |
enum | QMatchType |
enum | QPatternOption |
A regular expression is a sequence of characters that defines a search pattern and is often abbreviated as a regex or regexp. Regular expressions can be used for searching and extracting text from an existing string.
The following table shows various uses for regular expressions.
Category | Description | Sample Usage |
---|---|---|
Validation | test whether a substring meets a given set of criteria specified by a pattern | does the string contain an integer, does it contain any whitespace |
Searching | can match a more complex pattern than a generic substring match | does the string contain the word apple while ignoring the word pineapple |
Search & Replace | replace a substring with a different substring | replace all occurrences of & with &amp; except where the & is already followed by amp; |
String Splitting | identify where a string should be split apart | break a string on sentence boundaries |
Several examples show how to update application source code which used the obsolete QRegExp class so that it uses the CopperSpice QRegularExpression class instead. For more information refer to QRegularExpression Migration.
In order to use a regular expression a search pattern is required. The pattern consists of some combination of the following categories or grammar types.
The search pattern [ABCD] will match an A or a B or a C or a D. This same expression can be written using a range [A-D].
An expression to match any capital letter in the English alphabet is written as [A-Z].
Some of the character classes and quantifiers are so common they have been given special symbols to represent them. The expression \d is a shorthand notation for [0-9]. Refer to the Shorthand Character Classes table.
The search pattern x{1,1} means match one and only one x. If the search pattern is x{1,5}, this means match a sequence of x characters which contains at least one x but no more than five.
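The effect of the {1,5} quantifier can be sketched as follows. This is only an illustration which uses std::regex from the C++ standard library in place of QRegularExpression so it compiles without CopperSpice. The helper name firstRunOfX is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the first run of 'x' characters matched by
// x{1,5}, or an empty string when the text contains no 'x'.
std::string firstRunOfX(const std::string &text)
{
   static const std::regex pattern("x{1,5}");
   std::smatch result;

   if (std::regex_search(text, result, pattern)) {
      return result.str(0);
   }

   return std::string();
}
```

Because the quantifier is greedy, a run of seven x characters yields a match of five, which is the upper bound.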
If the search pattern is ^b the intent is to match the letter b at the beginning of the search text. If the text string is bubble only the first b will match and nothing more. If the search pattern is changed to b$ to match at the end of the string, there is no match because bubble ends with the letter e.
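The anchor behavior described for the text bubble can be checked with a small sketch. It assumes std::regex as a stand-in for QRegularExpression, and countMatches is a hypothetical helper.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts how many times the pattern matches in the text.
int countMatches(const std::string &text, const std::string &pattern)
{
   const std::regex expression(pattern);

   auto first = std::sregex_iterator(text.begin(), text.end(), expression);
   auto last  = std::sregex_iterator();

   return static_cast<int>(std::distance(first, last));
}
```

With the text bubble the unanchored pattern b is found three times, ^b only once, and b$ not at all.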
A word boundary is another very common type of assertion. A pair of word boundaries will match a full word and not a portion of a word. A single word boundary allows matching the beginning or end of a word.
Using the following search pattern and string, a match is found for apples but not for pineapples.
Using the following search pattern and string, a match will be found for es three times in apples, oranges, and pineapples.
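A minimal sketch of both word boundary cases is shown below. The patterns \bapples\b and es\b are assumptions chosen to reproduce the results described above, and std::regex stands in for QRegularExpression. The helper matchPositions is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the starting offset of every match.
std::vector<std::ptrdiff_t> matchPositions(const std::string &text, const std::string &pattern)
{
   const std::regex expression(pattern);
   std::vector<std::ptrdiff_t> positions;

   for (auto iter = std::sregex_iterator(text.begin(), text.end(), expression);
         iter != std::sregex_iterator(); ++iter) {
      positions.push_back(iter->position(0));
   }

   return positions;
}
```

For the string "apples, oranges, and pineapples" the first pattern matches only at offset 0, while the second matches at the end of each of the three words.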
Refer to the Assertion Table below for additional expressions.
Defining a group marks part of a regular expression as a single unit. It is denoted by using parentheses. There are several reasons to use a group in a regular expression.
Using the following search pattern and string, a match is found for the group an every time it occurs. The pattern will match bana, banana, bananana. The underline portion shows what the indicated group matches.
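As an illustration, the pattern b(an)+a is one way to write a repeated group which matches the three words listed above. The pattern is an assumption, std::regex is used in place of QRegularExpression, and matchesRepeatedGroup is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the whole word is a 'b', then one or more
// repetitions of the group "an", then a final 'a'.
bool matchesRepeatedGroup(const std::string &word)
{
   static const std::regex pattern("b(an)+a");

   return std::regex_match(word, pattern);
}
```

The group must occur at least once, so ba is rejected, and the trailing a is required, so banan is rejected as well.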
Another reason to group part of a regular expression is to create a captured group. If the regular expression match is successful the group matches are automatically numbered. The capturing group number #0 can be used to retrieve the substring matched by the entire pattern. Additional numeric capturing groups will exist for each subsequent match.
For additional information and examples refer to the section about extracting captured substrings.
Non-Printable Characters
Expression | Description | ASCII |
---|---|---|
\a | Match bell | BEL, 0x07 |
\f | Match form feed | FF, 0x0C |
\n | Match line feed | LF, 0x0A, Unix newline |
\r | Match carriage return | CR, 0x0D |
\t | Match horizontal tab | HT, 0x09 |
\v | Match vertical tab | VT, 0x0B |
Special Characters
Expression | Description | Notes |
---|---|---|
\xhhhh | Match the character for the hexadecimal number hhhh | Value 0x0000 to 0xFFFF |
\0ooo | Match the character for the octal number ooo | Value 0 to 0377 |
\n | Back reference number n | \1, \2, etc |
Syntax for Character Classes
Expression | Description | Notes |
---|---|---|
- | Indicates a range of characters | [W-Z] matches 'W', 'X', 'Y', or 'Z' |
^ | Negates the character class when the caret is the first character | [^abc] match any character except 'a' or 'b' or 'c' |
Character Classes
Expression | Description | Notes |
---|---|---|
[:alnum:] | Alphanumeric character | |
[:alpha:] | Alpha character | QChar32::isLetter() |
[:blank:] | Space or tab | |
[:cntrl:] | Control character | |
[:digit:] | Any Digit | QChar32::isDigit() |
[:graph:] | Alphanumeric or punctuation | |
[:lower:] | Lower case character | QChar32::isLower() |
[:print:] | Printable character | QChar32::isPrint() |
[:punct:] | Punctuation character | QChar32::isPunct() |
[:space:] | Whitespace character | QChar32::isSpace() |
[:word:] | Letter, number, mark, or underscore | QChar32::isLetterOrNumber() QChar32::isMark() |
[:upper:] | Upper case character | QChar32::isUpper() |
[:xdigit:] | Hex digit | |
Shorthand Character Classes
Expression | Description | Notes |
---|---|---|
. (dot) | Any character including a newline | |
\d | Any Digit | QChar32::isDigit() |
\l | Lower case character | QChar32::isLower() |
\D | Non-digit | |
\h | Space or tab | |
\s | Whitespace character | QChar32::isSpace() |
\S | Non-whitespace character | |
\u | Upper case character | QChar32::isUpper() |
\w | Letter, number, mark, or underscore | QChar32::isLetterOrNumber() QChar32::isMark() |
\W | Anything which is not matched by \w | |
Quantifiers Table
Expression | Description |
---|---|
* | Match zero or more times |
+ | Match one or more times |
? | Match zero or one time |
{n} | Match exactly n times |
{n,} | Match at least n times |
{n,m} | Match from n to m times |
Assertion Table
Expression | Description |
---|---|
^ | Match the beginning of the string |
$ | Match the end of the string |
\b | Word boundary |
\B | Not a word boundary |
(?=EXP) | Positive lookahead, "EXP" can be any regular expression |
(?!EXP) | Negative lookahead, "EXP" can be any regular expression |
Operator Table
Operators | Description |
---|---|
| | OR operator |
( ) | Defines a group |
[ ] | Defines a character class |
Problem: Define a regular expression which will match integers in the range 0 to 99.
At least one digit is required so we start with the expression [0-9]{1,1} which matches a single digit exactly once. This regular expression matches integers in the range 0 to 9. To match integers up to 99 increase the maximum number of occurrences to 2 so the regular expression becomes [0-9]{1,2}.
This regular expression satisfies the original requirement to match integers from 0 to 99. It will also match integers which occur in the middle of a string. If the integer must be the whole string then use the anchor assertions ^ and $.
The regular expression now becomes ^[0-9]{1,2}$.
Problem: Optimize the search pattern ^[0-9]{1,2}$
The character class [0-9] can be replaced with the shortcut symbol \d. So the regular expression now becomes ^\d{1,2}$.
The quantified expression \d{1,2} can be rewritten as \d\d{0,1}, which is one required digit followed by an optional digit. So the regular expression can now be represented by ^\d\d{0,1}$. This can be further simplified by using a ? which is shorthand for the quantifier {0,1}.
The regular expression when it is optimized becomes ^\d\d?$.
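The original and the optimized forms can be compared with a short sketch. It assumes std::regex as a stand-in for QRegularExpression, and the two function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper using the original form of the pattern.
bool matchesVerbose(const std::string &text)
{
   static const std::regex pattern("^[0-9]{1,2}$");
   return std::regex_search(text, pattern);
}

// Hypothetical helper using the optimized form of the pattern.
bool matchesOptimized(const std::string &text)
{
   static const std::regex pattern(R"(^\d\d?$)");
   return std::regex_search(text, pattern);
}
```

Both functions accept one or two digit strings such as 0, 7, and 99. Both reject 100, the empty string, and any input containing a non-digit.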
Problem: Replace ampersand characters with the HTML entity &amp;.
The regular expression to match an ampersand is &.
However, this will also match any string that has already been converted. We want to replace only ampersands which are not already followed by amp;. The negative lookahead assertion can be used to peek ahead and check what follows the ampersand.
Adding the lookahead will change the regular expression to &(?!amp;)
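The replacement can be sketched as shown below. This example assumes std::regex and std::regex_replace from the C++ standard library rather than QRegularExpression. The function name escapeAmpersands is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: replaces every ampersand which is not already
// followed by "amp;" with the HTML entity.
std::string escapeAmpersands(const std::string &text)
{
   static const std::regex pattern("&(?!amp;)");

   return std::regex_replace(text, pattern, "&amp;");
}
```

Text which was already converted is left unchanged. The lookahead only checks for amp; so other entities such as &lt; would still have their ampersand replaced.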
A QRegularExpression consists of two parts, the search pattern and a set of pattern options.
A regular expression can be created from a pattern string by passing the search pattern to the QRegularExpression constructor.
The search pattern can also be passed directly to setPattern().
If the search pattern is a string literal all backslashes must be escaped with another backslash.
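The escaping rule comes from the C++ language and not from the regular expression engine. The sketch below uses std::string and std::regex only to show that an escaped literal and a C++11 raw string literal produce the same pattern text. The function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Pattern written as an ordinary literal, every backslash is doubled.
std::string escapedLiteral()
{
   return "\\d+\\.\\d+";
}

// The same pattern written as a raw string literal, no doubling needed.
std::string rawLiteral()
{
   return R"(\d+\.\d+)";
}
```

Both functions return the text \d+\.\d+ which matches a number such as 3.14.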
The pattern() method returns the search pattern from a QRegularExpression instance.
The way the search pattern is processed can be configured by setting one or more pattern options. For example, case sensitivity can be turned off by using the class enum QPatternOption::CaseInsensitiveOption. The options can be passed to the QRegularExpression constructor as shown in the following example.
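The effect of a case insensitive option can be sketched with the C++ standard library, where the flag std::regex::icase plays a role comparable to QPatternOption::CaseInsensitiveOption. This is an analogy and not the CopperSpice API. The function containsApple is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: searches for the word "apple", optionally ignoring case.
bool containsApple(const std::string &text, bool ignoreCase)
{
   auto flags = std::regex::ECMAScript;

   if (ignoreCase) {
      flags |= std::regex::icase;
   }

   const std::regex pattern("apple", flags);

   return std::regex_search(text, pattern);
}
```

The text "Green APPLE" is only matched when the case insensitive flag is set.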
Alternatively, setPatternOptions() can be used to configure the options for an existing QRegularExpression. The following will match any line in the string which contains only digits (but at least one).
The pattern options currently set can be retrieved by using the method patternOptions().
Refer to QPatternOption for more information about the available pattern options.
The last two arguments for the methods match() and globalMatch() set the match type and the match options. The match type is a value of the QMatchType enum. The "traditional" matching algorithm is chosen by using the NormalMatch match type (the default). It is also possible to request a partial match of the regular expression against a string. Refer to partial matching for more details.
The match options are values from the QMatchOption enum. These options change the way a specific match of a regular expression with a string is done.
In order to do a regular expression match pass a string to one of the match() methods. The result is a QRegularExpressionMatch instance which contains all the information about how the regular expression does or does not match.
If the match is successful the capturing group number #0 can be used to retrieve the substring matched by the entire pattern. Refer to the section about extracting captured substrings.
It is also possible to start a match at an arbitrary offset inside the string by passing the offset as an argument of match(). In the following example "12 abc" is not matched because the match is started at offset 1.
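Starting a match from an iterator offset can be sketched as follows. The pattern \d\d \w+ and the subject string "12 abc 45 def" are assumptions chosen to fit the description above. The sketch uses std::regex_search over an iterator range instead of QRegularExpression::match(), and matchFrom is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns the first match found at or after the offset,
// or an empty string when nothing matches. The offset must not exceed the
// length of the text.
std::string matchFrom(const std::string &text, std::size_t offset)
{
   static const std::regex pattern(R"(\d\d \w+)");
   std::smatch result;

   if (std::regex_search(text.cbegin() + offset, text.cend(), result, pattern)) {
      return result.str(0);
   }

   return std::string();
}
```

Starting at offset 0 the result is "12 abc". Starting at offset 1 the leading digit is skipped, so the first match is "45 def".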
If a match is found when calling one of the match() methods, a single QRegularExpressionMatch object is returned.
In contrast, for a global match a QRegularExpressionMatch is returned for every match found. All of the QRegularExpressionMatch objects are returned in a QList.
The following example will extract all of the words from a given string in one operation.
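Collecting every match in one pass can be sketched with std::sregex_iterator, which is used here as a stand-in for globalMatch(). The pattern \w+ and the helper name allWords are assumptions.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every word found in the text, in order.
std::vector<std::string> allWords(const std::string &text)
{
   static const std::regex pattern(R"(\w+)");
   std::vector<std::string> words;

   for (auto iter = std::sregex_iterator(text.begin(), text.end(), pattern);
         iter != std::sregex_iterator(); ++iter) {
      words.push_back(iter->str(0));
   }

   return words;
}
```

For the text "the quick fox" the returned container holds three elements, one for each word.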
The QRegularExpressionMatch object contains information about the substrings captured by the capturing groups in the pattern string. QRegularExpressionMatch::captured() will return the string captured by a specific capturing group number.
Capturing groups in the pattern are numbered starting from 1. The capturing group #0 is used to capture the substring which matched the entire pattern.
It is also possible to retrieve the starting and the ending offsets inside the string of each captured substring by using QRegularExpressionMatch::capturedStart() and QRegularExpressionMatch::capturedEnd().
These methods have overloads which take a string as a parameter in order to extract named captured substrings.
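The numbering of capturing groups and their offsets can be sketched with std::smatch, where str(n), position(n), and length(n) are roughly comparable to captured(), capturedStart(), and capturedEnd(). The date pattern and the names Capture and captureOf are assumptions. Named groups are not shown because std::regex does not support them.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type: captured text plus its start and end offsets.
struct Capture {
   std::string text;
   std::ptrdiff_t start;
   std::ptrdiff_t end;
};

// Hypothetical helper: returns the requested capturing group for a date
// written as dd/mm/yyyy, or offsets of -1 when there is no match.
Capture captureOf(const std::string &date, std::size_t group)
{
   static const std::regex pattern(R"(^(\d\d)/(\d\d)/(\d\d\d\d)$)");
   std::smatch result;

   if (! std::regex_search(date, result, pattern) || group >= result.size()) {
      return { std::string(), -1, -1 };
   }

   return { result.str(group), result.position(group),
            result.position(group) + result.length(group) };
}
```

For the input 08/12/1985 group 0 is the entire date, group 1 is 08, group 2 is 12, and group 3 is 1985 which spans offsets 6 to 10.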
A partial match is obtained when the end of the string is reached and more characters are needed to successfully complete the match. A partial match is less efficient than a full match since some optimizations do not apply.
A partial match must be explicitly requested by specifying a match type of PartialPreferCompleteMatch or PartialPreferFirstMatch when calling QRegularExpression::match or QRegularExpression::globalMatch. If a partial match is found then calling QRegularExpressionMatch::hasMatch() on the QRegularExpressionMatch object returned by match() will return false, but QRegularExpressionMatch::hasPartialMatch() will return true.
When a partial match is found no captured substrings are returned, and the (implicit) capturing group 0 corresponding to the whole match captures the partially matched substring of the subject string.
Asking for a partial match can still lead to a complete match if one is found. In this case, QRegularExpressionMatch::hasMatch() will return true and QRegularExpressionMatch::hasPartialMatch() false. It never happens that a QRegularExpressionMatch reports both a partial and a complete match.
Partial matching is mainly useful in two scenarios: (1) validating user input in real time and (2) incremental / multi-segment matching.
Suppose we want the user to input a date in a specific format like the following: "MMM dd, yyyy". We can check the input validity with a pattern as shown below:
This pattern is only intended to validate the month name.
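As another example, say we want to validate input with a regular expression while the user is typing it. This is useful because an error can be reported right away, letting the user know they typed an invalid key. In order to do so we must distinguish between three cases: the input cannot possibly match the regular expression, the input matches the regular expression, or the input does not match the regular expression right now but will if more characters are added.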
These three cases represent exactly the possible states of a QValidator.
In particular, in the last case we want the regular expression engine to report a partial match: we are successfully matching the pattern against the subject string but the matching cannot continue because the end of the subject is encountered. Notice, however, that the matching algorithm should continue and try all possibilities, and in case a complete (non-partial) match is found, then this one should be reported, and the input string accepted as fully valid.
This behavior is implemented by the PartialPreferCompleteMatch match type.
If matching the same regular expression against the subject string leads to a complete match it is reported as usual.
Another example with a different pattern, showing the behavior of preferring a complete match over a partial one.
In this case the subpattern abc\\w+X partially matches the subject string. However, the subpattern def matches the subject string completely and therefore a complete match is reported.
If multiple partial matches are found when matching (but no complete match), then the QRegularExpressionMatch object will report the first one that is found. For instance:
Incremental matching is another use case of partial matching. Suppose that we want to find the occurrences of a regular expression inside a large text (that is, substrings matching the regular expression). In order to do so we would like to "feed" the large text to the regular expression engine in smaller chunks. The obvious problem is what happens if the substring that matches the regular expression spans across two or more chunks.
In this case, the regular expression engine should report a partial match, so that we can match again adding new data and (eventually) get a complete match. This implies that the regular expression engine may assume that there are other characters "beyond the end" of the subject string.
QRegularExpression implements this behavior when using the PartialPreferFirstMatch match type. This match type reports a partial match as soon as it is found, and other match alternatives are not tried (even if they could lead to a complete match). For instance:
This happens because when matching the first branch of the alternation operator a partial match is found, and therefore matching stops, without trying the second branch.
This shows a behavior of quantifiers which can seem counterintuitive. Since ? is greedy, the engine first tries to continue the match after having matched "abc". The matching then reaches the end of the subject string and therefore a partial match is reported. This is even more surprising in the following example.
Remember, the engine expects the subject string to be only a substring of the whole text we are looking for. Since the * quantifier is greedy, reporting a complete match could be an error because after the current subject "abc" there may be other occurrences of "abc". For instance, the complete text could have been "abcabcX", and therefore the right match to report (in the complete text) would have been "abcabc". By matching only against the leading "abc" we instead get a partial match.
It is possible for a QRegularExpression object to be invalid because of syntax errors in the pattern string. The isValid() function will return true if the regular expression is valid, or false otherwise:
You can get more information about the specific error by calling errorString().
If a match is attempted with an invalid QRegularExpression then the returned QRegularExpressionMatch object will be invalid as well. QRegularExpressionMatch::isValid() will return false. The same applies for calling globalMatch().
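For comparison, the C++ standard library reports a malformed pattern by throwing std::regex_error from the std::regex constructor instead of providing an isValid() method. The sketch below wraps that behavior in a hypothetical helper named isValidPattern. It is an analogy and does not use the CopperSpice API.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns true when the pattern compiles. On failure the
// description of the error is stored in message, if one was supplied.
bool isValidPattern(const std::string &pattern, std::string *message = nullptr)
{
   try {
      std::regex compiled(pattern);
      (void) compiled;

   } catch (const std::regex_error &error) {
      if (message != nullptr) {
         *message = error.what();
      }

      return false;
   }

   return true;
}
```

A pattern with an unbalanced parenthesis such as (abc is reported as invalid, while abc compiles without an error.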
When the regular expression pattern matches the entire search string this is referred to as an exact match. If you need an exact match use the QPatternOption::ExactMatchOption when constructing your regular expression.
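The difference between a normal match and an exact match can be sketched with the C++ standard library, where std::regex_search accepts a match anywhere in the string and std::regex_match requires the pattern to cover the entire string. These two functions are used as stand-ins and the helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when the text contains at least one digit run.
bool containsDigits(const std::string &text)
{
   static const std::regex pattern(R"(\d+)");
   return std::regex_search(text, pattern);
}

// Hypothetical helper: true only when the entire text is a digit run.
bool isOnlyDigits(const std::string &text)
{
   static const std::regex pattern(R"(\d+)");
   return std::regex_match(text, pattern);
}
```

The text abc123 contains digits but is not an exact match, while 123 satisfies both checks.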
The following examples demonstrate how to update application code which used QRegExp and now uses QRegularExpression.
QRegularExpression< S >::QRegularExpression | ( | ) |
default |
Constructs a QRegularExpression with an empty search pattern and no pattern options.
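QRegularExpression< S >::QRegularExpression | ( | const S & | pattern, |
QPatternOptionFlags | options = QPatternOption::NoPatternOption |
) |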
explicit |
Constructs a QRegularExpression using the search pattern and the pattern options.
QRegularExpression< S >::QRegularExpression | ( | const QRegularExpression & | other | ) |
default |
Copy constructs a new QRegularExpression from other.
QRegularExpression< S >::QRegularExpression | ( | QRegularExpression && | other | ) |
default |
Move constructs a new QRegularExpression from other.
int QRegularExpression< S >::captureCount | ( | ) | const |
Returns the number of capturing groups inside the pattern string or -1 if the regular expression is not valid. The count does not include capture group #0, which is a special capture group that is always present.
S QRegularExpression< S >::errorString | ( | ) | const |
inline |
Returns a textual description of the error found when checking the validity of the regular expression, or "no error" if no error was found.
S QRegularExpression< S >::escape | ( | const S & | str | ) |
static |
Returns the given str with all special characters prepended with a backslash. If the resulting string is used as a pattern it will match literally without any wildcards.
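The idea behind escaping can be sketched with a small helper which prepends a backslash to the metacharacters of the ECMAScript grammar used by std::regex. This is an assumption made for illustration. The set of characters escaped by QRegularExpression::escape() may differ, and escapeLiteral is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: prepends a backslash to each regular expression
// metacharacter so the text is matched literally.
std::string escapeLiteral(const std::string &text)
{
   static const std::string special = R"(\^$.|?*+()[]{})";
   std::string result;

   for (char ch : text) {
      if (special.find(ch) != std::string::npos) {
         result += '\\';
      }

      result += ch;
   }

   return result;
}
```

After escaping, the text a.b only matches the literal string a.b and no longer matches axb, since the dot is not treated as a wildcard.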
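QList< QRegularExpressionMatch< S > > QRegularExpression< S >::globalMatch | ( | const S & | str | ) | const |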
inline |
Attempts to match the regular expression with the string str.
QList< QRegularExpressionMatch< S > > QRegularExpression< S >::globalMatch | ( | const S & | str, |
typename S::const_iterator | offset, | ||
QMatchType | matchType = QMatchType::NormalMatch , |
||
QMatchOptionFlags | matchOptions = QMatchOption::NoMatchOption |
||
) | const |
Attempts to match the regular expression with the string str, starting at the iterator position offset. Uses a match of type matchType and honors the given matchOptions.
Each element in the QList represents the results of a single match.
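QList< QRegularExpressionMatch< S > > QRegularExpression< S >::globalMatch | ( | QStringView< S > | str | ) | const |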
inline |
Attempts to match the regular expression with the string view str.
QList< QRegularExpressionMatch< S > > QRegularExpression< S >::globalMatch | ( | QStringView< S > | str, |
typename S::const_iterator | offset, | ||
QMatchType | matchType = QMatchType::NormalMatch , |
||
QMatchOptionFlags | matchOptions = QMatchOption::NoMatchOption |
||
) | const |
Attempts to match the regular expression with the string view str, starting at the iterator position offset. Uses a match of type matchType and honors the given matchOptions.
Each element in the QList represents the results of a single match.
bool QRegularExpression< S >::isValid | ( | ) | const |
inline |
Returns true if the regular expression is a valid regular expression, or false otherwise. Use errorString() to obtain a textual description of the error.
QRegularExpressionMatch< S > QRegularExpression< S >::match | ( | const S & | str | ) | const |
inline |
Refer to QRegularExpression::match().
QRegularExpressionMatch< S > QRegularExpression< S >::match | ( | const S & | str, |
typename S::const_iterator | offset, | ||
QMatchType | matchType = QMatchType::NormalMatch , |
||
QMatchOptionFlags | matchOptions = QMatchOption::NoMatchOption |
||
) | const |
Attempts to match the regular expression with the string str, starting at the iterator position offset. Uses a match of type matchType and honoring the matchOptions.
The returned QRegularExpressionMatch object contains the results of the match.
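QRegularExpressionMatch< S > QRegularExpression< S >::match | ( | QStringView< S > | str | ) | const |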
inline |
Refer to QRegularExpression::match().
QRegularExpressionMatch< S > QRegularExpression< S >::match | ( | QStringView< S > | str, |
typename S::const_iterator | offset, | ||
QMatchType | matchType = QMatchType::NormalMatch , |
||
QMatchOptionFlags | matchOptions = QMatchOption::NoMatchOption |
||
) | const |
Attempts to match the regular expression with the string view str, starting at the iterator position offset. Uses a match of type matchType and honoring the matchOptions.
The returned QRegularExpressionMatch object contains the results of the match.
QList< S > QRegularExpression< S >::namedCaptureGroups | ( | ) | const |
Returns a list of elements containing the names of the named capture groups in the pattern string. If the regular expression is not valid an empty list is returned.
Given the following regular expression, namedCaptureGroups() will return the list shown below.
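QRegularExpression & QRegularExpression< S >::operator= | ( | const QRegularExpression & | other | ) |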
default |
Copy assigns from other and returns a reference to this object. Both the pattern and the pattern options are copied.
QRegularExpression & QRegularExpression< S >::operator= | ( | QRegularExpression && | other | ) |
default |
Move assigns from other and returns a reference to this object. Both the pattern and the pattern options are moved.
S QRegularExpression< S >::pattern | ( | ) | const |
inline |
Returns the pattern string of the regular expression.
int QRegularExpression< S >::patternErrorOffset | ( | ) | const |
Returns the position of the code point in the regular expression where an error was encountered.
QPatternOptionFlags QRegularExpression< S >::patternOptions | ( | ) | const |
inline |
Returns the pattern options for this regular expression.
void QRegularExpression< S >::setPattern | ( | const S & | pattern | ) |
Sets the pattern string of the regular expression to pattern. The pattern options are left unchanged.
void QRegularExpression< S >::setPatternOptions | ( | QPatternOptionFlags | options | ) |
inline |
Sets the given options as the pattern options of the regular expression. The pattern string is left unchanged.
void QRegularExpression< S >::swap | ( | QRegularExpression & | other | ) |
inline |
Swaps the regular expression other with this regular expression. This operation is very fast and never fails.
enum QMatchOption |
related |
QMatchOption is a class enum defining where the pattern can match in the string.
Constant | Description |
---|---|
NoMatchOption | No match options are set |
AnchoredMatchOption | A match is only allowed at the beginning of the string as if the pattern started with ^ |
enum QMatchType |
related |
The QMatchType enum defines how the string pattern will be matched with the string.
Constant | Description |
---|---|
NormalMatch | Normal match is done |
PartialPreferCompleteMatch | If a partial match is found other matching alternatives are tried as well. If a complete match is found then it is preferred to the partial match. Only the complete match is used. If instead no complete match is found and there is a partial match, the partial match is used. |
NoMatch | Set when using a default constructed QRegularExpressionMatch, if passed to the match methods it will be ignored |
enum QPatternOption |
related |
The QPatternOption enum defines modifiers to the way the pattern string should be interpreted, and therefore the way the pattern matches against a subject string.
Constant | Description | Notes |
---|---|---|
NoPatternOption | No pattern options are set | |
CaseInsensitiveOption | Pattern is matched with the string case insensitively | Similar to Perl /i modifier |
DotMatchesEverythingOption | Dot metacharacter (.) in the pattern is allowed to match any character in the string including newlines, normally the dot does not match newlines | Similar to Perl /s modifier |
MultilineOption | Caret (^) and dollar sign ($) in the pattern are allowed to match immediately after and immediately before any newline in the string as well as the beginning and the end of the string | Similar to Perl /m modifier |
ExtendedPatternSyntaxOption | Any whitespace in the pattern string which is not escaped and outside a character class is ignored. An unescaped sharp (#) outside a character class causes all the following characters until the first newline (included) to be ignored. | Similar to Perl /x modifier |
ExactMatchOption | Force the regular expression to match the entire search string, otherwise the match will fail | |
DontCaptureOption | Non-named capturing groups do not capture substrings, only named groups and the special group number #0 will capture | |
WildcardOption | Interpret the pattern as a file wild card | |
WildcardUnixOption | Interpret the pattern as a file wild card, question mark and star can be escaped |