I'm trying to construct a regular expression to match an input/test String variable against a list of strings. The logic needs to be as follows:
- If there is an item in the list that is simply a single asterisk ("*"), consider that a match.
- If there is an item in the list whose entire text matches the beginning of the test string, consider that a match.
Test String | List Item | Expected Result |
---|---|---|
ABCD | A | Match |
ABCD | AB | Match |
ABCD | ABC | Match |
ABCD | ABCD | Match |
ABCD | ABCDE | Not a match |
ABCD | AC | Not a match |
ABCD | ABD | Not a match |
ABCD | XA | Not a match |
ABCD | XABC | Not a match |
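A minimal sketch of the two rules above without any regex, assuming Python. The helper name `matches_literal` is hypothetical, and skipping empty list items is my own assumption (an empty string is a prefix of everything):

```python
from typing import Iterable


def matches_literal(test: str, items: Iterable[str]) -> bool:
    """Return True if any item is "*" or is a prefix of the test string."""
    for item in items:
        # Rule 1: a lone asterisk always matches.
        if item == "*":
            return True
        # Rule 2: the whole item must equal the start of the test string.
        # Assumption: empty items are ignored rather than matching everything.
        if item and test.startswith(item):
            return True
    return False
```

For example, `matches_literal("ABCD", ["XA", "ABC"])` is true because of "ABC", while `matches_literal("ABCD", ["ABCDE", "AC"])` is false.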
- If an item in the list contains one or more period characters ("."), each period should be treated as a single-character wildcard (equivalent to "?", or possibly [A-Z]{1}).
Test String | List Item | Expected Result |
---|---|---|
ABCD | A.C | Match |
ABCD | AB.D | Match |
ABCD | A.CD | Match |
ABCD | A..D | Match |
ABCD | A.B | Not a match |
ABCD | A.BC | Not a match |
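One way to read this rule is to turn each list item into its own small pattern, rather than building one pattern from the test string. The sketch below assumes Python's `re` module. The helper names `item_to_regex` and `matches_wildcard` are hypothetical, and it assumes the test string itself contains no periods:

```python
import re
from typing import Iterable


def item_to_regex(item: str):
    """Compile a list item so each "." matches any single character."""
    # Periods stay as regex wildcards; every other character is escaped.
    # "[A-Z]" could be swapped in for "." to restrict the wildcard to letters.
    source = "".join("." if ch == "." else re.escape(ch) for ch in item)
    return re.compile(source)


def matches_wildcard(test: str, items: Iterable[str]) -> bool:
    """Return True if any item is "*" or matches the start of the test string."""
    for item in items:
        if item == "*":
            return True
        # Pattern.match anchors at the start only, which gives prefix semantics.
        # Assumption: empty items are ignored.
        if item and item_to_regex(item).match(test):
            return True
    return False
```

Because each item is compiled independently, the work grows with the number of list items rather than with the length of the test string.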
- Similarly, any period characters in the test string should also be treated as single-character wildcards
Test String | List Item | Expected Result |
---|---|---|
A..D | ABCD | Match |
A..D | AB.D | Match |
A..D | A.CD | Match |
A..D | A..D | Match |
A..D | ABC | Not a match |
A..D | ACBD | Not a match |
A..D | A.BC | Not a match |
(NB: Period characters cannot appear at the very start or very end of either the test string or any of the list items; they are always surrounded by alpha characters.)
So, taking the "ABCD" example, a (poor) regular expression that (I think?) would work is something like:
^((\*)|(A)|(AB)|(ABC)|(A\.C)|(ABCD)|(A\.CD)|(AB\.D)|(A\.\.D))$
The "A..D" example is slightly more straightforward (again, I think?):
^((\*)|(A)|(A.)|(A..)|(A..D))$
However, the test string is dynamic (a string variable), so I would have to construct this pattern with some kind of loop or nested loops over the characters in the test string every time I need to run a pattern match. That is fine if the test string is short like "ABCD", but the pattern complexity grows exponentially as the length of the test string increases.
For example, if "ABCD" is changed to "ABCDE", the equivalent pattern becomes:
^((\*)|(A)|(AB)|(ABC)|(A\.C)|(ABCD)|(A\.CD)|(AB\.D)|(A\.\.D)|(ABCDE)|(A\.CDE)|(AB\.DE)|(ABC\.E)|(A\.\.DE)|(A\.C\.E)|(AB\.\.E)|(A\.\.\.E))$
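To make the growth concrete, here is a rough sketch of the loop-based construction described above, assuming Python and a plain alphabetic test string with no periods. The function name `build_pattern` is hypothetical. For every prefix it emits every allowed placement of literal periods, interior positions only (for example `A\.C`). For "ABCD" that comes to 9 alternatives and for "ABCDE" 17, so the count roughly doubles with each added character:

```python
import re
from itertools import product


def build_pattern(test: str) -> str:
    """Build one alternation listing every list-item form that should match."""
    alternatives = [r"\*"]  # the lone-asterisk rule
    for k in range(1, len(test) + 1):
        prefix = test[:k]
        # Only interior characters (not first, not last) may become periods.
        for mask in product((False, True), repeat=max(k - 2, 0)):
            parts = [re.escape(prefix[0])]
            for ch, dotted in zip(prefix[1:k - 1], mask):
                parts.append(r"\." if dotted else re.escape(ch))
            if k > 1:
                parts.append(re.escape(prefix[-1]))
            alternatives.append("".join(parts))
    return "^(" + "|".join("(" + alt + ")" for alt in alternatives) + ")$"
```

The resulting pattern is meant to be applied to each list item, for example `re.fullmatch(build_pattern("ABCD"), "A.C")`.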
So... I'm just wondering if there's a smarter way of constructing a regex pattern that meets these rules, based on an input/test String variable of arbitrary length?