Sunday, 26 September 2021

Catch text using a regex with specific rules

I'm trying to capture text using a regex that follows these rules:

  • Each line starts with the word type or tag, followed by :. The word (type or tag) should be capture group 1.
  • A varchar may follow the :. That varchar should be capture group 2.
  • \\ comes next.
  • A number follows the \\. That number should be capture group 3.
  • A ? may follow the number.
  • If there is a ?, a varchar may follow it. That varchar should be capture group 4.
  • If there is a ? followed by a varchar, a : may come next.
  • If there is a ? followed by a varchar and a :, another varchar may follow. That varchar should be capture group 5.
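One way to read these rules as a single anchored pattern is sketched below in Python. It is only a sketch under a few assumptions that the rules do not state: a "varchar" is taken to be any run of characters other than a backslash, a colon, or a newline; the number is one or more digits; and nothing else may follow on the line. The names STRICT and parse are made up for illustration.

```python
import re

# Assumption: a "varchar" is any run of characters except backslash, colon, newline.
STRICT = re.compile(
    r"""
    ^(type|tag):                  # group 1: the keyword, then a colon
    ([^\\:\n]+)?                  # group 2: optional varchar
    \\\\                          # two literal backslashes
    ([0-9]+)                      # group 3: required number (use {1,3} to cap its length)
    (?:\?                         # optional question mark; the rest is allowed only after it
        (?:([^\\:\n]+)            # group 4: optional varchar
            (?::([^\\:\n]+)?)?    # optional colon, then optional group 5
        )?
    )?$
    """,
    re.IGNORECASE | re.MULTILINE | re.VERBOSE,
)

def parse(line):
    """Return the five groups as a tuple, or None if the line breaks a rule."""
    m = STRICT.match(line)
    return m.groups() if m else None

print(parse(r"type:test\\1?value12:value9"))
print(parse(r"type:\\22?value62:value3"))
```

Nesting the optional parts means group 5 can only appear after group 4, and group 4 only after the ?. Groups that do not take part in the match come back as None, which corresponds to the NULL in the examples. Under this strict reading, a line with stray characters between the number and the ? is rejected outright rather than partially matched.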

Examples:

type:test\\1?value12:value9        // Should get: Group 1 = type, Group 2 = test, Group 3 = 1, Group 4 = value12, Group 5 = value9

type:\\22?value62:value3        // Should get: Group 1 = type, Group 2 = NULL, Group 3 = 22, Group 4 = value62, Group 5 = value3

My regex is:

/(type|tag):([^\\]+)?\\\\([0-9]{1,3})?\??([^\:]+):([^\:]+)?/i

I don't think it is accurate; it could be written better and more strictly. For example, given this input:

type:\\1p?hello:iii

The current regex captures p?hello as group 4, but group 4 should be hello.
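The behaviour can be reproduced with a short Python snippet. This is a sketch for illustration only: the pattern is the regex from the question copied into a raw string, and the helper name current_groups is made up.

```python
import re

# The regex from the question, written as a Python raw string.
CURRENT = re.compile(
    r"(type|tag):([^\\]+)?\\\\([0-9]{1,3})?\??([^\:]+):([^\:]+)?",
    re.IGNORECASE,
)

def current_groups(line):
    """Return the captured groups for the first match, or None if nothing matches."""
    m = CURRENT.search(line)
    return m.groups() if m else None

print(current_groups(r"type:\\1p?hello:iii"))
# prints ('type', None, '1', 'p?hello', 'iii')
```

The cause is visible in the pattern: \?? allows the question mark to be absent, and the class [^\:]+ accepts every character except a colon, so it absorbs the p and the ? along with hello.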

Can anyone help, please? Thanks!
