Monday, May 25, 2020

How can I imitate escaping characters in Lua?

I'm making a parser and I want the input string to support escape characters. For example, if parse("Hello [world]") yields Hello world, then parse("Hello /[world]") should yield Hello [world]. I have an implementation that works, but it's limiting.

local function escapeStr(str)
    local escapedStr = str:gsub("/([%[%]])", "\1") -- for parsing: "/[" or "/]" becomes the byte \1
    local regStr = str:gsub("/([%[%]])", "%1") -- for displaying: "/[" or "/]" becomes the bare bracket

    return escapedStr, regStr
end

This function creates two versions of the input string. The first (escapedStr) replaces each escaped character, such as /[, with the placeholder character \1 (the byte with value 1, which is a control character rather than an empty one). This is the version the parser iterates over with gmatch; it ignores escaped special characters because they have been replaced with \1. During the iteration, I use regStr:sub(start, end) to extract any substring that will be displayed to the user, since regStr is what the escaped string should look like when displayed, and regStr and escapedStr always have the same length.
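To make the two strings concrete, here is a small self-contained check. It assumes the pattern is meant to be "/([%[%]])" (a slash followed by either bracket), and the input string is made up for illustration.

local function escapeStr(str)
    -- Assumed pattern: "/" followed by "[" or "]".
    local escapedStr = str:gsub("/([%[%]])", "\1") -- placeholder byte, used for parsing
    local regStr = str:gsub("/([%[%]])", "%1")     -- bare bracket, used for display
    return escapedStr, regStr
end

-- Hypothetical input: one escaped pair and one real pair of brackets.
local escapedStr, regStr = escapeStr("Hello /[world/] [there]")

print(regStr)                    --> Hello [world] [there]
print(#escapedStr == #regStr)    --> true
print(escapedStr:find("%[.-%]")) --> 15      21

Only the unescaped [there] is found in escapedStr, and its positions (15 to 21) are valid indices into regStr as well.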

This solution is limiting because, in order to call regStr:sub(start, end), I need to keep track of the position in the string as I iterate over it, which is not ideal in more complex situations. It doesn't seem too bad here, for example:

local str = "hello [world], wonderful day today"
local escapedStr, regStr = escapeStr(str)

for begin, stop in escapedStr:gmatch("()%[.-%]()") do
    print(regStr:sub(begin, stop - 1)) --> [world]
end

...but this is only because I'm matching nothing other than the text between the square brackets. If I wanted to match more patterns inside the substring, I'd have to add more position captures to my initial string pattern, which would quickly become messy and lengthy.

-- Assumes escapedStr and regStr were built from a string containing "[world:earth]".
for begin0, begin1, stop1, stop0 in escapedStr:gmatch("()%[%a+:()%a+()%]()") do
    local entire_match = regStr:sub(begin0, stop0 - 1)
    local second_match = regStr:sub(begin1, stop1 - 1)

    print(entire_match) --> [world:earth]
    print(second_match) --> earth
end

In my case there is a lot of matching within the substrings the parser initially selects. I would like to use something like "%[(.-)%]" to return the data I need directly, rather than "()%[.-%]()" paired with regStr:sub(start, end) to accomplish the same thing.
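A minimal sketch of what that could look like, under some assumptions: each escaped bracket gets its own placeholder byte (\1 for [ and \2 for ]), the input never contains those bytes itself, and the helper names encode and decode are hypothetical. Captures are taken from the encoded string and decoded afterwards, so no position tracking is needed.

-- Hypothetical lookup tables: escaped bracket <-> placeholder byte.
local PLACEHOLDERS = { ["["] = "\1", ["]"] = "\2" }
local ORIGINALS = { ["\1"] = "[", ["\2"] = "]" }

-- Replace "/[" and "/]" with placeholder bytes the parser will not treat as brackets.
local function encode(str)
    return (str:gsub("/([%[%]])", PLACEHOLDERS))
end

-- Turn placeholder bytes back into literal brackets for display.
local function decode(str)
    return (str:gsub("[\1\2]", ORIGINALS))
end

local input = "say [hello /[world/]] now"
for inner in encode(input):gmatch("%[(.-)%]") do
    print(decode(inner)) --> hello [world]
end

Because [ and ] map to different placeholders, decode can restore the right bracket in any captured substring, which a single shared \1 placeholder could not do.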

I feel like I'm using a very unconventional way of implementing escape characters, so if anyone has a better solution, I would really appreciate it!
