Thursday, December 17, 2020

Regex patterns in R: en dash, em dash, parentheses

I have two regex questions:

Part 1: I have a character vector with strings like this:

raw_strings <- c("hello world (abc)", "no hi world (abc(d))")

I want to extract the content inside the first (outermost) pair of parentheses, like this:

clean_strings <- c("abc", "abc(d)")

So far, I have been using this:

str_extract(raw_strings, "(?<=\\().+?(?=\\))")

However, that results in this:

"abc" "abc(d"

How can I change the expression so that it keeps the closing parenthesis of the nested group?
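One possible answer, sketched in base R rather than stringr: the pattern below matches a balanced group using PCRE recursion (`(?1)` re-enters capture group 1), which `regexpr(..., perl = TRUE)` supports. As far as I know, stringr's ICU regex engine does not support recursion, so `str_extract()` cannot run this pattern. The helper name `extract_first_group()` is hypothetical, and the sketch assumes the parentheses in each string are balanced.

```r
# Hypothetical helper: returns the text inside the first balanced (...)
# group of each string, or NA when a string has no parentheses.
extract_first_group <- function(x) {
  # \( opens the group; capture group 1 holds its contents, which are either
  # runs of non-parenthesis characters ([^()]++) or a nested "(...)" that is
  # matched by recursing into group 1 via (?1).
  balanced <- "\\(((?:[^()]++|\\((?1)\\))*)\\)"
  m <- regexpr(balanced, x, perl = TRUE)       # PCRE is required for (?1)
  out <- rep(NA_character_, length(x))
  hit <- m != -1                               # elements that matched
  full <- regmatches(x, m)                     # e.g. "(abc)", "(abc(d))"
  out[hit] <- substr(full, 2, nchar(full) - 1) # strip the outer parentheses
  out
}

raw_strings <- c("hello world (abc)", "no hi world (abc(d))")
extract_first_group(raw_strings)
# returns c("abc", "abc(d)")
```

Simply making the original quantifier greedy (`.+` instead of `.+?`) would also return "abc(d)" for these two strings, but only because each one contains a single group: on "a (b) c (d)" the greedy version returns "b) c (d", while the recursive pattern returns "b".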

Part 2: I have some strings that look like this:

b_strings <- c("5.2 ko – word (longer word)", "5.9 ko - two words (long)")

I would like to have this:

b_strings_clean <- c("word", "two words")

So far I have done this:

str_extract(b_strings, "\\s[^-–]*$")

Which results in:

" word (longer word)" " two words (long)"

How can I remove everything from the opening parenthesis onward (including the parenthesis itself)?
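A base-R sketch for Part 2, assuming the wanted label always sits between the first dash and the first opening parenthesis: strip everything up to the dash, then strip everything from the `(` onward, then trim spaces. The helper name `clean_label()` is hypothetical; the dash class also accepts an em dash (U+2014) in case some strings use one, and the dashes are written as `\u2013`/`\u2014` escapes so the source file's encoding does not matter.

```r
# Hypothetical helper: keeps the text between the first dash and the first
# opening parenthesis, with surrounding spaces trimmed.
clean_label <- function(x) {
  # Remove everything up to and including the first hyphen, en dash (U+2013)
  # or em dash (U+2014), plus any spaces right after it.
  x <- sub("^.*?[-\u2013\u2014]\\s*", "", x, perl = TRUE)
  # Remove the first "(" and everything after it, plus the spaces before it.
  x <- sub("\\s*\\(.*$", "", x, perl = TRUE)
  trimws(x)
}

b_strings <- c("5.2 ko \u2013 word (longer word)", "5.9 ko - two words (long)")
clean_label(b_strings)
# returns c("word", "two words")
```

The same two patterns should also work with `stringr::str_remove()`, since they use only lazy quantifiers, `\s` and a character class; strings without a parenthesis are left unchanged by the second step.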

Thank you.
