Monday, 15 January 2018

R text-mining including patterns and numbers

My dataset contains a free-text field describing building plans. I need to split the content of that field into two parts: the first holding only the number of planned buildings, the second only the type of building. I have a reference lexicon listing the building types.

Example

Plans <- c("build 10 houses ", "5 luxury apartments with sea view", "renovate 20 cottages", " transform 2 bungalows and a school", "1 hotel")

Reference list

Types <- c("houses", "cottages", "bungalows", "luxury apartments")

Desired output: two columns, Number and Type, with this content:

Number  Type
10      houses
5       apartments
20      cottages
2       bungalows

Tried

matches <- unique(grep(paste(Types, collapse = "|"), Plans, value = TRUE))

I can match the plans against the types, but I can't extract the numbers and types into two columns. I tried str_split_fixed and grepl with the [:digit:] and [:alpha:] character classes, but it isn't working.

Many thanks for help!
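
For reference, here is a minimal base R sketch of one possible approach, not necessarily the best one. It assumes the number always comes directly before the type, separated only by whitespace, and that the lexicon entries contain no regex metacharacters. The names pattern, hits and result are made up for illustration.

```r
# Example data copied from the question
Plans <- c("build 10 houses ", "5 luxury apartments with sea view",
           "renovate 20 cottages", " transform 2 bungalows and a school",
           "1 hotel")
Types <- c("houses", "cottages", "bungalows", "luxury apartments")

# Pattern: a run of digits, whitespace, then one of the reference types.
# Assumption: the entries in Types need no regex escaping.
pattern <- paste0("([[:digit:]]+)[[:space:]]+(",
                  paste(Types, collapse = "|"), ")")

# regexec() locates the full match and each capture group;
# regmatches() extracts them as one character vector per plan
hits <- regmatches(Plans, regexec(pattern, Plans))

# Drop plans without a match (here "1 hotel", which is not in the lexicon)
hits <- hits[lengths(hits) > 0]

# Element 1 is the full match, 2 is the number, 3 is the type
result <- data.frame(
  Number = as.integer(vapply(hits, `[`, character(1), 2)),
  Type   = vapply(hits, `[`, character(1), 3),
  stringsAsFactors = FALSE
)

print(result)
```

Note that Type holds the lexicon entry exactly as written, so the second row reads "luxury apartments" rather than "apartments". To get the shorter label, the lexicon entry itself would have to be "apartments", with any qualifier handled separately.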
