Hi all.
I need to clean up a dataset. Specifically, I need to clean up the EVTYPE variable using a list of patterns. Does anyone know a way to do that?
I have a dataset:
> str(truncData)
'data.frame': 902297 obs. of 8 variables:
$ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
$ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
$ FATALITIES : num 0 0 0 0 0 0 0 0 1 0 ...
$ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
$ PROPDMG : num 25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
$ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
$ sum.inj.fat: num 15 0 2 2 2 6 1 0 15 0 ...
$ sum.dmg : num 25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
The EVTYPE field has messy data: the same event appears under different values because of inconsistent uppercase, lowercase, and spacing introduced when the data was entered into the system. I have a standard list of event types, which I used to create a list of patterns:
  EVTYPE                DESIGNATOR pattern
1 Astronomical Low Tide Z          ^Astronomical Low Tide
2 Avalanche             Z          ^Avalanche
3 Blizzard              Z          ^Blizzard
4 Coastal Flood         Z          ^Coastal Flood
5 Cold/Wind Chill       Z          ^Cold/Wind Chill
6 Debris Flow           C          ^Debris Flow
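Since the variation described above is mostly case and stray spaces, a normalization pass before any pattern matching may already merge many duplicates. The following is only a minimal sketch: the vector `ev` is a made-up sample standing in for `truncData$EVTYPE`, and the real column may need more rules than these.

```r
# Hypothetical sample of messy values (the real EVTYPE factor has 985 levels).
ev <- factor(c(" HIGH SURF ADVISORY", "Avalanche", "AVALANCHE ",
               "blizzard", "Coastal  Flood"))

# Work on character data: editing factor levels in place is easy to get wrong.
clean <- as.character(ev)

# Remove leading and trailing whitespace.
clean <- trimws(clean)

# Collapse runs of whitespace inside the value to a single space.
clean <- gsub("\\s+", " ", clean)

# Use one case throughout so "Avalanche" and "AVALANCHE" compare equal.
clean <- toupper(clean)

print(unique(clean))
```

Trimming first also matters for the `^`-anchored patterns in the table: a value with a leading space would otherwise never match.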
I've tried to apply the list of patterns to the dataset to identify and replace the incorrect values. It could be a really hard exercise, but doing it manually is the last thing I want.
I also tried to write a function, but the result is not what I expected:
find.pattern <- function(dataset, pattern, var.replace) {
  var1 <- subset(dataset, grepl(pattern, dataset[2], ignore.case = TRUE))
  var1 <- gsub(pattern, var.replace, var1)
  return(var1)
}
x <- mapply(find.pattern, truncData, types.list$pattern, types.list$EVTYPE)
The result is:
> str(x)
List of 48
$ STATE__ : chr(0)
$ EVTYPE : chr(0)
$ FATALITIES : chr(0)
$ INJURIES : chr(0)
$ PROPDMG : chr(0)
$ CROPDMG : chr(0)
$ sum.inj.fat: chr(0)
$ sum.dmg : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
$ NA : chr(0)
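One likely reason for this output is that `mapply` treats a data frame as a list of columns, so each call to `find.pattern` receives a single column (STATE__, EVTYPE, ...) rather than the whole dataset, and the 8 columns are recycled against the 48 patterns. A possible alternative, sketched below under assumptions, is to loop over the patterns and test each one against the whole EVTYPE column. The small `truncData` and `types.list` objects here are hypothetical stand-ins that only mimic the structure shown in the post.

```r
# Hypothetical stand-ins for the objects in the post.
truncData <- data.frame(
  EVTYPE = factor(c("AVALANCHE", "avalanche ", "Blizzard/Heavy Snow",
                    " coastal flood", "Unknown Event")),
  FATALITIES = c(1, 0, 0, 2, 0)
)
types.list <- data.frame(
  EVTYPE  = c("Avalanche", "Blizzard", "Coastal Flood"),
  pattern = c("^Avalanche", "^Blizzard", "^Coastal Flood"),
  stringsAsFactors = FALSE
)

# Trimmed character copy, so the "^" anchors are not defeated by leading spaces.
raw <- trimws(as.character(truncData$EVTYPE))
cleaned <- raw

# One pattern at a time, each matched against the entire column.
for (i in seq_len(nrow(types.list))) {
  hit <- grepl(types.list$pattern[i], raw, ignore.case = TRUE)
  # Rows that match get the standard name; all other rows are left as they are.
  cleaned[hit] <- as.character(types.list$EVTYPE[i])
}

truncData$EVTYPE <- factor(cleaned)
print(table(truncData$EVTYPE))
```

Values that match no pattern (here "Unknown Event") are left unchanged, so they can be inspected afterwards and handled with extra patterns. If a value matches more than one pattern, the last matching pattern wins in this sketch.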