I have a stream of events contains different type of activities, in spark streaming application need to do different persistence and aggregation processing based on different activity type, is there a good design pattern I could use in the spark streaming application to avoid simply filter/select by activity type ? currently I am doing something like below:
inputDF.persist()
val userActivityDF = inputDF.filter(col("activityType") isin("clicked", "chatted", "opened"))
userActivityDF.writeStream....
val userClosedDF = inputDF.filter(col("activityType") = "closed")
processUserClosed(userClosedDF)
val userSkippedDF = inputDF.filter(col("activityType") = "skipped")
processUserSkipped(userSkippedDF)
.....
inputDF.unpersist()
Aucun commentaire:
Enregistrer un commentaire