Friday, May 28, 2021

Spark Streaming Design Pattern

I have a stream of events containing different types of activities. In my Spark Streaming application I need to apply different persistence and aggregation processing depending on the activity type. Is there a good design pattern I could use in the application to avoid simply filtering/selecting by activity type? Currently I am doing something like below:

inputDF.persist()

val userActivityDF = inputDF.filter(col("activityType").isin("clicked", "chatted", "opened"))
userActivityDF.writeStream....

val userClosedDF = inputDF.filter(col("activityType") === "closed")
processUserClosed(userClosedDF)

val userSkippedDF = inputDF.filter(col("activityType") === "skipped")
processUserSkipped(userSkippedDF)

.....

inputDF.unpersist()
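One alternative worth considering is collapsing the per-type streams into a single `writeStream` with `foreachBatch`, and routing each micro-batch through a map of activity type to handler. Below is a minimal sketch under some assumptions: the handler functions (`processUserClosed`, `processUserSkipped`) are assumed to accept a plain `DataFrame`, and the output path for the user-activity sink is a placeholder.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical routing table: one handler per activity type.
// Adding a new type means adding an entry here, not a new stream.
val handlers: Map[String, DataFrame => Unit] = Map(
  "closed"  -> processUserClosed,
  "skipped" -> processUserSkipped
)

val query = inputDF.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    batch.persist() // cache once per micro-batch, shared by all branches

    // Bulk sink for the user-activity types (placeholder path).
    batch.filter(col("activityType").isin("clicked", "chatted", "opened"))
      .write.format("parquet").mode("append").save("/path/to/userActivity")

    // Dispatch the remaining types through the handler map.
    handlers.foreach { case (activityType, handle) =>
      handle(batch.filter(col("activityType") === activityType))
    }

    batch.unpersist()
  }
  .start()
```

Inside `foreachBatch` each micro-batch is a static `DataFrame`, so batch writers and the persist/unpersist pair work as expected, and the single stream keeps all sinks in the same checkpoint lifecycle.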
