Wednesday, August 24, 2016

FPGrowth with series containing non-unique items

I just applied Spark MLlib's FPGrowth to my data series, which look like this:

x y a z r x a s g
z f a e g z

etc. etc.
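FPGrowth treats each row as a transaction, i.e. a set of items, so any row in which an event occurs more than once is rejected. The following minimal sketch is plain Python, not the Spark API, and the helper name `duplicated_events` is hypothetical. It flags which of the example sessions would violate that constraint:

```python
from collections import Counter

# The two example sessions from above, one event per whitespace-separated token.
SESSIONS = [
    "x y a z r x a s g",
    "z f a e g z",
]

def duplicated_events(session):
    """Return the events that occur more than once in a session, in first-seen order.

    Such a session cannot be fed to FPGrowth as a transaction, because a
    transaction may contain each item at most once.
    """
    counts = Counter(session.split())
    return [event for event, n in counts.items() if n > 1]

for s in SESSIONS:
    print(s, "->", duplicated_events(s))
```

Deduplicating each row (for example with `list(dict.fromkeys(session.split()))`) would make the data acceptable to FPGrowth, but it discards both the order of the events and how often they repeat, which is the information needed to predict the next event.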

When I ran it, I got an exception complaining that the items were not unique, and realized that the same letter is not allowed to appear twice in one row. I am working with user behavior event data: each line represents a session, and each letter is an event, i.e. a click on a page element. Because the same event can recur over the course of a session, I cannot use FPGrowth on this data.

My goal is to train a model that, given a window of 3-4 events, predicts which event will come next, based on the average behavior across all of these sessions. Is there an alternative that does something similar but allows duplicates?
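One way to frame "given the last 3-4 events, predict the next one" without any uniqueness constraint is a sliding-window frequency count (an n-gram, or Markov-chain, model). This is a swapped-in technique, not FPGrowth and not an MLlib API; the function names `train_next_event_model` and `predict_next` are hypothetical. It is a minimal sketch in plain Python, assuming each session is an ordered list of events:

```python
from collections import Counter, defaultdict

def train_next_event_model(sessions, window=3):
    """Count which event follows each run of `window` consecutive events.

    Repeated events are fine here: we slide over the ordered sequence
    instead of treating a session as a set of items.
    """
    model = defaultdict(Counter)
    for events in sessions:
        for i in range(len(events) - window):
            context = tuple(events[i:i + window])
            model[context][events[i + window]] += 1
    return model

def predict_next(model, recent_events, window=3):
    """Return the most frequent follower of the last `window` events.

    `window` must match the value used for training. Returns None when
    this context never appeared in the training sessions.
    """
    context = tuple(recent_events[-window:])
    followers = model.get(context)  # .get does not insert an empty Counter
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Usage with the two example sessions from above.
sessions = [s.split() for s in ["x y a z r x a s g", "z f a e g z"]]
model = train_next_event_model(sessions, window=3)
print(predict_next(model, ["r", "x", "a"]))  # → s
```

For a 4-event window, train and predict with `window=4`. Because the counts are summed over all sessions, the prediction reflects the most common continuation across users; for contexts never seen in training, one might fall back to a shorter window.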

Thank you
