Thursday, January 10, 2019

How to use PrefixSpan frequent itemsets in a production environment

I used the PrefixSpan algorithm to mine the behavior data of app users for frequent itemsets, which I want to use to distinguish between different users.

I have some problems:

How should I clean the time-ordered behavioral data of a single user? Is it necessary to split one user's behavior into separate paths by time interval? I tried intervals of half an hour and of two hours.
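To make the splitting concrete, here is a minimal sketch of what I mean, assuming each user's data is a list of `(timestamp, action)` pairs. The function name `split_sessions`, the `max_gap` parameter, and the sample timestamps are made up for illustration; only the action names come from my data.

```python
from datetime import datetime, timedelta

def split_sessions(events, max_gap=timedelta(minutes=30)):
    """Split one user's (timestamp, action) events into behavior paths.

    A new path starts whenever the gap between two consecutive
    events is larger than max_gap.
    """
    sessions = []
    last_ts = None
    for ts, action in sorted(events, key=lambda e: e[0]):
        if last_ts is None or ts - last_ts > max_gap:
            sessions.append([])  # gap too large: open a new path
        sessions[-1].append(action)
        last_ts = ts
    return sessions

# Hypothetical events for one user (timestamps are invented).
events = [
    (datetime(2019, 1, 10, 9, 0), "click.xfd_smrz_fmz"),
    (datetime(2019, 1, 10, 9, 5), "click.xfd_smrz"),
    (datetime(2019, 1, 10, 9, 10), "click.xfd_yhkrz"),
    (datetime(2019, 1, 10, 13, 0), "click.jq_qjq_jkxq"),
    (datetime(2019, 1, 10, 13, 2), "input.jq_qjq_zfmm"),
]
sessions = split_sessions(events)
```

With the half-hour gap this yields two paths, and each path would then be one input sequence for PrefixSpan. Changing `max_gap` to two hours is the other variant I tried.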

There are many frequent itemsets. How should I use them: pick the important ones, or use them all? Here is an example of the resulting frequent itemsets:

    FreqSequence(sequence=[['click.jq_qjq_jkxq']], freq=2463)
    FreqSequence(sequence=[['click.jq_qjq_jkxq', 'input.jq_qjq_zfmm']], freq=2440)
    FreqSequence(sequence=[['click.xfd_smrz']], freq=2455)
    FreqSequence(sequence=[['click.xfd_smrz', 'click.xfd_yhkrz']], freq=2434)
    FreqSequence(sequence=[['click.xfd_smrz', 'click.xfd_yhkrz', 'click.xfd_yhkrz_yzm']], freq=2370)
    FreqSequence(sequence=[['click.xfd_smrz', 'click.xfd_yhkrz', 'input.xfd_yhkrz_yzm']], freq=2381)
    FreqSequence(sequence=[['click.xfd_smrz', 'click.xfd_yhkrz', 'input.xfd_yhkrz_yzm', 'click.xfd_yhkrz_yzm']], freq=2328)
    FreqSequence(sequence=[['click.xfd_smrz', 'click.xfd_yhkrz_yzm']], freq=2379)
    FreqSequence(sequence=[['click.xfd_smrz', 'input.xfd_yhkrz_yzm']], freq=2391)
    FreqSequence(sequence=[['click.xfd_smrz', 'input.xfd_yhkrz_yzm', 'click.xfd_yhkrz_yzm']], freq=2337)
    FreqSequence(sequence=[['click.xfd_smrz_fmz']], freq=2472)
    FreqSequence(sequence=[['click.xfd_smrz_fmz', 'click.xfd_smrz']], freq=2450)
    FreqSequence(sequence=[['click.xfd_smrz_fmz', 'click.xfd_smrz', 'click.xfd_yhkrz']], freq=2432)
    FreqSequence(sequence=[['click.xfd_smrz_fmz', 'click.xfd_smrz', 'click.xfd_yhkrz', 'click.xfd_yhkrz_yzm']], freq=2367)
    FreqSequence(sequence=[['click.xfd_smrz_fmz', 'click.xfd_smrz', 'click.xfd_yhkrz', 'input.xfd_yhkrz_yzm']], freq=2378)
    FreqSequence(sequence=[['click.xfd_smrz_fmz', 'click.xfd_smrz', 'click.xfd_yhkrz', 'input.xfd_yhkrz_yzm', 'click.xfd_yhkrz_yzm']], freq=2325)

I have about 10,000 frequent itemsets like these. With so many of them, should I select the important ones or use them all?
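One selection idea I have considered, shown here only as a rough sketch: many itemsets are subsets of a longer itemset with almost the same frequency, so the shorter one adds little. The code below drops an itemset when some strict superset keeps at least `min_ratio` of its frequency. The function name `drop_redundant` and the 0.95 threshold are my own assumptions, and each pattern is treated as an unordered set because every sequence in my output contains a single itemset. The frequencies are copied from the first five results.

```python
def drop_redundant(patterns, min_ratio=0.95):
    """Keep a pattern only if no strict superset retains
    at least min_ratio of its frequency.

    patterns: list of (frozenset_of_items, freq) pairs.
    Note: this compares every pair, so it is O(n^2).
    """
    kept = []
    for items, freq in patterns:
        redundant = any(
            items < other and other_freq >= min_ratio * freq
            for other, other_freq in patterns
        )
        if not redundant:
            kept.append((items, freq))
    return kept

patterns = [
    (frozenset({"click.jq_qjq_jkxq"}), 2463),
    (frozenset({"click.jq_qjq_jkxq", "input.jq_qjq_zfmm"}), 2440),
    (frozenset({"click.xfd_smrz"}), 2455),
    (frozenset({"click.xfd_smrz", "click.xfd_yhkrz"}), 2434),
    (frozenset({"click.xfd_smrz", "click.xfd_yhkrz",
                "click.xfd_yhkrz_yzm"}), 2370),
]
selected = drop_redundant(patterns)
```

On these five patterns only the two longest survive. I do not know whether this kind of filter is a sound way to define "important", which is part of what I am asking.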

How do you apply frequent itemsets in a production environment? My idea is to use them as rules for differentiating between user groups, but that brings back the earlier problem: there are too many frequent itemsets, so how do I pick the important ones?
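To illustrate the rule idea, here is a minimal sketch, assuming each selected itemset becomes one yes/no rule: does the user's behavior contain every action in the itemset? The function name `pattern_features` and the two sample users are hypothetical; the action names are from my data.

```python
def pattern_features(user_actions, patterns):
    """Return one 0/1 flag per pattern: 1 if the user performed
    every action in that pattern, else 0."""
    actions = set(user_actions)
    return [1 if pattern <= actions else 0 for pattern in patterns]

# Two itemsets used as rules.
patterns = [
    frozenset({"click.jq_qjq_jkxq", "input.jq_qjq_zfmm"}),
    frozenset({"click.xfd_smrz", "click.xfd_yhkrz", "click.xfd_yhkrz_yzm"}),
]

# Hypothetical users and their recorded actions.
users = {
    "user_a": ["click.jq_qjq_jkxq", "input.jq_qjq_zfmm"],
    "user_b": ["click.xfd_smrz_fmz", "click.xfd_smrz",
               "click.xfd_yhkrz", "click.xfd_yhkrz_yzm"],
}
features = {uid: pattern_features(acts, patterns)
            for uid, acts in users.items()}
```

Each user ends up with a vector of flags that could be used directly as group rules or passed to a clustering or classification step. With 10,000 itemsets the vector becomes very wide, which is why the selection question matters.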

Thanks for any help!
