Wednesday, 18 September 2019

Identify patterns within a list of words with a pattern threshold

I'm working on a pattern recognition function in Python that is supposed to return an array of patterns, each with a count.

Let's imagine a list of strings:

m = ['ABA','ABB', 'ABC','BCA','BCB','BCC','ABBC', 'ABBA', 'ABBC']

At a high level, what I would like to get back is:

Pattern | Count
----------------
   AB   |   6
  ABB   |   3
   BC   |   2
----------------

The problem: all I know is that patterns are at least 2 characters long and are the leading characters of each string value (e.g. XXZZZ, XXXZZZ, where XX is the pattern I'm looking for). I would like to pass the minimal pattern length as an input to the function, to optimize the run time.
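To make the requirement concrete, here is a minimal sketch of one way to read it: count every leading substring whose length is at least the threshold. The names `count_prefixes` and `min_len` are only illustrative, and treating a whole word as a prefix of itself is an assumption.

```python
from collections import Counter

def count_prefixes(words, min_len=2):
    """Count every leading substring of length >= min_len across all words."""
    counts = Counter()
    for word in words:
        # Prefix lengths run from the threshold up to the full word length
        # (assumption: a word counts as a prefix of itself).
        for end in range(min_len, len(word) + 1):
            counts[word[:end]] += 1
    return counts

m = ['ABA', 'ABB', 'ABC', 'BCA', 'BCB', 'BCC', 'ABBC', 'ABBA', 'ABBC']
counts = count_prefixes(m, min_len=2)

# Keep only the prefixes that occur more than once.
shared = {prefix: n for prefix, n in counts.items() if n > 1}
```

With this reading, `shared` holds AB: 6, ABB: 4, BC: 3 and ABBC: 2 (the last one because 'ABBC' appears twice in the list). The ABB and BC figures differ from the table above, so the intended counting rule may be stricter than plain prefix matching.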

PS. Each item in the list is already a single word.

My problem is that I need to iterate over each letter starting from the threshold, and I'm getting stuck there. I'd prefer to use startswith('AB').
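A rough sketch of what the `startswith` version might look like, assuming the candidate patterns are the proper prefixes (shorter than the word itself) found in the list. The function name `count_with_startswith` and the `min_len` parameter are hypothetical.

```python
def count_with_startswith(words, min_len=2):
    """Count, for each candidate prefix, how many words start with it."""
    # Candidates: every proper prefix of every word, from min_len characters
    # up to one character short of the full word (an assumption).
    candidates = {
        word[:end]
        for word in words
        for end in range(min_len, len(word))
    }
    # str.startswith does the matching; True counts as 1 when summed.
    return {
        prefix: sum(word.startswith(prefix) for word in words)
        for prefix in sorted(candidates)
    }

words = ['ABA', 'ABB', 'ABC', 'BCA', 'BCB', 'BCC', 'ABBC', 'ABBA', 'ABBC']
result = count_with_startswith(words, min_len=2)
```

On this list `result` contains exactly three patterns: AB, ABB and BC. The loop compares every candidate against every word, so it is quadratic in the worst case; raising `min_len` shrinks the candidate set.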
