Basically, the issue I am having is with separating indirect and direct object pronouns from verbs.
I.e., 'aprenderlo' should ideally be tokenized as two separate tokens ('aprender' + 'lo'), and 'dimelo' as three ('di' + 'me' + 'lo'). I have tried a variety of taggers in both libraries, and so far none has produced the results I want. However, I am sure this must be a common problem. Any ideas?
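To make the desired output concrete, here is a minimal rule-based sketch of the splitting I have in mind, in plain Python and independent of either library. The names `split_clitics`, `CLITICS` and `IMPERATIVES` are made up for illustration, and the tiny imperative list is an assumption standing in for a real morphological lexicon.

```python
# Enclitic pronouns that can attach to infinitives, gerunds and imperatives.
CLITICS = ("me", "te", "se", "nos", "os", "lo", "la", "le", "los", "las", "les")

# Hypothetical mini-lexicon of imperative forms (assumption for this sketch);
# a real system would need a proper morphological lexicon or analyser.
IMPERATIVES = {"di", "da", "pon", "haz", "ten", "ven", "trae"}

# Attaching clitics often adds a written accent (di -> dímelo), so acute
# accents are removed from the verb part. This is wrong for verbs whose
# accent is lexical (e.g. reír), which the sketch does not handle.
ACCENTS = str.maketrans("áéíóú", "aeiou")


def _is_verb_host(stem):
    """Crude check that `stem` looks like a form that can carry enclitics."""
    plain = stem.translate(ACCENTS)
    return (
        plain in IMPERATIVES
        or plain.endswith(("ar", "er", "ir"))          # infinitive
        or plain.endswith(("ando", "iendo", "yendo"))  # gerund
    )


def split_clitics(word):
    """Return [verb, clitic, ...] if `word` looks like verb + 1-2 enclitics,
    otherwise return [word] unchanged."""
    lower = word.lower()
    candidates = []
    for last in CLITICS:
        if not lower.endswith(last):
            continue
        rest = lower[: -len(last)]
        candidates.append((rest, [last]))
        # Try to peel off a second clitic in front of the last one.
        for first in CLITICS:
            if rest.endswith(first):
                candidates.append((rest[: -len(first)], [first, last]))
    for stem, clitics in candidates:
        if stem and _is_verb_host(stem):
            return [stem.translate(ACCENTS)] + clitics
    return [word]


print(split_clitics("aprenderlo"))
print(split_clitics("dimelo"))
```

This is only a heuristic: the suffix check over-splits ordinary words such as 'parte' (it would yield 'par' + 'te'), so in practice it would have to be restricted to tokens a tagger has already marked as verbs, or backed by a real lexicon.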