Saturday, February 23, 2019

Pattern module issues (NLP learning)

I've been learning NLP text classification from the book "Text Analytics with Python". It requires several modules to be installed in a virtual environment, and I use an Anaconda env. I created a blank env with Python 3.7 and installed the required pandas, numpy, nltk, gensim, sklearn and so on. Then I had to install Pattern. The first problem is that I can't install Pattern via conda because of a conflict between Pattern and mkl_random:

(nlp) D:\Python\Text_classification>conda install -c mickc pattern
Solving environment: failed

UnsatisfiableError: The following specifications were found to be in conflict:
  - mkl_random
  - pattern
Use "conda info <package>" to see the dependencies for each package.

It's impossible to remove mkl_random because related packages (gensim, numpy, scikit-learn, etc.) depend on it. I don't know what to do: I didn't find any conda build of Pattern that is compatible with my env. So I installed Pattern using pip, and the installation was successful. Is it okay to have packages from conda and from pip in the same env?
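To see which packages in an env were installed by pip, one option is to read each package's INSTALLER metadata file, which pip fills with the word "pip". This is a minimal sketch, assuming Python 3.8+ for importlib.metadata; on the Python 3.7 env described here, the importlib_metadata backport from PyPI offers the same functions. installer_of is a hypothetical helper name, and packages installed by other tools may record a different value or none at all, so treat the result as a hint rather than a definitive answer.

```python
# Requires Python 3.8+ for importlib.metadata; on Python 3.7 the
# importlib_metadata backport from PyPI exposes the same functions.
from importlib import metadata


def installer_of(dist_name):
    """Hypothetical helper: return the tool recorded in a distribution's
    INSTALLER metadata file (pip writes "pip"), or None if the distribution
    is not installed or has no such file."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None when the metadata file does not exist.
    text = dist.read_text("INSTALLER")
    return text.strip() if text else None


if __name__ == "__main__":
    # Distribution names as published on PyPI; adjust to the packages in your env.
    for name in ["numpy", "gensim", "scikit-learn", "Pattern"]:
        print(f"{name}: {installer_of(name)}")
```

Running it inside the nlp env would show at a glance which of the listed packages report pip as their installer.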

The second problem, I think, is connected with the first one. I downloaded the book's example code from https://github.com/dipanjanS/text-analytics-with-python/tree/master/Old-First-Edition/source_code/Ch04_Text_Classification, added parentheses to the Python 2.x 'print' statements, and ran classification.py. The program raised an exception:

Traceback (most recent call last):
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 609, in _read
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "classification.py", line 50, in <module>
    norm_train_corpus = normalize_corpus(train_corpus)
  File "D:\Python\Text_classification\normalization.py", line 96, in normalize_corpus
    text = lemmatize_text(text)
  File "D:\Python\Text_classification\normalization.py", line 67, in lemmatize_text
    pos_tagged_text = pos_tag_text(text)
  File "D:\Python\Text_classification\normalization.py", line 58, in pos_tag_text
    tagged_text = tag(text)
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\en\__init__.py", line 188, in tag
    for sentence in parse(s, tokenize, True, False, False, False, encoding, **kwargs).split():
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\en\__init__.py", line 169, in parse
    return parser.parse(s, *args, **kwargs)
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 1172, in parse
    s[i] = self.find_tags(s[i], **kwargs)
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\en\__init__.py", line 114, in find_tags
    return _Parser.find_tags(self, tokens, **kwargs)
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 1113, in find_tags
    lexicon = kwargs.get("lexicon", self.lexicon or {}),
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 376, in __len__
    return self._lazy("__len__")
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 368, in _lazy
    self.load()
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 625, in load
    dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if len(x.split(" ")) > 1))
  File "C:\Users\PC\Anaconda3\envs\nlp\lib\site-packages\pattern\text\__init__.py", line 625, in <genexpr>
    dict.update(self, (x.split(" ")[:2] for x in _read(self._path) if len(x.split(" ")) > 1))
RuntimeError: generator raised StopIteration

I don't understand what is happening. Is the exception raised because I installed Pattern with pip, or is the problem wrong or deprecated code in the book? And is it possible to install Pattern with conda alongside all the other necessary packages?
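The last line of the traceback, "RuntimeError: generator raised StopIteration", matches a documented Python change: under PEP 479, which is the default behavior from Python 3.7 on, a "raise StopIteration" inside a generator is converted into a RuntimeError instead of quietly ending the generator. The traceback shows Pattern's _read generator doing exactly that at line 609, and this env runs Python 3.7, so the error may come from how this Pattern version ends its generator rather than from the pip install itself. The sketch below reproduces the effect with a simplified reader; read_lexicon_old, read_lexicon_fixed and load_lexicon are hypothetical names, not Pattern's actual source. The only difference between the two readers is how the generator finishes.

```python
# Simplified, hypothetical stand-ins for a lexicon reader like Pattern's _read
# (not Pattern's actual code), used to reproduce the PEP 479 behavior.

def read_lexicon_old(lines):
    for line in lines:
        line = line.strip()
        if line:
            yield line
    raise StopIteration  # Python 3.7+: converted into RuntimeError (PEP 479)


def read_lexicon_fixed(lines):
    for line in lines:
        line = line.strip()
        if line:
            yield line
    return  # a plain return ends a generator safely on every Python 3 version


def load_lexicon(reader, lines):
    # Mirrors the dict.update(...) generator expression from the traceback:
    # keep the first two space-separated fields (word, tag) of each line.
    lexicon = {}
    lexicon.update(x.split(" ")[:2] for x in reader(lines) if len(x.split(" ")) > 1)
    return lexicon


sample = ["Confidence NN", "is VBZ", "", "high JJ"]

try:
    load_lexicon(read_lexicon_old, sample)
except RuntimeError as exc:
    print("old reader:", exc)  # old reader: generator raised StopIteration

print("fixed reader:", load_lexicon(read_lexicon_fixed, sample))
# fixed reader: {'Confidence': 'NN', 'is': 'VBZ', 'high': 'JJ'}
```

On Python 3.6 and earlier the old reader would simply stop, which is why code written before 3.7 could work unchanged there and fail only after upgrading.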

Thank you in advance!
