Thursday, 29 June 2017

Java regex not working in extracted text

I'm trying to use a regex to match part of a text. First I extract the text using Tika (it's a fairly large PDF, about 3 MB of text), then I run the matcher on that text, but it doesn't work.

My text follows this pattern:

Word: Match match match match

My regex is written this way:

private final Pattern pattern = Pattern.compile("(word:\\s)(.*$)", Pattern.CASE_INSENSITIVE);

When I run this on a small sample, like the example above, it works as expected: group(2) contains "Match match match match".

But with the large text it doesn't work. Afterwards I saved the extracted text to a .txt file and tried a Ctrl+F search with the same regex, and it matches there, but not in my code.

What could it be?
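
One possible cause, offered as an assumption rather than a confirmed diagnosis: without Pattern.MULTILINE, `$` only matches at the very end of the input (or just before a final line terminator), and `.` does not match line terminators, so `.*$` can only succeed when the "Word:" line is the last line of the text. The sketch below compares the pattern from the question with the same pattern plus Pattern.MULTILINE. The class name `WordLineDemo`, the helper `firstMatch`, and the sample strings are made up for illustration, and the real Tika output may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordLineDemo {
    // Same expression as in the question; only the flags differ between the two patterns.
    static final String REGEX = "(word:\\s)(.*$)";

    // Flags as in the question: '$' matches only at the end of the whole input.
    static final Pattern SINGLE =
            Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);

    // With MULTILINE, '$' also matches just before each line terminator.
    static final Pattern MULTI =
            Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Returns group(2) of the first match, or null when nothing matches.
    static String firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: a one-line sample and a multi-line stand-in for the extracted text.
        String small = "Word: Match match match match";
        String large = "Some heading\nWord: Match match match match\nMore text after it\n";

        System.out.println(firstMatch(SINGLE, small)); // Match match match match
        System.out.println(firstMatch(SINGLE, large)); // null
        System.out.println(firstMatch(MULTI, large));  // Match match match match
    }
}
```

If this is indeed the cause, it would also explain why the Ctrl+F search works: many editors apply regex searches line by line, so `$` there behaves like the MULTILINE variant.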
