Monday, 27 July 2015

Android: Parsing an .ics file using Pattern/regex not working

I am confused: the following works in Eclipse, so I assumed it would work on Android too. Apparently not.

In an .ics file, each line generally follows one of these two formats:

  • FIELD;miscellaneousstring:VALUE
  • FIELD:VALUE

I'm interested in obtaining all the available information: FIELD, VALUE, and miscellaneousstring (if present). The parts of each line are separated by a colon, or by a semicolon and a colon (although in rare cases a line also ends with another colon).
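For reference, both formats could in principle be covered by one pattern with an optional group for the part between the semicolon and the colon. This is only a sketch under assumptions: the class name `IcsLineSketch`, the method `parse`, and the character class used for field names (letters, digits, hyphens) are not part of my code, and parameter values that contain a quoted colon are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IcsLineSketch {
    // FIELD, then an optional ";miscellaneousstring", then ":" and the value.
    // Assumption: field names consist of letters, digits and hyphens only.
    private static final Pattern CONTENT_LINE =
            Pattern.compile("^([A-Za-z0-9-]+)(?:;([^:]*))?:(.*)$");

    /** Returns {field, intermediate (null if absent), value}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = CONTENT_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("DTSTART;VALUE=DATE:20040227");
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```

Because the value group is `(.*)`, any further colons or semicolons after the first separating colon stay inside VALUE, which is what a line such as `RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=4` needs.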

Here is my code:

//Pattern to determine if string follows the format:
// (a) FIELD;miscellaneousstring:VALUE
// (b) FIELD:VALUE
private static final String withSemiColon = "^([A-Z]+);(.*):([^:]+):?";
private static final String withColon = "^([A-Z]+ ):([^:]+):?";

Pattern stringWithSemiColon = Pattern.compile(withSemiColon);
Pattern stringWithColon = Pattern.compile(withColon);

String line;
//line is each line of the ics file

Matcher matchWithSemiColon = stringWithSemiColon.matcher(line);
Matcher matchWithColon = stringWithColon.matcher(line);

if (matchWithSemiColon.matches()) {
    field = matchWithSemiColon.group(1);
    intermediate = matchWithSemiColon.group(2);
    value = matchWithSemiColon.group(3);
}
//Otherwise, split string as per normal
else if (matchWithColon.matches()) {
    field = matchWithSemiColon.group(1);
    value = matchWithSemiColon.group(2);
    //Clear value of intermediate
    intermediate = null;
} else {
    Log.i("UNMATCHED", line);
    continue;
}

No matter what I do, every single line of the file seems to be logged as UNMATCHED.
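To narrow this down, the two patterns can be tried in isolation against a few lines. The sketch below takes both pattern strings exactly as posted (including the space inside the first group of `withColon`, assuming it is really in the source and not a copy-paste slip); the class name `PatternCheck` and its helper methods are hypothetical. It also checks what `group(1)` does on a matcher whose `matches()` call returned false, since the `else if` branch reads its groups from `matchWithSemiColon`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternCheck {
    // The two patterns exactly as posted in the question.
    static final Pattern WITH_SEMICOLON = Pattern.compile("^([A-Z]+);(.*):([^:]+):?");
    static final Pattern WITH_COLON = Pattern.compile("^([A-Z]+ ):([^:]+):?");

    /** True if the whole line matches at least one of the two patterns. */
    static boolean matchesEither(String line) {
        return WITH_SEMICOLON.matcher(line).matches()
                || WITH_COLON.matcher(line).matches();
    }

    /** True if reading group(1) from the semicolon matcher throws for this line. */
    static boolean groupThrowsWithoutMatch(String line) {
        Matcher m = WITH_SEMICOLON.matcher(line);
        m.matches(); // result ignored on purpose, mirroring the else-if branch
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesEither("BEGIN:VCALENDAR"));             // false: no space before ':'
        System.out.println(matchesEither("BEGIN :VCALENDAR"));            // true: the space is required
        System.out.println(matchesEither("X-WR-CALNAME:PayDay"));         // false: '-' is not in [A-Z]
        System.out.println(matchesEither("DTSTART;VALUE=DATE:20040227")); // true
        System.out.println(groupThrowsWithoutMatch("BEGIN:VCALENDAR"));   // true
    }
}
```

If these results hold, three things stand out: `withColon` only matches when the field name is followed by a space, `[A-Z]+` rejects hyphenated names such as `X-WR-CALNAME` and `LAST-MODIFIED`, and the `else if` branch would throw an `IllegalStateException` because it reads groups from the matcher that did not match.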

For context, here is an extract of an ics file:

BEGIN:VCALENDAR
CALSCALE:GREGORIAN
PRODID:-//Cyrusoft International\, Inc.//Mulberry v4.0//EN
VERSION:2.0
X-WR-CALNAME:PayDay
BEGIN:VTIMEZONE
LAST-MODIFIED:20040110T032845Z
TZID:US/Eastern
BEGIN:DAYLIGHT
DTSTART:20000404T020000
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=4
TZNAME:EDT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20001026T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:EST
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20050211T173501Z
DTSTART;VALUE=DATE:20040227
RRULE:FREQ=MONTHLY;BYDAY=-1MO,-1TU,-1WE,-1TH,-1FR;BYSETPOS=-1
SUMMARY:PAY DAY
UID:DC3D0301C7790B38631F1FBB@ninevah.local
END:VEVENT
END:VCALENDAR

I've applied what I read in Mastering Regular Expressions by Jeffrey E. F. Friedl. I've tested the above in Eclipse and it works, so I'm not sure where I went wrong.
