input example:
<pdf>
<page 1><addressbox><value>address 1</value></addressbox></page>
<page 2><addressbox><value>address 2</value></addressbox></page>
<page 3><addressbox><value>address 2</value></addressbox></page>
<page 4><addressbox><value>address 2</value></addressbox></page>
<page 5><addressbox><value>address 3</value></addressbox></page>
<page 6><addressbox><value></value></addressbox></page>
<page 7><addressbox><value>address 3</value></addressbox></page>
<page 8><addressbox><value>address 4</value></addressbox></page>
<page 9><addressbox><value>i am not an address</value></addressbox></page>
<page 10>
<addressbox>
<value>address 6</value>
<collect>true</collect>
</addressbox>
</page>
<page 11><addressbox><value>address 7</value></addressbox></page>
<page 12><addressbox><value>address 2</value></addressbox></page>
</pdf>
goal: group pages into logical units
definition: a value is a valid address if it starts with "address"; otherwise it is not.
rules:
- first page starts a logical unit
- a following page belongs to the previous logical unit if it has the same value, has an empty value, or collect=true
the result should look like this:
<unit>
<unit 1>page1</unit>
<unit 2>page2, page3, page4</unit>
<unit 3>page5, page6, page7</unit>
<unit 4>page8</unit>
<unit 5>page9</unit>
<unit 6>page10, page11</unit>
<unit 7>page12</unit>
</unit>
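To make the rules concrete, here is a minimal Python sketch of the grouping, with each rule written as a small separate function so that rules can be added or swapped. All names in it (`Page`, `same_value`, `empty_value`, `previous_page_collects`, `group_pages`) are hypothetical, and the sample data is modeled on the input example. Note one assumption: collect=true is read here as "the page directly after a page with collect=true joins its unit", which is only a guess chosen because it reproduces the expected units.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    number: int
    value: str
    collect: bool = False


Unit = List[Page]
# A rule answers: does `page` belong to the current (last) unit?
Rule = Callable[[Unit, Page], bool]


def same_value(unit: Unit, page: Page) -> bool:
    # Compare against the page that started the unit.
    return page.value == unit[0].value


def empty_value(unit: Unit, page: Page) -> bool:
    return page.value == ""


def previous_page_collects(unit: Unit, page: Page) -> bool:
    # ASSUMPTION: collect=true on a page pulls in only the next page.
    return unit[-1].collect


RULES: List[Rule] = [same_value, empty_value, previous_page_collects]


def group_pages(pages: List[Page], rules: List[Rule]) -> List[Unit]:
    units: List[Unit] = []
    for page in pages:
        # The first page always starts a unit; later pages join the
        # previous unit if any rule accepts them.
        if units and any(rule(units[-1], page) for rule in rules):
            units[-1].append(page)
        else:
            units.append([page])
    return units


PAGES = [
    Page(1, "address 1"), Page(2, "address 2"), Page(3, "address 2"),
    Page(4, "address 2"), Page(5, "address 3"), Page(6, ""),
    Page(7, "address 3"), Page(8, "address 4"),
    Page(9, "i am not an address"),
    Page(10, "address 6", collect=True),
    Page(11, "address 7"), Page(12, "address 2"),
]

result = [[p.number for p in unit] for unit in group_pages(PAGES, RULES)]
print(result)
```

With this layout, changing a rule means editing or replacing one function in `RULES`, while the loop in `group_pages` stays the same.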
I would like to solve this with a design pattern, since the rules might change slightly. I have looked at the Visitor, Decorator, Composite, and Chain of Responsibility patterns, but have not really found a fit.
Which pattern would you recommend?