Thursday, December 3, 2020

XSD validation issue with pattern when containing carriage return (&#13;)

I have to check the validity of URLs in a large XML file.

I have to use this pattern to check if the URLs are valid or not:

<xs:simpleType name="url">
    <xs:restriction base="xs:string">
        <xs:pattern value="https?://([^/:]+\.[a-zA-Z]{2,10}|([0-9]{1,3}\.){3}[0-9]{1,3})(:[0-9]+)?((/|\?).*)?"/>
    </xs:restriction>
</xs:simpleType>

Here's the Java code that validates the XML file against the XSD file (I use the javax.xml.* packages):

Validator validator = xmlSchema.newValidator();
        
SAXSource sourceXML = new SAXSource(
        new NamespaceFilter(XMLReaderFactory.createXMLReader()),
        new InputSource(new FileInputStream(new File(pathXmlFile)))
    );
        
validator.validate(sourceXML, null);

(Note: NamespaceFilter is just a class extending XMLFilterImpl so that the namespace of the XML file is not checked.)

Unfortunately, some URLs in the XML file contain the character reference "&#13;", which represents a carriage return: http://xxx.yyy.zz/exampleofurl&#13;containinganannoyingcarriagereturn

When I run my code on an XML file containing this kind of URL, I get this error:

org.xml.sax.SAXParseException; lineNumber: 238719; columnNumber: 129; cvc-pattern-valid: Value 'http://xxx.yyy.zz/exampleofurl
containinganannoyingcarriagereturn' is not facet-valid with respect to pattern 'https?://([^/:]+\.[a-zA-Z]{2,10}|([0-9]{1,3}\.){3}[0-9]{1,3})(:[0-9]+)?((/|\?).*)?' for type 'url'.

It seems that the XML parser expands "&#13;" into an actual carriage return before the value reaches the validator, and that this carriage return is what makes the pattern fail.
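To make the behaviour easier to reproduce without the large file, here is a minimal sketch that embeds the same pattern in a tiny in-memory schema and validates two one-element documents. The class name CarriageReturnPatternDemo, the helper isValid and the element name link are hypothetical and exist only for this illustration; the NamespaceFilter is left out. As far as I understand the XML Schema regular expression rules, "." matches any character except line feed and carriage return, so the ".*" at the end of the pattern would stop at the expanded "&#13;", which would explain the error.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class CarriageReturnPatternDemo {

    // The pattern from the question, wrapped in a minimal schema.
    // The <link> element is an assumption made for this demo only.
    private static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:simpleType name=\"url\">"
      + "<xs:restriction base=\"xs:string\">"
      + "<xs:pattern value=\"https?://([^/:]+\\.[a-zA-Z]{2,10}|([0-9]{1,3}\\.){3}[0-9]{1,3})(:[0-9]+)?((/|\\?).*)?\"/>"
      + "</xs:restriction>"
      + "</xs:simpleType>"
      + "<xs:element name=\"link\" type=\"url\"/>"
      + "</xs:schema>";

    // Returns true when <link>elementText</link> validates against the schema.
    static boolean isValid(String elementText) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            String xml = "<link>" + elementText + "</link>";
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException validationError) {
                // With no ErrorHandler set, a validation error is thrown as an exception.
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Demo setup failed", e);
        }
    }

    public static void main(String[] args) {
        // URL without any line break.
        System.out.println(isValid("http://xxx.yyy.zz/exampleofurl"));
        // Same URL with the character reference; the parser turns it into a real CR.
        System.out.println(isValid("http://xxx.yyy.zz/exampleofurl&#13;containinganannoyingcarriagereturn"));
    }
}
```

If the second call reports the value as invalid while the first one passes, the failure comes from the carriage return alone and not from the rest of the URL.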

Is there a way to force the validator not to interpret this character reference?

From my point of view, the URL is valid and matches the pattern.
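One direction that might be worth exploring, if the pattern itself may be changed, is to replace the trailing ".*" with "[\s\S]*", which also matches line breaks. The sketch below is only an assumption-laden experiment and not a confirmed fix: the class LenientUrlPatternDemo, the helper isValid and the element name link are hypothetical, and whether URLs containing a carriage return should be accepted at all is a separate decision.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class LenientUrlPatternDemo {

    // Pattern from the question (Java-escaped).
    static final String ORIGINAL =
        "https?://([^/:]+\\.[a-zA-Z]{2,10}|([0-9]{1,3}\\.){3}[0-9]{1,3})(:[0-9]+)?((/|\\?).*)?";

    // Hypothetical variant: the trailing ".*" becomes "[\s\S]*" so CR and LF also match.
    static final String LENIENT =
        "https?://([^/:]+\\.[a-zA-Z]{2,10}|([0-9]{1,3}\\.){3}[0-9]{1,3})(:[0-9]+)?((/|\\?)[\\s\\S]*)?";

    // Validates <link>elementText</link> against a schema built around the given pattern.
    static boolean isValid(String pattern, String elementText) {
        String xsd =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:simpleType name=\"url\"><xs:restriction base=\"xs:string\">"
          + "<xs:pattern value=\"" + pattern + "\"/>"
          + "</xs:restriction></xs:simpleType>"
          + "<xs:element name=\"link\" type=\"url\"/>"
          + "</xs:schema>";
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
            String xml = "<link>" + elementText + "</link>";
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException validationError) {
                return false; // the value did not satisfy the pattern
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("Demo setup failed", e);
        }
    }

    public static void main(String[] args) {
        String url = "http://xxx.yyy.zz/exampleofurl&#13;containinganannoyingcarriagereturn";
        System.out.println("original pattern: " + isValid(ORIGINAL, url));
        System.out.println("lenient pattern:  " + isValid(LENIENT, url));
    }
}
```

Comparing the two results on the same input shows whether the trailing ".*" is the only part of the pattern that objects to the carriage return.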
