I have to check the validity of URLs in a large XML file.
I have to use this pattern to check if the URLs are valid or not:
<xs:simpleType name="url">
  <xs:restriction base="xs:string">
    <xs:pattern value="https?://([^/:]+\.[a-zA-Z]{2,10}|([0-9]{1,3}\.){3}[0-9]{1,3})(:[0-9]+)?((/|\?).*)?"/>
  </xs:restriction>
</xs:simpleType>
Here's the Java code that validates the XML file against the XSD file (I use the javax.xml.* packages):
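// xmlSchema is a javax.xml.validation.Schema compiled from the XSD file
Validator validator = xmlSchema.newValidator();
// Read the XML through the filter so its namespace is not checked
SAXSource sourceXML = new SAXSource(
    new NamespaceFilter(XMLReaderFactory.createXMLReader()),
    new InputSource(new FileInputStream(new File(pathXmlFile)))
);
// A null Result means: validate only, produce no output
validator.validate(sourceXML, null);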
(Note: NamespaceFilter is just a class extending XMLFilterImpl so that the namespace of the XML file is not checked.)
Unfortunately, the XML file contains some URLs with a character reference for a carriage return (such as "&#13;"): http://xxx.yyy.zz/exampleofurl&#13;containinganannoyingcarriagereturn
When I run my code on an XML file containing this kind of URL, I get this error:
org.xml.sax.SAXParseException; lineNumber: 238719; columnNumber: 129; cvc-pattern-valid: Value 'http://xxx.yyy.zz/exampleofurl
containinganannoyingcarriagereturn' is not facet-valid with respect to pattern 'https?://([^/:]+\.[a-zA-Z]{2,10}|([0-9]{1,3}\.){3}[0-9]{1,3})(:[0-9]+)?((/|\?).*)?' for type 'url'.
It seems the XSD validator interprets this character reference as a carriage return, which causes the pattern check to fail.
Is there a way to force the validator not to interpret this character code?
From my point of view, the URL is valid and matches the pattern.
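For reference, here is a minimal sketch that should reproduce the behaviour in isolation, without the large file or the NamespaceFilter. The class name UrlPatternDemo, the element name link and the in-memory schema wrapper are assumptions made for illustration only; the simple type and its pattern are the ones shown above. One thing worth keeping in mind when reading the result: in XSD regular expressions, "." matches any character except line feed and carriage return, so the trailing ".*" cannot span a carriage return once the XML parser has resolved the character reference.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class UrlPatternDemo {

    // Hypothetical wrapper schema: a single <link> element of the "url" type from the question.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='link' type='url'/>"
        + "<xs:simpleType name='url'>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:pattern value='https?://([^/:]+\\.[a-zA-Z]{2,10}|([0-9]{1,3}\\.){3}[0-9]{1,3})(:[0-9]+)?((/|\\?).*)?'/>"
        + "</xs:restriction>"
        + "</xs:simpleType>"
        + "</xs:schema>";

    // Returns true if the XML document validates against XSD, false if the validator rejects it.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("The demo schema itself failed to compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Validation errors (such as cvc-pattern-valid) are reported as SAXException
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String clean = "<link>http://xxx.yyy.zz/exampleofurl</link>";
        // &#13; is resolved by the XML parser to a real carriage return before validation
        String withCr = "<link>http://xxx.yyy.zz/exampleofurl&#13;containinganannoyingcarriagereturn</link>";
        System.out.println("clean URL valid:   " + isValid(clean));
        System.out.println("URL with CR valid: " + isValid(withCr));
    }
}
```

With the JDK's built-in validator, the first document is expected to be accepted and the second rejected, which matches the cvc-pattern-valid error above.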