Robert Newman Consulting

Technology Architect, Senior Software Engineer, Server & Client-side Web Development

Using PyXB in Django to validate XML docs from XSD schemas

Posted by Rob Newman on April 3, 2013

In one of my day-job projects I built a Django interface to contain a series of web services. One of the model components is the service WADL, which is a machine-readable XML document that describes the RESTful service. Users cut and paste the XML document into a models.TextField.
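For readers who have not seen a WADL before, here is a minimal sketch of what one might look like and how its contents can be read with the standard library. The sample document, the example.com URL, and the list_methods helper are all made up for illustration; only the namespace comes from the real WADL schema.

```python
import xml.etree.ElementTree as ET

# WADL namespace (the same one pyxbgen reports for wadl.xsd).
WADL_NS = "http://wadl.dev.java.net/2009/02"

# Hypothetical service description, for illustration only.
SAMPLE_WADL = """<application xmlns="http://wadl.dev.java.net/2009/02">
  <resources base="http://example.com/api/">
    <resource path="stations">
      <method name="GET" id="listStations"/>
    </resource>
  </resources>
</application>
"""


def list_methods(wadl_text):
    """Return (path, HTTP method) pairs for each resource in a WADL string."""
    root = ET.fromstring(wadl_text)
    pairs = []
    for resource in root.iter("{%s}resource" % WADL_NS):
        for method in resource.findall("{%s}method" % WADL_NS):
            pairs.append((resource.get("path"), method.get("name")))
    return pairs
```

Note that this only reads the document; it does not check it against the schema, which is the part PyXB handles.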

We wanted to validate the pasted WADL XML TextField before it is saved in the database, so I hunted around for a pure Python-based XML validator. The best one seemed to be PyXB (pronounced "pixbee"), which is not a validator per se but performs XML validation as part of its core functionality.

PyXB is a pure Python package that generates Python source code for classes that correspond to data structures defined by an XML Schema. In a nutshell, you pass the installed PyXB command-line tools an XSD schema file (in our WADL case, this was wadl.xsd), and PyXB converts it to a Python file containing Python classes and methods. You can then import this new file into your Python (read: Django) apps in the standard way (import myconvertedxsd) and validate on the fly via Django’s ModelAdmin forms. PyXB even has a whole host of validation exceptions that you can catch in your code to give meaningful error messages to your users via the Django admin interface. Sweet! Details:

1. Install PyXB

In your virtual environment, install PyXB:

$ pip install pyxb
Downloading/unpacking pyxb
  Downloading PyXB-1.2.1.tar.gz (8.3Mb): 8.3Mb downloaded
  Running setup.py egg_info for package pyxb
    Found bundle in /path/to/build/pyxb/pyxb/bundles/common
    Found bundle in /path/to/build/pyxb/pyxb/bundles/dc
    Found bundle in /path/to/build/pyxb/pyxb/bundles/saml20
    Found bundle in /path/to/build/pyxb/pyxb/bundles/wssplat
    warning: no files found matching 'MANIFEST'
    no previously-included directories found matching 'pyxb/bundles/core/schemas'
    no previously-included directories found matching 'pyxb/bundles/core/remote'
    no previously-included directories found matching 'pyxb/bundles/wssplat/schemas'
    no previously-included directories found matching 'pyxb/bundles/wssplat/remote'
    no previously-included directories found matching 'pyxb/bundles/opengis/schemas'
    no previously-included directories found matching 'pyxb/bundles/opengis/remote'
    no previously-included directories found matching 'doc/_build'
    warning: no previously-included files found matching 'doc/*.eap'
    no previously-included directories found matching 'doc/W3C'
    warning: no previously-included files matching '*~' found anywhere in distribution
Installing collected packages: pyxb
  Running setup.py install for pyxb
    Found bundle in /path/to/build/pyxb/pyxb/bundles/common
    Found bundle in /path/to/build/pyxb/pyxb/bundles/dc
    Found bundle in /path/to/build/pyxb/pyxb/bundles/saml20
    Found bundle in /path/to/build/pyxb/pyxb/bundles/wssplat
    changing mode of build/scripts-2.7/pyxbgen from 644 to 755
    changing mode of build/scripts-2.7/pyxbwsdl from 644 to 755
    changing mode of build/scripts-2.7/pyxbdump from 644 to 755
    warning: no files found matching 'MANIFEST'
    no previously-included directories found matching 'pyxb/bundles/core/schemas'
    no previously-included directories found matching 'pyxb/bundles/core/remote'
    no previously-included directories found matching 'pyxb/bundles/wssplat/schemas'
    no previously-included directories found matching 'pyxb/bundles/wssplat/remote'
    no previously-included directories found matching 'pyxb/bundles/opengis/schemas'
    no previously-included directories found matching 'pyxb/bundles/opengis/remote'
    no previously-included directories found matching 'doc/_build'
    warning: no previously-included files found matching 'doc/*.eap'
    no previously-included directories found matching 'doc/W3C'
    warning: no previously-included files matching '*~' found anywhere in distribution
    changing mode of /path/to/bin/pyxbdump to 755
    changing mode of /path/to/bin/pyxbgen to 755
    changing mode of /path/to/bin/pyxbwsdl to 755
Successfully installed pyxb
Cleaning up...

2. Grab your XSD and convert to Python

Using your freshly installed pyxbgen command-line utility, you can grab any online XSD file and write it out to your local machine as a Python file:

$ pyxbgen -u http://www.w3.org/Submission/wadl/wadl.xsd -m wadl
WARNING:pyxb.binding.generate:Attribute None.value renamed to value_
Python for http://wadl.dev.java.net/2009/02 requires 1 modules
$ ls -la wadl*
-rw-r--r--  1 rnewman  staff  115686 Apr  3 11:50 wadl.py

If you take a peek at wadl.py you will see it has a series of Python classes and methods for validation. Cool!
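If you would rather not scroll through 100+ KB of generated code, a quick way to peek is to list the classes it defines. This is just a sketch using the standard library ast module; the class_names helper is my own name, and it assumes the generated file parses under the Python version you run it with.

```python
import ast


def class_names(source_text):
    """Return the names of top-level classes defined in Python source text."""
    tree = ast.parse(source_text)
    return [node.name for node in tree.body if isinstance(node, ast.ClassDef)]


# Usage against the generated bindings (assumes wadl.py is in the current directory):
# with open("wadl.py") as fh:
#     print(class_names(fh.read()))
```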

From the PyXB docs:

The -u parameter identifies a schema document describing contents of a namespace. The parameter may be a path to a file on the local system, or a URL to a network-accessible location like http://www.weather.gov/forecasts/xml/DWMLgen/schema/DWML.xsd. The -m parameter specifies the name to be used by the Python module holding the bindings generated for the namespace in the preceding schema…
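If you want to regenerate the bindings from a script (say, whenever the schema changes), the same two flags can be passed through subprocess. A minimal sketch, assuming pyxbgen is on your PATH; the function names are hypothetical and only the -u and -m flags described above are used.

```python
import subprocess


def pyxbgen_command(schema_location, module_name):
    """Build the pyxbgen argument list for one schema and one output module."""
    return ["pyxbgen", "-u", schema_location, "-m", module_name]


def generate_bindings(schema_location, module_name):
    """Run pyxbgen; raises CalledProcessError if it exits with an error."""
    subprocess.check_call(pyxbgen_command(schema_location, module_name))


# Equivalent to the command line shown earlier:
# generate_bindings("http://www.w3.org/Submission/wadl/wadl.xsd", "wadl")
```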

3. Integrate into Django

We now need to make sure that any XML that is inserted into our database gets validated using our shiny new WADL Python schema validator. We do this by creating some custom validation in our Django app’s admin.py file. This is nicely documented (as usual) in the Django docs.

First we have to create a custom clean method, which takes the standard name clean_<fieldname>, where our model fieldname is wadl_raw. Then we use PyXB’s CreateFromDocument function in our Python version of the WADL XSD. Catch exceptions in the usual way and use Django’s built-in forms.ValidationError to return a meaningful message back to the admin interface:

from django.contrib import admin
from django import forms
# Models
from webservicedoc.models import Webservicedoc
# Utilities
import pyxb # For catching the exceptions
import wadl # The Python-version WADL XSD validator
"""
Custom Validation For WADLs
"""
class MyWebservicedocAdminForm(forms.ModelForm):
    class Meta:
        model = Webservicedoc
    def clean_wadl_raw(self):
        # Custom WADL validation
        wadl_raw = self.cleaned_data['wadl_raw']
        try:
            this_wadl = wadl.CreateFromDocument(wadl_raw)
        except pyxb.UnrecognizedContentError as e:
            raise forms.ValidationError("Error validating response: %s" % e.details())
        except Exception as e:
            raise forms.ValidationError("Unknown validation error: %s" % e)
        return wadl_raw
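The try/except logic in the clean method can also be pulled out into a small helper so it can be exercised without Django. The sketch below is not part of the original admin.py: validation_message and well_formed are hypothetical names, and the parser is passed in as an argument so that wadl.CreateFromDocument could be swapped in for the stand-in used here.

```python
import xml.etree.ElementTree as ET


def validation_message(raw_text, parse):
    """Return None if parse(raw_text) succeeds, else a readable error string.

    `parse` is any callable that raises on invalid input; in the admin form
    above it would be wadl.CreateFromDocument (an assumption of this sketch).
    """
    try:
        parse(raw_text)
    except Exception as exc:  # broad on purpose: every failure becomes a message
        return "Validation error (%s): %s" % (type(exc).__name__, exc)
    return None


def well_formed(raw_text):
    """Stand-in parser: checks XML well-formedness only, not the WADL schema."""
    return ET.fromstring(raw_text)
```

Inside clean_wadl_raw you would then raise forms.ValidationError with the returned string whenever it is not None.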

We then call this as our form in the main Admin section:

"""
Customize Admin
"""
class WebservicedocAdmin(admin.ModelAdmin):
    form = MyWebservicedocAdminForm
    save_as = True
    # lots more unrelated custom admin such as field definitions

Then we register this with the core Django admin app:

admin.site.register(Webservicedoc, WebservicedocAdmin)

And that’s it! When someone tries to submit raw XML that does not pass our wadl.xsd schema validation, a standard inline Django admin error is returned. No 500 error. Just pure Django goodness.

You could argue for using Django’s signals (pre_save, etc.), but this is overkill IMHO.
