XML Schema Essentials

XML schema definition languages are based on the recommendations of the World Wide Web Consortium (W3C). They use XML 1.0 syntax and aim to describe the structure of XML documents explicitly and to constrain the data they may contain. They are a distinct improvement on the more limited schema features of the Document Type Definition (DTD), which formed part of the original XML specification released in 1998. The most widely used schema language is W3C XML Schema, defined by the W3C in 2001. However, there are alternatives, such as RELAX NG and Schematron.

Schema documents are the successors to DTDs and overcome some of their key limitations. Firstly, DTDs have no support for data types such as numbers or dates. Secondly, DTDs do not support namespaces. Thirdly, DTDs do not allow developers to define precisely how many times an element may occur within its parent element; they offer only the ?, * and + occurrence indicators.
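
Each of these three limitations corresponds to a feature of W3C XML Schema. The fragment below is a minimal sketch rather than part of any real vocabulary: the namespace URI http://example.com/orders and the element names order, orderDate and item are assumptions chosen purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema: the namespace URI and all element names are illustrative -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/orders"
           xmlns="http://example.com/orders"
           elementFormDefault="qualified">

  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <!-- Data type: content must be a valid date such as 2001-05-02 -->
        <xs:element name="orderDate" type="xs:date"/>
        <!-- Occurrences: at least one and at most ten item elements -->
        <xs:element name="item" type="xs:string" minOccurs="1" maxOccurs="10"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

The targetNamespace attribute places the declared elements in a namespace, the type attribute assigns a built-in data type, and minOccurs and maxOccurs state the permitted number of occurrences directly.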

An XML schema describes the structure of an XML instance document by defining what each element must or may contain. What an element may contain is determined by its type. For example, an element of complex type can contain child elements and attributes, whereas a simple-type element can contain only text. The diagram below gives a first look at the types of XML Schema elements.

Schema documents have three main purposes. Firstly, they can be used to validate XML documents. Secondly, they can be used as a dictionary or grammar for the creation of a given class of XML document. Thirdly, they can be used to provide documentation for XML documents.

An XML schema is itself an XML document, and it contains definitions of all the elements and attributes permitted in a class of XML documents. The schema also specifies the structure or hierarchy to which elements must adhere and the type of content each element may contain. Elements may be of simple or complex type. Complex-type elements may contain child elements as well as attributes. Simple-type elements may contain only text. XML documents that use a particular schema are referred to as instances of the schema. An XML instance that correctly adheres to its associated schema is said to be valid.
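
The distinction between simple and complex types, and between a schema and its instances, can be sketched with a small example. The vocabulary (book, title, pages, isbn), the file name book.xsd and the sample values are all hypothetical and serve only as an illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- book.xsd: hypothetical schema used only for illustration -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Complex type: book contains child elements and carries an attribute -->
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- Simple types: title and pages hold text only -->
        <xs:element name="title" type="xs:string"/>
        <xs:element name="pages" type="xs:positiveInteger"/>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

An instance document that adheres to this schema, and so would be considered valid, might look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Instance of the hypothetical book.xsd; the ISBN is a placeholder value -->
<book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="book.xsd"
      isbn="0-00-000000-0">
  <title>An Example Title</title>
  <pages>320</pages>
</book>
```

The xsi:noNamespaceSchemaLocation attribute is one common way for an instance to hint at where its schema can be found; a validating processor may also be told the schema location by other means.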

Validation is usually the principal role of schema documents, and it offers many benefits. It ensures the consistency of data within a document. It ensures that data has the right structure and internal hierarchy. It ensures that data within the document structure is of the correct type. It also allows us to accept data from multiple sources, because every incoming document can be checked against the same schema.
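
To make these checks concrete, consider the following sketch of a schema for a hypothetical reading element. The element names takenOn and temperature are assumptions made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema: a reading holds a date followed by a decimal value -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="reading">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="takenOn" type="xs:date"/>
        <xs:element name="temperature" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The instance below is well-formed XML, but it is not valid against that schema. The comments point out the kinds of problem a validating processor would be expected to detect; the exact error messages vary from one processor to another.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<reading>
  <!-- Structure error: the sequence requires takenOn before temperature -->
  <!-- Type error: "warm" is not a valid xs:decimal -->
  <temperature>warm</temperature>
  <!-- Type error: xs:date requires the form YYYY-MM-DD -->
  <takenOn>2 May 2001</takenOn>
</reading>
```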

Most XML documents are produced by programs and scripts written to extract information held in databases and transform it into XML. However, it is also possible for human beings to create XML documents by hand, and schemas can assist them as they do so. XML schemas also provide a mechanism for documenting XML documents and form an important part of the specification of XML vocabularies.
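
In W3C XML Schema, the documentation role is supported by the xs:annotation and xs:documentation elements, which let a human-readable description travel with a declaration. The following is a minimal sketch; the price element and the wording of its description are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical declaration showing embedded documentation -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="price" type="xs:decimal">
    <xs:annotation>
      <xs:documentation xml:lang="en">
        Unit price of the item, excluding tax.
      </xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>
```

Annotations of this kind do not affect validation, but schema-aware editors and documentation tools can display them to people authoring or reading instance documents.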

The author is a trainer and developer with Macresource Computer Training, a UK IT training company offering XML and XSLT training classes in London and throughout the UK.