Automata for XML

Validating XML DTDs (Document Type Definitions), navigating and query processing become difficult when we work not only with tags but also data values. Abstractly this involves reasoning about languages over infinite (data) alphabets.

Inspired by XML applications, recent research has started looking for automata theoretic studies of data languages: these are sets of words over an alphabet Sigma X D, where Sigma is a finite alphabet (as usual), but D is an infinite set of data values, over which equality is decidable. An example instance is the XPath query: find all books by Thomas with the same editor.

While such automata are typically too expressive (undecidable emptiness problem, for instance), there have been some encouraging results recently. In this lecture, we survey automata over infinite alphabets designed for this purpose