Compiler overview
 

The input stylesheet is parsed using the SAX 1-based parser from Sun's Project X:

  • com.sun.xml.parser.Parser

This parser builds a DOM from the stylesheet document, and hands this DOM over to the compiler. The compiler uses its own specialised parser to parse XPath expressions and patterns:

  • com.sun.xslt.compiler.XPathParser

Both parsers are encapsulated in XSLTC's parser class:

  • com.sun.xslt.compiler.Parser

Building an Abstract Syntax Tree
 

The SAX parser builds a standard W3C DOM from the source stylesheet. This DOM does not contain all the information needed to represent the whole stylesheet. (Remember that XSL is two languages: XML and XPath. The DOM only covers XML.) The compiler uses the DOM to build an abstract syntax tree (AST) that contains all the nodes from the DOM, plus additional nodes for the XPath expressions.

Mapping stylesheet elements to Java classes
 

Each XSL element is represented by a class in the com.sun.xslt.compiler package. The Parser class contains a Hashtable that maps XSL instructions to classes that inherit from a common parent class 'Instruction' (which in turn inherits from 'SyntaxTreeNode'). This mapping is set up in the initStdClasses() method:

    private void initStdClasses() {
        try {
            initStdClass("template",    "Template");
            initStdClass("param",       "Param");
            initStdClass("with-param",  "WithParam");
            initStdClass("variable",    "Variable");
            initStdClass("output",      "Output");
            // :
            // :
            // :
        }
        catch (ClassNotFoundException e) {
            // Error handling is not shown in this excerpt.
        }
    }

    // Registers the compiler class that represents the given XSL element.
    private void initStdClass(String elementName, String className)
        throws ClassNotFoundException {
        _classes.put(elementName,
                     Class.forName(COMPILER_PACKAGE + '.' + className));
    }
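
Once such a table exists, going from an element name to a node object is a lookup followed by a reflective instantiation. The following is a minimal sketch of that idea, not the actual XSLTC code: the class RegistrySketch and its create() method are hypothetical, the node classes are empty stand-ins, and a generic Map is used in place of the Hashtable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names mirror the text, but all bodies are invented.
class RegistrySketch {
    static abstract class SyntaxTreeNode { }
    static abstract class Instruction extends SyntaxTreeNode { }
    static class Template extends Instruction { }
    static class Variable extends Instruction { }

    // Maps an XSL element name to the class that represents it.
    private final Map<String, Class<? extends Instruction>> classes = new HashMap<>();

    RegistrySketch() {
        classes.put("template", Template.class);
        classes.put("variable", Variable.class);
    }

    // Returns a new node for the element, or null if the element is unknown.
    Instruction create(String elementName) {
        Class<? extends Instruction> cls = classes.get(elementName);
        if (cls == null) {
            return null;
        }
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }
}
```

Calling new RegistrySketch().create("template") yields a Template instance in this sketch, while an unregistered name yields null.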

Building a DOM tree from the input XSL file
 

The parser instantiates a DOM that holds the input XSL stylesheet. The DOM can only handle XML and will not break up or identify XPath patterns and expressions, or calls to XSL functions. Each XSL instruction gets its own node in the DOM, and the XPath patterns and expressions are stored as attributes of these nodes. A stylesheet looking like this:

    <xsl:stylesheet .......>
      <xsl:template match="chapter">
        <xsl:text>Chapter</xsl:text>
        <xsl:value-of select="."/>
      </xsl:template>
    </xsl:stylesheet>

will be stored in the DOM as indicated in the following picture:


Figure 1: DOM containing XSL stylesheet

The pattern 'match="chapter"' and the expression 'select="."' are stored as attributes for the nodes 'xsl:template' and 'xsl:value-of' respectively. These attributes are accessible through the DOM interface.
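
To illustrate that the pattern and the expression are plain attribute strings at this stage, here is a small sketch that reads them back through the W3C DOM interface. It uses the standard JAXP DocumentBuilderFactory rather than the Project X parser named earlier, and the class DomAttributeSketch and its firstAttribute() helper are hypothetical. The stylesheet attributes that were elided in the example are filled in here with a version and the standard XSLT namespace, which is an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: reads XPath strings as ordinary DOM attributes.
class DomAttributeSketch {
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"" + XSL_NS + "\">"
      + "<xsl:template match=\"chapter\">"
      + "<xsl:text>Chapter</xsl:text>"
      + "<xsl:value-of select=\".\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Returns the attribute of the first XSL element with the given local
    // name, or null if the stylesheet contains no such element.
    static String firstAttribute(String localName, String attribute) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(STYLESHEET)));
            NodeList nodes = doc.getElementsByTagNameNS(XSL_NS, localName);
            if (nodes.getLength() == 0) {
                return null;
            }
            return ((Element) nodes.item(0)).getAttribute(attribute);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse stylesheet", e);
        }
    }
}
```

The DOM returns "chapter" and "." as uninterpreted strings; nothing in the DOM knows that they are XPath.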


Creating the Abstract Syntax Tree from the DOM
 

What we have to do next is to create a tree that also holds the XSL-specific elements: XPath expressions and patterns (with possible filters) and calls to XSL functions. This is done by traversing the DOM and creating an instance of a subclass of 'SyntaxTreeNode' for each node in the DOM. A node in the DOM containing an XSL instruction (for example, "xsl:template") results in an instance of the corresponding class, looked up in the Hashtable created by the parser (in this case an instance of the 'Template' class).

Each class that inherits from SyntaxTreeNode has a vector called '_contents' that holds references to all the children of the node (if any). Each node has a method called 'parseContents()'. It is the responsibility of this method to parse any XPath expressions/patterns that are expected and found in the node's attributes. The XPath patterns and expressions are tokenised using the auto-generated class 'XPathParser' (generated using JavaCup and JLex). The tokenised expressions/patterns will result in a small sub-tree owned by the syntax tree node.

XSL nodes holding expressions have a pointer called '_select' that points to a sub-tree representing the expression. This can be seen for instance in the 'Template' class:


Figure 2: Sample Abstract Syntax Tree

In this example _select only points to a single node. In more complex expressions the pointer will point to a whole sub-tree.
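
The relationship between '_contents', 'parseContents()' and '_select' can be sketched as follows. This is an illustration under heavy assumptions, not the real compiler code: the attributes map, the Step class and the parsePath() helper are hypothetical, a List replaces the vector, and the "parser" only splits a simple path on '/' instead of using the generated XPathParser.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a syntax tree node that owns an expression sub-tree.
class ParseContentsSketch {
    static abstract class SyntaxTreeNode {
        // Children of this node, in document order.
        protected final List<SyntaxTreeNode> _contents = new ArrayList<>();
        // Attributes copied from the DOM node (assumed representation).
        protected final Map<String, String> attributes = new HashMap<>();

        // Default behaviour: let every child parse its own attributes.
        void parseContents() {
            for (SyntaxTreeNode child : _contents) {
                child.parseContents();
            }
        }
    }

    // One step of a simple location path such as "chapter/title".
    static class Step {
        final String name;
        Step next;
        Step(String name) { this.name = name; }
    }

    static class ValueOf extends SyntaxTreeNode {
        Step _select; // root of the expression sub-tree

        @Override
        void parseContents() {
            _select = parsePath(attributes.get("select"));
            super.parseContents();
        }
    }

    // Stand-in for the real XPath parser: splits a path into linked steps.
    static Step parsePath(String path) {
        Step head = null;
        Step tail = null;
        for (String name : path.split("/")) {
            Step step = new Step(name);
            if (head == null) {
                head = step;
            } else {
                tail.next = step;
            }
            tail = step;
        }
        return head;
    }
}
```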



Type-check and Cast Expressions
 

In many cases we will need to typecast the top node in the expression sub-tree to suit the expected result type of the expression, or to typecast child nodes to suit the allowed types for the various operators in the expression. This is done by calling 'typeCheck()' on the root node of the XSL tree. Each SyntaxTreeNode is responsible for its own type checking (i.e., the typeCheck() method must be overridden). Let us say that our expression was:

<xsl:value-of select="1+2.73"/>


Figure 3: XPath expression type conflict

The number 1 is an integer, and the number 2.73 is a real number, so the 1 has to be promoted to a real. This is done by inserting a new node between the [1] and the [+]. This node will convert the 1 to a real number:


Figure 4: Type casting

The inserted node is an object of the class CastExpr. The SymbolTable that was instantiated in (1) is used to determine what casts are needed for the various operators and what return types the various expressions will have.
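
The promotion of the 1 in "1+2.73" can be sketched in a few lines. The classes below are simplified, hypothetical stand-ins (the CastExpr here only shares its name with the real class), and the cast rule is hard-coded rather than looked up in a symbol table.

```java
// Hypothetical sketch: typeCheck() inserts a cast node where operand types differ.
class TypeCheckSketch {
    enum Type { INT, REAL }

    static abstract class Expression {
        abstract Type typeCheck();
        abstract String show();
    }

    static class IntExpr extends Expression {
        final int value;
        IntExpr(int value) { this.value = value; }
        Type typeCheck() { return Type.INT; }
        String show() { return Integer.toString(value); }
    }

    static class RealExpr extends Expression {
        final double value;
        RealExpr(double value) { this.value = value; }
        Type typeCheck() { return Type.REAL; }
        String show() { return Double.toString(value); }
    }

    // Wraps an operand and converts its value to the target type.
    static class CastExpr extends Expression {
        final Expression operand;
        final Type target;
        CastExpr(Expression operand, Type target) {
            this.operand = operand;
            this.target = target;
        }
        Type typeCheck() { return target; }
        String show() { return "cast-" + target + "(" + operand.show() + ")"; }
    }

    static class AddExpr extends Expression {
        Expression left;
        Expression right;
        AddExpr(Expression left, Expression right) {
            this.left = left;
            this.right = right;
        }

        // Type-checks both operands and promotes an INT operand to REAL
        // when the other operand is REAL.
        Type typeCheck() {
            Type leftType = left.typeCheck();
            Type rightType = right.typeCheck();
            if (leftType == rightType) {
                return leftType;
            }
            if (leftType == Type.INT) {
                left = new CastExpr(left, Type.REAL);
            } else {
                right = new CastExpr(right, Type.REAL);
            }
            return Type.REAL;
        }

        String show() { return "(" + left.show() + " + " + right.show() + ")"; }
    }
}
```

After typeCheck() has run on the addition, the integer operand is wrapped in a cast node, which matches the shape shown in Figure 4.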


Code generation
 

A general rule is that all classes that represent elements in the XSL tree/document, i.e., classes that inherit from SyntaxTreeNode, output bytecode in the 'translate()' method.

Compiling top-level elements
 

The bytecode that handles top-level elements must be generated before any other code. The 'translate()' method in these classes is mainly called from these methods in the Stylesheet class:

    private String compileBuildKeys(ClassGenerator classGen);
    private String compileTopLevel(ClassGenerator classGen, Enumeration elements);
    private void compileConstructor(ClassGenerator classGen, Output output);

These methods handle most top-level elements, such as global variables and parameters, <xsl:output> and <xsl:decimal-format> instructions.


Compiling template code
 

All XPath patterns in <xsl:template> instructions are converted into numeric values (known as the pattern's kernel 'type'). All templates with identical pattern kernel types are grouped together and inserted into a table under their assigned type. (This table is found in the Mode class. There will be one such table for each mode that is used in the stylesheet.) This table is used to build a big switch() statement in the translet's applyTemplates() method. This method is initially called with the root node of the input document.

The applyTemplates() method determines the node's type and passes this type to the switch() statement to look up the matching template.

There may be several templates that share the same pattern kernel type. Here are a few examples of templates with patterns that all have the same kernel type:

    <xsl:template match="A/C">
    <xsl:template match="A/B/C">
    <xsl:template match="A | C">

All these templates will be grouped under the type for <C> and will all get the same kernel type (the type for "C"). The last template will be grouped both under "C" and "A". If the type identifier for "C" in this case is 8, all these templates will be put under case 8: in applyTemplates()'s big switch() statement. The Mode class will insert extra code to choose which template code to invoke.
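
The generated dispatch might look roughly like the hand-written Java below. This is only a sketch of the idea: the real translet is emitted as bytecode, the Node class and the type constants for "A" and "B" are hypothetical (only the value 8 for "C" comes from the text), and the extra tests inside case 8 stand in for the code the Mode class inserts. The method returns the match pattern of the chosen template instead of running its body.

```java
// Hypothetical sketch: a switch on the node type, with extra tests per case.
class DispatchSketch {
    static final int TYPE_A = 6; // assumed value
    static final int TYPE_B = 7; // assumed value
    static final int TYPE_C = 8; // value used in the text

    static class Node {
        final int type;
        final Node parent;
        Node(int type, Node parent) {
            this.type = type;
            this.parent = parent;
        }
    }

    static boolean hasParent(Node node, int type) {
        return node.parent != null && node.parent.type == type;
    }

    // Returns the pattern of the template that would be invoked for the node.
    static String applyTemplates(Node node) {
        switch (node.type) {
        case TYPE_A:
            return "A | C";
        case TYPE_C:
            // Several templates share kernel type C, so extra tests on the
            // ancestors decide which one applies.
            if (hasParent(node, TYPE_A)) {
                return "A/C";
            }
            if (hasParent(node, TYPE_B) && hasParent(node.parent, TYPE_A)) {
                return "A/B/C";
            }
            return "A | C";
        default:
            return "built-in";
        }
    }
}
```

The more specific patterns are tested first so that "A | C" only applies to a <C> node when neither path pattern matches.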


Compiling XSL instructions and functions
 

The template code is generated by calling translate() on each Template object in the abstract syntax tree. This call will be propagated down the tree and every element will output the bytecodes necessary to complete its task.

Each node will call 'translate()' on its children, and possibly on objects representing the node's XPath expressions, before outputting its own bytecode. In that way the correct sequence of instructions is generated. Each of the child nodes is responsible for creating code that leaves the node's output value (if any) on the stack. The typical procedure for the parent node is to create code that consumes these values off the stack and then leaves its own output on the stack for its parent.
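
The children-first ordering can be illustrated with a toy stack machine. The sketch below is an assumption-laden simplification: it emits textual pseudo-instructions ("push", "add") into a list instead of JVM bytecode, and the Num, Add and run() names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: children emit code first, then the parent emits its own.
class StackCodeSketch {
    interface Node {
        void translate(List<String> code);
    }

    static class Num implements Node {
        final double value;
        Num(double value) { this.value = value; }
        // Leaves the literal on the stack.
        public void translate(List<String> code) {
            code.add("push " + value);
        }
    }

    static class Add implements Node {
        final Node left;
        final Node right;
        Add(Node left, Node right) {
            this.left = left;
            this.right = right;
        }
        // Children first, then the instruction that consumes their results.
        public void translate(List<String> code) {
            left.translate(code);
            right.translate(code);
            code.add("add");
        }
    }

    // Minimal interpreter for the pseudo-instructions emitted above.
    static double run(List<String> code) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String instruction : code) {
            if (instruction.equals("add")) {
                double right = stack.pop();
                double left = stack.pop();
                stack.push(left + right);
            } else {
                stack.push(Double.parseDouble(instruction.substring("push ".length())));
            }
        }
        return stack.pop();
    }
}
```

Translating new Add(new Num(1), new Num(2.73)) emits the two push instructions before the add, so the operands are already on the stack when the parent's instruction executes.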

The tree-structure of the stylesheet is in this way closely tied with the stack-based JVM. The design does not offer any obvious way of extending the compiler to output code for other VMs or processors.




Copyright © 2001 The Apache Software Foundation. All Rights Reserved.