Original version of PlasTeX documentation rendered using PlasTeX and Parempi-renderer
See http://www.cs.helsinki.fi/group/parempi/LICENSE for license

plasTeX - A Python Framework for Processing LaTeX Documents
Kevin D. Smith
20 November 2006

Contents

1 Introduction
2 plastex - The Command-Line Interface
    2.1 Command-Line and Configuration Options
        2.1.1 General Options
        2.1.2 Document Properties
        2.1.3 Counters
        2.1.4 Document Links
        2.1.5 Input and Output Files
        2.1.6 Image Options
3 The plasTeX Document
    3.1 Sections
        3.1.1 Navigation and Links
        3.1.2 Table of Contents
    3.2 Paragraphs
    3.3 Complex Structures
        3.3.1 Lists
        3.3.2 Bibliography
        3.3.3 Arrays and Tabular Environments
            Borders
            Alignments
            Longtables
        3.3.4 Indexes
4 Understanding Macros and Packages
    4.1 Defining Macros in LaTeX
    4.2 Defining Macros in Python
        4.2.1 Python Classes
            The args Attribute
            The invoke Method
            The digest Method
            Other Nifty Methods and Attributes
                The level attribute
                The macroName attribute
                The counter attribute
                The ref attribute
                The title attribute
                The fullTitle attribute
                The tocEntry attribute
                The fullTocEntry attribute
                The style attribute
                The id attribute
                The source attribute
                The currentSection attribute
                The expand method
                The paragraphs method
        4.2.2 INI Files
        4.2.3 The Document Context
    4.3 Packages
5 Renderers
    5.1 Simple Renderer Example
        5.1.1 Extending the Simple Renderer
    5.2 Renderable Objects
        5.2.1 Determining the Correct Rendering Method
        5.2.2 Generating Files
        5.2.3 Generating Images
        5.2.4 Generating Vector Images
        5.2.5 Static Images
    5.3 Page Template Renderer
        5.3.1 Defining and Using Templates
            Template Overrides
        5.3.2 Defining and Using Themes
        5.3.3 Zope Page Template Tutorial
            Template Attribute Language Expression Syntax (TALES)
                path: operator
                exists: operator
                nocall: operator
                not: operator
                string: operator
                python: operator
                stripped: operator
            Template Attribute Language (TAL) Attributes
                tal:define
                tal:condition
                tal:repeat
                tal:content
                tal:replace
                tal:attributes
                tal:omit-tag
    5.4 XHTML Renderer
        5.4.1 Themes
    5.5 tBook Renderer
    5.6 DocBook Renderer
6 plasTeX Frameworks and APIs
    6.1 plasTeX - The Python Macro and Document Interfaces
        6.1.1 Macro Objects
    6.2 plasTeX.ConfigManager - plasTeX Configuration
        6.2.1 ConfigManager Objects
        6.2.2 ConfigSection Objects
        6.2.3 Configuration Option Types
    6.3 plasTeX.DOM - The plasTeX Document Object Model (DOM)
        6.3.1 plasTeX vs. XML
        6.3.2 Node Objects
        6.3.3 DocumentFragment Objects
        6.3.4 Element Objects
        6.3.5 Text Objects
        6.3.6 Document Objects
        6.3.7 Command Objects
        6.3.8 Environment Objects
        6.3.9 TeXFragment Objects
        6.3.10 TeXDocument Objects
    6.4 plasTeX.TeX - The TeX Stream
        6.4.1 TeX Objects
    6.5 plasTeX.Context - The TeX Context
        6.5.1 Context Objects
    6.6 plasTeX.Renderers - The plasTeX Rendering Framework
        6.6.1 Renderer Objects
        6.6.2 Renderable MixIn
    6.7 plasTeX.Imagers - The plasTeX Imaging Framework
        6.7.1 Imager Objects
        6.7.2 Image Objects
A About This Document
B Frequently Asked Questions
    B.1 Parsing LaTeX
        B.1.1 How can I make plasTeX work with my complicated macros?
        B.1.2 How can I get plasTeX to find my LaTeX packages?

1 Introduction

plasTeX is a collection of Python frameworks that allow you to process LaTeX documents. This processing includes, but is not limited to, conversion of LaTeX documents to various document formats. Of course, it is capable of converting to HTML or XML formats such as DocBook and tBook, but it is an open framework that allows you to drive any type of rendering. This means that it could be used to drive a COM object that creates a MS Word document.

The plasTeX framework allows you to control all of the processes, including tokenizing, object creation, and rendering, through API calls. You also have access to all of the internals such as counters, the states of "if" commands, locally and globally defined macros, labels and references, etc.
In essence, it is a LaTeX document processor that gives you the advantages of an XML document in the context of a language as superb as Python.

Here are some of the main features and benefits of plasTeX.

The API for processing a LaTeX document is simple enough that you can write a LaTeX to HTML converter in one line of code (not including the Python import lines). Just to prove it, here it is!

    import sys
    from plasTeX.TeX import TeX
    from plasTeX.Renderers.XHTML import Renderer
    Renderer().render(TeX(file=sys.argv[-1]).parse())

The configuration object included with plasTeX can be extended to include your own options.

The tokenizer in plasTeX works very much like the tokenizer in TeX itself. In your macro classes, you can actually control the draining of tokens and even change category codes.

While most other LaTeX converters translate LaTeX source into another type of markup, plasTeX actually converts the document into a document object very similar to the DOM used in XML. Of course, there are many Python constructs built on top of this object to make it more Pythonic, so you don't have to deal with the objects using only DOM methods. What's really nice about this is that you can actually manipulate the document object prior to rendering. While this may be an esoteric feature, not many other converters let you get between the parser and the renderer.

In plasTeX, you get full control over the renderer. There is a Zope Page Template (ZPT) based renderer included for HTML and XML applications, but that is merely an example of what you can do. A renderer is simply a collection of functions[fn1]. During the rendering process, each node in the document object is passed to the function in the renderer that has the same name as the node. What that function does is up to the renderer. In the case of the ZPT-based renderer, the node is simply applied to the template using the expand() method.
If you don't like ZPT, there is nothing preventing you from populating a renderer with functions that invoke other types of templates, or functions that simply generate markup with print statements. You could even drive a COM interface to create a MS Word document.

[fn1] "functions" is being used loosely here. Actually, any callable Python object (i.e. function, method, or any object with the __call__ method implemented) can be used.

2 plastex - The Command-Line Interface

While plasTeX makes it possible to parse LaTeX directly from Python code, most people will simply use the supplied command-line interface, plastex. plastex will invoke the parsing processes and apply a specified renderer. By default, plastex will convert to HTML, although this can be changed in the plastex configuration.

Invoking plastex is very simple. To convert a LaTeX document to HTML using all of the defaults, simply type the following at a shell prompt.

    plastex mylatex.tex

where 'mylatex.tex' is the name of your LaTeX file. The LaTeX source will be parsed, all packages will be loaded and macros expanded, and the result will be converted to HTML. Hopefully, at this point you will have a lovely set of HTML files that accurately reflect the LaTeX source document. Unfortunately, converting LaTeX to other formats can be tricky, and there are many pitfalls. If you are getting warnings or errors while converting your document, you may want to check the FAQ in the appendix to see if your problem is addressed.

Running plastex with the default options may not give you output exactly the way you had envisioned. Luckily, there are many options that allow you to change the rendering behavior. These options are described in the following section.

2.1 Command-Line and Configuration Options

There are many options to plastex that allow you to control things such as input and output file encodings, where files are generated and what the filenames look like, rendering parameters, etc.
While plastex is the interface where the options are specified, for the most part these options are simply passed to the parser and renderers for their use. It is even possible to create your own options for use in your own Python-based macros and renderers. The following options are currently available on the plastex command. They are categorized for convenience.

2.1.1 General Options

Command-Line Options: --config=config-file or -c config-file
Config File: [ general ] config
specifies a configuration file to load. This should be the first option specified on the command-line.

Command-Line Options: --kpsewhich=program
Config File: [ general ] kpsewhich
Default: kpsewhich
specifies the kpsewhich program to use to locate LaTeX files and packages.

Command-Line Options: --renderer=renderer-name
Config File: [ general ] renderer
Default: XHTML
specifies which renderer to use.

Command-Line Options: --theme=theme-name
Config File: [ general ] theme
Default: default
specifies which theme to use.

Command-Line Options: --copy-theme-extras or --ignore-theme-extras
Config File: [ general ] copy-theme-extras
Default: yes
indicates whether or not extra files that belong to a theme (if there are any) should be copied to the output directory.

2.1.2 Document Properties

Command-Line Options: --base-url=url
Config File: [ document ] base-url
specifies a base URL to prepend to the path of all links.

Command-Line Options: --index-columns=integer
Config File: [ document ] index-columns
specifies the number of columns to group the index into.

Command-Line Options: --sec-num-depth=integer
Config File: [ document ] sec-num-depth
Default: 6
specifies the section level depth that should appear in section numbers. This value overrides the value of the secnumdepth counter in the document.
Command-Line Options: --title=string
Config File: [ document ] title
specifies a title to use for the document instead of the title given in the LaTeX source document.

Command-Line Options: --toc-depth=integer
Config File: [ document ] toc-depth
specifies the number of levels to include in each table of contents.

Command-Line Options: --toc-non-files
Config File: [ document ] toc-non-files
specifies that sections that do not create files should still appear in the table of contents. By default, only sections that create files will show up in the table of contents.

2.1.3 Counters

It is possible to set the initial value of a counter from the command-line using the --counter option or the "counters" section in a configuration file. The configuration file format for setting counters is very simple. The option name in the configuration file corresponds to the counter name, and the value is the value to set the counter to.

    [counters]
    chapter=4
    part=2

The sample configuration above sets the chapter counter to 4, and the part counter to 2.

The --counter option can also set counters. It accepts multiple arguments, which must be surrounded by square brackets ([ ]). Each counter set in the --counter option requires two values: the name of the counter and the value to set the counter to. An example of --counter is shown below.

    plastex --counter [ part 2 chapter 4 ] file.tex

Just as in the configuration example, this command-line sets the part counter to 2, and the chapter counter to 4.

Command-Line Options: --counter=[ counter-name initial-value ]
specifies the initial counter values.

2.1.4 Document Links

The links section of the configuration is a little different from the others. The options in the links section are not preconfigured; they are all user-specified. The links section includes information to be included in the navigation object available on all sections in a document.
By default, the section's navigation object includes things like the previous and next objects in the document, the child nodes, the sibling nodes, etc. The table below lists all of the navigation objects that are already defined. The names for these items came from an external list of link types. Of course, it is up to the renderer to actually make use of them.

    Name         Description
    -----------  ------------------------------------------------
    home         the first section in the document
    start        same as home
    begin        same as home
    first        same as home
    end          the last section in the document
    last         same as end
    next         the next section in the document
    prev         the previous section in the document
    previous     same as prev
    up           the parent section
    top          the top section in the document
    origin       same as top
    parent       the parent section
    child        a list of the subsections
    siblings     a list of the sibling sections
    document     the document object
    part         the current part object
    chapter      the current chapter object
    section      the current section object
    subsection   the current subsection object
    navigator    the top node in the document object
    toc          the node containing the table of contents
    contents     same as toc
    breadcrumbs  a list of the parent objects of the current node

Since each of these items references an object that is expected to have a URL and a title, any user-defined fields should contain these as well (although the URL is optional in some items). To create a user-defined field in this object, you need to use two options: one for the title and one for the URL, if one exists. They are specified in the config file as follows:

    [links]
    next-url=http://myhost.com/glossary
    next-title=The Next Document
    mylink-title=Another Title

These option names are split on the dash (-) to create a key, before the dash, and a member, after the dash. A dictionary is inserted into the navigation object with the name of the key, and the members are added to that dictionary.
The configuration above would create the following Python dictionary.

    {
        'next': {
            'url': 'http://myhost.com/glossary',
            'title': 'The Next Document'
        },
        'mylink': {
            'title': 'Another Title'
        }
    }

While you cannot override a field that is populated by the document, there are times when a field isn't populated. This occurs, for example, in the prev field at the beginning of the document, or the next field at the end of the document. If you specify a prev or next field in your configuration, those fields will be used when no prev or next is available. This allows you to link to external documents at those points.

Command-Line Options: --links=[ key optional-url title ]
specifies links to be included in the navigation object. Since at least two values are needed in the links (key and title, with an optional URL), the values are grouped in square brackets on the command-line ([ ]).

2.1.5 Input and Output Files

If you have a renderer that only generates one file, specifying the output filename is simple: use the --filename option to specify the name. However, if the renderer you are using generates multiple files, things get more complicated. The --filename option is also capable of handling multiple names, as well as giving you a templating way to build filenames. Below is a list of all of the options that affect filename generation.

Command-Line Options: --bad-filename-chars=string
Config File: [ files ] bad-chars
Default: :#$%^&*!~‘"’=?/[]()|<>;\,.
specifies all characters that should not be allowed in a filename. These characters will be replaced by the value in --bad-filename-chars-sub.
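The character replacement described for --bad-filename-chars can be pictured with a small sketch. This is plain Python written for illustration only, not plasTeX's own implementation; the function name sanitize_filename and the sample character sets are assumptions made for the example.

```python
def sanitize_filename(name, bad_chars, sub="-"):
    """Replace every character of `name` found in `bad_chars` with `sub`.

    Hypothetical helper: it only mirrors the documented behavior of the
    bad-chars / bad-chars-sub pair and is not taken from plasTeX.
    """
    return "".join(sub if ch in bad_chars else ch for ch in name)


# A title containing characters that are unsafe in filenames.
print(sanitize_filename("a/b:c", ":/"))
```

The substitute defaults to a dash here because that is the documented default of --bad-filename-chars-sub.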
Command-Line Options: --bad-filename-chars-sub=string
Config File: [ files ] bad-chars-sub
Default: -
specifies a string to use in place of invalid filename characters (specified by the --bad-filename-chars option).

Command-Line Options: --dir=directory or -d directory
Config File: [ files ] directory
Default: $jobname
specifies a directory name to use as the output directory.

Command-Line Options: --escape-high-chars
Config File: [ files ] escape-high-chars
Default: False
Some output types allow you to represent characters that are greater than 7-bits with an alternate representation to alleviate the issue of file encoding. This option indicates that these alternate representations should be used.
Note: The renderer is responsible for doing the translation into the alternate format. This might not be supported by all output types.

Command-Line Options: --filename=string
Config File: [ files ] filename
specifies the templates to use for generating filenames. The filename template is a list of space separated names. Each name in the list is returned once. An example is shown below.

    index.html toc.html file1.html file2.html

If you don't know how many files you are going to be producing, using static filenames like in the example above is not practical. For this reason, these filenames can also contain variables as described in Python's string Templates (e.g. $title, $id). These variables come from the namespace created in the renderer and include: $id, the ID (i.e. label) of the item; $title, the title of the item; and $jobname, the basename of the LaTeX file being processed. One special variable is $num. This value is generated dynamically whenever a filename with $num is requested. Each time a filename with $num is successfully generated, the value of $num is incremented.

The values of variables can also be modified by a format specified in parentheses after the variable.
The format is simply an integer that specifies how wide a field to create for integers (zero-padded), or, for strings, how many space separated words to limit the name to. The example below shows $num being padded to four places and $title being limited to five words.

    sect$num(4).html $title(5).html

The list can also contain a wildcard filename (which should be specified last). Once a wildcard name is reached, it is used from that point on to generate the remaining filenames. The wildcard filename contains a list of alternatives to use as part of the filename, written as a comma separated list surrounded by a set of square brackets ([ ]). Each of the alternatives specified is tried until a filename is successfully created (i.e. all variables resolve). For example, the specification below creates three alternatives.

    $jobname_[$id, $title, sect$num(4)].html

The code above is expanded to the following possibilities.

    $jobname_$id.html
    $jobname_$title.html
    $jobname_sect$num(4).html

Each of the alternatives is attempted until one of them succeeds. In order for an alternative to succeed, all of the variables referenced in the template must be populated. For example, the $id variable will not be populated unless the node had a \label macro pointing to it. The $title variable would not be populated unless the node had a title associated with it (e.g. section, subsection, etc.). Generally, the last one should contain no variables except for $num as a fail-safe alternative.

Command-Line Options: --input-encoding=string
Config File: [ files ] input-encoding
Default: utf-8
specifies which encoding the LaTeX source file is in.

Command-Line Options: --output-encoding=string
Config File: [ files ] output-encoding
Default: utf-8
specifies which encoding the output files should use.
Note: This depends on the output format as well. While HTML and XML use encodings, a binary format like MS Word would not.
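Returning to the --filename templates described earlier in this section, the wildcard alternatives and the parenthesized formats can be sketched in a few lines of plain Python. This is an illustrative model under stated assumptions, not plasTeX's code: the function names are hypothetical, variable names are assumed to consist of letters only (so that $jobname_ reads as $jobname followed by an underscore), and the automatic incrementing of $num is not modeled.

```python
import re

# $name optionally followed by a (width) format; letters-only names are an assumption.
VARIABLE = re.compile(r"\$([A-Za-z]+)(?:\((\d+)\))?")


def expand_alternatives(template):
    """Expand one '[a, b, c]' group into a list of candidate templates."""
    match = re.search(r"\[([^\]]*)\]", template)
    if match is None:
        return [template]
    head, tail = template[:match.start()], template[match.end():]
    return [head + alt.strip() + tail for alt in match.group(1).split(",")]


def fill(candidate, values):
    """Substitute variables; return None if any variable is not populated."""
    missing = []

    def substitute(m):
        name, width = m.group(1), m.group(2)
        value = values.get(name)
        if value is None:
            missing.append(name)
            return ""
        if width is None:
            return str(value)
        if isinstance(value, int):
            return str(value).zfill(int(width))          # zero-padded integer
        return " ".join(str(value).split()[:int(width)])  # limit word count

    result = VARIABLE.sub(substitute, candidate)
    return None if missing else result


def first_filename(template, values):
    """Return the first alternative whose variables all resolve, else None."""
    for candidate in expand_alternatives(template):
        name = fill(candidate, values)
        if name is not None:
            return name
    return None


template = "$jobname_[$id, $title, sect$num(4)].html"
print(expand_alternatives(template))
print(first_filename(template, {"jobname": "book", "num": 7}))
```

With neither $id nor $title populated, only the last alternative resolves, which is why the text recommends ending the list with an alternative that depends on $num alone.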
Command-Line Options: --split-level=integer
Config File: [ files ] split-level
Default: 2
specifies the highest section level that generates a new file. Each section in a LaTeX document has a number associated with its hierarchical level. These levels are -2 for the document, -1 for parts, 0 for chapters, 1 for sections, 2 for subsections, 3 for subsubsections, 4 for paragraphs, and 5 for subparagraphs. A new file will be generated for every section in the hierarchy with a value less than or equal to the value of this option. This means that for the value of 2, files will be generated for the document, parts, chapters, sections, and subsections.

2.1.6 Image Options

Images are created by renderers when the output type is incapable of rendering the content in any other way. This method is commonly used to display equations in HTML output. The following options control how images are generated.

Command-Line Options: --image-base-url=url
Config File: [ images ] base-url
specifies a base URL to prepend to the path of all images.

Command-Line Options: --image-compiler=program
Config File: [ images ] compiler
Default: latex
specifies which program to use to compile the images LaTeX document.

Command-Line Options: --enable-images or --disable-images
Config File: [ images ] enabled
Default: yes
indicates whether or not images should be generated.

Command-Line Options: --enable-image-cache or --disable-image-cache
Config File: [ images ] cache
Default: yes
indicates whether or not images should use a cache between runs.

Command-Line Options: --imager=program
Config File: [ images ] imager
Default: dvipng dvi2bitmap gsdvipng gspdfpng OSXCoreGraphics
specifies which converter will be used to take the output from the LaTeX compiler and convert it to images. You can specify a space delimited list of names as well.
If a list of names is specified, each one is verified in order to see if it works on the current machine. The first one that succeeds is used. You can use the value of "none" to turn the imager off.

Command-Line Options: --image-filenames=filename-template
Config File: [ images ] filenames
Default: images/img-$num(4).png
specifies the image naming template to use to generate filenames. This template is the same as the templates used by the --filename option.

Command-Line Options: --vector-imager=program
Config File: [ images ] vector-imager
Default: dvisvgm
specifies which converter will be used to take the output from the LaTeX compiler and convert it to vector images. You can specify a space delimited list of names as well. If a list of names is specified, each one is verified in order to see if it works on the current machine. The first one that succeeds is used. You can use the value of "none" to turn the vector imager off.
Note: When using the vector imager, a bitmap image is also created using the regular imager. This bitmap is used to determine the depth information about the vector image and can also be used as a backup if the vector image is not supported by the viewer.

3 The plasTeX Document

The plasTeX document is very similar to an XML DOM structure. In fact, you can use XML DOM methods to create and populate nodes, delete or move nodes, etc. The biggest difference between the plasTeX document and an XML document is that in XML the attributes of an element are simply string values, whereas attributes in a plasTeX document are generally document fragments that contain the arguments of a macro. Attributes can be configured to hold other Python objects like lists, dictionaries, and strings as well (see section 4.2.1 for more information).

While XML document objects have a very strict syntax, LaTeX documents are a little more free-form.
Because of this, the plasTeX framework does a lot of normalizing of the document to make it conform to a set of rules. This set of rules means that you will always get a consistent output document, which is necessary for easy manipulation and programmability.

The overall document structure should not be surprising. There is a document element at the top level which corresponds to the XML Document node. The child nodes of the Document node begin with the preamble to the LaTeX document. This includes things like the \documentclass, \newcommands, \title, \author, counter settings, etc. For the most part, these nodes can be ignored. While they are a useful part of the document, they are generally only used by internal processes in plasTeX. What is important is the last node in the document, which corresponds to LaTeX's document environment.

The document environment has a very simple structure. It consists solely of paragraphs (actually \pars in TeX's terms) and sections[fn2]. In fact, all sections have this same format including parts, chapters, sections, subsections, subsubsections, paragraphs, and subparagraphs. plasTeX can tell which pieces of a document correspond to a sectioning element by looking at the level attribute of the Python class that corresponds to the given macro. The section levels in plasTeX are the same as those used by LaTeX: -1 for part, 0 for chapter, 1 for section, etc. You can create your own sectioning commands simply by subclassing an existing macro class, or by setting the level attribute to a value that corresponds to the level of section you want to mimic. All level values less than 100 are reserved for sectioning, so you aren't limited to LaTeX's sectioning depth. Figure 3.1 below shows an example of the overall document structure.

[fn2] "sections" in this document is used loosely to mean any type of section: part, chapter, section, etc.
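The way level values determine nesting can be illustrated with a small stand-alone sketch. It uses plain dictionaries rather than plasTeX node classes, the function name nest_sections is hypothetical, and the document is simply given a level below every sectioning level (-2 here, an assumption made for the example).

```python
# Section levels as stated in the text: -1 for part, 0 for chapter, 1 for section, ...
LEVELS = {
    "part": -1, "chapter": 0, "section": 1, "subsection": 2,
    "subsubsection": 3, "paragraph": 4, "subparagraph": 5,
}


def nest_sections(headings):
    """Group a flat list of (macro_name, title) pairs into a tree by level.

    A section becomes the child of the nearest preceding section with a
    strictly smaller level value. Illustrative model only.
    """
    root = {"name": "document", "title": None, "level": -2, "children": []}
    stack = [root]
    for name, title in headings:
        node = {"name": name, "title": title,
                "level": LEVELS[name], "children": []}
        # Close every open section at the same or a deeper level.
        while stack[-1]["level"] >= node["level"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


tree = nest_sections([
    ("chapter", "A"), ("section", "A.1"), ("subsection", "A.1.1"),
    ("section", "A.2"), ("chapter", "B"),
])
print([child["title"] for child in tree["children"]])
```

A custom sectioning command would slot into the same scheme by choosing a level value, which is the point of the level attribute described above.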
Figure 3.1: The overall plasTeX document structure

This document is constructed during the parsing process by calling the digest method on each node. The digest method is passed an iterator of document nodes that correspond to the nodes in the document that follow the current node. It is the responsibility of the current node to only absorb the nodes that belong to it during the digest process. Luckily, the default digest method will work in nearly all cases. See section 4.2.1 for more information on the digestion process.

Part of this digestion process is grouping nodes into paragraphs. This is done using the paragraphs method available in all Macro based classes. This method uses the same technique as TeX to group paragraphs of content. Section 3.2 has more information about the details of paragraph grouping.

In addition to the level attribute of sections, there is also a mixin class that assists in generating the table of contents and navigation elements during rendering. If you create your own sectioning commands, you should include plasTeX.Base.LaTeX.Sectioning.SectionUtils as a base class as well. All of the standard section commands already inherit from this class, so if you subclass one of those, you'll get the helper methods for free. For more information on these helper methods see section 3.1.

The structure of the rest of the document is also fairly simple and well-defined. LaTeX commands are each converted into a document node with its arguments getting placed into the attributes dictionary. LaTeX environments also create a single node in the document, where the child nodes of the environment include everything between the \begin and \end commands. By default, the child nodes of an environment are simply inserted in the order that they appear in the document. However, there are some environments that require further processing due to their more complex structures.
These structures include arrays and tabular environments, as well as itemized lists. For more information on these structures see sections 3.3.3 and 3.3.1, respectively. Figures 3.2 and 3.3 show a common LaTeX document fragment and the resulting plasTeX document node structure.

    \begin{center}
    Every \textbf{good} boy does \textit{fine}.
    \end{center}

Figure 3.2: Sample LaTeX document fragment code

Figure 3.3: Resulting plasTeX document node structure

You may have noticed that in the document structure in Figure 3.3 the text corresponding to the argument for \textbf and \textit is actually a child node and not an attribute. This is actually a convenience feature in plasTeX. For macros like this, where there is only one argument and that argument corresponds to the content of the macro, it is common to put that content into the child nodes. This is done in the args attribute of the macro class by setting the argument's name to "self". This magical value will link the attribute called "self" to the child nodes array. For more information on the args attribute and how it populates the attributes dictionary see section 4.2.1.

In the plasTeX framework, the input LaTeX document is parsed and digested until the document is finished. At this point, you should have an output document that conforms to the rules described above. The document should have a regular enough structure that working with it programmatically using DOM methods or Python practices should be fairly straightforward. The following sections give more detail on document structure elements that require extra processing beyond the standard parse-digest process.

3.1 Sections

"Sections" in plasTeX refer to any macro that creates a section-like construct in a document, including the document environment, \part, \chapter, \section, \subsection, \subsubsection, \paragraph, and \subparagraph.
While these are the sectioning macros defined by LaTeX, you are not limited to using just those commands to create sections in your own documents. There are two elements that must exist for a Python macro class to act like a section: 1) the level attribute must be set to a value less than 100, and 2) the class should inherit from plasTeX.Base.LaTeX.Sectioning.SectionUtils.

The level attribute refers to the section level in the document. The values for this attribute are the same values that LaTeX uses for its section levels, namely:

    -1  corresponds to \part
     0  corresponds to \chapter
     1  corresponds to \section
     2  corresponds to \subsection
     3  corresponds to \subsubsection
     4  corresponds to \paragraph
     5  corresponds to \subparagraph

plasTeX adds the following section-related levels:

    a level that corresponds to the document environment and is always the top-level section
    a level that was added to correspond to the sixth level of headings defined in HTML
    a flag that indicates the last possible section nesting level; this is mainly used for internal purposes

plasTeX uses the level attribute to build the appropriate document structure. If all you need is a proper document structure, the level attribute is the only thing that needs to be set on a macro. However, there are many convenience properties in the plasTeX.Base.LaTeX.Sectioning.SectionUtils class that are used in the rendering process. If you plan on rendering your document, your section classes should inherit from this class. Below is a list of the additional properties and their purpose.
    Name              Purpose
    ----------------  ----------------------------------------------------------
    allSections       contains a sequential list of all of the sections within
                      and including the current section
    documentSections  contains a sequential list of all of the sections within
                      the entire document
    links             contains a dictionary of navigation information
                      corresponding mostly to an external list of link types.
                      This includes things like breadcrumb trails, previous and
                      next links, links to the overall table of contents, etc.
                      See section 3.1.1 for more information.
    siblings          contains a list of all of the sibling sections
    subsections       contains a list of all of the sections within the current
                      section
    tableofcontents   contains an object that corresponds to the table of
                      contents for the section. The table of contents is
                      configurable as well. For more information on how to
                      configure the table of contents see section 3.1.2.

Note: When first accessed, each of these properties actually navigates the document and builds the returned object. Since these operations can be rather costly, the values are cached. Therefore, if you modify the document after accessing one of these properties, you will not see the change reflected.

3.1.1 Navigation and Links

The plasTeX.Base.LaTeX.Sectioning.SectionUtils class has a property named links that contains a dictionary of many useful objects that assist in creating navigation bars and breadcrumb trails in the rendered output. This dictionary was modeled after an external list of link types.
Some of the objects in this dictionary are created automatically, others are created with the help of the linkType attribute on the document nodes, and yet others can be added manually from a configuration file or command-line options. The automatically generated values are listed in the following table.

Name | Purpose
---- | -------
begin | the first section of the document
breadcrumbs | a list containing the entire parentage of the current section (including the current section)
chapter | the current chapter node
child | a list of the subsections
contents | the section that contains the top-level table of contents
document | the document level node
end | the last section of the document
first | the first section of the document
home | the first section of the document
last | the last section of the document
navigator | the section that contains the top-level table of contents
next | the next section in the document
origin | the section that contains the top-level table of contents
parent | the parent node
part | the current part node
prev | the previous section in the document
previous | the previous section in the document
section | the current section
sibling | a list of the section siblings
subsection | the current subsection
start | the first section of the document
toc | the section that contains the top-level table of contents
top | the first section of the document
up | the parent section

Note: The keys in every case are simply strings.

Note: Each of the elements in the table above is either a section node or a list of section nodes. Once you have a reference to a node, you can access the attributes and methods of that object for further introspection.

An example of accessing these objects from a section instance is shown below.

    previousnode = sectionnode.links['prev']
    nextnode = sectionnode.links['next']

The next method of populating the links table is semi-automatic and uses the linkType attribute on the Python macro class. Certain parts of a document occur only once, such as an index, glossary, or bibliography. You can set the linkType attribute on the Python macro class to a string that corresponds to that section’s role in the document (e.g. ‘index’ for the index, ‘glossary’ for the glossary, ‘bibliography’ for the bibliography). When a node with a special link type is created, it is inserted into the dictionary of links under the given name. This allows links to indexes, glossaries, etc. to appear in the links object only when they are in the current document. The example below shows the theindex environment being configured to show up under the ‘index’ key in the links dictionary.

    from plasTeX import Environment
    from plasTeX.Base.LaTeX.Sectioning import SectionUtils

    class theindex(Environment, SectionUtils):
        # Register this node under the 'index' key of the links dictionary
        linkType = 'index'
        level = Environment.SECTION_LEVEL

Note: These links are actually stored under the ‘links’ key of the owner document’s userdata dictionary (i.e. self.ownerDocument.userdata['links']). Other objects can be added to this dictionary manually.

The final way of getting objects into the links dictionary is through a configuration file or command-line options. This method is described fully in section {ref1}.

3.1.2 Table of Contents{label14}

The table of contents object returned by the tableofcontents property of SectionUtils is not an actual node of the document; it is a proxy object that limits the number of levels that you can traverse. The number of levels that you are allowed to traverse is determined by the document:toc-depth configuration option (see section {ref0}). Other than the fact that you can only see a certain number of levels of subsections, the object acts just like any other section node.
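The depth-limiting behavior of such a proxy can be pictured with a small stand-alone sketch. Nothing in it is plasTeX code: Section and TocProxy are hypothetical stand-ins, and the depth argument merely plays the role of the document:toc-depth setting.

```python
class Section:
    """Hypothetical stand-in for a section node (not a plasTeX class)."""

    def __init__(self, title, subsections=()):
        self.title = title
        self.subsections = list(subsections)


class TocProxy:
    """Hypothetical proxy exposing at most `depth` levels of subsections."""

    def __init__(self, section, depth):
        self._section = section
        self._depth = depth

    @property
    def title(self):
        return self._section.title

    @property
    def subsections(self):
        # Once the allowed depth is used up, pretend there is nothing below
        if self._depth <= 0:
            return []
        return [TocProxy(s, self._depth - 1) for s in self._section.subsections]


def outline(node, indent=0):
    """Return the titles reachable from `node`, indented by nesting level."""
    lines = ['  ' * indent + node.title]
    for sub in node.subsections:
        lines.extend(outline(sub, indent + 1))
    return lines


doc = Section('Document', [
    Section('Chapter 1', [
        Section('Section 1.1', [Section('Subsection 1.1.1')]),
    ]),
])

full_lines = outline(doc)                      # walks the real nodes
toc_lines = outline(TocProxy(doc, depth=2))    # stops two levels down
```

Code that walks the proxy is identical to code that walks the real nodes; only the number of visible levels differs.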
In addition to limiting the number of levels of a table of contents, you can also determine whether sections that do not generate new files while rendering should appear in the table of contents. By default, only sections that generate a new file while rendering appear in the table of contents object. If you set the value of document:toc-non-files in the configuration to True, then all sections appear in the table of contents.

3.2 Paragraphs{label7}

Paragraphs in a plasTeX document are grouped in the same way that they are grouped in LaTeX: essentially anything within a section that isn’t a section itself is put into a paragraph. This is different from the HTML model, where tables and lists are not grouped into paragraphs. Because of this, HTML that keeps the same paragraph model is unlikely to be 100% valid. However, it is highly unlikely that this variance from validity will cause any real problems when a browser renders the output.

Paragraphs are grouped using the paragraphs method available on all Python macro classes. When this method is invoked on a node, all of the child nodes are grouped into paragraphs. If there are no paragraph objects in the list of child nodes already, one is created. This is done to make sure that the document is fully normalized and that paragraphs occur everywhere that they can occur. This is most noteworthy in constructs like tables and lists, where some table cells or list items have multiple paragraphs and others do not. If a paragraph weren’t forced into these areas, the content would be wrapped in paragraphs inconsistently.

In some areas paragraphs are allowed but not necessarily needed, such as within a grouping of curly braces ({}), and a forced paragraph may be unwanted there. In these cases, you can use the force=False keyword argument to paragraphs. This still does paragraph grouping, but only if there is a paragraph element already in the list of child nodes.
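The effect of the force flag can be illustrated with a deliberately simplified model. This is a sketch under stated assumptions, not the plasTeX implementation: child nodes are plain strings, the string 'par' stands in for a paragraph node, and group_paragraphs is a hypothetical helper.

```python
def group_paragraphs(nodes, force=True):
    """Group `nodes` into paragraphs, splitting at 'par' markers.

    Toy model only: with force=False the nodes are returned ungrouped
    when no paragraph marker is present at all.
    """
    if not force and 'par' not in nodes:
        return list(nodes)

    paragraphs, current = [], []
    for node in nodes:
        if node == 'par':
            # A marker closes the paragraph collected so far
            if current:
                paragraphs.append(current)
            current = []
        else:
            current.append(node)
    if current:
        paragraphs.append(current)
    return paragraphs


forced = group_paragraphs(['a', 'b'])                    # wrapped anyway
unforced = group_paragraphs(['a', 'b'], force=False)     # left alone
mixed = group_paragraphs(['a', 'par', 'b'], force=False) # grouped
```

With force left at its default, even content without any paragraph marker ends up wrapped, which is what keeps table cells and list items uniform.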
3.3 Complex Structures{label15}

While much of a plasTeX document mirrors the structure of the source LaTeX document, some constructs require a little more work to be useful in the more rigid structure. The most noteworthy of these constructs are lists, arrays (or tabular environments), and indexes. These objects are described in more detail in the following sections.

3.3.1 Lists{label10}

Lists are normalized slightly more than the rest of the document. They are treated almost like sections in that they are only allowed to contain a minimal set of child node types. In fact, lists can only contain one type of child node: list item. The consequence of this is that any content before the first item in a list is thrown out. In turn, list items only contain paragraph nodes. All lists have the structure shown in Figure 3.4.

[Figure 3.4: Normalized structure of all lists {label16}]

This structure allows you to easily traverse a list with code like the following.

    # Iterate through the items in the list node
    for item in listnode:

        # Iterate through the paragraphs in each item
        for par in item:

            # Print the text content of each paragraph
            print(par.textContent)

        # Print a blank line to separate each item
        print('')

3.3.2 Bibliography

The bibliography is really just another list structure with a few enhancements to allow referencing of the items throughout the document. Bibliography processing is left to the normal LaTeX tools. plasTeX expects a properly formatted ‘.bbl’ file for the bibliography. The LaTeX bibliography format is used by default; however, the natbib package is also included with plasTeX for more complex formatting of bibliographies.

3.3.3 Arrays and Tabular Environments{label9}

Arrays and tabular environments are the most complex structures in a plasTeX document. This is because tables can include spanning columns, spanning rows, and borders specified on the table, rows, and individual cells.
In addition, there are alignments associated with each column, and alignments can be specified by any \multicolumn command. It is also possible with some packages to create your own column declarations. Add to that the fact that the longtable package allows you to specify multiple headers, footers, and captions, and you can see why tabular environments can be rather tricky to deal with.

As with all parts of the document, plasTeX tries to normalize all tables to have a consistent structure. The structure for arrays and tables is shown in Figure 3.5.

[Figure 3.5: Normalized structure of all tables and arrays {label17}]

Luckily, the array macro class that comes with plasTeX was made to handle all of the work for you. In fact, it also handles the work of some extra packages such as longtable to make processing them transparent. The details of the tabular environments are described in the following sections.

With this normalized structure, you can traverse all array and table structures with code like the following.

    # Iterate through all rows in the table
    for row in tablenode:

        # Iterate through all cells in the row
        for cell in row:

            # Iterate through all paragraphs in the cell
            for par in cell:

                # Print the text content of each paragraph
                print('   ' + par.textContent)

            # Print a blank line after each cell
            print('')

        # Print a blank line after each row
        print('')

3.3.3.1 Borders

Borders in a tabular environment are generally handled by \hline, \vline, \cline, as well as the column specifications on the tabular environment and the \multicolumn command. plasTeX merges all of the border specifications and puts them into CSS formatted values in the style attribute of each of the table cell nodes. To get the CSS information formatted such that it can be used in an inline style, simply access the inline property of the style object.

Here is an example of a tabular environment.

    \begin{tabular}{|l|l|}\hline
    x & y \\
    1 & 2 \\\hline
    \end{tabular}

The table node can be traversed as follows.

    # Print the CSS for the borders of each cell
    for rownum, row in enumerate(table):
        for cellnum, cell in enumerate(row):
            print('(%s,%s) %s -- %s' % (rownum, cellnum,
                  cell.textContent.strip(), cell.style.inline))

The code above will print the following output (whitespace has been added to make the output easier to read).

    (0,0) x -- border-top-style:solid; border-left:1px solid black;
               border-right:1px solid black; border-top-color:black;
               border-top-width:1px; text-align:left
    (0,1) y -- border-top-style:solid; text-align:left;
               border-top-color:black; border-top-width:1px;
               border-right:1px solid black
    (1,0) 1 -- border-bottom-style:solid; border-bottom-width:1px;
               border-left:1px solid black; border-right:1px solid black;
               text-align:left; border-bottom-color:black
    (1,1) 2 -- border-bottom-color:black; border-bottom-width:1px;
               text-align:left; border-bottom-style:solid;
               border-right:1px solid black

3.3.3.2 Alignments

Alignments can be specified in the column specification of the tabular environment as well as in the column specification of \multicolumn commands. Just like the border information, the alignment information is stored in CSS formatted values in each cell’s style attribute.

3.3.3.3 Longtables

Longtables are treated just like regular tables. Only the first header and the last footer are supported in the resulting table structure. To mark cells as header or footer cells, the isHeader attribute of the corresponding cells is set to True. This information can be used by the renderer to more accurately represent the table cells.

3.3.4 Indexes

All index building and sorting is done internally in plasTeX. It is done this way because the information that tools like makeindex generate is only useful to LaTeX itself, since the reference to the place where the index tag was inserted is simply a page number. Since plasTeX wants to be able to reference the index tag node, it has to do all of the index processing natively.
There are actually two index structures. The default structure is simply the index nodes sorted and grouped into the appropriate hierarchies. This structure looks like the structure pictured in Figure 3.6.

[Figure 3.6: Default index structure {label18}]

Each item, subitem, and subsubitem has an attribute called key that contains a document fragment of the key for that index item. The document nodes that this key corresponds to are held in a list in the pages attribute. These nodes are the actual nodes corresponding to the index entry macros from the LaTeX document. The content of the node is a number corresponding to the index entry that is formatted according to the formatting rules specified in the index entry.

While the structure above works well for paged media, it is sometimes nice to have the index entries grouped by first letter and possibly even arranged into multiple columns. This alternate representation can be accessed in the groups property. The structure for this type of index is shown in Figure 3.7.

[Figure 3.7: Grouped index structure {label19}]

In this case, the item, subitem, and subsubitem nodes are the same as in the default scheme. The group has a title attribute that contains the first letter of the entries in that group. Entries that start with something other than a letter or an underscore are put into a group called “Symbols”. The columns are approximately equally sized columns of index entries. The number of columns is determined by the document:index-columns configuration item.

4 Understanding Macros and Packages{label5}

Macros and packages in plasTeX live a dual life. On one hand, macros can be defined in LaTeX files and expanded by plasTeX itself. On the other hand, macros can also be implemented as Python classes. Packages are the same way. plasTeX can handle some LaTeX packages natively. Others may have to be implemented in Python. In most cases, both implementations work transparently together.
If you don’t define that many macros, and the ones that you do define are simple or even of intermediate complexity, it’s probably better to just let plasTeX handle them natively. However, there are some reasons that you may want to implement Python versions of your macros:

- Python versions of macros are generally faster
- You have more control over what gets inserted into the output document
- You can store information in the document’s userdata dictionary for use later
- You can prevent a macro from being expanded into primitive commands, so that a custom renderer can be used on that node
- Some macros just don’t make sense in a plasTeX document
- Some macros are just too complicated for plasTeX

If any of these reasons appeal to you, read the following sections on how to implement macros and packages in plasTeX.

4.1 Defining Macros in LaTeX

Defining macros in LaTeX using plasTeX is no different from the way you would normally define your macros; however, there is a trick that you can use to improve your macros for plasTeX, if needed. While plasTeX can handle fairly complicated macros, some macros might do things that don’t make sense in the context of a plasTeX document, or they might just be too complicated for the plasTeX engine to handle. In cases such as these, you can use the \ifplastex construct.

As you may know, in TeX you can define your own \if commands using the \newif primitive. There is an \if command called \ifplastex built into the plasTeX engine that is always set to true. In your document, you can define this command and set it to false (as far as LaTeX is concerned) as follows.

    \newif\ifplastex
    \plastexfalse

Now you can surround the portions of your macros that plasTeX has trouble with, or even write alternative versions of the macro for LaTeX and plasTeX. Here is an example.

    \newcommand{\foo}[1]{
        \ifplastex\else\vspace*{0.25in}\fi
        \textbf{\Large{#1}}
        \ifplastex\else\vspace*{1in}\fi
    }

    \ifplastex
        \newenvironment{coolbox}{}{}
    \else
        \newenvironment{coolbox}
            {\fbox\bgroup\begin{minipage}{5in}}
            {\end{minipage}\egroup}
    \fi

4.2 Defining Macros in Python

Defining macros using Python classes (or, at least, through Python interfaces) is done in one of three ways: INI files, Python classes, and the document context. These three methods are described in the following sections.

4.2.1 Python Classes{label20}

Both commands and environments can be implemented in Python classes. plasTeX includes a base class for each one: Command for commands and Environment for environments. For the most part, these two classes behave in the same way. They both are responsible for parsing their arguments, organizing their child nodes, incrementing counters, etc., much like their LaTeX counterparts. The Python macro class feature set is based on common LaTeX conventions. So if the LaTeX macro you are implementing in Python uses standard LaTeX conventions, your job will be very easy. If you are doing unconventional operations, you will probably still succeed; you just might have to do a little more work.

The three most important parts of the Python macro API are: 1) the args attribute, 2) the invoke method, and 3) the digest method. When writing your own macros, these are used the most by far.

4.2.1.1 The args Attribute

The args attribute is a string attribute on the class that indicates what the arguments to the macro are. In addition to indicating the number of arguments, whether they are mandatory or optional, and what characters surround each argument as in LaTeX, the args string also gives a name to each argument and can indicate the content type of the argument (i.e. int, float, list, dictionary, string, etc.). The name given to each argument determines the key that the argument is stored under in the attributes dictionary of the class instance.
Below is a simple example of a macro class.

    from plasTeX import Command, Environment

    class framebox(Command):
        # Raw docstring so that "\f" is not read as a form feed
        r""" \framebox[width][pos]{text} """
        args = '[ width ] [ pos ] text'

In the args string of the \framebox macro, three arguments are defined. The first two are optional and the third one is mandatory. Once each argument is parsed, it is put into the attributes dictionary under the name given in the args string. For example, the attributes dictionary of an instance of \framebox will have the keys “width”, “pos”, and “text” once it is parsed, and they can be accessed in the usual Python way.

    self.attributes['width']
    self.attributes['pos']
    self.attributes['text']

In plasTeX, any argument that isn’t mandatory (i.e. no grouping characters in the args string) is optional[fn3]. This includes arguments surrounded by parentheses (()), square brackets ([]), and angle brackets (<>). This also lets you combine multiple versions of a command into one macro. For example, the \framebox command also has a form that looks like: \framebox(x_dimen,y_dimen)[pos]{text}. This leads to the Python macro class in the following code sample that encompasses both forms.

[fn3] While this isn’t always true when LaTeX expands the macros, it will not cause any problems when plasTeX compiles the document because plasTeX is less stringent.

    from plasTeX import Command, Environment

    class framebox(Command):
        r"""
        \framebox[width][pos]{text} or
        \framebox(x_dimen,y_dimen)[pos]{text}
        """
        args = '( dimens ) [ width ] [ pos ] text'

The only thing to keep in mind is that in the second form, the pos argument ends up under the width key in the attributes dictionary, since it is the first argument in square brackets. This can be fixed up in the invoke method if needed. Also, if an optional argument is not present on the macro, the value of that argument in the attributes dictionary is set to None.
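The fix-up mentioned above can be sketched on a plain dictionary, without any plasTeX machinery. The helper name fix_framebox_attributes is hypothetical; the keys and the use of None for absent optional arguments follow the description above, and the sample values are made up for illustration.

```python
def fix_framebox_attributes(attributes):
    """Return a copy with 'pos' moved out of the 'width' slot.

    Hypothetical helper: when the parenthesized form was used, the single
    square-bracket argument was parsed as 'width' but really means 'pos'.
    """
    fixed = dict(attributes)
    if fixed.get('dimens') is not None and fixed.get('pos') is None:
        fixed['pos'] = fixed.get('width')
        fixed['width'] = None
    return fixed


# \framebox(10pt,20pt)[l]{text}: 'l' was stored under 'width'
second_form = {'dimens': '10pt,20pt', 'width': 'l', 'pos': None, 'text': 'text'}
# \framebox[5in][c]{text}: nothing to repair
first_form = {'dimens': None, 'width': '5in', 'pos': 'c', 'text': 'text'}

fixed_second = fix_framebox_attributes(second_form)
fixed_first = fix_framebox_attributes(first_form)
```

In a real macro class the same logic would presumably run inside invoke, after the arguments have been parsed into self.attributes.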
As mentioned earlier, it is also possible to convert arguments to data types other than the default (a document fragment). A list of the available types is shown in the table below.

Name | Purpose
---- | -------
str | expands all macros then sets the value of the argument in the attributes dictionary to the string content of the argument
chr | same as ‘str’
char | same as ‘str’
cs | sets the attribute to an unexpanded control sequence
label | expands all macros, converts the result to a string, then sets the current label to the object that is in the currentlabel attribute of the document context. Generally, an object is put into the currentlabel attribute if it incremented a counter when it was invoked. The value stored in the attributes dictionary is the string value of the argument.
id | same as ‘label’
idref | expands all macros, converts the result to a string, retrieves the object that was labeled by that value, then adds the labeled object to the idref dictionary under the name of the argument. This type of argument is used in commands like \ref that must reference other objects. The nice thing about ‘idref’ is that it gives you a reference to the object itself, which you can then use to retrieve any type of information from it such as the reference value, title, etc. The value stored in the attributes dictionary is the string value of the argument.
ref | same as ‘idref’
nox | just parses the argument, but doesn’t expand the macros
list | converts the argument to a Python list. By default, the list item separator is a comma (,). You can change the item separator in the args string by appending a set of parentheses surrounding the separator character immediately after ‘list’. For example, to specify a semi-colon separated list for an argument called “foo” you would use the args string: “foo:list(;)”. It is also possible to cast the type of each item by appending another colon and the data type from this table that you want each item to be. However, you are limited to one data type for every item in the list.
dict | converts the argument to a Python dictionary. This is commonly used by arguments set up using LaTeX’s ‘keyval’ package. By default, key/value pairs are separated by commas, although this character can be changed in the same way as the delimiter in the ‘list’ type. You can also cast each value of the dictionary using the same method as the ‘list’ type. In all cases, keys are converted to strings.
dimen | reads a dimension and returns an instance of dimen
dimension | same as ‘dimen’
length | same as ‘dimen’
number | reads an integer and returns a Python integer
count | same as ‘number’
int | same as ‘number’
float | reads a decimal value and returns a Python float
double | same as ‘float’

There are also several argument types used for more low-level routines. These don’t parse the typical LaTeX arguments; they are used for the somewhat more free-form TeX arguments.
Name | Purpose
---- | -------
Dimen | reads a TeX dimension and returns an instance of dimen
Length | same as ‘Dimen’
Dimension | same as ‘Dimen’
MuDimen | reads a TeX mu-dimension and returns an instance of mudimen
MuLength | same as ‘MuDimen’
Glue | reads a TeX glue parameter and returns an instance of glue
Skip | same as ‘Glue’
Number | reads a TeX integer parameter and returns a Python integer
Int | same as ‘Number’
Integer | same as ‘Number’
Token | reads an unexpanded token
Tok | same as ‘Token’
XToken | reads an expanded token
XTok | same as ‘XToken’
Args | reads tokens up to the first begin group (i.e. {)

To use one of the data types, simply append a colon (:) and the data type name to the attribute name in the args string. Going back to the \framebox example, the argument in parentheses would be better represented as a list of dimensions. The width parameter is also a dimension, and the pos parameter is a string.

    from plasTeX import Command, Environment

    class framebox(Command):
        r"""
        \framebox[width][pos]{text} or
        \framebox(x_dimen,y_dimen)[pos]{text}
        """
        args = '( dimens:list:dimen ) [ width:dimen ] [ pos:chr ] text'

4.2.1.2 The invoke Method

The invoke method is responsible for creating a new document context, parsing the macro arguments, and incrementing counters. In most cases, the default implementation will work just fine, but you may want to do some extra processing of the macro arguments or counters before letting the parsing of the document proceed. Several methods in the API are called within the scope of the invoke method: preParse, preArgument, postArgument, and postParse. The order of execution is simple. Before any arguments have been parsed, the preParse method is called. The preArgument and postArgument methods are called before and after each argument, respectively. Then, after all arguments have been parsed, the postParse method is called.
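The call order just described can be traced with a toy class. This is only an illustration of the sequence: HookOrderDemo is hypothetical, its hook signatures are simplified and are not claimed to match the plasTeX ones, and no real argument parsing takes place.

```python
class HookOrderDemo:
    """Toy model of the hook sequence around argument parsing."""

    def __init__(self, argument_names):
        self.argument_names = argument_names
        self.calls = []

    def preParse(self):
        self.calls.append('preParse')

    def preArgument(self, name):
        self.calls.append('preArgument:' + name)

    def postArgument(self, name):
        self.calls.append('postArgument:' + name)

    def postParse(self):
        self.calls.append('postParse')

    def invoke(self):
        self.preParse()
        for name in self.argument_names:
            self.preArgument(name)
            # A real parser would read the argument at this point
            self.postArgument(name)
        self.postParse()
        return self.calls


calls = HookOrderDemo(['width', 'pos']).invoke()
```

Overriding any one hook in a subclass leaves the rest of the sequence untouched, which is the point of splitting invoke into these steps.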
The default implementations of these methods handle the stepping of counters and setting the current labeled item in the document. By default, macros that have been “starred” (i.e. have a ‘*’ before the arguments) do not increment the counter. You can override this behavior in one of these methods if you prefer.

The most common reason for overriding the invoke method is to post-process the arguments in the attributes dictionary, or to add information to the instance. For example, the \color command in LaTeX’s color package could convert the LaTeX color to the correct CSS format and add it to the CSS style object.

    from plasTeX import Command, Environment

    def latex2htmlcolor(arg):
        if ',' in arg:
            # Three comma-separated components between 0 and 1
            red, green, blue = [float(x) for x in arg.split(',')]
            red = min(int(red * 255), 255)
            green = min(int(green * 255), 255)
            blue = min(int(blue * 255), 255)
        else:
            try:
                # A single number is a gray level between 0 and 1
                red = green = blue = min(int(float(arg) * 255), 255)
            except ValueError:
                # Anything else is passed through as a color name
                return arg.strip()
        return '#%.2X%.2X%.2X' % (red, green, blue)

    class color(Environment):
        args = 'color:str'

        def invoke(self, tex):
            # The default invoke parses the arguments into self.attributes
            Environment.invoke(self, tex)
            self.style['color'] = latex2htmlcolor(self.attributes['color'])

While simple attribute post-processing is the most common use of the invoke method, you can also do very advanced things such as changing category codes and iterating over the tokens in the TeX processor directly, as the verbatim environment does.

One other feature of the invoke method that may be of interest is the return value. Most invoke method implementations do not return anything (or return None). In this case, the macro instance itself is sent to the output stream. However, you can also return a list of tokens. If a list of tokens is returned, those tokens are inserted into the output stream instead of the macro instance. This is useful if you don’t want the macro instance to be part of the output stream or document. In this case, you can simply return an empty list.
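The return-value rule can be modeled in a few lines. The sketch below is not the plasTeX processor: build_output and the three classes are hypothetical, and strings stand in for tokens.

```python
class Plain:
    """invoke returns None, so the instance itself is emitted."""
    def invoke(self):
        return None


class Hidden:
    """invoke returns an empty list, so nothing is emitted."""
    def invoke(self):
        return []


class Expanding:
    """invoke returns replacement tokens, emitted instead of the instance."""
    def invoke(self):
        return ['tok1', 'tok2']


def build_output(macros):
    """Toy output stream that applies the return-value rule to each macro."""
    output = []
    for macro in macros:
        result = macro.invoke()
        if result is None:
            output.append(macro)
        else:
            output.extend(result)
    return output


plain = Plain()
stream = build_output([plain, Hidden(), Expanding()])
```

Note that the check is against None specifically: an empty list is a valid return value and means "emit nothing", which is different from returning nothing at all.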
4.2.1.3 The digest Method

The digest method is responsible for converting the output stream into the final document structure. For commands, this generally doesn’t mean anything, since they just consist of arguments which have already been parsed. Environments, on the other hand, have a beginning and an ending which surround tokens that belong to that environment. In most cases, the tokens between the \begin and \end need to be absorbed into the childNodes list.

The default implementation of the digest method should work for most macros, but there are instances where you may want to do some extra processing on the document structure. For example, the \caption command within figures and tables uses the digest method to populate the enclosing figure/table’s caption attribute.

    from plasTeX import Command, Environment

    class Caption(Command):
        args = '[ toc ] self'

        def digest(self, tokens):
            res = Command.digest(self, tokens)

            # Look for the figure environment that we belong to
            node = self.parentNode
            while node is not None and not isinstance(node, figure):
                node = node.parentNode

            # If the figure was found, populate the caption attribute
            if isinstance(node, figure):
                node.caption = self

            return res

    class figure(Environment):
        args = '[ loc:str ]'
        caption = None

        class caption_(Caption):
            macroName = 'caption'
            counter = 'figure'

More advanced uses of the digest method might be to construct more complex document structures. For example, tabular and array structures in a document get converted from a simple list of tokens to complex structures with lots of style information added (see section 3.3.3). One simple example of a digest that does something extra is shown below. It looks for the first node with the name “item” then bails out.

    from plasTeX import Command, Environment

    class toitem(Command):
        def digest(self, tokens):
            """ Throw away everything up to the first 'item' token """
            for tok in tokens:
                if tok.nodeName == 'item':
                    # Put the item back into the stream
                    tokens.push(tok)
                    break

One of the more advanced uses of the digest method is on the sectioning commands: \section, \subsection, etc. The digest method on sections absorbs tokens based on the level attribute, which indicates the hierarchical level of the node. When digested, each section absorbs all tokens until it reaches a section whose level is the same as or higher in the hierarchy than its own (that is, numerically less than or equal to its own level). This creates the overall document structure as discussed in section {ref4}.

4.2.1.4 Other Nifty Methods and Attributes

There are many other attributes and methods on macros that can be used to affect their behavior. For a full listing, see the API documentation in section {ref21}. Below are descriptions of some of the more commonly used attributes and methods.

4.2.1.4.1 The level attribute

The level attribute is an integer that indicates the hierarchical level of the node in the output document structure. The values of this attribute are taken from LaTeX: \part is -1, \chapter is 0, \section is 1, \subsection is 2, etc. To create your own sectioning commands, you can either subclass one of the existing sectioning macros, or simply set the level attribute of your macro class to the appropriate number.

4.2.1.4.2 The macroName attribute

The macroName attribute is used when you are creating a LaTeX macro whose name is not a legal Python class name. For example, the macro \@ifundefined has a ‘@’ in the name, which isn’t legal in a Python class name. In this case, you could define the macro as shown below.

    class ifundefined_(Command):
        macroName = '@ifundefined'

4.2.1.4.3 The counter attribute

The counter attribute associates a counter with the macro class. It is simply a string that contains the name of the counter.
Each time that an instance of the macro class is invoked, the counter is incremented (unless the macro has a ‘*’ argument).

4.2.1.4.4 The ref attribute

The ref attribute contains the value normally returned by the \ref command.

4.2.1.4.5 The title attribute

The title attribute retrieves the “title” attribute from the attributes dictionary. This attribute is also overridable.

4.2.1.4.6 The fullTitle attribute

The same as the title attribute, but also includes the counter value at the beginning.

4.2.1.4.7 The tocEntry attribute

The tocEntry attribute retrieves the “toc” attribute from the attributes dictionary. This attribute is also overridable.

4.2.1.4.8 The fullTocEntry attribute

The same as the tocEntry attribute, but also includes the counter value at the beginning.

4.2.1.4.9 The style attribute

The style attribute is a CSS style object. Essentially, this is just a dictionary where the key is the CSS property name and the value is the CSS property value. It has an attribute called inline which contains an inline version of the CSS properties for use in the style= attribute of HTML elements.

4.2.1.4.10 The id attribute

This attribute contains a unique ID for the object. If the object was labeled by a \label command, the ID for the object will be that label; otherwise, an ID is generated.

4.2.1.4.11 The source attribute

The source attribute contains the LaTeX source representation of the node and all of its contents.

4.2.1.4.12 The currentSection attribute

The currentSection attribute contains the section that the node belongs to.

4.2.1.4.13 The expand method

The expand method is a thin wrapper around the invoke method. It simply invokes the macro and returns the result of expanding all of the tokens. Unlike invoke, you will always get the expanded node (or nodes); you will not get a None return value.

4.2.1.4.14 The paragraphs method

The paragraphs method does the final processing of paragraphs in a node’s child nodes.
It makes sure that all content is wrapped within paragraph nodes. This method is generally called from the digest method.

4.2.2 INI Files

Using INI files is the simplest way of creating customized Python macro classes. It does require a little bit of knowledge of writing macros in Python classes (section {ref20}), but not much. The only two pieces of information about Python macro classes you need to know are 1) the args string format, and 2) the superclass name (in most cases, you can simply use Command or Environment). The INI file features correspond to Python macros in the following way.

---------------------------------------------------
| INI File     | Python Macro Use                 |
---------------------------------------------------
| section name | the Python class to inherit from |
| option name  | the name of the macro to create  |
| option value | the args string for the macro    |

Here is an example of an INI file that defines several macros.

    [Command]
    ; \program{ self }
    program=self
    ; \programopt{ self }
    programopt=self

    [Environment]
    ; \begin{methoddesc}[ classname ]{ name }{ args } ... \end{methoddesc}
    methoddesc=[ classname ] name args
    ; \begin{memberdesc}[ classname ]{ name }{ args } ... \end{memberdesc}
    memberdesc=[ classname ] name args

    [section]
    ; \headi( options:dict )[ toc ]{ title }
    headi=( options:dict ) [ toc ] title

    [subsection]
    ; \headii( options:dict )[ toc ]{ title }
    headii=( options:dict ) [ toc ] title

In the INI file above, six macros are being defined. \program and \programopt both inherit from Command, the generic macro superclass. They also both take a single mandatory argument called “self.” There are also two environments defined: methoddesc and memberdesc. Each of these has three arguments, of which the first is optional. The last two macros inherit from the standard sectioning commands. They add an option, surrounded by parentheses, to the arguments that \section and \subsection already define.

INI versions of plasTeX packages are loaded in much the same way as Python plasTeX packages.
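As a rough illustration of the mapping in the table above, the sketch below reads an INI fragment with Python's standard configparser module and builds one class per option. It is written for Python 3 and is not plasTeX's actual loader: the stand-in Command and Environment classes, the BASES registry, and the classes_from_ini helper are hypothetical names introduced only for this example.

```python
import configparser

# Hypothetical stand-ins for the real plasTeX base classes.
class Command:
    args = ''

class Environment(Command):
    pass

# Hypothetical registry: INI section name -> superclass.
BASES = {'Command': Command, 'Environment': Environment}

INI_TEXT = r"""
[Command]
; \program{ self }
program=self

[Environment]
; \begin{methoddesc}[ classname ]{ name }{ args } ... \end{methoddesc}
methoddesc=[ classname ] name args
"""

def classes_from_ini(text):
    """Return {macro name: class} built from an INI description."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    macros = {}
    for section in parser.sections():
        base = BASES[section]              # section name -> superclass
        for name, args in parser[section].items():
            # option name -> macro name, option value -> args string
            macros[name] = type(name, (base,), {'args': args})
    return macros

macros = classes_from_ini(INI_TEXT)
print(macros['methoddesc'].args)
print(issubclass(macros['methoddesc'], Environment))
```

Each generated class carries nothing but a superclass and an args string, which is why the INI format is enough for macros that need no custom invoke or digest behavior.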
For details on how packages are loaded, see section 4.3.

4.2.3 The Document Context

It is possible to define commands using the same interface that is used by the plasTeX engine itself. This interface belongs to the Context object (usually accessed through the document object’s context attribute). Defining commands using the context object is generally done in the ProcessOptions function of a package. The following methods of the context object create new commands.

| Method         | Purpose |
|----------------|---------|
| newcounter     | creates a new counter, and also creates a command called \thecounter which generates the formatted version of the counter. This macro corresponds to the \newcounter macro in LaTeX. |
| newcount       | corresponds to TeX’s \newcount command. |
| newdimen       | corresponds to TeX’s \newdimen command. |
| newskip        | corresponds to TeX’s \newskip command. |
| newmuskip      | corresponds to TeX’s \newmuskip command. |
| newif          | corresponds to TeX’s \newif command. This command also generates macros for \ifcommandtrue and \ifcommandfalse. |
| newcommand     | corresponds to LaTeX’s \newcommand macro. |
| newenvironment | corresponds to LaTeX’s \newenvironment macro. |
| newdef         | corresponds to TeX’s \def command. |
| chardef        | corresponds to TeX’s \chardef command. |

Note: Since many of these methods accept strings containing LaTeX markup, you need to remember that the category codes of some characters can be changed during processing. If you are defining macros using these methods in the ProcessOptions function of a package, you should be safe, since this function is executed in the preamble of the document where category codes are not changed frequently. However, if you define a macro with this interface in a context where the category codes are not set to the default values, you will have to adjust the markup in your macros accordingly.
Below is an example of using this interface within the context of a package to define some commands. For the full usage of these methods see the API documentation of the Context object in section {ref25}.

    def ProcessOptions(options, document):
        context = document.context

        # Create some counters
        context.newcounter('secnumdepth', initial=3)
        context.newcounter('tocdepth', initial=2)

        # \newcommand{\config}[2][general]{\textbf{#2:#1}}
        context.newcommand('config', 2, r'\textbf{#2:#1}', opt='general')

        # \newenvironment{note}{\textbf{Note:}}{}
        context.newenvironment('note', 0, (r'\textbf{Note:}', r''))

4.3 Packages

Packages in plasTeX are loaded in one of three ways: as a standard LaTeX package, as a Python package, or as an INI file.

LaTeX packages are loaded in much the same way that LaTeX itself loads packages. The kpsewhich program is used to locate the requested file, which can be either in the search path of your LaTeX distribution or in one of the directories specified in the TEXINPUTS environment variable. plasTeX reads the file and expands the macros therein just as LaTeX would do.

Python packages are located using Python’s search path. This includes all directories listed in sys.path as well as those listed in the PYTHONPATH environment variable. After a package is loaded, it is checked to see if there is a function called ProcessOptions in its namespace. If there is, that function is called with two arguments: 1) the dictionary of options that were specified when loading the package, and 2) the document object that is currently being processed. This function allows you to make adjustments to the loaded macros based on the options specified, and to define new commands in the document’s context (see section 4.2.3 for more information). Of course, you can also define Python based macros (section {ref20}) in the Python package.

The last type of package is based on the INI file format. This format is discussed in more detail in section 4.2.2.
INI formatted packages are loaded in conjunction with a LaTeX or Python package. When a package is loaded, an INI file with the same basename is searched for in the same directory as the package. If it exists, it is loaded as well. For example, if you had a package called ‘python.sty’ and a file called ‘python.ini’ in the same package directory, ‘python.sty’ would be loaded first, then ‘python.ini’ would be loaded. The same applies to Python based packages.

5 Renderers

Renderers allow you to convert a plasTeX document object into viewable output such as HTML, RTF, or PDF, or simply a data structure format such as DocBook or tBook. Since the plasTeX document object gives you everything that you could possibly want to know about the LaTeX document, it should, in theory, be possible to generate any type of output from the plasTeX document object while preserving as much information as the output format is capable of. In addition, since the document object is not affected by the rendering process, you can apply multiple renderers in sequence so that the LaTeX document only needs to be parsed one time for all output types.

While it is possible to write a completely custom renderer, one renderer implementation is included with the plasTeX framework. While the rendering process in this implementation is fairly simple, it is also very powerful. Some of the main features are listed below.

- ability to generate multiple output files
- automatic splitting of files is configurable by section level, or can be invoked using ad-hoc methods in the filenameoverride property
- powerful output filename generation utility
- image generation for portions of the document that cannot be easily rendered in a particular output format (e.g. equations in HTML)
- theming support
- hooks for post-processing of output files
- configurable output encodings

The API of the renderer itself is very small. In fact, there are only a couple of methods that are of real interest to an end user: render and cleanup.
The render method starts the rendering process. Its only argument is a plasTeX document object. The cleanup method is called at the end of the rendering process. It is passed the document object and a list of all of the files that were generated. This method allows you to do post-processing on the output files. In general, this method will probably only be of interest to someone writing a subclass of the Renderer class, so most users of plasTeX will only use the render method. The real work of the rendering process is handled in the Renderable class, which is discussed later in this chapter.

The Renderer class is a subclass of the Python dictionary. Each key in the renderer corresponds to the name of a node in the document object. The value stored under each key is a function. As each node in the document object is traversed, the renderer is queried to see if there is a key that matches the name of the node. If a key is found, the value at that key (which must be a function) is called with the node as its only argument. The return value from this call must be a unicode object that contains the rendered output. Based on the configuration, the renderer will handle all of the file generation and encoding issues.

If a node is traversed that doesn’t correspond to a key in the renderer dictionary, the default rendering method is called. The default rendering method is stored in the default attribute. One exception to this rule is for text nodes. The default rendering method for text nodes is actually stored in textDefault. Again, these attributes simply need to reference any Python function that returns a unicode object containing the rendered output. The default method in both of these attributes is the unicode built-in function.

As mentioned previously, most of the work of the renderer is actually done by the Renderable class. This is a mixin class[fn4] that is mixed into the Node class in the render method. It is unmixed at the end of the render method.
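The lookup rules above can be modelled in a few lines. The sketch below is a simplified, self-contained approximation written for Python 3 (where str plays the role of unicode). The Node, Text, and MiniRenderer classes are hypothetical and only mimic the dictionary-dispatch idea; they are not the real Renderer or Renderable classes.

```python
class Text(str):
    """Hypothetical text node: a string that knows its node name."""
    nodeName = '#text'

class Node:
    """Hypothetical element node with a name and child nodes."""
    def __init__(self, nodeName, *children):
        self.nodeName = nodeName
        self.childNodes = list(children)

class MiniRenderer(dict):
    """Maps node names to rendering functions, as described above."""

    def default(self, node):
        # Fallback for non-text nodes: just render the children.
        return self.render_children(node)

    def textDefault(self, node):
        # Fallback for text nodes: the text itself.
        return str(node)

    def render_children(self, node):
        return ''.join(self.render(child) for child in node.childNodes)

    def render(self, node):
        if isinstance(node, Text):
            return self.textDefault(node)
        # Use the function stored under the node's name, else the default.
        func = self.get(node.nodeName, self.default)
        return func(node)

renderer = MiniRenderer()
renderer['textbf'] = lambda node: '<b>%s</b>' % renderer.render_children(node)

doc = Node('document', Text('plain '), Node('textbf', Text('bold')))
print(renderer.render(doc))
```

Because the renderer is just a dictionary, adding support for a new node type is a single key assignment, and any node without a key falls through to default.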
The details of the Renderable class are discussed in section 5.2.

[fn4] A mixin class is simply a collection of methods that are intended to be included in the namespace of another class.

5.1 Simple Renderer Example

It is possible to write a renderer with just a couple of methods: default and textDefault. The code below demonstrates how one might create a generic XML renderer that simply uses the node names as XML tag names. The text node renderer escapes the <, >, and & characters.

    import string
    from plasTeX.Renderers import Renderer

    class Renderer(Renderer):

        def default(self, node):
            """ Rendering method for all non-text nodes """
            s = []

            # Handle characters like \&, \$, \%, etc.
            if len(node.nodeName) == 1 and node.nodeName not in string.letters:
                return self.textDefault(node.nodeName)

            # Start tag
            s.append('<%s>' % node.nodeName)

            # See if we have any attributes to render
            if node.hasAttributes():
                s.append('<attributes>')
                for key, value in node.attributes.items():
                    # If the key is 'self', don't render it
                    # these nodes are the same as the child nodes
                    if key == 'self':
                        continue
                    s.append('<%s>%s</%s>' % (key, unicode(value), key))
                s.append('</attributes>')

            # Invoke rendering on child nodes
            s.append(unicode(node))

            # End tag
            s.append('</%s>' % node.nodeName)

            return u'\n'.join(s)

        def textDefault(self, node):
            """ Rendering method for all text nodes """
            return node.replace('&','&amp;').replace('<','&lt;').replace('>','&gt;')

To use the renderer, simply parse a LaTeX document and apply the renderer using the render method.

    # Import renderer from previous code sample
    from MyRenderer import Renderer
    from plasTeX.TeX import TeX

    # Instantiate a TeX processor and parse the input text
    tex = TeX()
    tex.ownerDocument.config['files']['split-level'] = -100
    tex.ownerDocument.config['files']['filename'] = 'test.xml'
    tex.input(r'''
    \documentclass{book}
    \begin{document}

    Previous paragraph.

    \section{My Section}

    \begin{center}
    Centered text with <, >, and \& charaters.
    \end{center}

    Next paragraph.

    \end{document}
    ''')
    document = tex.parse()

    # Render the document
    renderer = Renderer()
    renderer.render(document)

The output from the renderer, located in ‘test.xml’, looks like the following.

    <document>
    <par>
    Previous paragraph.
    </par><section>
    <attributes>
    <toc>None</toc>
    <*modifier*>None</*modifier*>
    <title>My Section</title>
    </attributes>
    <par>
    <center>
     Centered text with &lt;, &gt;, and &amp; charaters.
    </center>
    </par><par>
    Next paragraph.
    </par>
    </section>
    </document>
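One detail of the textDefault method is worth spelling out: the ampersand has to be replaced first, otherwise the & characters introduced by the other entity replacements would be escaped a second time. The sketch below shows this in Python 3 and compares the result with xml.sax.saxutils.escape from the standard library. The function names escape_text and escape_wrong_order are made up for this illustration.

```python
from xml.sax.saxutils import escape

def escape_text(text):
    """Escape the three XML special characters, ampersand first."""
    return text.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')

def escape_wrong_order(text):
    """Same replacements, but '&' last: earlier entities get re-escaped."""
    return text.replace('<', '&lt;').replace('>', '&gt;').replace('&', '&amp;')

sample = 'Centered text with <, >, and & characters.'
print(escape_text(sample))
print(escape_wrong_order(sample))
print(escape(sample) == escape_text(sample))
```

In new code it is usually simpler to call the standard library function than to chain replace calls by hand.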
5.1.1 Extending the Simple Renderer

Now that we have a simple renderer working, it is very simple to extend it to do more specific operations. Let’s say that the default renderer is fine for most nodes, but for the \section node we want to do something special. For the section node, we want the LaTeX title argument to correspond to the title attribute in the output XML[fn5]. To do this we need a method like the following.

[fn5] This will only work properly in XML if the content of the title is plain text, since other nodes will generate markup.

    def handle_section(node):
        return u'\n\n<%s title="%s">\n%s\n</%s>\n\n' % \
                (node.nodeName, unicode(node.attributes['title']),
                 unicode(node), node.nodeName)

Now we simply insert the rendering method into the renderer under the appropriate key. Remember that the key in the renderer should match the name of the node you want to render. Since the above rendering method will work for all section types, we’ll insert it into the renderer for each LaTeX sectioning command.

    renderer = Renderer()
    renderer['section'] = handle_section
    renderer['subsection'] = handle_section
    renderer['subsubsection'] = handle_section
    renderer['paragraph'] = handle_section
    renderer['subparagraph'] = handle_section
    renderer.render(document)

Running the same LaTeX document as in the previous example, we now get this output.

    <document>
    <par>
    Previous paragraph.
    </par>

    <section title="My Section">
    <par>
    <center>
     Centered text with &lt;, &gt;, and &amp; charaters.
    </center>
    </par><par>
    Next paragraph.
    </par>
    </section>

    </document>
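The footnote's caveat about attribute values can be softened by quoting the title before it is placed in the attribute. The Python 3 sketch below uses xml.sax.saxutils.quoteattr from the standard library, which returns the value already wrapped in quotes with the XML special characters escaped. The section_open_tag helper is a hypothetical name, and the sketch assumes the title has already been reduced to plain text.

```python
from xml.sax.saxutils import quoteattr

def section_open_tag(node_name, title_text):
    """Build an opening tag with a safely quoted title attribute."""
    # quoteattr adds the surrounding quotes itself.
    return '<%s title=%s>' % (node_name, quoteattr(title_text))

plain = section_open_tag('section', 'My Section')
tricky = section_open_tag('section', 'Cats & Dogs <draft>')
print(plain)
print(tricky)
```

This does not remove the limitation noted in the footnote (markup inside a title still cannot live in an attribute), but it keeps the output well-formed when a plain-text title contains special characters.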
Of course, you aren’t limited to using just Python methods. Any function that accepts a node as an argument can be used. The Zope Page Template (ZPT) renderer included with plasTeX is an example of how to write a renderer that uses a templating language to render the nodes (see section 5.3).

5.2 Renderable Objects

The Renderable class is the real workhorse of the rendering process. It traverses the document object, looks up the appropriate rendering methods in the renderer, and generates the output files. It also invokes the image generating process when needed for parts of a document that cannot be rendered in the given output format.

Most of the work of the Renderable class is done in the __unicode__ method. This is rather convenient since each of the rendering methods in the renderer is required to return a unicode object. When the unicode function is called with a renderable object as its argument, the document traversal begins for that node. This traversal includes iterating through each of the node’s child nodes, and looking up and calling the appropriate rendering method in the renderer. If the child node is configured to generate a new output file, the file is created and the rendered output is written to it; otherwise, the rendered output is appended to the rendered output of previous nodes. Once all of the child nodes have been rendered, the unicode object containing that output is returned. This recursive process continues until the entire document has been rendered.

There are a few useful things to know about renderable objects, such as how they determine which rendering method to use, when to generate new files, what the filenames will be, and how to generate images. These things are discussed below.

5.2.1 Determining the Correct Rendering Method

Looking up the correct rendering method is quite straightforward. If the node is a text node, the textDefault attribute on the renderer is used.
If it is not a text node, then the node’s name determines the key name in the renderer. In most cases, the node’s name is the same as the name of the LaTeX macro that created it. If the macro used some type of modifier argument (i.e. *, +, -), a name with that modifier applied to it is searched for first. For example, if you used the tabular* environment in your LaTeX document, the renderer will look for “tabular*” first, then “tabular”. This allows you to use different rendering methods for modified and unmodified macros. If no rendering method is found, the method in the renderer’s default attribute is used.

5.2.2 Generating Files

Any node in a document has the ability to generate a new file. During document traversal, each node is queried for a filename. If a value other than None is returned, a new file is created for the content of that node using the given filename. The querying for the filename is simply done by accessing the filename property of the node. This property is added to the node’s namespace during the mixin process. The default behavior for this property is to only return filenames for sections with a level less than the split-level given in the configuration (see section {ref2}). The filenames generated by this routine are very flexible. They can be statically given names, or names based on the ID and/or title, or simply generically numbered. For more information on configuring filenames see section {ref2}.

While the filenaming mechanism is very powerful, you may want to give your files names based on some other information. This is possible through the filenameoverride attribute. If the filenameoverride is set, the name returned by that attribute is used as the filename. The string in filenameoverride is still processed in the same way as the filename specifier in the configuration, so you can use things like the ID or title of the section in the overridden filename.

The string used to specify filenames can also contain directory paths.
This is not terribly useful at the moment since there is no way to get the relative URLs between two nodes for linking purposes.

If you want to use a filename override, but want to do it conditionally, you can use a Python property. Just calculate the filename however you wish; if you decide that you don’t want to use that filename, raise an AttributeError exception. An example of this is shown below.

    class mymacro(Command):
        args = '[ filename:str ] self'

        @property
        def filenameoverride(self):
            # See if the attributes dictionary has a filename
            if self.attributes['filename'] is not None:
                return self.attributes['filename']
            raise AttributeError('filenameoverride')

Note: The filename in the filenameoverride attribute must contain any directory paths as well as a file extension.

5.2.3 Generating Images

Not all output types that you might render are going to support everything that LaTeX is capable of. For example, HTML has no way of representing equations, and most output types won’t be capable of rendering LaTeX’s picture environment. In cases like these, you can let plasTeX generate images of the document node.

Generating images is done with a subclass of plasTeX.Imagers.Imager. The imager is responsible for creating a LaTeX document from the requested document fragments, compiling the document, and converting each page of the output document into individual images. Currently, there are two Imager subclasses included with plasTeX. Each of them uses the standard LaTeX compiler to generate a DVI file. The DVI file is then converted into images using one of the available imagers (see section {ref3} on how to select different imagers).

To generate an image of a document node, simply access the image property during the rendering process. This property will return a plasTeX.Imagers.Image instance.
In most cases, the image file will not be available until the rendering process is finished, since most renderers will need the generated LaTeX document to be complete before compiling it and generating the final images.

The example below demonstrates how to generate an image for the equation environment.

    # Import renderer from first renderer example
    from MyRenderer import Renderer
    from plasTeX.TeX import TeX

    def handle_equation(node):
        return u'<div><img src="%s"/></div>' % node.image.url

    # Instantiate a TeX processor and parse the input text
    tex = TeX()
    tex.input(r'''
    \documentclass{book}
    \begin{document}

    Previous paragraph.

    \begin{equation}
    \Sigma_{x=0}^{x+n} = \beta^2
    \end{equation}

    Next paragraph.

    \end{document}
    ''')
    document = tex.parse()

    # Instantiate the renderer
    renderer = Renderer()

    # Insert the rendering method into all of the environments that might need it
    renderer['equation'] = handle_equation
    renderer['displaymath'] = handle_equation
    renderer['eqnarray'] = handle_equation

    # Render the document
    renderer.render(document)

The rendered output looks like the following, and the generated image is located in ‘images/img-0001.png’.

    <document>
    <par>
    Previous paragraph.
    </par><par>
    <div><img src="images/img-0001.png"/></div>
    </par><par>
    Next paragraph.
    </par>
    </document>
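The reason an image URL is usable before the image file exists can be illustrated with a toy model. In the Python 3 sketch below, a hypothetical ToyImager hands out an Image record with its final filename as soon as a fragment is requested, and only "generates" the queued fragments when close() is called. The class names, the get_image and close methods, and the img-%04d.png naming pattern are assumptions made for this example; they mimic the behaviour described above and are not the plasTeX.Imagers API.

```python
class Image:
    """Record describing an image that may not have been written yet."""
    def __init__(self, url):
        self.url = url
        self.generated = False

class ToyImager:
    """Queues source fragments now, generates all images later."""
    def __init__(self, directory='images'):
        self.directory = directory
        self.queue = []      # (source, Image) pairs awaiting generation

    def get_image(self, source):
        # The filename is decided immediately, so a renderer can embed
        # the URL in its output before any image file exists.
        number = len(self.queue) + 1
        image = Image('%s/img-%04d.png' % (self.directory, number))
        self.queue.append((source, image))
        return image

    def close(self):
        # Stand-in for compiling one document and cutting it into images.
        for source, image in self.queue:
            image.generated = True
        return [image.url for _, image in self.queue]

imager = ToyImager()
img = imager.get_image(r'\begin{equation} x^2 \end{equation}')
html = '<img src="%s"/>' % img.url     # usable right away
files = imager.close()                 # images are produced at the end
print(html, files)
```

Batching the fragments this way means the expensive compile step runs once per document rather than once per image.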
The names of the image files are determined by the document’s configuration. The filename generator is very powerful and is, in fact, the same filename generator used to create the other output filenames. For more information on customizing the image filenames see section {ref3}.

In addition, the image types are customizable as well. plasTeX uses the Python Imaging Library (PIL) to do the final cropping and saving of the image files, so any image format that PIL supports can be used. The format that PIL saves the images in is determined by the file extension in the generated filenames, so you must use a file extension that PIL recognizes.

It is possible to write your own Imager subclass if necessary. See the Imager API documentation for more information (see {ref28}).

5.2.4 Generating Vector Images

If you have a vector imager configured (such as dvisvg or dvisvgm), you can generate a vector version of the requested image as well as a bitmap. The nice thing about vector versions of images is that they can scale infinitely without losing resolution. The bad thing about them is that they are not as well supported in the real world as bitmaps.

Generating a vector image is just as easy as generating a bitmap image: you simply access the vectorImage property of the node that you want an image of. This will return a plasTeX.Imagers.Image instance that corresponds to the vector image. A bitmap version of the same image can be accessed through the image property of the document node or the bitmap variable of the vector image object.

Everything that was described about generating images in the previous section is also true of vector images, with the exception of cropping. plasTeX does not attempt to crop vector images. The program that converts the LaTeX output to a vector image is expected to crop the image down to the image content. plasTeX uses the information from the bitmap version of the image to determine the proper depth of the vector image.
5.2.5 Static Images

There are some images in a document that don’t need to be generated; they simply need to be copied to the output directory and possibly converted to an appropriate format. This is accomplished with the imageoverride attribute. When the image property is accessed, the imageoverride attribute is checked to see if an image is already available for that node. If there is, the image is copied to the image output directory using a name generated by the same method as described in the previous section. The image is copied to that new filename and converted to the appropriate image format if needed. While it would be possible to simply copy the image over using the same filename, this may cause filename collisions depending on the directory structure that the original images were stored in.

Below is an example of using imageoverride for copying stock icons that are used throughout the document.

    from plasTeX import Command

    class dangericon(Command):
        imageoverride = 'danger.gif'

    class warningicon(Command):
        imageoverride = 'warning.gif'

It is also possible to make imageoverride a property so that the image override can be done conditionally. In the case where no override is desired in a property implementation, simply raise an AttributeError exception.

5.3 Page Template Renderer

The Page Template (PT) renderer is a renderer for plasTeX document objects that supports various page template engines such as Zope Page Templates (ZPT), Cheetah templates, Kid templates, Genshi templates, Python string templates, as well as plain old Python string formatting. It is also possible to add support for other template engines. Note that all template engines except ZPT, Python formats, and Python string templates must be installed in your Python installation. They are not included.

ZPT is the most supported page template language at the moment.
This is the template engine that is used for all of the templates delivered with plasTeX in the XHTML renderer; however, the other templates work in a very similar way. The actual ZPT implementation used is SimpleTAL. This implementation implements almost all of the ZPT API and is very stable. However, some changes were made to this package to make it more convenient to use within plasTeX. These changes are discussed in detail in the ZPT Tutorial (see section 5.3.3).

Since the above template engines can be used to generate any form of XML or HTML, the PT renderer is a general solution for rendering XML or HTML from a plasTeX document object. When switching from one DTD to another, you simply need to use a different set of templates.

As in all Renderer-based renderers, each key in the PT renderer returns a function. These functions are actually generated when the template files are parsed by the PT renderer. As is the case with all rendering methods, the only argument is the node to be rendered, and the output is a unicode object containing the rendered output. In addition to the rendering methods, the textDefault method escapes all characters that are special in XML and HTML (i.e. <, >, and &).

The following sections describe how templates are loaded into the renderer, how to extend the set of templates with your own, as well as a theming mechanism that allows you to apply different looks to output types that are visual (e.g. HTML).

5.3.1 Defining and Using Templates

Note: If you are not familiar with the ZPT language, you should read the tutorial in section 5.3.3 before continuing in this section. See the links in the previous section for documentation on the other template engines.

By default, templates are loaded from the directory where the renderer module was imported from. In addition, the templates from each of the parent renderer class modules are also loaded.
This makes it very easy to extend a renderer and add just a few new templates to support the additions that were made.

The template files in the module directories can have several different forms. The first is HTML. HTML templates must have an extension of ‘.htm’ or ‘.html’. These templates are compiled using SimpleTAL’s HTML compiler. XML templates, the second form of template, use SimpleTAL’s XML compiler, so they must be well-formed XML fragments. XML templates must have the file extension ‘.xml’, ‘.xhtml’, or ‘.xhtm’. In any case, the basename of the template file is used as the key to store the template in the renderer. Keep in mind that the names of the keys in the renderer correspond to the node names in the document object.

The extensions used for all templating engines are shown in the table below.

--------------------------------------------------------------
| Engine                   | Extension           | Output Type |
--------------------------------------------------------------
| ZPT                      | .html, .htm, .zpt   | HTML        |
|                          | .xhtml, .xhtm, .xml | XML/XHTML   |
| Python string formatting | .pyt                | Any         |
| Python string templates  | .st                 | Any         |
| Kid                      | .kid                | XML/XHTML   |
| Cheetah                  | .che                | XML/XHTML   |
| Genshi                   | .gen                | HTML        |

The file listing below is an example of a directory of template files. In this case the templates correspond to nodes in the document created by the description environment, the tabular environment, \textbf, and \textit.

    description.xml
    tabular.xml
    textbf.html
    textit.html

Since there are a lot of templates that are merely one line, it would be inconvenient to have to create a new file for each template. In cases like this, you can use the ‘.zpts’ extension for collections of ZPT templates, or more generally ‘.pts’ for collections of various template types. Files with this extension have multiple templates in them. Each template is separated from the next by the template metadata, which includes things like the name of the template and the type (xml, html, or text), and can also alias template names to another template in the renderer.
The following metadata names are currently supported.

| Name   | Purpose |
|--------|---------|
| engine | the name of the templating engine to use. At the time of this writing, the value could be zpt, tal (same as zpt), html (ZPT HTML template), xml (ZPT XML template), python (Python formatted string), string (Python string template), kid, cheetah, or genshi. |
| name   | the name or names of the template that is to follow. This name is used as the key in the renderer, and also corresponds to the node name that will be rendered by the template. If more than one name is desired, they are simply separated by spaces. |
| type   | the type of the template: xml, html, or text. XML templates must contain a well-formed XML fragment. HTML templates are more forgiving, but do not support all features of ZPT (see the SimpleTAL documentation). |
| alias  | specifies the name of another template that the given names should be aliased to. This allows you to simply reference another template to use rather than redefining one. For example, you might create a new section heading called \introduction that should render the same way as \section. In this case, you would set the name to “introduction” and the alias to “section”. |

There are also some defaults that you can set at the top of the file that get applied to the entire file unless overridden by the metadata on a particular template.

| Name           | Purpose |
|----------------|---------|
| default-engine | the name of the engine to use for all templates in the file. |
| default-type   | the default template type for all templates in the file. |

The code sample below shows the basic format of a zpts file.

    name: textbf bfseries
    <b tal:content="self">bold content</b>

    name: textit
    <i tal:content="self">italic content</i>

    name: introduction introduction*
    alias: section

    name: description
    type: xml
    <dl>
    <metal:block tal:repeat="item self">
        <dt tal:content="item/attributes/term">definition term</dt>
        <dd tal:content="item">definition content</dd>
    </metal:block>
    </dl>
The code above is a zpts file that contains four templates. Each template begins when a line starts with “name:”. Other directives have the same format (i.e. the name of the directive followed by a colon) and must immediately follow the name directive. The first template definition actually applies to two types of nodes: textbf and bfseries. You can specify any number of names on the name line. The third template isn’t a template at all; it is an alias. When an alias is specified, the name (or names) given use the same template as the one specified in the alias directive. Notice also that starred versions of a macro can be specified separately. This means that they can use a different template than the un-starred versions of the command. The last template is just a simple XML formatted template. By default, templates in a zpts file use the HTML compiler in SimpleTAL. You can specify that a template is an XML template by using the type directive.

Here is an example of using various types of templates in a single file.

    name: textbf
    type: python
    <b>%(self)s</b>

    name: textit
    type: string
    <i>${self}</i>

    name: textsc
    type: cheetah
    <span class="textsc">${here}</span>

    name: textrm
    type: kid
    <span class="textrm" py:content="XML(unicode(here))">normal text</span>

    name: textup
    type: genshi
    <span class="textup" py:content="markup(here)">upcase text</span>

There are several variables inserted into the template namespace. Here is a list of the variables and the templates that support them.

-------------------------------------------------------------------------------------
| Object            | ZPT/Python Formats/String Template | Cheetah   | Kid/Genshi |
-------------------------------------------------------------------------------------
| document node     | self or here                       | here      | here       |
| parent node       | container                          | container | container  |
| document config   | config                             | config    | config     |
| template instance | template                           |           |            |
| renderer instance | templates                          | templates | templates  |

You’ll notice that Kid and Genshi templates require some extra processing of the variables in order to get the proper markup.
By default, these templates escape characters like <, >, and &. In order to get HTML/XML markup from the variables you must wrap them in the code shown in the example above. Hopefully, this limitation will be removed in the future.

5.3.1.1 Template Overrides

It is possible to override the templates located in a renderer’s directory with templates defined elsewhere. This is done using the *TEMPLATES environment variable. The “*” in the name *TEMPLATES is a wildcard and must be replaced by the name of the renderer. For example, if you are using the XHTML renderer, the environment variable would be XHTMLTEMPLATES. For the PageTemplate renderer, the environment variable would be PAGETEMPLATETEMPLATES. The format of this variable is the same as that of the PATH environment variable, which means that you can put multiple directory names in this variable. In addition, the environment variables for each of the parent renderers are also used, so that you can use multiple layers of template directories.

You can actually create an entire renderer just using overrides and the PT renderer. Since the PT renderer doesn’t actually define any templates (it is just a framework for defining other XML/HTML renderers), you can simply load the PT renderer and set the PAGETEMPLATETEMPLATES environment variable to the locations of your templates. This method of creating renderers will work for any XML/HTML format that doesn’t require any special post-processing.

5.3.2 Defining and Using Themes

In addition to the templates that define how each node should be rendered, there are also templates that define page layouts. Page layouts are used whenever a node in the document generates a new file. Page layouts generally include all of the markup required to make a complete document of the desired DTD, and may include things like navigation buttons, tables of contents, breadcrumb trails, etc. to link the current file to other files in the document.
When rendering files, the content of the node is generated first, then that content is wrapped in a page layout. The page layouts are defined the same way as regular templates; however, they all include “-layout” at the end of the template name. For example, the sectioning commands in LaTeX would use the layout templates “section-layout”, “subsection-layout”, “subsubsection-layout”, etc. Again, these templates can exist in files by themselves or multiply specified in a zpts file. If no layout template exists for a particular node, the template name “default-layout” is used.

Since there can be several themes defined within a renderer, theme files are stored in a subdirectory of a renderer directory. This directory is named ‘Themes’. The ‘Themes’ directory itself only contains directories that correspond to the themes themselves, where the name of the directory corresponds to the name of the theme. These theme directories generally only consist of the layout files described above, but can override other templates as well. Below is a file listing demonstrating the structure of a renderer with multiple themes.

    # Renderer directory: contains template files
    XHTML/

    # Theme directory: contains theme directories
    XHTML/Themes/

    # Theme directories: contain page layout templates
    XHTML/Themes/default/
    XHTML/Themes/fancy/
    XHTML/Themes/plain/

Note: If no theme is specified in the document configuration, a theme with the name “default” is used.

Since all template directories are created equally, you can also define themes in template directories specified by environment variables as described in section 5.3.1.1. Also, theme files are searched in the same way as regular templates, so any theme defined in a renderer superclass’ directory is valid as well.

5.3.3 Zope Page Template Tutorial

The Zope Page Template (ZPT) language is actually just a set of XML attributes that can be applied to markup of any DTD. These attributes tell the ZPT interpreter how to process the element.
There are seven different attributes that you can use to direct the processing of an XML or HTML file (in order of evaluation): define, condition, repeat, content, replace, attributes, and omit-tag. These attributes are described in section 5.3.3.2. For a more complete description, see the official ZPT documentation.

5.3.3.1 Template Attribute Language Expression Syntax (TALES)

The Template Attribute Language Expression Syntax (TALES) is used by the attribute language described in the next section. The TALES syntax is used to evaluate expressions based on objects in the template namespace. The results of these expressions can be used to define variables, produce output, or be used as booleans. There are also several operators used to modify the behavior or interpretation of an expression. The expressions and their modifiers are described below.

5.3.3.1.1 path: operator

A “path” is the most basic form of an expression in ZPT. The basic form is shown below.

    [path:]string [ | TALES expression ]

The path: operator is actually optional on all paths. Leaving it off makes no difference. The “string” in the above syntax is a ’/’ delimited string of names. Each name refers to a property of the previous name in the string. Properties can include attributes, methods, or keys in a dictionary. These properties can in turn have properties of their own. Some examples of paths are shown below.

    # Access the parentNode attribute of chapter, then get its title
    chapter/parentNode/title

    # Get the key named 'foo' from the dictionary bar
    bar/foo

    # Call the title method on the string in the variable booktitle
    booktitle/title

It is possible to specify multiple paths separated by a pipe (|). These paths are evaluated from left to right. The first one to return a non-None value is used.
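The lookup rules described so far can be sketched in a few lines of Python. This is only an illustration of the idea, not the SimpleTAL implementation: the helper names `resolve_path` and `evaluate`, the `Node` class, and the choice to try dictionary keys before attributes are all assumptions made for the example.

```python
def resolve_path(path, namespace):
    """Resolve one '/'-delimited path against a namespace dictionary."""
    names = path.strip().split('/')
    value = namespace[names[0]]            # KeyError if the first name is unknown
    for name in names[1:]:
        if isinstance(value, dict) and name in value:
            value = value[name]            # key in a dictionary
        else:
            value = getattr(value, name)   # attribute or method
        if callable(value):
            value = value()                # callables are called automatically
    return value


def evaluate(expression, namespace):
    """Evaluate 'path | path | ...' and return the first non-None result."""
    if expression.startswith('path:'):
        expression = expression[len('path:'):]   # the prefix is optional
    for path in expression.split('|'):
        try:
            value = resolve_path(path, namespace)
        except (KeyError, AttributeError):
            continue                       # assumption: a missing name means "try the next path"
        if value is not None:
            return value
    return None


class Node:
    """Hypothetical stand-in for a document node."""
    def __init__(self, title=None, parentNode=None):
        self.title = title
        self.parentNode = parentNode


book = Node(title='My Book')
chapter = Node(parentNode=book)            # this chapter has no title of its own
namespace = {'chapter': chapter, 'bar': {'foo': 42}, 'booktitle': 'war and peace'}

print(evaluate('chapter/title | chapter/parentNode/title', namespace))
print(evaluate('bar/foo', namespace))
print(evaluate('booktitle/title', namespace))
```

In this sketch the first expression falls through to the parent node because the chapter's own title is None, which is the left-to-right behavior the pipe syntax describes.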
    # Look for the title on the current chapter node as well as its parents
    chapter/title | chapter/parentNode/title | chapter/parentNode/parentNode/title

    # Look for the value of the option otherwise get its default value
    myoptions/encoding | myoptions/defaultencoding

There are a few keywords that can be used in place of a path in a TALES expression as well.

    Name     | Purpose
    ---------|------------------------------------------------------------------
    nothing  | same as None in Python
    default  | keeps whatever the existing value of the element or attribute is
    options  | dictionary of values passed in to the template when instantiated
    repeat   | the repeat variable (see section 5.3.3.2.3)
    attrs    | dictionary of the original attributes of the element
    CONTEXTS | dictionary containing all of the above

5.3.3.1.2 exists: operator

This operator returns true if the path exists. If the path does not exist, the operator returns false. The syntax is as follows.

    exists:path

The “path” in the code above is a path as described in section 5.3.3.1.1. This operator is commonly combined with the not: operator.

5.3.3.1.3 nocall: operator

By default, if a property that is retrieved is callable, it will be called automatically. Using the nocall: operator prevents this execution from happening. The syntax is shown below.

    nocall:path

5.3.3.1.4 not: operator

The not: operator simply negates the boolean result of the path. If the path is a boolean true, the not: operator will return false, and vice versa. The syntax is shown below.

    not:path

5.3.3.1.5 string: operator

The string: operator allows you to combine literal strings and paths into one string. Paths are inserted into the literal string using a syntax much like that of Python Templates: $path or ${path}. The general syntax is:

    string:text

Here are some examples of using the string: operator.
    string:Next - ${section/links/next}
    string:($pagenumber)
    string:[${figure/number}] ${figure/caption}

5.3.3.1.6 python: operator

The python: operator allows you to evaluate a Python expression. The syntax is as follows.

    python:python-code

The “python-code” in the expression above can include any of the Python built-in functions and operators as well as four new functions that correspond to the TALES operators: path, string, exists, and nocall. Each of these functions takes a string containing the path to be evaluated (e.g. path('foo/bar'), exists('chapter/title'), etc.).

When using Python operators, you must escape any characters that would not be legal in an XML/HTML document (i.e. <>&). For example, to write an expression that tests whether a number is less than one value or greater than another, you would need to do something like the following example.

    # See if the figure number is less than 2 or greater than 4
    python: path('figure/number') &lt; 2 or path('figure/number') &gt; 4

5.3.3.1.7 stripped: operator

The stripped: operator only exists in the SimpleTAL distribution provided by plasTeX. It evaluates the given path and removes any markup from that path. Essentially, it is a way to get a plain text representation of the path. The syntax is as follows.

    stripped:path

5.3.3.2 Template Attribute Language (TAL) Attributes

5.3.3.2.1 tal:define

The tal:define attribute allows you to define a variable for use later in the template. Variables can be specified as local (only for use in the scope of the current element) or global (for use anywhere in the template). The syntax of the define attribute is shown below.

    tal:define="[ local | global ] name expression [; define-expression ]"

The define attribute sets the value of “name” to “expression.” By default, the scope of the variable is local, but can be specified as global by including the “global” keyword before the name of the variable.
As shown in the grammar above, you can specify multiple variables in one tal:define attribute by separating the define expressions with semi-colons. Examples of using the tal:define attribute are shown below.

...
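To make the tal:define grammar concrete, the following Python sketch splits an attribute value into its individual definitions. It is a rough illustration only: the function name `parse_define` and the variable names in the sample value are hypothetical, and the sketch makes no attempt to handle escaped semi-colons or other edge cases a real TAL implementation would cover.

```python
def parse_define(value):
    """Split a tal:define value into (scope, name, expression) tuples."""
    definitions = []
    for part in value.split(';'):
        part = part.strip()
        if not part:
            continue                          # ignore empty pieces, e.g. a trailing ';'
        words = part.split(None, 1)
        scope = 'local'                       # local is the default scope
        if words[0] in ('local', 'global') and len(words) > 1:
            scope = words[0]
            words = words[1].split(None, 1)
        name = words[0]
        expression = words[1] if len(words) > 1 else ''
        definitions.append((scope, name, expression))
    return definitions


# Hypothetical attribute value, used only for illustration
sample = 'global title document/title; next self/links/next; up string:Up - ${self/links/up}'
for scope, name, expression in parse_define(sample):
    print(scope, name, expression)
```

Each tuple records the scope, the variable name, and the TALES expression that would be evaluated to produce the value.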

5.3.3.2.2 tal:condition

The tal:condition attribute allows you to conditionally include an element. The syntax is shown below.

    tal:condition="expression"

The tal:condition attribute is very simple. If the expression evaluates to true, the element and its children will be evaluated and included in the output. If the expression evaluates to false, the element and its children will not be evaluated or included in the output. Valid expressions for the tal:condition attribute are the same as those for the expressions in the tal:define attribute.

Caption for paragraph ...

5.3.3.2.3 tal:repeat

The tal:repeat attribute allows you to repeat an element multiple times; the syntax is shown below.

    tal:repeat="name expression"

When the tal:repeat attribute is used on an element, the result of “expression” is iterated over, and a new element is generated for each item in the iteration. The value of the current item is set to “name” much like in the tal:define attribute.

Within the scope of the repeated element, another variable is available: repeat. This variable contains several properties related to the loop.

    Name   | Purpose
    -------|--------------------------------------------------------------------------------------------
    index  | number of the current iteration starting from zero
    number | number of the current iteration starting from one
    even   | is true if the iteration number is even
    odd    | is true if the iteration number is odd
    start  | is true if this is the first iteration
    end    | is true if this is the last iteration; this is never true if the repeat expression returns an iterator
    length | the length of the sequence being iterated over; this is set to sys.maxint for iterators
    letter | lower case letter corresponding to the current iteration number starting with ’a’
    Letter | upper case letter corresponding to the current iteration number starting with ’A’
    roman  | lower case Roman numeral corresponding to the current iteration number starting with ’i’
    Roman  | upper case Roman numeral corresponding to the current iteration number starting with ’I’

To access the properties listed above, you must use the property of the repeat variable that corresponds to the repeat variable name. For example, if your repeat variable name is “item”, you would access the above variables using the expressions repeat/item/index, repeat/item/number, repeat/item/even, etc.

A simple example of the tal:repeat attribute is shown below.
  1. option name
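The properties of the repeat variable can be mimicked in plain Python, which may help when reasoning about what a template will see on each iteration. This is a sketch, not SimpleTAL's code: the helper names are hypothetical, the input is copied into a list first (so `end` and `length` are always known, unlike the iterator case described above), and parity is computed from the zero-based index, which is an assumption you should check against your TAL implementation.

```python
def to_roman(number):
    """Lower case Roman numeral for a positive integer."""
    numerals = [(1000, 'm'), (900, 'cm'), (500, 'd'), (400, 'cd'),
                (100, 'c'), (90, 'xc'), (50, 'l'), (40, 'xl'),
                (10, 'x'), (9, 'ix'), (5, 'v'), (4, 'iv'), (1, 'i')]
    result = ''
    for value, symbol in numerals:
        while number >= value:
            result += symbol
            number -= value
    return result


def to_letter(index):
    """'a'..'z' for 0..25, then 'aa', 'ab', ... (assumed continuation)."""
    letters = ''
    index += 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord('a') + remainder) + letters
    return letters


def repeat_info(sequence):
    """Yield (item, properties) pairs resembling the repeat variable."""
    items = list(sequence)                 # materialize so the length is known
    length = len(items)
    for index, item in enumerate(items):
        yield item, {
            'index': index,
            'number': index + 1,
            'even': index % 2 == 0,        # assumption: parity of the zero-based index
            'odd': index % 2 == 1,
            'start': index == 0,
            'end': index == length - 1,
            'length': length,
            'letter': to_letter(index),
            'Letter': to_letter(index).upper(),
            'roman': to_roman(index + 1),
            'Roman': to_roman(index + 1).upper(),
        }


for item, info in repeat_info(['alpha', 'beta', 'gamma']):
    print(info['number'], info['letter'], info['roman'], item)
```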
One commonly used feature of rendering tables is alternating row colors. This is a little bit tricky with ZPT since the tal:condition attribute is evaluated before the tal:repeat directive. You can get around this by using the metal: namespace. This is the namespace used by ZPT’s macro language.[fn6] You can create another element around the element you want to be conditional. This wrapper element is simply there to do the iterating, but is not included in the output. The example below shows how to do alternating row colors in an HTML table.

[fn6] The macro language isn’t discussed here. See the official ZPT documentation for more information.
5.3.3.2.4 tal:content

The tal:content attribute evaluates an expression and replaces the content of the element with the result of the expression. The syntax is shown below.

    tal:content="[ text | structure ] expression"

The text and structure options in the tal:content attribute indicate whether or not the content returned by the expression should be escaped (i.e. ", &, <, and > replaced by &quot;, &amp;, &lt;, and &gt;, respectively). When the text option is used, these special characters are escaped; this is the default behavior. When the structure option is specified, the result of the expression is assumed to be valid markup and is not escaped.

In SimpleTAL, the default behavior is the same as using the text option. However, in plasTeX, 99.9% of the time the content returned by the expression is valid markup, so the default was changed to structure in the SimpleTAL package distributed with plasTeX.

5.3.3.2.5 tal:replace

The tal:replace attribute is much like the tal:content attribute. They both evaluate an expression and include the content of that expression in the output, and they both have a text and structure option to indicate escaping of special characters. The difference is that when the tal:replace attribute is used, the element with the tal:replace attribute on it is not included in the output. Only the content of the evaluated expression is returned. The syntax of the tal:replace attribute is shown below.

    tal:replace="[ text | structure ] expression"

5.3.3.2.6 tal:attributes

The tal:attributes attribute allows you to programmatically create attributes on the element. The syntax is shown below.

    tal:attributes="name expression [; attribute-expression ]"

The syntax of the tal:attributes attribute is very similar to that of the tal:define attribute. However, in the case of the tal:attributes attribute, the name is the name of the attribute to be created on the element and the expression is evaluated to get the value of the attribute.
If an error occurs or None is returned by the expression, then the attribute is removed from the element. Just as in the case of the tal:define attribute, you can specify multiple attributes separated by semi-colons (;). If a semi-colon character is needed in the expression, then it must be represented by a double semi-colon (;;). An example of using the tal:attributes attribute is shown below.

    link text

5.3.3.2.7 tal:omit-tag

The tal:omit-tag attribute allows you to conditionally omit an element. The syntax is shown below.

    tal:omit-tag="expression"

If the value of “expression” evaluates to true (or is empty), the element is omitted; however, the content of the element is still sent to the output. If the expression evaluates to false, the element is included in the output.

5.4 XHTML Renderer

The XHTML renderer is a subclass of the ZPT renderer (section {ref27}). Since the ZPT renderer can render any variant of XML or HTML, the XHTML renderer has very little to do in the Python code. Almost all of the additional processing in the XHTML renderer has to do with generated images. Since HTML cannot render LaTeX’s vector graphics or equations natively, they are converted to images. In order for inline equations to line up correctly with the text around them, CSS attributes are used to adjust the vertical alignment. Since the images aren’t generated until after all of the document has been rendered, this CSS information is added in post-processing (i.e. the cleanup method).

In addition to the processing of images, all characters with an ordinal greater than 127 are converted into numerical entities. This should prevent any rendering problems due to unknown encodings.

Most of the work in this renderer was in creating the templates for every LaTeX construct. Since this renderer was intended to be the basis of all HTML-based renderers, it must be capable of rendering all LaTeX constructs; therefore, there are ZPT templates for every LaTeX command, and the commands in some common LaTeX packages.
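The conversion of characters with an ordinal greater than 127 into numerical entities can be illustrated with the Python standard library alone. The sketch below is not the renderer's actual code; the function name `to_numeric_entities` is hypothetical, and it relies on the built-in `xmlcharrefreplace` error handler to do the work.

```python
def to_numeric_entities(text):
    """Replace every non-ASCII character with a decimal character reference."""
    # Encoding to ASCII with 'xmlcharrefreplace' turns each character above
    # 127 into '&#NNN;'; decoding gives back an ordinary string.
    return text.encode('ascii', 'xmlcharrefreplace').decode('ascii')


print(to_numeric_entities('café'))
```

Note that this only touches non-ASCII characters; markup characters such as & and < pass through unchanged, so the result is independent of the output encoding without altering the markup itself.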
While the XHTML renderer is fairly complete when it comes to standard LaTeX, there are many packages which are not currently supported. To add support for these packages, templates (and possibly Python based macros; section {ref5}) must be created.

5.4.1 Themes

The theming support in the XHTML renderer is the same as that of the ZPT renderer. Any template directory can have a subdirectory called ‘Themes’ which contains theme directories with sets of templates in them. The names of the directories in the ‘Themes’ directory correspond to the names of the themes. There are currently two themes included with plasTeX: default and plain. The default theme is a minor variation of the one used in the Python 1.6 documentation. The plain theme is a theme with no extra navigation bars.

5.5 tBook Renderer

Not yet implemented.

5.6 DocBook Renderer

Not yet implemented.

6 plasTeX Frameworks and APIs

6.1 plasTeX — The Python Macro and Document Interfaces

While plasTeX does a respectable job expanding LaTeX macros, some macros may be too complicated for it to handle. These macros may have to be re-coded as Python objects. Another reason you may want to use Python-based macros is performance. In most cases, macros coded using Python will be faster than those expanded as true LaTeX macros.

The API for Python macros is much higher-level than that of LaTeX macros. This has good and bad ramifications. The good news is that most common forms of LaTeX macros can be parsed and processed very easily using Python code, which is easier to read than LaTeX code. The bad news is that if you are doing something that isn’t common, you will have more work to do. Below is a basic example.

    from plasTeX import Command

    class mycommand(Command):
        r""" \mycommand[name]{title} """
        args = '[ name ] title'

The code above demonstrates how to create a Python-based macro corresponding to a LaTeX macro with the form \mycommand[name]{title} where ‘name’ is an optional argument and ‘title’ is a mandatory argument.
In the Python version of the macro, you simply declare the arguments in the args attribute as they would be used in the LaTeX macro, while leaving the braces off of the mandatory arguments. When parsed in a document, an instance of the class mycommand is created and the arguments corresponding to ‘name’ and ‘title’ are set in the attributes dictionary for that instance. This is very similar to the way an XML DOM works, and there are more DOM similarities yet to come. In addition, there are ways to handle casting of the arguments to various data types in Python. The API documentation below goes into more detail on these and many more aspects of the Python macro API.

6.1.1 Macro Objects

The Macro class is the base class for all Python based macros, although you will generally want to subclass from Command or Environment in real-world use. There are various attributes and methods that affect how Python macros are parsed, constructed and inserted into the resulting DOM. These are described below.

specifies the arguments to the macro and their data types. The args attribute gives you a very simple, yet extremely powerful way of parsing macro arguments and converting them into Python objects. Once parsed, each macro argument is set in the attributes dictionary of the Python instance using the name given in the args string. For example, the following args string will direct plasTeX to parse two mandatory arguments, ‘id’ and ‘title’, and put them into the attributes dictionary.

    args = 'id title'

You can also parse optional arguments, usually surrounded by square brackets ([]). However, in plasTeX, any arguments specified in the args string that aren’t mandatory (i.e. no braces surrounding them) are automatically considered optional. This may not truly be the case, but it doesn’t make much difference. If they truly are mandatory, then your LaTeX source file will always have them and plasTeX will simply always find them even though it considers them to be optional.
Optional arguments in the args string are surrounded by matching square brackets ([]), angle brackets (<>), or parentheses (()). The name for the attribute is placed between the matching symbols as follows:

    args = '[ toc ] title'
    args = '( position ) object'
    args = '< markup > ref'

You can have as many optional arguments as you wish. It is also possible to have optional arguments using braces ({}), but this requires you to change TeX’s category codes and is not common.

Modifiers such as asterisks (*) are also allowed in the args string. You can also use the plus (+) and minus (-) signs as modifiers, although these are not common. Using modifiers can affect the incrementing of counters (see the parse() method for more information).

In addition to specifying which arguments to parse, you can also specify what the data type should be. By default, all arguments are processed and stored as document fragments. However, some arguments may be simpler than that. They may contain an integer, a string, an ID, etc. Others may be collections like a list or dictionary. There are even more esoteric types for mostly internal use that allow you to get unexpanded tokens, dimensions, and the like. Regardless, all of these directives are specified in the same way, using the typecast operator: ‘:’. To cast an argument, simply place a colon (:) and the name of the argument type immediately after the name of the argument. The following example casts the ‘filename’ argument to a string.

    args = 'filename:str'

Parsing compound arguments such as lists and dictionaries is very similar.

    args = 'filenames:list'

By default, compound arguments are assumed to be comma separated. If you are using a different separator, it is specified in parentheses after the type.

    args = 'filenames:list(;)'

Again, each element in the list, by default, is a document fragment. However, you can also give the data type of the elements with another typecast.
    args = 'filenames:list(;):str'

Parsing dictionaries is a bit more restrictive. plasTeX assumes that dictionary arguments are always key-value pairs, that the key is always a string and the separator between the key and value is an equals sign (=). Other than that, they operate in the same manner. A full list of the supported data types as well as more examples are discussed in section {ref5}.

the source for the arguments to this macro. This is a read-only attribute.

gives the arguments in the args attribute in object form (i.e. Argument objects).
Note: This is a read-only attribute.
Note: This is generally an internal-use-only attribute.

indicates whether the macro node should be considered a block-level element. If true, this node will be put into its own paragraph node (which also has the blockType set to True) to make it easier to generate output that requires block-level elements to exist outside of paragraphs.

specifies the name of the counter to associate with this macro. Each time an instance of this macro is created, this counter is incremented. The incrementing of this counter, of course, resets any “child” counters just like in LaTeX. By default and convention, if the macro’s first argument is an asterisk (i.e. *), the counter is not incremented.

specifies a unique ID for the object. If the object has an associated label (i.e. \label), that is its ID. You can also set the ID manually. Otherwise, an ID will be generated based on the result of Python’s id() function.

a dictionary containing all of the objects referenced by “idref” type arguments. Each idref attribute is stored under the name of the argument in the idref dictionary.

specifies the hierarchical level of the node in the DOM. For most macros, this will be set to Node.COMMAND_LEVEL or Node.ENVIRONMENT_LEVEL by the Command and Environment macros, respectively. However, there are other levels that invoke special processing.
In particular, sectioning commands such as \section and \subsection have levels set to Node.SECTION_LEVEL and Node.SUBSECTION_LEVEL. These levels assist in the building of an appropriate DOM. Unless you are creating a sectioning command or a command that should act like a paragraph, you should leave the value of this attribute alone. See section {ref35} for more information.

specifies the name of the LaTeX macro that this class corresponds to. By default, the Python class name is the name that is used, but there are some legal LaTeX macro names that are not legal Python class names. In those cases, you would use macroName to specify the correct name. Below is an example.

    class _illegalname(Command):
        macroName = '@illegalname'

Note: This is a class attribute, not an instance attribute.

specifies what the current parsing mode is for this macro. Macro classes are instantiated for every invocation including each \begin and \end. This attribute is set to Macro.MODE_NONE for normal commands, Macro.MODE_BEGIN for the beginning of an environment, and Macro.MODE_END for the end of an environment. These attributes are used in the invoke() method to determine the scope of macros used within the environment. They are also used in printing the source of the macro in the source attribute. Unless you really know what you are doing, this should be treated as a read-only attribute.

boolean that indicates that the macro is in TeX’s “math mode.” This is a read-only attribute.

the name of the node in the DOM. This will either be the name given in macroName, if defined, or the name of the class itself.
Note: This is a read-only attribute.

specifies the value to return when this macro is referenced (i.e. \ref). This is set automatically when the counter associated with the macro is incremented.

specifies the LaTeX source that was parsed to create the object. This is most useful in the renderer if you need to generate an image of a document node.
You can simply retrieve the LaTeX source from this attribute, create a LaTeX document including the source, then convert the DVI file to the appropriate image type.

specifies style overrides, in CSS format, that should be applied to the output. This object is a dictionary, so style property names are given as the key and property values are given as the values.

    inst.style['color'] = 'red'
    inst.style['background-color'] = 'blue'

Note: Not all renderers are going to support CSS styles.

same as nodeName

specifies the title of the current object. If the attributes dictionary contains a title, that object is returned. An AttributeError is thrown if there is no ‘title’ key in that dictionary. A title can also be set manually by setting this attribute.

absorb the tokens from the given output stream that belong to the current object. In most commands, this does nothing. However, LaTeX environments have a \begin and an \end that surround content that belongs to them. In this case, these environments need to absorb those tokens and construct them into the appropriate document object model (see the Environment class for more information).

utility method to help macros like lists and tables digest their contents. In lists and tables, the items, rows, and cells are not delimited by \begin and \end tokens. They are simply delimited by the occurrence of another item, row, or cell. This method allows you to absorb tokens until a particular class is reached.

the expand method is a thin wrapper around the invoke method. The expand method makes sure that all tokens are expanded and will not return a None value like invoke.

invokes the macro. Invoking the macro, in the general case, includes creating a new context, parsing the options of the macro, and removing the context. LaTeX environments are slightly different. If macroMode is set to Macro.MODE_BEGIN, the new context is kept on the stack. If macroMode is set to Macro.MODE_END, no arguments are parsed; the context is simply popped.
For most macros, the default implementation will work fine. The return value for this method is generally None (an empty return statement or simply no return statement). In this case, the current object is simply put into the resultant output stream. However, you can also return a list of tokens. In this case, the returned tokens will be put into the output stream in place of the current object. You can even return an empty list to indicate that you don’t want anything to be inserted into the output stream.

retrieves all of the LaTeX macros that belong to the scope of the current Python based macro.

group content into paragraphs. Paragraphs are grouped once all other content has been digested. The paragraph grouping routine works like LaTeX’s, in that environments are included inside paragraphs. This is unlike HTML’s model, where lists and tables are not included inside paragraphs. The force argument allows you to decide whether or not paragraphs should be forced. By default, all content of the node is grouped into paragraphs whether or not the content originally contained a paragraph node. However, with force set to False, a node will only be grouped into paragraphs if the original content contained at least one paragraph node.

Even though the paragraphs method follows LaTeX’s model, it is still possible to generate valid HTML content. Any node with the blockType attribute set to True is considered to be a block-level node. This means that it will be contained in its own paragraph node. This paragraph node will also have the blockType attribute set to True so that in the renderer the paragraph can be inserted or ignored based on this attribute.

parses the arguments defined in the args attribute from the given token stream. This method also calls several hooks as described in the table below.
    Method Name    | Description
    ---------------|---------------------------------------------------------
    preParse()     | called at the beginning of the argument parsing process
    preArgument()  | called before parsing each argument
    postArgument() | called after parsing each argument
    postParse()    | called at the end of the argument parsing process

The methods are called to assist in labeling and counting. For example, by default, the counter associated with a macro is automatically incremented when the macro is parsed. However, if the first argument is a modifier (i.e. *, +, -), the counter will not be incremented. This is handled in the preArgument() and postArgument() methods.

Each time an argument is parsed, the result is put into the attributes dictionary. The key in the dictionary is, of course, the name given to that argument in the args string. Modifiers such as *, +, and - are stored under the special key ‘*modifier*’.

The return value for this method is simply a reference to the attributes dictionary.

Note: If parse() is called on an instance with macroMode set to Macro.MODE_END, no parsing takes place.

called after parsing each argument. This is generally where label and counter mechanisms are handled. arg is the Argument instance that holds all argument meta-data including the argument’s name, source, and options. tex is the TeX instance containing the current context.

do any operations required immediately after parsing the arguments. This generally includes setting up the value that will be returned when referencing the object.

called before parsing each argument. This is generally where label and counter mechanisms are handled. arg is the Argument instance that holds all argument meta-data including the argument’s name, source, and options. tex is the TeX instance containing the current context.

do any operations required immediately before parsing the arguments.

set the object as the current labellable object and increment its counter.
When an object is set as the current labellable object, the next \label command will point to that object.

step the counter associated with the macro

6.2 plasTeX.ConfigManager — plasTeX Configuration

The configuration system in plasTeX that parses the command-line options and configuration files is very flexible. While many options are set up by the plasTeX framework, it is possible for you to add your own options. This is useful if you have macros that need to be configurable, or if you write a renderer that exposes special options to control it.

The config files that ConfigManager supports are standard INI-style files. This is the same format supported by Python’s ConfigParser. However, this API has been extended with some dictionary-like behaviors to make it more Python friendly.

In addition to the config files, ConfigManager can also parse command-line options and merge the options from the command-line into the options set by the given config files. In fact, when adding options to a ConfigManager, you specify both how they appear in the config file as well as how they appear on the command-line. Below is a basic example.

    import os
    import sys
    from plasTeX.ConfigManager import *

    c = ConfigManager()

    # Create a new section in the config file.  This corresponds to the
    # [ sectionname ] sections in an INI file.  The returned value is
    # a reference to the new section
    d = c.add_section('debugging')

    # Add an option to the 'debugging' section called 'verbose'.
    # This corresponds to the config file setting:
    #
    # [debugging]
    # verbose = no
    #
    d['verbose'] = BooleanOption(
        """ Increase level of debugging information """,
        options = '-v --verbose !-q !--quiet',
        default = False,
    )

    # Read system-level config file
    c.read('/etc/myconfig.ini')

    # Read user-level config file ('~' is expanded explicitly)
    c.read(os.path.expanduser('~/myconfig.ini'))

    # Parse the current command-line arguments
    opts, args = c.getopt(sys.argv[1:])

    # Print the value of the 'verbose' option in the 'debugging' section
    print(c['debugging']['verbose'])

One interesting thing to note about retrieving values from a ConfigManager is that you get the value of the option rather than the option instance that you put in. For example, in the code above, a BooleanOption is put into the ‘verbose’ option slot, but when it is retrieved in the print statement at the end, it prints out a boolean value. This is true of all option types. You can access the option instance in the data attribute of the section (e.g. c['debugging'].data['verbose']).

6.2.1 ConfigManager Objects

Instantiate a configuration class for plasTeX that parses the command-line options as well as reads the config files. The optional argument, defaults, is a dictionary of default values for the configuration object. These values are used if a value is not found in the requested section.

merge items from another ConfigManager. This allows you to add ConfigManager instances with syntax like: config + other. This operation will modify the original instance.

create a new section in the configuration with the given name. This name is the name used for the section heading in the INI file (i.e. the name used within square brackets ([]) to start a section). The return value of this method is a reference to the newly created section.

return the dictionary of categories

return a deep copy of the configuration

return the dictionary of default values

read configuration data contained in files specified by filenames. Files that cannot be opened are silently ignored.
This is designed so that you can specify a list of potential configuration file locations (e.g. current directory, user’s home directory, system directory), and all existing configuration files in the list will be read. A single filename may also be given.

retrieve the value of option from the section section. Setting raw to true prevents any string interpolation from occurring in that value. vars is a dictionary of additional values to use when interpolating values into the option.
Note: You can also use the alternative dictionary syntax: config[section].get(option).

retrieve the specified value and cast it to a boolean

return the title of the given category

retrieve the specified value and cast it to a float

retrieve the specified value and cast it to an integer

return the option value with any leading and trailing quotes removed

parse the command-line options. If args is not given, the args are parsed from sys.argv[1:]. If merge is set to false, then the options are not merged into the configuration. The return value is a two element tuple. The first value is a list of parsed options in the form (option, value), and the second value is the list of arguments.

return the option value as a list using delim as the delimiter

return the raw (i.e. un-interpolated) value of the option

add a category to group options when printing the command-line help. Command-line options can be grouped into categories to make options easier to find when printing the usage message for a program. Categories consist of two pieces: 1) the name, and 2) the title. The name is the key in the category dictionary and is the name used when specifying which category an option belongs to. The title is the actual text that you see as a section header when printing the usage message.
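The file-reading and interpolation behavior described above for read() and get() matches what Python's standard configparser module does, so it can be tried out without plasTeX installed. The following is only a sketch using the standard library; it does not exercise ConfigManager itself, and the file names, section name, and option names are made up for the illustration.

```python
import configparser
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # One candidate file exists, the other does not.
    user_file = os.path.join(tmp, "myconfig.ini")
    missing_file = os.path.join(tmp, "does-not-exist.ini")
    with open(user_file, "w") as fh:
        fh.write(
            "[debugging]\n"
            "verbose = yes\n"
            "logdir = /tmp\n"
            "logfile = %(logdir)s/run.log\n"
        )

    parser = configparser.ConfigParser()
    # read() skips files it cannot open and returns the ones it parsed.
    found = parser.read([missing_file, user_file])

# Typed access: the string 'yes' is cast to a boolean.
verbose = parser.getboolean("debugging", "verbose")

# Default access interpolates %(logdir)s from the same section.
interpolated = parser.get("debugging", "logfile")

# raw=True returns the value exactly as written in the file.
raw_value = parser.get("debugging", "logfile", raw=True)

# vars supplies additional values that take part in the interpolation.
overridden = parser.get("debugging", "logfile", vars={"logdir": "/var/log"})
```

Listing the missing file first shows that the order of candidate locations does not matter for error handling: only files that can be opened contribute values.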
return a boolean indicating whether or not an option with the given name exists in the given section

return a boolean indicating whether or not a section with the given name exists

merge items from another ConfigManager. This allows you to add ConfigManager instances with syntax like: config += other.

return a list of configured option names within a section. Options are all of the settings of a configuration file within a section (i.e. the lines that start with ‘optionname=’).

merge items from another ConfigManager. This allows you to add ConfigManager instances with syntax like: other + config. This operation will modify the original instance.

like read(), but the argument is a file object. The optional filename argument is used for printing error messages.

remove the specified option from the given section

remove the specified section

return the configuration as an INI formatted string; this also includes options that were set from Python code.

return a list of all section names in the configuration

set the value of an option

return the configuration as an INI formatted string; however, do not include options that were set from Python code.

return the configuration as an INI formatted string. The source option indicates which source of information should be included in the resulting INI file. The possible values are:

Name         Description
-----------  --------------------------------
COMMANDLINE  set from a command-line option
CONFIGFILE   set from a configuration file
BUILTIN      set from Python code
ENVIRONMENT  set from an environment variable

write the configuration as an INI formatted string to the given file object

print the descriptions of all command-line options. If categories is specified, only the command-line options from those categories are printed.

6.2.2 ConfigSection Objects

Instantiate a ConfigSection object. name is the name of the section. data, if specified, is the dictionary of data to initialize the section contents with.
ConfigSection objects are rarely instantiated manually. They are generally created using the ConfigManager API (either the direct methods or the Python dictionary syntax).

dictionary that contains the option instances. This is only accessed if you want to retrieve the real option instances. Normally, you would use standard dictionary key access syntax on the section itself to retrieve the option values.

the name given to the section.

make a deep copy of the section object.

return the dictionary of default options associated with the parent ConfigManager.

retrieve the value of option. Setting raw to true prevents any string interpolation from occurring in that value. vars is a dictionary of additional values to use when interpolating values into the option.
Note: You can also use the alternative dictionary syntax: section.get(option).

retrieve the specified value and cast it to a boolean

retrieve the value of an option. This method allows you to use Python’s dictionary syntax on a section as shown below.

    # Print the value of the 'optionname' option
    print(mysection['optionname'])

retrieve the specified value and cast it to an integer

retrieve the specified value and cast it to a float

return the raw (i.e. un-interpolated) value of the option

a reference to the parent ConfigManager object.

return a string containing an INI file representation of the section.

create a new option or set an existing option with the name option and the value of value. If the given value is already an option instance, it is simply inserted into the section. If it is not an option instance, an appropriate type of option is chosen for the given type.

create a new option or set an existing option with the name key and the value of value. This method allows you to use Python’s dictionary syntax to set options as shown below.

    # Create a new option called 'optionname'
    mysection['optionname'] = 10

return a string containing an INI file representation of the section.
Options set from Python code are not included in this representation.

return a string containing an INI file representation of the section. The source option allows you to only display options from certain sources. See the ConfigManager.source() method for more information.

6.2.3 Configuration Option Types

There are several option types that should cover just about any type of command-line and configuration option that you may have. However, in the spirit of object-orientedness, you can, of course, subclass one of these and create your own types. GenericOption is the base class for all options. It contains all of the underlying framework for options, but should never be instantiated directly. Only subclasses should be instantiated.

Declare a command line option. Instances of subclasses of GenericOption must be placed in a ConfigManager instance to be used. See the documentation for ConfigManager for more details.

docstring is a string in the format of Python documentation strings that describes the option and its usage. The first line is assumed to be a one-line summary for the option. The following paragraphs are assumed to be a complete description of the option. You can give a paragraph with the label ’Valid Values:’ that contains a short description of the values that are valid for the current option. If this paragraph exists and an error is encountered while validating the option, this paragraph will be printed instead of the somewhat generic error message for that option type.

options is a string containing all possible variants of the option. All variants should contain the ’-’, ’--’, etc. at the beginning. For boolean options, the option can be preceded by a ’!’ to mean that the option should be turned OFF rather than ON which is the default.

default is a value for the option to take if it isn’t specified on the command line.

optional is a value for the option if it is given without a value.
This is only used for options that normally take a value, but you also want a default that indicates that the option was given without a value.

values defines valid values for the option. This argument can take the following forms:

    single value
        for StringOption this is a string, for IntegerOption this is an integer, for FloatOption this is a float. The single value mode is most useful when the value is a regular expression. For example, to specify that a StringOption must be a string of characters followed by a digit, ’values’ would be set to re.compile(r'\w+\d').

    range of values
        a two element list can be given to specify the endpoints of a range of valid values. This is probably most useful on IntegerOption and FloatOption. For example, to specify that an IntegerOption can only take the values from 0 to 10, ’values’ would be set to [0,10]. Note: This mode must always use a Python list since using a tuple means something else entirely.

    tuple of values
        a tuple of values can be used to specify a complete list of valid values. For example, to specify that an IntegerOption can take the values 1, 2, or 3, ’values’ would be set to (1,2,3). If a string value can only take the values ’hi’, ’bye’, and any string of characters beginning with the letter ’z’, ’values’ would be set to ('hi','bye',re.compile(r'z.*?')). Note: This mode must always use a Python tuple since using a list means something else entirely.

category is a category key which specifies which category the option belongs to (see the ConfigManager documentation on how to create categories).

callback is a function to call after the value of the option has been validated. This function will be called with the validated option value as its only argument.

environ is an environment variable to use as default value instead of specified value. If the environment variable exists, it will be used for the default value instead of the specified value.

registry is a registry key to use as default value instead of specified value. If the registry key exists, it will be used for the default value instead of the specified value. A specified environment variable takes precedence over this value. Note: This is not implemented yet.

name is a key used to get the option from its corresponding section. You do not need to specify this. It will be set automatically when you put the option into the ConfigManager instance.

mandatory is a flag used to determine if the option itself is required to be present. The idea of a "mandatory option" is a little strange, but I have seen it done.

source is a flag used to determine whether the option was set directly in the ConfigManager instance through Python, by a configuration file/command line option, etc. You do not need to specify this; it will be set automatically during parsing. This flag should have the value of BUILTIN, COMMANDLINE, CONFIGFILE, ENVIRONMENT, REGISTRY, or CODE.

return a boolean indicating whether or not the option accepts an argument on the command-line. For example, boolean options do not accept an argument.

cast the given value to the appropriate type.

check value against all possible valid values for the option. If the value is invalid, raise an InvalidOptionError exception.

reset the value of the option as if it had never been set.

return the current value of the option. If default is specified and a value cannot be gotten from any source, it is returned.
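The three forms accepted by the values argument (single value, two element list, tuple) are told apart purely by their Python type. A minimal sketch of that dispatch follows. It is not the validation code used by plasTeX: is_valid is a made-up helper, and treating range endpoints as inclusive and requiring regular expressions to match the whole string are both assumptions.

```python
import re


def is_valid(value, values):
    """Return True if value is acceptable under the given values spec."""
    # Tuple: a complete list of alternatives, each checked on its own.
    if isinstance(values, tuple):
        return any(is_valid(value, item) for item in values)
    # Two element list: endpoints of a range (assumed inclusive here).
    if isinstance(values, list) and len(values) == 2:
        low, high = values
        return low <= value <= high
    # Compiled regular expression (assumed to need a whole-string match).
    if hasattr(values, "fullmatch"):
        return isinstance(value, str) and values.fullmatch(value) is not None
    # Single plain value: must be equal.
    return value == values


in_range = is_valid(5, [0, 10])
out_of_range = is_valid(11, [0, 10])
listed = is_valid(2, (1, 2, 3))
nonsense = ("hi", "bye", re.compile(r"z.*?"))
starts_with_z = is_valid("zebra", nonsense)
```

The list and tuple branches are deliberately separate, which mirrors the warning above that a list and a tuple mean entirely different things.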
return a string containing a command-line representation of the option and its value.

return a boolean indicating whether or not the option requires an argument on the command-line.

As mentioned previously, GenericOption is an abstract class (i.e. it should not be instantiated directly). Only subclasses of GenericOption should be instantiated. Below are some examples of use of some of these subclasses, followed by the descriptions of the subclasses themselves.

    import re
    from plasTeX.ConfigManager import *

    BooleanOption(
        ''' Display help message ''',
        options = '--help -h',
        callback = usage,  # usage() function must exist prior to this
    )

    BooleanOption(
        ''' Set verbosity ''',
        options = '-v --verbose !-q !--quiet',
    )

    StringOption(
        '''
        IP address option

        This option accepts an IP address to connect to.

        Valid Values:
        '#.#.#.#' where # is a number from 1 to 255

        ''',
        options = '--ip-address',
        values = re.compile(r'\d{1,3}(\.\d{1,3}){3}'),
        default = '127.0.0.0',
        synopsis = '#.#.#.#',
        category = 'network',  # Assumes 'network' category exists
    )

    IntegerOption(
        '''
        Number of seconds to wait before timing out

        Valid Values:
        positive integer

        ''',
        options = '--timeout -t',
        default = 300,
        values = [0,1e9],
        category = 'network',
    )

    IntegerOption(
        '''
        Number of tries to connect to the host before giving up

        Valid Values:
        accepts 1, 2, or 3 retries

        ''',
        options = '--tries',
        default = 1,
        values = (1,2,3),
        category = 'network',
    )

    StringOption(
        '''
        Nonsense option for example purposes only

        Valid Values:
        accepts 'hi', 'bye', or any string beginning with the letter 'z'

        ''',
        options = '--nonsense -n',
        default = 'hi',
        values = ('hi', 'bye', re.compile(r'z.*?')),
    )

Boolean options are simply options that allow you to specify an ‘on’ or ‘off’ state. The accepted values for a boolean option in a config file are ‘on’, ‘off’, ‘true’, ‘false’, ‘yes’, ‘no’, 0, and 1. Boolean options on the command-line do not take an argument; simply specifying the option sets the state to true.
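The options strings in the examples above mix plain flags with flags prefixed by an exclamation point. As a rough illustration of how such a string can be read, here is a standalone sketch. It is not the parsing code in plasTeX.ConfigManager: both function names are made up, and letting the last flag on the command line win is an assumption.

```python
def parse_boolean_spec(spec):
    """Map each flag in an options string to the state it selects."""
    states = {}
    for flag in spec.split():
        if flag.startswith("!"):
            states[flag[1:]] = False   # '!' prefix: presence means off
        else:
            states[flag] = True        # plain flag: presence means on
    return states


def resolve(spec, argv, default=False):
    """Return the boolean state implied by the flags present in argv."""
    states = parse_boolean_spec(spec)
    value = default
    for arg in argv:
        if arg in states:
            value = states[arg]        # assumption: the last flag wins
    return value


verbosity_flags = parse_boolean_spec("-v --verbose !-q !--quiet")
debug_on = resolve("--debug !--no-debug", ["--debug"])
debug_off = resolve("--debug !--no-debug", ["--debug", "--no-debug"])
```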
One interesting feature of boolean options is in specifying the command-line options. Since you cannot specify a value on the command-line (the existence of the option indicates the state), there must be a way to set the state to false. This is done using the ‘not’ operator (!). When specifying the options argument of the constructor, if you prefix a command-line option with an exclamation point, the existence of that option indicates a false state rather than a true state. Below is an example of an options value that has a way to turn debugging information on (--debug) or off (--no-debug).

    BooleanOption( options = '--debug !--no-debug' )

Compound options are options that contain multiple elements on the command-line. They are simply groups of command-line arguments surrounded by a pair of grouping characters (e.g. (), [], {}, <>). This grouping can contain anything including other command-line arguments. However, all content between the grouping characters is unparsed. This can be useful if you have a program that wraps another program and you want to be able to forward the wrapped program’s options on. An example of a compound option used on the command-line is shown below.

    # Capture the --diff-opts options to send to another program
    mycommand --other-opt --diff-opts ( -ib --minimal ) file1 file2

A CountedOption is a boolean option that keeps track of how many times it has been specified. This is useful for options that control the verbosity of logging messages in a program, where the more times the option is specified, the more logging information is printed.

An InputDirectoryOption is an option that accepts a directory name for input. This directory name is checked to make sure that it exists and that it is readable. If it is not, an InvalidOptionError exception is raised.

An OutputDirectoryOption is an option that accepts a directory name for output. If the directory exists, it is checked to make sure that it is readable.
If it does not exist, it is created.

An InputFileOption is an option that accepts a file name for input. The filename is checked to make sure that it exists and is readable. If it isn’t, an InvalidOptionError exception is raised.

An OutputFileOption is an option that accepts a file name for output. If the file exists, it is checked to make sure that it is writable. If a name contains a directory, the path is checked to make sure that it is writable. If the directory does not exist, it is created.

A FloatOption is an option that accepts a floating point number.

An IntegerOption is an option that accepts an integer value.

A MultiOption is an option that is intended to be used multiple times on the command-line, or take a list of values. Other options, when specified more than once, simply overwrite the previous value. MultiOptions will append the new values to a list.

The delimiter used to separate multiple values is the comma (,). A different character can be specified in the delim argument.

In addition, it is possible to specify the number of values that are legal in the range argument. The range argument takes a two element list. The first element is the minimum number of times the argument is required. The second element is the maximum number of times it is allowed. You can use a ‘*’ (in quotes) to mean an infinite number.

You can cast each element in the list of values to a particular type by using the template argument. The template argument takes a reference to the option class that you want the values to be converted to.

A StringOption is an option that accepts an arbitrary string.

6.3 plasTeX.DOM — The plasTeX Document Object Model (DOM)

While most processors use a stream model where the input is directly connected to the output, plasTeX actually works in two phases. The first phase reads in the document, expands macros, and constructs an object similar to an XML DOM.
This object is then passed to the renderer which translates it into the appropriate output format. The benefit to doing it this way is that you are not limited to a single output format. In addition, you can actually apply multiple renderers with only one parse step. This section describes the DOM used by plasTeX, its API, and the similarities and differences between the plasTeX DOM and the XML DOM.

6.3.1 plasTeX vs. XML

The plasTeX DOM and XML DOM have more similarities than differences. This similarity is purely intentional to reduce the learning curve and to prevent reinventing the wheel. However, the XML DOM can be a bit cumbersome, especially when you’re used to much simpler and more elegant Python code. Because of this, some Python behaviors were adopted into the plasTeX DOM. The good news is that these extensions do not break compatibility with the XML DOM. There are, however, some differences due to conventions used in LaTeX.

The only significant difference between the plasTeX DOM and the XML DOM is that plasTeX nodes do not have true attributes like in XML. Attributes in XML are more like arguments in LaTeX; because they are similar, the plasTeX DOM actually puts the macro arguments into the attributes dictionary. This does create an incompatibility though, since XML DOM attributes can only be strings whereas LaTeX arguments can contain lots of markup. In addition, plasTeX allows you to convert these arguments into Python strings, lists, dictionaries, etc., so essentially any type of object can occur in the attributes dictionary.

Other than paying attention to the attributes dictionary difference, you can use most other XML DOM methods on plasTeX document objects to create nodes, delete nodes, etc. The full API is described below.

In most cases, you will not need to be concerned with instantiating nodes. The plasTeX framework does this. However, the API can be helpful if you want to modify the document object that plasTeX creates.
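The attribute difference described above can be made concrete with a small comparison. This is only a sketch: the XML half uses Python's standard xml.dom.minidom module, while SketchNode is a hypothetical stand-in written for this illustration and is not a plasTeX class. The 'toc' argument is likewise made up.

```python
from xml.dom import minidom

# XML DOM: attribute values are stored and returned as strings.
xml_doc = minidom.Document()
xml_elem = xml_doc.createElement("section")
xml_elem.setAttribute("title", "Introduction")
xml_title = xml_elem.getAttribute("title")


class SketchNode:
    """Hypothetical node whose attributes are an ordinary dictionary."""

    def __init__(self, node_name):
        self.nodeName = node_name
        self.attributes = {}    # macro arguments would be stored here
        self.childNodes = []


# Dictionary-based attributes: a value may be any Python object.
node = SketchNode("section")
node.attributes["title"] = ["Intro", "duction"]   # a list of pieces
node.attributes["*modifier*"] = None              # special key named in the text
node.attributes["toc"] = {"depth": 2}             # made-up argument
```

Code written against the XML DOM that assumes every attribute value is a string therefore needs a type check before it is pointed at a dictionary like the one above.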
6.3.2 Node Objects

The Node class is the base class for all nodes in the plasTeX DOM including elements, text, etc.

a dictionary containing the attributes; in the case of plasTeX, the macro arguments

a list of the nodes that are contained by this one. In plasTeX, this generally contains the contents of a LaTeX environment.

boolean indicating whether or not the node only contains whitespace.

the last node in the childNodes list. If there are no child nodes, the value is None.

the name of the node. This is either the special node name as specified in the XML DOM (e.g. #document-fragment, #text, etc.), or, if the node corresponds to an element, it is the name of the element.

integer indicating the type of the node. The node types are defined as:

    Node.ELEMENT_NODE
    Node.ATTRIBUTE_NODE
    Node.TEXT_NODE
    Node.CDATA_SECTION_NODE
    Node.ENTITY_REFERENCE_NODE
    Node.ENTITY_NODE
    Node.PROCESSING_INSTRUCTION_NODE
    Node.COMMENT_NODE
    Node.DOCUMENT_NODE
    Node.DOCUMENT_TYPE_NODE
    Node.DOCUMENT_FRAGMENT_NODE
    Node.NOTATION_NODE

Note: These are defined by the XML DOM; not all of them are used by plasTeX.

refers to the node that contains this node

the node in the document that is adjacent to and immediately before this node. If one does not exist, the value is None.

the node in the document that is adjacent to and immediately after this node. If one does not exist, the value is None.

the node that is the owner of, and ultimate parent of, all nodes in the document

contains just the text content of this node

specifies a unicode string that could be used in place of the node. This unicode string will be converted into tokens in the plasTeX output stream.

dictionary used for holding user-defined data

create a new node that is the sum of self and other. This allows you to use nodes in Python statements like: node + other.

adds a new child to the end of the child nodes

same as append

create a clone of the current node. If deep is true, then the attributes and child nodes are cloned as well.
Otherwise, all references to attributes and child nodes will be shared between the nodes.

same as isEqualNode, but allows you to compare nodes using the Python statement: node == other.

appends other to list of children then returns self

returns the child node at the index given by i. This allows you to use Python’s slicing syntax to retrieve child nodes: node[i].

retrieves the data in the userdata dictionary under the name key

returns a boolean indicating whether or not this node has attributes defined

returns a boolean indicating whether or not the node has child nodes

same as extend. This allows you to use nodes in Python statements like: node += other.

inserts node newChild into position i in the child nodes list

inserts newChild before refChild in this node. If refChild is not found, a NotFoundErr exception is raised.

indicates whether the given node is equivalent to this one

indicates whether the given node is the same node as this one

returns an iterator that iterates over the child nodes. This allows you to use Python’s iter() function on nodes.

returns the number of child nodes. This allows you to use Python’s len() function on nodes.

combine consecutive text nodes and remove comments in this node

removes the child node at the index given by index. If no index is specified, the last child is removed.

create a new node that is the sum of other and self. This allows you to use nodes in Python statements like: other + node.

replaces oldChild with newChild in this node. If oldChild is not found, a NotFoundErr exception is raised.

removes oldChild from this node. If oldChild is not found, a NotFoundErr exception is raised.

sets the item at index i to node. This allows you to use Python’s slicing syntax to insert child nodes; see the example below.
    mynode[5] = othernode
    mynode[6:10] = [node1, node2]

put data specified in data into the userdata dictionary under the name given by key

return an XML representation of the node

6.3.3 DocumentFragment Objects

A collection of nodes that make up only part of a document. This is mainly used to hold the content of a macro argument.

6.3.4 Element Objects

The base class for all element-type nodes in a document. Elements generally refer to nodes created by commands and environments.

returns the attribute specified by name

retrieve the element with the given ID

retrieve all nodes with the given name in the node

returns a boolean indicating whether or not the specified attribute exists

removes the attribute name from the attributes dictionary

sets the attribute value in the attributes dictionary using the key name

6.3.5 Text Objects

This is the node type used for all text data in a document object. Unlike XML DOM text nodes, text nodes in plasTeX are not mutable. This is because they are a subclass of unicode. This means that they will respond to all of the standard Python string methods in addition to the Node methods and the methods described below.

the text content of the node

the length of the text content

the text content of the node

returns the text content from the current text node as well as its siblings

6.3.6 Document Objects

The top-level node of a document that contains all other nodes.

instantiate a new document fragment

instantiate a new element with the given name

instantiate a new text node initialized with data

import a node from another document. If deep is true, all nodes within importedNode are cloned.

concatenate all consecutive text nodes and remove comments

6.3.7 Command Objects

The Command class is a subclass of Macro. This is the class that should be subclassed when creating Python based macros that correspond to commands. For more information on the Command class’ API, see the Macro class.
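The immutability of text nodes noted in section 6.3.5 follows directly from subclassing the string type. A minimal sketch of the idea, written for Python 3 where str plays the role of unicode: SketchText is a hypothetical class for this illustration and is not the plasTeX Text class, and its attribute names are only borrowed from the descriptions above.

```python
class SketchText(str):
    """Hypothetical text node: a string that also carries node fields."""

    nodeName = "#text"
    parentNode = None

    @property
    def data(self):
        # The text content of the node as a plain string.
        return str(self)

    @property
    def length(self):
        # The length of the text content.
        return len(self)


text = SketchText("plasTeX")

# Standard string methods work, but they return new plain strings.
upper = text.upper()

# In-place modification is rejected, as for any Python string.
try:
    text[0] = "P"
    mutated = True
except TypeError:
    mutated = False
```

Because string methods return plain strings, changing the text in a document built this way means replacing the node in its parent rather than editing it in place.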
6.3.8 Environment Objects

The Environment class is a subclass of Macro. This is the class that should be subclassed when creating Python based macros that correspond to environments. The main difference between the processing of Commands and Environments is that the invoke() method does special handling of the document context, and the digest() method absorbs the output stream tokens that are encapsulated by the \begin and \end tokens. For more information on the Environment class’ API, see the Macro class.

6.3.9 TeXFragment Objects

A fragment of a document. This class is used mainly to store the contents of macro arguments.

the source representation of the document fragment

6.3.10 TeXDocument Objects

A complete document.

a list of two element tuples containing character substitutions for all text nodes in a document. This is used to convert character strings like “---” into “—”. The first element in each tuple is the string to replace; the second element is the unicode character or sequence to replace the original string with.

returns the source representation of the document preamble (i.e. everything before the \begin{document})

the source representation of the document

6.4 plasTeX.TeX — The TeX Stream

The TeX stream is the piece of plasTeX where the parsing of the document takes place. While the TeX class is fairly large, there are only a few methods and attributes designated in the public API.

The TeX stream is based on a Python generator. When you feed it a source file, it processes the file much like TeX itself. However, on the output end, rather than a DVI file, you get a plasTeX document object. The basic usage is shown in the code below.

    from plasTeX.TeX import TeX
    doc = TeX(file='myfile.tex').parse()

6.4.1 TeX Objects

The TeX class is the central engine that does all of the parsing, invoking of macros, and other document building tasks.
You can pass in an owner document if you have a customized document node, or if it contains a customized configuration; otherwise, the default TeXDocument class is instantiated. The file argument is the name of a file. This file will be searched for using the standard technique and will be read using the default input encoding in the document’s configuration.

disables logging. This is useful if you are using the TeX object within another library and do not want all of the status information to be printed to the screen.
Note: This is a class method.

the current filename being processed

the basename of the file at the top of the input stack

the line number of the current file being processed

expand a list of unexpanded tokens. This method can be used to expand tokens without having them sent to the output stream. The returned value is a TeXFragment populated with the expanded tokens.

add a new input source to the input stack. source should be a Python file object. This can be used to add additional input sources to the stream after the TeX object has been instantiated.

return a generator that iterates through the tokens in the source. This method allows you to treat the TeX stream as an iterable and use it in looping constructs. While the looping is generally handled in the parse() method, you can manually expand the tokens in the source by looping over the TeX object as well.

    for tok in TeX(file='myfile.tex'):
        print(tok)

return an iterator that iterates over the unexpanded tokens in the input document.

locate the given file in a kpsewhich-like manner. The full path to the file is returned if it is found; otherwise, None is returned.
Note: Currently, only the directories listed in the environment variable TEXINPUTS are searched.

joins consecutive text tokens into a string. If the list of tokens contains tokens that are not text tokens, the original list of tokens is returned.

parse the sources currently in the input stack until they are empty.
The output argument is an optional Document node to put the resulting nodes into. If none is supplied, a TeXDocument instance will be created. The return value is the document from the output argument or the instantiated TeXDocument object.

pushes a token back into the input stream to be re-read.

pushes a list of tokens back into the input stream to be re-read.

parse a macro argument without returning the source that created it. This method is just a thin wrapper around readArgumentAndSource. See that method for more information.

parse a macro argument. Return the argument and the source that created it. The arguments are described below.

    Option       Description
    ----------   ------------------------------------------------------------
    spec         string containing information about the type of argument
                 to get. If it is 'None', the next token is returned. If it
                 is a two-character string, a grouping delimited by those
                 two characters is returned (e.g. '[]'). If it is a
                 single-character string, the stream is checked to see if
                 the next character is the one specified. In all cases, if
                 the specified argument is not found, 'None' is returned.
    type         data type to cast the argument to. New types can be added
                 to the self.argtypes dictionary. The key should match this
                 'type' argument and the value should be a callable object
                 that takes a list of tokens as the first argument and a
                 list of unspecified keyword arguments (i.e. **kwargs) for
                 type-specific information such as list delimiters.
    subtype      data type to use for elements of a list or dictionary
    delim        item delimiter for list and dictionary types
    expanded     boolean indicating whether the argument content should be
                 expanded or just returned as an unexpanded text string
    default      value to return if the argument doesn't exist
    parentNode   the node that the argument belongs to
    name         the name of the argument being parsed

The return value is always a two-element tuple. The second value is always a string. However, the first value can take the following values.

    Value                      Condition
    ------------------------   ------------------------------------
    None                       the requested argument wasn't found
    object of requested type   if type was specified
    list of tokens             all other arguments

return the representation of the tokens in tokens

convert a string of text into a series of tokens

6.5 plasTeX.Context — The Context

The Context class stores all of the information associated with the currently running document. This includes things like macros, counters, labels, references, etc. The context also makes sure that localized macros get popped off when processing leaves a macro or environment. The context of a document can also create new counters, dimens, if commands, and macros, as well as change token category codes.

Each time a TeX object is instantiated, it will create its own context. This context will load all of the base macros and initialize all of the context information described above.

6.5.1 Context Objects

Instantiate a new context. If the load argument is set to true, the context will load all of the base macros defined in plasTeX. This includes all of the macros used in the standard TeX and LaTeX distributions.

stack of all macro and category code collections currently in the document being processed. The item at index 0 includes the global macro set and default category codes.

a dictionary of counters.

the object that is given the label when a \label macro is invoked.
boolean that specifies whether we are currently in TeX's math mode or not.

a dictionary of labels and the objects that they refer to.

add a macro value with name key to the global namespace.

add a macro value with name key to the current namespace.

same as push()

set the category code for a character in the current scope. char is the character that will have its category code changed. code is the category code (0-15) to change it to.

create a new chardef like \chardef. name is the name of the command to create. num is the character number to use.

look through the stack of macros and return the one with the name key. The return value is an instance of the requested macro, not a reference to the macro class. This method allows you to use Python's dictionary syntax to retrieve the item from the context as shown below.

    tex.context['section']

import macros from another context into the global namespace. The argument, context, must be a dictionary of macros.

set the given label to the currently labelable object. An object can only have one label associated with it.

create a new let like \let. dest is the command sequence to create. source is the token to set the command sequence equivalent to.

Example

    c.let('bgroup', BeginGroup('{'))

imports all of the base macros defined by plasTeX. This includes all of the macros specified by the TeX and LaTeX systems.

loads a language package to configure names such as \figurename, \tablename, etc. language is a string containing the name of the language file to load. document is the document object being processed.

load an INI formatted package file (see section {ref23} for more information).

loads a package. tex is the processor to use in parsing the package content. file is the name of the package to load. options is a dictionary containing the options to pass to the package. This generally comes from the optional argument on a \usepackage or \documentclass macro.
The package being loaded by this method can be one of three types: 1) a native LaTeX package, 2) a Python package, or 3) an INI formatted file. The Python version of the package is searched for first. If it is found, it is loaded and an INI version of the package is also loaded if it exists. If there is no Python version, the true LaTeX version of the package is loaded. If there is an INI version of the package in the same directory as the LaTeX version, that file is loaded also.

create a new command like \newcommand. name is the name of the macro to create. nargs is the number of arguments including optional arguments. definition is a string containing the macro definition. opt is a string containing the default optional value.

Examples

    # Raw strings need only a single backslash.
    c.newcommand('bold', 1, r'\textbf{#1}')
    c.newcommand('foo', 2, r'{\bf #1#2}', opt='myprefix')

create a new count like \newcount.

create a new counter like \newcounter. name is the name of the counter to create. resetby is the counter that, when incremented, will reset the new counter. initial is the initial value for the counter. format is the printed format of the counter. In addition to creating a new counter macro, another macro corresponding to the \thename is created which prints the value of the counter just like in LaTeX.

create a new definition like \def. name is the name of the definition to create. args is a string containing the argument profile. definition is a string containing the macro code to expand when the definition is invoked. local is a boolean that specifies that the definition should only exist in the local scope. The default value is true.

Examples

    c.newdef('bold', '#1', '{\\bf #1}')
    c.newdef('put', '(#1,#2)#3', '\\dostuff{#1}{#2}{#3}')

create a new dimen like \newdimen.

create a new environment like \newenvironment.
This works exactly like the newcommand() method, except that the definition argument is a two-element tuple where the first element is a string containing the macro content to expand at the \begin, and the second element is the macro content to expand at the \end.

Example

    # Raw strings need only a single backslash.
    c.newenvironment('mylist', 0, (r'\begin{itemize}', r'\end{itemize}'))

create a new if like \newif. This also creates macros corresponding to \nametrue and \namefalse.

create a new muskip like \newmuskip.

create a new skip like \newskip.

a dictionary of packages. The keys are the names of the packages. The values are dictionaries containing the options that were specified when the package was loaded.

pop the top scope off of the stack. If obj is specified, continue to pop scopes off of the context stack until the scope that was originally added by obj is found.

add a new scope to the stack. If a macro instance context is specified, the new scope's namespace is given by that object.

set up a reference for resolution. obj is the macro object that is doing the referencing. label is the label of the node that obj is looking for. If the item that obj is looking for has already been labeled, the idref attribute of obj is set to the object. Otherwise, the reference is stored away to be resolved later.

set the current set of category codes to the set used for the verbatim environment.

return the character code that char belongs to. The category codes are the same codes used by TeX and are defined in the Token class.

6.6 plasTeX.Renderers — The plasTeX Rendering Framework

The renderer is responsible for taking the information in a plasTeX document object and creating another (usually visual) representation of it. This representation may be HTML, XML, RTF, etc. While this could be implemented in various ways, one rendering framework is included with plasTeX.

The renderer is essentially just a dictionary of functions[fn7].
The keys in this dictionary correspond to names of the nodes in the document object. The values are the functions that are called when a node in the document object needs to be rendered. The only argument to the function is the node itself. What this function does in the rendering process is completely up to it; however, it should refrain from changing the document object itself as other renderers may be using that same object.

[fn7] “functions” is being used loosely here. Actually, any Python callable object (i.e. function, method, or any object with the __call__ method implemented) can be used.

There are some responsibilities that all renderers share. Renderers are responsible for checking options in the configuration object. For instance, renderers are responsible for generating filenames, creating directories, writing files in the proper encoding, generating images, splitting the document into multiple output files, etc. Of course, how a renderer accomplishes this is really renderer dependent. An example of a renderer based on Zope Page Templates (ZPT) is included with plasTeX. This renderer is capable of generating XML and HTML output.

6.6.1 Renderer Objects

Base class for all renderers. Renderer is a dictionary and contains functions that are called for each node in the plasTeX document object. The keys in the dictionary correspond to the names of the nodes.

This renderer implementation uses a mixin called Renderable that is mixed into the Node class prior to rendering. Renderable adds various methods to the Node namespace to assist in the rendering process. The primary inclusion is the __unicode__() method. This method returns a unicode representation of the current node and all of its child nodes. For more information, see the Renderable class documentation.

the default renderer value. If a node is being rendered and no key in the renderer matches the name of the node being rendered, this function is used instead.

contains the file extension to use for generated files.
This extension is only used if the filename generator does not supply a file extension.

a list of files created during rendering.

contains a string template that renders the placeholder for the image attributes: width, height, and depth. This placeholder is inserted into the document where the width, height, and depth of an image is needed. The placeholder is needed because images are not generated until after the document is rendered. See the Imager API (section 6.7) for more information.

contains a string template that renders the placeholder for the image attribute units. This placeholder is inserted in the document any time an attribute of a particular unit is requested. This placeholder will always occur immediately after the string generated by imageAttrs. The placeholder is needed because images are not generated until after the document is rendered. See the Imager API (section 6.7) for more information.

a reference to an Imager implementation. Imagers are responsible for generating images from LaTeX code. This is needed for output types which aren't capable of displaying equations, pictures, etc., such as HTML.

contains a list of file extensions of valid image types for the renderer. The first element in the list is the default image format. This format is used when generating images (if the image type isn't specified by the filename generator). When static images are simply copied from the document, their format is checked against the list of supported image types. If the static image is not in the correct format it is converted to the default image format. Below is an example of a list of image types used in the HTML renderer. These image types are valid because web browsers all support these formats.

    imageTypes = ['.png','.gif','.jpg','.jpeg']

contains a list of file extensions of valid vector image types for the renderer. The first element in the list is the default vector image format. This format is used when generating images.
Static images are simply copied into the output document directory. Below is an example of a list of image types used in the HTML renderer. These image types are valid because there are plug-ins available for these formats.

    vectorImageTypes = ['.svg']

filename generator. This method generates a basename based on the options in the configuration. The generator has an attribute called namespace which contains the namespace used to resolve the variables in the filename string. This namespace should be populated prior to invoking the generator. After a filename is successfully generated, the namespace is automatically cleared (with the exception of the variables sent in the namespace when the generator was instantiated). Note: This generator can be accessed in the usual generator fashion, or called like a function.

a function that converts the content returned from each rendered node to the appropriate value.

the default renderer to use for text nodes.

this method is called once the entire rendering process is finished. Subclasses can use this method to run any post-rendering cleanup tasks. The first argument, document, is the document instance that is being rendered. The second argument, files, is a list of all of the filenames that were created. This method opens each file, reads the content, and calls processFileContent on the file content. It is suggested that renderers override that method instead of cleanup. In addition to overriding processFileContent, you can post-process file content without having to subclass a renderer by using the postProcess argument. See the render method for more information.

locate a rendering method from a list of possibilities. keys is a list of strings containing the requested name of a rendering method. This list is traversed in order. The first renderer that is found is returned. default is a default rendering method to return if none of the keys exists in the renderer.

this routine is called after the renderer is instantiated.
It can be used by subclasses to do any initialization routines before the rendering process.

post-processing routine that allows renderers to modify the output documents one last time before the rendering process is finished. document is the input document instance. content is the content of the file in a unicode object. The value returned from this method will be written to the output file in the appropriate encoding.

invokes the rendering process on document. You can post-process each file after it is rendered by passing a function into the postProcess argument. This function must take two arguments: 1) the document object and 2) the content of a file as a unicode object. It should do whatever processing it needs to the file content and return a unicode object.

6.6.2 Renderable MixIn

The Renderable mixin is mixed into the Node namespace prior to the rendering process. The methods mixed in assist in the rendering process.

the filename that this object will create. Objects that don't create new files should simply return None. The configuration determines which nodes should create new files.

generate an image of the object and return the image filename. See the Imager documentation in section 6.7 for more information.

generate a vector image of the object and return the image filename. See the Imager documentation in section 6.7 for more information.

return the relative URL of the object. If the object actually creates a file, just the filename will be returned (e.g. ‘foo.html’). If the object is within a file, both the filename and the anchor will be returned (e.g. ‘foo.html#bar’).

same as __unicode__().

invoke the rendering process on all of the child nodes. The rendering process includes walking through the child nodes, looking up the appropriate rendering method from the renderer, and calling the method with the child node as its argument.
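
The "dictionary of functions" idea and the child-walking step described above can be sketched in a few lines of stand-alone Python. This is only an illustration under stated assumptions: ToyNode, ToyRenderer, and render_children are hypothetical names, the nodeName and childNodes attributes merely mimic DOM naming, and none of this is the plasTeX implementation.

```python
class ToyNode:
    """Hypothetical stand-in for a document node (not a plasTeX class)."""

    def __init__(self, name, children=(), text=''):
        self.nodeName = name
        self.childNodes = list(children)
        self.text = text


class ToyRenderer(dict):
    """Dictionary that maps node names to rendering functions."""

    def default(self, node):
        # Fallback used when no key matches the node name:
        # emit only the rendered children.
        return self.render_children(node)

    def render_children(self, node):
        # Walk the children, look up the rendering function for each one,
        # and call it with the child node as its only argument.
        parts = []
        for child in node.childNodes:
            func = self.get(child.nodeName, self.default)
            parts.append(func(child))
        return ''.join(parts)


renderer = ToyRenderer()
renderer['#text'] = lambda node: node.text
renderer['textbf'] = lambda node: '<b>%s</b>' % renderer.render_children(node)
renderer['par'] = lambda node: '<p>%s</p>' % renderer.render_children(node)

document = ToyNode('document', [
    ToyNode('par', [
        ToyNode('#text', text='Hello '),
        ToyNode('textbf', [ToyNode('#text', text='world')]),
        # No 'emph' key is registered, so the default function is used.
        ToyNode('emph', [ToyNode('#text', text='!')]),
    ]),
])
html = renderer.render_children(document)
```

Note that the rendering functions only read from the nodes; as stated earlier, a renderer should not modify the document object because other renderers may be using it.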
In addition to the actual rendering process, this method also prints out some status information about the rendering process. For example, if the node being rendered has a non-empty filename attribute, that means that the node is generating a new file. This filename information is printed to the log. One problem with this methodology is that the file is not actually created at this time. It is assumed that the rendering method will check for the filename attribute and actually create the file.

6.7 plasTeX.Imagers — The plasTeX Imaging Framework

The imager framework is used when an output format is incapable of representing part of a LaTeX document natively. One example of this is equations in HTML. In cases like this you can use an Imager to generate images of the commands and environments that cannot be rendered in any other way.

Currently, plasTeX comes with several imager implementations based on dvi2bitmap, dvipng, and ghostscript with the PNG driver (called gspdfpng and gspspng), as well as one that uses OS X's CoreGraphics library. Creating imagers based on other programs is quite simple, and more are planned for future releases.

In addition to imagers that generate bitmap images, it is also possible to generate vector images using programs like dvisvg or dvisvgm.

The Imager framework does all of its work in temporary directories. The one requirement that it has is that Imager subclasses need to generate images with the basenames ‘img%d’ where ‘%d’ is the number of the image.

The only requirement by the plasTeX framework is that the imager class within the imager module is called “Imager” and should be installed in the plasTeX.Imagers package. The basename of the imager module is the name used when plasTeX looks for a specified imager.

6.7.1 Imager Objects

Instantiate the imager class. document is the document object that is being rendered.
The Imager class is responsible for creating a document of requested images, compiling it, and generating images from each page in the document.

specifies the converter that translates the output from the document compiler (e.g. PDF, DVI, PS) into images (e.g. PNG, JPEG, GIF). The only requirement is that the basename of each image is of the form ‘img%d’ where ‘%d’ is the number of the image. Note: This is a class attribute.

Writing an imager requires you to at least override the command that creates images. It can be as simple as the example below.

    import plasTeX.Imagers

    class DVIPNG(plasTeX.Imagers.Imager):
        """ Imager that uses dvipng """
        command = 'dvipng -o img%d.png -D 110'

specifies the document compiler (e.g. latex, pdflatex) command. Note: This is a class attribute.

contains the “images” section of the document configuration.

contains the file extension to use if no extension is supplied by the filename generator.

contains a string template that will be used as a placeholder in the output document for the image height, width, and depth. These attributes cannot be determined in real-time because images are not generated until after the document has been fully rendered. This template generates a string that is put into the output document so that the image attributes can be post-processed in. For example, the default template (which is rather XML/HTML biased) is:

    &${filename}-${attr};

The two variables available are filename, the filename of the image, and attr, the name of the attribute (i.e. width, height, or depth).

contains a string template that will be used as a placeholder in the output document for the image units. This template generates a string that is put into the output document so that the image attribute units can be post-processed in. For example, the default template (which is rather XML/HTML biased) is:

    &${units};

The only variable available is units and contains the CSS unit that was requested.
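
The placeholder mechanism can be tried out with Python's standard string.Template class, whose ${name} syntax matches the templates shown above. This is a sketch under assumptions: whether plasTeX itself evaluates the templates with string.Template is not stated here, and the image filename, the page fragment, and the final value '42px' are made up for illustration.

```python
from string import Template

# Templates as quoted above (the units template with its closing brace).
image_attrs = Template('&${filename}-${attr};')
image_units = Template('&${units};')

# While rendering, the real width is not known yet, so a placeholder is
# written into the output; the units placeholder follows immediately.
placeholder = (image_attrs.substitute(filename='images/img-0001.png',
                                      attr='width')
               + image_units.substitute(units='px'))
page = '<img src="images/img-0001.png" width="%s"/>' % placeholder

# After the images exist, a post-processing pass can swap the placeholder
# for the measured value (hypothetical value shown here).
final_page = page.replace(placeholder, '42px')
```

The entity-like form of the default templates makes the placeholders easy to find in XML or HTML output, which is presumably why they are described as XML/HTML biased.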
The generated string will always occur immediately after the string generated by imageAttrs.

dictionary that contains the Image objects corresponding to the requested images. The keys are the image filenames.

callable iterator that generates filenames according to the filename template in the configuration.

file object where the image document is written to.

command that verifies the existence of the image converter on the current machine. If verification is not specified, the executable specified in command is executed with the --help option. If the return code is zero, the imager is considered valid. If the return code is anything else, the imager is not considered valid.

closes the generated document and starts the image generation routine.

the method responsible for compiling the source. source is a file object containing the document.

sets up the temporary environment for the image converter, then executes executeConverter. It also moves the generated images into their final location specified in the configuration.

executes the command that converts the output from the compiler into image files. output is a file object containing the compiled output of the document.

get an image for node in any way possible. The node is first checked to see if the imageoverride attribute is set. If it is, that image is copied to the image directory. If imageoverride is not set, or there was a problem in saving the image in the correct format, an image is generated using the source of node.

invokes the creation of an image using the content in text. context is the code that sets up the context of the document. This generally includes the setting of counters so that counters used within the image code are correct. filename is an optional filename for the output image. Generally, image filenames are generated automatically, but they can be overridden with this argument.

verifies that the command in command is valid for the current machine.
The verify method returns True if the command will work, or False if it will not.

writes the code to the generated document that creates the image content. filename is the final filename of the image. This is not actually used in the document, but can be handy for debugging. code is the code that an image is needed of. context is the code that sets up the context of the document. This generally includes the setting of counters so that counters used within the image code are correct.

this method is called when the imager is instantiated and is used to write any extra information to the preamble. If overridden, the subclass needs to make sure that document.preamble.source is the first thing written to the preamble.

6.7.2 Image Objects

Instantiate an Image object. Image objects contain information about the generated images. This information includes things such as width, height, filename, absolute path, etc. Image objects also have the ability to crop the image that they reference and return information about the baseline of the image that can be used to properly align the image with surrounding text.

filename is the input filename of the image. config is the “images” section of the document configuration. width is the width of the image. This is usually extracted from the image file automatically. height is the height of the image. This is usually extracted from the image file automatically. alt is a text alternative of the image to be used by renderers such as HTML. depth is the depth of the image below the baseline of the surrounding text. This is generally calculated automatically when the image is cropped. longdesc is a long description used to describe the content of the image for renderers such as HTML.

a text alternative of the image to be used by renderers such as HTML.

the “images” section of the document's configuration.

the depth of the image below the baseline of the surrounding text. This is generally calculated automatically when the image is cropped.
the filename of the image.

the height of the image in pixels.

a long description used to describe the content of the image for renderers such as HTML.

the absolute path of the image file.

the URL of the image. This may be used during rendering.

the width of the image in pixels.

crops the image so that the image edges are flush with the image content. It also sets the depth attribute of the image to the number of pixels that the image extends below the baseline of the surrounding text.

A About This Document

This document was written using LaTeX. The document uses macros written for documenting the Python language and Python packages. Generating the PDF version of the document is simply a matter of using the pdflatex command. Generating the HTML version of the document, of course, uses plasTeX. The wonderful thing about the HTML version is that it was generated from the LaTeX source and Python style files without customization[fn8]! In fact, in its current state, plasTeX can generate the HTML versions of the Python documentation found on the Python website. Without customization of plasTeX, the only remaining issues are that the module index is missing and there are some formatting differences. Not bad, considering plasTeX is actually expanding the LaTeX document natively.

[fn8] Ok, there was one customization to \var for a whitespace issue, but the change works both in the PDF and HTML version.

B Frequently Asked Questions

B.1 Parsing

B.1.1 How can I make plasTeX work with my complicated macros?

While plasTeX makes a valiant effort to expand all macros, it isn't TeX and may have problems if your macros are complicated. There are things that you can do to remedy the situation. If you are getting failures or warnings, you can do one of two things: 1) you can create a simplified version of the macro that plasTeX uses for its work, while LaTeX uses the more complicated one, or 2) you can implement the macro as a Python class.
In the first solution, you can use the \ifplastex construct to wrap your plasTeX and LaTeX versions of the macros. You can even just remove parts of the macros. See the example below.

    % Print a double line, then bold the text.
    % In plasTeX, leave the lines out.
    \newcommand{\mymacro}[1]{\ifplastex\else\vspace*{1in}\fi\textbf{#1}}

Depending on how complicated your macro is, you may want to implement it as a Python class instead of a LaTeX macro. Using a Python class gives you full access to all of the plasTeX internal mechanisms to do whatever you need to do in your macro. To read more about writing Python class macros, see the section {ref5}.

B.1.2 How can I get plasTeX to find my packages?

There are two types of packages that can be loaded by plasTeX: 1) native LaTeX packages, and 2) packages written entirely in Python. plasTeX first looks for packages written in Python. Packages such as this are written specifically for plasTeX and will yield better parsing performance as well as better looking output. Python-based packages are valid Python packages as well. So to load them, you must add the directory where your Python packages are to your PYTHONPATH environment variable. For more information about Python-based packages, see the section {ref23}.

If you have a true LaTeX package, plasTeX will try to locate it using the kpsewhich program just like LaTeX does.
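
The PYTHONPATH requirement for Python-based packages mentioned above comes down to ordinary Python module lookup, which can be demonstrated without plasTeX. In this sketch the package name mytoypackage and the temporary directory are hypothetical, and adding the directory to sys.path stands in for setting PYTHONPATH before starting the interpreter; plasTeX's own package-loading logic is not shown.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical directory that holds Python-based packages.
package_dir = tempfile.mkdtemp()
with open(os.path.join(package_dir, 'mytoypackage.py'), 'w') as handle:
    handle.write("PACKAGE_NAME = 'mytoypackage'\n")

# Directories listed in PYTHONPATH are placed on sys.path when the
# interpreter starts; inserting the directory here has the same effect
# for the current process.
sys.path.insert(0, package_dir)
importlib.invalidate_caches()

# The module can now be found by name, like any other Python package.
module = importlib.import_module('mytoypackage')
```

If a Python-based package is not picked up, checking that its directory actually appears in sys.path is a reasonable first debugging step.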