Copyright (c) Hyperion Entertainment and contributors.
Difference between revisions of "Expat Library"
Revision as of 16:13, 20 October 2013
Introduction
Expat is a fast and resource-efficient XML parser written by James Clark. On AmigaOS it is implemented in various flavours (static link library / shared object / shared Amiga library). This documentation specifically focuses on, and provides code examples for, the shared Amiga library version.
XML Parsing Basics
As far as XML document parsing is concerned, there are basically two kinds of parsers:
- Tree-based parsers, which process the entire XML file and build a tree structure representing the elements and other constructs in the document. An example of a tree-based parser is libxml2, which is also available for AmigaOS.
- Stream-oriented (event-driven) parsers, which process the XML file as a continuous stream and produce an event each time the parser encounters an XML element or character data. Expat is an example of an event-driven parser.
Tree-based parsers are really comfortable to work with: the parser reconstructs the entire document structure and contents for you. You are also provided with functions to search in the document, find data, add or modify the contents etc. Event-driven parsers are, on the other hand, much more basic. They require setup and generally more work on the part of the programmer.
However, tree-based parsers are rather taxing resource-wise. Parsing the document takes longer and uses up a considerable amount of memory. Implementations also tend to be bulky: for example, the current AmigaOS static library implementation of libxml2 is bigger than 5MB – which is a preposterous file size overhead added to your program only to provide it with a parser! Event-driven parsers may offer fewer bells and whistles but they are much smaller (about 300KB in all AmigaOS Expat implementations, static or shared) and faster. In order to keep the spirit of Amiga software, you'll quite naturally want to use Expat as a well-proven and efficient parser, preferably in its shared Amiga library incarnation.
Another point in favour of event-driven parsers is that reconstructing the complete document tree is not always necessary when working with XML files. Quite often you are only interested in particular data stored in particular elements. A parser like Expat can then be used to react only to events concerning the parts of the document you care about. But even if you do need the entire tree structure, for whatever purpose or merely for comfort, there is no reason to give up on Expat. As tree-based parsers are typically built on top of event-driven parsers, you can use Expat to build your own XML data representation. It means more work, but you can tailor the procedure to your own needs and end up with a perhaps less sophisticated but still adequate representation, without the extra overhead libxml2 would incur.
How To Use The Library
Depending on your particular aim and purpose, using the AmigaOS Expat Library entails at least the following basic steps (they will all be discussed in more detail further on):
- Open the library and obtain its interface (see Library Opening Chores below).
- Create two element handler functions to process XML element events.
- Create a character-data handler function to process XML character data events.
- Create a parser instance.
- Configure the parser so that it knows of your element and character-data handlers.
- Open the XML file for reading, read data from the XML file into a buffer and call the parsing function (see Parsing XML Files below).
Reading from the file is usually done in a loop. The file data is continuously fed into a fixed-size memory buffer and parsed, until the end of the document is reached. Whenever the parser encounters an element's start tag, it will call your start element handler function. Whenever it encounters an element's end tag, it invokes your end element handler function. Whenever text is found, the parser will call your character-data handler function.
As you can see, the parser itself doesn't do much: it just processes the XML file and calls the respective handler functions to react upon the individual events. All the grunt work is done in the handlers, outside of the library. Programmers design the handler functions to suit their particular needs, and process (or ignore) the received data as they find appropriate.
It should also be noted that the Expat Library provides just the parser, so it cannot be used for writing XML files.
Library Opening Chores
Just like other AmigaOS libraries, the Expat Library must be opened and its interface obtained before you can use it:
    struct Library *ExpatBase = NULL;
    struct ExpatIFace *IExpat = NULL;

    if ( (ExpatBase = IExec->OpenLibrary("expat.library", 53)) )
    {
        IExpat = (struct ExpatIFace *) IExec->GetInterface(ExpatBase, "main", 1, NULL);
    }

    if ( !ExpatBase || !IExpat )
    {
        /* handle library opening error */
    }
Handler Functions
Handler functions are custom functions that get invoked automatically whenever the parser encounters an XML event in the stream. Obvious events include XML elements and character data but there are, of course, more: comments, XML document declarations, namespace declarations, CDATA sections, unparsed entities (NDATA) etc. For each type of event you are interested in, a handler function must be written. How you write them entirely depends on what you intend to do.
While handler function names are arbitrary (you can choose any name), the parameter list must follow a precise definition, which is different for each type of handler (see the "libraries/expat.h" include file for the definitions).
You can pass user data to and between handlers, be it a simple variable or a complex data structure. This data can then be accessed from within the handler via the userData pointer that is provided when the handler function gets called. It is commonly used for tracking the parsing process, storing intermediate values, or passing global data to handlers. Never use global variables for that!
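As a minimal sketch of the user data idea, the two handlers below share a small state structure instead of global variables. The structure `ParseState` and the handler names are hypothetical, and the `XML_Char` typedef is only a stand-in so that the fragment is self-contained; in a real program the type comes from the Expat include file. In the standard Expat API the pointer is registered with `XML_SetUserData()`; the assumption here is that the Amiga library offers the same call through its interface.

```c
#include <stddef.h>

typedef char XML_Char;  /* stand-in for the type defined by the Expat header */

/* Hypothetical state shared between handlers through userData. */
struct ParseState
{
    int depth;     /* current nesting level */
    int maxDepth;  /* deepest level seen so far */
};

void track_start(void *userData, const XML_Char *name, const XML_Char **attrs)
{
    struct ParseState *state = userData;

    (void)name;
    (void)attrs;

    state->depth++;
    if (state->depth > state->maxDepth)
        state->maxDepth = state->depth;
}

void track_end(void *userData, const XML_Char *name)
{
    struct ParseState *state = userData;

    (void)name;
    state->depth--;
}
```

Because every handler receives the same pointer, the state survives from one event to the next without any global variable being involved.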
Element Handlers
In XML, an element is enclosed between a start tag and an end tag. They are reported as separate events, so you need two separate functions to handle them:
The Start Element Handler
This handler function is invoked when the parser encounters an element's start tag.
    void start_handler(void *userData, const XML_Char *name, const XML_Char **attrs)
    {
    }
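In Expat, the attrs parameter is an array of alternating attribute names and values, terminated by a NULL pointer. The sketch below shows one way a start element handler might look up a single attribute. The helper `find_attr` and the attribute name "id" are hypothetical, and the `XML_Char` typedef is a stand-in for the definition in the Expat include file.

```c
#include <stdio.h>
#include <string.h>

typedef char XML_Char;  /* stand-in for the type defined by the Expat header */

/* Hypothetical helper: returns the value of the wanted attribute,
   or NULL if the element does not carry it. */
const XML_Char *find_attr(const XML_Char **attrs, const XML_Char *wanted)
{
    int i;

    if (attrs == NULL)
        return NULL;

    /* Names sit at even indexes, their values at the following odd ones. */
    for (i = 0; attrs[i] != NULL; i += 2)
    {
        if (strcmp(attrs[i], wanted) == 0)
            return attrs[i + 1];
    }

    return NULL;
}

void start_handler(void *userData, const XML_Char *name, const XML_Char **attrs)
{
    const XML_Char *id = find_attr(attrs, "id");

    (void)userData;
    printf("<%s> id=%s\n", name, id ? id : "(none)");
}
```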
The End Element Handler
This handler function is invoked when the parser encounters an element's closing tag.
    void end_handler(void *userData, const XML_Char *name)
    {
    }
The Character-Data Handler
This handler function is invoked when the parser encounters character data (i.e. a text string that is enclosed within an element but is not itself a tag). Please note that Expat may produce several events (and thus call the character-data handler several times successively) before it processes all text within a single element. You can never assume that element text will be processed in one go!
    void chardata_handler(void *userData, const XML_Char *string, int length)
    {
    }
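Since the text of one element may arrive in several pieces, a common approach is to collect the pieces in a buffer and only process the result in the end element handler. The following is a sketch under stated assumptions: the `TextBuffer` structure and the `TEXT_MAX` limit are hypothetical, text beyond the limit is silently dropped, and the `XML_Char` typedef is a stand-in for the definition in the Expat include file. Note that the string passed to the handler is not NUL-terminated, so the length parameter must always be honoured.

```c
#include <string.h>

typedef char XML_Char;  /* stand-in for the type defined by the Expat header */

#define TEXT_MAX 256    /* hypothetical limit for this sketch */

/* Hypothetical accumulator passed to the handlers as userData. */
struct TextBuffer
{
    char   text[TEXT_MAX];
    size_t used;
};

void chardata_handler(void *userData, const XML_Char *string, int length)
{
    struct TextBuffer *buf = userData;
    size_t room = TEXT_MAX - 1 - buf->used;  /* keep space for the terminator */
    size_t n = (size_t)length < room ? (size_t)length : room;

    /* Append this piece; anything that does not fit is dropped. */
    memcpy(buf->text + buf->used, string, n);
    buf->used += n;
    buf->text[buf->used] = '\0';
}

void end_handler(void *userData, const XML_Char *name)
{
    struct TextBuffer *buf = userData;

    (void)name;

    /* buf->text now holds the collected text: process it here,
       then reset the buffer for the next element. */
    buf->used = 0;
    buf->text[0] = '\0';
}
```

For documents with nested elements the buffer would typically also be reset in the start element handler, so that text belonging to a parent element does not leak into its children.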