Copyright (c) Hyperion Entertainment and contributors.
Expat Library
Revision as of 18:58, 20 October 2013
Introduction
Expat is a fast and resource-efficient XML parser written by James Clark. On AmigaOS it is implemented in various flavours (static link library / shared object / shared Amiga library). This documentation specifically focuses on, and provides code examples for, the shared Amiga library version.
XML Parsing Basics
As far as XML document parsing is concerned, there are basically two kinds of parsers:
- Tree-based parsers, which process the entire XML file and build a tree structure representing the elements and other constructs in the document. An example of a tree-based parser is libxml2, which is also available for AmigaOS.
- Stream-oriented (event-driven) parsers, which process the XML file as a continuous stream and produce an event each time the parser encounters an XML element or character data. Expat is an example of an event-driven parser.
Tree-based parsers are really comfortable to work with: the parser reconstructs the entire document structure and contents for you. You are also provided with functions to search in the document, find data, add or modify the contents etc. Event-driven parsers are, on the other hand, much more basic. They require setup and generally more work on the part of the programmer.
However, tree-based parsers are rather taxing resource-wise. Parsing the document takes longer and uses up a considerable amount of memory. Implementations also tend to be bulky: for example, the current AmigaOS static library implementation of libxml2 is bigger than 5MB – which is a preposterous file size overhead added to your program only to provide it with a parser! Event-driven parsers may offer fewer bells and whistles but they are much smaller (about 300KB in all AmigaOS Expat implementations, static or shared) and faster. In order to keep the spirit of Amiga software, you'll quite naturally want to use Expat as a well-proven and efficient parser, preferably in its shared Amiga library incarnation.
Among other things that speak in favour of event-driven parsers is the fact that when working with XML files, reconstructing the complete document tree structure is not always necessary. Quite often you're just interested in particular data that is stored in particular elements. A parser like Expat can then be used to process (react upon) events only concerning the parts of the document you're interested in. But even if you do need the entire tree structure, for whatever purpose or merely for the comfort, there is no reason to give up on Expat. As tree-based parsers are typically built on top of event-driven parsers, you can use Expat to build your own XML data representation. It means work but you can tailor the procedure to your own needs, providing perhaps less sophisticated but still adequate representation, without the extra overhead libxml2 would incur.
How To Use The Library
Depending on your particular aim and purpose, using the AmigaOS Expat Library entails at least the following basic steps (they will all be discussed in more detail further on):
1. Open the library and obtain its interface (see Library Opening Chores below).
2. Create two element handler functions to process XML element events.
3. Create a character-data handler function to process XML character data events.
4. Create a parser instance.
5. Configure the parser so that it knows of your handlers.
6. Open the XML file for reading, read data from the XML file into a buffer and call the parsing function (see Parsing XML Files below).
Reading from the file is usually done in a loop. The file data is repeatedly read into a fixed-size memory buffer and parsed, until the end of the document is reached. Because the document is read in pieces, parsing can start before the whole document is available (unlike with tree-based parsers). This also makes it possible to parse very large documents that would not fit into memory.
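The read loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names parse_stream, feed_fn, collect_chunk and struct sink are hypothetical, and the parser call is replaced by a callback so that the sketch compiles without the Expat headers. In a real program the callback's place would be taken by Expat's parsing function, which likewise receives the buffer, the number of bytes in it and a flag marking the final chunk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 16  /* deliberately tiny so the loop runs several times */

/* Hypothetical stand-in for the parsing call: receives one chunk of the
   document plus a flag telling whether it is the last one.
   Returns non-zero on success, 0 on error. */
typedef int (*feed_fn)(void *ctx, const char *data, int len, int isFinal);

/* Reads the stream piece by piece into a fixed-size buffer and hands each
   piece to feed(). Returns 1 on success, 0 on a read or feed error. */
int parse_stream(FILE *fp, feed_fn feed, void *ctx)
{
    char buffer[CHUNK_SIZE];
    int done = 0;

    assert(fp != NULL && feed != NULL);

    while (!done)
    {
        size_t len = fread(buffer, 1, sizeof(buffer), fp);

        if (ferror(fp))
            return 0;

        done = feof(fp);  /* set once a read has hit the end of the file */

        if (!feed(ctx, buffer, (int)len, done))
            return 0;
    }
    return 1;
}

/* Hypothetical consumer used for demonstration: glues the chunks back
   together and records how often it was called. */
struct sink
{
    char   text[256];
    size_t used;
    int    calls;
    int    sawFinal;
};

int collect_chunk(void *ctx, const char *data, int len, int isFinal)
{
    struct sink *s = ctx;

    if (s->used + (size_t)len >= sizeof(s->text))
        return 0;  /* would overflow the demonstration buffer */

    memcpy(s->text + s->used, data, (size_t)len);
    s->used += (size_t)len;
    s->text[s->used] = '\0';
    s->calls++;
    if (isFinal)
        s->sawFinal = 1;
    return 1;
}
```

The buffer size is kept deliberately tiny here so that even a short document takes several passes through the loop; a real program would more likely use a buffer of a few kilobytes.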
Whenever the parser encounters an element's start tag, it will call your start element handler function. Whenever it encounters an element's end tag, it invokes your end element handler function. Whenever text is found, the parser will call your character-data handler function. And so on.
As you can see, the parser itself doesn't do much: it just processes the XML file and calls the respective handler functions to react upon the individual events. All the grunt work is done in the handlers, outside of the library. Programmers design the handler functions to suit their particular needs, and process (or ignore) the received data as they find appropriate.
It should also be noted that the Expat Library provides just the parser, so it cannot be used for writing XML files. Neither can it be used for validating XML documents against a DTD.
Installation
If the library file is not installed in your LIBS: drawer, and/or your SDK is missing the necessary includes, download the Expat package from OS4depot.net and copy the files to their relevant directories:
- the entire contents of the SDK directory go to SDK:
- the "Workbench/Libs/expat.library" file goes to LIBS:
- the "Workbench/SObjs/libexpat.so" file goes to SOBJS: (should you need the shared object version as well).
No further setup is required. To use the shared Amiga library, you must include the following header file at the beginning of your code:
#include <proto/expat.h>
Library Opening Chores
Just like other AmigaOS libraries, the Expat Library must be opened and its interface obtained before you can use it:
    struct Library *ExpatBase = NULL;
    struct ExpatIFace *IExpat = NULL;

    if ( (ExpatBase = IExec->OpenLibrary("expat.library", 53)) )
    {
        IExpat = (struct ExpatIFace *) IExec->GetInterface(ExpatBase, "main", 1, NULL);
    }

    if ( !ExpatBase || !IExpat )
    {
        /* handle library opening error */
    }
Handler Functions
Handler functions are custom callback functions that get invoked automatically whenever the parser encounters an XML event in the stream. Obvious events include XML elements and character data but there are, of course, more: comments, XML document declarations, namespace declarations, CDATA sections, unparsed entities (NDATA) etc. For each type of event you are interested in, a handler function must be written. How you write them entirely depends on what you intend to do.
Whereas handler function names are arbitrary (you can choose any name), the parameter list must follow a precise definition that is different for each type of handler (see the "libraries/expat.h" include file for the definitions).
You can pass user data to and between handlers, be it a simple variable or a complex data structure. This data can then be accessed from within the handler via the userData pointer that is provided when the handler function gets called. It is commonly used for tracking the process (parsing state), storing intermediate values, or passing global data to handlers. Never use global variables for that!
Element Handlers
In XML, an element is enclosed between a start tag and an end tag. They are reported as separate events, so you need to provide two separate functions to handle them:
The Start Element Handler
This handler function is invoked when the parser encounters an element's start tag.
    void start_handler(void *userData, const XML_Char *name, const XML_Char **attrs)
    {
    }
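As an illustration of what a start element handler might do, the following sketch looks up one attribute of a particular element. Expat passes the attributes as an array of alternating name and value strings terminated by a NULL pointer. Everything else here is an assumption made for the example: the element name "item", the attribute name "id", the helper find_attr and the struct item_state are hypothetical, and XML_Char is defined locally as char only so that the sketch compiles without the Expat headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in so the sketch compiles on its own; a real program gets
   XML_Char from the Expat headers instead. */
typedef char XML_Char;

/* Hypothetical user data: remembers the "id" of the last <item> seen. */
struct item_state
{
    char lastId[32];
    int  itemCount;
};

/* Returns the value of the named attribute, or NULL if it is absent.
   attrs holds name/value pairs and ends with a NULL pointer. */
const XML_Char *find_attr(const XML_Char **attrs, const XML_Char *name)
{
    size_t i;

    if (!attrs)
        return NULL;

    for (i = 0; attrs[i] != NULL; i += 2)
    {
        if (strcmp(attrs[i], name) == 0)
            return attrs[i + 1];
    }
    return NULL;
}

void item_start_handler(void *userData, const XML_Char *name, const XML_Char **attrs)
{
    struct item_state *state = userData;
    const XML_Char *id;

    assert(state != NULL);

    if (strcmp(name, "item") != 0)
        return;  /* not an element this example cares about */

    state->itemCount++;

    id = find_attr(attrs, "id");
    if (id)
    {
        /* Copy the value: the strings should be treated as valid only
           for the duration of the handler call. */
        strncpy(state->lastId, id, sizeof(state->lastId) - 1);
        state->lastId[sizeof(state->lastId) - 1] = '\0';
    }
}
```

Elements other than the one of interest are simply ignored, which is the usual way of picking selected data out of a larger document.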
The End Element Handler
This handler function is invoked when the parser encounters an element's end tag.
    void end_handler(void *userData, const XML_Char *name)
    {
    }
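Start and end handlers typically cooperate through the user data. The sketch below, offered as one possible approach rather than a prescribed one, keeps the current nesting depth in a small structure and uses it to print an indented outline of the document. The names struct depth_state, depth_start_handler and depth_end_handler are hypothetical, and XML_Char is again defined locally as char so that the example compiles without the Expat headers.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in so the sketch compiles on its own; a real program gets
   XML_Char from the Expat headers instead. */
typedef char XML_Char;

/* Hypothetical parsing state shared by both handlers via userData. */
struct depth_state
{
    int depth;     /* current nesting level, 0 = outside the root element */
    int maxDepth;  /* deepest level reached so far */
};

void depth_start_handler(void *userData, const XML_Char *name, const XML_Char **attrs)
{
    struct depth_state *state = userData;

    (void)attrs;  /* attributes are not used in this example */
    assert(state != NULL);

    /* Indent by two spaces per nesting level. */
    printf("%*s<%s>\n", state->depth * 2, "", name);

    state->depth++;
    if (state->depth > state->maxDepth)
        state->maxDepth = state->depth;
}

void depth_end_handler(void *userData, const XML_Char *name)
{
    struct depth_state *state = userData;

    assert(state != NULL && state->depth > 0);

    state->depth--;
    printf("%*s</%s>\n", state->depth * 2, "", name);
}
```

Because the state lives in a structure reached through the userData pointer, no global variables are needed and several parser instances can each have their own state.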
The Character-Data Handler
This handler function is invoked when the parser encounters character data (i.e. a text string that is enclosed within an element but is not itself a tag). Please note that Expat may produce several events (and thus call the character-data handler several times successively) before it processes all text within a single element. You can never assume that element text will be processed in one go!
    void chardata_handler(void *userData, const XML_Char *string, int length)
    {
    }
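Since the text of one element may arrive in several pieces, a character-data handler usually has to collect them. The following is a minimal sketch of that idea, assuming a fixed-size buffer kept in the user data; the names struct text_state, text_reset and text_chardata_handler are hypothetical, and XML_Char is defined locally as char so that the example compiles without the Expat headers. Note that the string passed to the handler is not NUL-terminated, which is why the length parameter must always be honoured.

```c
#include <assert.h>
#include <string.h>

/* Stand-in so the sketch compiles on its own; a real program gets
   XML_Char from the Expat headers instead. */
typedef char XML_Char;

#define TEXT_MAX 128

/* Hypothetical user data that collects the text of one element. */
struct text_state
{
    char   text[TEXT_MAX];  /* always kept NUL-terminated */
    size_t used;            /* number of characters stored so far */
    int    truncated;       /* set if some text did not fit */
};

/* Would typically be called from the start handler of the element
   whose text is to be collected. */
void text_reset(struct text_state *state)
{
    state->used = 0;
    state->truncated = 0;
    state->text[0] = '\0';
}

/* Appends the received piece of text to what has been collected so far. */
void text_chardata_handler(void *userData, const XML_Char *string, int length)
{
    struct text_state *state = userData;
    size_t room, n;

    assert(state != NULL && length >= 0);

    room = TEXT_MAX - 1 - state->used;  /* keep one byte for the terminator */
    n = (size_t)length;
    if (n > room)
    {
        n = room;
        state->truncated = 1;
    }

    memcpy(state->text + state->used, string, n);
    state->used += n;
    state->text[state->used] = '\0';
}
```

The collected text would then be evaluated in the end element handler, at which point all pieces are guaranteed to have arrived.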
Parsing
Creating The Parser
When you have written all handlers you'll need for your particular application, it's time to create the parser (or a parser instance as we say, because more than one can be used). It takes a single function call:
    XML_Parser parser = NULL;

    parser = XML_ParserCreate(NULL);

    if (!parser)
    {
        /* Error: the parser instance couldn't be created for some reason. */
    }
This function creates a simple XML parser with no namespace support. The function's only parameter is the name of the character encoding that is used in the document you want to parse. If the parameter is NULL (as in the example above), the parser assumes that the document uses one of the built-in encodings:
- US-ASCII
- UTF-8
- UTF-16
- ISO-8859-1
Supplying a non-NULL parameter for the encoding (for example, "UTF-8") will override the XML document encoding declaration. This is rarely used – normally the parameter is NULL.
Please note that if you specify a different encoding (for example, "ISO-8859-2", which is used in many Central/Eastern European countries), it will not make Expat support it! The only supported encodings are the four built-in ones above. Refer to the Unknown Encodings section in the Advanced Use chapter below to learn how to handle documents with a non-standard character encoding.
Parser Configuration
Before you can start parsing, the parser instance must be properly configured. This typically entails setting all handlers (so that the parser knows which particular function to call when it encounters an event) and providing a pointer to the user data we want to carry around.