The libxml2 Library

From libxml2 Wiki


Introduction

This document is intended to be the canonical manual for the libxml2 library. The working copy of this document is managed as a wiki at http://wiki.xmlsoft.org/. Snapshots of this wiki are periodically made, proof-read and included in the libxml2 distribution. The document mainly covers the native C interface of the library, but also gives a brief introduction to the official Python bindings and pointers to third-party bindings for other languages.

What is libxml2?

libxml2 is the XML C parser and toolkit. Originally developed for the GNOME project, it's perfectly usable outside of the GNOME platform and does not depend on any of its libraries. It is Free Software available under the MIT License. XML itself is a metalanguage for designing markup languages, which in turn are text languages in which semantics and structure are added to the content using extra "markup" information enclosed between angle brackets. HTML is the most well-known markup language. Though the library is written in C, a variety of language bindings make it available in other environments.

And what is it not?

TODO: Sum up the various ways in which people have misunderstood the purpose and scope of libxml2 ;)

libxml2 Architecture

TODO: Daniel, maybe you would like to share a little text about the design decisions behind libxml2 here, when you have time?

Getting Started

In this section we will discuss how to install libxml2 on your platform and how to get started developing with the library.

Installing libxml2

Below are installation instructions for some of the more common platforms, as well as instructions on how to compile the library from source.

Packages

Binary or source packages of libxml2 are available on most common platforms. Here are installation instructions for some of them.

Linux

Below are instructions for some of the more common Linux distributions.

Ubuntu / Debian

libxml2 development packages are included in the main repositories on Debian and its derivatives (Ubuntu, Kubuntu, Xubuntu, Edubuntu), and can be installed by running:

apt-get install libxml2-dev

If you want the library with debugging symbols included, run this instead:

apt-get install libxml2-dbg

OpenSUSE

Fedora

Gentoo

libxml2 can be installed on Gentoo by running:

emerge libxml2

Set up your USE flags to enable or disable specific functionality (ipv6, python, debug, etc.).

FreeBSD

libxml2 is included in the FreeBSD ports system and can be installed by running:

cd /usr/ports/textproc/libxml2
make install

Interesting options for developers are WITH_MEM_DEBUG and WITH_THREAD_ALLOC; see the port Makefile for up-to-date information.

OpenBSD

libxml2 is included in the OpenBSD ports system and can be installed by running:

cd /usr/ports/textproc/libxml
make install

Solaris / OpenSolaris

MacOS X

Windows

Igor Zlatkovic is the maintainer of the Windows port of libxml2, and he provides binaries for download from his site. The site also has detailed instructions on how to install the packages.

Building From Source

libxml2 compiles and runs on a multitude of platforms. Here are some general instructions on how to build the libxml2 source, either from a release tarball, from the development snapshot or from the Subversion development version.

Prerequisites

All that is required to build libxml2 on a supported platform is a decent ANSI C compiler such as GCC or MSVC. However, the configuration process will detect and use a couple of optional libraries if they are found:

  • libz - A highly portable and widely available compression library.
  • iconv - A powerful character encoding conversion library.


Note: Recent versions of the GNU C Library ship their own iconv implementation, so a separate iconv library does not need to be installed on most Linux systems [1]. The standalone GNU libiconv implementation can be downloaded from here.

Download libxml2

The libxml2 source code can be downloaded as a tarball of the current stable version, as a snapshot of the development version, or from Subversion. In almost all cases you want the stable release. Only use the snapshot or Subversion if you intend to work on the library itself or want to check whether a bug has been fixed, and be aware that this code is under active development: it may crash or may not even run at all.

Stable Version

The current stable version of the libxml2 source code can be downloaded using FTP or rsync:

Using FTP

The downloads are available from the main libxml2 FTP or the GNOME FTP servers:

Using Rsync

The files are also available over rsync from the main xmlsoft.org site. Here's an example of how to download the 2.7.2 tarball using rsync:

rsync -avz rsync://xmlsoft.org/ftp/libxml2-2.7.2.tar.gz .

Snapshot

A snapshot of the current development version of libxml2 is updated hourly and can be downloaded from the main xmlsoft.org site.

Subversion

libxml2 uses Subversion as its version control system. To check out the SVN trunk version of libxml2, issue the following command:

svn co http://svn.gnome.org/svn/libxml2/trunk libxml2

For more information about using Subversion, see Version Control with Subversion.

Configuring

Before starting the build, the source must be configured for your platform.

UNIX-like

To configure the source for building on a UNIX-like platform [2], use the configure script found in the root directory. Run the following to get a list of supported flags and arguments:

./configure --help

For example, ./configure --prefix=/home/joe/libxml2 configures libxml2 for installation to /home/joe/libxml2.

Note: When building from Subversion, you will first need to generate the configure script by running ./autogen.sh. You will need to have GNU autoconf, automake and libtool installed for this to work.

Windows

To configure the source for building on Microsoft Windows, use the JavaScript configure script instead. Run the following from the win32/ directory to get a list of supported flags and arguments:

cscript configure.js help

For example, cscript configure.js compiler=msvc prefix=c:\opt include=c:\opt\include lib=c:\opt\lib debug=yes configures libxml2 to be installed to c:\opt and built with the MSVC compiler with debugging turned on, and passes c:\opt\include and c:\opt\lib to the build process as additional include and library directories. See the file win32\Readme.txt for more information.

Building

Linux / BSD / MacOS X

To start the build process on a UNIX-like platform, simply type:

make

Note: On BSD the GNU make command might be called gmake instead of make. Make sure that you use GNU make and not BSD make!

Windows

To start the build on a Microsoft Windows system using the MSVC compiler, type:

nmake

in the win32/ directory.

Installing

Linux / BSD / MacOS X

Install the library to the configured location by typing:

make install

Windows

On Windows, install the library to the configured location by typing:

nmake install

That's it. You're now ready to start developing using libxml2. In the next section we'll go over a minimal example to get you started.

Quick Start

Here's a very brief and basic introduction to using the libxml2 library.

Hello World

Let's start with a very simple "Hello World"-type example of libxml2 usage.

#include <stdio.h>
#include <libxml/tree.h>
#include <libxml/parser.h>

We begin by including a few required header files. In addition to the stdio.h from the standard C library, we include libxml/tree.h to get access to the Tree API and libxml/parser.h for the core parser module of libxml2.

int main (int argc, char *argv[])
{
    LIBXML_TEST_VERSION

Next up is the LIBXML_TEST_VERSION macro, which should be invoked once, in the main thread of execution, before any other libxml2 call. It initializes the libxml2 library and checks for ABI mismatches between the version of the library the program was compiled against and the one it is running with.

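    const char *xml = "<message>Hello World!</message>";

We go on by defining a short string containing a simple XML document. xmlParseMemory expects a plain const char * buffer, so an ordinary C string is used here rather than libxml2's xmlChar type (an unsigned char used for UTF-8 strings).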

    xmlDocPtr doc = xmlParseMemory(xml, 31);
    if (doc == NULL)
        return(1);

This is where the action begins. We define an xmlDocPtr and assign it the return value of xmlParseMemory, to which we pass our XML document and its length of 31 bytes. xmlParseMemory is part of the core parser module of libxml2; it parses an XML document from an in-memory buffer of the given size and builds a document tree from it. A NULL return value means that parsing failed, so we check for it before going on.

    xmlNodePtr root = xmlDocGetRootElement(doc);
    if (root == NULL) {
        xmlFreeDoc(doc);
        return(1);
    }

Next up, we define an xmlNodePtr and assign it the return value of xmlDocGetRootElement, to which we pass our xmlDocPtr doc as argument. xmlDocGetRootElement gives us the root element of the given XML document, in this case the message element. If there is no root element, we free the document before returning so that no memory is leaked.

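    xmlChar *content = xmlNodeGetContent(root);
    printf("%s\n", (const char *) content);
    xmlFree(content);

Here we print the content of the root element, which happens to be the text Hello World!, to standard output. xmlNodeGetContent returns a newly allocated copy of the text, so we cast it to const char * for printf and release it with xmlFree once we are done with it.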

    xmlFreeDoc(doc);
    xmlCleanupParser();
 
    return(0);
}

We then finish off the program by calling xmlFreeDoc with doc as argument, which frees the document and all of its nodes, and then xmlCleanupParser, which frees the global memory held by the libxml2 library itself. Like LIBXML_TEST_VERSION, xmlCleanupParser should be called once, from the main thread, but only when the program has completely finished using libxml2: calling it while other threads or plugins are still using the library can crash the application.

Here's the full source of the program:

#include <stdio.h>
#include <libxml/tree.h>
#include <libxml/parser.h>
 
int main (int argc, char *argv[])
{
    LIBXML_TEST_VERSION
 
    const char *xml = "<message>Hello World!</message>";
 
    xmlDocPtr doc = xmlParseMemory(xml, 31);
    if (doc == NULL)
        return(1);
 
    xmlNodePtr root = xmlDocGetRootElement(doc);
    if (root == NULL) {
        xmlFreeDoc(doc);
        return(1);
    }
 
    /* xmlNodeGetContent returns a copy that must be released with xmlFree. */
    xmlChar *content = xmlNodeGetContent(root);
    printf("%s\n", (const char *) content);
    xmlFree(content);
 
    xmlFreeDoc(doc);
    xmlCleanupParser();
 
    return(0);
}

Building Your Program

You're now ready to build your program. For this first example we'll show instructions for building the code. The procedure is pretty much the same for all examples in this document, so they will be left out from now on.

Using GCC

To build the source using GCC on a UNIX-like platform, in a Bourne-like shell, just type:

gcc -o example example.c `xml2-config --cflags --libs`

An executable named example will be generated in the current directory.

Note: xml2-config is a script installed by libxml2 that prints the correct compiler and linker flags for your platform. If your platform has pkg-config installed, you can also use pkg-config --cflags --libs libxml-2.0. The flags are placed after the source file because some linkers only use libraries that appear after the object files that reference them. Note also that backticks for invoking a sub-process might not be supported by all shells.

Using MSVC

For building using MSVC on Microsoft Windows, the instructions are a bit different.

TODO: Can someone who knows fill this in?

An executable named example.exe will be generated in the current directory.

The libxml2 APIs

Choosing the Right API

TODO

The Tree API

Introduction

TODO

Basic Example

TODO

Complex Example

TODO

Common Problems

TODO: The idea for these "Common Problems" sections is to maybe analyze the libxml2 mailing list archives and extract frequent problems that people have been having. The prime example would be the trouble that a lot of people seem to have with understanding XPath in conjunction with namespaces.

The Reader API

Introduction

TODO

Basic Example

TODO

Complex Example

TODO

Common Problems

TODO

The SAX2 API

Introduction

TODO

Basic Example

TODO

Complex Example

TODO

Common Problems

TODO

The HTML API

Introduction

TODO

Basic Example

TODO

Complex Example

TODO

Common Problems

TODO

XPath API

Introduction

TODO

Basic Example

TODO

Complex Example

TODO

Common Problems

TODO

XPointer API

Introduction

TODO

Basic Example

TODO

Complex Example

TODO

Common Problems

TODO

XInclude API

Introduction

TODO

Basic Example

TODO

Complex Example

TODO

Common Problems

TODO

Utility APIs

String Handling

TODO

HTTP / FTP

TODO

URI Handling

TODO

XML and SGML Catalogs

TODO

String Dictionaries

TODO

Hash Tables

TODO

Putting It All Together

TODO

Validation

Introduction

TODO

DTDs

TODO

XML Schemas

TODO

RelaxNG

TODO

Schematron

TODO

Namespaces

Introduction

TODO

Basic Example

TODO

Complex Example

TODO

Common Problems

TODO

Error Handling

Introduction

TODO

Basic Example

TODO

Complex Example

TODO

Common Problems

TODO

I/O Handling

Introduction

TODO

Basic Example

TODO

Complex Example

TODO

Common Problems

TODO

Threads

Introduction

TODO

Thread Safety

TODO

Basic Example

TODO

Complex Example

TODO

Common Problems

TODO

Language Bindings

The libxml2 library includes bindings for the Python language, and there are many third-party bindings for other languages. Here is some information about a few of them.

Perl

TODO

PHP

PHP has several extensions that use libxml2, such as XMLReader, XMLWriter and SAX parsing; in fact, almost all XML extensions for PHP are based on libxml2. For more information about XML extensions in PHP, see the XML Manipulation section of the PHP manual.

Python

Official Bindings

TODO

lxml Library

TODO

Ruby

TODO

Notes


Appendix

A See Also

TODO

B Acknowledgements

  • Daniel Veillard - For his tireless work with developing and maintaining libxml2 for so many years.