Answers
 
Why does my application crash on AIX when I run it under a multi-threaded environment?
 

AIX maintains two kinds of libraries on the system: thread-safe and non-thread-safe. Multi-threaded libraries on AIX follow a different naming convention; usually the multi-threaded library names end with "_r". For example, libc.a is single-threaded whereas libc_r.a is multi-threaded.

To make your multi-threaded application run on AIX, you MUST ensure that your LIBPATH environment variable does not contain a 'system library path' when you run the application. The appropriate libraries (threaded or non-threaded) are then picked up automatically at runtime. An application usually crashes when it is built for multi-threaded operation but does not point to the thread-safe versions of the system libraries. For example, LIBPATH can simply be set as:

LIBPATH=$HOME/<Xerces>/lib

Where <Xerces> is the directory where the Xerces distribution resides.

If, for any reason unrelated to Xerces, you need to keep a 'system library path' in your LIBPATH environment variable, you must make sure that the thread-safe path comes before the normal system path. For example, you must place /usr/lib/threads before /usr/lib in your LIBPATH variable. That is to say, your LIBPATH may look like this:

export LIBPATH=$HOME/<Xerces>/lib:/usr/lib/threads:/usr/lib

Where /usr/lib is where your system libraries are.
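
The ordering rule above can be checked mechanically. The following is only a minimal sketch, not part of Xerces: the helper names splitSearchPath and threadSafeDirComesFirst are hypothetical, and the directory names are simply the ones used in the example above.

```cpp
#include <string>
#include <vector>

// Split a colon-separated search path such as LIBPATH into its entries.
std::vector<std::string> splitSearchPath(const std::string& path) {
    std::vector<std::string> entries;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type end = path.find(':', start);
        if (end == std::string::npos) {
            entries.push_back(path.substr(start));
            break;
        }
        entries.push_back(path.substr(start, end - start));
        start = end + 1;
    }
    return entries;
}

// Returns true when the ordering is safe: either systemDir does not appear
// at all, or threadDir appears somewhere before the first systemDir entry.
bool threadSafeDirComesFirst(const std::string& libpath,
                             const std::string& threadDir,
                             const std::string& systemDir) {
    bool seenThreadDir = false;
    for (const std::string& entry : splitSearchPath(libpath)) {
        if (entry == threadDir) {
            seenThreadDir = true;
        } else if (entry == systemDir) {
            return seenThreadDir;
        }
    }
    return true;  // systemDir never appears, so there is nothing to reorder
}
```

A startup check could pass the current value of LIBPATH together with "/usr/lib/threads" and "/usr/lib", and warn when the function returns false.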


What compilers are being used on the supported platforms?
 

Xerces has been built on the following platforms with these compilers:

Operating System        Compiler
Windows NT SP5/98       MSVC 6.0
Redhat Linux 6.0        gcc
AIX 4.1.4 and higher    xlC 3.1
Solaris 2.6             CC version 4.2
HP-UX B10.2             aCC and CC
HP-UX B11               aCC and CC

I cannot run my sample applications. What is wrong?
 

There are two major installation issues which must be dealt with in order to use Xerces from your applications. First, the DLL or shared library must be locatable via the system's environment. Second, the converter files used by Xerces for its transcoding must be locatable.

On UNIX platforms you need to ensure that your library search environment variable includes the directory which contains the Xerces shared library. (On AIX this is LIBPATH, on Solaris and Linux it is LD_LIBRARY_PATH, and on HP-UX it is SHLIB_PATH.) Thus, if you installed your binaries under $HOME/fastxmlparser, you need to add the lib directory under it to your library path.

export LIBPATH=$LIBPATH:$HOME/fastxmlparser/lib # (AIX)

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/fastxmlparser/lib # (Solaris, Linux)

export SHLIB_PATH=$SHLIB_PATH:$HOME/fastxmlparser/lib # (HP-UX)

On Win32, you need to ensure that the directory containing the Xerces DLLs is listed in the PATH environment variable.
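
The per-platform variable names above can be gathered in one place. This is only an illustrative sketch: the platform labels ("aix", "linux" and so on) and both function names are assumptions made for the example, not anything Xerces defines.

```cpp
#include <string>

// Name of the library search variable for each platform named in the text.
// The platform labels are hypothetical keys chosen for this sketch.
std::string librarySearchVariable(const std::string& platform) {
    if (platform == "aix") return "LIBPATH";
    if (platform == "solaris" || platform == "linux") return "LD_LIBRARY_PATH";
    if (platform == "hpux") return "SHLIB_PATH";
    if (platform == "win32") return "PATH";
    return "";  // unknown platform
}

// Append a directory to an existing search path value. When the variable was
// previously empty, the directory is returned alone, without a separator.
std::string appendSearchDir(const std::string& current,
                            const std::string& dir,
                            char separator) {
    if (current.empty()) return dir;
    return current + separator + dir;
}
```

Unlike the plain export lines shown earlier, appendSearchDir does not produce a leading separator when the variable was previously unset.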

For the transcoding files (*.cnv), the easiest mechanism (which is used in the binary release) is to place them relative to the shared library or DLL. The transcoding converter files should be in the icu/data directory relative to the shared library or DLL. This will allow them to be located automatically.

However, if you redistribute Xerces within some other product and cannot maintain this relationship, or if your build scenario does not allow you to maintain it (during debugging, for instance), you can use the ICU_DATA environment variable to point to these converter files (make sure the variable ends with a backslash '\' on Windows platforms). This variable may be set system wide, within a particular command window, or just within the client application or higher level libraries, as is deemed necessary. It must be set before the XML system is initialized (see below).


I just built my own application using the Xerces parser. Why does it crash?
 

In order to work with the Xerces parser, you must first initialize the XML subsystem. The most common mistake is to forget this initialization. Before you make any calls to Xerces APIs, you must call XMLPlatformUtils::Initialize():

    #include <util/PlatformUtils.hpp>

    // In your startup code, before any other Xerces call:
    try {
        XMLPlatformUtils::Initialize();
    }
    catch (const XMLException& toCatch) {
        // Do your failure processing here
    }

This initializes the Xerces system and sets its internal variables. Note that you must include the util/PlatformUtils.hpp file for this to work.

The second common problem is the absence of the transcoding converter files (*.cnv). This problem has a simple fix once you understand how Xerces searches for the transcoding converter files.

Xerces first looks for the environment variable ICU_DATA. If it finds this variable in your environment settings, then it assumes that the transcoding converter files are kept in that directory. Thus, for example, if you had set your environment variable to (say):

set ICU_DATA=d:\myXerces\icu\data\
      

The transcoding converter files (all files with the extension .cnv, plus convrtrs.txt) will be searched for under d:\myXerces\icu\data.

If you have not set your environment variable, then the search for the transcoding converters is done relative to the location of the shared library xerces-c_1_0.dll (or libxerces-c1_0.a on AIX and libxerces-c1_0.so on Solaris and Linux, libxerces-c1_0.sl on HP-UX). Thus if your shared library is in d:\fastxmlparser\lib, then your transcoding converter files should be in d:\fastxmlparser\lib\icu\data.

Before you run your application, make sure that you have covered the two possibilities mentioned above.
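
The lookup order described above can be summarized in a few lines. This is a sketch of the documented behavior only, not the actual Xerces or ICU code. The function name converterDirectory is hypothetical, and the caller is assumed to supply the value of ICU_DATA (empty when unset) and the directory holding the shared library.

```cpp
#include <string>

// Lookup order as described in the text:
//   1. If ICU_DATA is set, the converter files are expected in that directory.
//   2. Otherwise they are expected in icu/data relative to the shared library.
// icuDataEnv is the value of ICU_DATA, or an empty string when it is unset.
std::string converterDirectory(const std::string& icuDataEnv,
                               const std::string& libraryDir,
                               char separator) {
    if (!icuDataEnv.empty()) {
        return icuDataEnv;
    }
    std::string dir = libraryDir;
    if (!dir.empty() && dir.back() != separator) {
        dir += separator;
    }
    dir += "icu";
    dir += separator;
    dir += "data";
    return dir;
}
```

With the paths used above, an unset ICU_DATA and a library in d:\fastxmlparser\lib resolve to d:\fastxmlparser\lib\icu\data, while a set ICU_DATA is returned unchanged.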


How do I use VisualAge for Windows with Xerces?
 

IBM VisualAge Xerces for Windows Build Requirements

  1. VisualAge C++ Version 4.0 with Fixpak 1: Download the Fixpak from the IBM VisualAge C++ Professional home page at http://www.software.ibm.com/ad/VisualAge_c++/service/csd.html
  2. ICU Build: You should have the ICU Library in the same directory as the Xerces library. For example if Xerces is at the top level of the d drive, put the ICU library also at the top level of d drive e.g. d:/Xerces, d:/icu.

Instructions

Note: These instructions assume that you are installing in d:\xerces-c-src-1_0. Replace d with the appropriate drive letter.
  1. Change directory to d:\xerces-c-src-1_0\Projects\Win32
  2. If a d:\xerces-c-src-1_0\Projects\Win32\VACPP40 directory does not exist, create it.
  3. Copy the IBM VisualAge project file, XercesX.icc, to the VACPP40 directory. (If it is not provided in your download, request it from <xerces-dev@xml.apache.org>.)
  4. From the VisualAge main menu enter the project file name and path.
  5. When the build finishes the status bar should display the message: Last Compile completed Successfully with warnings on date.

How do I use VisualAge for OS/2 with Xerces?
 

IBM VisualAge Xerces for OS/2 Build Requirements

  1. VisualAge C++ Version 4.0 with Fixpak 1: Download the Fixpak from the IBM VisualAge C++ Professional home page at http://www.software.ibm.com/ad/VisualAge_c++/service/csd.html
  2. ICU Build: You should have the ICU Library in the same directory as the Xerces library. For example if Xerces is at the top level of the d drive, put the ICU library also at the top level of d drive e.g. d:/Xerces, d:/icu.

Instructions

Note: These instructions assume that you are installing in d:\xerces-c-src-1_0. Replace d with the appropriate drive letter.
  1. Change directory to d:\xerces-c-src-1_0\Projects\OS2
  2. If a d:\xerces-c-src-1_0\Projects\OS2\VACPP40 directory does not exist, create it.
  3. Copy the IBM VisualAge project file, XercesX.icc, to the VACPP40 directory. (If it is not provided in your download, request it from <xerces-dev@xml.apache.org>.)
  4. From the VisualAge main menu enter the project file name and path.
  5. When the build finishes the status bar displays this message: Last Compile completed Successfully with warnings on date.

How do I use CodeWarrior for Macintosh with Xerces?
 

The directions in this file cover installing and building Xerces and ICU under the MacOS using CodeWarrior.

  1. Create a folder that will contain the Xerces and ICU distributions. For future reference we will refer to this folder as "src drop".
  2. Download and uncompress the ICU source distribution and the Xerces source distribution. You might also want to download the binary distributions because they may contain documentation not present in the source distribution. This will create two additional directories: Xerces and icu Folder. Move these folders into the "src drop" folder.
  3. Drag the Xerces folder and drop it onto the "rename file" application located in the same folder as this readme. This is a MacPerl script that renames files whose names are too long to fit in an HFS/HFS+ filesystem. It also searches through all of the source code and changes the #include statements to refer to the new file names.
  4. Move the MacOS folder (in the Projects folder) to "src drop:Xerces:Projects".
  5. You should be able to open the CodeWarrior project file "src drop:Xerces:Projects:MacOS:Xerces:Xerces" and build the Xerces library.
  6. You should also be able to open the CodeWarrior project file "src drop:Xerces:Projects:MacOS:icu:icu" and build the ICU library.
  7. If you wish, you can create projects for and build the rest of the tools and test suites. They are not needed if you just want to use Xerces. I suggest that you use the binary data files distributed with the binary distribution of ICU instead of creating your own from the text data files in the ICU source distribution.

There are some things to be aware of when creating your own projects using Xerces.

  1. You will need to link against both the ICU and Xerces libraries.
  2. The options "Always search user paths" and "Interpret DOS and Unix Paths" are very useful. Some of the code won't compile without them set.
  3. Most of the tools and test code will require slight modification to compile and run correctly (typecasts, command line parameters, etc), but it is possible to get them working correctly.
  4. You will most likely have to set up the Access Paths. The access paths in the Xerces projects should serve as a good example.

If you are having problems getting Xerces working, feel free to send an email to jbellardo@alumni.calpoly.edu. However, help will arrive only if time permits.


Is Xerces thread-safe?
 

This is not a question that has a simple yes/no answer. Here are the rules for using Xerces in a multi-threaded environment:

Within an address space, an instance of the parser may be used without restriction from a single thread, or an instance of the parser can be accessed from multiple threads, provided the application guarantees that only one thread has entered a method of the parser at any one time.

When two or more parser instances exist in a process, the instances can be used concurrently, and without external synchronization. That is, in an application containing two parsers and two threads, one parser can be running within the first thread concurrently with the second parser running within the second thread.

The same rules apply to Xerces DOM documents - multiple document instances may be concurrently accessed from different threads, but any given document instance can only be accessed by one thread at a time.

DOMStrings allow multiple concurrent readers. All DOMString const methods are thread safe, and can be concurrently entered by multiple threads. Non-const DOMString methods, such as appendData(), are not thread safe and the application must guarantee that no other methods (including const methods) are executed concurrently with them.
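
When a single parser instance must be shared between threads, the "only one thread inside a method at a time" rule can be enforced with a mutex. The sketch below assumes a C++11 compiler, which is newer than the compilers listed earlier. ToyParser is a hypothetical stand-in rather than a Xerces class, and Serialized is an illustrative wrapper, not something Xerces provides.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a parser object; it only counts parse calls.
class ToyParser {
public:
    ToyParser() : parseCount_(0) {}
    int parse(const std::string& /*document*/) { return ++parseCount_; }
private:
    int parseCount_;
};

// Owns one shared instance and funnels every access through a mutex, so that
// at most one thread is inside the wrapped object at any time.
template <typename T>
class Serialized {
public:
    template <typename Fn>
    auto with(Fn fn) -> decltype(fn(std::declval<T&>())) {
        std::lock_guard<std::mutex> guard(mutex_);
        return fn(instance_);
    }
private:
    std::mutex mutex_;
    T instance_;
};
```

A caller would write something like shared.with([](ToyParser& p) { return p.parse("<a/>"); }). Giving each thread its own parser instance, as described above, avoids the lock entirely and is usually the simpler design.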


How do I find out what version of Xerces I am using?
 

The version string for Xerces is defined in one of the source files. Look inside the file src/util/XML4CDefs.hpp and find the definition of the static variable gXML4CFullVersionStr. (It is usually of the form 3.0.0 or something similar.) This is the version of Xerces you are using.

If you don't have the source code, you have to find the version information from the shared library. On Windows NT/95/98, right-click on the DLL xerces-c_1_0.dll in the bin directory and open Properties. The version information may be found on the Version tab.

On AIX, just look for the library name libxerces-c1_0.a (or libxerces-c1_0.so on Solaris/Linux and libxerces-c1_0.sl on HP-UX). The version number is coded in the name of the library.
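
Reading the version out of the library name can be automated. This is a small sketch that assumes the naming pattern shown above, in which the name ends with digits separated by underscores just before the extension. The function name versionFromLibraryName is hypothetical.

```cpp
#include <cctype>
#include <string>

// Extract a dotted version from names such as "libxerces-c1_0.so" or
// "xerces-c_1_0.dll". Returns an empty string when no version is present.
std::string versionFromLibraryName(const std::string& fileName) {
    // Drop the extension (.dll, .a, .so, .sl), if there is one.
    std::string::size_type dot = fileName.rfind('.');
    std::string stem = (dot == std::string::npos) ? fileName
                                                  : fileName.substr(0, dot);

    // Walk backwards over the trailing run of digits and underscores.
    std::string::size_type begin = stem.size();
    while (begin > 0) {
        unsigned char c = static_cast<unsigned char>(stem[begin - 1]);
        if (!std::isdigit(c) && c != '_') break;
        --begin;
    }

    // Skip the underscore that separates the name from the version, if any.
    while (begin < stem.size() && stem[begin] == '_') ++begin;

    // Turn "1_0" into "1.0".
    std::string version = stem.substr(begin);
    for (char& c : version) {
        if (c == '_') c = '.';
    }
    return version;
}
```

For the library names mentioned above, this yields "1.0".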


How do I uninstall Xerces?
 

Xerces installs itself in a single directory and does not set any registry entries. Thus, to uninstall it, you only need to remove the directory where you installed it, and all Xerces-related files will be removed.


How do I add an additional transcoding file in the existing set?
 

Transcoding files shipped with binary drops of Xerces are located in the bin/icu/data directory on Win32 and in the lib/icu/data directory on the various UNIX platforms. All transcoding files have the extension .cnv and are platform-specific binary files. The ICU drop provides the utility 'makeconv' to generate these binary files. To add an additional transcoding file, you first need to define your new code-set in ASCII format (in a file with the extension .ucm). The coding format for an encoding may be obtained from one of the existing files in icu/data (in the source drop). After you create the .ucm file for your new code-set, you need to convert it to binary form using makeconv.

Thus, if your new code-set is defined in the file mynewcodeset.ucm, you would type:

makeconv mynewcodeset.ucm
      

...to create the binary transcoding file mynewcodeset.cnv. Make sure that this .cnv file is packaged in the same place as the others, i.e. in a directory icu/data relative to where your shared library is.

You can also add aliases for this encoding in the file 'convrtrs.txt', also present in the same directory as the converter files.


How are entity reference nodes handled in DOM?
 

If you are using the native DOM classes, the function setExpandEntityReferences controls how entities appear in the DOM tree. When setExpandEntityReferences is set to false (the default), an occurrence of an entity reference in the XML document will be represented by a subtree with an EntityReference node at the root, whose children represent the entity expansion. The entity expansion will be a DOM tree representing the structure of the expansion, not a text node containing the expansion as text.

If setExpandEntityReferences is true, an entity reference in the XML document is represented by only the nodes that represent the entity expansion. The DOM tree will not contain any EntityReference nodes.


What kinds of URLs are currently supported in Xerces?
 

We now have a spec-compliant, but limited, implementation of the class URL.

  • The only protocol currently supported is "file://", which is used to refer to local files.
  • Only the 'localhost' string is supported in the host placeholder in the URL syntax.

This should work for command line arguments to samples as well as any usage in the XML file when referring to an external file.

Examples of what this implementation will allow you to do are:

e:\>domcount file:///e:/Xerces/build/win32/vc6/debug/abc.xml

or

e:\>domcount file:///Xerces/build/win32/vc6/debug/abc.xml
e:\>domcount file:///d:/abc.xml

or

e:\>domcount file://localhost/d:/abc.xml
      

An example of what you cannot do is:

Refer to files using the 'file://' syntax and giving a relative path to the file.

This implies that if you are using the 'file://' syntax to refer to external files, you have to give the complete path to files even in the current directory.

You always have the option of not using the 'file://' syntax and referring to files by just giving the filename or a relative path to it as in:

domcount abc.xml
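
The URL restrictions above can be expressed as a simple predicate. This is only a sketch of the stated rules ("file://" protocol, empty or 'localhost' host, complete path). It is not the Xerces URL class, and the function name isSupportedFileUrl is hypothetical.

```cpp
#include <string>

// Returns true when the URL has one of the forms described as supported:
// the file:// protocol, an empty or "localhost" host, and a complete path.
bool isSupportedFileUrl(const std::string& url) {
    const std::string scheme = "file://";
    if (url.compare(0, scheme.size(), scheme) != 0) {
        return false;  // only the file:// protocol is supported
    }
    // The host runs from the end of "file://" up to the next '/'.
    std::string::size_type pathStart = url.find('/', scheme.size());
    if (pathStart == std::string::npos) {
        return false;  // no '/' after the host part, so no complete path
    }
    std::string host = url.substr(scheme.size(), pathStart - scheme.size());
    if (!host.empty() && host != "localhost") {
        return false;  // relative paths end up here, parsed as a host name
    }
    return pathStart + 1 < url.size();  // something must follow the '/'
}
```

A plain file name such as abc.xml is not a URL at all and would bypass this check, matching the option described above of passing a file name or relative path directly.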
      

Can I use Xerces to parse HTML?
 

Yes, if it follows the XML spec rules. Most HTML, however, does not follow the XML rules, and will therefore generate XML well-formedness errors.


I keep getting an error: "invalid UTF-8 character". What's wrong?
 

There are many Unicode characters that are not allowed in an XML document, according to the XML spec. Typical disallowed characters are control characters, which remain disallowed even if you escape them using the character reference form. See the XML spec, sections 2.2 and 4.1, for details. If the parser is generating this error, it is very likely that the document contains a character that you can't see. You can generally use a UNIX command like "od -hc" to find it.

Another reason for this error is that your file is in some non-UTF, non-ASCII encoding, but it contains no encoding="" declaration to tell the parser what its real encoding is.
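
The character rule can also be checked in code. The sketch below implements the Char production from section 2.2 of the XML 1.0 spec. The byte scan additionally assumes an ASCII-compatible encoding such as UTF-8, where bytes below 0x20 only ever represent control characters. Both function names are hypothetical.

```cpp
#include <string>

// XML 1.0 Char production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool isXmlChar(unsigned long codePoint) {
    return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD ||
           (codePoint >= 0x20 && codePoint <= 0xD7FF) ||
           (codePoint >= 0xE000 && codePoint <= 0xFFFD) ||
           (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
}

// Returns the offset of the first control byte that XML does not allow
// (anything below 0x20 other than tab, line feed and carriage return),
// or std::string::npos when the buffer contains none.
std::string::size_type firstDisallowedControlByte(const std::string& bytes) {
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x20 && !isXmlChar(b)) {
            return i;
        }
    }
    return std::string::npos;
}
```

Running firstDisallowedControlByte over the raw file contents gives the offset to inspect with a tool such as od.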


What encodings are supported by Xerces?
 

Xerces uses a subset of IBM's International Classes for Unicode (ICU) for encoding & Unicode support. Xerces C++ Parser is Unicode 3.0 compliant.

Besides ASCII, the following encodings are currently supported:

  • UTF-8
  • UTF-16 Big Endian, UTF-16 Little Endian
  • IBM-1208
  • ISO Latin-1 (ISO-8859-1)
  • ISO Latin-2 (ISO-8859-2) [Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian (in Latin transcription), Serbo-Croatian, Slovak, Slovenian, Upper Sorbian and Lower Sorbian]
  • ISO Latin-3 (ISO-8859-3) [Maltese, Esperanto]
  • ISO Latin-4 (ISO-8859-4)
  • ISO Latin Cyrillic (ISO-8859-5)
  • ISO Latin Arabic (ISO-8859-6) [Arabic]
  • ISO Latin Greek (ISO-8859-7)
  • ISO Latin Hebrew (ISO-8859-8) [Hebrew]
  • ISO Latin-5 (ISO-8859-9) [Turkish]
  • Extended Unix Code, packed for Japanese (euc-jp, eucjis)
  • Japanese Shift JIS (shift-jis)
  • Chinese (big5)
  • Extended Unix Code, packed for Korean (euc-kr)
  • Russian Unix, Cyrillic (koi8-r)
  • Windows Thai (cp874)
  • Latin 1 Windows (cp1252)
  • cp858
  • EBCDIC encodings:
  • EBCDIC US (ebcdic-cp-us)
  • EBCDIC Canada (ebcdic-cp-ca)
  • EBCDIC Netherland (ebcdic-cp-nl)
  • EBCDIC Denmark (ebcdic-cp-dk)
  • EBCDIC Norway (ebcdic-cp-no)
  • EBCDIC Finland (ebcdic-cp-fi)
  • EBCDIC Sweden (ebcdic-cp-se)
  • EBCDIC Italy (ebcdic-cp-it)
  • EBCDIC Spain & Latin America (ebcdic-cp-es)
  • EBCDIC Great Britain (ebcdic-cp-gb)
  • EBCDIC France (ebcdic-cp-fr)
  • EBCDIC Hebrew (ebcdic-cp-he)
  • EBCDIC Switzerland (ebcdic-cp-ch)
  • EBCDIC Roece (ebcdic-cp-roece)
  • EBCDIC Yugoslavia (ebcdic-cp-yu)
  • EBCDIC Iceland (ebcdic-cp-is)
  • EBCDIC Urdu (ebcdic-cp-ar2)
  • Latin 0 EBCDIC

Additional encodings to be available later:

  • EBCDIC Arabic (ebcdic-cp-ar1)
  • Chinese for PRC (mixed 1/2 byte) (gb2312)
  • Japanese ISO-2022-JP (iso-2022-jp)
  • Cyrillic (koi8-r)

The ICU uses IBM's UPMAP format as source files for data-based conversion. All codepages represented in that format are supported (i.e: SBCS, DBCS, MBCS and EBCDIC_STATEFUL), with the exception of codepages with a maximum character length strictly greater than two bytes (e.g. this excludes 1350 and 964).

The following is a non-exhaustive list of codepages that are supported by the international library packaged with the product.

ibm-1004, ibm-1006, ibm-1008, ibm-1038, ibm-1041, ibm-1043, ibm-1047, ibm-1051, ibm-1088, ibm-1089, ibm-1098, ibm-1112, ibm-1114, ibm-1115, ibm-1116, ibm-1117, ibm-1118, ibm-1119, ibm-1123, ibm-1140, ibm-1141, ibm-1142, ibm-1143, ibm-1144, ibm-1145, ibm-1146, ibm-1147, ibm-1148, ibm-1149, ibm-1153, ibm-1154, ibm-1155, ibm-1156, ibm-1157, ibm-1158, ibm-1159, ibm-1160, ibm-1164, ibm-1250, ibm-1251, ibm-1252, ibm-1253, ibm-1254, ibm-1255, ibm-1256, ibm-1257, ibm-1258, ibm-12712, ibm-1275, ibm-1276, ibm-1277, ibm-1280, ibm-1281, ibm-1282, ibm-1283, ibm-1361, ibm-1362, ibm-1363, ibm-1364, ibm-1370, ibm-1371, ibm-1383, ibm-1386, ibm-1390, ibm-1399, ibm-16684, ibm-16804, ibm-17248, ibm-21427, ibm-273, ibm-277, ibm-278, ibm-280, ibm-284, ibm-285, ibm-290, ibm-297, ibm-37, ibm-420, ibm-424, ibm-437, ibm-4899, ibm-4909, ibm-4930, ibm-4971, ibm-500, ibm-5104, ibm-5123, ibm-5210, ibm-5346, ibm-5347, ibm-5349, ibm-5350, ibm-5351, ibm-5352, ibm-5353, ibm-5354, ibm-803, ibm-808, ibm-813, ibm-833, ibm-834, ibm-835, ibm-837, ibm-848, ibm-8482, ibm-849, ibm-850, ibm-852, ibm-855, ibm-856, ibm-857, ibm-858, ibm-859, ibm-860, ibm-861, ibm-862, ibm-863, ibm-864, ibm-865, ibm-866, ibm-867, ibm-868, ibm-869, ibm-871, ibm-872, ibm-874, ibm-878, ibm-891, ibm-897, ibm-901, ibm-902, ibm-9027, ibm-903, ibm-904, ibm-9044, ibm-9049, ibm-9061, ibm-907, ibm-909, ibm-910, ibm-912, ibm-913, ibm-914, ibm-915, ibm-916, ibm-920, ibm-921, ibm-922, ibm-923, ibm-9238, ibm-924, ibm-930, ibm-933, ibm-935, ibm-937, ibm-939, ibm-941, ibm-942, ibm-943, ibm-944, ibm-946, ibm-947, ibm-948, ibm-949, ibm-950, ibm-953, ibm-955, ibm-961, ibm-964, and ibm-970




Copyright © 1999 The Apache Software Foundation. All Rights Reserved.