I cannot run my sample applications. What is wrong?
Two installation issues must be resolved before your
applications can use Xerces: the DLL or shared library must be
locatable through the system's environment, and the converter
files that Xerces uses for transcoding must be locatable.
On UNIX platforms, make sure that your library search
environment variable includes the directory that contains the
Xerces shared library (LIBPATH on AIX, LD_LIBRARY_PATH on
Solaris and Linux, SHLIB_PATH on HP-UX). For example, if you
installed your binaries under $HOME/fastxmlparser, add
$HOME/fastxmlparser/lib to your library path:
    export LIBPATH=$LIBPATH:$HOME/fastxmlparser/lib                  # AIX
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/fastxmlparser/lib  # Solaris, Linux
    export SHLIB_PATH=$SHLIB_PATH:$HOME/fastxmlparser/lib            # HP-UX
On Win32, make sure that the directory containing the Xerces
DLLs is listed in the PATH environment variable.
For the transcoding converter files (*.cnv), the easiest
mechanism (the one used in the binary release) is to place them
in the icu/data directory relative to the shared library or
DLL, where they are located automatically.
However, if you redistribute Xerces within some other
product and cannot maintain this relationship, or if your
build scenario does not let you maintain it (during debugging,
for instance), you can point the ICU_DATA environment variable
at the converter files. On Windows platforms, make sure the
value ends with a backslash '\'.
This variable may be set system wide, within a
particular command window, or just within the client
application or higher level libraries, as necessary.
It must be set before the XML system is initialized
(see below).
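As a rough sketch of the "within the client application" option, the following C++ helper sets ICU_DATA for the current process only when it is not already set. The function name ensureIcuDataDir is hypothetical and not part of Xerces; setenv and _putenv_s are the usual POSIX and Windows calls for changing the process environment.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of Xerces): sets ICU_DATA for the current
// process unless it is already set. Returns true on success.
bool ensureIcuDataDir(const std::string& converterDir)
{
    if (std::getenv("ICU_DATA") != nullptr)
        return true;  // keep a value set system wide or in the command window

    std::string dir = converterDir;
#ifdef _WIN32
    // On Windows the value must end with a backslash.
    if (!dir.empty() && dir.back() != '\\')
        dir += '\\';
    return _putenv_s("ICU_DATA", dir.c_str()) == 0;
#else
    return setenv("ICU_DATA", dir.c_str(), /*overwrite=*/0) == 0;
#endif
}
```

An application would call ensureIcuDataDir with its own converter directory before it calls XMLPlatformUtils::Initialize(), since the variable has to be in place before the XML system is initialized.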
I just built my own application using the Xerces parser. Why does it crash?
To work with the Xerces parser, you must first initialize
the XML subsystem. The most common mistake is to forget this
initialization. Before you make any calls to Xerces APIs, you
must call XMLPlatformUtils::Initialize():
    try {
        XMLPlatformUtils::Initialize();
    }
    catch (const XMLException& toCatch) {
        // Do your failure processing here
    }
This initializes the Xerces system and sets its
internal variables. Note that you must include the
util/PlatformUtils.hpp file for this to work.
The second common problem is the absence of the transcoding
converter files (*.cnv). This problem has a simple fix once you
understand how Xerces searches for the transcoding converter
files.
Xerces first looks for the environment variable
ICU_DATA. If it finds this variable in your environment
settings, it assumes that the transcoding converter files
are kept in that directory. For example, suppose you had set
your environment variable to:
    set ICU_DATA=d:\myXerces\icu\data\
The transcoding converter files (all files with the extension
.cnv, plus convrtrs.txt) will be searched for in
d:\myXerces\icu\data.
If you have not set ICU_DATA, the search for the transcoding
converters is done relative to the location of the shared
library xerces-c_1_0.dll (or libxerces-c1_0.a on AIX,
libxerces-c1_0.so on Solaris and Linux, and libxerces-c1_0.sl
on HP-UX). Thus, if your shared library is in
d:\fastxmlparser\lib, your transcoding converter files should
be in d:\fastxmlparser\lib\icu\data.
Before you run your application, make sure that one of the
two arrangements described above is in place.
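The search order described above can be sketched in C++ as follows. This is an illustration of the rule only, not the code Xerces itself runs: the function name expectedConverterDir is hypothetical, and treating an empty ICU_DATA as "not set" is an assumption. It can be handy as a quick diagnostic to see where the converter files are expected.

```cpp
#include <cstdlib>
#include <string>

// Illustrative only: mirrors the documented search order. `libraryDir` is
// the directory that holds the Xerces shared library or DLL.
std::string expectedConverterDir(const std::string& libraryDir)
{
    // 1. ICU_DATA takes precedence when it is set (assumed: and non-empty).
    const char* icuData = std::getenv("ICU_DATA");
    if (icuData != nullptr && *icuData != '\0')
        return icuData;

    // 2. Otherwise, icu/data relative to the library location.
#ifdef _WIN32
    return libraryDir + "\\icu\\data";
#else
    return libraryDir + "/icu/data";
#endif
}
```

If the directory this returns does not contain the .cnv files and convrtrs.txt, either move the files there or set ICU_DATA to the directory that does contain them.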
Is Xerces thread-safe?
This is not a question that has a simple yes/no answer. Here are
the rules for using Xerces in a multi-threaded environment:
Within an address space, an instance of the parser may be used
without restriction from a single thread, or an instance of the
parser can be accessed from multiple threads, provided the
application guarantees that only one thread has entered a method
of the parser at any one time.
When two or more parser instances exist in a process, the
instances can be used concurrently, without external
synchronization. That is, in an application containing two
parsers and two threads, one parser can be running within the
first thread concurrently with the second parser running
within the second thread.
The same rules apply to Xerces DOM documents:
multiple document instances may be accessed concurrently from
different threads, but any given document instance can only be
accessed by one thread at a time.
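One way an application can guarantee that only one thread is inside a shared parser (or document) at a time is to route every call through a mutex. The sketch below assumes C++11 threads; FakeParser is a hypothetical stand-in for a real parser instance, and SharedParser and parseConcurrently are illustrative names, not Xerces APIs.

```cpp
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Stand-in for a parser instance; hypothetical, not a Xerces class.
struct FakeParser {
    int documentsParsed = 0;
    void parse(const std::string& /*systemId*/) { ++documentsParsed; }
};

// Wraps one parser that is shared between threads. The mutex ensures that
// only one thread has entered a parser method at any one time.
class SharedParser {
public:
    void parse(const std::string& systemId) {
        std::lock_guard<std::mutex> lock(mutex_);
        parser_.parse(systemId);
    }
    int documentsParsed() {
        std::lock_guard<std::mutex> lock(mutex_);
        return parser_.documentsParsed;
    }
private:
    std::mutex mutex_;
    FakeParser parser_;
};

// Drives the shared parser from several threads and returns the total
// number of parse calls that were made.
int parseConcurrently(int threadCount, int parsesPerThread)
{
    SharedParser shared;
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t)
        workers.emplace_back([&shared, parsesPerThread] {
            for (int i = 0; i < parsesPerThread; ++i)
                shared.parse("sample.xml");
        });
    for (std::thread& worker : workers)
        worker.join();
    return shared.documentsParsed();
}
```

Giving each thread its own parser instance avoids the lock entirely, which is usually the simpler design when the threads do not need to share parser state.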
DOMStrings allow multiple concurrent readers. All DOMString
const methods are thread safe, and can be concurrently entered
by multiple threads. Non-const DOMString methods, such as
appendData(), are not thread safe and the application must
guarantee that no other methods (including const methods) are
executed concurrently with them.
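The DOMString rule (many concurrent readers, or one writer with nobody else inside) is the classic reader/writer pattern. A minimal sketch, assuming a C++17 compiler for std::shared_mutex: GuardedText is a hypothetical wrapper, and std::string stands in for a DOMString here.

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical wrapper; std::string stands in for a DOMString.
class GuardedText {
public:
    // Mirrors a const method: any number of threads may hold the shared
    // lock at the same time.
    std::size_t length() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return text_.size();
    }

    // Mirrors a non-const method such as appendData(): the exclusive lock
    // keeps every other reader and writer out while the text changes.
    void appendData(const std::string& more) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        text_ += more;
    }

private:
    mutable std::shared_mutex mutex_;
    std::string text_;
};
```

If a string is never modified after it has been built, no locking is needed at all, since only const methods are ever called on it.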
What encodings are supported by Xerces?
Xerces uses a subset of IBM's International Classes
for Unicode (ICU) for encoding and Unicode
support. Xerces C++ Parser is Unicode 3.0 compliant.
Besides ASCII, the following encodings are currently
supported:
- UTF-8
- UTF-16 Big Endian, UTF-16 Little Endian
- IBM-1208
- ISO Latin-1 (ISO-8859-1)
- ISO Latin-2 (ISO-8859-2) [Bosnian, Croatian, Czech,
Hungarian, Polish, Romanian, Serbian (in Latin
transcription), Serbo-Croatian, Slovak, Slovenian, Upper
Sorbian and Lower Sorbian]
- ISO Latin-3 (ISO-8859-3) [Maltese, Esperanto]
- ISO Latin-4 (ISO-8859-4)
- ISO Latin Cyrillic (ISO-8859-5)
- ISO Latin Arabic (ISO-8859-6) [Arabic]
- ISO Latin Greek (ISO-8859-7)
- ISO Latin Hebrew (ISO-8859-8) [Hebrew]
- ISO Latin-5 (ISO-8859-9) [Turkish]
- Extended Unix Code, packed for Japanese (euc-jp, eucjis)
- Japanese Shift JIS (shift-jis)
- Chinese (big5)
- Extended Unix Code, packed for Korean (euc-kr)
- Russian Unix, Cyrillic (koi8-r)
- Windows Thai (cp874)
- Latin 1 Windows (cp1252)
- cp858
- EBCDIC encodings:
  - EBCDIC US (ebcdic-cp-us)
  - EBCDIC Canada (ebcdic-cp-ca)
  - EBCDIC Netherlands (ebcdic-cp-nl)
  - EBCDIC Denmark (ebcdic-cp-dk)
  - EBCDIC Norway (ebcdic-cp-no)
  - EBCDIC Finland (ebcdic-cp-fi)
  - EBCDIC Sweden (ebcdic-cp-se)
  - EBCDIC Italy (ebcdic-cp-it)
  - EBCDIC Spain & Latin America (ebcdic-cp-es)
  - EBCDIC Great Britain (ebcdic-cp-gb)
  - EBCDIC France (ebcdic-cp-fr)
  - EBCDIC Hebrew (ebcdic-cp-he)
  - EBCDIC Switzerland (ebcdic-cp-ch)
  - EBCDIC Roece (ebcdic-cp-roece)
  - EBCDIC Yugoslavia (ebcdic-cp-yu)
  - EBCDIC Iceland (ebcdic-cp-is)
  - EBCDIC Urdu (ebcdic-cp-ar2)
  - Latin 0 EBCDIC
Additional encodings to be available later:
- EBCDIC Arabic (ebcdic-cp-ar1)
- Chinese for PRC (mixed 1/2 byte) (gb2312)
- Japanese ISO-2022-JP (iso-2022-jp)
- Cyrillic (koi8-r)
The ICU uses IBM's UPMAP format as source files for data-based
conversion. All codepages represented in that format are supported
(that is, SBCS, DBCS, MBCS and EBCDIC_STATEFUL), with the exception of
codepages whose maximum character length is strictly greater than
two bytes (this excludes, for example, 1350 and 964).
The following is a non-exhaustive list of codepages that are
supported by the international library packaged with the product.
ibm-1004, ibm-1006, ibm-1008, ibm-1038, ibm-1041, ibm-1043, ibm-1047, ibm-1051,
ibm-1088, ibm-1089, ibm-1098, ibm-1112, ibm-1114, ibm-1115, ibm-1116, ibm-1117,
ibm-1118, ibm-1119, ibm-1123, ibm-1140, ibm-1141, ibm-1142, ibm-1143, ibm-1144,
ibm-1145, ibm-1146, ibm-1147, ibm-1148, ibm-1149, ibm-1153, ibm-1154, ibm-1155,
ibm-1156, ibm-1157, ibm-1158, ibm-1159, ibm-1160, ibm-1164, ibm-1250, ibm-1251,
ibm-1252, ibm-1253, ibm-1254, ibm-1255, ibm-1256, ibm-1257, ibm-1258, ibm-12712,
ibm-1275, ibm-1276, ibm-1277, ibm-1280, ibm-1281, ibm-1282, ibm-1283, ibm-1361,
ibm-1362, ibm-1363, ibm-1364, ibm-1370, ibm-1371, ibm-1383, ibm-1386, ibm-1390,
ibm-1399, ibm-16684, ibm-16804, ibm-17248, ibm-21427, ibm-273, ibm-277, ibm-278,
ibm-280, ibm-284, ibm-285, ibm-290, ibm-297, ibm-37, ibm-420, ibm-424,
ibm-437, ibm-4899, ibm-4909, ibm-4930, ibm-4971, ibm-500, ibm-5104, ibm-5123,
ibm-5210, ibm-5346, ibm-5347, ibm-5349, ibm-5350, ibm-5351, ibm-5352, ibm-5353,
ibm-5354, ibm-803, ibm-808, ibm-813, ibm-833, ibm-834, ibm-835, ibm-837,
ibm-848, ibm-8482, ibm-849, ibm-850, ibm-852, ibm-855, ibm-856, ibm-857,
ibm-858, ibm-859, ibm-860, ibm-861, ibm-862, ibm-863, ibm-864, ibm-865,
ibm-866, ibm-867, ibm-868, ibm-869, ibm-871, ibm-872, ibm-874, ibm-878,
ibm-891, ibm-897, ibm-901, ibm-902, ibm-9027, ibm-903, ibm-904, ibm-9044,
ibm-9049, ibm-9061, ibm-907, ibm-909, ibm-910, ibm-912, ibm-913, ibm-914,
ibm-915, ibm-916, ibm-920, ibm-921, ibm-922, ibm-923, ibm-9238, ibm-924,
ibm-930, ibm-933, ibm-935, ibm-937, ibm-939, ibm-941, ibm-942, ibm-943,
ibm-944, ibm-946, ibm-947, ibm-948, ibm-949, ibm-950, ibm-953, ibm-955,
ibm-961, ibm-964, and ibm-970.