This is not a question that has a simple yes/no answer. Here are
the rules for using Xerces in a multi-threaded environment:
Within an address space, an instance of the parser may be used
without restriction from a single thread, or an instance of the
parser can be accessed from multiple threads, provided the
application guarantees that only one thread has entered a method
of the parser at any one time.
When two or more parser instances exist in a process, the
instances can be used concurrently, and without external
synchronization. That is, in an application containing two
parsers and two threads, one parser can be running within the
first thread concurrently with the second parser running
within the second thread.
The same rules apply to Xerces DOM documents:
multiple document instances may be concurrently accessed from
different threads, but any given document instance can only be
accessed by one thread at a time.
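The two permitted usage patterns can be sketched as follows. This is a minimal illustration, not Xerces code: StubParser is a hypothetical stand-in for a parser (or document) instance, and the function names are invented for the example. The first function gives each thread its own instance and needs no locking; the second shares one instance and has the application serialize every call with a mutex.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parser instance; NOT a Xerces class.
// It keeps per-instance mutable state, so unsynchronized concurrent
// calls on one instance would be a data race.
class StubParser {
public:
    void parse(const std::string& doc) {
        lastDoc_ = doc;
        ++parseCount_;
    }
    int parseCount() const { return parseCount_; }
private:
    std::string lastDoc_;
    int parseCount_ = 0;
};

// Pattern 1: one instance per thread. No external synchronization,
// because no instance is ever touched by more than one thread.
int parseWithParserPerThread(int threads, int docsPerThread) {
    assert(threads > 0 && docsPerThread >= 0);
    std::vector<StubParser> parsers(threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&parsers, t, docsPerThread] {
            for (int i = 0; i < docsPerThread; ++i)
                parsers[t].parse("<doc/>");
        });
    }
    for (auto& w : workers) w.join();
    int total = 0;
    for (const auto& p : parsers) total += p.parseCount();
    return total;
}

// Pattern 2: one shared instance. The application guarantees that only
// one thread is inside the parser at a time by locking around each call.
int parseWithSharedParser(int threads, int docsPerThread) {
    assert(threads > 0 && docsPerThread >= 0);
    StubParser shared;
    std::mutex guard;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&shared, &guard, docsPerThread] {
            for (int i = 0; i < docsPerThread; ++i) {
                std::lock_guard<std::mutex> lock(guard);
                shared.parse("<doc/>");
            }
        });
    }
    for (auto& w : workers) w.join();
    return shared.parseCount();
}
```

Pattern 1 usually scales better, since the threads never wait on each other. Pattern 2 is mainly useful when creating several instances is impractical.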
DOMStrings allow multiple concurrent readers. All DOMString
const methods are thread safe, and can be concurrently entered
by multiple threads. Non-const DOMString methods, such as
appendData(), are not thread safe and the application must
guarantee that no other methods (including const methods) are
executed concurrently with them.
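One way an application might enforce the "many readers or one writer" rule is a reader/writer lock. The sketch below assumes C++17 and uses std::string in place of a DOMString. GuardedText and appendFromThreads are hypothetical names for illustration and are not part of Xerces.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical wrapper (not a Xerces class) that guards a string value
// with a reader/writer lock.
class GuardedText {
public:
    // Const access: any number of threads may hold the shared lock at once.
    std::size_t length() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return text_.size();
    }
    // Mutating access: the exclusive lock keeps out all readers and
    // all other writers while the value is being changed.
    void append(const std::string& more) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        text_ += more;
    }
private:
    mutable std::shared_mutex mutex_;
    std::string text_;
};

// Several threads append to and read from the same value.
// Returns the final length once all threads have finished.
std::size_t appendFromThreads(int writers, int appendsPerWriter) {
    assert(writers > 0 && appendsPerWriter >= 0);
    GuardedText text;
    std::vector<std::thread> workers;
    for (int w = 0; w < writers; ++w) {
        workers.emplace_back([&text, appendsPerWriter] {
            for (int i = 0; i < appendsPerWriter; ++i) {
                text.append("x");     // exclusive lock
                (void)text.length();  // shared lock
            }
        });
    }
    for (auto& t : workers) t.join();
    return text.length();
}
```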
|
What character encoding should I use when creating XML documents?
| |
The best choice in most cases is either utf-8 or utf-16.
Advantages of these encodings include:
- The best portability. These encodings are more widely
supported by XML processors than any others, meaning that
your documents will have the best possible chance of being
read correctly, no matter where they end up.
- Full international character support. Both utf-8 and
utf-16 cover the full Unicode character set, which
includes all of the characters from all major national,
international and industry character sets.
- Efficiency. utf-8 has the smaller storage requirements
for documents that are primarily composed of characters
from the Latin alphabet. utf-16 is more efficient for
encoding Asian languages. Both encodings cover
all languages without loss.
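The storage trade-off can be checked with a little arithmetic. The sketch below computes how many bytes a sequence of Unicode code points occupies in each encoding, ignoring any byte order mark. The function names are illustrative only and are not part of any Xerces API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed to encode one Unicode code point in UTF-8.
std::size_t utf8Bytes(char32_t cp) {
    assert(cp <= 0x10FFFF);  // highest valid Unicode code point
    if (cp < 0x80) return 1;     // ASCII
    if (cp < 0x800) return 2;    // e.g. accented Latin, Greek, Cyrillic
    if (cp < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;                    // supplementary planes
}

// Bytes needed in UTF-16: one 16-bit unit inside the Basic Multilingual
// Plane, a surrogate pair (two units) above it.
std::size_t utf16Bytes(char32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4;
}

// Total encoded size of a text in UTF-8.
std::size_t utf8Size(const std::u32string& text) {
    std::size_t total = 0;
    for (char32_t cp : text) total += utf8Bytes(cp);
    return total;
}

// Total encoded size of a text in UTF-16.
std::size_t utf16Size(const std::u32string& text) {
    std::size_t total = 0;
    for (char32_t cp : text) total += utf16Bytes(cp);
    return total;
}
```

For the five-letter word "hello", this gives 5 bytes in utf-8 and 10 bytes in utf-16. For the three-character Japanese word U+65E5 U+672C U+8A9E, it gives 9 bytes in utf-8 and 6 bytes in utf-16.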
The only drawback of utf-8 or utf-16 is that they are not
the native text file format for most systems, meaning that
common text file editors and viewers cannot be used directly.
A second choice of encoding would be any of the others listed in
the table above. This works best when the XML encoding is the same
as the default system encoding on the machine where the
XML document is being prepared, because the document will then
display correctly as a plain text file. For UNIX systems
in countries speaking Western European languages, the encoding
will usually be iso-8859-1.
The versions of Xerces, both C and Java, distributed
by IBM as XML4C and XML4J, include all of the encodings
listed in the above table, on all platforms.
A word of caution for Windows users: The default character set
on Windows systems is windows-1252, not iso-8859-1. While Xerces-C
does recognize this Windows encoding, it is a poor choice for portable
XML data because it is not widely recognized by other XML processing
tools. If you are using a Windows-based editing tool to generate
XML, check which character set it generates, and make sure that the
resulting XML specifies the correct name in the encoding="..." declaration.
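A quick way to check what a generated file actually declares is to read the encoding name out of its XML declaration. The helper below is a simplified sketch, not a conforming XML parser and not a Xerces function: it assumes the file starts with an ASCII-compatible declaration and no byte order mark. The names declaredEncoding and requireEncoding are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the value of encoding="..." from the XML declaration at the
// start of `xml`, or an empty string if there is no declaration or no
// encoding attribute. Simplified: ASCII-compatible input only.
std::string declaredEncoding(const std::string& xml) {
    if (xml.compare(0, 5, "<?xml") != 0) return "";
    const std::size_t end = xml.find("?>");
    if (end == std::string::npos) return "";
    std::size_t pos = xml.find("encoding", 5);
    if (pos == std::string::npos || pos > end) return "";
    pos += 8;  // length of "encoding"
    // Skip optional whitespace and the '=' sign.
    while (pos < end &&
           (xml[pos] == ' ' || xml[pos] == '\t' || xml[pos] == '=')) {
        ++pos;
    }
    if (pos >= end) return "";
    const char quote = xml[pos];
    if (quote != '"' && quote != '\'') return "";
    const std::size_t close = xml.find(quote, pos + 1);
    if (close == std::string::npos || close > end) return "";
    return xml.substr(pos + 1, close - pos - 1);
}

// Example check an application might run on a generated file.
// Simplification: exact match, although XML encoding names are
// case-insensitive.
void requireEncoding(const std::string& xml, const std::string& expected) {
    assert(declaredEncoding(xml) == expected);
}
```

The declaration only states what the bytes are supposed to be, so it still has to agree with the character set the editor really wrote. For example, the Euro sign is the single byte 0x80 in windows-1252, but that byte is a control character in iso-8859-1.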
|
| |
Yes, Xerces-C supports EBCDIC. When creating EBCDIC encoded XML data,
the preferred encoding is ibm1140. Also supported is ibm037 (and its alternate name,
ebcdic-cp-us); this encoding is almost the same as ibm1140, but it lacks the Euro
symbol.
These two encodings, ibm1140 and ibm037, are available on both Xerces-C and
IBM XML4C, on all platforms.
On IBM System 390, XML4C also supports two alternative forms, ibm037-s390
and ibm1140-s390. These are similar to the base ibm037 and ibm1140 encodings,
but with alternate mappings of the EBCDIC new-line character, which allows
them to appear as normal text files on System 390s. These encodings are not
supported on other platforms, and should not be used for portable data.
XML4C on System 390 and AS/400 also provides additional EBCDIC encodings, including
those for the character sets of different countries. The exact set supported
will be platform dependent, and these encodings are not recommended for
portable XML data.