nanogui: UniCode

Previous by date:	29 Mar 2000 08:45:58 -0000 Re: 'Persistent' nano-X server, Greg Haerr
Next by date:	29 Mar 2000 08:45:58 -0000 Re: Unicode, Alan Cox
Previous in thread:	29 Mar 2000 08:45:58 -0000 Unicode, Martin_Doering.mn.man.de
Next in thread:	29 Mar 2000 08:45:58 -0000 Re: Unicode, Alan Cox

Subject: Re: Unicode
From: ####@####.####
Date: 29 Mar 2000 08:45:58 -0000
Message-Id: <20000329083600.17842.qmail@nameplanet.com>

28000 is the number of ideographic Far Eastern characters in Unicode V3,
*NOT* the total number of characters assigned. Unicode V3 specifies 49194 characters,
6400 private use characters, 2048 surrogates, and misc. other values, ending up
at 57709 assigned 16-bit values...

This is up about 10,000 from V2.1, and this rate of adding characters isn't
likely to slow anytime soon (there are still lots of scripts to add, and a lot of
discussion about adding more versions of some glyphs - many people want separate
Chinese and Japanese versions of symbols that are currently represented by one
glyph in Unicode, for instance).

In other words: with fewer than 8000 of the 65536 16-bit values left, more than
16 bits *will* be needed in the next versions of Unicode (post 3.0).

However, this won't be a problem as long as one of the UTF-* encodings is used,
and not UCS-2.
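
To illustrate why the UTF-* encodings cope with values above 16 bits while
UCS-2 does not, here is a minimal Python sketch. The helper names
(utf16_units, fits_ucs2) are made up for this example, and 0x20000 is just an
arbitrary value above 0xFFFF chosen for illustration, not a claim about what
any particular Unicode version assigns there.

```python
def utf16_units(code_point):
    """Return the 16-bit code units that encode one code point in UTF-16."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode value")
    if code_point < 0x10000:
        return [code_point]              # fits in a single 16-bit unit
    offset = code_point - 0x10000        # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # low 10 bits -> low surrogate
    return [high, low]


def fits_ucs2(code_point):
    """UCS-2 is one fixed 16-bit unit per character, with no surrogate pairs."""
    return code_point <= 0xFFFF and not 0xD800 <= code_point <= 0xDFFF


example = 0x20000                        # illustrative value above 0xFFFF
print([hex(u) for u in utf16_units(example)])
print(chr(example).encode("utf-8").hex())
print(fits_ucs2(example))
```

The 2048 surrogate values (0xD800 to 0xDFFF) are what UTF-16 uses to build
such pairs; UTF-8 simply spends four bytes on the same value.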

Regards,
Vidar Hokstad


On Wed, 29 Mar 2000 9:18:45 +0200 ####@####.#### wrote:
>Hi!
>
>Some time ago we were talking about Unicode and which storage scheme to use,
>UTF-8, UTF-16 or UTF-32. Here
>
>http://www.unicode.org/unicode/faq/
>
>I found that only 28000 characters are used at the moment. So 16 bits would be
>no problem:
>
>Q: What about the Far East support?
>
>Unicode incorporates the characters of all the major government standards for
>ideographic characters from Japan, Korea, China, and Taiwan, and more.
>The Unicode Standard, Version 3.0 has almost 28,000 ideographic characters. The
>Unicode Consortium actively works with the IRG committee of ISO SC2/WG2 to
>define additional sets of ideographic characters for inclusion in future
>versions.
>
>
>I think that this will suffice for a while, because all important or existing
>languages are completely in this standard and there is still enough room
>available.
>
>So this could speak for a 16-bit representation. I think it would be easier
>to convert a given program to 16 bit, because you have a single 16-bit value
>for a character, not a variable-length scheme. So it could be easier to use
>existing routines, for example to get the length of a string etc. (This could
>be different: is a length the length in bytes or the number of characters?
>These could now be different, but in old programs they are always the same.)
>
>Martin
>
>
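
On the string length question in the quoted message: a small Python sketch of
how "length" differs depending on whether you count characters, bytes, or
16-bit units. The function name and the sample string are made up for this
example; the last character is an arbitrary value above 0xFFFF.

```python
def lengths(text):
    """Count one string in characters and in encoded units."""
    return {
        "code_points": len(text),                          # characters
        "utf8_bytes": len(text.encode("utf-8")),           # 1 to 4 bytes each
        "utf16_units": len(text.encode("utf-16-le")) // 2, # 1 or 2 units each
        "utf32_units": len(text.encode("utf-32-le")) // 4, # always 1 unit each
    }


# ASCII letter, accented Latin letter, CJK ideograph, value above 0xFFFF
sample = "a\u00e9\u4e2d\U00020000"
print(lengths(sample))
```

Only the 32-bit form keeps "number of units" equal to "number of characters"
for every string; with 16-bit units the two agree only while no value above
0xFFFF appears.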


