nanogui: New Microwindows font support

Previous by date:	21 Mar 2000 17:53:29 -0000 Re: OT TROLL: Re: What is the status of porting Opera to MicroWindows ?, shane.isupportlive.com
Next by date:	21 Mar 2000 17:53:29 -0000 Re: New Microwindows font support, Dan Maas
Previous in thread:	21 Mar 2000 17:53:29 -0000 Re: New Microwindows font support, Greg Haerr
Next in thread:	21 Mar 2000 17:53:29 -0000 Re: New Microwindows font support, Dan Maas

Subject: Re: New Microwindows font support
From: Morten Rolland ####@####.####
Date: 21 Mar 2000 17:53:29 -0000
Message-Id: <38D7C2C9.1CC52FDF@screenmedia.no>

Greg Haerr wrote:
> 
> Morten - I would definitely like to take you up on your
> offer of writing a conversion function.  Why don't you look
> at 0.88pre4 engine/devfont.c, and see how the function
> currently is performed.

OK, did that, but the supplied function is basically just
a conversion function that needs to be integrated.  No
clean patch - sorry - time didn't allow that.

See the end of this post (or the attachment) for the listing.  If
someone can verify that surrogate scalar values are illegal
in UTF-8 strings, I would be thankful; allowing them would
be a big mistake IMHO - UTF-8 character encoding is so... elegant.

I have not done much testing, but it should be somewhat usable.

> GdConvertEncoding(void *istr, int iflags, int cc, void *ostr, int oflags)
> it returns the number of new-unit-sized characters in the output
> buffer.

The function supplied may not be enough to support all cases,
unless more buffers and some simple conversions are used.

> Your utf8 conversion would be called by this routine.  I'm more
> than happy to include any other utility routines, like your
> proposed utf8_to_scaler, but all that Microwindows needs
> (we want to try to keep it small and fast) is the above routine.

You are right: with the ugly surrogate hack in UTF-16, scalar
values up to 0x10FFFF can be supported, which should keep the
ISO and ANSI guys busy for a while....
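
To make the surrogate arithmetic concrete, here is a minimal sketch of
how a scalar value maps to one or two UTF-16 units and back.  The
helper names scalar_to_utf16 and surrogates_to_scalar are made up for
this illustration; they are not part of Microwindows or of the listing
at the end of this post.

```c
/* Hypothetical helper: encode one scalar value as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if the
 * value is a surrogate or lies above 0x10FFFF. */
int scalar_to_utf16(unsigned long scalar, unsigned short out[2])
{
	if (scalar >= 0xd800 && scalar < 0xe000)
		return 0;	/* surrogate halves are not scalar values */
	if (scalar > 0x10ffff)
		return 0;	/* not reachable with a surrogate pair */
	if (scalar < 0x10000) {
		out[0] = (unsigned short)scalar;
		return 1;
	}
	scalar -= 0x10000;	/* 20 bits remain, 10 per surrogate */
	out[0] = (unsigned short)(0xd800 + (scalar >> 10));
	out[1] = (unsigned short)(0xdc00 + (scalar & 0x3ff));
	return 2;
}

/* Hypothetical helper: combine a high/low surrogate pair again.
 * The caller is assumed to pass a valid pair. */
unsigned long surrogates_to_scalar(unsigned short hi, unsigned short lo)
{
	return 0x10000UL + ((unsigned long)(hi - 0xd800) << 10)
			 + (unsigned long)(lo - 0xdc00);
}
```

The largest pair, 0xdbff 0xdfff, corresponds to 0x10FFFF, which is
where the upper limit comes from.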

For Unicode-32, I was thinking the scalar function could come in
handy, but now I'm not even sure there is such a thing as
Unicode-32; although I would not be surprised if it contained
the scalar values, or code-points or whatever they are called.

> This allows utf8 to try to be displayed on small microwindows
> implementations where only ascii output is available, while still
> keeping utf8 as a std input text format.

This would be nice... I'm already dreaming of having UTF-8
filenames on my system....:-/
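
One possible shape for such an ASCII-only fallback is sketched below.
It is an assumption about how the idea could look, not code from
Microwindows: the name utf8_to_ascii and the choice of '?' as the
substitute character are invented for this example, and malformed
input is not diagnosed.

```c
/* Hypothetical fallback: copy a UTF-8 string to an ASCII buffer,
 * writing '?' once for every non-ASCII character.
 * 'out' must have room for strlen(utf8) + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
int utf8_to_ascii(const char *utf8, char *out)
{
	int count = 0;
	unsigned char c;

	while ((c = (unsigned char)*utf8++) != '\0') {
		if (c < 0x80)
			out[count++] = (char)c;	/* plain ASCII */
		else if ((c & 0xc0) != 0x80)
			out[count++] = '?';	/* lead byte of a sequence */
		/* else: continuation byte 10xxxxxx, nothing to emit */
	}
	out[count] = '\0';
	return count;
}
```

Because only lead bytes produce output, a multi-byte character shows
up as a single '?' rather than one per byte.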

> : The code in psd->drawarea has support for both alpha-blending
> : of an image with a separate alpha map, and alpha blending with
> : a uniform coloured image, e.g. use the output from T1 etc. as
> : the alpha map and specify the foreground color.
> 
> yep, exactly.  I'm starting to work on this now.
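
The second case, a uniform foreground colour painted through an alpha
map, can be sketched as follows.  This is only an illustration of the
blending arithmetic: the function names, the 8-bit alpha range and the
packed RGB byte layout are assumptions made for the example and say
nothing about the actual psd->drawarea interface.

```c
/* Blend one 8-bit channel: alpha 0 keeps dst, alpha 255 gives src. */
unsigned char blend_channel(unsigned char src, unsigned char dst,
			    unsigned char alpha)
{
	return (unsigned char)((src * alpha + dst * (255 - alpha) + 127) / 255);
}

/* Hypothetical routine: paint the uniform colour fg_rgb onto dst_rgb
 * (3 bytes per pixel), using alpha_map (1 byte per pixel, e.g. the
 * output of a font rasterizer) as per-pixel coverage. */
void blend_alpha_map(unsigned char *dst_rgb, const unsigned char *alpha_map,
		     int npixels, const unsigned char fg_rgb[3])
{
	int i, ch;

	for (i = 0; i < npixels; i++)
		for (ch = 0; ch < 3; ch++)
			dst_rgb[3 * i + ch] = blend_channel(fg_rgb[ch],
							    dst_rgb[3 * i + ch],
							    alpha_map[i]);
}
```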

Fantastic.  Let me know if you need a hand or anything.  If not,
I'll continue writing on a patch for Nano-X to use shared memory
for client => server command queue transfers in "my spare time".
This seems to actually be very simple, as the AllocReq function
already builds up a "command queue" that gets flushed.  I plan
to introduce two new protocol commands:  One to set up the
shared memory area, and one to command the server to process
the command queue right out of the shared memory area (flush).
The flush will be sent over the socket in the old way, so the
server can select and wake up when things need looking into,
like before.  It is a fairly non-intrusive patch that
should improve general operation of all Nano-X programs.
Very graphics intensive ones can request a larger than usual
shared memory segment for reduced context switching.  Apps
with no heavy screen operations at all don't have to enable
shared memory.
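
A rough sketch of the queue idea follows, under stated assumptions.
The struct layouts, the names queue_alloc and queue_process, and the
fixed segment size are all invented for illustration; they are not the
Nano-X protocol.  The queue lives in ordinary memory here, whereas the
real thing would place it in a segment shared between client and
server, with the flush command still travelling over the socket.

```c
#include <string.h>

#define QUEUE_SIZE 4096		/* assumed size; heavy apps could ask for more */

/* Hypothetical layout of the shared area: fill level plus raw bytes. */
struct cmd_queue {
	unsigned long used;	/* bytes of queued requests */
	unsigned char data[QUEUE_SIZE];
};

/* Hypothetical request header; each request carries its total length. */
struct req_hdr {
	unsigned short type;
	unsigned short length;	/* header + payload, in bytes */
};

/* Client side: reserve room for one request, in the spirit of AllocReq.
 * Returns NULL when the request does not fit and a flush is needed. */
void *queue_alloc(struct cmd_queue *q, unsigned short type,
		  unsigned short length)
{
	struct req_hdr hdr;
	void *req;

	if (length < sizeof hdr || length > QUEUE_SIZE - q->used)
		return NULL;
	hdr.type = type;
	hdr.length = length;
	req = q->data + q->used;
	memcpy(req, &hdr, sizeof hdr);
	q->used += length;
	return req;
}

/* Server side: walk the queue once the flush command has arrived.
 * Returns the number of requests seen and empties the queue. */
int queue_process(struct cmd_queue *q)
{
	unsigned long pos = 0;
	struct req_hdr hdr;
	int n = 0;

	while (pos + sizeof hdr <= q->used) {
		memcpy(&hdr, q->data + pos, sizeof hdr);
		if (hdr.length < sizeof hdr || hdr.length > q->used - pos)
			break;	/* corrupt entry; stop processing */
		/* a real server would dispatch on hdr.type here */
		pos += hdr.length;
		n++;
	}
	q->used = 0;
	return n;
}
```

The point of the design is that many requests cross the process
boundary with a single small socket message, so the server wakes up
once per flush rather than once per request.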

> I will steal your idea and use a struct.

I wouldn't exactly call it stealing....:-)  I think this is
a good idea, I'm looking forward to it, Thanks!

> I'll put together the first round, and then I will enlist your
> help for rewriting the 16bpp blitters.

Sure thing.

> I was deep in thought trying to get the font stuff in so that you
> and Martin could move fwd, and decided not to apply the
> patch without testing.  I will apply the patch for the next
> pre5 cut, after I have had a chance to thoroughly understand it.

OK, can't wait...!

Regards,
Morten Rolland, Screen Media

PS: Use the attachment if my mailer breaks the lines/tabs.

---------------

/* UTF-8 to UTF-16 conversion.  Surrogates are handled properly, i.e.
 * a single 4-byte UTF-8 character is encoded into a surrogate pair.
 * On the other hand, if the UTF-8 string contains surrogate values, this
 * is considered an error and returned as such.
 *
 * The destination array must be able to hold as many UTF-16 units
 * as there are bytes in the UTF-8 string (the strlen function
 * will report this byte count).  The worst case is a string where all
 * UTF-8 characters are ASCII characters, each byte becoming one unit.
 * No more will be needed.  No terminator is written to the output.
 *
 * Returns the number of UTF-16 units written, or a negative value
 * if the input is malformed.
 *
 * Copyright (c) 2000 Morten Rolland, Screen Media
 */

int utf8_to_utf16(char *utf8, unsigned short *unicode16)
{
	int count = 0;
	unsigned char c0, c1;
	unsigned long scalar;

	for (;;) {
		c0 = *utf8++;

		if ( c0 == '\0' )
			/* Null terminated - end of string */
			return count;

		if ( c0 < 0x80 ) {
			/* Plain ASCII character, simple translation :-) */
			*unicode16++ = c0;
			count++;
			continue;
		}

		if ( (c0 & 0xc0) == 0x80 )
			/* Illegal; starts with 10xxxxxx */
			return -1;

		/* c0 must be 11xxxxxx if we get here => at least 2 bytes */
		scalar = c0;
		c1 = *utf8++;
		if ( (c1 & 0xc0) != 0x80 )
			/* Bad byte */
			return -1;
		scalar <<= 6;
		scalar |= (c1 & 0x3f);

		if ( !(c0 & 0x20) ) {
			/* Two bytes UTF-8; strip the 110 lead bits first,
			 * otherwise the overlong check can never trigger */
			scalar &= 0x7ff;
			if ( scalar < 0x80 )
				return -1;	/* Overlong encoding */
			*unicode16++ = scalar;
			count++;
			continue;
		}

		/* c0 must be 111xxxxx if we get here => at least 3 bytes */
		c1 = *utf8++;
		if ( (c1 & 0xc0) != 0x80 )
			/* Bad byte */
			return -1;
		scalar <<= 6;
		scalar |= (c1 & 0x3f);

		if ( !(c0 & 0x10) ) {
			/* Three bytes UTF-8; strip the 1110 lead bits first,
			 * otherwise the range checks can never trigger */
			scalar &= 0xffff;
			if ( scalar < 0x800 )
				return -1;	/* Overlong encoding */
			if ( scalar >= 0xd800 && scalar < 0xe000 )
				return -1;	/* UTF-16 high/low halves */
			*unicode16++ = scalar;
			count++;
			continue;
		}

		/* c0 must be 1111xxxx if we get here => at least 4 bytes */
		c1 = *utf8++;
		if ( (c1 & 0xc0) != 0x80 )
			/* Bad byte */
			return -2;
		scalar <<= 6;
		scalar |= (c1 & 0x3f);

		if ( !(c0 & 0x08) ) {
			/* Four bytes UTF-8, needs encoding as surrogates;
			 * strip the 11110 lead bits before the range checks */
			scalar &= 0x1fffff;
			if ( scalar < 0x10000 )
				return -3;	/* Overlong encoding */
			if ( scalar > 0x10ffff )
				return -3;	/* Beyond what surrogates can hold */
			scalar -= 0x10000;
			*unicode16++ = ((scalar >> 10) & 0x3ff) + 0xd800;
			*unicode16++ = (scalar & 0x3ff) + 0xdc00;
			count += 2;
			continue;
		}

		return -4;	/* No support for more than four byte UTF-8 */
	}
}

#ifdef TEST
#include <stdio.h>

int main(void)
{
	unsigned short utf16[512];
	int count, t;

	static unsigned char test[] =
	{ 0xc3, 0x85, 0xe2, 0x84, 0xab,
	  0xf3, 0xb0, 0x80, 0x80, 0x61, 0xcc, 0x8a, 0 };

	/* The converter takes char *, the test data is unsigned */
	count = utf8_to_utf16((char *)test,utf16);

	printf("Number of characters: %d\n", count);

	for ( t=0; t < count; t++ )
		printf("%04x ",utf16[t]);
	printf("\n");

	return 0;
}
#endif
