nanogui: Thread: Microwindows for Hercules

Subject: Re: Microwindows for Hercules
From: Ben Pfaff ####@####.####
Date: 16 Jul 1999 22:25:54 -0000
Message-Id: <87yaggi69b.fsf@pfaffben.user.msu.edu>

Chipzz ####@####.#### writes:

   > 	A good idea, almost.  The BOGL library performs this for the packed-pixel
   > modes, but the VGA requires OUT instructions in between memory accesses,
   > so it can't run on a generalized bit-depth algorithm in planar mode.  (The VGA
   > design has to be seen/studied to be believed; I've never seen such a complicated
   > piece of hardware for something conceptually so simple.)

   Hmm, then that's something that could be checked for in between STOSB
   instructions (or the like). We could for example use something like this
   (just an idea), where ? is a flag that isn't used (maybe the carry flag?):

   PUSH flags register
   CLI
   ...
   {If out needed} ST?
   ...
   {Bresenham}
   ...
   STOSB {or something like that, like OR}
   LOOPN?
   JN? :End
   ...
   {Perform OUT}
   ...
   LOOP
   ...

   :End
   ...
   POP flags register

   We could of course also use something other than a flag, like a register,
   if Bresenham doesn't already use all of them...
   Just an idea; I've never done 4-bit VGA programming, I always used mode 13h.

Hmm, it's a good idea.  Unfortunately, I don't think that it will work
out.  The OUTs that need to be performed are not the same for each
pixel; rather, they are dependent on what needs to be written.  At any
rate, adding those jumps will kill your performance on 8086-class
processors, since it increases code size (read Abrash's _Zen of
Assembly Language Programming_).

Most VGA16 code ends up looking something like this:

	Set up lots of internal VGA registers with OUT operations.
	Read a byte from the memory byte that contains the pixel(s) of
	interest to load the internal VGA latches.
	Write an arbitrary byte whose value doesn't matter to the same
	memory byte.
	Start over.
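The read-then-write cycle above can be modeled in software to show why the dummy read matters. This is a simplified sketch, not real hardware access: it assumes write mode 0 with Enable Set/Reset on all four planes, no data rotate, the "replace" logical function, and all planes enabled in the Map Mask. All names are hypothetical. The two register assignments stand in for the OUTs, the read loads the latches, and the written byte is ignored because Set/Reset supplies the color.

```c
#include <stdint.h>

/* Simplified software model of the VGA16 write path (640x480, 4 planes).
   Assumed register state: write mode 0, no rotate, "replace" function,
   Map Mask = 0x0F, Enable Set/Reset = 0x0F. All names are hypothetical. */
#define BYTES_PER_ROW 80                  /* 640 pixels / 8 per byte */
#define PLANE_SIZE (BYTES_PER_ROW * 480)

static uint8_t plane[4][PLANE_SIZE];      /* the four bit planes */
static uint8_t latch[4];                  /* one internal latch per plane */
static uint8_t set_reset;                 /* Set/Reset register: the color */
static uint8_t bit_mask;                  /* Bit Mask register: bits to change */

/* A CPU read loads all four latches from the addressed byte. */
static uint8_t vga_read(unsigned addr) {
    for (int p = 0; p < 4; p++)
        latch[p] = plane[p][addr];
    return plane[0][addr];                /* assumes Read Map Select = 0 */
}

/* A CPU write: the data byte is ignored because Set/Reset is enabled on
   every plane. Each plane takes 0xFF or 0x00 from its Set/Reset bit where
   Bit Mask is 1, and keeps the latched value where Bit Mask is 0. */
static void vga_write(unsigned addr, uint8_t data) {
    (void)data;
    for (int p = 0; p < 4; p++) {
        uint8_t src = ((set_reset >> p) & 1) ? 0xFF : 0x00;
        plane[p][addr] = (uint8_t)((src & bit_mask) | (latch[p] & ~bit_mask));
    }
}

/* The per-pixel sequence: two "OUTs", a latch-loading read, a dummy write. */
static void vga16_put_pixel(int x, int y, uint8_t color) {
    unsigned addr = (unsigned)y * BYTES_PER_ROW + (unsigned)x / 8;
    set_reset = color & 0x0F;                 /* OUT: select the color */
    bit_mask = (uint8_t)(0x80 >> (x & 7));    /* OUT: select the pixel */
    (void)vga_read(addr);                     /* load the latches      */
    vga_write(addr, 0);                       /* value doesn't matter  */
}

/* Reassemble a pixel's 4-bit color from the planes (for checking). */
static uint8_t vga16_get_pixel(int x, int y) {
    unsigned addr = (unsigned)y * BYTES_PER_ROW + (unsigned)x / 8;
    uint8_t bit = (uint8_t)(0x80 >> (x & 7)), color = 0;
    for (int p = 0; p < 4; p++)
        if (plane[p][addr] & bit)
            color |= (uint8_t)(1 << p);
    return color;
}
```

Plotting pixel 3 in color 12 and then pixel 4 in color 5 leaves both intact even though they share a byte: the latches carry pixel 3's bits through wherever the Bit Mask is 0.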

   What may be more of a problem is if we used resolutions above 320x200x256.
   These don't fit in one 64K page, and we would have to do page swapping
   (unless we had a linear framebuffer).  But there won't be many 8088-80286
   machines that support those resolutions anyway.

Actually there's semi-standard 800x600x16 support that works on most
SVGA controllers using the same VGA16 code; that requires 800 * 600 /
2 = 240,000 bytes of memory, but with the VGA16 code it's mapped into
800 * 600 / 8 = 60,000 bytes of address space at one bit per pixel per plane.
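A quick check of those figures, assuming the 800x600 mode uses the same row-major planar layout as 640x480 VGA16, i.e. 100 bytes per scanline in each plane. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: same as 640x480 VGA16, one bit per pixel in each of four
   planes, 8 pixels per byte, rows packed contiguously. Names hypothetical. */
#define WIDTH  800
#define HEIGHT 600
#define BYTES_PER_ROW (WIDTH / 8)          /* 100 bytes per scanline */

/* Byte offset inside the A000h window; the same offset in every plane. */
static uint32_t planar_offset(int x, int y) {
    return (uint32_t)y * BYTES_PER_ROW + (uint32_t)(x / 8);
}

/* Bit within that byte; the leftmost pixel is the most significant bit. */
static uint8_t planar_bit(int x) {
    return (uint8_t)(0x80u >> (x % 8));
}

/* Total storage across all four planes (4 bits per pixel). */
static uint32_t total_bytes(void) {
    return (uint32_t)WIDTH * HEIGHT / 2;   /* 240,000 */
}

/* What the CPU actually addresses: one plane's worth, under 64K. */
static uint32_t window_bytes(void) {
    return (uint32_t)WIDTH * HEIGHT / 8;   /* 60,000 */
}
```

The last pixel, (799, 599), lands at offset 59,999, so the whole screen fits in a single 64K window without bank switching.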

Subject: RE: Microwindows for Hercules
From: Greg Haerr ####@####.####
Date: 16 Jul 1999 22:42:53 -0000
Message-Id: <01BECFA9.B2A20DA0.greg@censoft.com>

: Actually there's semi-standard 800x600x16 support that works on most
: SVGA controllers using the same VGA16 code; that requires 800 * 600 /
: 2 = 240,000 bytes of memory, but with the VGA16 code it's mapped into
: 800 * 600 / 8 = 60,000 bytes of address space at one bit per pixel per plane.
: 
	Speaking of this, what would be *really* cool would be to add
SVGA BIOS support to my scr_bios driver, so that we could support modes
higher than 640x480.  Any volunteers?

Greg
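A sketch of what the SVGA side of such a driver might compute, assuming a VESA BIOS Extensions (VBE) BIOS: function 4F02h selects a mode, and for packed 256-color modes larger than 64K, such as 800x600x256 (VESA mode 103h), function 4F05h moves the memory window. The struct and helper names are hypothetical; the window granularity and scanline pitch would really come from the mode information returned by function 4F01h, and actually issuing INT 10h (from real mode or through DPMI) is not shown.

```c
#include <stdint.h>

/* Hypothetical register image for an INT 10h call. Filling it in does not
   perform the call; that needs real-mode or DPMI support (not shown). */
typedef struct {
    uint16_t ax, bx, dx;
} vbe_regs;

/* VBE function 02h: set mode. BX bit 14 (linear framebuffer) is left
   clear, so the banked window at A000h is used. */
static vbe_regs vbe_set_mode(uint16_t mode) {
    vbe_regs r = { 0x4F02, mode, 0 };
    return r;
}

/* VBE function 05h: BH = 0 (set), BL = 0 (window A); DX is the window
   position in units of the mode's window granularity. */
static vbe_regs vbe_set_window(uint16_t position) {
    vbe_regs r = { 0x4F05, 0x0000, position };
    return r;
}

/* For an 8-bit packed-pixel mode, split a pixel's linear address into a
   window position and an offset inside the window. pitch (bytes per
   scanline) and gran_kb are assumed inputs from the mode info block. */
static void vbe_locate(int x, int y, uint32_t pitch, uint32_t gran_kb,
                       uint16_t *position, uint16_t *offset) {
    uint32_t addr = (uint32_t)y * pitch + (uint32_t)x;
    uint32_t gran = gran_kb * 1024u;
    *position = (uint16_t)(addr / gran);
    *offset = (uint16_t)(addr % gran);
}
```

For example, pixel (0, 100) with an 800-byte pitch is at linear byte 80,000: window position 1, offset 14,464 with 64K granularity, or position 19, offset 2,176 with 4K granularity.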

Subject: Re: Microwindows for Hercules
From: Alan Cox ####@####.####
Date: 18 Jul 1999 14:26:14 -0000
Message-Id: <E115rki-0006Q7-00@the-village.bc.nu>

> 	No need.  MicroWindows handles the Bresenham algorithm in the mid
> level code in devdraw.c.  It uses successive calls to drawpixel to make it work.
> In this way, people like you and me don't have to rewrite bresenham for every
> card someone wants....

The code in devdraw.c is very naive. It assumes pixel plotting is the underlying
operation. On many cards line slices, horizontal or vertical, are the underlying
operation. What you probably want to do is generate a series of

	draw_horizontal(x,y,l)

or
	draw_vertical(x,y,l)

calls for most things.

> This might be useful when bitblt is implemented though...

Having 32K of offscreen memory is always useful. 

Alan
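Alan's 32K figure presumably refers to the Hercules card's second display page: the card has 64K of memory, and one 720x348 page at one bit per pixel (31,320 bytes) fits in 32K. The sketch below assumes the standard Hercules graphics layout, with scanlines interleaved across four 8K banks; the function names are hypothetical.

```c
#include <stdint.h>

/* Hercules graphics mode: 720x348, 1 bit per pixel, 90 bytes per scanline.
   Row y is stored in bank (y % 4); each bank is 8K. Names hypothetical. */
#define HGC_BYTES_PER_ROW 90
#define HGC_BANK_SIZE 0x2000u

/* Byte offset of pixel (x, y) within one 32K display page. */
static uint16_t hgc_offset(int x, int y) {
    return (uint16_t)(HGC_BANK_SIZE * (unsigned)(y % 4)
                      + HGC_BYTES_PER_ROW * (unsigned)(y / 4)
                      + (unsigned)(x / 8));
}

/* Bit within that byte; the leftmost pixel is the most significant bit. */
static uint8_t hgc_bit(int x) {
    return (uint8_t)(0x80u >> (x % 8));
}

/* Segment of each display page: page 0 at B000h, page 1 at B800h,
   32K apart, so whichever page is not displayed is offscreen memory. */
static uint16_t hgc_page_segment(int page) {
    return page ? 0xB800u : 0xB000u;
}
```

The highest offset used, for pixel (719, 347), is 3 * 8192 + 86 * 90 + 89 = 32,405, just under 32K.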

Subject: RE: Microwindows for Hercules
From: Greg Haerr ####@####.####
Date: 19 Jul 1999 17:10:43 -0000
Message-Id: <01BED1D6.BD213830.greg@censoft.com>

: The code in devdraw.c is very naive. It assumes pixel plotting is the underlying
: operation. On many cards line slices, horizontal or vertical, are the underlying
: operation. What you probably want to do is generate a series of
: 
: 	draw_horizontal(x,y,l)
: 
: or
: 	draw_vertical(x,y,l)
: 
: calls for most things
:

	That's a good idea.  This would certainly speed up diagonal lines
on systems with a fast horizontal line draw.  The vertical doesn't add much,
as most video planes aren't optimized for vertical line drawing.  Currently,
there aren't any applications that draw diagonal lines, though, so the speed issue
is moot.


: > This might be useful when bitblt is implemented though...
: 
: Having 32K of offscreen memory is always useful. 
: 
	Definitely.  I plan on adding offscreen drawing memory, but it requires
some big architecture changes.

Greg

Subject: Re: Microwindows for Hercules
From: Ben Pfaff ####@####.####
Date: 19 Jul 1999 17:28:02 -0000
Message-Id: <87g12ky1k1.fsf@pfaffben.user.msu.edu>

Greg Haerr ####@####.#### writes:

   : The code in devdraw.c is very naive. It assumes pixel plotting is the underlying
   : operation. On many cards line slices, horizontal or vertical, are the underlying
   : operation. What you probably want to do is generate a series of
   : 
   : 	draw_horizontal(x,y,l)
   : 
   : or
   : 	draw_vertical(x,y,l)
   : 
   : calls for most things

	   That's a good idea.  This would certainly speed up diagonal lines
   on systems with a fast horizontal line draw.  The vertical doesn't add much,
   as most video planes aren't optimized for vertical line drawing.  Currently,
   there aren't any applications that draw diagonal lines, though, so the speed issue
   is moot.

A few days ago I was considering a faster-than-Bresenham(sp?)
algorithm along the lines of what Alan was saying.  I came up with two
problems, both of which would only apply to assembly-language
implementations on the 8086 through 80286:

	1. AFAICT it would require at least one division operation,
           whereas standard Bresenham doesn't need any.  This
           wouldn't be a problem for long diagonal lines, just for
           short ones.  Division is expensive.

	2. I can't think of a way to fit all the necessary info into
           the 8086 register set.  The standard Bresenham algorithm
           fits, just barely, but it looks like an ``extended''
           algorithm that keeps track of spans would need to use
           memory as well.  This is a big loss on the 8086 IIRC.

Can anyone inform me how long DIV r/m16 takes on an 8086?  I seem to
have lost my cycle-timing books, or perhaps I threw them out in a fit
of optimism.
-- 
"Debian for hackers, Red Hat for suits, Slackware for loons."
--CmdrTaco <URL:http://slashdot.org/articles/99/03/22/0928207.shtml>

Subject: RE: Microwindows for Hercules
From: Greg Haerr ####@####.####
Date: 19 Jul 1999 17:51:03 -0000
Message-Id: <01BED1DC.91334820.greg@censoft.com>

: A few days ago I was considering a faster-than-Bresenham(sp?)
: algorithm along the lines of what Alan was saying.  I came up with two
: problems, both of which would only apply to assembly-language
: implementations on the 8086 through 80286:

What would be *really* cool would be a super-fast implementation of VGA_drawhline()
in assembly.  That's something that would vastly improve nano-X and microwindows
*now*, since fillrectangle is based on drawhline.

This routine, for both the VGA and standard memory operations, would be great.


Greg
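One common way to structure a fast horizontal line on planar or 1-bit-per-pixel memory is to split the span into a masked left edge byte, a run of whole bytes, and a masked right edge byte. The sketch below does this on an ordinary memory buffer standing in for a single plane; it is not Greg's VGA_drawhline, and the name hline_1bpp is hypothetical. On a real VGA each edge mask would go to the Bit Mask register, and the interior run could become a REP STOSB in assembly.

```c
#include <stdint.h>
#include <string.h>

#define ROW_BYTES 80   /* 640 pixels at one bit per pixel per plane */

/* Hypothetical helper: set (on != 0) or clear pixels x1..x2 inclusive on
   row y of a one-bit-per-pixel plane held in ordinary memory. Only the
   two edge bytes need masking; interior bytes are written whole. */
static void hline_1bpp(uint8_t *plane, int x1, int x2, int y, int on) {
    if (x1 > x2) { int t = x1; x1 = x2; x2 = t; }
    uint8_t *row = plane + (size_t)y * ROW_BYTES;
    int first = x1 >> 3, last = x2 >> 3;
    uint8_t lmask = (uint8_t)(0xFFu >> (x1 & 7));        /* x1 .. end of byte */
    uint8_t rmask = (uint8_t)(0xFFu << (7 - (x2 & 7)));   /* start of byte .. x2 */

    if (first == last) {                     /* whole span inside one byte */
        uint8_t m = lmask & rmask;
        row[first] = on ? (uint8_t)(row[first] | m) : (uint8_t)(row[first] & ~m);
        return;
    }
    row[first] = on ? (uint8_t)(row[first] | lmask) : (uint8_t)(row[first] & ~lmask);
    if (last - first > 1)                    /* full bytes: no mask needed */
        memset(row + first + 1, on ? 0xFF : 0x00, (size_t)(last - first - 1));
    row[last] = on ? (uint8_t)(row[last] | rmask) : (uint8_t)(row[last] & ~rmask);
}
```

A span from x = 3 to x = 20 touches three bytes: a left edge of 0x1F, one full 0xFF byte, and a right edge of 0xF8. Only the two edges cost a mask.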

Subject: Re: Microwindows for Hercules
From: Alan Cox ####@####.####
Date: 19 Jul 1999 18:10:53 -0000
Message-Id: <E116Hix-0007sI-00@the-village.bc.nu>

> 	1. AFAICT it would require at least one division operation,
>            whereas standard Bresenham doesn't need any.  This
>            wouldn't be a problem for long diagonal lines, just for
>            short ones.  Division is expensive.

No. You can do it by using Bresenham and still speed it up.

>            fits, just barely, but it looks like an ``extended''
>            algorithm that keeps track of spans would need to use
>            memory as well.  This is a big loss on the 8086 IIRC.

It's not a big deal.

Firstly:

	if (abs(x2-x1) > abs(y2-y1))
		horizontal_optimised();
	else
		vertical_optimised();

Next, for Bresenham you drop the plot_pixel call and instead, when you
bump x (or y in vertical mode), you do plot_line(oldx,oldy,x,y) and then
set oldx=x, oldy=y.

It saves you function calls and costs you four memory accesses per line;
that's a win on everything.
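The slice approach can be sketched in C under a few assumptions. The dispatch compares absolute deltas so lines drawn in either direction work. A run is flushed whenever the Bresenham error term steps the minor coordinate (y for shallow lines, x for steep ones), and once more at the end. Runs are collected into an array rather than passed to the draw_horizontal/draw_vertical calls, so the output is easy to inspect. The slice type and line_slices name are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical run record: horizontal (vertical == 0) from (x, y) rightward,
   or vertical from (x, y) downward, len pixels long. */
typedef struct { int x, y, len, vertical; } slice;

static int add_slice(slice *out, int n, int max, int x, int y, int len, int vertical) {
    if (n < max) { out[n].x = x; out[n].y = y; out[n].len = len; out[n].vertical = vertical; }
    return n + 1;              /* count every run, even if out[] is full */
}

/* Bresenham from (x1,y1) to (x2,y2) that emits whole runs, not pixels.
   Returns the number of runs. */
static int line_slices(int x1, int y1, int x2, int y2, slice *out, int max) {
    int dx = abs(x2 - x1), dy = abs(y2 - y1);
    int sx = x1 < x2 ? 1 : -1, sy = y1 < y2 ? 1 : -1;
    int n = 0;

    if (dx >= dy) {                        /* shallow: horizontal runs */
        int err = 2 * dy - dx, y = y1, start = x1;
        for (int x = x1; ; x += sx) {
            int last = (x == x2);
            if (last || err > 0) {         /* row is about to change, or done */
                int left = start < x ? start : x;
                n = add_slice(out, n, max, left, y, abs(x - start) + 1, 0);
                if (last) break;
                y += sy; err -= 2 * dx; start = x + sx;
            }
            err += 2 * dy;
        }
    } else {                               /* steep: vertical runs */
        int err = 2 * dx - dy, x = x1, start = y1;
        for (int y = y1; ; y += sy) {
            int last = (y == y2);
            if (last || err > 0) {         /* column is about to change, or done */
                int top = start < y ? start : y;
                n = add_slice(out, n, max, x, top, abs(y - start) + 1, 1);
                if (last) break;
                x += sx; err -= 2 * dy; start = y + sy;
            }
            err += 2 * dx;
        }
    }
    return n;
}
```

The line from (0,0) to (7,2) becomes three horizontal runs, (0,0) length 2, (2,1) length 4 and (6,2) length 2, instead of eight pixel calls. The pixels covered are the same ones plain Bresenham would plot.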

> Can anyone inform me how long DIV r/m16 takes on an 8086?  I seem to

"weeks" 8)

Subject: Re: Microwindows for Hercules
From: Ben Pfaff ####@####.####
Date: 19 Jul 1999 18:22:50 -0000
Message-Id: <87btd8xz1w.fsf@pfaffben.user.msu.edu>

Alan Cox ####@####.#### writes:

   > 	1. AFAICT it would require at least one division operation,
   >            whereas standard Bresenham doesn't need any.  This
   >            wouldn't be a problem for long diagonal lines, just for
   >            short ones.  Division is expensive.

   No. You can do it by using Bresenham and still speed it up.

   >            fits, just barely, but it looks like an ``extended''
   >            algorithm that keeps track of spans would need to use
   >            memory as well.  This is a big loss on the 8086 IIRC.

   It's not a big deal.

   Firstly:

	   if (abs(x2-x1) > abs(y2-y1))
		   horizontal_optimised();
	   else
		   vertical_optimised();

Well, yes, obviously.

   Next, for Bresenham you drop the plot_pixel call and instead, when you
   bump x (or y in vertical mode), you do plot_line(oldx,oldy,x,y) and then
   set oldx=x, oldy=y.

   It saves you function calls and costs you four memory accesses per line;
   that's a win on everything.

Okay, that's one way to look at it.  The routine that I was looking at
is in Wilton's _Programmer's Guide to PC and PS/2 Video Systems_.  He
has a routine that does one pixel per ten or so CPU instructions on
VGA16.  Getting that fast is easy; I was looking to do even better
than that using clever things with bit masks to write multiple pixels
at once.
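One possible reading of the bit-mask idea, for shallow lines: consecutive pixels that fall in the same byte of the same scanline can share a single Bit Mask OUT and a single latch read/write. The sketch below is an assumption about how that could be organized, not Wilton's routine or Ben's plan. It walks the line with ordinary Bresenham on a 640-pixel VGA16 screen and merges pixels into (offset, mask) pairs; the names are hypothetical and coordinates are assumed non-negative.

```c
#include <stdint.h>
#include <stdlib.h>

#define ROW_BYTES 80   /* 640x480 VGA16: 80 bytes per scanline per plane */

/* Hypothetical record: one masked write covering 1 to 8 pixels. */
typedef struct { uint16_t offset; uint8_t mask; } byte_write;

/* Walk a shallow line (|dx| >= |dy|) with Bresenham and merge pixels that
   share a byte and a scanline into one write. Returns the number of
   writes, or -1 for steep lines (one pixel per row, nothing to merge). */
static int line_byte_writes(int x1, int y1, int x2, int y2, byte_write *out, int max) {
    int dx = abs(x2 - x1), dy = abs(y2 - y1);
    if (dy > dx) return -1;
    int sx = x1 < x2 ? 1 : -1, sy = y1 < y2 ? 1 : -1;
    int err = 2 * dy - dx, y = y1, n = 0;
    long cur = -1;             /* byte currently being accumulated, -1 = none */
    uint8_t mask = 0;

    for (int x = x1; ; x += sx) {
        long off = (long)y * ROW_BYTES + x / 8;
        if (off != cur) {      /* moved to a different byte: flush the old one */
            if (cur >= 0) {
                if (n < max) { out[n].offset = (uint16_t)cur; out[n].mask = mask; }
                n++;
            }
            cur = off;
            mask = 0;
        }
        mask |= (uint8_t)(0x80u >> (x % 8));
        if (x == x2) break;
        if (err > 0) { y += sy; err -= 2 * dx; }
        err += 2 * dy;
    }
    if (n < max) { out[n].offset = (uint16_t)cur; out[n].mask = mask; }
    return n + 1;
}
```

The line from (0,0) to (7,2) covers 8 pixels but needs only 3 masked writes (masks 0xC0, 0x3C and 0x03 on three scanlines). A 16-pixel horizontal line needs 2. The saving shrinks as the slope approaches 45 degrees, where most bytes hold a single pixel.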

You're looking at optimizing at the generic level with calls to a
hardware-specific routine; I was thinking about optimizing an already
fast x86 asm routine.  Oh well.
-- 
"Unix... is not so much a product
 as it is a painstakingly compiled oral history
 of the hacker subculture."
--Neal Stephenson

Subject: RE: Microwindows for Hercules
From: Greg Haerr ####@####.####
Date: 19 Jul 1999 19:21:18 -0000
Message-Id: <01BED1E9.24C8FF10.greg@censoft.com>

: Next, for Bresenham you drop the plot_pixel call and instead, when you
: bump x (or y in vertical mode), you do plot_line(oldx,oldy,x,y) and then
: set oldx=x, oldy=y.
: 
: It saves you function calls and costs you four memory accesses per line;
: that's a win on everything.
: 

	I agree.  Easy and simple: it saves function calls and costs two subtracts
and two stores.

Subject: RE: Microwindows for Hercules
From: Greg Haerr ####@####.####
Date: 19 Jul 1999 19:25:16 -0000
Message-Id: <01BED1E9.B0376C80.greg@censoft.com>

: Okay, that's one way to look at it.  The routine that I was looking at
: is in Wilton's _Programmer's Guide to PC and PS/2 Video Systems_.  He
: has a routine that does one pixel per ten or so CPU instructions on
: VGA16.  Getting that fast is easy; I was looking to do even better
: than that using clever things with bit masks to write multiple pixels
: at once.

	It'd be cool to optimize that.  My asmplan4.s replacement
high-speed driver for vgaplan4.c uses Wilton's code as a base.
Feel free to test and enhance that code.


: 
: You're looking at optimizing at the generic level with calls to a
: hardware-specific routine; I was thinking about optimizing an already
: fast x86 asm routine.  Oh well.
:
	Currently, there's no direct driver entry point for line drawing;
it's commented out.  A low-level routine would only be called when the entire
line is unclipped anyway, but you could test by uncommenting that code in
GdLine in devdraw.c and calling your routine outside the driver interface.

Greg
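Because a rectangle is convex, a straight line lies entirely inside it exactly when both endpoints do, which makes the "entire line is unclipped" test cheap. Below is a minimal sketch with a hypothetical single clip rectangle; the actual test in GdLine may differ. With a clip region made of several rectangles, requiring both endpoints to fall in the same rectangle gives a conservative version of the same check.

```c
/* Hypothetical clip rectangle with inclusive bounds. */
typedef struct { int left, top, right, bottom; } cliprect;

static int point_inside(const cliprect *r, int x, int y) {
    return x >= r->left && x <= r->right && y >= r->top && y <= r->bottom;
}

/* A rectangle is convex, so the segment (x1,y1)-(x2,y2) is wholly inside
   it exactly when both endpoints are. Only then is it safe to hand the
   whole line to a driver routine that does no clipping of its own. */
static int line_unclipped(const cliprect *r, int x1, int y1, int x2, int y2) {
    return point_inside(r, x1, y1) && point_inside(r, x2, y2);
}
```

When the test fails, the caller would fall back to the existing clipped, per-pixel path in devdraw.c.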