nanogui: Thread: More on Image handeling and optimizations

Subject: More on Image handeling and optimizations
From: Morten Rolland ####@####.####
Date: 21 Jan 2000 08:44:54 -0000
Message-Id: <38882773.C6E625A4@screenmedia.no>

Hello!

(Sorry if this mail is a duplicate; the first mailing failed
somewhere along the line yesterday, local time.)

I've been thinking about how to best arrange for stuff like
optimizations etc. during my effort to speed up and improve
the GdArea function, and I've got a few notes I'd like to
share in this respect and a list of stuff I want to implement.

Please note that currently, the GdArea/GrArea functions are
very important to us and Opera, and may be even more so if
Vidar decides to use GdArea in his efforts to beat X11 in
the font rendering game...  Which I'm prepared to help him
with of course:-)  Which means I'll make GrArea fast and
flexible, or use other means like Blit if this is considered
better.  I'm worried the Blit function may end up being too
flexible, large, and hard to optimize, though.  Anyways;

I have a wish-list I'm prepared to implement:

1) Tiling of images.  Painting a non-uniform background
   can be done with tiles, which would reduce the client
   to server overhead a lot (e.g. only transfer the
   single smaller image).  I envision this to be a feature
   of GrArea.

2) It should be possible to do sub-imaging on the client
   side by GrArea without temporary storage, e.g. the
   application wants to take a small piece out of a larger
   image and paint it on the screen.  This could be relevant
   when repainting only parts of a large image for example.
   By having the client side extract only the pixels needed,
   the transfer is more efficient.

3) Alpha blending.  Yeah.  It's definite, I've gone mad.
   Actually it could be really nice to do true alpha
   blending in the server.  Some of the stuff our designers
   have come up with could use some alpha blending to
   avoid having the entire user interface as a single image..

   Alpha blending would also be great news to the anti-
   aliased font support by Vidar... We could have true
   anti aliasing against a textured background this way....

   Not something for the faint of heart, or 286 projects,
   but it would make Nano-X rock as a high quality
   environment.

I'm not sure where this should be done, though.  It probably
belongs in the Blit function, which would grow large, but
I have a concern for the Blit function:

Is it future-proof to require the destination 'psd' to do the
operation?  What if the source 'psd' is better suited for the
job?  I'm probably thinking device-device blitting where none
of the devices are memory, which is probably not supported yet,
just thinking of the future when PDAs and stuff will have like,
err, stereo views or something?

I have extended the existing code with a psd->DrawArea function,
with a couple of emulation functions, and it seems to work like
a charm.  Should I continue this or try to integrate with Blit?
Changing the low level part to Blit later on should be easy, but
we may be better able to experiment and figure out what is needed
while they are separate. Comments?

I'll get back on the pre4, it has some problems with the make
system, foremost: SCREEN_PIXTYPE is not defined for all the
compiler runs where it is needed.

Thanx for the effort to all of you!

So long,
Morten Rolland

Subject: RE: More on Image handeling and optimizations
From: Greg Haerr ####@####.####
Date: 21 Jan 2000 18:40:54 -0000
Message-Id: <C1962B36D9BBD311B0F80060083DFEFB0416AF@SYS.CenSoft.COM>

: Please note that currently, the GdArea/GrArea functions are
: very important to us and Opera, and may be even more so if
: Vidar decides to use GdArea in his efforts to beat X11 in
: the font rendering game... 

According to Vidar's last email, he already is using
GdArea for *fantastically cool anti-aliased better-than-X11*
font rendering.



: Which means I'll make GrArea fast and
: flexible, or use other means like Blit if this is considered
: better.  I'm worried the Blit function may end up being too
: flexible, large, and hard to optimize, though.  

A couple of words about blit:  at its core, all a SRCCOPY
blit is supposed to do is copy a rectangle of
memory from one location to another, FAST.  Ideally,
the src and dst pixel packings are the same, and
no conversion occurs.  Then, the bigger issue
is whether the src and dst image "line length" values
are enforced for all images.  For instance, if the
framebuffer video screen is "word padded", while
user images are allowed to be submitted as only
"byte padded" (no padding), then an optimized
word-by-word memory copy cannot be used for
highest speed.  In Windows, _all_ images are
required to be DWORD padded, so that the fastest
dword-by-dword memory copy routines can always be
used.  If we used this convention, that would mean
that all GrArea/GdArea images would have to be
DWORD padded or they couldn't be used.

I'd like comments on the above...  Currently,
Microwindows requires WORD padding
on bitmap images, and no padding on GrArea
images.
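
The padding question above comes down to how the "line length" of an
image is computed.  A minimal sketch of that arithmetic, assuming a
hypothetical helper named padded_pitch (it is not part of Microwindows)
and an alignment given in bytes (1 = no padding, 2 = WORD, 4 = DWORD):

```c
#include <stddef.h>

/* Hypothetical helper: bytes per scanline when every line is padded
 * to a multiple of `align` bytes.  `bpp` is bits per pixel. */
size_t padded_pitch(unsigned width, unsigned bpp, unsigned align)
{
    size_t bits  = (size_t)width * bpp;
    size_t bytes = (bits + 7) / 8;               /* round up to whole bytes */
    return (bytes + align - 1) / align * align;  /* round up to alignment   */
}
```

For example, a 10 pixel wide 8bpp image occupies 10 bytes per line with
no padding and 12 bytes per line with DWORD padding.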


: 1) Tiling of images.  Painting a non-uniform background
:    can be done with tiles, which would reduce the client
:    to server overhead a lot (e.g. only transfer the
:    single smaller image).  I envision this to be a feature
:    of GrArea.

I would suggest that, rather than making this a special
feature of GrArea, that the concept of server-side
images, with associated IDs, be introduced.  Then a
special tile function could be used with that ID.
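
Whichever side ends up owning the tile, the fill itself is a simple
loop.  A rough sketch for an 8bpp memory destination, assuming a
hypothetical tile_fill8 helper (the name and signature are
illustrative only, not an existing Nano-X or Microwindows function):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill a dst_w x dst_h 8bpp area by repeating a
 * tile_w x tile_h tile.  Pitches are bytes per scanline. */
void tile_fill8(unsigned char *dst, size_t dst_pitch,
                unsigned dst_w, unsigned dst_h,
                const unsigned char *tile, size_t tile_pitch,
                unsigned tile_w, unsigned tile_h)
{
    if (tile_w == 0 || tile_h == 0)
        return;
    for (unsigned y = 0; y < dst_h; y++) {
        /* The tile row repeats every tile_h destination rows. */
        const unsigned char *src_row = tile + (size_t)(y % tile_h) * tile_pitch;
        unsigned char *dst_row = dst + (size_t)y * dst_pitch;
        for (unsigned x = 0; x < dst_w; x += tile_w) {
            /* Copy a whole tile row, or a partial one at the right edge. */
            unsigned n = dst_w - x < tile_w ? dst_w - x : tile_w;
            memcpy(dst_row + x, src_row, n);
        }
    }
}
```

Only the small tile has to cross the client/server boundary; the
repetition happens next to the destination.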



: 
: 2) It should be possible to do sub-imaging on the client
:    side by GrArea without temporary storage, e.g. the
:    application wants to take a small piece out of a larger
:    image and paint it on the screen.  This could be relevant
:    when repainting only parts of a large image for example.
:    By having the client side extract only the pixels needed,
:    the transfer is more efficient.

I'm not quite sure what you're looking for here.  Are you
talking about wanting to just paint a sub-rectangle
of client side image bits?  Just modify the x,y,w,h
of the original GrArea.  [Note this has big problems
if we move to high-speed DWORD padding of images]



: 
: 3) Alpha blending.  Yeah.  It's definite, I've gone mad.
:    Not something for the faint of heart, or 286 projects,
:    but it would make Nano-X rock as a high quality
:    environment.

I would _love_ to do alpha blending.  Both MAC OS X
and Windows 2000 are supporting it.  I'm definitely
interested in supporting it.  Actually, after checking out
screen shots for both the above, I decided I was
going to write it!


: 
: I'm not sure where this should be done, though.  It probably
: belongs in the Blit function, which would grow large

The blit functions are all going to get bigger when we
start supporting something other than just the simple
SRCCOPY.



: I have a concern for the Blit function:
: 
: Is it future-proof to require the destination 'psd' to do the
: operation?  What if the source 'psd' is better suited for the
: job?  I'm probably thinking device-device blitting where none
: of the devices are memory, which is probably not supported yet,

We already support device-device blitting.  I currently use
screen-to-screen blitting to implement the scrolling for the terminal
emulator demos.


: 
: I have extended the existing code with a psd->DrawArea function,
: with a couple of emulation functions, and it seems to work like
: a charm.  Should I continue this or try to integrate with Blit?
: Changing the low level part to Blit later on should be easy, but
: we may better experiment and figure out what is needed when they
: are separate. Comments?

I'd like to see your code.  But I'd also like to see the GdArea
code using blit, since we already have written (and now debugged)
1, 2, 4, 8, 16 and 32bpp blit drivers. [not all are fast].

So, both.

Regards,

Greg

Subject: Re: More on Image handeling and optimizations
From: "Bradley D. LaRonde" ####@####.####
Date: 21 Jan 2000 19:00:17 -0000
Message-Id: <087401bf6440$2b65ce40$b8119526@ltc.com>

----- Original Message ----- 
From: "Greg Haerr" ####@####.####
To: "Morten Rolland" ####@####.####
Cc: "NanoGUI Mailing List" ####@####.####
Sent: Friday, January 21, 2000 1:37 PM
Subject: RE: More on Image handeling and optimizations


> : 3) Alpha blending.  Yeah.  It's definite, I've gone mad.
> :    Not something for the faint of heart, or 286 projects,
> :    but it would make Nano-X rock as a high quality
> :    environment.
> 
> I would _love_ to do alpha blending.  Both MAC OS X
> and Windows 2000 are supporting it.  I'm definitely
> interested in supporting it.  Actually, after checking out
> screen shots for both the above, I decided I was
> going to write it!

OK, that's got my interest piqued.  What screen shots?

Regards,
Brad

Subject: Re: More on Image handeling and optimizations
From: Morten Rolland ####@####.####
Date: 24 Jan 2000 09:03:23 -0000
Message-Id: <388C2054.FBFE0EE8@screenmedia.no>

Greg Haerr wrote:
> 
> : Please note that currently, the GdArea/GrArea functions are
> : very important to us and Opera, and may be even more so if
> : Vidar decides to use GdArea in his efforts to beat X11 in
> : the font rendering game...
> 
> According to Vidar's last email, he already is using
> GdArea for *fantastically cool anti-aliased better-than-X11*
> font rendering.

Yes he is - but against a uniform colored background.  I'd like
to see antialiased fonts on top of a background image in Opera...
This needs full alpha blending (which BTW I'd love to see some
MMX code for... current implementation is neither fast, nor
extremely accurate.  And it uses semi-large tables.)

> : Which means I'll make GrArea fast and
> : flexible, or use other means like Blit if this is considered
> : better.  I'm worried the Blit function may end up being too
> : flexible, large, and hard to optimize, though.
> 
> A couple of words about blit:  at its core, all a SRCCOPY
> blit is supposed to do is copy a rectangle of
> memory from one location to another, FAST.  Ideally,
> the src and dst pixel packings are the same, and
> no conversion occurs.  Then, the bigger issue
> is whether the src and dst image "line length" values
> are enforced for all images.  For instance, if the
> framebuffer video screen is "word padded", while
> user images are allowed to be submitted as only
> "byte padded" (no padding), then an optimized
> word-by-word memory copy cannot be used for
> highest speed.  In Windows, _all_ images are
> required to be DWORD padded, so that the fastest
> dword-by-dword memory copy routines can always be
> used.  If we used this convention, that would mean
> that all GrArea/GdArea images would have to be
> DWORD padded or they couldn't be used.
> 
> I'd like comments on the above...  Currently,
> Microwindows requires WORD padding
> on bitmap images, and no padding on GrArea
> images.

I have been thinking of requiring word or dword padding
on GrArea as well - better safe than sorry.

> : 1) Tiling of images.  Painting a non-uniform background
> :    can be done with tiles, which would reduce the client
> :    to server overhead a lot (e.g. only transfer the
> :    single smaller image).  I envision this to be a feature
> :    of GrArea.
> 
> I would suggest that, rather than making this a special
> feature of GrArea, that the concept of server-side
> images, with associated IDs, be introduced.  Then a
> special tile function could be used with that ID.

Hmm, in my world this smells too much like X11 with its
memory management problems (fragmentation).  We are going
to use it in an environment where there may be
*absolutely*  **no** more memory left at some point, which
means that the nano-X server should ideally allocate and
touch all the pages it will need when it starts off, and
never look back WRT memory.

Would this be feasible today?  How much dynamic allocation
is there in nano-X?  I was thinking of just doing:

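    x = malloc(NANOX_MAX_MEMUSE);        /* reserve the worst-case amount */
    optimizer_guard = malloc(1);         /* never freed; presumably stops
                                            free(x) from trimming the heap */
    if (x != NULL) {
        memset(x, 0, NANOX_MAX_MEMUSE);  /* touch every page */
        free(x);                         /* give the space back to malloc */
    }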

Very early on in 'main' to reserve the space needed.
A wrapper for malloc could monitor the memory usage
and warn when a brk has to be performed during
profiling.
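
A minimal sketch of such a wrapper, under loud assumptions: all names
(track_malloc, track_free, the 512 KB budget) are hypothetical, and it
counts requested bytes against a budget rather than detecting an actual
brk, which is a simplification of the idea above:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Header stored in front of each block so its size is known on free.
 * The union forces alignment suitable for any object type. */
typedef union {
    size_t size;
    max_align_t align;
} track_hdr;

static size_t track_in_use;                 /* bytes currently allocated */
static size_t track_peak;                   /* high-water mark           */
static size_t track_budget = 512 * 1024;    /* hypothetical stand-in for
                                               NANOX_MAX_MEMUSE          */

void *track_malloc(size_t n)
{
    track_hdr *h;

    if (n > SIZE_MAX - sizeof *h)           /* avoid size overflow */
        return NULL;
    h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->size = n;
    track_in_use += n;
    if (track_in_use > track_peak)
        track_peak = track_in_use;
    if (track_in_use > track_budget)        /* warn during profiling */
        fprintf(stderr, "track_malloc: %zu bytes in use, budget is %zu\n",
                track_in_use, track_budget);
    return h + 1;                           /* user data follows the header */
}

void track_free(void *p)
{
    track_hdr *h;

    if (p == NULL)
        return;
    h = (track_hdr *)p - 1;
    track_in_use -= h->size;
    free(h);
}

size_t track_bytes_in_use(void) { return track_in_use; }
size_t track_peak_bytes(void)   { return track_peak; }
```

The peak value after a profiling run would be one way to pick the
amount to reserve at startup.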

> : 2) It should be possible to do sub-imaging on the client
> :    side by GrArea without temporary storage, e.g. the
> :    application wants to take a small piece out of a larger
> :    image and paint it on the screen.  This could be relevant
> :    when repainting only parts of a large image for example.
> :    By having the client side extract only the pixels needed,
> :    the transfer is more efficient.
> 
> I'm not quite sure what you're looking for here.  Are you
> talking about wanting to just paint a sub-rectangle
> of client side image bits?  Just modify the x,y,w,h
> of the original GrArea.  [Note this has big problems
> if we move to high-speed DWORD padding of images]

Yes... But we will probably not get around this completely
anyway - ie. doing word aligned memcopy on an 8 bit display
would restrict your choices on where to put the image...:-)

With a client side sub-image extraction, the image fed to
the nano-X server can be padded to be properly aligned.
You can't just change x,y,w,h of the current GrArea, as the
w and h define the memory layout of the image.  The
psd->DrawArea I proposed had extra information that held the
underlying image size, and the subimage to paint (for
efficient clipping).  The nano-X client thing above is
basically the same thing, but with the added benefit of not
transferring more image data than needed.
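
The extraction described above can be sketched as follows.  The helper
name extract_subimage and its parameters are assumptions made for
illustration (not the actual patch); the output buffer stands in for
wherever the client assembles the data it sends to the server:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the w x h rectangle at (sx, sy) out of a
 * larger image into `out`, with each output row padded to a multiple
 * of 4 bytes.  bytes_pp is bytes per pixel, src_pitch is the source
 * bytes per line.  `out` must hold h * (returned pitch) bytes.
 * Returns the output pitch in bytes. */
size_t extract_subimage(unsigned char *out,
                        const unsigned char *src, size_t src_pitch,
                        unsigned sx, unsigned sy,
                        unsigned w, unsigned h, unsigned bytes_pp)
{
    size_t row_bytes = (size_t)w * bytes_pp;
    size_t out_pitch = (row_bytes + 3) & ~(size_t)3;

    for (unsigned y = 0; y < h; y++) {
        /* The source row is addressed with the pitch of the large image. */
        const unsigned char *s = src + (size_t)(sy + y) * src_pitch
                                     + (size_t)sx * bytes_pp;
        unsigned char *d = out + (size_t)y * out_pitch;
        memcpy(d, s, row_bytes);
        memset(d + row_bytes, 0, out_pitch - row_bytes);  /* zero padding */
    }
    return out_pitch;
}
```

The key point is that the source pitch comes from the underlying image
while only w x h pixels, plus padding, are transferred.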

> : 3) Alpha blending.  Yeah.  It's definite, I've gone mad.
> :    Not something for the faint of heart, or 286 projects,
> :    but it would make Nano-X rock as a high quality
> :    environment.
> 
> I would _love_ to do alpha blending.  Both MAC OS X
> and Windows 2000 are supporting it.  I'm definitely
> interested in supporting it.  Actually, after checking out
> screen shots for both the above, I decided I was
> going to write it!

I have some code doing this now, but it lacks finesse like
MMX and uses some memory.  It also has an accuracy problem
I'm not sure will be significant.

> : I have a concern for the Blit function:
> :
> : Is it future-proof to require the destination 'psd' to do the
> : operation?  What if the source 'psd' is better suited for the
> : job?  I'm probably thinking device-device blitting where none
> : of the devices are memory, which is probably not supported yet,
> 
> We already support device-device blitting.  I currently use
> screen-to-screen blitting to implement the scrolling for the terminal
> emulator demos.

Yes, but *inter* device blitting?  (Blitting from one gfx card
to another...?)  A simple example to illustrate the problem is
when blitting from screen to memory.  When doing this, the
memory-psd gets called to do the job, but it can not know that
there is an accelerated function waiting to be used in the
screen driver to do this in hardware...

> : I have extended the existing code with a psd->DrawArea function,
> : with a couple of emulation functions, and it seems to work like
> : a charm.  Should I continue this or try to integrate with Blit?
> : Changing the low level part to Blit later on should be easy, but
> : we may better experiment and figure out what is needed when they
> : are separate. Comments?
> 
> I'd like to see your code.  But I'd also like to see the GdArea
> code using blit, since we already have written (and now debugged)
> 1, 2, 4, 8, 16 and 32bpp blit drivers. [not all are fast].

If doing GrArea with blit, we need to set up a suitable psd for the
operation on every call to GrArea, which is kind of not needed.
One could pre-allocate a memory psd and only update the bits inside
it that are relevant to the blit in question, but this is kind of
an unclean situation.  I'd like the device-drivers and the memory
driver to fiddle with the internals of the psd as much as possible,
and not the engine code?

A psd->DrawArea that needs little or no extra overhead may be the
way to go here, and in order to reduce the number of (possibly unused)
arguments passed to the low level drivers, we could:

Define a "low-level-GC" structure like:

struct driver_gc { int x, y, w, h; void *pixels, *misc;
                   int srcw, srch, srcx, srcy;
                   PIXELVAL color;
};

And define *strictly* which arguments are needed by which functions
and operations carried out by Blit/Area etc. that use this struct.

This way, when the Area or Blit low-level driver is called, only
the parameters actually needed have to be filled into the struct,
and a pointer to this struct is passed to the low level driver.

I realize that indexing a supplied struct may be slower than
reading off the stack (ties up one more register), but call setup
would be faster and cleaner (not a whole bunch of zero arguments),
and those that need initialization are initialized by name, i.e.:

  hwgc.x = x;
  hwgc.y = y;
  hwgc.color = c;
  psd->Blit(psd,BLIT_DRAW_POINT,&hwgc);

I'm not suggesting to draw points this way, but you get the idea.
I think this looks clean, and one very important last point:
When extending the functionality of Blit, Area or whatever with
another argument, we don't have to update all the calls
to Blit/Area/... that already exist and don't need the extra (zero)
argument (I have experience in this from improving the Area
function with gradually more functionality...)
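
A self-contained sketch of the calling convention described above.
Everything beyond the struct fields is assumed for illustration:
"struct screen" is a toy stand-in for the real SCREENDEVICE, PIXELVAL
is a placeholder typedef, BLIT_FILL_RECT is an invented second
operation, and the driver assumes the caller has already clipped:

```c
#include <stddef.h>

typedef unsigned long PIXELVAL;   /* placeholder for the real type */

struct driver_gc {
    int x, y, w, h;
    void *pixels, *misc;
    int srcw, srch, srcx, srcy;
    PIXELVAL color;
};

enum { BLIT_DRAW_POINT, BLIT_FILL_RECT };   /* hypothetical op codes */

struct screen {                   /* toy stand-in for SCREENDEVICE */
    int width, height;
    unsigned char *fb;            /* 8bpp, `width` bytes per line */
    void (*Blit)(struct screen *psd, int op, const struct driver_gc *gc);
};

/* Each operation reads only the fields it is documented to need, so
 * callers fill in just those and leave the rest untouched. */
void fb8_blit(struct screen *psd, int op, const struct driver_gc *gc)
{
    switch (op) {
    case BLIT_DRAW_POINT:         /* needs: x, y, color */
        psd->fb[gc->y * psd->width + gc->x] = (unsigned char)gc->color;
        break;
    case BLIT_FILL_RECT:          /* needs: x, y, w, h, color */
        for (int j = 0; j < gc->h; j++)
            for (int i = 0; i < gc->w; i++)
                psd->fb[(gc->y + j) * psd->width + gc->x + i] =
                    (unsigned char)gc->color;
        break;
    }
}
```

Adding a field to driver_gc for a new operation leaves every existing
call site unchanged, which is the maintenance benefit argued for above.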

Comments?

Bye,
- Morten

Subject: RE: More on Image handeling and optimizations
From: Greg Haerr ####@####.####
Date: 25 Jan 2000 18:38:51 -0000
Message-Id: <C1962B36D9BBD311B0F80060083DFEFB041951@SYS.CenSoft.COM>

:  I'd like
: to see antialiased fonts on top of a background image in Opera...
: This needs full alpha blending (which BTW I'd love to see some
: MMX code for... current implementation is neither fast, nor
: extremely accurate.  And it uses semi-large tables.)

I spent the whole weekend coming up to speed on alpha
blending.  I will have a version for Microwindows out immediately
after 0.87.  We will support alpha blending for 16bpp and 32bpp
directly, as well as producing RGB->palette index conversion
tables for 8bpp systems.  (I have to make the 8bpp work,
since that's the only mode I can get my damn framebuffer
to operate in).  The table sizes for 8bpp will total 64k.  No
tables are required for 16bpp or 32bpp truecolor.

In the beginning I plan to support constant alpha
blending for an entire image, as well as per-pixel
alpha blending with 24bpp images, with the alpha
channel in an additional 8bpp.

:
: I have been thinking of requiring word or dword padding
: on GrArea as well - better safe than sorry.

It would probably be a good idea to require dword padding now.
Then we would have a common format for all monochrome
and color images within Microwindows, and the data could
be used between all routines...


: Hmm, in my world this smells too much like X11 with its
: memory management problems (fragmentation).  We are going
: to use it in an environment where there may be
: *absolutely*  **no** more memory left at some point, which
: means that the nano-X server should ideally allocate and
: touch all the pages it will need when it starts off, and
: never look back WRT memory.
: 
: Would this be feasible today?  How much dynamic allocation
: is there in nano-X?  I was thinking of just doing:
: 

This is a problem.  Microwindows, being message oriented,
doesn't allocate any space for events, it just passes messages.
Nano-X, however, is constantly allocating space for client
connections and event structures.

In addition, with the new clipping code, we will be moving
towards dynamic rectangle allocations.  [Yes, you can use
the old clipping...]

I don't think it will be easy to meet your specific zero-memory
after initialization need...





: Yes... But we will probably not get around this completely
: anyway - ie. doing word aligned memcopy on an 8 bit display
: would restrict your choices on where to put the image...:-)

No - the DWORD padding just ensures that the source
bitmap is aligned for high-speed access across cache lines,
etc.  Images are NOT restricted to multiples of the padding, and
the copy to the destination isn't restricted either.  This is
just a speed issue, not a limitation on size or placement.



: 
: Yes, but *inter* device blitting?  (Blitting from one gfx card
: to another...?)  

That's a horse of a different wheelbase.  Adding plans for 
multiple graphics cards is probably not something
Microwindows will do in the near future.


: 
: If doing GrArea with blit, we need to setup a suitable psd for the
: operation on every call to GrArea, which is kind of not needed.
: One could pre-allocate a memory psd and only update the bits inside
: it that are relevant to the blit in question, but this is kind of
: an unclean situation.  I'd like the device-drivers and the memory
: driver to fiddle with the internals of the psd as much as possible,
: and not the engine code?
: 

I've already accomplished this, but you may not have noticed.
The first 10 words of a SCREENDEVICE are exactly what you're 
looking for.  The difference is that the function pointers are also
included, so that another level of indirection isn't required.
I haven't finished all the work with the function pointers, though.

Regards,

Greg

Subject: RE: More on Image handeling and optimizations
From: "Darran D. Rimron" ####@####.####
Date: 25 Jan 2000 19:22:08 -0000
Message-Id: <NCBBLCEDENCINNMFNPBCMEHAEBAA.darran@rimron.co.uk>

> -----Original Message-----
> : Yes, but *inter* device blitting?  (Blitting from one gfx card
> : to another...?)
> That's a horse of a different wheelbase.  Adding plans for
> multiple graphics cards is probably not something
> Microwindows will do in the near future.

IIRC, can't you set up a multi-head FB with Matrox cards using the
FB-Driver as a module or similar, or is this vapourware or am I
imagining I saw it all those months ago when Frame-Buffers were in 2.1
:)

If it's true, how does the FB(/dev/fb) handle it?

	-Darran

Subject: Re: More on Image handeling and optimizations
From: Kyle Harris ####@####.####
Date: 25 Jan 2000 21:45:49 -0000
Message-Id: <388E16A6.D133152F@nexus-tech.net>

Speaking of all this image handling and optimizations.....

What is the preferred method (i.e., fastest) for putting an image on the
screen? Do I draw to a memory psd first, followed by GdBlit(), or some
other procedure?

Thanks, Kyle.

Subject: Re: More on Image handeling and optimizations
From: Morten Rolland ####@####.####
Date: 26 Jan 2000 17:15:13 -0000
Message-Id: <388F3698.8966EDFD@screenmedia.no>

Hello Greg and all!

Sorry for not being able to work as much on nano-X as
I'd like and what is probably needed.  There are just
too many areas that need tending...  I have supplied
my patches to pre4 that illustrate our needs, and
a possible way to solve them.

Notice that I have implemented a 'driver_gc_t' low
level graphics context that works very well in the
low level Area handling code and the interface to
it.  It helps keep the low level drivers both
elegant and fast (only a pointer gets passed around
inside and into the driver).

> I spent the whole weekend coming up to speed on alpha
> blending.  I will have a version for Microwindows out immediately
> after 0.87.  We will support alpha blending for 16bpp and 32bpp
> directly, as well as producing RGB->palette index conversion
> tables for 8bpp systems.  (I have to make the 8bpp work,
> since that's the only mode I can get my damn framebuffer
> to operate in).  The table sizes for 8bpp will total 64k.  No
> tables are required for 16bpp or 32bpp truecolor.

OK, nice!  I have alpha blending as well, for 16-bits, not
the highest of quality (some quantization noise for the
green channel because lookup of green is split in two
separate lookups plus an addition to save table space.)

Do you unpack, multiply, add and pack for each pixel in
the 16 and 32 bit cases?  How fast is this method?
I rejected this method without even testing it because
I figured it would be too slow...

> In the beginning I plan to support constant alpha
> blending for an entire image, as well as per-pixel
> alpha blending with 24bpp images, with the alpha
> channel in an additional 8bpp.

OK, my code is ugly as it adds another pointer to memory
so the 16-bit image data are separate from an 8-bit
alpha map.  This will require an API change, which I have
not figured out how to do yet.  One possible and
attractive solution is to make new Gd* and Gr*
functions.

> : I have been thinking of requiring word or dword padding
> : on GrArea as well - better safe than sorry.
> 
> It would probably be a good idea to require dword padding now.
> Then we would have a common format for all monochrome
> and color images within Microwindows, and the data could
> be used between all routines...

Yes... Too bad my version of an optimized pixel area function
uses reversed bit order from what is both natural and common,
because of the T1lib output format....  The reason why the
Area function has been extended to support this is I need
an optimized version in the driver, and Area was natural
as I didn't want to intrude too much on Blit.  They should
probably be merged.

> This is a problem.  Microwindows, being message oriented,
> doesn't allocate any space for events, it just passes messages.
> Nano-X, however, is constantly allocating space for client
> connections and event structures.
> 
> In addition, with the new clipping code, we will be moving
> towards dynamic rectangle allocations.  [Yes, you can use
> the old clipping...]
> 
> I don't think it will be easy to meet your specific zero-memory
> after initialization need...

I don't really require *no* allocations after initialization, but
I'd like to be able to touch all the data-pages I *think* will
be needed, so that there is less chance of a sudden memory
shortage unless the server uses more memory than testing
indicated that it would (e.g. map a lot of pages initially
"for keeps" and have them used by malloc later on).
Maybe the page allocation code in Linux takes care of this
problem when there is no swap - I don't know.

We will probably have very few, maybe even zero overlapping
windows, so the clipping code will be predictable, and events
and client connections are behaving well I hope (little
fragmentation, same size reuse etc... And *no* leaks? :-)
We'll expend quite some effort to assure there are no
memory allocation problems and track down those that
exist when bringing Nano-X to production quality.

Back to the server side image storage question; what I'm
afraid of in this respect is the nano-X server suddenly
dying because some seldom used app requests a lot of
server side memory while Opera or another app is busy using all
available memory.  Preparing for all such possibilities is
error prone and wastes memory when not needed (if
pre-allocated).

The way we handle images (mmap into client app), and then transfer
them to the server is better for our use, as a client crash is more
acceptable than a nano-X server crash...
Also; when mmapping the image file, the Linux buffer cache may
reuse the mmapped pages when deemed not needed (the original
contents are read from flash memory), and common design themes can
be shared easily.  This practically eliminates the need for
image data in RAM at the client (which mainly inspired me to
the "extract smaller image from larger" feature of GrArea or
similar).

Just a weird thought; We could mmap the image data
right into the server... Client says: "Map this file" to
server and has pieces of it painted later on... Hmmm.
However, most big images are dynamic, so the gain will
probably be little (images in Opera).

> : Yes... But we will probably not get around this completely
> : anyway - ie. doing word aligned memcopy on an 8 bit display
> : would restrict your choices on where to put the image...:-)
> 
> No - the DWORD padding just ensures that the source
> bitmap is aligned for high-speed access across cache lines,
> etc.  Images are NOT restricted to multiples of the padding, and
> the copy to the destination isn't restricted either.  This is
> just a speed issue, not a limitation on size or placement.

Good - that's what I figured too, without giving it too much
thought.  This is not in my patch, but I agree we should settle
for 4 byte alignment.

> : Yes, but *inter* device blitting?  (Blitting from one gfx card
> : to another...?)
> 
> That's a horse of a different wheelbase.  Adding plans for
> multiple graphics cards is probably not something
> Microwindows will do in the near future.

Yes, but the screen->memory optimization is still an issue, no?
But I agree we leave it for now - I was thinking long term.

> : If doing GrArea with blit, we need to setup a suitable psd for the
> : operation on every call to GrArea, which is kind of not needed.
> : One could pre-allocate a memory psd and only update the bits inside
> : it that are relevant to the blit in question, but this is kind of
> : an unclean situation.  I'd like the device-drivers and the memory
> : driver to fiddle with the internals of the psd as much as possible,
> : and not the engine code?
> 
> I've already accomplished this, but you may not have noticed.
> The first 10 words of a SCREENDEVICE are exactly what you're
> looking for.  The difference is that the function pointers are also
> included, so that another level of indirection isn't required.
> I haven't finished all the work with the function pointers, though.

No, not quite.  At least not in my version - I'd like to pass arguments
to the low level driver like "void *pixels" and "int dstx" for the
low level primitives.  Quickly changing arguments shouldn't
get passed in a psd IMHO.

OK.  Have a look at the patch - you may want to remove the patches to
the config files, as you have them already, or they are specific to my
setup.

I'm not done yet, but it illustrates very well what functionality
we need, and the optimizations I've attempted may prove valuable
input, and I'd love comments!

Best regards,
Morten Rolland

PS: The code should also run under X11 and other modes than 16bpp by
    using the older code in GdArea as a fallback, but this is not
    very well tested...

[Attachment of content type application/octet-stream not shown.]

Subject: Re: More on Image handeling and optimizations
From: "Greg Haerr" ####@####.####
Date: 26 Jan 2000 17:48:24 -0000
Message-Id: <04b301bf6824$18474ae0$15320cd0@gregh>

: Notice that I have implemented a 'driver_gc_t' low
: level graphics context that works very well in the
: low level Area handling code and the interface to
: it.  It helps keep the low level drivers both
: elegant and fast (only a pointer gets passed around
: inside and into the driver).

I'll take a look at your patch and hopefully add
most of it.


: Do you unpack, multiply, add and pack for each pixel in
: the 16 and 32 bit cases?  How fast is this method?
: I rejected this method without even testing it because
: I figured it would be too slow...
: 

Yes, for each R, G, B component, there's an unpack,
subtract, multiply, shift, add and pack:

    dest = (source-dest)*alpha/256+dest;

for each color.  The above is equivalent to the
std alpha blending algorithm:

    dest = src * alpha + (1 - alpha) * dest;

Where alpha varies from 0 to 1.  In implementation,
alpha varies from 0 to 255.
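
A minimal sketch of that per-component sequence for a 16bpp pixel,
assuming an RGB 5-6-5 layout and a constant alpha (both assumptions;
the function names are illustrative and this is not the Microwindows
code).  It divides by 256 instead of shifting, because right-shifting
a negative (source-dest) is implementation-defined in C:

```c
#include <stdint.h>

/* dest = (source - dest) * alpha / 256 + dest, alpha in 0..255.
 * Signed arithmetic keeps (source - dest) correct when it is negative. */
static unsigned blend_channel(unsigned src, unsigned dst, unsigned alpha)
{
    int diff = (int)src - (int)dst;
    return (unsigned)((int)dst + diff * (int)alpha / 256);
}

/* Unpack the three components, blend each, and pack again.
 * Assumes RGB 5-6-5 packing. */
uint16_t blend565(uint16_t src, uint16_t dst, unsigned alpha)
{
    unsigned r = blend_channel((src >> 11) & 0x1F, (dst >> 11) & 0x1F, alpha);
    unsigned g = blend_channel((src >> 5) & 0x3F, (dst >> 5) & 0x3F, alpha);
    unsigned b = blend_channel(src & 0x1F, dst & 0x1F, alpha);
    return (uint16_t)((r << 11) | (g << 5) | b);
}
```

Note that with a divisor of 256 and alpha limited to 255, a fully
"opaque" source does not quite replace the destination; that is one
source of the small inaccuracy such integer blends have.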


: OK, my code is ugly as it adds another pointer to memory
: so the 16-bit image data are separate from an 8-bit
: alpha map.  This will require an API change, which I have
: not figured out how to do yet.  One possible and
: attractive solution is to make new Gd* and Gr*
: functions.

We will need functions for per-pixel blending and constant
blending.  In addition, we might want to have separate
alpha channel blending where the alpha isn't bound
to the source bitmap.  (I guess you need it with 16bpp
images.  Why don't you just switch to 24?)

Regards,

Greg