nanogui: More on Image handeling and optimizations

Previous by date:	24 Jan 2000 09:03:23 -0000 Re: pre4 makefile problems with Linux svga, Greg Haerr
Next by date:	24 Jan 2000 09:03:23 -0000 Re: pre4 makefile problems with Linux svga, Chris Johns
Previous in thread:	24 Jan 2000 09:03:23 -0000 Re: More on Image handeling and optimizations, Bradley D. LaRonde
Next in thread:	24 Jan 2000 09:03:23 -0000 Re: More on Image handeling and optimizations, Greg Haerr

Subject: Re: More on Image handeling and optimizations
From: Morten Rolland ####@####.####
Date: 24 Jan 2000 09:03:23 -0000
Message-Id: <388C2054.FBFE0EE8@screenmedia.no>

Greg Haerr wrote:
> 
> : Please note that currently, the GdArea/GrArea functions are
> : very important to us and Opera, and may be even more so if
> : Vidar decides to use GdArea in his efforts to beat X11 in
> : the font rendering game...
> 
> According to Vidar's last email, he already is using
> GdArea for *fantastically cool anti-aliased better-than-X11*
> font rendering.

Yes he is - but against a uniform colored background.  I'd like
to see antialiased fonts on top of a background image in Opera...
This needs full alpha blending (which BTW I'd love to see some
MMX code for... current implementation is neither fast, nor
extremely accurate.  And it uses semi-large tables.)

>   Which means I'll make GrArea fast and
> : flexible, or use other means like Blit if this is considered
> : better.  I'm worried the Blit function may end up being too
> : flexible, large, and hard to optimize, though.
> 
> A couple of words about blit:  at it's core, all a SRCCOPY
> blit is supposed to do is copy a rectangle of
> memory from one location to another, FAST.  Ideally,
> the src and dst pixel packings are the same, and
> no conversion occurs.  Then, the bigger issue
> is whether the src and dst image "line length" values
> are enforced for all images.  For instance, if the
> framebuffer video screen is "word padded", while
> user images are allowed to be submitted as only
> "byte padded" (no padding), then an optimized
> word-by-word memory copy cannot be used for
> highest speed.  In Windows, _all_ images are
> required to be DWORD padded, so that the fastest
> dword-by-dword memory copy routines can always be
> used.  If we used this convention, that would mean
> that all GrArea/GdArea images would have to be
> DWORD padded or they couldn't be used.
> 
> I'd like comments on the above...  Currently,
> Microwindows requires WORD padding
> on bitmap images, and no padding on GrArea
> images.

I have been thinking of requiring word or dword padding
on GrArea as well - better safe than sorry.
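
To make the padding question concrete, here is a minimal sketch of how the bytes-per-line value changes with the padding rule. The function name `padded_line_bytes` is hypothetical and not part of Microwindows; WORD and DWORD are assumed to mean 2 and 4 bytes, as in the Windows convention mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per scanline when every line is padded to a multiple of `align`
 * bytes.  `align` must be a power of two: 1 = byte padded (no padding),
 * 2 = WORD padded, 4 = DWORD padded.  Hypothetical helper, for
 * illustration only. */
static size_t padded_line_bytes(unsigned width, unsigned bpp, size_t align)
{
    size_t raw;

    assert(align != 0 && (align & (align - 1)) == 0);

    raw = ((size_t)width * bpp + 7) / 8;      /* round bits up to whole bytes */
    return (raw + align - 1) & ~(align - 1);  /* round bytes up to `align` */
}
```

With a 5 pixel wide 8bpp image this gives 5 bytes per line unpadded, 6 with WORD padding and 8 with DWORD padding, so an image stored under one rule cannot be handed unchanged to a copy routine that assumes another.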

> : 1) Tiling of images.  Painting a non-uniform background
> :    can be done with tiles, which would reduce the client
> :    to server overhead a lot (e.g. only transfer the
> :    single smaller image).  I envision this to be a feature
> :    of GrArea.
> 
> I would suggest that, rather than making this a special
> feature of GrArea, that the concept of server-side
> images, with associated IDs, be introduced.  Then a
> special tile function could be used with that ID.

Hmm, in my world this smells too much like X11 with its
memory management problems (fragmentation).  We are going
to use it in an environment where there may be
*absolutely*  **no** more memory left at some point, which
means that the nano-X server should ideally allocate and
touch all the pages it will need when it starts off, and
never look back WRT memory.

Would this be feasible today?  How much dynamic allocation
is there in nano-X?  I was thinking of just doing:

    x = malloc(NANOX_MAX_MEMUSE);
    optimizer_guard = malloc(1);
    memset(x,0,NANOX_MAX_MEMUSE);
    free(x);

Very early on in 'main' to reserve the space needed.
A wrapper for malloc could monitor the memory usage
and warn when a brk has to be performed during
profiling.
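
The malloc wrapper could look roughly like the sketch below. All names (`nx_malloc`, `nx_free`, `NX_BUDGET`) are hypothetical, and the wrapper does not detect a brk directly; it only tracks bytes handed out and warns when the high-water mark passes the size that was reserved up front, which is an assumption about how the profiling would be done.

```c
#include <stdio.h>
#include <stdlib.h>

#define NX_BUDGET (64u * 1024u)  /* hypothetical stand-in for NANOX_MAX_MEMUSE */

/* Header stored in front of each block so nx_free() knows its size.
 * The union keeps the pointer returned to the caller suitably aligned. */
typedef union nx_hdr {
    size_t size;
    long double align_ld;
    void *align_p;
} nx_hdr;

static size_t nx_in_use;  /* bytes currently handed out */
static size_t nx_peak;    /* high-water mark seen so far */

static void *nx_malloc(size_t n)
{
    nx_hdr *h;

    if (n > (size_t)-1 - sizeof *h)
        return NULL;              /* size would overflow */
    h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;

    h->size = n;
    nx_in_use += n;
    if (nx_in_use > nx_peak) {
        nx_peak = nx_in_use;
        if (nx_peak > NX_BUDGET)
            fprintf(stderr, "nx_malloc: %lu bytes in use, budget is %lu\n",
                    (unsigned long)nx_peak, (unsigned long)NX_BUDGET);
    }
    return h + 1;                 /* caller sees the memory after the header */
}

static void nx_free(void *p)
{
    nx_hdr *h;

    if (p == NULL)
        return;
    h = (nx_hdr *)p - 1;          /* step back to the header */
    nx_in_use -= h->size;
    free(h);
}
```

Running the server under a realistic load and reading `nx_peak` afterwards would give a first estimate for the size to reserve at startup.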

> : 2) It should be possible to do sub-imageing on the client
> :    side by GrArea without temporary storage, e.g. the
> :    application wants to take a small piece out of a larger
> :    image and paint it on the screen.  This could be relevant
> :    when repainting only parts of a large image for example.
> :    By having the client side extract only the pixles needed,
> :    the transfer is more efficient.
> 
> I'm not quite sure what you're looking for here.  Are you
> talking about wanting to just paint a sub-rectangle
> of client side image bits?  Just modify the x,y,w,h
> of the original GrArea.  [Note this has big problems
> if we move to high-speed DWORD padding of images]

Yes... But we will probably not get around this completely
anyway - i.e. doing word aligned memory copies on an 8 bit display
would restrict your choices on where to put the image...:-)

With a client side sub-image extraction, the image fed to
the nano-X server can be padded to be properly aligned.
You can't just change x,y,w,h of the current GrArea, as the
w and h define the memory layout of the image.  The
psd->DrawArea I proposed had extra information that held the
underlying image size, and the subimage to paint (for
efficient clipping).  The nano-X client thing above is
basically the same thing, but with the added benefit of not
transferring more image data than needed.
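
A minimal sketch of the client side extraction, assuming whole-byte pixels (8, 16, 24 or 32bpp) and DWORD padded output lines. The function name and signature are hypothetical and not an existing nano-X call; the point is that the source line length (`src_stride`) is kept separate from the width of the rectangle being cut out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the w x h rectangle at (x, y) out of a larger image into `dst`,
 * padding every destination line to a 4 byte boundary.  `dst` must hold
 * at least h * (returned stride) bytes.  Returns the destination stride.
 * Hypothetical helper, for illustration only. */
static size_t extract_subimage(const unsigned char *src, size_t src_stride,
                               size_t bytes_pp,
                               size_t x, size_t y, size_t w, size_t h,
                               unsigned char *dst)
{
    size_t row_bytes = w * bytes_pp;
    size_t dst_stride = (row_bytes + 3) & ~(size_t)3;
    size_t row;

    /* the rectangle must lie inside one source line */
    assert((x + w) * bytes_pp <= src_stride);

    for (row = 0; row < h; row++) {
        const unsigned char *s = src + (y + row) * src_stride + x * bytes_pp;
        unsigned char *d = dst + row * dst_stride;

        memcpy(d, s, row_bytes);                           /* pixel data  */
        memset(d + row_bytes, 0, dst_stride - row_bytes);  /* pad to DWORD */
    }
    return dst_stride;
}
```

Only `h * dst_stride` bytes then need to go to the server, instead of the whole underlying image.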

> : 3) Alpha blending.  Yeah.  It's definite, I've gone mad.
> :    Not something for the faint of heart, or 286 projects,
> :    but it would make Nano-X rock as a high quality
> :    environment.
> 
> I would _love_ to do alpha blending.  Both MAC OS X
> and Windows 2000 are supporting it.  I'm definitely
> interested in supporting it.  Actually, after checking out
> screen shots for both the above, I decided I was
> going to write it!

I have some code doing this now, but it lacks finesse like
MMX and uses some memory.  It also has an accuracy problem
I'm not sure will be significant.
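
For comparison, here is a minimal sketch of a per-channel blend in plain C that uses neither tables nor a division. This is not the code referred to above and it is not MMX; it relies on the well known shift-and-add rounding trick for dividing by 255, and the function names are hypothetical.

```c
#include <stdint.h>

/* round(a * b / 255) for a, b in 0..255, using shifts instead of a
 * division or a lookup table. */
static uint8_t mul_div255(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 128u;

    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Blend one 8 bit channel: alpha = 255 gives src, alpha = 0 gives dst.
 * The two terms are at most alpha and 255 - alpha, so the sum fits in
 * 8 bits. */
static uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    return (uint8_t)(mul_div255(src, alpha) +
                     mul_div255(dst, (uint8_t)(255 - alpha)));
}
```

Each product is rounded separately, so the blended value can differ by one step from a blend that is rounded once at the end; whether that matters visually is the kind of accuracy question raised above.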

> : I have a concern for the Blit function:
> :
> : Is it future-proof to require the destination 'psd' to do the
> : operation?  What if the source 'psd' is better suited for the
> : job?  I'm probably thinking device-device bliting where none
> : of the devices are memory, which is probably not supported yet,
> 
> We already support device-device blitting.  I currently use
> screen-to-screen blitting to implement the scrolling for the terminal
> emulator demos.

Yes, but *inter* device blitting?  (Blitting from one gfx card
to another...?)  A simple example to illustrate the problem is
when blitting from screen to memory.  When doing this, the
memory-psd gets called to do the job, but it can not know that
there is an accelerated function waiting to be used in the
screen driver to do this in hardware...

> : I have extended the existing code with a psd->DrawArea function,
> : with a couple of emulation functions, and it seems to work like
> : a charm.  Should I continue this or try to integrate with Blit?
> : Changing the low level part to Blit later on should be easy, but
> : we may better experiment and figure out what is needed when they
> : are separate. Comments?
> 
> I'd like to see your code.  But I'd also like to see the GdArea
> code using blit, since we already have written (and now debugged)
> 1, 2, 4, 8, 16 and 32bpp blit drivers. [not all are fast].

If doing GrArea with blit, we need to set up a suitable psd for the
operation on every call to GrArea, which is overhead we don't really need.
One could pre-allocate a memory psd and only update the bits inside
it that are relevant to the blit in question, but this is kind of
an unclean situation.  I'd like the device-drivers and the memory
driver to fiddle with the internals of the psd as much as possible,
and not the engine code?

A psd->DrawArea that needs little or no extra overhead may be the
way to go here, and in order to reduce the number of (possibly unused)
arguments passed to the low level drivers, we could:

Define a "low-level-GC" structure like:

struct driver_gc {
    int x, y, w, h;
    void *pixels, *misc;
    int srcw, srch, srcx, srcy;
    PIXELVAL color;
};

And define *strictly* which arguments are needed by which functions
and operations carried out by Blit/Area etc. that use this struct.

This way, when the Area or Blit low-level driver is called, only
the parameters actually needed have to be filled into the struct,
and a pointer to this struct is passed to the low level driver.

I realize that indexing a supplied struct may be slower than
reading off the stack (ties up one more register), but call setup
would be faster and cleaner (not a whole bunch of zero arguments),
and those that need initialization are initialized by name, i.e.:

  hwgc.x = x;
  hwgc.y = y;
  hwgc.color = c;
  psd->Blit(psd,BLIT_DRAW_POINT,&hwgc);

I'm not suggesting to draw points this way, but you get the idea.
I think this looks clean, and one very important last point:
When extending the functionality of Blit, Area or whatever with
another argument, we don't have to update all the existing calls
to Blit/Area/... that don't need the extra (zero)
argument (I have experience in this from improving the Area
function with gradually more functionality...)
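
A minimal sketch of how a driver entry point taking such a struct could look. Everything here is an assumption made for illustration: the operation codes, the toy `screen_device` layout with one PIXELVAL per pixel, and the `PIXELVAL` typedef are stand-ins, not the real Microwindows definitions.

```c
#include <stddef.h>

typedef unsigned long PIXELVAL;  /* assumption: stand-in for the real typedef */

enum { BLIT_DRAW_POINT, BLIT_FILL_RECT };  /* hypothetical operation codes */

struct driver_gc {
    int x, y, w, h;
    void *pixels, *misc;
    int srcw, srch, srcx, srcy;
    PIXELVAL color;
};

struct screen_device;
typedef struct screen_device *PSD;

struct screen_device {            /* toy device, not the real psd layout */
    int xres, yres;
    PIXELVAL *addr;               /* framebuffer, one PIXELVAL per pixel */
    int (*Blit)(PSD psd, int op, const struct driver_gc *gc);
};

/* Toy memory driver.  Each case reads only the fields documented for that
 * operation; the caller may leave all other fields unset. */
static int mem_blit(PSD psd, int op, const struct driver_gc *gc)
{
    int i, j;

    switch (op) {
    case BLIT_DRAW_POINT:         /* needs: x, y, color */
        if (gc->x < 0 || gc->y < 0 || gc->x >= psd->xres || gc->y >= psd->yres)
            return -1;
        psd->addr[gc->y * psd->xres + gc->x] = gc->color;
        return 0;

    case BLIT_FILL_RECT:          /* needs: x, y, w, h, color */
        if (gc->x < 0 || gc->y < 0 || gc->w < 0 || gc->h < 0 ||
            gc->x + gc->w > psd->xres || gc->y + gc->h > psd->yres)
            return -1;
        for (j = 0; j < gc->h; j++)
            for (i = 0; i < gc->w; i++)
                psd->addr[(gc->y + j) * psd->xres + gc->x + i] = gc->color;
        return 0;
    }
    return -1;                    /* unknown operation */
}
```

Adding a new field to the struct for a new operation leaves both cases above, and every caller that uses them, untouched.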

Comments?

Bye,
- Morten
