Subject:
Call for action - source code needed!
From: Greg Haerr ####@####.####
Date: 18 Oct 1999 17:31:08 -0000
Message-Id: <01BF195C.16214CF0.greg@censoft.com>

On Monday, October 18, 1999 10:57 AM, Michael Engel ####@####.#### wrote:
: mwin runs nicely so far, but a little slow ...
: I will produce some new mwin stuff today that you can try.

All,

I'm glad to hear that more of the list is trying out microwindows on their palm pc's and other hardware. It's nice to see this stuff used, and I welcome the comments.

With regard to speed, basically everything comes down to two routines, drawhorzline and bitblt. At their lowest level, for 8bpp and 16bpp, these routines ultimately rely on memcpy, or a wmemcpy. Coding these routines as

    while(--cnt >= 0) *dst++ = *src++;

greatly slows them down, so I call memcpy.

What I am looking for are inline versions of byte, word, and double word memcpy's, for 8bpp, 16bpp and 32bpp. We need inline so that the procedure call overhead is minimized. In addition, the memcpy routines need to check for odd or unaligned data, move it, then switch to double-word moves for the main loop, then end by moving any remaining odd or unaligned data.

I could write all these routines myself, but I'm looking for some __fast__ routines that are known to work... Any pointers or submissions would be appreciated. This will _definitely_ speed up microwindows.

Greg
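The three-phase copy described above (leading unaligned bytes, a double-word main loop, then trailing bytes) might look roughly like the C sketch below. `copy_span` is a hypothetical name, not a Microwindows routine. It aligns only the destination, and it uses a fixed-size memcpy for each 32-bit move, which current compilers typically turn into a plain load and store while avoiding aliasing and alignment problems. Treat it as an illustration under those assumptions, not as a tuned replacement for the library memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst (regions must not overlap). */
static inline void copy_span(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);

    /* Phase 1: byte moves until dst sits on a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = *src++;
        n--;
    }

    /* Phase 2: main loop, one 32-bit move per iteration.
       The fixed-size memcpy calls also cope with an unaligned src. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, src, sizeof w);
        memcpy(dst, &w, sizeof w);
        dst += 4;
        src += 4;
        n -= 4;
    }

    /* Phase 3: the 0 to 3 bytes left over at the end. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

Whether this beats the platform memcpy is something to measure rather than assume; the sketch only shows the head, body, and tail structure being asked for.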
Subject:
Re: Call for action - source code needed!
From: Alan Cox ####@####.####
Date: 18 Oct 1999 17:44:05 -0000
Message-Id: <E11dGfK-0000Cw-00@the-village.bc.nu>

> What I am looking for are inline versions of byte, word, and
> double word memcpy's, for the 8bpp, 16bpp and 32bpp. We need inline

For most platforms you won't beat the glibc memcpy functions. On a single-issue CPU you may want to look at the X macros (but keep a bucket handy) that write these operations as a Duff's device.

In fact, for a 640-pixel-wide 4-bit frame buffer you can quite sanely expand the Duff's device out so you do one pass (80 32-bit ops) of

    movel D0, (A0)+
    movel D0, (A0)+
    ...
    ret

You load D0 with the colour pattern, fix the ends by hand, then jmp to the right point in the unrolled copy loop. You can unroll copies as well as colour sets this way.

You may want to get people to profile the binaries with gprof before you get deep into this - be sure it's not something more fundamentally dumb going on in the clipping code or similar.

Alan
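For readers who have not met it, a Duff's device expresses the "jump into the middle of an unrolled loop" idea in C: a switch picks the entry point and a do/while supplies the remaining full passes. The sketch below fills a run of 32-bit words with a colour pattern. `fill_words` is a hypothetical name, the unroll factor of 8 is arbitrary, and this is neither the X macro code nor the fully unrolled 80-store version described above (640 pixels at 4 bits each is 2560 bits, or 80 32-bit words).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store `pattern` into `count` consecutive 32-bit words. */
static inline void fill_words(uint32_t *dst, uint32_t pattern, size_t count)
{
    size_t passes;

    assert(dst != NULL || count == 0);
    if (count == 0)
        return;

    passes = (count + 7) / 8;   /* trips through the unrolled body */

    /* Enter the body part-way through so the first, partial pass
       performs exactly count % 8 stores; later passes do all 8. */
    switch (count % 8) {
    case 0: do { *dst++ = pattern;
    case 7:      *dst++ = pattern;
    case 6:      *dst++ = pattern;
    case 5:      *dst++ = pattern;
    case 4:      *dst++ = pattern;
    case 3:      *dst++ = pattern;
    case 2:      *dst++ = pattern;
    case 1:      *dst++ = pattern;
            } while (--passes > 0);
    }
}
```

The case labels fall through on purpose. A modern optimising compiler may well unroll a plain loop just as effectively, so, as the message says, profile before committing to this.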
Subject:
RE: Call for action - source code needed!
From: Greg Haerr ####@####.####
Date: 18 Oct 1999 17:52:55 -0000
Message-Id: <01BF195F.0E0FB080.greg@censoft.com>

: You may want to get people to profile the binaries with gprof before you get
: deep into this - be sure its not something more fundamentally dumb going on
: in the clipping code or similar.

Alan - thanks for the quick response. The unrolled copy loop with a jump into the middle sounds very interesting; I'll look at glibc also and perhaps just inline that stuff.

Writing the bitblt turned out to be quite hard to do right, and it's still not totally right, especially because of some clipping issues. So, I have a quick routine (actually close to the one you originally modified of Dave's) that determines whether the entire bitblt area is completely unobscured, or not.

In the completely visible case, I can see major speed differences depending on the implementation of memcpy. In the partially obscured case, I have to resort to blitting by reading and writing every pixel, which is _slow as hell_. The answer to the latter is having the engine chop each portion of the bitblit rectangle into completely visible regions, and recursively calling bitblit on each such rectangle, but that was a bit much for this last weekend!!

Greg
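The "completely unobscured" test and the fast path it enables could be sketched in C as below. Everything here is an assumption made for illustration: the `Rect` type, the idea that obscuring windows are available as a plain array of rectangles, the function names, and the 8bpp layout with a byte pitch per row. The actual Microwindows routine is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rectangle: top-left corner plus width and height in pixels. */
typedef struct { int x, y, w, h; } Rect;

/* Nonzero when the two rectangles share at least one pixel. */
static int rects_overlap(Rect a, Rect b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

/* Nonzero when no obscuring rectangle touches the blit area,
   so the fast row-copy path is safe for the whole area. */
static int area_unobscured(Rect area, const Rect *obs, size_t nobs)
{
    size_t i;
    for (i = 0; i < nobs; i++)
        if (rects_overlap(area, obs[i]))
            return 0;
    return 1;
}

/* Fast path for 8bpp: one memcpy per scanline between two separate
   buffers. Pitches are the bytes per row of each buffer. */
static void blit_rows_8bpp(unsigned char *dst, size_t dst_pitch,
                           const unsigned char *src, size_t src_pitch,
                           size_t w, size_t h)
{
    size_t row;
    assert(dst != NULL && src != NULL);
    for (row = 0; row < h; row++)
        memcpy(dst + row * dst_pitch, src + row * src_pitch, w);
}
```

A caller would use `blit_rows_8bpp` when `area_unobscured` returns nonzero and fall back to the per-pixel path (or to splitting the area into visible rectangles) otherwise. Copies within a single framebuffer whose source and destination overlap would need memmove and careful row ordering, which this sketch leaves out.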
Subject:
Re: Call for action - source code needed!
From: Alan Cox ####@####.####
Date: 18 Oct 1999 17:56:09 -0000
Message-Id: <E11dGrP-0000Eu-00@the-village.bc.nu>

> In the completely visible case, I can see major speed differences depending on
> the implementation of memcpy. In the partially obscured case, I have
> to resort to bit blitting by reading and writing every pixel, which is _slow as hell_.
> The answer to the latter is having the engine chop each portion of the bitblit
> rectangle into completely visible regions, and recursively calling bitblit on
> that rectangle, but that was a bit much for this last weekend!!

You may want to look at X11 here. X has a nice algorithm that builds a rectangle list from a set of clipping data and tends to output lots of wide rectangles.

Alan
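To give a feel for what "a rectangle list that favours wide rectangles" can mean, here is a small C sketch that cuts a blit area into horizontal bands at every obscurer edge and emits the uncovered spans of each band. It is not the X11 region code and does not merge vertically adjacent spans the way a production implementation would. `Box`, `visible_bands`, and the `MAX_OBS` limit are all hypothetical names chosen for the example.

```c
#include <stdlib.h>

#define MAX_OBS 16   /* arbitrary limit for this sketch */

/* Hypothetical box with half-open extents: [x1, x2) by [y1, y2). */
typedef struct { int x1, y1, x2, y2; } Box;

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Writes the parts of `area` not covered by obs[0..nobs-1] into out[].
   Returns the number of boxes written, or -1 if a limit is exceeded. */
static int visible_bands(Box area, const Box *obs, int nobs, Box *out, int cap)
{
    int ys[2 * MAX_OBS + 2];
    int ny = 0, n = 0, i, k;

    if (nobs > MAX_OBS)
        return -1;

    /* Collect every horizontal edge that falls inside the area. */
    ys[ny++] = area.y1;
    ys[ny++] = area.y2;
    for (i = 0; i < nobs; i++) {
        if (obs[i].y1 > area.y1 && obs[i].y1 < area.y2) ys[ny++] = obs[i].y1;
        if (obs[i].y2 > area.y1 && obs[i].y2 < area.y2) ys[ny++] = obs[i].y2;
    }
    qsort(ys, (size_t)ny, sizeof ys[0], cmp_int);

    /* Within a band, an obscurer either spans its full height or misses it. */
    for (k = 0; k + 1 < ny; k++) {
        int top = ys[k], bot = ys[k + 1], x = area.x1;
        if (top == bot)
            continue;                       /* duplicate edge */
        while (x < area.x2) {
            int next = area.x2, covered = 0;
            for (i = 0; i < nobs; i++) {
                if (obs[i].y1 > top || obs[i].y2 < bot)
                    continue;               /* does not span this band */
                if (obs[i].x1 <= x && x < obs[i].x2) {
                    x = obs[i].x2;          /* skip the covered stretch */
                    covered = 1;
                    break;
                }
                if (obs[i].x1 > x && obs[i].x1 < next)
                    next = obs[i].x1;       /* nearest obscurer to the right */
            }
            if (covered)
                continue;
            if (n == cap)
                return -1;
            out[n++] = (Box){ x, top, next, bot };
            x = next;
        }
    }
    return n;
}
```

Each box returned is fully visible, so it can be handed to the fast memcpy-based blit. Bands that no obscurer touches come out as a single full-width rectangle, which is where the long copies pay off.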
Subject:
Re: Call for action - source code needed!
From: "Frank W. Miller" ####@####.####
Date: 18 Oct 1999 18:03:51 -0000
Message-Id: <199910181748.NAA22284@macalpine.cornfed.com>

> For most platforms you won't beat the glibc memcpy functions. On a single

The *BSD kernel bcopy routines are quite fast as well, also assembly and unencumbered.

Later,
FM

--
Frank W. Miller
Cornfed Systems Inc
www.cornfed.com