nanogui: Thread: Speed Issues on a slow CPU


Subject: Speed Issues on a slow CPU
From: Amadeus ####@####.####
Date: 3 Oct 2007 10:15:24 +0100
Message-Id: <200710031115.03287.amadeus@iksw-muees.de>

Hello,

this is my first post to this list. I am the guy who wants to get PIXIL
running on the Nintendo DS. This effort is hosted at www.dslinux.org.

The Nintendo DS is a dual-core machine: an ARM7 CPU @ 33 MHz for
sound, I/O and background tasks, and an ARM946E-S @ 66 MHz for the main
system.

There are several limitations and shortcomings on this system...
First of all, there is no MMU, only an MPU, so it runs uClinux with
kernel 2.6.14.

Second, there are only 4 MByte of internal memory. It is possible to
add another 32 MByte externally, but only over a 16-bit bus with only
ONE(!) write strobe.

It cost me several months and a lot of gcc hacking to reclaim this
memory as general-purpose memory. So now we have a 36 MByte uClinux
system, with 32 MByte of this memory being a bit slower than usual.

The screen is 256 x 192 pixels in RGB 555.
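
As a rough illustration of what this display format implies, the sketch
below packs a colour into a 15-bit pixel and computes the size of one
full frame. The names pack_rgb555 and frame_bytes are made up for this
example, and the channel order (red in the low five bits) is an
assumption that should be checked against the actual framebuffer driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Screen geometry as stated in the post. */
enum { SCREEN_W = 256, SCREEN_H = 192 };

/* Pack 8-bit channels into a 15-bit pixel by dropping the low 3 bits of
 * each channel. ASSUMPTION: red occupies bits 0-4, green bits 5-9 and
 * blue bits 10-14; swap the shifts if the hardware uses another order. */
static inline uint16_t pack_rgb555(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)((r >> 3) | ((g >> 3) << 5) | ((b >> 3) << 10));
}

/* Bytes needed for one full frame at 16 bits per pixel. */
static inline size_t frame_bytes(void)
{
    return (size_t)SCREEN_W * SCREEN_H * sizeof(uint16_t);
}
```

At 2 bytes per pixel, one frame is 256 x 192 x 2 = 98304 bytes
(96 KByte), so every full-screen redraw moves that much data.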

There are no shared libraries and no dynamic loader on this system. I
have had some issues with PIXIL because of this, but no big problems.

I have nano-X and PIXIL up and running, but I am facing serious speed 
issues. The reaction to a click on the touchscreen is slow, and the 
calculator needs 1-2 seconds to display.

What I have done so far:
- define NDEBUG in nano-X drivers.
- add assembler code for horizontal and vertical lines in the driver.
- implement shared memory support.
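
A plain C version of the line primitives is handy as a reference when
checking hand-written assembler. The following is a minimal sketch, not
the actual nano-X driver code; the function names and the pitch
parameter (pixels per scanline) are assumptions for illustration, and no
clipping is done.

```c
#include <stddef.h>
#include <stdint.h>

/* Draw `count` pixels to the right, starting at (x, y).
 * `fb` points at the first pixel of the framebuffer and `pitch` is the
 * number of pixels per scanline. The caller must clip beforehand. */
static void draw_hline(uint16_t *fb, int pitch, int x, int y,
                       int count, uint16_t color)
{
    uint16_t *p = fb + (ptrdiff_t)y * pitch + x;

    while (count-- > 0)
        *p++ = color;           /* consecutive addresses */
}

/* Draw `count` pixels downwards, starting at (x, y). */
static void draw_vline(uint16_t *fb, int pitch, int x, int y,
                       int count, uint16_t color)
{
    uint16_t *p = fb + (ptrdiff_t)y * pitch + x;

    while (count-- > 0) {
        *p = color;
        p += pitch;             /* one scanline further down */
    }
}
```

An assembler routine can then be compared against these functions pixel
for pixel on a small test buffer.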

There were small improvements in speed, but nothing worth mentioning.
The load monitor applet is displaying a constant load of about 40% CPU 
usage (I think because of the frequent screen updates in the load 
monitor window).

So, can someone with more experience in nano-X and PIXIL explain which 
are the most performance-critical parts of the system and how to 
improve them?

regards
Amadeus
 
-- 
We're back to the times when men were men 
and wrote their own device drivers.

(Linus Torvalds)
Subject: Re: [nanogui] Speed Issues on a slow CPU
From: Alan Cox ####@####.####
Date: 3 Oct 2007 13:52:38 +0100
Message-Id: <20071003135636.16003f36@the-village.bc.nu>

> It cost me several months and a lot of gcc hacking to reclaim this
> memory as general-purpose memory. So now we have a 36 MByte uClinux
> system, with 32 MByte of this memory being a bit slower than usual.

I couldn't find info on this to see what its performance hit was or if
you have put a small graphics accelerator library on the ARM7

> I have nano-X and PIXIL up and running, but I am facing serious speed 
> issues. The reaction to a click on the touchscreen is slow, and the 
> calculator needs 1-2 seconds to display.

That's slower than on an original IBM XT, so that's bad.

> What I have done so far:
> - define NDEBUG in nano-X drivers.
> - add assembler code for horizontal and vertical lines in the driver.
> - implement shared memory support.

Do you have gprof running on the system yet? Embedded systems can have
such strange bottlenecks that gprof can reveal a lot, and you only need
the profiling side on the DS. You can do the analysis with cross tools
on a PC.
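
Until gprof is working on the target, a crude manual timer around a
suspect routine can give a first hint. This is a different and much
coarser technique than gprof. The sketch below uses only the standard C
clock() function; time_calls_ms and bump_counter are hypothetical names,
and the resolution of clock() on the DS is not known here, so many
iterations may be needed to get a meaningful figure.

```c
#include <time.h>

/* Call fn(arg) `iterations` times and return the CPU time used, in
 * milliseconds, as reported by clock(). */
static double time_calls_ms(void (*fn)(void *), void *arg, int iterations)
{
    clock_t start = clock();

    for (int i = 0; i < iterations; i++)
        fn(arg);

    clock_t end = clock();
    return (double)(end - start) * 1000.0 / CLOCKS_PER_SEC;
}

/* Hypothetical stand-in for the routine under test: it only counts how
 * often it was called. Replace it with, e.g., a redraw function. */
static void bump_counter(void *arg)
{
    ++*(int *)arg;
}
```

For gprof itself, GCC normally enables the instrumentation when the
program is compiled and linked with -pg; the program then writes
gmon.out on exit, which can be copied to the PC for analysis. Whether
this works unchanged on uClinux is not confirmed here.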


There are a couple of oddities I noted on the web site too btw:

"No. Because the NDS has no MMU, DSLinux has no virtual memory, so it
cannot swap at all."

That's not totally true - you can swap entire apps to/from secondary
storage if you have any kind of segmentation (e.g. FCSE on some ARM
cores, although the granularity is a bit coarse) and/or
position-independent (PI) code. You've also presumably got protection
ranges?

BTW on

"Why doesn't DSLinux support reading from or writing to a CF"

if you've got specs for the CF interface and a tester, that's probably
easy to fix now.
Subject: Re: [nanogui] Speed Issues on a slow CPU
From: Amadeus ####@####.####
Date: 3 Oct 2007 19:21:35 +0100
Message-Id: <200710032020.49684.amadeus@iksw-muees.de>

Alan,

glad to hear from you!

On Wednesday, 3 October 2007, Alan Cox wrote:
> > system, with 32 MByte of this memory being a bit slower than usual.
>
> I couldn't find info on this to see what its performance hit was

The speed of a burst read is 120ns for 16 bit. Not much... 
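
To put that figure in perspective, here is a small worked calculation.
It assumes that every 16-bit access to the external memory costs 120 ns,
which is one reading of the number above and not a measured value;
transfer_ns is a made-up helper name.

```c
#include <stdint.h>

/* Time in nanoseconds to move `n_bytes` over a bus that transfers
 * `bus_bytes` per access, with each access taking `ns_per_access`.
 * ASSUMPTION: a constant cost per access, no caching, no wait-state
 * variation. `bus_bytes` must be non-zero. */
static uint64_t transfer_ns(uint64_t n_bytes, unsigned bus_bytes,
                            unsigned ns_per_access)
{
    uint64_t accesses = (n_bytes + bus_bytes - 1) / bus_bytes; /* round up */
    return accesses * ns_per_access;
}
```

Under that assumption, reading one 256 x 192 x 16-bit frame (98304
bytes) takes 49152 accesses x 120 ns = 5898240 ns, i.e. about 5.9 ms,
which corresponds to roughly 16.7 MByte/s. At 66 MHz, 120 ns is about 8
CPU clock cycles per 16-bit access.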

I have not looked into running apps in Thumb mode.

> or 
> if you have put a small graphics accelerator library on the ARM7
No. The video memory is exported as a framebuffer to the ARM9 running 
nano-X in 16bit RGB mode.

> Do you have gprof running on the system yet? Embedded systems can
> have such strange bottlenecks that gprof can reveal a lot, and you
> only need the profiling side on the DS. You can do the analysis with
> cross tools on a PC.

I will look into gprof.

> "No. Because the NDS has no MMU, DSLinux has no virtual memory, so it
> cannot swap at all."
>
> That's not totally true - you can swap entire apps to/from secondary
> storage if you have any kind of segmentation (e.g. FCSE on some ARM
> cores, although the granularity is a bit coarse) and/or PI code.

Hmm.. swapping entire apps may be possible. The current state is that 
the kernel and a minimal userland (busybox) are occupying 2 MBytes of 
the internal RAM (XIP), and the other 2 MBytes are free for 
applications.

> You've also presumably got protection ranges ?
Yes. We use them for access control to the special memory regions of the 
DS.

> "Why doesn't DSLinux support reading from or writing to a CF"
>
> if you've got specs for the CF interface and a tester, that's probably
> easy to fix now.

Where have you found that? It's outdated. With the incorporation of the 
DLDI interface (http://dldi.drunkencoders.com) we have access to most 
SD/CF based hardware on the DS.

There is one SERIOUS problem in this area that I have not found a
solution for: as soon as FAT16 with a 32 KByte cluster size is used
(needed for the common 2 GByte SD cards), DSLinux has problems handling
the cards. There are data aborts during directory traversal. I have not
heard of any other embedded system having this problem, and it looks
rather strange to me.
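
The reason 2 GByte cards end up with 32 KByte clusters follows from the
FAT16 cluster limit: a FAT16 volume can hold at most 65524 clusters. A
minimal sketch of the arithmetic, ignoring the space taken by reserved
sectors, the FATs and the root directory (so real formatters may choose
slightly differently); fat16_min_cluster_bytes is a made-up name.

```c
#include <stdint.h>

/* A volume with 65525 or more clusters is no longer FAT16. */
enum { FAT16_MAX_CLUSTERS = 65524 };

/* Smallest power-of-two cluster size in bytes (512 .. 32768) that keeps
 * the cluster count within the FAT16 limit for a data area of
 * `data_bytes`. Returns 0 if even 32 KByte clusters are not enough.
 * SIMPLIFICATION: the whole card is treated as data area. */
static uint32_t fat16_min_cluster_bytes(uint64_t data_bytes)
{
    for (uint32_t size = 512; size <= 32768; size *= 2) {
        if (data_bytes / size <= FAT16_MAX_CLUSTERS)
            return size;
    }
    return 0;
}
```

For a card of 2,000,000,000 bytes this gives 32768: with 16 KByte
clusters there would be about 122,000 clusters, which is too many, while
32 KByte clusters give about 61,000. With 512-byte sectors, one such
cluster spans 64 sectors.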

regards
Amadeus

Subject: Re: [nanogui] Speed Issues on a slow CPU
From: Alan Cox ####@####.####
Date: 3 Oct 2007 19:31:14 +0100
Message-Id: <20071003193558.55758fc7@the-village.bc.nu>

> The speed of a burst read is 120ns for 16 bit. Not much...
> I have not looked into running apps in Thumb mode.

That may help, also putting the blitter functions into assembler and
using the ability to lock them into cache. This is where stuff like gprof
timing can reveal the true hotspots.
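
Before moving a blitter to assembler, it helps to have a simple C
version to measure and to compare results against. A minimal sketch for
16-bit pixels, assuming the source and destination do not overlap and
that clipping has already been done; blit_rgb555 and its parameters are
illustrative names, not nano-X API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a w x h block of 16-bit pixels from `src` to `dst`.
 * `dst_pitch` and `src_pitch` are given in pixels per scanline.
 * ASSUMPTION: the two regions do not overlap. */
static void blit_rgb555(uint16_t *dst, int dst_pitch,
                        const uint16_t *src, int src_pitch,
                        int w, int h)
{
    for (int row = 0; row < h; row++) {
        memcpy(dst, src, (size_t)w * sizeof(uint16_t)); /* one scanline */
        dst += dst_pitch;
        src += src_pitch;
    }
}
```

Copying whole rows at a time leaves the inner loop to memcpy, so the
profile shows directly whether the row copy or the per-row overhead
dominates.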

> Hmm.. swapping entire apps may be possible. The current state is that 
> the kernel and a minimal userland (busybox) are occupying 2 MBytes of 
> the internal RAM (XIP), and the other 2 MBytes are free for 
> applications.

That's pretty tight. I'd assumed you were able to use the full 32MB as
well. 

> There is one SERIOUS problem in this area that I have not found a
> solution for: as soon as FAT16 with a 32 KByte cluster size is used
> (needed for the common 2 GByte SD cards), DSLinux has problems handling
> the cards. There are data aborts during directory traversal. I have not
> heard of any other embedded system having this problem, and it looks
> rather strange to me.

I've not seen similar reports at all, but I don't know how many people
are using FAT16 on such devices on a PC in Linux.

Alan
Subject: Re: [nanogui] Speed Issues on a slow CPU
From: Amadeus ####@####.####
Date: 4 Oct 2007 18:20:49 +0100
Message-Id: <200710041920.42467.amadeus@iksw-muees.de>

Hello Alan,

On Wednesday, 3 October 2007, Alan Cox wrote:
> > The speed of a burst read is 120ns for 16 bit. Not much...
> > I have not looked into running apps in Thumb mode.
>
> That may help, also putting the blitter functions into assembler and
> using the ability to lock them into cache. This is where stuff like
> gprof timing can reveal the true hotspots.

I have put the whole nano-X server into internal RAM. This has helped,
but the speed is still not acceptable. I will see if I can blit in
assembler...

> That's pretty tight. I'd assumed you were able to use the full 32MB as
> well.

Oops.. misunderstanding. I am able to use the full 32 MB as well.

> > There is one SERIOUS problem in this area that I have not found a
> > solution for: as soon as FAT16 with a 32 KByte cluster size is used
> > (needed for the common 2 GByte SD cards), DSLinux has problems
> > handling the cards. There are data aborts during directory
> > traversal. I have not heard of any other embedded system having
> > this problem, and it looks rather strange to me.
>
> I've not seen similar reports at all, but I don't know how many
> people are using FAT16 on such devices on a PC in Linux.

I can open and use these cards on my desktop PC without problems...

regards
Amadeus