nanogui: Thread: Corrupted Packet Nano-X


Subject: Corrupted Packet Nano-X
From: "Detzner, Peter" ####@####.####
Date: 3 Sep 2007 10:24:56 +0100
Message-Id: <C68208999FE4B94888BDFF4D9B8CF68CE3F884@w2kex2.insta.de>

Hey,

I am using MicroWindows/Nano-X version 0.91. When I do some work with
bitmaps, my application is killed with this message:

nxclient: bad readblock -1, errno 104
nxclient 548: Corrupted packet
[1] + Killed                     nano-X

Do you have any idea why this happens? After loading an image from a
buffer, I make sure that freeImage is executed as the next step, of
course after "drawImageToFit(...)"...

Please help me...

Thanks,

pete
Subject: Re: [nanogui] Corrupted Packet Nano-X
From: "Greg Haerr" ####@####.####
Date: 3 Sep 2007 18:04:44 +0100
Message-Id: <022201c7ee4c$3a411f20$2f01a8c0@HaydenLake>

> nxclient 548: Corrupted packet
> Do you have an idea, why this happenes? After loading an Image from
> Buffer, I take care, that freeImage is also executed as the next step -
> of course after "drawImageToFit(...)"...

This seems to have something to do with the maximum
request size packet overflowing a server buffer.
Grep the headers for a MAXREQSZ define or
something like that and increase it.  The system
is supposed to break images down into smaller
pieces, but may not be doing so for the GrDrawImageToFit
function you're using.

The overflow buffer is in nanox/srvnet.c::GsHandleClient()
IIRC.
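A rough sketch of the splitting Greg expects the client library to do: given an assumed maximum request size, work out how many image rows fit into one request and how many requests a whole image needs. EXAMPLE_MAXREQSZ and EXAMPLE_REQ_HEADER are hypothetical values chosen for illustration, not the actual constants from the Nano-X headers or srvnet.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limits for illustration only; the real request-size limit
 * and request header layout are defined by Nano-X, not here. */
#define EXAMPLE_MAXREQSZ   10000u  /* assumed max bytes in one client request */
#define EXAMPLE_REQ_HEADER 32u     /* assumed fixed overhead per request      */

/* How many full image rows fit into the payload of one request.
 * Returns 0 if even a single row does not fit. */
static size_t rows_per_request(size_t width, size_t bytes_per_pixel)
{
    size_t row_bytes = width * bytes_per_pixel;
    size_t payload = EXAMPLE_MAXREQSZ - EXAMPLE_REQ_HEADER;

    if (row_bytes == 0 || row_bytes > payload)
        return 0;
    return payload / row_bytes;
}

/* How many requests are needed to send `height` rows, using ceiling
 * division. Returns 0 if a row cannot fit into one request. */
static size_t requests_needed(size_t width, size_t height, size_t bytes_per_pixel)
{
    size_t rows = rows_per_request(width, bytes_per_pixel);

    if (rows == 0)
        return 0;
    return (height + rows - 1) / rows;
}
```

With these assumed limits, a 320x240 image at 4 bytes per pixel gives 7 rows per request and 35 requests. If a single row is larger than the payload, the functions return 0 and the row itself would have to be split as well.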

Regards,

Greg
Subject: Re: [nanogui] Corrupted Packet Nano-X
From: "Greg Haerr" ####@####.####
Date: 12 Sep 2007 07:12:05 +0100
Message-Id: <118f01c7f503$d3dafe40$6401a8c0@winXP>

> So maybe the multithreading is the problem?


I should have asked this in the beginning: it's definitely the problem.
Despite having THREADSAFE=Y, if more than one thread makes
a request with a non-void GrXXX function (that is, one that
requires a response from the server), then the client/server
interaction on the single pipe to the application gets out of
sync, and the "corrupted packet" message is generated.
This is because two threads have attempted to read or
write the pipe at the same time, and junk gets written
in the middle of a packet.

The THREADSAFE option puts mutexes in place to protect
against a task switch between two writers, but
can't protect against a thread trying to read a response
while another, usually the main thread, is in GrGetNextEvent.
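To make the failure mode concrete, here is a deterministic, single-threaded replay of the interleaving described above, using a plain POSIX pipe in place of the client/server connection. The "AAAAAAAA" and "BBBB" strings are made-up stand-ins for two server replies, not the real Nano-X packet format; the point is only that once one reader stops partway through a reply, the next reader starts in the middle of someone else's packet.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Replays "thread A is switched out mid-read, thread B reads next" step by
 * step. Returns 1 if B ends up with bytes that belong to A's reply,
 * 0 if not, -1 on a system-call failure. */
static int replay_interleaved_reads(void)
{
    int fd[2];
    char a_part[5] = {0}, b_reply[5] = {0};

    if (pipe(fd) != 0)
        return -1;

    /* The server answers A's request first, then B's, on the one shared pipe. */
    if (write(fd[1], "AAAAAAAA", 8) != 8 || write(fd[1], "BBBB", 4) != 4)
        return -1;

    /* Thread A starts reading its 8-byte reply but is switched out after 4. */
    if (read(fd[0], a_part, 4) != 4)
        return -1;

    /* Thread B now reads "its" 4-byte reply and gets the rest of A's. */
    if (read(fd[0], b_reply, 4) != 4)
        return -1;

    close(fd[0]);
    close(fd[1]);
    return strcmp(b_reply, "AAAA") == 0;
}
```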

Regards,

Greg

ps: please post responses to the list
Subject: Re: [nanogui] Corrupted Packet Nano-X
From: "Greg Haerr" ####@####.####
Date: 12 Sep 2007 07:19:14 +0100
Message-Id: <11a601c7f504$d4ad4cf0$6401a8c0@winXP>

> Also, are you running a multithreaded application?

> The Problem appears, when I am trying to do a lot of "focus in" and "focus
> out". In fact, when I am switching two bitmaps, after a couple of times, it
> is getting very slow until the "out of memory"/ "corrupted packet" message
> appears...

The only easy fix for this issue is to allow threads other
than the original main thread to execute only void GrXXX
functions (that is, typically draw functions), and to allow ONLY
the main thread to execute non-void functions or any function that
could require a wait and/or a read from the server.  In this way,
the THREADSAFE option protects the multiple threads doing
write-only client->server operations from stepping on each
other in the middle of a request, while the server->client
communication is read and processed only by a single
thread, the main thread.
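A minimal sketch of that policy, assuming pthreads: worker threads never read from the server themselves; they hand any non-void call to the main thread through a one-slot mailbox and wait for the answer. run_on_main_thread, service_pending_query, fake_query, and demo_worker are hypothetical names; fake_query stands in for a non-void GrXXX function. A real program would poll service_pending_query from its event loop, for example each time GrGetNextEventTimeout returns.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One pending query handed from a worker thread to the main thread. */
struct query {
    int (*fn)(int arg);  /* stands in for a non-void GrXXX call */
    int arg;
    int result;
    int done;
};

static pthread_mutex_t mbox_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t mbox_cond = PTHREAD_COND_INITIALIZER;
static struct query *pending;  /* at most one outstanding query */

/* Worker side: post the query, then block until the main thread answers. */
static int run_on_main_thread(int (*fn)(int), int arg)
{
    struct query q = { fn, arg, 0, 0 };

    pthread_mutex_lock(&mbox_lock);
    while (pending != NULL)           /* another worker owns the slot */
        pthread_cond_wait(&mbox_cond, &mbox_lock);
    pending = &q;
    while (!q.done)                   /* wait for the main thread's answer */
        pthread_cond_wait(&mbox_cond, &mbox_lock);
    pthread_mutex_unlock(&mbox_lock);
    return q.result;
}

/* Main-thread side: run one pending query if there is one.
 * Returns 1 if a query was serviced, 0 if the mailbox was empty. */
static int service_pending_query(void)
{
    struct query *q;
    int r;

    pthread_mutex_lock(&mbox_lock);
    q = pending;
    pending = NULL;
    pthread_mutex_unlock(&mbox_lock);
    if (q == NULL)
        return 0;

    r = q->fn(q->arg);                /* only the main thread talks to the server */

    pthread_mutex_lock(&mbox_lock);
    q->result = r;
    q->done = 1;
    pthread_cond_broadcast(&mbox_cond);  /* wakes the poster and any waiting workers */
    pthread_mutex_unlock(&mbox_lock);
    return 1;
}

/* Hypothetical stand-ins used to exercise the mailbox. */
static int fake_query(int x) { return x * 2; }

static void *demo_worker(void *out)
{
    *(int *)out = run_on_main_thread(fake_query, 21);
    return NULL;
}
```

Void draw calls can still be made directly from the workers under THREADSAFE; only calls that wait for a reply go through the mailbox.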

BTW, the reason this can't be fixed given the current
protocol specification is that there isn't a standard-length
reply from the server, and there's only a basic queuing
mechanism in the client library.  This means that any thread
reading the server pipe can't know how many bytes to
read, and thus may get interrupted and task switched
while in the middle of reading data from the server.
The next thread wakes up, does a read, and gets
unexpected crap from the middle of the previous
thread's response packet.
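For contrast, here is what a reply with a known length would allow. The sketch uses a hypothetical framing, a uint32_t length header followed by the payload, which is not the Nano-X wire format. A reader holding a lock can call read_reply and consume exactly one whole reply, retrying short reads, so it never leaves half a packet in the pipe for the next thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Read exactly n bytes, retrying short reads and EINTR.
 * Returns 0 on success, -1 on error or premature EOF. */
static int read_full(int fd, void *buf, size_t n)
{
    char *p = buf;

    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            return -1;
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Write exactly n bytes, retrying short writes and EINTR. */
static int write_full(int fd, const void *buf, size_t n)
{
    const char *p = buf;

    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0 && errno == EINTR)
            continue;
        if (w <= 0)
            return -1;
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Hypothetical framing, not the Nano-X protocol: a uint32_t length in host
 * byte order, then that many payload bytes. */
static int write_reply(int fd, const void *data, uint32_t len)
{
    if (write_full(fd, &len, sizeof len) != 0)
        return -1;
    return write_full(fd, data, len);
}

/* Reads one complete reply into buf. Returns the payload length, or -1 on
 * error or if the reply does not fit (the stream is then out of sync). */
static long read_reply(int fd, void *buf, size_t cap)
{
    uint32_t len;

    if (read_full(fd, &len, sizeof len) != 0 || len > cap)
        return -1;
    if (read_full(fd, buf, len) != 0)
        return -1;
    return (long)len;
}
```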

Regards,

Greg

Subject: Fwd: [nanogui] Corrupted Packet Nano-X
From: "Detzner, Peter" ####@####.####
Date: 12 Sep 2007 07:48:56 +0100
Message-Id: <C68208999FE4B94888BDFF4D9B8CF68CE0F14D@w2kex2.insta.de>

Hey,

Nope, the patch doesn't fix the problem. Yes, it is a multithreaded system. The UPnP library creates a thread pool (2 <= threads in pool <= 12).

So maybe the multithreading is the problem?

Regards,

Pete

-----Original Message-----
From: Greg Haerr ####@####.####
Sent: Monday, 10 September 2007 19:09
To: Detzner, Peter
Subject: Re: [nanogui] Corrupted Packet Nano-X

Peter -

Try adding this patch to nanox/client.c, and let me know whether this fixes the problem.

Also, are you running a multithreaded application?

Regards,

Greg


----- Original Message -----
From: "Detzner, Peter" ####@####.####
To: "Greg Haerr" ####@####.####
Sent: Monday, September 10, 2007 6:51 AM
Subject: Re: [nanogui] Corrupted Packet Nano-X



Hey,

I still have this problem... I've already changed MAXREQST in srvnet.c, but the problem is still there. I've attached three files from my program; I guess that is enough to understand it...

The problem appears when I do a lot of "focus in" and "focus out". In fact, when I switch between two bitmaps, after a couple of times it gets very slow until the "out of memory"/"corrupted packet" message appears...

Please help me, this is for my final dissertation and I have run out of ideas...




Subject: Re: Corrupted Packet Nano-X
From: "Aaron J. Grier" ####@####.####
Date: 12 Sep 2007 19:44:53 +0100
Message-Id: <20070912184437.GS22036@mordor.unix.fryenet>

On Wed, Sep 12, 2007 at 12:18:39AM -0600, Greg Haerr wrote:
> The only easy fix to this issue will be to only allow additional
> threads other than the main original thread to execute void GrXXX
> functions (that is, typically draw functions) only, and allow ONLY the
> main thread to execute non-void functions or any function that could
> require a wait and/or a read from the server.  In this way, the
> THREADSAFE option protects the multiple threads doing write-only
> client->server operations from stepping on each other in the middle of
> a request, but the server->client communication is read and processed
> only by a single thread, the main thread.
> 
> BTW, the reason this can't be fixed given the current protocol
> specification is that there isn't a standard-length reply from the
> server, and there's only a basic queuing mechanism in the client
> library.  This means that any thread reading the server pipe can't
> know how many bytes to read, and thus may get interrupted and task
> switched while in the middle of reading data from the server.  The
> next thread wakes up, does a read, and gets unexpected crap from the
> middle of the previous thread's response packet.

I've also run into this problem since trying our app with client/server.

I have previously been using a "big lock" approach with our
multithreaded application, replacing _all_ nano-X calls with mutex
wrappers via link-time magic.  (It was implemented a couple of years before
the THREADSAFE option appeared in nano-X.)  The big lock has proven
reliable (we have shipped hundreds of instruments since late 2003 and
never run into this problem), but it does mean there is some risk of
denial-of-service / priority inversion, since a lower-priority thread
could potentially starve out a higher one by making repeated graphics
calls.

I'm wondering if a simpler, more appropriate fix would be to put a lock
on the client read side for server responses.  I'll cook something up
and see how it works; otherwise I'll dust off our big lock code.  (If
anybody's interested, I'll post it to the list.  It is GNU-specific.)
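Aaron does not post his wrapper code, so the following is only a guess at the general shape of a "big lock": one process-wide recursive mutex taken around every graphics call. real_query and locked_query are hypothetical stand-ins for a nano-X function and its wrapper. One GNU-specific way to do the link-time part is GNU ld's --wrap=symbol option, under which undefined references to symbol resolve to __wrap_symbol and references to __real_symbol resolve to the original; the sketch skips the linker trick so it builds on its own.

```c
#define _XOPEN_SOURCE 700  /* for PTHREAD_MUTEX_RECURSIVE on strict compilers */
#include <assert.h>
#include <pthread.h>

/* One process-wide lock around every graphics call. Recursive, so a
 * wrapped call made from inside another wrapped call does not deadlock. */
static pthread_mutex_t big_lock;
static pthread_once_t big_lock_once = PTHREAD_ONCE_INIT;

static void big_lock_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&big_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

static void big_lock_acquire(void)
{
    pthread_once(&big_lock_once, big_lock_init);
    pthread_mutex_lock(&big_lock);
}

static void big_lock_release(void)
{
    pthread_mutex_unlock(&big_lock);
}

/* Hypothetical stand-in for a real nano-X call. */
static int real_query(int x) { return x + 1; }

/* What the application calls instead. Under -Wl,--wrap this would be
 * named __wrap_<fn> and would call __real_<fn>. */
static int locked_query(int x)
{
    int r;

    big_lock_acquire();
    r = real_query(x);
    big_lock_release();
    return r;
}
```

Because every call, including any read of a server reply, runs entirely under the one lock, replies cannot be interleaved; the price is the starvation and priority-inversion risk described above.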

-- 
  Aaron J. Grier  |   Frye Electronics, Tigard, OR   |  ####@####.####
Subject: Re: [nanogui] Re: Corrupted Packet Nano-X
From: "Greg Haerr" ####@####.####
Date: 12 Sep 2007 21:24:26 +0100
Message-Id: <462d01c7f57a$e9fdfb60$0300a8c0@RDP>

: I'm wondering if a simpler more appropriate fix would be to put a lock
: on the client read side for server responses.  I'll cook something up
: and see how it works.  otherwise I'll dust off our big lock code.  (if
: anybody's interested, I'll post it to the list.  it is GNU-specific.)

Yes, I'd like to see that code.  (Actually, you may have sent
it some time ago; this sounds familiar.)  However, since
THREADSAFE wraps all Gr functions, what's the difference
with your approach?

The current THREADSAFE implementation uses the same
lock around all Gr calls, including client server read calls,
so the above shouldn't be an issue, right?
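A sketch of why one lock held across the whole transaction should be enough: if the same mutex covers both writing a request and reading its reply, no other thread can slip a request or a read in between. Greg describes the THREADSAFE lock as covering the read calls too. io_lock, transact, and echo_job are hypothetical names, and a loopback pipe stands in for the server connection, so each "reply" is simply the request echoed back.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

/* One lock held across the whole request/reply transaction. */
static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;

/* Send an n-byte request and read an n-byte reply while holding io_lock.
 * A short write or read is reported as failure (returns 0), not retried. */
static int transact(int to_server, int from_server,
                    const char *req, char *reply, size_t n)
{
    int ok;

    pthread_mutex_lock(&io_lock);
    ok = write(to_server, req, n) == (ssize_t)n &&
         read(from_server, reply, n) == (ssize_t)n;
    pthread_mutex_unlock(&io_lock);
    return ok;
}

/* Test harness: the loopback pipe echoes each request back as its reply. */
struct echo_job {
    int rd, wr;
    const char *msg;   /* 4-byte request */
    char got[5];
    int ok;
};

static void *echo_job_run(void *p)
{
    struct echo_job *j = p;

    j->ok = 1;
    for (int i = 0; i < 1000; i++) {
        memset(j->got, 0, sizeof j->got);
        if (!transact(j->wr, j->rd, j->msg, j->got, 4) ||
            strcmp(j->got, j->msg) != 0) {
            j->ok = 0;
            break;
        }
    }
    return NULL;
}
```

Two threads each send their own 4-byte request a thousand times; because the lock spans write and read, each thread always reads back its own bytes.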

Regards,

Greg

Subject: Re: [nanogui] Corrupted Packet Nano-X
From: ####@####.####
Date: 14 Sep 2007 13:16:03 +0100
Message-Id: <20070914071557.bfxfahk7go8sc0w4@localhost>

Hi Greg,

> So maybe the multithreading is the problem?
>
>
> I should have asked this in the beginning, its definitely the problem.
> Despite having THREADSAFE=Y, if more than one makes
> a request with a non-void GrXXX function (that is, one that
> requires a response from the server), then the client/server
> interaction on the single pipe to the application gets out of
> sync, and the "corrupted packet" message is generated.
> This is because two threads have attempted to read or
> write the pipe at the same time, and junk gets written
> in the middle of a packet.

Am I missing something?
How can two threads read or write the pipe at the same time? All GrXXX
functions are protected by the nxGlobalLock mutex, which means
that only one thread at a time, the one holding the lock, can access
the pipe.

>
> The THREADSAFE option puts mutex's to protect
> against a task switch between two writers, but
> can't protect against a thread trying to read a response
> while another, usually the main thread, is in GrGetNextEvent.

I believe that a situation like the one you describe can only happen
with my patch, which unlocks before select and locks again after select
in _GrGetNextEventTimeout.

Besides that, if you apply my patch and call non-void functions from
different threads, you can get a deadlock, because the read inside
ReadBlock is blocking. (You could make it non-blocking, but that might
lead to starvation.)
The scenario is:
1. GrGetNextEvent unlocks and calls select.
2. No event arrives.
3. Because we are unlocked, another thread can run and do work that
   leads to a call to read.
4. That GrXXX call now holds the lock, and GrGetNextEvent would need
   the lock to process the event, so we are deadlocked.
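The unlock/select/relock pattern can be sketched as follows, with a local mutex standing in for nxGlobalLock and a pipe standing in for the server connection. wait_readable_unlocked is a hypothetical helper, not the actual code of _GrGetNextEventTimeout. The comments mark the window in which another thread can take the lock: if that thread then blocks in read() while holding it, the pthread_mutex_lock after select() blocks too, which is the deadlock in the scenario above.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/select.h>
#include <unistd.h>

/* Stands in for nxGlobalLock. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Must be called with global_lock held. Drops the lock only while blocked
 * in select() and reacquires it before returning.
 * Returns 1 if fd became readable, 0 on timeout, -1 on error. */
static int wait_readable_unlocked(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    pthread_mutex_unlock(&global_lock);  /* window opens: other threads may
                                            now lock and read from fd */
    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    pthread_mutex_lock(&global_lock);    /* blocks for as long as another
                                            thread holds the lock */

    /* Even on success, the readiness seen by select() may be stale by now,
     * because another thread may have consumed the data in the window. */
    if (n < 0)
        return -1;
    return n > 0 && FD_ISSET(fd, &rfds);
}
```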

With LINK_APP_INTO_SERVER everything seems to work, which suggests
that there is something wrong with the client/server communication.

Do you think it would help to have different read and write file descriptors?

Regards,

Robert