Subject:
nasty client/server bug fix
From: "Greg Haerr" ####@####.####
Date: 1 Dec 2000 18:23:07 -0000
Message-Id: <073c01c05bc4$0ff361a0$15320cd0@gregh>

Morten,

I've finally solved the rare bug that you talked about previously between Nano-X clients and the server. This manifested itself when the server posted a GsError, for instance, and clients would get out of sync, as well as when folks used GrGetNextEventTimeout, or GrPrepareSelect.

Another aspect of this nasty bug showed up as the client Nano-X library used the "if (storedevent)" code that stored one event (always a GetNextEvent) when it was looking for another reply. In the error condition, a second or more event was stored on top of the previously stored event, which resulted in events being discarded.

The reason for this is actually quite complicated, but a simple explanation is that if a GetNextEvent request was sent, but timed out before getting a response, then the server, if multiple events were queued and the system was busy, would write more than one event on the wire. This caused the client to overwrite events waiting for the response it was looking for.

The only solution here is that the client must have a client-side event queue, so that's what is now implemented. I believe this completely fixes the problem, without a lot of code, as well.

Did you ever fix this problem for the FreePad?

Regards,

Greg
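The client-side event queue described above can be sketched as a small FIFO in C. This is only an illustration of the idea, not the code that went into Nano-X: `demo_event`, `event_queue`, `evq_init`, `evq_push`, and `evq_pop` are hypothetical names, and the fixed capacity is an assumption made to keep the example short. The point is that a late or extra event is appended behind earlier ones instead of overwriting a single stored slot.

```c
#include <assert.h>
#include <stddef.h>

#define EVQ_CAPACITY 8  /* assumption: fixed size; a real client might grow a list */

/* Hypothetical stand-in for GR_EVENT. */
typedef struct {
    int type;
    int wid;
} demo_event;

/* FIFO ring buffer: events are appended, never overwritten. */
typedef struct {
    demo_event slots[EVQ_CAPACITY];
    size_t head;   /* index of the oldest queued event */
    size_t count;  /* number of queued events */
} event_queue;

void evq_init(event_queue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Append an event; returns 1 on success, 0 if the queue is full. */
int evq_push(event_queue *q, const demo_event *ev)
{
    assert(q != NULL && ev != NULL);
    if (q->count == EVQ_CAPACITY)
        return 0;
    q->slots[(q->head + q->count) % EVQ_CAPACITY] = *ev;
    q->count++;
    return 1;
}

/* Remove the oldest event; returns 1 on success, 0 if the queue is empty. */
int evq_pop(event_queue *q, demo_event *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0)
        return 0;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % EVQ_CAPACITY;
    q->count--;
    return 1;
}
```

With a single stored slot, pushing a second event before the first is consumed loses the first; with the FIFO, both are delivered later in arrival order.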
Subject:
Re: nasty client/server bug fix
From: Morten Rolland ####@####.####
Date: 3 Dec 2000 15:41:22 -0000
Message-Id: <3A2A6ACA.AD3F11D1@screenmedia.no>

Hello Greg,

> Morten,
> I've finally solved the rare bug that you talked about
> previously between Nano-X clients and the server.

Oh. Does that mean you didn't get my mail about this very same problem, dated Tue 10 Oct 2000 13:22:07 +0200, subject "Hello"?

Thing is, we fixed it back then, and one hell of a debugging effort it was too, as you found out yourself. We found the problem to be partly what you describe and also closely bound to shared memory operation.

We also had to introduce a "NOP" protocol operation to make the semantics around GrPrepareSelect/GrServiceSelect work in the case when a stored event was received just prior to going to sleep. The NOP reply is used to make the select "wake up" so that the GrServiceSelect function can call the event handler with the stored event without (much) delay.

> This
> manifested itself when the server posted a GsError, for
> instance, and clients would get out of sync, as well as
> when folks used GrGetNextEventTimeout, or GrPrepareSelect.

We have not looked much at GsError, but we predicted that a timed out GrGetNextEvent would cause the same problem that we experienced, yes.

> The reason for this is actually quite complicated, but a simple
> explanation is that if a GetNextEvent request was sent, but
> timed out before getting a response, then the server, if
> multiple events were queued and the system was busy,
> would write more than one event on the wire.

Yes, this sounds right. Also, the sending of an async "GetNextEvent" by the server may be interpreted as "shared memory command execution completed" by the client, causing the shared memory to be reused before it was acted upon by the server, causing a lot of troubles...

With our fix, this is a typical function in client.c that needs to return information from the server:

    void GrCheckNextEvent(GR_EVENT *ep)
    {
        /* An event stored earlier is returned first. */
        if (nxGetStoredEvent(ep))
            return;

        AllocReq(CheckNextEvent);
        nxFlushWait();      /* send the request to the server now */
        nxSocketReadTyped(GrNumGetNextEvent, ep, sizeof(*ep));
        nxFlushFinish();    /* clean up after the flush */
    }

Now, the nxFlushWait() indicates that this operation needs to be sent to the server asap so the reply can and will be received. This function does one of two things: it flushes the buffer by write()ing it to the socket, or it sends a command to execute the shared memory segment. The nxFlushFinish() cleans up by possibly waiting until a reply to the flush (that may have been sent) is received.

Here are two others:

    void GrFlush(void)
    {
        nxFlushAuto();
        nxFlushFinish();
    }

    void GrSync(void)
    {
        nxFlushWait();
        nxFlushFinish();
    }

GrFlush will ship all queued commands to the Nano-X server, while the GrSync function will wait until the Nano-X server has executed the queued commands before continuing. The nxFlush* commands will do one thing if shared memory is used and another if it is not.

As I wrote in the aforementioned mail, I will have to work a little bit to produce a patch for the latest releases, as we use 0.88pre3 as a base for our changes, but I would do it if you want them. The patch will contain a rewrite of nxproto.c that makes it a bit more shallow and easier to follow imho. Or I can send you the entire thing that we are working on now if you want for inspection and ideas.

Regards,

Morten Rolland, Screen Media
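The pattern in GrCheckNextEvent above (hand back a stored event if there is one, otherwise send the request and read a reply of a specific type) can be illustrated with a self-contained sketch. Everything here is an assumption made for illustration: the real nxSocketReadTyped and nxGetStoredEvent are not shown in this thread, so `read_typed`, `get_stored_event`, `demo_msg`, `demo_wire`, and the message kinds are hypothetical, and the "wire" is just an in-memory array instead of a socket. The sketch treats every message that is not the wanted reply as an event and queues it instead of dropping it.

```c
#include <assert.h>
#include <stddef.h>

enum { MSG_EVENT = 1, MSG_REPLY = 2 };  /* hypothetical message kinds */

typedef struct {
    int kind;
    int payload;
} demo_msg;

#define STORED_MAX 8  /* assumption: small fixed-size store */

/* FIFO of events that arrived while waiting for some other reply. */
typedef struct {
    demo_msg items[STORED_MAX];
    size_t head;
    size_t count;
} stored_events;

/* Stand-in for the socket: messages already "on the wire". */
typedef struct {
    const demo_msg *msgs;
    size_t len;
    size_t pos;
} demo_wire;

/* Append a stray message; returns 0 if the store is full. */
int store_event(stored_events *s, const demo_msg *m)
{
    if (s->count == STORED_MAX)
        return 0;
    s->items[(s->head + s->count) % STORED_MAX] = *m;
    s->count++;
    return 1;
}

/* Pop the oldest stored event; returns 0 if none is stored. */
int get_stored_event(stored_events *s, demo_msg *out)
{
    if (s->count == 0)
        return 0;
    *out = s->items[s->head];
    s->head = (s->head + 1) % STORED_MAX;
    s->count--;
    return 1;
}

/*
 * Read messages until one of the wanted kind arrives. In this sketch every
 * non-matching message is assumed to be an event and is queued in arrival
 * order. Returns 1 when the wanted message was found, 0 if the wire ran
 * dry or the store overflowed.
 */
int read_typed(demo_wire *w, stored_events *s, int wanted, demo_msg *out)
{
    assert(w != NULL && s != NULL && out != NULL);
    while (w->pos < w->len) {
        demo_msg m = w->msgs[w->pos++];
        if (m.kind == wanted) {
            *out = m;
            return 1;
        }
        if (!store_event(s, &m))
            return 0;
    }
    return 0;
}
```

If the server writes two events ahead of the reply the client is waiting for, `read_typed` returns the reply and both events remain available, in order, through `get_stored_event` on later calls.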