nanogui: Thread: nasty client/server bug fix


Subject: nasty client/server bug fix
From: "Greg Haerr" ####@####.####
Date: 1 Dec 2000 18:23:07 -0000
Message-Id: <073c01c05bc4$0ff361a0$15320cd0@gregh>

Morten,
    I've finally solved the rare bug that you talked about
previously between Nano-X clients and the server.  This
manifested itself when the server posted a GsError, for
instance, and clients would get out of sync, as well as
when folks used GrGetNextEventTimeout, or GrPrepareSelect.

Another aspect of this nasty bug showed up because the client
Nano-X library used the "if (storedevent)" code, which
stored a single event (always a GetNextEvent) while it was
looking for another reply.  In the error condition, a second
or later event was stored on top of the previously stored event,
which resulted in events being discarded.

The reason for this is actually quite complicated, but a simple
explanation is that if a GetNextEvent request was sent, but
timed out before getting a response, then the server, if
multiple events were queued and the system was busy,
would write more than one event on the wire.  This caused
the client to overwrite stored events while waiting for the
response it was looking for.
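
To make the overwrite concrete, here is a minimal sketch of a client
that has room for only one stored event.  It is not the Nano-X client
code: the message kinds, the demo_msg and one_slot_client types, and
wait_for_reply() are all hypothetical names made up for illustration,
and the "wire" is just an in-memory array.

```c
#include <assert.h>
#include <stddef.h>

enum { MSG_EVENT = 1, MSG_REPLY = 2 };   /* hypothetical message kinds */

typedef struct {
    int kind;      /* MSG_EVENT or MSG_REPLY */
    int payload;   /* stands in for the event or reply contents */
} demo_msg;

typedef struct {
    int has_stored;    /* plays the role of the "if (storedevent)" flag */
    demo_msg stored;   /* room for exactly one event */
    int lost;          /* events overwritten before they were delivered */
} one_slot_client;

/* Consume messages until the awaited reply shows up, storing any event
 * seen on the way in the single slot.  Returns the index of the reply,
 * or -1 if no reply is found. */
static int wait_for_reply(one_slot_client *c, const demo_msg *wire, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (wire[i].kind == MSG_REPLY)
            return (int)i;
        if (c->has_stored)
            c->lost++;           /* the earlier event is about to be overwritten */
        c->stored = wire[i];
        c->has_stored = 1;
    }
    return -1;
}
```

If the server writes two events ahead of the reply, the first one is
gone by the time the reply is read; only the second survives in the slot.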

The only solution here is that the client must have a
client-side event queue, so that's what is now implemented.
I believe this completely fixes the problem, without 
a lot of code, as well.
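
A client-side event queue can be as small as a fixed-size ring buffer.
The sketch below only illustrates the idea and is not necessarily how
the library does it: demo_event is a hypothetical stand-in for GR_EVENT,
and the evq_* names and the capacity of 16 are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GR_EVENT. */
typedef struct {
    int type;
} demo_event;

#define EVQ_CAPACITY 16   /* arbitrary size chosen for this sketch */

typedef struct {
    demo_event slots[EVQ_CAPACITY];
    size_t head;    /* index of the oldest queued event */
    size_t count;   /* number of events currently queued */
} event_queue;

static void evq_init(event_queue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
}

/* Append an event that arrived while waiting for some other reply.
 * Returns 1 on success, 0 if the queue is full (the event is not stored). */
static int evq_push(event_queue *q, const demo_event *ev)
{
    if (q->count == EVQ_CAPACITY)
        return 0;
    q->slots[(q->head + q->count) % EVQ_CAPACITY] = *ev;
    q->count++;
    return 1;
}

/* Remove the oldest event.  Returns 1 and fills *out, or 0 if empty. */
static int evq_pop(event_queue *q, demo_event *out)
{
    if (q->count == 0)
        return 0;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % EVQ_CAPACITY;
    q->count--;
    return 1;
}
```

Events come back out in the order they arrived, so a later event can no
longer land on top of an earlier one.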

Did you ever fix this problem for the FreePad?

Regards,

Greg



Subject: Re: nasty client/server bug fix
From: Morten Rolland ####@####.####
Date: 3 Dec 2000 15:41:22 -0000
Message-Id: <3A2A6ACA.AD3F11D1@screenmedia.no>

Hello Greg,

> Morten,
>     I've finally solved the rare bug that you talked about
> previously between Nano-X clients and the server.

Oh.  Does that mean you didn't get my mail about this very
same problem, dated Tue 10 Oct 2000 13:22:07 +0200
subject "Hello" ?

Thing is, we fixed it back then, and one hell of a debugging
effort it was too, as you found out yourself.  We found the
problem to be partly what you describe, and also closely bound
to shared memory operation.

We also had to introduce a "NOP" protocol operation to be used
to make the semantics around GrPrepareSelect/GrServiceSelect
work in the case when a stored event was received just prior to
going to sleep - the nop reply is used to make the select
"wake up" so that the GrServiceSelect function can call the
event handler with the stored event without (much) delay.
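
The wake-up effect itself can be shown with plain POSIX calls.  This is
only a sketch under assumptions, not our protocol code: a socketpair()
stands in for the client/server connection, a single zero byte stands in
for the nop reply (the real wire format is not shown here), and
wait_readable() and nop_wakeup_demo() are made-up names.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if fd becomes readable within timeout_ms, 0 on timeout,
 * -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}

/* sv[0] plays the client end, sv[1] the server end.  Returns 1 if select
 * times out before the simulated nop reply and wakes up after it,
 * 0 if not, -1 on error. */
static int nop_wakeup_demo(void)
{
    int sv[2];
    char nop = 0;   /* hypothetical one-byte nop reply */
    int before, after;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    before = wait_readable(sv[0], 10);   /* nothing on the wire yet */

    if (write(sv[1], &nop, 1) != 1) {    /* the "server" answers the nop */
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    after = wait_readable(sv[0], 10);    /* select now returns at once */

    close(sv[0]);
    close(sv[1]);
    return (before == 0 && after == 1) ? 1 : 0;
}
```

Without the reply on the wire the select would sit there until its
timeout, even though an event is already waiting in the client.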

> This
> manifested itself when the server posted a GsError, for
> instance, and clients would get out of sync, as well as
> when folks used GrGetNextEventTimeout, or GrPrepareSelect.

We have not looked much at GsError, but we predicted that a
timed out GrGetNextEvent would cause the same problem that
we experienced, yes.

> The reason for this is actually quite complicated, but a simple
> explanation is that if a GetNextEvent request was sent, but
> timed out before getting a response, then the server, if
> multiple events were queued and the system was busy,
> would write more than one event on the wire.

Yes, this sounds right.  In addition, an async "GetNextEvent"
sent by the server may be interpreted by the client as
"shared memory command execution completed", causing the
shared memory to be reused before the server has acted
upon it, which causes a lot of trouble...

With our fix, this is a typical function in client.c that needs
to return information from the server:

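void
GrCheckNextEvent(GR_EVENT *ep)
{
        /* Hand back an event stored earlier, if there is one */
        if ( nxGetStoredEvent(ep) )
                return;

        /* Queue the request and make sure it reaches the server */
        AllocReq(CheckNextEvent);
        nxFlushWait();

        /* Read the reply, then finish the flush */
        nxSocketReadTyped(GrNumGetNextEvent, ep, sizeof(*ep));
        nxFlushFinish();
}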

Now, nxFlushWait() indicates that this operation needs to be
sent to the server asap so that the reply can and will be received.
This function does one of two things: it flushes the buffer by
write()ing it to the socket, or it sends a command to execute the
shared memory segment.

The nxFlushFinish() cleans up by possibly waiting until a reply to the
flush (that may have been sent) is received.

Here are two others:

void 
GrFlush(void)
{
        nxFlushAuto();
        nxFlushFinish();
}

void
GrSync(void)
{
        nxFlushWait();
        nxFlushFinish();
}


GrFlush will ship all queued commands to the Nano-X server, while
the GrSync function will wait until the Nano-X server has executed
the queued commands before continuing.  The nxFlush* commands will
do one thing if shared memory is used and another if it is not.

As I wrote in the aforementioned mail, I will have to work a little
bit to produce a patch for the latest releases, as we use 0.88pre3 as
a base for our changes, but I would do it if you want it.

The patch will contain a rewrite of nxproto.c that makes it a bit
more shallow and easier to follow imho.  Or I can send you the entire
thing that we are working on now if you want for inspection and ideas.

Regards,
Morten Rolland, Screen Media