[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: Major JFFS2 bug (?)



Let me preface this response with the disclaimer that in no way, shape or
form am I a POSIX standard expert:

That out of the way ;)....

>Vipin.Malik@xxxxxxx.com said:
>>  The system does (seems to) do page writes to JFFS2, *and commits them
>> by having valid CRC's/versions ID's etc.). IMHO this is wrong. Till
>> all the pages (i.e. data send down in a single write) is written to
>> JFFS2, they must not be "committed" *logically* to the fs.

>This was my interpretation of the POSIX spec too. I was argued down.
>I'm fairly sure that none of the other journalling filesystems do this
>either.

Actually, I thought that write() was allowed to return before having written
*all the data desired*- but this is *not* happening. write() always seems to
write the entire data.

I was aware of this fact and was monitoring it in my program. _however_
POSIX say's this may occur, and NOT *it should* occur.

If we are able to accept all the data presented to us in the write(), why
can't we make sure that the older data is not invalidated till *all* the new
data is safely written (and not just a page size worth).

There is no program out there that will break if we implement this
mode of operation. As there is nothing special about *the time* an async
power down occurs, one can easily assume that it occured even before
the write() was called, rather than in the middle of executing it.

Does this make logical sense (to anyone besides me?).

>AFAICT, generic_file_write() just doesn't permit the semantics you
>desire. 

Hmm, if I understand you correctly- you are saying that the JFFS2 driver
jffs2_write() portion (of the system write()) has no idea that the kernel
VFS has broken the original "user" write() into (multiple) page size worth
jffs2_write() call's?

If this is so, and patching (just for JFFS2) the VFS write to pass
*all* the data in a single write to jffs2_write() is not feaseable, then
I see another method to provide this _atomic_ write functionality:

We can implement a pair of ioctl's, namely JFFS2_START_ATOMIC_WRITE and
JFFS2_END_ATOMIC_WRITE that we use as follows:

CASE1: No ioctl used:

<snip from user code>
blah
blah
write(fd, buf, 0 < size < infinity); /* atomic for size <= PAGE_SIZE
                                   NOT atomic for size > PAGE_SIZE */
blah
blah
<end snip>

In this case, which would be a "typical" application out there- which
probably does NOT depend on atomic writes anyway- as nothing out there
provides this feature, the write is by default NOT atomic for data
writes greater than PAGE_SIZE.

If we do want the atomic feature we would do:
CASE 2: iotcl used:

<snip from user code>
blah
blah

ioctl(JFFS2_START_ATOMIC_WRITE, size); /* This guy stops logical commit
of properly written nodes. We use page 'n' of 'm' type fields in
the node hdr to do this (one method). The "size" argument is used to 
calculate what 'm' is. 'n' just starts from 0 and gets incremented with
every call to jffs2_write(). In case power fails, only nodes with all 'm'
pages AND latest version ID's AND good CRC's will be accepted as newer data
for that file, else older nodes will be used. If pwr does NOT fail, then on
the 'm'th page write, the older data is marked for garbage collection and
the newer list of nodes is logically inserted into the node chain, thus
atomically and in a power safe manner, replacing the older data. */

write(fd, buf, 0 < size < infinity); /* NOW guaranteed to be atomic due to
ioctl wrappers around it */

ioctl(JFFS2_END_ATOMIC_WRITE); /* this guy causes the commit of all data
 written to JFFS2 since last START_WRITE ioctl. Actually all it is doing
is turning *off* the above feature turned on with the first ioctl. By
design, once all data (i.e. all 'n' of 'm' pages are written the data is
already logically committed. */

blah
blah
<end snip from user code>


This will provide us with the functionality that we want- when we
want to use it. It's NOT quite as elegant as having a write() that's
atomic in itself for all data sizes, but it's  better than the
 alternative- namely no alternative!

>You need to provide your own jffs2_file_write() function, and the
>easiest 
>way to make it conform is to prevent it from ever writing more than 
>PAGE_SIZE bytes - it's perfectly entitled to return early, according to
>the 
>spec. Then watch all your programs break - not even glibc expects this 
>behaviour, even though it's permitted by the spec.

I think that's true. We have to take into account expected behaviour
of programs out there. The ioctl method is one possible solution that 
I can think of.

I must emphasize (MHO) again how important it is to at least provide a means
to accomplish this. The alternative is to keep a flip flop data struct
in user space, where you look at CRC's and time stamps (or incrementing
ID's) to determine  which of your structure is the latest one. In other
words, duplicate functionality already present in JFFS/2 and do it in
a kludgy way- shudder!

Vipin

To unsubscribe from this list: send the line "unsubscribe jffs-dev" in
the body of a message to majordomo@xxxxxxx.com