
RE: Major JFFS2 bug (?)



I don't want to seem to "push" this issue beyond a certain reasonable,
debatable limit, but as long as there is some interest in debating it, I'll
provide and clarify my viewpoint.

David Woodhouse said:
>Ponder the following situation:
> 1. You request a write of 5000 bytes.
> 2. JFFS2 writes 4096 bytes of it and returns accordingly.
> 3. Power is lost before you can do anything about it or attempt to
>	perform a second write call to write the remaining 904 bytes.

(I presume this is the new power-fail-safe write() functionality you are
describing.)
>Compare with this situation:
> 1. You request a write of 5000 bytes.
> 2. JFFS2 writes 4096 bytes and is about to write the remainder.
> 3. Power is lost before it does so.

>What is the difference between the results of those two situations? The
>first is perfectly legal. The second you argue is not, and I'm half
>inclined to agree with you - but does it actually matter?

The bottom line is that the "roll back and recover" on power failure during
a write needs to happen at *some* level. An end implementation where this is
not handled at any level (system or user) will result in unreliable
config-file-update-type operations (where either the old data or the new
data is acceptable, as long as the two don't get mixed).

My argument is to put the roll-back-and-recover mechanism inside the
JFFS2_write() handling; the other argument is for users to handle it
themselves when they need it.

With the system implementation method, *all* programs can natively use this
feature. It needs to be coded once and tested once - then it'll always work.
That's the advantage.

The disadvantage of this approach is that it (unnecessarily) complicates the
kernel.

My argument against this is twofold:

1. The complication is in the JFFS2 driver, and folks that don't need JFFS
will never encounter it.
2. The complication is just an extension of existing functionality. Even
today, when JFFS2 overwrites a portion of a file (or any data), it
protects it with a CRC. Even the header has its own CRC for power-fail
protection. If power fails, the CRC will be bad the next time around while
building the fs, and the fs will "roll back and recover" to the last good
"chunk" for that portion of that data file.

This functionality already exists at the size jffs2_write() natively works
with (whatever that may be - I think PAGE_SIZE). My suggestion is to
logically extend it to the *entire* chunk of data being written during a
single system write() call.

There is a simple way to handle this. All it needs is an extra field to
define chunk_n_of_m for the particular range of data sent down in the
write() system call.
e.g. If the user sent down 5000 bytes and JFFS2_write() can handle only 4096
bytes at a time, then m = 2, with n = 0 for the first chunk write and n = 1
for the second. The overhead of figuring out and writing an extra "chunk #"
field is truly negligible - even on a slow processor.

If power fails, then on the next mount the criterion for accepting a chunk
of data would be extended from just goodCRC(chunk_hdr) +
goodCRC(chunk_data) to the stricter:

for (n = 0; n < m; n++) {
    if (!goodCRC(chunk_hdr(n)) || !goodCRC(chunk_data(n)))
        reject_all_m_chunks();  /* roll back to the last good version */
}

That's it. The overhead for this extra processing is negligible IMHO. The
benefit is huge.
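A toy user-space sketch of that all-or-nothing acceptance test follows. It
uses a simple rolling checksum as a stand-in for JFFS2's real CRC32, and
checks only a data CRC (the real criterion would also cover the header CRC);
all names here are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a real CRC32; any checksum works for the sketch. */
static uint32_t checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = sum * 31 + buf[i];
    return sum;
}

struct chunk {
    uint32_t data_crc;    /* checksum stored when the chunk was written */
    uint8_t  data[4096];
    size_t   len;
};

/* Accept the write only if *every* chunk 0..m-1 checks out; one bad
 * chunk rejects the whole write, rolling back to the previous version. */
static int accept_write(const struct chunk *chunks, size_t m)
{
    for (size_t n = 0; n < m; n++)
        if (checksum(chunks[n].data, chunks[n].len) != chunks[n].data_crc)
            return 0;
    return 1;
}
```

If power fails after chunk 0 of 2 hits the flash, chunk 1's CRC won't
verify on the next mount, so the entire 5000-byte write is discarded rather
than leaving a half-old, half-new file.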


It just comes down to reality vs. philosophy. The reality is that, I think,
it's a good tradeoff: providing an important feature in *one place* for
all to use (rather than letting everybody do their own and mess it up
everywhere) vs. "POSIX says write() can return with less data written than
asked".

I don't think POSIX ever says that it *should* return with less data than
asked.

Now let's examine the other alternative: the "password file" method of
handling it. The problem is that even though the password file has this
implemented, when I need the same feature elsewhere I have to implement it
all over again.

Worse, password files are quite static. What about a config file that is
large for an embedded system - say, a flat-type database with 2000 records -
with record-level fcntl-type read/write locking? How feasible is modifying a
single record in a copy of the entire database and then doing an atomic
"mv" operation for a power-fail-safe update of that record? Now imagine
that record reads are happening at rates of tens per second, and writes can
occur anywhere from burst rates of tens per second down to once a minute
(you get the idea). When will we ever get the chance to keep up with tens
of record writes per second in a power-fail-safe manner, while letting the
reads continue without blocking?
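For reference, the user-space pattern under discussion - write a complete
new copy, then atomically rename it over the original - looks roughly like
this (filenames and the helper name are illustrative):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Classic user-space atomic update: write the complete new contents to a
 * temporary file, then rename() it over the original. rename() replaces
 * the target atomically on POSIX filesystems, so a reader sees either
 * the old file or the new one, never a mixture. */
static int atomic_update(const char *path, const char *tmp,
                         const char *data, size_t len)
{
    FILE *f = fopen(tmp, "wb");
    if (!f)
        return -1;
    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        return -1;
    }
    /* A real implementation would also fflush() + fsync() here so the
     * data reaches the medium before the rename is committed. */
    if (fclose(f) != 0)
        return -1;
    return rename(tmp, path);
}
```

This is fine for a small, rarely written file like /etc/passwd - but as
argued above, copying an entire 2000-record database for every record
update is exactly what makes the approach too expensive at tens of writes
per second.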

Maybe there is a trivial solution that I have not thought about.
Suggestions?

Vipin
