
Ramblings on NAND flash support.




p2@xxxxxxx.be said:
> I have the impression the current code assumes it can all free space
> as 1 contiguous block (possibly by forcing garbage collection). If the
> NAND flash develops a bad block, this is obviously not true. 

JFFS2 doesn't assume that, JFFS1 does. In JFFS2, we deal with erase blocks 
individually, and you can just stick the offending block on the bad_list 
and never be troubled by it again. 
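
As a rough sketch of the bad_list idea - the structures here are
simplified, hypothetical stand-ins rather than the real JFFS2 ones, and
mark_block_bad() is a made-up name - retiring a block is just a matter of
moving it from one list to another:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the in-memory structures. */
struct eraseblock {
    unsigned int offset;        /* start of the block on the flash */
    struct eraseblock *next;    /* singly linked list link */
};

struct fs_state {
    struct eraseblock *free_list;  /* blocks available for new nodes */
    struct eraseblock *bad_list;   /* blocks we never touch again */
};

/* Unlink 'blk' from *list if it is there; returns 1 if it was removed. */
static int list_remove(struct eraseblock **list, struct eraseblock *blk)
{
    for (; *list; list = &(*list)->next) {
        if (*list == blk) {
            *list = blk->next;
            blk->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Retire a block that failed to erase or write. */
static void mark_block_bad(struct fs_state *c, struct eraseblock *blk)
{
    list_remove(&c->free_list, blk);
    blk->next = c->bad_list;
    c->bad_list = blk;
}
```

Nothing else ever allocates from bad_list, so the block simply drops out
of circulation.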

That's the easy part.

What's more difficult is dealing with the write-cycle restrictions on NAND 
flash. That means firstly that you can't go back and mark nodes obsolete, 
so you have to make sure your garbage-collection doesn't delete, say, a 
deletion dirent before the real dirent which it obsoletes has been erased. 
Because we don't go through the flash linearly, it's possible that we come 
to garbage collect the deletion dirent _before_ the earlier node that it 
obsoletes. 

The good news on that front is that I already dealt with the _really_ hairy 
case which required fundamental changes to the core code (and lots of 
banging of head against wall) - the case where you truncate a file, later 
extend it leaving a hole, and don't want the old data to 'show through' the 
holes. The code to deal with that has been in there from the start, even 
though it wasn't strictly necessary on NOR flash. Fixing it required fairly 
fundamental changes to how we do truncation, holes, etc.

You need to fix the deletion dirent case - that's not hard. Currently we 
have this comment in gc.c::jffs2_garbage_collect_deletion_dirent():

        /* FIXME: When we run on NAND flash, we need to work out whether
           this deletion dirent is still needed to actively delete a
           'real' dirent with the same name that's still somewhere else
           on the flash. For now, we know that we've actually obliterated
           all the older dirents when they became obsolete, so we didn't
           really need to write the deletion to flash in the first place.
        */

In fact, there's a simple but naïve method for this. Look through the 
physical nodes belonging to the parent directory. For each one that's 
marked (in the jffs2_raw_node_ref in memory) obsolete, read it and check 
whether it is obsoleted by the one you're trying to GC (i.e. check if the 
name matches). Repeat. If you find one that _is_ still obsoleted by the 
deletion dirent you're trying to GC, you need to write that deletion dirent 
out to the flash again. If you don't find one, you can just let it get 
erased as we currently do. 20-odd lines of code. 
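
The scan described above might look roughly like the following. This is
only a sketch: struct node_ref is a hypothetical stand-in for
jffs2_raw_node_ref, the 'name' field stands in for actually reading the
node from the flash, and the version comparison is my assumption about
how "obsoleted by" would be decided.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory reference to a dirent node still on the flash. */
struct node_ref {
    int obsolete;            /* marked obsolete in memory only */
    unsigned int version;    /* version of the dirent (assumed field) */
    const char *name;        /* stands in for reading the name from flash */
    struct node_ref *next;   /* next node of the same parent directory */
};

/*
 * Return 1 if the deletion dirent (name, version) must be written out
 * again because an older dirent with the same name is still physically
 * present on the flash; return 0 if it can be allowed to disappear.
 */
static int deletion_dirent_needed(const struct node_ref *dir_nodes,
                                  const char *name, unsigned int version)
{
    const struct node_ref *ref;

    for (ref = dir_nodes; ref; ref = ref->next) {
        if (!ref->obsolete)
            continue;   /* only obsolete nodes are candidates */
        if (ref->version < version && strcmp(ref->name, name) == 0)
            return 1;   /* this one would reappear without the deletion */
    }
    return 0;
}
```

If it returns 1, the garbage collector copies the deletion dirent to a
new location; otherwise the dirent just goes away with its eraseblock.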

You also have to deal with the restriction on write cycles during normal 
operation - you can't just write lots and lots of tiny nodes consecutively, 
in individual write cycles, and expect it to work. If you just write lots 
of dirents consecutively, you could quite feasibly fit more than ten in a 
512-byte page. And some NAND flash chips allow even fewer writes per page 
than that. 

I think the best approach here is to keep a page-sized (NAND page, 512 
byte) write cache, and to _actually_ write to the flash only when that 
becomes full. This way, you can use the helpful hardware ECC built in to 
devices like the DiskOnChip and some SmartMedia adaptors, which is 
block-based. 
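
A minimal sketch of such a write cache, assuming a 512-byte page. All the
names here (struct wbuf, wbuf_write(), the write_page callback and the
fake flash used to exercise it) are hypothetical and only illustrate the
buffering logic:

```c
#include <stddef.h>
#include <string.h>

#define NAND_PAGE_SIZE 512   /* NAND page size assumed in the text */

struct wbuf {
    unsigned char data[NAND_PAGE_SIZE];
    size_t len;              /* bytes currently buffered */
    unsigned int page;       /* next page number to program */
    /* Hypothetical callback that programs one full page; 0 on success. */
    int (*write_page)(unsigned int page, const unsigned char *buf);
};

/* Append 'len' bytes; program the flash each time a whole page fills. */
static int wbuf_write(struct wbuf *w, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len) {
        size_t n = NAND_PAGE_SIZE - w->len;

        if (n > len)
            n = len;
        memcpy(w->data + w->len, p, n);
        w->len += n;
        p += n;
        len -= n;

        if (w->len == NAND_PAGE_SIZE) {
            int ret = w->write_page(w->page, w->data);

            if (ret)
                return ret;  /* data is still in w->data for recovery */
            w->page++;
            w->len = 0;
        }
    }
    return 0;
}

/* Fake flash for demonstration: just counts page program operations. */
static unsigned int pages_programmed;

static int fake_write_page(unsigned int page, const unsigned char *buf)
{
    (void)page;
    (void)buf;
    pages_programmed++;
    return 0;
}
```

With this, ten 48-byte nodes written back to back cause no flash writes
at all; the eleventh fills the page and triggers exactly one program
operation, leaving 16 bytes in the cache.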

This approach gives you a couple of interesting problems to solve, but
they're not that bad. First, you have to ensure that you don't actually
erase an older node which is obsoleted by a node that's still in the write
cache. You have to make sure the new node is entirely on the medium before
erasing the old node which it obsoletes. That's not too hard to achieve - if
the new node that finally obsoletes an old block is still in the write
cache, you don't stick that eraseblock on the erase_pending_list
immediately, you stick it on a new erase_pending_wbuf list, and the code
that flushes the write buffer moves it to the erase_pending list when it's
done. 
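
Roughly, in code. The structures are hypothetical simplifications, and
the wbuf_dirty flag is a deliberately conservative assumption: it defers
the erase whenever anything at all is in the write cache, rather than
checking whether the obsoleting node itself is.

```c
#include <stddef.h>

struct eraseblock {
    struct eraseblock *next;
};

struct fs_state {
    struct eraseblock *erase_pending_list;      /* safe to erase now */
    struct eraseblock *erase_pending_wbuf_list; /* wait for wbuf flush */
    int wbuf_dirty;  /* nonzero while the write cache holds unflushed data */
};

static void push(struct eraseblock **list, struct eraseblock *blk)
{
    blk->next = *list;
    *list = blk;
}

/* Called when the last live node in 'blk' has just been obsoleted. */
static void block_became_obsolete(struct fs_state *c, struct eraseblock *blk)
{
    if (c->wbuf_dirty)
        push(&c->erase_pending_wbuf_list, blk);
    else
        push(&c->erase_pending_list, blk);
}

/* Called after the write cache has reached the flash successfully. */
static void wbuf_flushed(struct fs_state *c)
{
    while (c->erase_pending_wbuf_list) {
        struct eraseblock *blk = c->erase_pending_wbuf_list;

        c->erase_pending_wbuf_list = blk->next;
        push(&c->erase_pending_list, blk);
    }
    c->wbuf_dirty = 0;
}
```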

Secondly, you need to deal with write errors when flushing the write cache.
You haven't lost any data - either your node is entirely in the write cache
and can just be written out elsewhere, or only the end of it is in the write
cache and you know that the beginning of it is on the flash OK - in which 
case you can read it back and write the whole thing out elsewhere.
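
To illustrate the two cases, here is a sketch of reassembling the node
after a failed flush. struct pending_node and rebuild_node() are
hypothetical, and a plain byte array stands in for reading the flash:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the node caught by the failed flush. */
struct pending_node {
    size_t totlen;      /* full length of the node */
    size_t on_flash;    /* leading bytes already written successfully */
    size_t flash_ofs;   /* where those leading bytes live on the flash */
};

/*
 * Rebuild the whole node in 'out' (at least totlen bytes). 'flash'
 * stands in for a flash read; 'wbuf_tail' points at the node's bytes
 * that are still sitting in the write cache.
 */
static size_t rebuild_node(const struct pending_node *n,
                           const unsigned char *flash,
                           const unsigned char *wbuf_tail,
                           unsigned char *out)
{
    /* Head: read back whatever made it to the flash (may be nothing). */
    memcpy(out, flash + n->flash_ofs, n->on_flash);
    /* Tail: still intact in RAM, because the flush did not complete. */
    memcpy(out + n->on_flash, wbuf_tail, n->totlen - n->on_flash);
    return n->totlen;
}
```

When on_flash is zero the node was entirely in the write cache and this
degenerates to a plain copy; either way the result can then be written
out to a different eraseblock.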

Oh, and you'll need to implement fsync() I suppose. That won't be hard
either - you can just check whether any nodes belonging to the inode
being synced are currently in the write cache. When you need to flush the
write cache to honour fsync() I think it's probably best just to trigger
garbage collection to _fill_ the write cache and make it get flushed
naturally, rather than padding the block. 
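
The "does this inode have anything in the write cache" check could be as
simple as the sketch below. The names and the fixed bound of 16 inodes
are hypothetical, picked only to keep the example small:

```c
#include <stddef.h>

#define WBUF_MAX_INODES 16   /* hypothetical bound for this sketch */

struct wbuf_owners {
    unsigned int ino[WBUF_MAX_INODES];  /* inodes with nodes in the cache */
    size_t count;
};

/* Remember that 'ino' has a node in the write cache. */
static int wbuf_note_inode(struct wbuf_owners *o, unsigned int ino)
{
    size_t i;

    for (i = 0; i < o->count; i++)
        if (o->ino[i] == ino)
            return 0;           /* already recorded */
    if (o->count == WBUF_MAX_INODES)
        return -1;              /* caller should flush the cache first */
    o->ino[o->count++] = ino;
    return 0;
}

/* fsync() only needs to force a flush if this returns nonzero. */
static int wbuf_holds_inode(const struct wbuf_owners *o, unsigned int ino)
{
    size_t i;

    for (i = 0; i < o->count; i++)
        if (o->ino[i] == ino)
            return 1;
    return 0;
}

/* Called once the cache has been written out to the flash. */
static void wbuf_owners_clear(struct wbuf_owners *o)
{
    o->count = 0;
}
```

fsync() on an inode that wbuf_holds_inode() doesn't know about can
return straight away; otherwise it has to get the cache flushed, for
instance by kicking garbage collection as suggested above.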

You'll need to move the CLEANMARKER node into the 'spare' area of your 
flash chip, which is trivial to do.

Altogether, it doesn't seem quite as complicated as once I thought it 
was.... is that because I've missed out something obvious?

--
dwmw2


