
Re: JFFS2: powerfailtesting again

LANCE_N@xxxxxxx.COM said:
>  Processor type, speed, and bus size can have a huge effect on GC
> times. One thing that may be obscuring this effect is that much of the
> development and evaluation is done using PCs with plenty of memory and
> power.  We are running with relatively small memory and low power - 60
> MHz ARM7 with 16 bit data bus to FLASH and DRAM.

That would surprise me. I believe Axis' ETRAX CPU, used in many of their 
devices, is similarly powered. I'm not sure about the box for which Alex 
did the 2.4 port, but the 2.2 port and all the work Red Hat did on it was 
aimed at a StrongARM box.

The JFFS2 rewrite was done for a project using a fairly slow MIPS box. 

However, on my part at least, a lot more attention has been paid to 
correctness, reliability and write-avoidance; performance was a secondary 
consideration.

>  At this point, the system was booted, and it took 44 seconds to mount
> the file system (a full GC is normally done during a mount)!  

Shouldn't be. We should GC in the background, not while the user(space) is
waiting - except to the extent that we _have_ to do so when necessary to
satisfy write requests.

> As mentioned above, JFFS1 is very inefficient space-wise for storing
> files which are frequently updated with small writes

In JFFS2, we've shifted some data out of inode nodes into separate dirent
nodes to reduce the size of the former. We also write _pages_ at a time. So
if you append to a file, the GC doesn't have to merge the data; it'll be
merged in the write call, which will rewrite data from the beginning of the
changed page, obsoleting the original version of the page. I've since fixed
the original reason for doing that - just haven't got round to optimising it
out yet. It would be interesting to know if it's a win.

What we probably ought to do is rewrite the page iff the previous node for
that page isn't in the same erase block as the node we're writing. 
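
A minimal sketch of that rule, not the actual JFFS2 code. The function
names, the offsets and the erase block size parameter are all hypothetical
and exist only to make the condition concrete:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index of the erase block that contains a given flash offset. */
static uint32_t erase_block_of(uint32_t flash_offset, uint32_t erase_block_size)
{
    assert(erase_block_size > 0);
    return flash_offset / erase_block_size;
}

/*
 * Decide whether a write should rewrite the page from its beginning
 * (obsoleting the previous node) or only write the changed bytes.
 *
 * Rule from the text: rewrite iff the previous node for the page is NOT
 * in the same erase block as the node about to be written.
 */
static bool should_rewrite_whole_page(bool have_prev_node,
                                      uint32_t prev_node_offset,
                                      uint32_t new_node_offset,
                                      uint32_t erase_block_size)
{
    if (!have_prev_node)
        return false; /* no earlier node for this page, nothing to merge */

    return erase_block_of(prev_node_offset, erase_block_size) !=
           erase_block_of(new_node_offset, erase_block_size);
}
```

The idea is that when both nodes sit in the same erase block they will be
garbage-collected together anyway, so merging them at write time costs
flash writes without saving GC work.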

>   The problem is that as GC  moves through the file system, the wear
> leveling causes all the big, unchanging data to  have to move.  One
> way around this is to have 2 volumes - 1 for large unchanging files
> (which will rarely GC) and one sized optimally for the frequently
> changing files.  The trade-off here is overhead. 

You can do it without that overhead. Treat sectors individually. On looking 
for one to GC, mostly you pick a dirty one. Just occasionally you pick a 
clean one, for wear levelling. This is what JFFS2 does.
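
As an illustration only, the selection policy might look like the sketch
below. The struct, the pick counter and the 1-in-128 interval are
assumptions made for the example, not values taken from JFFS2:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-sector bookkeeping. */
struct sector {
    uint32_t dirty_bytes; /* bytes occupied by obsolete nodes */
};

/* Hypothetical ratio: one pick in every CLEAN_PICK_INTERVAL is a clean sector. */
#define CLEAN_PICK_INTERVAL 128u

/*
 * Return the index of the sector to garbage-collect next, or -1 if there
 * is nothing worth collecting. pick_counter is a running count of picks
 * made so far; the caller increments it after each call.
 */
static int pick_gc_sector(const struct sector *s, size_t n, uint32_t pick_counter)
{
    assert(s != NULL || n == 0);

    int want_clean =
        (pick_counter % CLEAN_PICK_INTERVAL) == CLEAN_PICK_INTERVAL - 1;
    int best_dirty = -1;
    int first_clean = -1;

    for (size_t i = 0; i < n; i++) {
        if (s[i].dirty_bytes == 0) {
            if (first_clean < 0)
                first_clean = (int)i;
        } else if (best_dirty < 0 ||
                   s[i].dirty_bytes > s[best_dirty].dirty_bytes) {
            best_dirty = (int)i; /* dirtiest sector seen so far */
        }
    }

    /* Occasionally move clean data so static files share the wear. */
    if (want_clean && first_clean >= 0)
        return first_clean;

    /* Normal case: reclaim the sector with the most obsolete data. */
    return best_dirty;
}
```

A real implementation would more likely use a random or time-based trigger
than a counter; the counter just keeps the sketch deterministic.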

>  JFFS1 requires around 2.5 free sectors.  Consider a device with a
> single 1 MB Flash. 2.5 sectors is around 160K, or 16% of the entire
> device!  

I increased that to 4 free sectors. People were running out of space with 
only 2. Performance vs. correctness. 4 sectors on certain arrangements of 
chips can be 1MiB. It sucks, I know.
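
To put numbers on the reserve, here is a small sketch of the arithmetic.
The helper names are hypothetical, and the 256KiB effective erase block
used in the note below is an assumed figure that makes 4 sectors come to
1MiB:

```c
#include <assert.h>
#include <stdint.h>

/* Flash held back as GC headroom, in bytes. */
static uint64_t reserve_bytes(uint32_t sectors, uint32_t erase_block_bytes)
{
    return (uint64_t)sectors * erase_block_bytes;
}

/* Reserve as parts per thousand of the whole device, rounded down. */
static uint32_t reserve_permille(uint32_t sectors, uint32_t erase_block_bytes,
                                 uint64_t device_bytes)
{
    assert(device_bytes > 0);
    return (uint32_t)(reserve_bytes(sectors, erase_block_bytes) * 1000u /
                      device_bytes);
}
```

With 256KiB erase blocks, reserve_bytes(4, 256 * 1024) is exactly 1MiB.
The reserve scales with erase block size rather than device size, which is
why it hurts most on small devices built from large-block chips.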

> As you can imagine, seeing discussions of 4+ free sectors required for
> JFFS2 has caused some anxiety and we are hopeful that when the dust
> settles, it will prove to require no more than JFFS1 (and hopefully
> less)!

Currently 5 sectors. That sucks too, but at least I have better ideas on 
how to reduce it :) I'm just being paranoid at the moment and leaving it 
high. I believe I've eliminated every circumstance in which the nodes can 
expand when garbage-collected. Therefore, we should be able to cut that 
down to one block, assuming perfectly reliable devices - two blocks to be 
safer. Prove I'm right and I'll do it tomorrow. Otherwise it's on the TODO 
list - talk to me in diff -u form or talk to the nice people at 
embedded-sales@xxxxxxx. :)

Seriously though - this would be a useful thing for you to concentrate on,
if you're so inclined.

>  One trade-off (separate volumes for frequent and infrequently updated
> files) is mentioned above.

I believe we have a slightly nicer optimisation for that particular 
problem, as mentioned above, and it's already working in JFFS2.

> Another is to force GC frequently and during non-critical times to 
> minimize worse-case GC times, to minimize variation in GC times, and
> to minimize the  probability that a user will see any delays at all.

We do GC from a kernel thread, starting at some heuristic threshold of
dirty/free space. This is for precisely that reason. GC is done in user context
only when it's absolutely necessary. GC is done slightly ahead of time by
the kernel thread. We have taken care to ensure that the thread is 
optional. You can SIGSTOP it and SIGCONT it later, or just SIGKILL it. 
You'll then have all GC done just in time to make space for writes. 
There's currently no mount option to prevent the thread from being created, 
and no way to start it if you'd SIGKILL'd it. Patches accepted. 
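
The split between background and just-in-time GC can be sketched roughly
as follows. The two thresholds and all names are hypothetical; the real
heuristic is not spelled out here, so treat this as one plausible shape
for it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning values, chosen only for illustration. */
#define RESERVED_SECTORS 5u /* at or below this, a writer must GC first */
#define BG_START_SECTORS 8u /* below this, the thread starts collecting */

enum gc_action {
    GC_NONE,       /* enough free space, or nothing to reclaim */
    GC_BACKGROUND, /* kernel thread collects slightly ahead of demand */
    GC_IN_WRITER   /* GC runs in user context, just in time for a write */
};

static enum gc_action decide_gc(uint32_t free_sectors, uint32_t dirty_sectors,
                                bool thread_alive)
{
    assert(RESERVED_SECTORS < BG_START_SECTORS);

    if (dirty_sectors == 0)
        return GC_NONE; /* nothing obsolete to reclaim */

    /* Out of headroom: the writer has to make space itself. */
    if (free_sectors <= RESERVED_SECTORS)
        return GC_IN_WRITER;

    /* Headroom is shrinking: let the thread work ahead of time,
     * but only if it has not been stopped or killed. */
    if (thread_alive && free_sectors < BG_START_SECTORS)
        return GC_BACKGROUND;

    return GC_NONE;
}
```

With the thread stopped or killed, the middle case never fires, so all
collection falls through to the writer once the reserve is reached.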

The original versions of JFFS would garbage collect _fully_ immediately 
after every write, and after mounting. I believe(d) that to be fairly 
suboptimal, which is why we introduced the thread.

> Hope this is helpful.

It is, thank you.

