
Further optimising mount time.



I've just started to play with JFFS2 on NAND flash. It's taking about 30
seconds to mount a 20%-full file system from a 144MiB DiskOnChip 2000.

We've already reduced mount time considerably by postponing the building
up of node lists for each inode till after the mount has completed, thus
avoiding all the CRC32 checking etc. But we still have to physically
scan the flash, and that's what's taking the time.

To make JFFS2 scale to larger devices, we really do need to be able to
avoid that. I'm pondering the idea of writing a 'tailer' to each
eraseblock as we finish writing to it and pick another empty one to
start writing to. This tailer would need to convey all the information
which is required by the scan/mount code.

Currently that is quite simple for data nodes -- it's just the address
and length of each physical node and the number of the inode to which it
belongs. For directory entry nodes it's more complex -- during the
scan of the medium, we actually build up the entire child-dirent tree as if the
directory were being opened, and during the second stage of the mount we
increase nlink for every valid dirent accordingly, then delete those
inodes which we observe have nlink of zero.

Doing it that way basically means that we need _all_ the information
from the dirent to be available during the mount, and means that either
we'd need to go and read the dirents themselves after reading the node
tailer, or reproduce all that information in the tailer. 

Ideally, I want to be able to avoid that, and just treat dirent nodes
the same way as we can data nodes -- just say where they are and which
inode they belong to, and then sort it out later.

We have already moved the checking of crc32 on data nodes and obsoletion
of old and broken nodes to jffs2_do_read_inode_internal(), where it's
done the first time the inode is opened. We made the GC code touch every
inode on the system after mount to ensure this gets done.

It should be possible to move the nlink-counting and deletion of
zero-nlink inodes to the GC code too -- when a directory is first
opened, you increase nlink on all its children. Then when the GC thread
has finished opening every inode on the medium, we know that our nlink
counts are complete, and it can then do a pass removing inodes with
nlink == zero.
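
A minimal sketch of that deferred counting, in plain userspace C rather
than kernel code. Every name here (sketch_inode, sketch_dirent,
sketch_open_dir, sketch_reap_unlinked) is hypothetical and is not the
real JFFS2 structure or API; the sketch also assumes the root directory
is inode #1 and is never reaped, since no dirent points at it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ROOT_INO 1u  /* assumption: root is ino #1, never reaped */

/* Hypothetical in-core inode record (not the real JFFS2 structure). */
struct sketch_inode {
    uint32_t ino;
    uint32_t nlink;            /* links counted so far */
    bool     present;          /* false once the inode has been removed */
    bool     children_counted; /* directory already opened once */
};

/* Hypothetical dirent: a link from a directory to a child inode. */
struct sketch_dirent {
    uint32_t parent_ino;
    uint32_t child_ino;
};

static struct sketch_inode *sketch_find(struct sketch_inode *tbl, size_t n,
                                        uint32_t ino)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].present && tbl[i].ino == ino)
            return &tbl[i];
    return NULL;
}

/* First open of a directory: credit one link to each of its children.
 * Later opens are no-ops, so nothing is counted twice. */
static void sketch_open_dir(struct sketch_inode *tbl, size_t n_ino,
                            const struct sketch_dirent *de, size_t n_de,
                            uint32_t dir_ino)
{
    assert(tbl != NULL && de != NULL);
    struct sketch_inode *dir = sketch_find(tbl, n_ino, dir_ino);
    if (dir == NULL || dir->children_counted)
        return;
    for (size_t i = 0; i < n_de; i++) {
        if (de[i].parent_ino != dir_ino)
            continue;
        struct sketch_inode *child = sketch_find(tbl, n_ino, de[i].child_ino);
        if (child != NULL)
            child->nlink++;
    }
    dir->children_counted = true;
}

/* Only valid once every directory has been opened: drop inodes that
 * nobody links to. Returns the number of inodes removed. */
static size_t sketch_reap_unlinked(struct sketch_inode *tbl, size_t n_ino)
{
    assert(tbl != NULL);
    size_t reaped = 0;
    for (size_t i = 0; i < n_ino; i++) {
        if (!tbl[i].present || tbl[i].ino == SKETCH_ROOT_INO)
            continue;
        if (tbl[i].nlink == 0) {
            tbl[i].present = false;
            reaped++;
        }
    }
    return reaped;
}
```

In this sketch the reap pass is only correct after sketch_open_dir() has
run for every directory; before that, a zero or low nlink may simply mean
that the directory holding the link hasn't been opened yet.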

The problem with this approach is that it means nlink is inaccurate
until the GC thread has finished its complete pass. You may stat a file
and observe that it has nlink == 1, but in fact there's _another_ link
to it from another directory which hasn't been looked at yet. 

Other than forcing the mount to wait while we go through the entire
dirent tree, I see two possibilities for dealing with this. Either we
could just deal with it being inaccurate briefly, or we try to keep
nlink as part of the _inode_ metadata, which I'm slightly reluctant to
do for two reasons:
   - It involves an incompatible change to the datanode struct.
   - It's redundant information and can hence be inconsistent. Whether
     we write the metadata node with changed nlink before or after the
     dirent pointing to the inode in question, if you lose power in
     between you still have an inconsistent state on the medium. So 
     we'd still need to do the tree-walk _anyway_ to ensure complete
     consistency.

Once that's sorted, and we just have to associate each physical node
with the inode to which it belongs, the eraseblock tailer can be
something really simple -- a table of the inode numbers which are seen
in this eraseblock, followed by a set of tuples of 
	{ length, index into ino# table }
for each physical node. There'll be a handful of 'magic' indices which
indicate obsolete nodes or nodes which don't belong to any particular
inode, such as cleanmarker nodes, fs-options nodes, etc. 
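
One way such a tailer might look in memory, as a hedged sketch rather
than a proposed on-medium format. The field widths, the table limits,
the two magic index values and all the tailer_* names are assumptions
made for illustration. It further assumes that nodes are packed back to
back from the start of the eraseblock and that each length includes any
padding, so a node's offset can be recovered by summing the lengths
before it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAILER_MAX_INOS        64      /* hypothetical limits */
#define TAILER_MAX_NODES       256
#define TAILER_IDX_OBSOLETE    0xFFFFu /* hypothetical 'magic' indices */
#define TAILER_IDX_NO_INODE    0xFFFEu /* e.g. cleanmarker nodes */
#define TAILER_IDX_FIRST_MAGIC TAILER_IDX_NO_INODE

/* One { length, index into ino# table } tuple per physical node. */
struct tailer_node {
    uint32_t length;
    uint16_t ino_idx;
};

struct tailer {
    uint32_t ino_table[TAILER_MAX_INOS]; /* inode numbers seen in block */
    size_t   n_inos;
    struct tailer_node nodes[TAILER_MAX_NODES];
    size_t   n_nodes;
};

/* Record a node belonging to 'ino'. Returns 0, or -1 if a table is full. */
static int tailer_add_node(struct tailer *t, uint32_t ino, uint32_t length)
{
    size_t i;

    assert(t != NULL);
    if (t->n_nodes == TAILER_MAX_NODES)
        return -1;
    for (i = 0; i < t->n_inos; i++)
        if (t->ino_table[i] == ino)
            break;
    if (i == t->n_inos) {               /* first node for this inode */
        if (t->n_inos == TAILER_MAX_INOS)
            return -1;
        t->ino_table[t->n_inos++] = ino;
    }
    t->nodes[t->n_nodes].length = length;
    t->nodes[t->n_nodes].ino_idx = (uint16_t)i;
    t->n_nodes++;
    return 0;
}

/* Record an obsolete node or one that belongs to no inode. */
static int tailer_add_special(struct tailer *t, uint16_t magic_idx,
                              uint32_t length)
{
    assert(t != NULL && magic_idx >= TAILER_IDX_FIRST_MAGIC);
    if (t->n_nodes == TAILER_MAX_NODES)
        return -1;
    t->nodes[t->n_nodes].length = length;
    t->nodes[t->n_nodes].ino_idx = magic_idx;
    t->n_nodes++;
    return 0;
}

/* Offset of node n within the eraseblock: sum of the earlier lengths. */
static uint32_t tailer_node_offset(const struct tailer *t, size_t n)
{
    uint32_t off = 0;

    assert(t != NULL && n < t->n_nodes);
    for (size_t i = 0; i < n; i++)
        off += t->nodes[i].length;
    return off;
}

/* Returns 1 and stores the inode number if node n belongs to an inode,
 * or 0 if it carries one of the magic indices. */
static int tailer_node_ino(const struct tailer *t, size_t n, uint32_t *ino)
{
    assert(t != NULL && ino != NULL && n < t->n_nodes);
    uint16_t idx = t->nodes[n].ino_idx;
    if (idx >= TAILER_IDX_FIRST_MAGIC)
        return 0;
    *ino = t->ino_table[idx];
    return 1;
}
```

Storing a small index instead of the full inode number in each tuple
keeps the per-node cost down when many nodes in one eraseblock belong to
the same few inodes.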

That should be fairly small and easy to write out to the end of each
eraseblock, then the scan code just needs to look at the _end_ of the
block first, eat state from that if it's there, and otherwise fall back
to scanning the old way.
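
The "look at the end first, else fall back" decision could be sketched
like this. The 8-byte footer (a little-endian tailer length followed by
a magic word), the magic value itself and the sketch_* names are all
invented for illustration; nothing here is an existing JFFS2 format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TAILER_MAGIC 0x5441494Cu /* hypothetical marker value */
#define SKETCH_FOOTER_SIZE  8u          /* length word + magic word */

enum sketch_scan { SKETCH_SCAN_TAILER, SKETCH_SCAN_FULL };

static uint32_t sketch_get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void sketch_put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Writer side: stamp the footer into the last 8 bytes of the block.
 * 'tailer_len' is the size of the whole tailer, footer included. */
static void sketch_write_footer(uint8_t *block, size_t block_size,
                                uint32_t tailer_len)
{
    assert(block != NULL && block_size >= SKETCH_FOOTER_SIZE);
    sketch_put_le32(block + block_size - 8, tailer_len);
    sketch_put_le32(block + block_size - 4, SKETCH_TAILER_MAGIC);
}

/* Scan side: trust the tailer only if the magic is present and the
 * recorded length is plausible; otherwise scan the block the old way. */
static enum sketch_scan sketch_scan_block(const uint8_t *block,
                                          size_t block_size)
{
    assert(block != NULL && block_size >= SKETCH_FOOTER_SIZE);
    uint32_t tailer_len = sketch_get_le32(block + block_size - 8);
    uint32_t magic = sketch_get_le32(block + block_size - 4);

    if (magic == SKETCH_TAILER_MAGIC &&
        tailer_len >= SKETCH_FOOTER_SIZE && tailer_len <= block_size) {
        /* The tailer body would be parsed from
         * block + block_size - tailer_len at this point. */
        return SKETCH_SCAN_TAILER;
    }
    return SKETCH_SCAN_FULL;
}
```

An erased block reads back as all 0xFF, so it never matches the magic
and takes the full-scan path. A magic word alone is a weak check; a
checksum over the tailer would probably be wanted too, so that a tailer
cut short by power loss isn't trusted.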

Comments, ideas?

-- 
dwmw2

