
Re: Killing JFFS under 2.2



1. scan_flash should never fail, no matter how many bugs there are in the
writing algorithm. It is OK to skip corrupt nodes, of course, but it should
_always_ catch on to correct, correctly aligned nodes.

Note that this requires the code that inserts "dirty" alloc'ed areas
between the last good node and the new good node to be correct, of course.
Was that the mount problem?
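To make point 1 concrete, here is a minimal user-space sketch of a scan loop that skips corrupt or dirty bytes but always resynchronizes on a correctly aligned good node. The magic value, the `node_hdr` layout, and the function names are invented for this sketch; they are not the real JFFS on-flash format.

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Hypothetical on-flash node header -- illustrative only,
   not the real struct jffs_raw_inode. */
#define NODE_MAGIC 0x34383931u   /* arbitrary magic for this sketch */
#define ALIGN      4             /* nodes assumed word-aligned      */

struct node_hdr {
    uint32_t magic;
    uint32_t data_len;
};

/* Scan `flash` of `size` bytes; count every well-formed node, skipping
   corrupt bytes in between instead of aborting the whole scan. */
static int scan_flash(const unsigned char *flash, size_t size)
{
    size_t off = 0;
    int good = 0;

    while (off + sizeof(struct node_hdr) <= size) {
        struct node_hdr h;
        memcpy(&h, flash + off, sizeof h);

        if (h.magic == NODE_MAGIC &&
            off + sizeof h + h.data_len <= size) {
            good++;                            /* caught a good node */
            off += sizeof h + h.data_len;
            /* round up to the next node alignment boundary */
            off = (off + ALIGN - 1) & ~(size_t)(ALIGN - 1);
        } else {
            off += ALIGN;      /* corrupt/dirty area: skip and resync */
        }
    }
    return good;
}
```

The key property is in the `else` branch: garbage never makes the scan fail, it only advances the cursor one alignment step until the next valid header lines up.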

2. Where is the schedule point between entering JFFS and physically
writing anything to flash? Where can interleaving of tasks matter?

What I mean is: how can one task get into JFFS and physically write a
node _while_ another task is inside and has areas allocated but not yet
written?
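The (allocate, write) mutex that David suggests below would close exactly that window. A user-space sketch, with pthreads standing in for a kernel mutex; `flash_lock`, `write_node`, and the flat `flash` array are illustrative stand-ins, not the actual JFFS code:

```c
#include <pthread.h>
#include <string.h>
#include <assert.h>

/* One lock serializes (allocate, write) so no other thread can slip a
   node into the gap between our reservation and our physical write.
   All names here are invented for this sketch. */
static pthread_mutex_t flash_lock = PTHREAD_MUTEX_INITIALIZER;
static size_t write_head;           /* next free offset on "flash" */
static unsigned char flash[4096];   /* stand-in for the flash device */

/* Reserve `len` bytes and copy the node in, atomically with respect to
   other writers.  Returns the offset the node was written at. */
static size_t write_node(const void *node, size_t len)
{
    size_t off;

    pthread_mutex_lock(&flash_lock);
    off = write_head;                 /* allocate ...                  */
    write_head += len;
    memcpy(flash + off, node, len);   /* ... and write, under one lock */
    pthread_mutex_unlock(&flash_lock);

    return off;
}
```

With the lock held across both steps, losing the CPU between allocation and write (as in the interleaving diagram below) can no longer leave an unwritten hole in front of later nodes; the alternative fix, funnelling all writes through one kernel thread, serializes them the same way without a shared lock.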

-bjorn

On Thu, 27 Jul 2000, Finn Hakansson wrote:

> 
> Hi David and Sébastien,
> 
> Hmmm... I haven't encountered this problem before.
> 
> I would prefer the solution with the mutex around the allocate and the
> write.
> 
> The latest patch, where you mark the space as dirty, is more of a quick
> hack. It doesn't address the real reason why the write failed. And what
> if we end up with a very large "hole" on the flash? (Due to many threads
> trying to write at the same time.) Then we could lose a significant part
> of the on-flash space, which could perhaps prevent JFFS from doing a GC.
> There are more things that could go wrong. As I see it, this latter
> solution has many more pitfalls.
> 
> What do you say?
> 
> /Finn
> 
> 
> On Wed, 26 Jul 2000, David Woodhouse wrote:
> 
> > 
> > scote1@xxxxxxx.com said:
> > >  Looking at your log, I'll have to agree with your GC assertion.
> > > After the 2 "Cool stuff's happening!", we should have a
> > > jffs_garbage_collect_next(): "libncurses.so.4.2", ino:42, version: 25
> > 
> > > Right where the problem happens!  Maybe we should take a look at
> > > jffs_rewrite_data()
> > 
> > I think it's a write of a new node which is getting interleaved with the GC.
> > 
> > Something like:
> > 
> > 	Thread 1 (GC)			Thread 2 (write)
> > 	-------------			----------------
> > 
> > 	allocate space for ino:42,v25
> > 	write ino:42, v25
> > 					allocate space for new 8K node
> > 					lose CPU somehow
> > 	allocate space for ino:42,v26
> > 	write ino:42, v26
> > 	... write lots more....
> > 
> > 					... still waiting for CPU...
> > 					... want to write new 8K node 
> > 						which was alloc'd ...
> > 				POWER OFF
> > 
> > 
> > If I'm right, then I see two possible fixes - either stick a big mutex 
> > around the (allocate,write) code, or have all node writes done sequentially 
> > from a single kernel thread.
> > 
> > --
> > dwmw2
> > 