
arch/cris/arch-v32/drivers/nandflash.c problems and resolution

Recently I noticed that JFFS reports read block errors when the flash is accessed while data is being transferred over DMA. Going back to our older firmware showed that the problem had always been there; we just did not notice it because the flash is usually read before the DMA transfers (in our case, video data from the FPGA) take place. After experimenting with the wait states for the flash (reg_bif_core_rw_grp3_cfg), I noticed that the longer I made those wait states, the worse things got. With the maximum wait states, the NAND could read just the first block, even without any DMA.

It still took me a couple of days (and an oscilloscope in addition to software tools) to understand where the problem comes from. Most of the NAND flash control signals (CE, CLE, ALE) are driven by general-purpose I/O pins, but RD and WR are different: they are system bus signals and so have to wait for other activity on the bus. When a flash page is to be written, a sequence of steps initiates the process:

1 - Set CE low (chip enable active) and CLE (Command Latch Enable) high.
2 - Write the command byte over the system bus; that cycle puts the data on the bus and sends a pulse on the WE pin (an atomic action).
3 - Set CLE low and ALE (Address Latch Enable) high.
4 - Write the first address byte over the system bus, pulsing the WE pin again.
and so on ...
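
The first four steps can be sketched as a small host-side model in C. This is only an illustration, not the driver code: the struct, the function names and the 'C'/'A'/'D' log are all made up for the example, and bus_write() is assumed to complete at once, which is exactly the assumption that turns out to be wrong on the real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the three GPIO-driven control pins. */
struct nand_pins {
    int ce_n;  /* chip enable, active low */
    int cle;   /* command latch enable */
    int ale;   /* address latch enable */
};

static struct nand_pins pins = { 1, 0, 0 };
static char seen[16];  /* what the chip latched: 'C' command, 'A' address, 'D' data */
static int  seen_len;

/* One bus write cycle that completes immediately (an idealization):
 * the chip looks at CLE/ALE during the WE pulse and latches the byte
 * as a command, an address or plain data. */
static void bus_write(uint8_t value)
{
    (void)value;                       /* the byte itself is not modeled */
    if (pins.ce_n)
        return;                        /* chip not selected, nothing latched */
    assert(seen_len < (int)sizeof(seen) - 1);
    seen[seen_len++] = pins.cle ? 'C' : pins.ale ? 'A' : 'D';
    seen[seen_len] = '\0';
}

/* Steps 1 to 4 of the list above; returns the log of latched cycles. */
const char *start_page_program(uint8_t cmd, uint8_t addr0)
{
    seen_len = 0;
    seen[0] = '\0';
    pins.ce_n = 0; pins.cle = 1; pins.ale = 0;  /* step 1 */
    bus_write(cmd);                             /* step 2 */
    pins.cle = 0; pins.ale = 1;                 /* step 3 */
    bus_write(addr0);                           /* step 4 */
    return seen;
}
```

Calling start_page_program(0x80, 0x00) in this model yields the log "CA": one command cycle followed by one address cycle, which is what the flash chip is supposed to see.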

The problem is that the program does not wait for step 2 to finish before going on to step 3 (it only initiates the write cycle, which the hardware then performs). The write can be delayed while waiting for the bus, which happens with DMA, or stretched by long wait states (in the same way, the write pulse of step 4 may be delayed by a long write pulse from step 2). That makes it possible for step 3 (or a similar later step) to start before the end of the automatically generated write pulse of step 2. The NAND flash chip then interprets that cycle not as a command cycle but as an address cycle.
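
The race can be reproduced in a toy model, again only as a sketch with made-up names. It assumes the worst case: a write that has been initiated does not produce its WE pulse until the next bus access pushes it out, while GPIO changes take effect immediately. The chip is modeled as sampling CLE/ALE at the moment the pulse actually happens.

```c
#include <assert.h>
#include <stdint.h>

struct nand_pins { int cle, ale; };  /* CE is assumed to be low already */

static struct nand_pins pins;
static int  write_pending;  /* write initiated, WE pulse not yet done */
static char seen[8];        /* 'C' command, 'A' address, 'D' data */
static int  seen_len;

/* The delayed WE pulse finally happens: the chip samples CLE/ALE as
 * they are NOW, not as they were when the write was initiated. */
static void bus_complete(void)
{
    if (!write_pending)
        return;
    assert(seen_len < (int)sizeof(seen) - 1);
    seen[seen_len++] = pins.cle ? 'C' : pins.ale ? 'A' : 'D';
    seen[seen_len] = '\0';
    write_pending = 0;
}

/* The CPU only initiates the write and carries on. Bus cycles stay in
 * order, so a new access first lets the previous one finish. */
static void bus_write_posted(uint8_t value)
{
    (void)value;
    bus_complete();
    write_pending = 1;
}

/* Steps 1 to 4, with or without waiting for the bus after step 2. */
const char *run_sequence(int wait_for_bus)
{
    seen_len = 0; seen[0] = '\0'; write_pending = 0;
    pins.cle = 1; pins.ale = 0;   /* step 1 */
    bus_write_posted(0x80);       /* step 2: only initiated */
    if (wait_for_bus)
        bus_complete();           /* CPU held until the pulse is over */
    pins.cle = 0; pins.ale = 1;   /* step 3: GPIO changes at once */
    bus_write_posted(0x00);       /* step 4 */
    bus_complete();
    return seen;
}
```

In this model run_sequence(0) logs "AA" (the command byte is latched as an address), while run_sequence(1) logs the intended "CA".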

The authors of arch/cris/arch-v32/drivers/nandflash.c probably ran into this problem and disabled interrupts in the crisv32_hwcontrol() function, but that could not solve it completely: it only added some computational delay to the function and made it less likely for code execution to run ahead of the bus writes.

I used a different approach: after each bus write I added a dummy read from an unused I/O device (CSR1 in our case), so the program has to wait for the actual bus operation to complete, even if that takes as long as a DMA burst.
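
In code, the idea looks roughly like the helper below. It is a sketch, not the actual patch: nand_write_sync() and its parameters are hypothetical names, and the two pointers stand for the NAND data window and some harmless register on an unused chip select (CSR1 in our case). It relies on the behavior observed above, that the CPU cannot get past the read until the bus has finished the write issued before it.

```c
#include <stdint.h>

/* Hypothetical helper: write one byte to the NAND data window, then do
 * a dummy read from an unused bus device so the CPU cannot run ahead. */
void nand_write_sync(volatile uint8_t *nand_io,
                     const volatile uint32_t *dummy_reg,
                     uint8_t value)
{
    *nand_io = value;   /* initiates the bus write (WE pulse comes later) */
    (void)*dummy_reg;   /* dummy read: the CPU stalls here until the bus,
                           and with it the write ahead of it, is done */
}
```

Every bus write in the command and address sequence would go through such a helper, so CLE and ALE are only changed after the previous WE pulse has really ended. The volatile qualifiers keep the compiler from dropping the dummy read or moving it ahead of the write.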

With that change the problem is gone completely: no IRQ disabling is needed, there are no problems with DMA, and the NAND flash works nicely with any wait states, including the longest ones that initially caused 100% failure.