[MinnowBoard] Firmware hang

Krau, Michael P michael.p.krau at intel.com
Wed Oct 26 19:25:15 UTC 2016


For quick answers:

1) I would interpret "unstable" as physically unstable (damaged or worn out), but I will let David make the definitive call on this.

2) Garbage collection can only run while the firmware has control of the system, which is during the boot process.  Once the firmware hands off control to an OS, it no longer has priority in the system.  The OS may not take kindly to the firmware 'stealing' execution cycles to garbage collect the NVRAM variable space.

However, if each reboot changes the boot variable space, then each reboot slowly generates garbage and causes writes to the NVRAM.  Depending on the size of those changes, garbage collection could be a fairly regular occurrence (lots of big changes on every boot fill space that then has to be reclaimed).

Also, note that some OSes use the firmware SPI flash to store their boot variables (on every boot), which is another place where the NVRAM could be getting some serious exercise, forcing more wear on the SPI part and causing more garbage collection.
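
To make the point about garbage build-up concrete, here is a minimal sketch of an append-only variable store. It is only a toy model: the class name, the variable name "BootState" and all sizes are made up for illustration and are not taken from the MinnowBoard firmware. It assumes every write appends a new copy of the variable and a reclaim runs only when the region is full.

```python
# Toy model of an append-only NVRAM variable store.
# All names and sizes are illustrative assumptions, not MinnowBoard values.

class VariableStore:
    def __init__(self, capacity):
        self.capacity = capacity  # usable bytes in the variable region
        self.used = 0             # bytes consumed (live copies plus garbage)
        self.live = {}            # variable name -> size of its current copy
        self.reclaims = 0         # number of garbage collections performed

    def reclaim(self):
        """Compact the region so it holds only the live copies."""
        self.used = sum(self.live.values())
        self.reclaims += 1

    def write(self, name, size):
        """Append a new copy of a variable; any older copy becomes garbage."""
        if self.used + size > self.capacity:
            self.reclaim()
        if self.used + size > self.capacity:
            raise MemoryError("variable store full even after reclaim")
        self.used += size
        self.live[name] = size


def reclaims_after(boots, capacity, bytes_per_boot):
    """Count reclaims when every boot rewrites one variable of a given size."""
    store = VariableStore(capacity)
    for _ in range(boots):
        store.write("BootState", bytes_per_boot)
    return store.reclaims


if __name__ == "__main__":
    # Small change per boot: reclaim is rare.
    print(reclaims_after(25, 1000, 100))
    # Large change per boot: reclaim happens on almost every boot.
    print(reclaims_after(10, 1000, 400))
```

In this model, the larger the per-boot change is relative to the region size, the more often a reclaim (and the flash erase that goes with it) has to run.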

Sincerely, 

Michael Krau
 
-----Original Message-----
From: elinux-MinnowBoard [mailto:elinux-minnowboard-bounces at lists.elinux.org] On Behalf Of Sjoerd Simons
Sent: Wednesday, October 26, 2016 12:10 PM
To: MinnowBoard Development and Community Discussion <elinux-minnowboard at lists.elinux.org>
Subject: Re: [MinnowBoard] Firmware hang

On Wed, 2016-10-26 at 08:36 +0000, Wei, David wrote:
> The MinnowBoard MAX UEFI firmware can pass a stress test of 
> thousands of system reset cycles.

We're definitely in the 10 thousands of reset cycles on these boards ;)
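
As a rough check on that figure, using the 10 to 20 hard resets per day over about 2 years mentioned in the earlier message quoted further down (the function name and the 365-day year are just assumptions for this back-of-the-envelope sketch):

```python
# Back-of-the-envelope estimate of hard reset cycles per board.
# Inputs come from the figures in this thread; 365 days/year is an assumption.

DAYS_PER_YEAR = 365

def total_resets(resets_per_day, years):
    """Estimated number of hard resets over the given period."""
    return resets_per_day * DAYS_PER_YEAR * years

low = total_resets(10, 2)
high = total_resets(20, 2)
print(low, high)  # 7300 14600
```

So somewhere around 7,000 to 15,000 hard resets per board, before counting any extra resets within a single test.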

> There are two causes I can think of:
> (1) The SPI flash is already in an unstable state, as we are 
> discussing in another mail thread.

By this you mean the flash has already worn out or?

> (2) The UEFI variable region garbage reclaiming process is interrupted 
> by system reset/shutdown.

Does the GC in UEFI happen on system startup, or is it something that might run later on? These systems do get reset quite often, but never during very early boot, only when already in the actual bootloader or Linux.


> Thanks,
> David  Wei
> 
> 
> -----Original Message-----
> From: elinux-MinnowBoard [mailto:elinux-minnowboard-bounces at lists.elinux.org] On Behalf Of Sjoerd Simons
> Sent: Wednesday, October 26, 2016 3:01 PM
> To: MinnowBoard Development and Community Discussion <elinux-minnowboard at lists.elinux.org>
> Subject: Re: [MinnowBoard] Firmware hang
> 
> On Wed, 2016-10-26 at 01:40 +0000, Wei, David wrote:
> > The log shows that the UEFI "Setup" and "SetupRecovery" variables in 
> > SPI flash are corrupted.
> > What kind of test were you running in lab?
> 
> These boards are in our LAVA lab, running mostly boot tests (part of
> kernelci.org) and project image testing. For each test there will be a 
> hard power off/on event one or more times (remote controlled power 
> socket). I didn't do the math, but I would expect these boards to have 
> seen at least 10 to 20 hard resets every day for the last 2 years or 
> so (with old firmware).
> 
> We don't run any tests which write to the SPI flash, so the only write 
> cycles they should see are those from the firmware itself.
> 
> 
> > Thanks,
> > David  Wei
> > 
> > -----Original Message-----
> > From: elinux-MinnowBoard [mailto:elinux-minnowboard-bounces at lists.elinux.org] On Behalf Of Sjoerd Simons
> > Sent: Tuesday, October 25, 2016 9:42 PM
> > To: elinux-minnowboard at lists.elinux.org
> > Subject: [MinnowBoard] Firmware hang
> > 
> > Hey,
> > 
> > I've been having issues with some of the MinnowBoard MAX boards in 
> > our test lab for a while, where at times they never finish POST.
> > Reflashing the flash seemed to help for a while, until the issues 
> > came back again. With the debug version of the firmware, the last few 
> > lines are:
> > 
> > Gpio_S5_4 value is 0x3
> > Gpio_S5_17 value is 0x3
> > Firmware Volume for Variable Store is corrupted
> > Firmware Volume for Variable Store is corrupted
> > 
> > ASSERT_EFI_ERROR (Status = Not Found)
> > ASSERT m:\Vlv2TbltDevicePkg\PlatformInitPei\PlatformEarlyInit.c(213): !EFI_ERROR (Status)
> > 
> > 
> > A full log can be found here:
> >   https://lava.collabora.co.uk/scheduler/job/367353/log_file
> > 
> > 
> > Any ideas?
> 
> --
> Sjoerd Simons
> Collabora Ltd.
> _______________________________________________
> elinux-MinnowBoard mailing list
> elinux-MinnowBoard at lists.elinux.org
> http://lists.elinux.org/mailman/listinfo/elinux-minnowboard

--
Sjoerd Simons
Collabora Ltd.