Minutes of Weekly Meeting, 2008-02-25
Meeting was called to order at 8:23am EST
1. Roll Call (Participants):
Brad Van Treuren
Carl Nielsen
Adam Ley
Peter Horwood
Heiko Ehrenberg
By Proxy:
Ian McIntosh (Comment embedded in transcript)
2. Review and approve 2/20/2008 minutes
Minutes were approved (moved by Carl, second by Adam)
3. Review old action items:
- Adam proposed we cover the following at the next meeting:
- Establish consensus on goals and constraints
- What are we trying to achieve?
- What restrictions are we faced with?
- Establish whether TRST needs to be addressed as a requirement in the ATCA
specification if it is not going to be managed globally (All)
- Register on the new SJTAG web site (http://www.sjtag.org) (All)
- Check the site and add any missing documents (All)
- Respond to Brad and Ian with suggestions for static web site structure (Brad
suggests we model the site after an existing IEEE web site to ease migration of
tooling later) (All)
- Look at proposed scope and purpose from ITC 2006 presentation (attached slides)
and propose scope and purpose for ATCA activity group (All)
- Look at use cases and capture alternatives used to perform similar functions
to better capture the value added by SJTAG (All)
- Volunteers needed for Use Case Forum ownership (All)
- Continue Fault Injection/Insertion discussion on SJTAG Forum page (All)
- Continue Structural Test use case discussion on SJTAG Forum page (All)
- We will need to begin writing a white paper for the System JTAG use cases to
provide to the ATCA working group (All)
Most likely, champions will own their subject section and draft the section with
help from others. This paper will be based on the paper Gunnar Carlsson started in
2005.
- Review how to use the forum (All)
- Locate ATCA glossary of board and system states (Adam, Brad)
- Ian and Brad work on setting up a Glossary Page on the SJTAG site. (Done -
http://www.sjtag.org/glossary.html)
- Continue POST use case discussion on SJTAG Forum page (All)
- Brad to submit an abstract regarding SJTAG Use Cases for ITC. (Done)
4. Discussion Topics
- SJTAG Value Proposition - Programming and Updates
- [Brad] There have been a few papers on this subject available in the public
domain.
- [Brad] Good papers to review are one presented by Peter at BTW2002
and another presented by me at BTW2002, which highlight this capability
as a feature available within systems once Boundary-Scan is embedded in the
design. (Additionally, a good paper on in-system CPLD programming was
also presented by Greg Noeninckx at BTW2002.) The papers can be found at:
- [Brad] Another good paper to review for this use case: "Remote Diagnostics and
Upgrades", BTW2003, Tim Pender, Eastman Kodak
(
http://www.molesystems.com/BTW/material/BTW03/BTW03 Session 3 Slides/3-3 TimothyPender Kodak-Slides.pdf,
http://www.molesystems.com/BTW/material/BTW03/BTW03 Session 3 Papers/Remote Diagnostics Upgrades (3.3).pdf)
- [Brad] I now open the floor for discussion.
- Long Silence...
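- [Brad] How does the group feel about programming and updates via
Boundary Scan in a system? Today's FPGAs are very large (data files may be
megabytes).
- [Brad] Given that FPGAs are getting larger, do people feel that bscan is
still applicable? The time required for programming is not changing, but the
volume of data is increasing.
- [Heiko] That sounds like a lot of data to transfer to, or store in, the system.
- [Heiko] In last week's conference call, Ian raised the point of using CPLDs
for power-up sequencing. Could programming such a CPLD disturb the power-up
states while it is being programmed? Also, that CPLD may not be usable in
structural tests, or at least will require constraints to avoid problems with
power distribution.
- [Brad] Newer CPLDs allow the device to be programmed whilst operational and
then reloaded.
- [Peter] Care must be taken with DFT to ensure that the card cannot be
left unrecoverable if something goes wrong.
- [Brad] This is taken into consideration for well-designed cards, but it can
be an issue with FPGA tooling and hardware; there must be a method to roll
back to a known-good boot image.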
- [Carl N.] I agree.
- [Ian] I think we have a duty to reinforce this message and not just assume
that every designer knows how to handle this. From some discussions I've had, I know
that some observers are looking to us to provide some direction on "best practice"
for system level JTAG embodiment. It's not directly relevant to this discussion,
but a list of "shoulds" and "should nots" would probably help a lot of people and
may even help the tool vendors understand what design scenarios must be accounted
for and which may be set aside.
- [Brad] Is the reason for the silence from the group today that most of you
are tool or technology vendors rather than end users of this use case?
- [Adam] (Standard parallel) FLASH-based storage allows for dual images,
supporting multiple versions of programming data in a system.
- [Brad] Do you suggest that this programming should be done by some means
other than Boundary Scan?
- [Brad] There are many different ways for an FPGA to load. Are you talking
about not using bscan?
- [Carl N.] Is that with a mission mode processor or something else?
- [Adam] Bscan could be used to program the flash, and then a test controller,
such as a CPLD sequencer, could be used to load the FPGAs.
- [Adam] Thus, during boot you can switch between the two images.
- [Brad] This dual image problem is no different than what is required for
general software updates currently performed in the system.
- [Adam] In the typical use case, where does the update come from? If you are
going to do a PCB update, is the data stored locally on the card before being
applied? How does the data get into the system for the remote upgrade?
- [Brad] Remote upgrade via boundary scan is most useful for a card that will
not come up; multi-drop access can then be used to bring the card back up.
- [Adam] For updates, how do we handle getting the update to the card (e.g.,
RS-232, shelf manager)?
- [Brad] Normally from the shelf manager, which runs the system diagnostic
software and can communicate off-shelf. The diagnostic software will have an
interface for the craft operator to perform the upload of the data to be used.
- [Ian] You're assuming that your system has a shelf-manager! No doubt you're
right for many telecomms racks, and I guess most systems will have some sort of
"Master Processor" to manage things, but it's dangerous to generalise on what
facilities or resources it may provide.
- [Adam] So the craft operator initiates the operation, and the data comes
across some network?
- [Brad] For systems that do not have this access to a WAN, field updates may
be done by connecting via a local network from a technician's test computer if
the customer allows the access. Otherwise, remote updates are not an option.
- [Ian] One thing I can almost guarantee is that we won't have WAN access
to our systems in the field! Typically, updates will need one of our Field
Service Engineers with a laptop hooked directly onto the box.
- [Adam] A fair to large amount of the system has to be available to make
this happen; would it not be cheaper to just plug in a bscan tester?
- [Brad] That would exceed the skills of the personnel required to do
the update; the network method provides a common look and feel. Per Tim Pender's
paper, each FPGA image update can save 1 million dollars, although with new
FPGAs this figure may no longer apply.
- [Adam] At the time Tim's paper was written, it applied to updating the
FPGA PROMs. FPGAs now tend to use standard flash or SPI PROMs, which make this
easy to do via IEEE 1149.1. One FPGA vendor uses a small load image inside the
FPGA to provide an 1149.1 link to its SPI FLASH, acting as an 1149.1
programming interface. Are you familiar with this method?
- [Brad] Yes, I am aware of this method.
- [Adam] Is it not better to use alternate methods rather than bscan to
program these devices?
- [Brad] Is parallel mode FPGA programming not the best way to go forward?
That way, FPGA and software image updates can be done using the same process.
The problem comes when the board is unable to boot because the data needed
to run the load process is corrupted. In this case, a multi-drop solution
from the Shelf Controller might come in handy to correct the problem.
- [Brad] In the field, people usually swap boards/modules and let the
repair depot deal with updates, although field personnel still have the
ability to update the system in the field. System downtime is the issue:
if loading new updates or correcting data loads takes too long, swapping
cards is preferred.
- [Ian] We generally want to retain the configuration/build-standard
control of our systems exactly as delivered, so we'll tend to only swap
out boards if there's actually a fault suspected. In our case, downtime,
although important, probably isn't the driving issue since any update will
mean that the unit has been scheduled out of service - Our boxes are only
part of a much larger system (the aircraft/ship/vehicle) and updates will
usually have been planned in as part of an overall maintenance program,
even for "bug fixes". In the past we would only offer a field capability
to update the "application software" or data libraries, and this would
be performed using the mission databus: Re-programming of FPGAs, CPLDs or
microcontrollers was a return to factory job. That was quite a disruptive
way to operate from our customer's perspective, so now we recognize the need
to make as many of the programmable elements in the system as possible
updatable, "covers on". Often, specification or security constraints will
mean that we can't offer firmware re-programming over the mission bus and
so we have to use the Test buses instead (whatever they may turn out to be).
- [Heiko] Are major FLASH upgrades done in the repair depot because of the time
it takes to perform the upgrade in the field? A swap-out takes less downtime
than waiting for the update to complete.
- [Brad] The reason for doing remote field upgrades is that it
is still quicker than sending a person to fix the problem. Many times the
remote update is taking place while a technician is dispatched to ensure the
customer has a solution with the lowest possible downtime.
- [Adam] Are we not off topic? My understanding is that the UUT will
be operational and that we just want to upgrade. Is it not more cost-effective,
in terms of downtime, to have a person on site when you bring the system
down, to ensure you can recover the system back to operational status?
- [Brad] The whole system could be up and running, with just one card
updated at a time. If you have an A/B suite for fault tolerance and
redundancy, you can update the redundant unit whilst the other is online.
- [Brad] We also need to talk about updating regular Flash; it becomes
important as a last-resort option. Even if programming takes hours for
megabytes of data, it may still be faster and more economical than
sending someone out to the field to replace modules.
- [Adam] This may be true for fixing problems, but is it really true
for firmware version upgrades, too?
- [Brad] If the system has problems because of bugs in the firmware
(not causing the whole system to fail totally, but rather causing parts of the
system not to function properly or at full performance), such problems do not
require tech personnel in the field to swap hardware; more often than not they
can be fixed with one or more incremental firmware updates, which could be
applied remotely.
- [Brad] Time is running out for today's discussion; to be continued next
week.
5. Schedule next meeting
Monday, 3/3/08, 8:15am EST
6. Any other business
Glossary section: if you have any updates for the glossary, submit suggestions
to the SJTAG Forum topic "Suggestions » Glossary for the website"
(http://forums.sjtag.org/viewtopic.php?t=30)
7. Review new action items
none
8. Adjourned at 9:25am EST
(moved by Peter, second by Heiko)
Many thanks to Heiko and Peter for assisting in preparing these minutes.
Best regards,
Brad