Minutes of Weekly Meeting, 2008-02-25
Meeting was called to order at 8:23am EST
1. Roll Call (Participants):
Brad Van Treuren
Ian McIntosh (Comment embedded in transcript)
2. Review and approve 2/20/2008 minutes
Minutes were approved (moved by Carl, seconded by Adam)
3. Review old action items:
- Adam proposed we cover the following at the next meeting:
- Establish consensus on goals and constraints
- What are we trying to achieve?
- What restrictions are we faced with?
- Establish whether TRST needs to be addressed as a requirement in the ATCA specification if it is not going to be managed globally (All)
- Register on new SJTAG web site (http://www.sjtag.org) (All)
- All need to check and add any missing documents to the site (All)
- Respond to Brad and Ian with suggestions for static web site structure (Brad suggests we model the site after an existing IEEE web site to ease migration of tooling later) (All)
- Look at proposed scope and purpose from ITC 2006 presentation (attached slides) and propose scope and purpose for ATCA activity group (All)
- Look at use cases and capture alternatives used to perform similar functions to better capture value add for SJTAG (All)
- Volunteers needed for Use Case Forum ownership (All)
- Continue Fault Injection/Insertion discussion on SJTAG Forum page (All)
- Continue Structural Test use case discussion on SJTAG Forum page (All)
- We will need to begin writing a white paper for the System JTAG use cases to provide to the ATCA working group (All)
Most likely, champions will own their subject section and draft the section with help from others. This paper will be based on the paper Gunnar Carlsson started in 2005.
- All: review how to use the forum
- Locate ATCA glossary of board and system states (Adam, Brad)
- Ian and Brad work on setting up a Glossary Page on the SJTAG site. (Done - http://www.sjtag.org/glossary.html)
- Continue POST use case discussion on SJTAG Forum page (All)
- Brad submit an abstract regarding SJTAG Use Cases for ITC. (Done)
4. Discussion Topics
- SJTAG Value Proposition - Programming and Updates
- [Brad] There have been a few papers on this subject available in the public domain.
- [Brad] Two good papers to review are one presented by Peter at BTW2002 and another I presented at BTW2002, which highlights this capability as a feature available within systems once Boundary-Scan is embedded in the design. (Additional note: a good paper on in-system CPLD programming was also presented by Greg Noeninckx at BTW2002.) The papers can be found at:
- [Brad] good paper to review for this use case: "Remote Diagnostics and Upgrades" BTW2003, Tim Pender, Eastman Kodak
( http://www.molesystems.com/BTW/material/BTW03/BTW03 Session 3 Slides/3-3 TimothyPender Kodak-Slides.pdf,
http://www.molesystems.com/BTW/material/BTW03/BTW03 Session 3 Papers/Remote Diagnostics Upgrades (3.3).pdf)
- [Brad] I now open the floor for discussion.
- Long Silence...
- [Brad] How does the group feel about programming and updates via Boundary Scan in a system? Today's FPGAs are very large (data files may be megabytes).
- [Brad] With FPGAs getting larger, do people feel that bscan is still applicable? The time required for programming is not changing, but the volume of data is increasing.
- [Heiko] Sounds like a lot of data to transfer to, or to store in, the system.
- [Heiko] Ian brought up the point of using CPLDs for power-up sequencing in last week's conference call. Might programming such a CPLD cause problems with the power-up states while it is being programmed? Also, that CPLD may not be usable in structural tests, or at least will require constraints to avoid problems with power distribution.
- [Brad] New CPLDs allow the device to be programmed whilst operational and then reloaded.
- [Peter] Care must be taken with DFT to ensure that the card will not be unrecoverable if something goes wrong.
- [Brad] This is taken into consideration for well-designed cards. It can be an issue with FPGA tooling and hardware; there must be a method to roll back to a known good boot image.
- [Carl N.] I agree.
- [Ian] I think we have a duty to reinforce this message and not just assume that every designer knows to handle this. From some discussions I've had, I know that some observers are looking to us to provide some direction on "best practice" for system level JTAG embodiment. It's not directly relevant to this discussion, but a list of "shoulds" and "should nots" would probably help a lot of people and may even help the tool vendors understand what design scenarios must be accounted for and which may be set aside.
- [Brad] Is the reason for the silence from the group today because all of you are more tool or technology vendors and not the end users of this use case?
- [Adam] (Standard parallel) FLASH-based storage allows for dual images, supporting multiple versions of programming data in a system.
- [Brad] Do you suggest that this programming should be done by some means other than Boundary Scan?
- [Brad] There are many different ways for an FPGA to load. Are you talking about not using bscan?
- [Carl N.] Is that with a mission mode processor or something else?
- [Adam] Bscan could be used to program the flash, and then a test controller such as a CPLD sequencer could be used to load the FPGAs.
- [Adam] Thus, during boot you can switch between the two images.
- [Brad] This dual image problem is no different than what is required for general software updates currently performed in the system.
- [Adam] In the typical use case, where does the update come from? If you are going to do a PCB update, is it stored locally on the card before being applied? How does the data get into the system for the remote upgrade?
- [Brad] Remote upgrade via boundary scan is most useful for a card that will not come up; multi-drop can then be used to bring the card back up.
- [Adam] For updates, how do we handle getting the update to the card (e.g. RS-232, shelf manager)?
- [Brad] Normally from the shelf manager, which runs the system diagnostic software and can talk off-shelf. The diagnostic software will have an interface for the craft operator to upload the data to be used.
- [Ian] You're assuming that your system has a shelf-manager! No doubt you're right for many telecomms racks, and I guess most systems will have some sort of "Master Processor" to manage things, but it's dangerous to generalise on what facilities or resources it may provide.
- [Adam] So the craft operator initiates the operation for the data to come across some network.
- [Brad] For systems that do not have this access to a WAN, field updates may be done by connecting via a local network from a technician's test computer, if the customer allows the access. Otherwise, remote updates are not an option.
- [Ian] One thing I can almost guarantee is that we won't have WAN access to our systems in the field! Typically, updates will need one of our Field Service Engineers with a laptop hooked directly onto the box.
- [Adam] A fair to large amount of the system has to be available to make this happen. Would it not be cheaper to just plug in a bscan tester?
- [Brad] This would exceed the ability of the personnel required to do the update; the network method provides a common look and feel. Per T. Pender's paper, each FPGA image update can save 1 million dollars. With new FPGAs, this figure may no longer be applicable.
- [Adam] At the time Tim's paper was written, it applied to updating the FPGA PROMs. FPGAs now tend to use standard flash or SPI PROMs, which are easy to program via IEEE 1149.1. One FPGA vendor uses a small load image inside the FPGA to provide an 1149.1 programming interface to their SPI FLASH. Are you familiar with this method?
- [Brad] Yes, I am aware of this method.
- [Adam] Is it not better to use alternate methods rather than bscan to program these devices?
- [Brad] Is parallel mode FPGA programming not the best way to go forward? That way, FPGA and software image updates can be done using the same process. The problem comes when the board is unable to boot because the data needed to run the load process is corrupted. In this case, a multi-drop solution from the Shelf Controller might come in handy to correct the problem.
- [Brad] In the field, people usually swap boards/modules and let the repair depot deal with updates, although field personnel still have the possibility to update the system in the field. Downtime of the system is the issue: if loading new updates or correcting data loads takes too long, swapping cards is preferred.
- [Ian] We generally want to retain the configuration/build-standard control of our systems exactly as delivered, so we'll tend to only swap out boards if there's actually a fault suspected. In our case, downtime, although important, probably isn't the driving issue since any update will mean that the unit has been scheduled out of service - Our boxes are only part of a much larger system (the aircraft/ship/vehicle) and updates will usually have been planned in as part of an overall maintenance program, even for "bug fixes". In the past we would only offer a field capability to update the "application software" or data libraries, and this would be performed using the mission databus: Re-programming of FPGAs, CPLDs or microcontrollers was a return to factory job. That was quite a disruptive way to operate from our customer's perspective, so now we recognize the need to make as many of the programmable elements in the system as possible updatable, "covers on". Often, specification or security constraints will mean that we can't offer firmware re-programming over the mission bus and so we have to use the Test buses instead (whatever they may turn out to be).
- [Heiko] Are major FLASH upgrades done in the repair depot because of the time it takes to perform the upgrade in the field? A swap-out takes less downtime than waiting for the update to complete.
- [Brad] The reason for doing remote field upgrades is that it is still quicker than sending a person to fix the problem. Many times the remote update takes place while a technician is being dispatched, to ensure the customer has a solution with the lowest possible downtime.
- [Adam] Are we not off topic? My understanding is that the UUT will be operational and that we just want to upgrade. Is it not more cost effective in down time to send the person on site as you bring the system down to ensure you can recover the system back to operational status?
- [Brad] The whole system could be up and running with just one card being updated at a time. If you have an A/B suite for fault tolerance and redundancy, you can update the redundant unit whilst the other is online.
- [Brad] We also need to talk about updating regular Flash, which becomes important as a last-resort option. Even if programming takes hours for megabytes of data, it may still be faster and more economical than sending someone out to the field to replace modules.
- [Adam] This may be true for fixing problems, but is it really true for firmware version upgrades, too?
- [Brad] If the system has problems because of bugs in the firmware (not causing the whole system to fail totally, but rather causing parts of the system not to function properly or at full performance), such problems do not require technical personnel in the field to swap hardware. More often than not, they can be fixed with incremental updates to the firmware, which could be applied remotely.
- [Brad] Time is running out for today's discussion; to be continued next week ...
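Illustrative note on the dual-image point raised above (roll back to a known good boot image if an update is corrupted): the selection logic might be sketched as follows. This is a minimal sketch only, assuming a hypothetical two-slot flash layout with a CRC32 recorded for each image at programming time. The names `Image`, `make_image` and `select_boot_image` are invented for illustration and were not presented at the meeting.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Image:
    """Hypothetical flash image slot: payload plus the CRC32 stored when it was programmed."""
    name: str
    data: bytes
    stored_crc: int

    def is_valid(self) -> bool:
        # Recompute the CRC32 over the payload and compare with the stored value.
        return zlib.crc32(self.data) == self.stored_crc


def make_image(name: str, data: bytes) -> Image:
    """Build a slot whose stored CRC matches its payload (i.e. a good programming pass)."""
    return Image(name, data, zlib.crc32(data))


def select_boot_image(slots: Sequence[Image]) -> Optional[Image]:
    """Return the first slot, in priority order, whose CRC checks out.

    Returns None if every slot is corrupt. That is the assumed case in which
    the card cannot recover by itself and an external path (for example
    multi-drop boundary scan from a shelf controller) would be needed.
    """
    for image in slots:
        if image.is_valid():
            return image
    return None
```

With the slots ordered as (new update, known good), a corrupted update falls through to the known good image, and a `None` result corresponds to the "board is unable to boot" case mentioned in the discussion.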
5. Schedule next meeting
Monday, 3/3/08, 8:15am EST
6. Any other business
Glossary section: if you have any updates for the glossary, submit suggestions to the SJTAG Forum topic "Suggestions » Glossary for the website" (http://forums.sjtag.org/viewtopic.php?t=30)
7. Review new action items
8. Adjourned at 9:25am EST
(moved by Peter, second by Heiko)
Many thanks to Heiko and Peter for assisting in preparing these minutes.