Minutes of Weekly Meeting, 2008-03-19

Meeting was called to order at 8:20am EDT

1. Roll Call (Participants):

Brad Van Treuren
Peter Horwood
Patrick Au
Ian McIntosh
Heiko Ehrenberg
Adam Ley
Carl Nielsen

Proxy responses requested from:
Carl Walker
Yingwu Li
Timothy Pender
Anthony Sparks

2. Review and approve 3/10/2008 minutes

Minutes were approved (moved by Ian, seconded by Heiko)

3. Review old action items:

  • Adam proposed we cover the following at the next meeting:
    • Establish consensus on goals and constraints
    • What are we trying to achieve?
    • What restrictions are we faced with?
  • Establish whether TRST needs to be addressed as requirements in the ATCA specification if it is not going to be managed globally (All)
  • Register on new SJTAG web site (http://www.sjtag.org) (All)
  • All need to check and add any missing documents to the site (All)
  • Respond to Brad and Ian with suggestions for static web site structure (Brad suggests we model the site after an existing IEEE web site to ease migration of tooling later) (All)
  • Look at proposed scope and purpose from ITC 2006 presentation (attached slides) and propose scope and purpose for ATCA activity group (All)
  • Look at use cases and capture alternatives used to perform similar functions to better capture value add for SJTAG (All)
  • Volunteers needed for Use Case Forum ownership (All)
  • Continue Fault Injection/Insertion discussion on SJTAG Forum page (All)
  • Continue Structural Test use case discussion on SJTAG Forum page (All)
  • We will need to begin writing a white paper for the System JTAG use cases to provide to the ATCA working group (All)
    Most likely, champions will own their subject section and draft the section with help from others. This paper will be based on the paper Gunnar Carlsson started in 2005.
  • All: review how to use the forum
  • Locate ATCA glossary of board and system states (Adam, Brad)
  • Continue POST use case discussion on SJTAG Forum page (All)
  • Brad to create a list of types of things programmed in a system for the Forum discussion (Done)
  • Adam to review ATCA standard document for FRU states
  • Brad to review Service Availability Forum

4. Discussion Topics

  1. SJTAG Value Proposition - Environmental Stress Testing (EST)
    • [Ian] We use JTAG in EST less than we should. We use functional tests mostly.
    • [Ian] Functional tests are quite long; we need shorter cycle times - e.g. Boundary Scan - to catch faults that functional test may miss because of its run time;
      functional test equipment for EST can get very expensive, compared to a JTAG controller and some adapters;
    • [Patrick] JTAG is expected to allow us to write faster test code as well.
    • [Ian] Functional test is quite costly to produce. JTAG is much more cost effective.
    • [Patrick] investing millions of dollars in EST "boxes"; JTAG/Boundary Scan test coverage needs to be as good or better than Functional Test
    • [Ian] Due to the cost of Functional Test Equipment, cost tradeoffs mean that not all external interfaces are covered.
    • [Ian] some faults cannot be covered by Boundary Scan because of the technology limitations;
    • [Ian] Vibration and Shock test: cables become unreliable after a while; we exchange our cables every three months or so; cables can get quite expensive;
    • [Ian] Some of our cables are actually fiber optics.
    • [Patrick] testing in a pretty wide range -10C to 85C degrees; does JTAG have a problem with that?
    • [Heiko] JTAG itself should not have a problem with that, but you'd need to use proper cabling or adapters; I've seen applications with temperature ranges from -40C to 220C
    • [Ian] we also run some tests from -55C up to 125C degrees; I agree that the infrastructure needs to be designed to handle this
    • [Brad] EST was one of the motivating factors for investing in embedded boundary-scan. It simplifies the cabling to just an Ethernet connection, which is the same interface needed for the functional tests. We had tried running external JTAG cabling to the fixtures, but found the 50ft cables became unreliable quite quickly. One of the greatest benefits of running JTAG in EST is that the tests are short enough to apply during the ramp times. We have found this to be very important. Greg Jordan, of Cisco, shared during last ITC how applying boundary-scan tests during ramp times in EST found defects much faster than their functional tests were able to. He further stated that sometimes the functional tests were unable to identify the failures because the temperature of the board was able to stabilize before the test could complete, which masked the problem. This is why ramp time testing is important.
    • [Ian] we do that routinely; I'm surprised so many people wait for a stable temperature to run the test
    • [Brad] as you said yourself, the functional test can take quite long, sometimes as long as or longer than the temperature cycle period
    • [Ian] JTAG/Boundary Scan tests many parts of the circuitry at the same time during interconnect test; opportunity to detect faults a functional test on stable temperature may miss;
    • [Brad] How about the tooling aspects for JTAG support of EST?
    • [Ian] Tooling may not be much different than what general JTAG testing requires.
    • [Ian] Do you use stop on first failure or log each fault?
    • [Ian] lots of data when running 100s or 1000s of test loops in a temperature cycle; how to handle all the test result data?
    • [Ian] I'm a proponent for running the test through, even though there may be an early fault, in order to catch all the faults and to fix them all at once
    • [Adam] It depends on whether one fault will cause catastrophic failures to damage the system.
    • [Adam] there may be some faults that can be damaging to the UUT and you should stop running the test right then
    • [Ian] Still there is a need for peripheral monitoring of voltage and current to shut down the system if a catastrophic failure occurs.
    • [Brad] One difference I can see is the need for looping constructs in the flow control of the application, because you may want to run a set of tests for the entire time duration. Some of the tooling does not provide such a capability. The embedded software that is available for free does not contain such a capability either, so it must be written into the process.
    • [Brad] EST of systems vs. EST of Boards; even for multiple boards in parallel, perhaps with special fixturing; include test adapter I/O features or loopbacks to cover I/O on board edges; These fixtures are not systems in themselves, but allow multiple boards to think they are part of a system.
    • [Ian] My comments for edge interfaces being excluded were more for system level testing where the connectors are round military connectors that rely on cable connections to other systems.
    • [Brad] test management needs to differentiate between tests that can be run in the system vs. tests that can be run at the board level with test adapters / loopback connections on edges; need to have both sets of tests stored on the board; We have found the Composite software design pattern to be useful.
    • [Adam] This is a similar issue in dealing with board level tests vs. system level tests.
    • [Adam] we seem to focus on "mechanical" faults (opens, shorts, etc.); don't we have any silicon failures anymore?
    • [Ian] I tend to agree that we are mostly looking for connectivity problems in EST/ESS;
    • [Ian] We will probably catch these types of device failures with some other testing.
    • [Heiko] One would think the device vendor would have caught these types of failures before shipping.
    • [Ian (P)] But at system level, I don't expect to be looking for "Dead on Arrival" device faults (board test should have got those) - I'm looking at system integration faults, burn-in and in-service failures.
    • [Heiko] some devices themselves may have some problems that cause faults or even permanent damage (example: humidity enclosed in the IC package, under certain conditions the IC package "popped open")
    • [Ian] silicon faults are hard faults, not really intermittent faults (e.g. a bonding problem or a void solder joint)
    • [Patrick] It really depends on if the device vendor does environmental stress test.
    • [Brad] Boundary Scan exercises the UUT much more exhaustively/stressfully than functional test, possibly detecting problems that would never show up in mission mode (e.g. an FPGA that heated up so much during Boundary Scan tests that it fell off the board because the solder balls melted. This was a process problem for a batch of FPGAs and not a design defect.)
    • [Brad] Another case in point is the familiar ground bounce problem we have all experienced.
    • [Ian] We have to limit vector transitions for the ground bounce problem.
    • [Ian] Internal silicon failure generally manifests itself in the next phase of testing.
    • [Ian] Boundary Scan also exercises pins so that they change in a way/sequence that mission mode may never do, detecting potential problems functional test may not find;
    • [Adam] Boundary Scan focuses on external connections (from edges of the IC to the outside of the IC); it usually does not use a significant part of the silicon
      [**Brad, not sure if he captured it correctly, requested clarification of Adam's comment for the record from people**]
      • [Ian (P)] Answering for Adam(!):
        I interpreted that Adam meant that a BSCAN interconnect test wasn't going to tell you if the gates within a device were all working properly - all you could say for sure was that the TAP logic and the BSCAN IO cells were OK. Some of the core may be getting stimulated but you don't get feedback on how functional it is.
        Having said that, a functional test may not be a whole lot better - If a device has a faulty macrocell that your design doesn't use, then you won't detect that fault; some microprocessor faults only show up with specific combinations of data and instruction. But do you really want to do a full INTEST, as intuitively, the number of device defects that won't manifest themselves in some readily observable way must be quite small? It's a cost-benefit trade off: Test cost (both development and execution) vs potential for Escaping defects.
    • [Patrick] unless you have some BIST inside the chips; SJTAG should suggest BIST to be included in devices;
    • [Brad] I agree; even for FPGAs it is possible to exercise the silicon extensively with the different loads that the vendor uses in foundry testing; It was always AT&T and Lucent's policy that all ASICs must contain BIST technology to support testing.
    • [Adam] what about Life Testing (system is run until it actually fails), some may deem that Boundary Scan is not appropriate for such applications; so far today we talked about environmental stress screening as opposed to such "early end of life" testing;
    • [Ian] Most times life cycle testing is specified by the customer contract. The customer may actually require a focus on system behavior (functional test) for Life Testing, taking away the option of Boundary Scan;
    • [Adam] You really need to know how long the system should live to be able to set up the granularity of the testing.
    • [Ian] Our life testing is used to prove the system meets the requirements. We have a group of people dedicated to studying this.
    • [Adam] As Brad pointed out that boundary scan stresses a system more than mission mode, so does it adversely affect the life of the system when performing the test?
    • [Brad] You should be designing the board and system to support not only mission mode but also test mode.
    • [Ian] I can understand where Adam is coming from: there could be a possibility that devices are unduly stressed through the use of boundary scan testing.
    • [Adam] Boundary Scan testing may have an adverse effect on the system's lifetime, since it is more stressful than functional test; this needs to be taken into account; at the chip-level, Boundary Scan may be much less stressful than mission mode (since there are only a small number of transistors switching at any time, and at much lower speed than in mission mode)
    • [Adam] Life Test results are a fundamental input into the environmental stress screening scheme
    • [Adam] I was just trying to show that boundary scan could be more or less stressful on a board and that testing needs to be calculated in the life testing formula. My experience of stress screening is tightly coupled to life cycle testing.
    • [Yingwu (P)] A special case of ESS is burn-in, which requires a long time. During the burn-in, we can complete long-running tasks such as the JTAG interconnect test and FLASH programming. This is very valuable. In this case, system JTAG is more convenient than board JTAG.
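
The points raised above about looping constructs (Brad), stop on first failure versus logging every fault (Ian), and stopping immediately on a potentially damaging fault (Adam) can be illustrated with a small sketch. This was not presented in the meeting and does not describe any participant's tooling; the names Policy, Result, ScanTest, run_loops and the two demo tests are hypothetical, and the "tests" are stand-in functions rather than real boundary-scan operations.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Policy(Enum):
    """How the runner reacts to an ordinary (non-catastrophic) failure."""
    STOP_ON_FIRST_FAILURE = "stop"
    LOG_ALL = "log"


@dataclass
class Result:
    loop: int                   # index of the loop in which the test ran
    test: str                   # test name
    passed: bool
    catastrophic: bool = False  # True if continuing could damage the UUT


# Hypothetical test signature: takes the loop index, returns a Result.
ScanTest = Callable[[int], Result]


def run_loops(tests: List[ScanTest], loops: int, policy: Policy) -> List[Result]:
    """Run every test once per loop and return the full result log."""
    log: List[Result] = []
    for i in range(loops):
        for test in tests:
            result = test(i)
            log.append(result)
            if result.passed:
                continue
            # A catastrophic fault always ends the run, whatever the policy.
            if result.catastrophic or policy is Policy.STOP_ON_FIRST_FAILURE:
                return log
    return log


def failures(log: List[Result]) -> List[Result]:
    """Reduce a large log to just the failing entries."""
    return [r for r in log if not r.passed]


# Stand-in tests (assumptions): an intermittent interconnect fault in loop 2,
# and a power-rail check that fails catastrophically from loop 4 onwards.
def interconnect(loop: int) -> Result:
    return Result(loop, "interconnect", passed=(loop != 2))


def power_rail(loop: int) -> Result:
    ok = loop < 4
    return Result(loop, "power_rail", passed=ok, catastrophic=not ok)


if __name__ == "__main__":
    for policy in Policy:
        log = run_loops([interconnect, power_rail], loops=6, policy=policy)
        print(policy.name, len(log), [(r.loop, r.test) for r in failures(log)])
```

With the log-all policy the intermittent fault in loop 2 is recorded and the run continues until the catastrophic fault; with stop on first failure the run ends at loop 2. In practice the loop count would be replaced by whatever covers the temperature cycle, and hardware voltage and current monitoring would still be needed as Ian noted.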

5. Schedule next meeting

Monday, March 31, 2008, 8:15am EDT

6. Any other business


7. Review new action items


8. Adjourned at 9:30am EDT

(moved by Ian, seconded by Heiko)

Many thanks again to Heiko and Peter for assisting in recording the meeting notes!

Respectfully submitted,