Minutes of Weekly Meeting, 2008-03-19
Meeting was called to order at 8:20am EDT
1. Roll Call (Participants):
Brad Van Treuren
Peter Horwood
Patrick Au
Ian McIntosh
Heiko Ehrenberg
Adam Ley
Carl Nielsen
Proxy responses requested from:
Carl Walker
Yingwu Li
Timothy Pender
Anthony Sparks
2. Review and approve 3/10/2008 minutes
minutes were approved (moved by Ian, second by Heiko)
3. Review old action items:
- Adam proposed we cover the following at the next meeting:
- Establish consensus on goals and constraints
- What are we trying to achieve?
- What restrictions are we faced with?
- Establish whether TRST needs to be addressed as requirements in the
ATCA specification if it is not going to be managed globally (All)
- Register on new SJTAG web site (http://www.sjtag.org) (All)
- All need to check and add any missing documents to the site (All)
- Respond to Brad and Ian with suggestions for static web site structure
(Brad suggests we model the site after an existing IEEE web site to ease
migration of tooling later) (All)
- Look at proposed scope and purpose from ITC 2006 presentation
(attached slides) and propose scope and purpose for ATCA activity group
(All)
- Look at use cases and capture alternatives used to perform similar
functions to better capture value add for SJTAG (All)
- Volunteers needed for Use Case Forum ownership (All)
- Continue Fault Injection/Insertion discussion on SJTAG Forum page (All)
- Continue Structural Test use case discussion on SJTAG Forum page (All)
- We will need to begin writing a white paper for the System JTAG use
cases to provide to the ATCA working group (All)
Most likely, champions will own their subject section and draft the
section with help from others. This paper will be based on the paper
Gunnar Carlsson started in 2005.
- All: review how to use the forum
- Locate ATCA glossary of board and system states (Adam, Brad)
- Continue POST use case discussion on SJTAG Forum page (All)
- Brad to create a list of types of things programmed in a system for
the Forum discussion (Done)
- Adam to review the ATCA standard document for FRU states
- Brad to review Service Availability Forum
4. Discussion Topics
- SJTAG Value Proposition - Environmental Stress Testing (EST)
- [Ian] We use JTAG in EST less than we should. We use functional tests
mostly.
- [Ian] Functional Tests quite long, need shorter cycle times - e.g.
Boundary Scan - to catch faults that functional test may miss because of
its run time;
functional test equipment for EST can get very expensive, compared to a
JTAG controller and some adapters;
- [Patrick] JTAG is expected to allow us to write faster test code as well.
- [Ian] Functional test is quite costly to produce. JTAG is much more
cost effective.
- [Patrick] investing millions of dollars in EST "boxes"; JTAG/Boundary
Scan test coverage needs to be as good or better than Functional Test
- [Ian] Due to the cost of Functional Test Equipment, cost tradeoffs
mean that not all external interfaces are covered.
- [Ian] some faults cannot be covered by Boundary Scan because of the
technology limitations;
- [Ian] Vibration and Shock test: cables become unreliable after a
while; we exchange our cables every three months or so; cables can get
quite expensive;
- [Ian] Some of our cables are actually fiber optics.
- [Patrick] testing in a pretty wide range -10C to 85C degrees; does
JTAG have a problem with that?
- [Heiko] JTAG itself should not have a problem with that, but you'd
need to use proper cabling or adapters; I've seen applications with
temperature ranges from -40C to 220C
- [Ian] we also run some tests from -55C up to 125C degrees; I agree
that the infrastructure needs to be designed to handle this
- [Brad] EST was one of the motivation factors for investing in embedded
boundary-scan. It simplifies the cabling to just an ethernet connection
which is the same interface needed for the functional tests. We had
tried running external JTAG cabling to the fixtures, but found the 50ft
cables became unreliable quite quickly. One of the greatest benefits
for running JTAG in EST is the tests are short enough to apply during
the ramp times. We have found this to be very important. Greg Jordan,
of Cisco, shared at the last ITC how applying boundary-scan tests during
EST ramp times found defects much faster than their functional tests
could. He further stated that sometimes the functional tests were unable
to identify the failures because the temperature of the board was able
to stabilize before the test could complete, which masked the problem.
This is why ramp time testing is important.
- [Ian] we do that routinely; I'm surprised so many people wait for a
stable temperature to run the test
- [Brad] as you said yourself, the functional test can take quite long,
sometimes as long or longer than the temperature cycle period
- [Ian] JTAG/Boundary Scan tests many parts of the circuitry at the same
time during interconnect test; opportunity to detect faults a functional
test on stable temperature may miss;
- [Brad] How about the tooling aspects for JTAG support of EST?
- [Ian] Tooling may not be much different than what general JTAG testing
requires.
- [Ian] Do you use stop on first failure or log each fault?
- [Ian] lots of data when running 100s or 1000s of test loops in a
temperature cycle; how to handle all the test result data?
- [Ian] I'm a proponent for running the test through, even though there
may be an early fault, in order to catch all the faults and to fix them
all at once
- [Adam] It depends on whether one fault will cause catastrophic
failures to damage the system.
- [Adam] there may be some faults that can be damaging to the UUT and
you should stop running the test right then
- [Ian] Still there is a need for peripheral monitoring of voltage and
current to shut down the system if a catastrophic failure occurs.
- [Brad] One difference I can see is the need for looping constructs in
the flow control of the application because you may want to run a set of
tests for the entire time duration. Some of the tooling does not
provide such a capability. The embedded software that is available for
free also does not contain such a capability, so it must be written
into the process.
- [Brad] EST of systems vs. EST of Boards; even for multiple boards in
parallel, perhaps with special fixturing; include test adapter I/O
features or loopbacks to cover I/O on board edges; These fixtures are
not systems in themselves, but allow multiple boards to think they are
part of a system.
- [Ian] My comments for edge interfaces being excluded were more for
system level testing where the connectors are round military connectors
that rely on cable connections to other systems.
- [Brad] test management needs to differentiate between tests that can
be run in the system vs. tests that can be run at the board level with
test adapters / loopback connections on edges; need to have both sets of
tests stored on the board; We have found the Composite software design
pattern to be useful.
- [Adam] This is a similar issue in dealing with board level tests vs.
system level tests.
- [Adam] we seem to focus on "mechanical" faults (opens, shorts, etc.);
don't we have any silicon failures anymore?
- [Ian] I tend to agree that we are mostly looking for connectivity
problems in EST/ESS;
- [Ian] We will probably catch these types of device failures with some
other testing.
- [Heiko] One would think the device vendor would have caught these
types of failures before shipping.
- [Ian (P)] But at system level, I don't expect to be looking for "Dead
on Arrival" device faults (board test should have got those) - I'm
looking at system integration faults, burn-in and in-service failures.
- [Heiko] some devices themselves may have some problems that cause
faults or even permanent damage (example: humidity enclosed in the IC
package, under certain conditions the IC package "popped open")
- [Ian] silicon faults are hard faults, not really intermittent faults
(e.g. a bonding problem or a void solder joint)
- [Patrick] It really depends on if the device vendor does environmental
stress test.
- [Brad] Boundary Scan exercises the UUT much more exhaustively and
stressfully than functional test, possibly detecting problems that would
never show up in mission mode (e.g. an FPGA that heated up so much during
Boundary Scan tests that it fell off the board because the solder balls
melted. This was a process problem for a batch of FPGAs and not a design
defect.)
- [Brad] Another case in point is the familiar ground bounce problem we
have all experienced.
- [Ian] We have to limit vector transitions for the ground bounce problem.
- [Ian] Internal silicon failure generally manifests itself in the next
phase of testing.
- [Ian] Boundary Scan also exercises pins so that they change in a
way/sequence that mission mode may never do, detecting potential
problems functional test may not find;
- [Adam] Boundary Scan focuses on external connections (from edges of
the IC to the outside of the IC); it usually does not use a significant
part of the silicon
[**Brad, not sure if he captured it correctly, requested clarification
of Adam's comment for the record from people**]
- [Ian (P)] Answering for Adam(!):
I interpreted that Adam meant that a BSCAN interconnect test wasn't going to
tell you if the gates within a device were all working properly - all you
could say for sure was that the TAP logic and the BSCAN IO cells were OK.
Some of the core may be getting stimulated but you don't get feedback on how
functional it is.
Having said that, a functional test may not be a whole lot better - If a
device has a faulty macrocell that your design doesn't use, then you won't
detect that fault; some microprocessor faults only show up with specific
combinations of data and instruction. But do you really want to do a full
INTEST, as intuitively, the number of device defects that won't manifest
themselves in some readily observable way must be quite small? It's a
cost-benefit trade off: Test cost (both development and execution) vs
potential for Escaping defects.
- [Patrick] unless you have some BIST inside the chips; SJTAG should
suggest BIST to be included in devices;
- [Brad] I agree; even for FPGAs it is possible to exercise the silicon
extensively with the different loads that the vendor uses in foundry
testing; It was always AT&T and Lucent's policy that all ASICs must
contain BIST technology to support testing.
- [Adam] what about Life Testing (system is run until it actually
fails), some may deem that Boundary Scan is not appropriate for such
applications; so far today we talked about environmental stress
screening as opposed to such "early end of life" testing;
- [Ian] Most times life cycle testing is specified by the customer
contract. The customer may actually require a focus on system behavior
(functional test) for Life Testing, taking away the option of Boundary
Scan;
- [Adam] You really need to know how long the system should live to be
able to set up the granularity of the testing.
- [Ian] Our life testing is used to prove the system meets the
requirements. We have a group of people dedicated to studying this.
- [Adam] As Brad pointed out that boundary scan stresses a system more
than mission mode, so does it adversely affect the life of the system
when performing the test?
- [Brad] You should be designing the board and system to support not
only mission mode but also test mode.
- [Ian] I can understand where Adam is coming from: there could be a
possibility that devices could be unduly stressed through the use of
boundary scan testing.
- [Adam] Boundary Scan testing may have an adverse effect on the system's
lifetime, since it is more stressful than functional test; this needs
to be taken into account; at the chip-level, Boundary Scan may be much
less stressful than mission mode (since there are only a small number of
transistors switching at any time, and at much lower speed than in
mission mode)
- [Adam] Life Test results are a fundamental input into the
environmental stress screening scheme
- [Adam] I was just trying to show that boundary scan could be more or
less stressful on a board and that testing needs to be calculated in the
life testing formula. My experience of stress screening is tightly
coupled to life cycle testing.
- [Yingwu (P)] A special case of ESS is burn-in, which requires a long
time. During burn-in, we can complete long-running tasks such as the
JTAG interconnect test and FLASH programming. This is very valuable. In
this case, system JTAG is more convenient than board JTAG.
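Several tooling points from the discussion above (a looping construct that
repeats a set of tests for the whole stress period, the choice between
stopping on the first failure and logging every fault, an immediate stop
for faults that could damage the UUT, and the Composite pattern for keeping
board-level and system-level test sets together) can be illustrated with a
minimal Python sketch. This is not any vendor's tooling or the approach of
any participant; every name in it (LeafTest, CompositeTest, Fault,
StopTesting, run_for_duration, the "board"/"system" level tags, the
damaging flag, max_loops) is hypothetical and chosen only for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Fault:
    test: str        # name of the failing test
    loop: int        # loop iteration in which it failed
    damaging: bool   # True if continuing could harm the UUT


class StopTesting(Exception):
    """Raised by a leaf test to abort the whole run immediately."""


class LeafTest:
    """Leaf of the composite: one test, tagged with the levels it applies to."""

    def __init__(self, name, run, levels=("board", "system"), damaging=False):
        self.name = name
        self.run = run                # callable returning True on pass
        self.levels = set(levels)     # e.g. edge loopbacks only at board level
        self.damaging = damaging      # hypothetical flag for dangerous faults

    def execute(self, level, loop, log, stop_on_first):
        if level not in self.levels:
            return                    # not applicable at this level; skip
        if self.run():
            return
        log.append(Fault(self.name, loop, self.damaging))
        if self.damaging or stop_on_first:
            raise StopTesting(self.name)


class CompositeTest:
    """Composite: a group of leaves or groups with the same execute() call."""

    def __init__(self, name, children):
        self.name = name
        self.children = list(children)

    def execute(self, level, loop, log, stop_on_first):
        for child in self.children:
            child.execute(level, loop, log, stop_on_first)


def run_for_duration(root, level, duration_s, stop_on_first=False,
                     max_loops=None):
    """Repeat the test tree until the time (or loop) budget is used up.

    Returns (number of loops started, list of logged faults).
    """
    log = []
    loops = 0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline and (
                max_loops is None or loops < max_loops):
            loops += 1
            root.execute(level, loops, log, stop_on_first)
    except StopTesting:
        pass                          # faults up to the abort are kept in log
    return loops, log


if __name__ == "__main__":
    suite = CompositeTest("uut", [
        LeafTest("interconnect", lambda: True),
        LeafTest("edge_loopback", lambda: False, levels=("board",)),
    ])
    loops, log = run_for_duration(suite, "board", duration_s=60, max_loops=3)
    print(loops, [(f.test, f.loop) for f in log])
```

In this sketch the default policy logs every fault and keeps looping, as Ian
preferred, while a fault marked damaging ends the run at once, as Adam
suggested. Peripheral monitoring of voltage and current is assumed to be
handled outside this loop.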
5. Schedule next meeting
Monday, March 31, 2008, 8:15am EDT
6. Any other business
none
7. Review new action items
none
8. Adjourned at 9:30am EDT
(moved by Ian, second by Heiko)
Many thanks again to Heiko and Peter for assisting in recording the
meeting notes!
Respectfully submitted,
Brad