Minutes of Weekly Meeting, 2008-04-07
Meeting was called to order at 8:25am EDT
1. Roll Call (Participants):
Brad Van Treuren
Carl Walker
Ian McIntosh
Timothy Pender
Heiko Ehrenberg
Adam Ley (joined at 8:28am)
Proxy additions provided by:
Yingwu Li
Ian McIntosh
Request for proxy feedback sent to:
Peter Horwood
Patrick Au
Carl Nielsen
Yingwu Li
Anthony Sparks
Jim Webster
2.a. Review and approve 3/19/2008 minutes
meeting minutes approved (moved by Adam, second by Ian)
2.b. Review and approve 3/31/2008 minutes
meeting minutes approved (moved by Ian, second by Carl)
3. Review old action items:
- Adam proposed we cover the following at the next meeting:
- Establish consensus on goals and constraints
- What are we trying to achieve?
- What restrictions are we faced with?
- Establish whether TRST needs to be addressed as requirements in the
ATCA specification if it is not going to be managed globally (All)
- Register on new SJTAG web site (http://www.sjtag.org) (All)
- All need to check and add any missing documents to the site (All)
- Respond to Brad and Ian with suggestions for static web site structure
(Brad suggests we model the site after an existing IEEE web site to ease
migration of tooling later) (All)
- Look at proposed scope and purpose from ITC 2006 presentation and
propose scope and purpose for ATCA activity group (All)
- Look at use cases and capture alternatives used to perform similar
functions to better capture value add for SJTAG (All)
- Volunteers needed for Use Case Forum ownership (All)
- Continue Fault Injection/Insertion discussion on SJTAG Forum page (All)
- Continue Structural Test use case discussion on SJTAG Forum page (All)
- We will need to begin writing a white paper for the System JTAG use
cases to provide to the ATCA working group (All)
Most likely, champions will own their subject section and draft the
section with help from others. This paper will be based on the paper
Gunnar Carlsson started in 2005.
- All: review how to use the forum
- Locate ATCA glossary of board and system states (Adam, Brad)
- Continue POST use case discussion on SJTAG Forum page (All)
- Adam to review ATCA standard document for FRU states
- Brad to review Service Availability Forum
4. Discussion Topics
- [Brad] all: we need champions for additional Use Cases
- [Brad] all: remember to frequently review and communicate on SJTAG
forum, as well
- SJTAG Value Proposition – Environmental Stress Screening
- [Brad] Review of 3/19 discussion topics
- [Brad] Can you clarify your comment, for the record, from the 3/19 meeting?
- [Brad] if we are doing things differently with BScan vs. mission mode
(e.g. related to power consumption), then does BScan have an impact on
the life of the device?
- [Adam] having connections between external I/O and IC core is only
permissible in EXTEST if the signal does not put the core into a
hazardous state; also, toggle rate of external I/O signals controlled by
Boundary Scan is much lower than in mission mode, hence the power
consumption would be less (current draw is highest during switching, at
least in CMOS);
- [Timothy] current draw is determined by length of rising and falling
edges (rise and fall times);
- [Ian (P)] Tim and Adam raised two different points: Tim pointing out
that low slew rates (I guess typically TCK) will result in higher
current draw as the "push" and "pull" transistors will be on together
for longer time, and Adam noting that the relatively long Update DR
interval during a BSCAN test will produce a lower current draw than
functional mode might. I have to comment that my observation is that we
typically see many of our processor cards drawing half the current
during BSCAN testing than they would in functional mode, so Adam's view
seems to prevail. I guess most devices will buffer TCK, and so will many
board designs so these buffers should sharpen up the edges even if the
test controller has a poor clock.
I'm actually finding that we need to take even greater care in routing
TCK now because the newer buffers are so fast we are seeing greater
problems with ringing if the impedance control on the PCB isn't right
(and a lot of people seem to forget that this can't be "fixed" by
reducing the TCK frequency).
- [Brad] IEEE 1149.6 was discussing an Initialize function as an addendum
that would keep the core in a known state; was in touch with Ken
Parker, who forwarded the note to Bill Eklow as well;
- [Adam] I don't recall the addendum going that far, but rather that it
specified initializations (e.g. of I/O) before going into EXTEST;
- [Adam] Having signals propagate to the core is only permissible where
the signals will not cause the core to go into a hazardous state. The
toggle rate with BScan is orders of magnitude slower than normal mode so
the stress is thought to be less than in functional mode.
- [Timothy] As long as you have reasonably fast rise and fall times. Slow
rise and fall times will generally yield higher current draw.
- [Adam] Slower transition times during EXTEST could affect the current
draw during these transitions or this could cause oscillations in the
core. Typically, the IO in EXTEST should be configured similarly to what
the functional mode is so the rise and fall times should be the same. We
need to look for cases where the rise and fall times are different. It
is generally preferred to isolate the core during EXTEST if the
environment is not well understood.
- [*********************** I missed the middle of Adam's comment. Any help
filling in the blanks? ***************************]
- [Adam] No bias intended, my background at TI was that INTEST and EXTEST
used the same IO mechanisms where the values...core has static stable
stimulus and therefore a static stable state.
- [Ian (P)] I think he said that both INTEST and EXTEST utilised the same
internal op-codes and that it was possible to pre-load the DR presented
to the core in order that it had stable stimulus.
- [Adam] The toggle rate is a function of the time between one update DR
and the next.
- [Brad] Talked about Ken Parker's response and that 1149.6 is supposed to
have an Annex about an "Initialize" instruction...Brad to send out Ken's
response to the SJTAG team.
- [Brad (P)] The 1149.6 Annex referred to by Ken is the Annex E.
- [Adam] I would like to see this response. Generally an Annex is not so
strong as stating a position on an issue but more revealing what the
problem is and suggested methods to deal with these issues.
- [Adam] The definition of EXTEST already deals with requiring the core be
isolated if there is a potential for a hazardous state upon entry.
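Illustration of the toggle-rate point above: a rough calculation, offered
only as a sketch. It assumes back-to-back DR scans on a standard 1149.1 TAP
(about 4 TCK cycles of state-machine overhead per scan in addition to the
shift itself), identical load capacitance and supply voltage in both modes,
and the usual CMOS approximation that switching power scales with toggle
frequency. The function names and the example numbers (10 MHz TCK, 1000-cell
chain, 50 MHz functional toggle rate) are hypothetical. It does not model the
extra current drawn during slow edges that Tim raised.

```python
def update_rate_hz(tck_hz: float, chain_bits: int, overhead_cycles: int = 4) -> float:
    """Upper bound on Update-DR events per second for back-to-back DR scans.

    overhead_cycles is an assumption: the TCK cycles spent outside Shift-DR
    (Select-DR-Scan, Capture-DR, Exit1-DR, Update-DR) for each scan.
    """
    if tck_hz <= 0 or chain_bits <= 0 or overhead_cycles < 0:
        raise ValueError("tck_hz and chain_bits must be positive, overhead non-negative")
    return tck_hz / (chain_bits + overhead_cycles)


def relative_switching_power(bscan_toggle_hz: float, functional_toggle_hz: float) -> float:
    """Ratio of dynamic switching power, assuming equal C and V (P ~ C * V^2 * f)."""
    if functional_toggle_hz <= 0:
        raise ValueError("functional_toggle_hz must be positive")
    return bscan_toggle_hz / functional_toggle_hz


# A pin driven by Boundary Scan can change at most once per Update-DR.
rate = update_rate_hz(tck_hz=10e6, chain_bits=1000)
ratio = relative_switching_power(rate, functional_toggle_hz=50e6)
print(f"max pin update rate: {rate:.0f} Hz, relative switching power: {ratio:.2e}")
```

Under these assumptions the pin update rate is several orders of magnitude
below the functional toggle rate, which is consistent with Adam's comment.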
- [Brad] In response to Heiko's comment on 3/19 regarding device failures
being usually caught by the device vendor prior to shipping, Ian made a
follow-up comment stating, "But at system level, I don't expect to be
looking for 'Dead on Arrival' device faults (board test should have got
those) - I'm looking at system integration faults, burn-in and
in-service failures.". Does anyone want to follow up with more comments?
- [Adam] In my view, we are seeing much greater use of stress testing at
the board level. I guess that says two things. First, if stress test is
deferred to the board level, one would expect these types of failures to
occur at the board stress level. Second, if it is not system level test,
it is something that looks a lot like system level test.
- [Ian] We do separate what we do at the board level from the system
level. At the system level we are looking more at the integration
faults. Typically, it is hard to perform functional test when stressing
just boards, because there is only a rack of identical boards and not a
whole system. BScan is useful here for detecting faults when run as
pre-stress and post-stress processes.
- [Brad] Yingwu made a comment following up to the 3/19 minutes as, "A
special case of ESS is burn-in which require a long time. During the
burn-in, we can finish the JTAG interconnect test and FLASH programming
such as long time tasks. This is very valuable. In the case, the system
JTAG is more convenient than board JTAG."
- [Ian] Yingwu also talked about performing FLASH programming during the
first long burn-in cycle to reduce process time. This is an interesting
concept.
- [Brad] Tooling issues?
- [Brad] Do tests in ESS need to be synchronized to the environmental
conditions?
- [Ian] This may be part of the overall ESS control, so it would be no
problem for the external test controller; but how would the embedded
test controller know when the environmental conditions are as desired
and that the test needs to start?
- [Yingwu (P)] There is no problem for the embedded controller. We have an
Ethernet port connecting to the embedded controller. From outside the ESS
room, we can control the FLASH programming, FLASH test and other
tests through the Ethernet port.
- [Brad] Control flow needs to be decoupled from programs containing the
actual test pattern / test vectors
- [Yingwu (P)] The test patterns can be downloaded to the master board,
stored in the master board, or included in the control flow.
- [Ian] For porting tests from board level to system level, tests have to
be separate from the flow
- [Ian] A lot of tooling allows you to do something like that. We have
things built into our test sequencer that control the chamber and
the system.
- [Brad] External or internal software?
- [Ian] Generally, this is external software. We used to have an embedded
BScan chip that would help the functional test (TCON chip).
- [Ian] If you have an external test controller, there are no
synchronization issues because the external box controls the flow. The
problem comes when the controller is embedded in the system.
- [Brad] We typically use embedded BScan so we can use a single
communications interface for all the tests - both BScan and functional.
- [Ian] Our external tester runs like it is doing POST and then continues
beyond that with EST mode testing.
- [Brad] Our TFCL allows us to decouple the tests from the control flow.
- [Ian] I think the issue of control flow becomes more important as we
move into the system level.
- [Ian (P)] Yingwu has offered up a comment on using Ethernet to
synchronise the start of testing when using an embedded controller: I
think he's taken my rhetorical question literally! He's quite right of
course, that is a possible solution, but in the end, Ethernet isn't part
of 1149.1 so it isn't something we can expect to be present. I suppose
it raises the question of whether the method of synchronisation needs to
be part of a SJTAG standard or whether it simply needs a recommendation
that some mechanism should be provided, since this is really starting to
touch on the functional design of the UUT.
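Illustration of the decoupling discussed above: a minimal sketch, not any
participant's actual tooling. The control flow names the tests and the
environmental condition each one waits for, the vectors live in a separate
store, and the synchronisation mechanism is passed in as a function so it
could be an external sequencer, an Ethernet message, or something else. All
names (FlowStep, run_flow, the condition labels, the stub functions) are
hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FlowStep:
    test_name: str   # key into the separate vector store
    condition: str   # environmental condition to wait for (hypothetical label)


@dataclass
class StepResult:
    test_name: str
    passed: bool
    detail: str      # diagnostic text, kept per step rather than a bare pass/fail


def run_flow(
    steps: List[FlowStep],
    vectors: Dict[str, List[str]],
    wait_for: Callable[[str], None],
    apply_vectors: Callable[[List[str]], Tuple[bool, str]],
) -> List[StepResult]:
    """Run each step: synchronise, look up its vectors, apply them, record the result."""
    results = []
    for step in steps:
        wait_for(step.condition)  # synchronisation is pluggable
        passed, detail = apply_vectors(vectors[step.test_name])
        results.append(StepResult(step.test_name, passed, detail))
    return results


# Stub environment and stub vector application, for demonstration only.
conditions_seen: List[str] = []

def stub_wait(condition: str) -> None:
    conditions_seen.append(condition)

def stub_apply(patterns: List[str]) -> Tuple[bool, str]:
    return True, f"{len(patterns)} patterns applied"

vector_store = {"interconnect": ["0101", "1010"], "flash_check": ["1111"]}
flow = [FlowStep("interconnect", "ambient"), FlowStep("flash_check", "hot_soak")]
results = run_flow(flow, vector_store, stub_wait, stub_apply)
```

Because the flow refers to tests only by name, the same vector store could in
principle be reused at board level and system level with a different flow.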
- [Timothy] You want to have good diagnostics, not just a pass/fail,
because of how long it takes to perform the setup and test operation.
Flow control is important when it comes to embedding JTAG into the
product. We have used the free stuff out there and there is no flow
control. Everyone seems to be doing ad hoc flow control right now.
- [Ian] Agree with Tim that you need a good diagnostic out of it.
- [**************** Can people comment on the aspect of diagnostics if it
may be off-line (captured failure information processed later) or does
it require real-time diagnostics to match the flow coming from the
functional test process?*****************************]
- [Ian (P)] Depends on the definition of "real-time": To be able to
properly analyze your fault syndromes and arrive at a good diagnostic,
you need to have completed a "set" of tests so that you separate out the
bridging faults from the stuck-at faults etc. You could then argue that
in EST you should wait till you have all the test runs completed so that
you can inspect for possible correlation of apparently discrete faults
across test runs. For practicality though, I feel that becomes a tough
ask for tooling and is probably best left to an engineer's assessment!
From a point of view of monitoring EST progress, I would like to have
access to the diagnostics as each test cycle completes, so that I can
decide whether or not it is worth continuing to run the EST as a) We
might be wasting chamber time and b) We could be lifing the UUT needlessly.
- [Tim] You may have an intermittent problem that makes it difficult.
- [Carl W.] This is one of the issues we have to deal with at Cisco with
complex boards where we have to capture the failing vectors before
continuing.
- [Ian] Power on ramp up - power down ramp down.
- [Carl W.] We try to scan them during quick ramps both ramp up and ramp down.
- [Ian] We have a lot of metal in our systems so we find the ramp times
slow, but try to make them as fast as possible.
- [Carl W.] We took a sledge hammer approach to run as many tests as we
can on the ramp cycles.
- [Ian] We do have some thermal shock chambers, but these are not used
very often.
- [Carl W.] Mechanical Stress Testing - BScan catching more than what
functional test does when done during ramp. It can be more expensive to
run due to the cost of the resources used. It is not yet clear how the
effectiveness of BScan at ramp correlates with that of functional
test at static temps in a system. My boss knows more about this subject
and talks about it quite often. I will see if I can find some more
information on the subject.
- [Ian] Our testing is limited to what the customer requires. It tends to
be written in the contract that ESS be performed because that is what
they always do - not because it is the best thing to do. The space
programs are different where they think about why they want ESS.
- [Timothy] The other point for good diagnostics: every time you ramp up
and down, the ground testing is typically more than 50% of the product
life line. The fewer ESS runs you need for good coverage, the better.
- [Ian] We do have limits on the number of ESS runs we can do as these
reduce the life of the product.
- [Timothy] You really want good diagnostics as part of the ESS; flow
control is important for the test control; standardized is better than
many different ad-hoc implementations
- [Brad] We, as a group, need to look into defining the requirements for
the flow control;
- [Brad] Reviewed the remaining use case topics we need to discuss.
- [Ian] I suggest we discuss BIST as next topic, since it is related to POST
- [Brad] agreed
- [Brad] Ian, can you host that discussion as moderator since I am sure
people are tired of hearing my voice leading things right now.
- [Ian] That would be fine.
5. Schedule next meeting
Monday, April 14, 2008, 8:15am EDT: Topic is BIST Use Case
6. Any other business
- [Ian] Would like a backup admin for the web.
- ATCA FRU states if available
- SAF FRU states if available
7. Review new action items
- Brad send out Ken Parker's response on the device stress issue
8. Adjourned at 9:40am EDT
(moved by Tim, second by Carl)
Many thanks to Heiko for assisting in taking notes for these minutes.
Respectfully submitted,
Brad