Minutes of Weekly Meeting, 2008-04-07

Meeting was called to order at 8:25am EDT

1. Roll Call (Participants):

Brad Van Treuren
Carl Walker
Ian McIntosh
Timothy Pender
Heiko Ehrenberg
Adam Ley (joined at 8:28am)

Proxy additions provided by:
Yingwu Li
Ian McIntosh

Request for proxy feedback sent to:
Peter Horwood
Patrick Au
Carl Nielsen
Yingwu Li
Anthony Sparks
Jim Webster

2.a. Review and approve 3/19/2008 minutes

meeting minutes approved (moved by Adam, second by Ian)

2.b. Review and approve 3/31/2008 minutes

meeting minutes approved (moved by Ian, second by Carl)

3. Review old action items:

  • Adam proposed we cover the following at the next meeting:
    • Establish consensus on goals and constraints
    • What are we trying to achieve?
    • What restrictions are we faced with?
  • Establish whether TRST needs to be addressed as requirements in the ATCA specification if it is not going to be managed globally (All)
  • Register on new SJTAG web site (http://www.sjtag.org) (All)
  • Check the site and add any missing documents (All)
  • Respond to Brad and Ian with suggestions for static web site structure (Brad suggests we model the site after an existing IEEE web site to ease migration of tooling later) (All)
  • Look at proposed scope and purpose from ITC 2006 presentation and propose scope and purpose for ATCA activity group (All)
  • Look at use cases and capture alternatives used to perform similar functions to better capture value add for SJTAG (All)
  • Volunteers needed for Use Case Forum ownership (All)
  • Continue Fault Injection/Insertion discussion on SJTAG Forum page (All)
  • Continue Structural Test use case discussion on SJTAG Forum page (All)
  • We will need to begin writing a white paper for the System JTAG use cases to provide to the ATCA working group (All)
    Most likely, champions will own their subject section and draft the section with help from others. This paper will be based on the paper Gunnar Carlsson started in 2005.
  • All: review how to use the forum
  • Locate ATCA glossary of board and system states (Adam, Brad)
  • Continue POST use case discussion on SJTAG Forum page (All)
  • Adam to review ATCA standard document for FRU states
  • Brad to review Service Availability Forum

4. Discussion Topics

  • [Brad] all: we need champions for additional Use Cases
  • [Brad] all: remember to frequently review and communicate on SJTAG forum, as well
  1. SJTAG Value Proposition – Environmental Stress Screening
    • [Brad] Review of 3/19 discussion topics
    • [Brad] Can you clarify your comment, for the record, from the 3/19 meeting?
    • [Brad] if we are doing things differently with BScan vs. mission mode (e.g. related to power consumption), then does BScan have an impact on the life of the device?
    • [Adam] having connections between external I/O and IC core is only permissible in EXTEST if the signal does not put the core into a hazardous state; also, toggle rate of external I/O signals controlled by Boundary Scan is much lower than in mission mode, hence the power consumption would be less (current draw is highest during switching, at least in CMOS);
    • [Timothy] current draw is determined by length of rising and falling edges (rise and fall times);
    • [Ian (P)] Tim and Adam raised two different points: Tim pointing out that low slew rates (I guess typically TCK) will result in higher current draw as the "push" and "pull" transistors will be on together for longer time, and Adam noting that the relatively long Update DR interval during a BSCAN test will produce a lower current draw than functional mode might. I have to comment that my observation is that we typically see many of our processor cards drawing half the current during BSCAN testing than they would in functional mode, so Adam's view seems to prevail. I guess most devices will buffer TCK, and so will many board designs so these buffers should sharpen up the edges even if the test controller has a poor clock.
      I'm actually finding that we need to take even greater care in routing TCK now because the newer buffers are so fast we are seeing greater problems with ringing if the impedance control on the PCB isn't right (and a lot of people seem to forget that this can't be "fixed" by reducing the TCK frequency).
    • [Brad] IEEE 1149.6 was discussing an Initialize function, as an addendum, that would keep the core in a known state; Brad was in touch with Ken Parker, who forwarded the note to Bill Eklow as well;
    • [Adam] I don't recall the addendum going that far, but rather that it specified initializations (e.g. of I/O) before going into EXTEST;
    • [Adam] Having signals propagate to the core is only permissible where the signals will not cause the core to go into a hazardous state. The toggle rate with BScan is orders of magnitude slower than normal mode so the stress is thought to be less than in functional mode.
    • [Timothy] As long as you have reasonably fast rise and fall times. Slow rise and fall times will generally yield higher current draw.
    • [Adam] Slower transition times during EXTEST could affect the current draw during these transitions or this could cause oscillations in the core. Typically, the IO in EXTEST should be configured similarly to what the functional mode is so the rise and fall times should be the same. We need to look for cases where the rise and fall times are different. It is generally preferred to isolate the core during EXTEST if the environment is not well understood.
    • [*********************** I missed the middle of Adam's comment. Any help filling in the blanks? ***************************]
    • [Adam] No bias intended, my background at TI was that INTEST and EXTEST used the same IO mechanisms where the values...core has static stable stimulus and therefore a static stable state.
    • [Ian (P)] I think he said that both INTEST and EXTEST utilised the same internal op-codes and that it was possible to pre-load the DR presented to the core in order that it had stable stimulus.
    • [Adam] The toggle rate is a function of the time between one update DR and the next.
    • [Brad] Talked about Ken Parker's response and that 1149.6 is supposed to have an Annex about an "Initialize" instruction...Brad to send out Ken's response to the SJTAG team.
    • [Brad (P)] The 1149.6 Annex referred to by Ken is the Annex E.
    • [Adam] I would like to see this response. Generally an Annex is not so strong as stating a position on an issue but more revealing what the problem is and suggested methods to deal with these issues.
    • [Adam] The definition of EXTEST already deals with requiring the core be isolated if there is a potential for a hazardous state upon entry.
    • [Brad] In response to Heiko's comment on 3/19 regarding device failures being usually caught by the device vendor prior to shipping, Ian made a follow-up comment stating, "But at system level, I don't expect to be looking for 'Dead on Arrival' device faults (board test should have got those) - I'm looking at system integration faults, burn-in and in-service failures.". Does anyone want to follow up with more comments?
    • [Adam] In my view, we are seeing much greater use of stress testing at the board level. I guess that says two things: if stress test is deferred to the board level, one would expect these types of failures to occur at the board stress level; and if it is not system level test, it is something that looks a lot like system level test.
    • [Ian] We do separate what we do at the board level from the system level. At the system level we are looking more at the integration faults. Typically, it is hard to perform functional test with just a rack of identical boards and not a whole system when stressing just boards. BScan is useful to detect faults here to run as pre-stress and post-stress processes.
    • [Brad] Yingwu made a comment following up to the 3/19 minutes as, "A special case of ESS is burn-in which require a long time. During the burn-in, we can finish the JTAG interconnect test and FLASH programming such as long time tasks. This is very valuable. In the case, the system JTAG is more convenient than board JTAG."
    • [Ian] Yingwu also talked about performing FLASH programming during the first long burn-in cycle to reduce process time. This is an interesting concept.
    • [Brad] Tooling issues?
    • [Brad] Do tests in ESS need to be synchronized to the environmental conditions?
    • [Ian] This may be part of the overall ESS control, so it would be no problem for the external test controller; but how would the embedded test controller know when the environmental conditions are as desired and that the test needs to start?
    • [Yingwu (P)] There is no problem for the embedded controller. We have an Ethernet port connecting to the embedded controller. From outside the ESS room, we can control the FLASH programming, FLASH test and other tests through the Ethernet port.
    • [Brad] Control flow needs to be decoupled from programs containing the actual test pattern / test vectors
    • [Yingwu (P)] The test pattern can be downloaded to the master board or stored in the master board or included in the control flow.
    • [Ian] For porting tests from board level to system level, tests have to be separate from the flow
    • [Ian] A lot of tooling allows you to do something like that. We have things built into our test sequencer that control the chamber and the system.
    • [Brad] External or internal software?
    • [Ian] Generally, this is external software. We used to have an embedded BScan chip that would help the functional test (TCON chip).
    • [Ian] If you have an external test controller, there are no synchronization issues because the external box controls the flow. The problem comes when the controller is embedded in the system.
    • [Brad] We typically use embedded BScan so we can use a single communications interface for all the tests - both BScan and functional.
    • [Ian] Our external tester runs like it is doing POST and then continues beyond that with EST mode testing.
    • [Brad] Our TFCL allows us to decouple the tests from the control flow.
    • [Ian] I think the issue of control flow becomes more important as we move into the system level.
    • [Ian (P)] Yingwu has offered up a comment on using Ethernet to synchronise the start of testing when using an embedded controller: I think he's taken my rhetorical question literally! He's quite right of course, that is a possible solution, but in the end, Ethernet isn't part of 1149.1 so it isn't something we can expect to be present. I suppose it raises the question of whether the method of synchronisation needs to be part of a SJTAG standard or whether it simply needs a recommendation that some mechanism should be provided, since this is really starting to touch on the functional design of the UUT.
    • [Timothy] You want to have good diagnostics - not just a pass fail because of how long it takes to perform the setup and test operation. Flow control is important when it comes to embedding JTAG into the product. We have used the free stuff out there and there is no flow control. Everyone seems to be doing ad hoc flow control right now.
    • [Ian] Agree with Tim that you need a good diagnostic out of it.
    • [**************** Can people comment on the aspect of diagnostics if it may be off-line (captured failure information processed later) or does it require real-time diagnostics to match the flow coming from the functional test process?*****************************]
    • [Ian (P)] Depends on the definition of "real-time": To be able to properly analyze your fault syndromes and arrive at a good diagnostic, you need to have completed a "set" of tests so that you separate out the bridging faults from the stuck-at faults etc. You could then argue that in EST you should wait till you have all the test runs completed so that you can inspect for possible correlation of apparently discrete faults across test runs. For practicality though, I feel that becomes a tough ask for tooling and is probably best left to an engineer's assessment!
      From a point of view of monitoring EST progress, I would like to have access to the diagnostics as each test cycle completes, so that I can decide whether or not it is worth continuing to run the EST as a) We might be wasting chamber time and b) We could be lifing the UUT needlessly.
    • [Tim] You may have an intermittent problem that makes it difficult.
    • [Carl W.] This is one of the issues we have to deal with at Cisco with complex boards where we have to capture the failing vectors before continuing.
    • [Ian] Power on ramp up - power down ramp down.
    • [Carl W.] We try to scan them during quick ramps both ramp up and ramp down.
    • [Ian] We have a lot of metal in our systems so we find the ramp times slow, but try to make them as fast as possible.
    • [Carl W.] We took a sledge hammer approach to run as many tests as we can on the ramp cycles.
    • [Ian] We do have some thermal shock chambers, but these are not used very often.
    • [Carl W.] Mechanical Stress Testing - BScan catches more than functional test does when done during ramp. It can be more expensive to run due to the cost of the resources used. The correlation between the effectiveness of BScan at ramp and functional test at static temperatures in a system is not yet clear. My boss knows more about this subject and talks about it quite often. I will see if I can find some more information on the subject.
    • [Ian] Our testing is limited to what the customer requires. It tends to be written in the contract that ESS be performed because that is what they always do - not because it is the best thing to do. The space programs are different where they think about why they want ESS.
    • [Timothy] The other point for good diagnostics, every time you ramp up and down, the ground testing is typically more than 50% of the product life line. The less you can run ESS with good coverage the better.
    • [Ian] We do have limits on the number of ESS runs we can do as these reduce the life of the product.
    • [Timothy] You really want good diagnostics as part of the ESS; flow control is important for the test control; standardized is better than many different ad-hoc implementations
    • [Brad] We, as a group, need to look into defining the requirements for the flow control;
    • [Brad] Reviewed the remaining use case topics we need to discuss.
    • [Ian] I suggest we discuss BIST as next topic, since it is related to POST
    • [Brad] agreed
    • [Brad] Ian, can you host that discussion as moderator since I am sure people are tired of hearing my voice leading things right now.
    • [Ian] That would be fine.
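
Editor's illustration of the flow control points above (control flow decoupled from the test vectors, tests synchronized to environmental conditions, and failing vectors captured per step rather than a bare pass/fail). This is only a minimal sketch: every name in it (TestStep, StepResult, run_flow, the phase names and the vector set names) is hypothetical and is not taken from TFCL or from any participant's tooling. The two callbacks stand in for whatever mechanism the chamber and the test controller actually provide.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class TestStep:
    """One entry in the control flow; it only names a vector set."""
    name: str
    vector_set: str  # hypothetical identifier of a separately stored pattern file
    phase: str       # hypothetical chamber phase in which the step must run


@dataclass
class StepResult:
    step: str
    phase: str
    passed: bool
    failing_vectors: List[int]  # kept so diagnosis can happen per cycle or off-line


def run_flow(
    steps: List[TestStep],
    wait_for_phase: Callable[[str], None],
    apply_vectors: Callable[[str, str], List[int]],
) -> List[StepResult]:
    """Run each step once the environment reports the required phase."""
    results = []
    for step in steps:
        wait_for_phase(step.phase)  # synchronization point (external or embedded)
        failing = apply_vectors(step.vector_set, step.phase)
        results.append(StepResult(step.name, step.phase, not failing, failing))
    return results


# The flow is plain data, so the same vector sets could be reused unchanged
# at board level and at system level with a different flow.
FLOW = [
    TestStep("interconnect_pre_stress", "board_a_interconnect", "ambient"),
    TestStep("interconnect_on_ramp", "board_a_interconnect", "ramp_up"),
    TestStep("flash_program", "board_a_flash", "hot_soak"),
]

phases_seen: List[str] = []


def fake_wait_for_phase(phase: str) -> None:
    # Stand-in for chamber synchronization; a real system would block here.
    phases_seen.append(phase)


def fake_apply_vectors(vector_set: str, phase: str) -> List[int]:
    # Stand-in for the scan engine; simulates an intermittent fault on the ramp.
    if vector_set == "board_a_interconnect" and phase == "ramp_up":
        return [17, 42]
    return []


results = run_flow(FLOW, fake_wait_for_phase, fake_apply_vectors)
for r in results:
    print(r.step, r.phase, "PASS" if r.passed else f"FAIL {r.failing_vectors}")
```

Because the synchronization is passed in as a callback, the same flow description would serve an external controller that drives the chamber directly and an embedded controller that has to be told when conditions are met, which is the distinction Ian raised.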

5. Schedule next meeting

Monday, April 14, 2008, 8:15am EDT: Topic is BIST Use Case

6. Any other business

  • [Ian] Would like a backup admin for the web.
  • ATCA FRU states if available
  • SAF FRU states if available

7. Review new action items

  • Brad to send out Ken Parker's response on the device stress issue

8. Adjourned at 9:40am EDT

(moved by Tim, second by Carl)

Many thanks to Heiko for assisting in taking notes for these minutes.

Respectfully submitted,
Brad