Minutes of Weekly Meeting, 2009-04-20

Meeting called to order at 10:34 AM EDT

1. Roll Call

Eric Cormack
Ian McIntosh
Brian Erickson
Adam Ley (left 11:07)
Brad Van Treuren
Tim Pender
Carl Walker
Heiko Ehrenberg

Excused:
Patrick Au

2. Review and approve previous minutes:

4/13/2009 minutes:

  • Draft circulated on 6th April:
  • Corrections:
  • In Review of Actions:
    • Delete " - Discussed in topic 4b." from 7th action
    • Change "as discussed today" to "as discussed Apr 6th" in 16th action
    • Change "as discussed today" to "as discussed Apr 6th" in 17th action
  • In Discussion Topic 4a:
    • Change "[Brad] Yes, it seem very different ..." to "[Brad] Yes, it seems very different ..."
  • Brad moved to approve with the above amendments, Heiko seconded, no objections.

3. Review old action items

  • Adam proposed we cover the following at the next meeting:
    • Establish consensus on goals and constraints
    • What are we trying to achieve?
    • What restrictions are we faced with?
  • Establish whether TRST needs to be addressed as requirements in the ATCA specification if it is not going to be managed globally (All)
  • Adam: Review ATCA standard document for FRU states
  • Patrick contact Cadence for EDA support person.
  • All to consider what data items are missing from Data Elements diagram
  • All: do we feel SJTAG is requiring a new test language to obtain the information needed for diagnostics or is STAPL/SVF sufficient?
    see also Gunnar's presentation, in particular the new information he'd be looking for in a test language (http://files.sjtag.org/Ericsson-Nov2006/STAPL-Ideas.pdf)
  • Carl W/Andrew: Set up conference call to organise review of Vol. 3 - Ongoing
  • Andrew: Make contact with VXI Consortium/Charles Greenberg. - Ongoing
  • Ian/Brad: Draft "straw man" Volume 4 for review - Ongoing
  • All: Review "Role of Languages" in White Paper Volume 4 - Ongoing
  • All: Consider structure/content of survey - Ongoing
  • Harrison: Virtual system exploiting Configuration/Tuning/Instrumentation and Root Cause Analysis/Failure Mode Analysis Use Cases. - Ongoing
  • Brad: Virtual system exploiting POST and BIST Use Cases. - Ongoing
  • Ian: Virtual system exploiting Environmental Stress Test Use Cases. - Ongoing
  • Brad/Ian - Prepare draft survey for review by group. - Ongoing.

4. Discussion Topics

    1. Group Focus
      • [Ian] Some of our recent discussions have maybe been going a bit deep and not making the progress we'd like, so I thought we should step back a bit and use the results from last year's survey to look at what others thought were the important objectives for SJTAG.
      • [Ian] Referring to Priority Objectives in the results summary on the web site (http://www.sjtag.org/misc/results_aug08.html), there are seven headings that show as being of greatest importance. Two of those, 'Common Test Language' and 'Defined Data Formats', are subjects we've struggled with a bit recently, so should we set those aside for now?
      • [Eric] Yes, I think so.
      • [Ian] System Diagnostics is then the next biggest topic. Can we consider what that means for SJTAG?
      • [Brad] I think 2 (System Diagnostics) and 3 (Reuse of Board Test Vectors) are maybe related. Part of System Diagnostics is board diagnostics and that can mean reusing board-level tests.
      • [Brad] Look at Gunnar's system-level control of board-level tests, where the tests are embedded in the board and you can say to a board "Go test yourself and report what you find".
      • [Brad] Compare that with the multidrop architecture where a Shelf Controller is in charge of running tests that have been repurposed from the board-level. In system JTAG, how do these different architectures impact System Diagnostics? e.g. how do you report back which board is failing?
      • [Ian] I think that the purely external test case is very similar to the multidrop scenario you describe, Brad.
      • [Brad] In the external case you have the benefit of access to the whole tool set, but the principle is essentially similar.
      • [Brad] Thinking back to the slides I presented a few weeks back (23rd Feb.) where I described the typical functional test model with diagnostic software interacting with diagnostics agents on each board: Gunnar's approach maps quite neatly onto that, whereas the Shelf Controller maybe doesn't follow the functional test model in terms of locality of control.
      • [Tim] If you're looking at the Agent's perspective, it could be a set of register values being returned for diagnostic evaluation. It then depends on how you're decoding the data in those registers.
      • [Brad] It's interesting that you talk about diagnostics based on a register set. Mostly tooling looks at the super-vector for the whole chain. Is there some way we can leverage registers for our use?
      • [Ian] Brad, is this similar to the discussion you had with Paul Holowko of BAE Systems at ITC? I recall some discussion of fault codes and failure groups.
      • [Brad] That's where I struggle; one of the issues we have to deal with is faults in the field that are then No Fault Found when the FRU gets to the repair station. In those cases, having a snapshot of all the data may be very important.
      • [Tim] It could be a connection on the backplane; you change the FRU and re-make the backplane connection so it works. I don't think you can rely on the agent to have all the diagnostics.
      • [Ian] I think that is what Brad was saying: in the case of an external controller, extra diagnostic tools can be made available; but what about the case of embedded controllers?
      • [Brad] Right; external tooling should be able to partition the result pattern to identify the Boundary Scan register cells, pins, and nets involved in a particular fault.
      • [Brad] If the same error occurs on multiple boards, is that due to environmental issues, is it due to a problem on the backplane, etc.? Having a recorded response pattern (from the system being employed in the field) will be helpful in determining the root cause for certain faults.
      • [Tim] Maybe those environmental tests are the cases when you would use an external controller for the tests. Then you can have all the diagnostics tools available.
      • [Ian] I think Brad is referring to environmental effects in the field rather than a controlled environmental test.
      • [Brad] Correct. For example, we have seen (and identified) problems related to high humidity that hadn't been encountered before in controlled test environments. Ian, I guess you could have similar problems on radars if there's a broken seal somewhere?
      • [Ian] Yes, and I've seen space applications where poor processing results in outgassing at connectors, causing opens.
      • [Brad] Perhaps SJTAG should define some sort of SJTAG interface (maybe through Ethernet or something that is commonly used) and defined data formats that would allow external equipment to access internal system functions/data.
      • [Ian] I just wonder if that might end up trespassing on some existing IP that has been produced for similar functions.
      • [Brad] If you defined a software interface, a messaging protocol, you wouldn't need a separate hardware interface, for example.
      • [Tim] USB is basically a two-wire interface and it seems that it might suit converting to 1149.7.
      • {Adam's connection was dropped at this point}
      • [Brad] Security would be an issue, especially if it is a shared interface also used for system functions.
      • [Brad] Something else I'd like to discuss is the difference in diagnostics between interconnect tests and other operations like programming or instrumentation.
      • [Ian] Are you looking at the type of data or the format?
      • [Brad] I'm looking at it from the test flow perspective.
      • [Brad] There are primitives that are the same across Use Cases. At some point we separate out to Use Case specifics. Where is that point? Is there commonality in the core of diagnostic results for the different use cases? Or is the split too early?
      • [Ian] I'd say it is: Tooling for device programming typically won't give you much diagnostic other than 'this vector failed'. But maybe I'm taking too narrow a view here - thinking about it, this is probably really a case that most tooling is aimed at a specific Use Case, so while there may be diagnostic data in the return vectors of a failed programming operation it may be getting discarded.
      • [Brad] Yes, I think that's what I see too. Fault analysis is the Utopia, but we live with the fact that we just report that the pin state isn't what was expected.
      • [Brad] Instrumentation and programming, for example, can be compared in a sense that both write to registers and results are read back from registers. Interconnect tests can be looked at in a similar way. However, tools are written for specific Use Cases; perhaps you can't tie them together.
      • [Ian] It may be more historical than technical. Maybe the tool vendors have more of a view on this?
      • [Brad] Are we misrepresenting the position here? I'd be keen to hear what the tool vendors have to say.
      • [Heiko] I kind of agree. There are certain Use Cases that could be more informative. But look at something like Flash programming. You could run all the structural tests and cluster tests and find nothing wrong, but still get a programming fail because a pin on the Flash isn't connected. What you get may not represent the right type of error.
      • [Brad] I'd agree, but are there some common data types or operations? Are these coming down to some set of registers?
      • [Brian] Sounds like interpretation of the resulting data. There seem to be two issues: How to store result data in an onboard/embedded controller, and how to analyze that data, e.g. you could report back which bits are failing. I think we're more used to dealing with the primitives; trying to deal with things at this level could be difficult.
      • [Brad] Good point, we're more interested in where it fails.
      • [Brian] You could say 'This location failed' and which vector it failed in.
      • [Brad] There was an IEEE standard about representation of diagnostic data. Teradyne were involved in it.
      • [Ian] Possibly IEEE 1445, Standard for Digital Test Interchange Format [DTIF]?
      • [Brad] That sounds like the right one. I'm not saying we need to follow it, but it may give us some pointers on what we need to think about.
      • [Adam, proxy] Concerning the IEEE standard about representation of diagnostic data, I presume that "STDF" (Standard Test Data Format) is what's intended (ref http://stdf.nanoisi.com)
      • [Brian] We have to be careful about possibly over-specifying. We would want to make sure not to be too restrictive, so we don't miss out on some failure information that may not fit the fault types in our definitions.
      • [Brad] I think there's some value in defining some elements in a persistent form.
      • [Brian] The concern is also that the results are variable in size. Consider a catastrophic failure, let's say a clock signal that is stuck, so every other vector might fail, resulting in a lot of diagnostic data.
      • [Brad] Typically we have an upper bound; if we exceed that then some alternate strategy has to be used.
      • [Tim] Also, we need to think about test flow control: Do you stop on fail or continue? If you fail to configure LVDS pins would you want to continue if there's a risk of damage? Under which circumstances do we want to continue a test vs. skip a test vs. execute a specific test? Can we include some test flow control information in the diagnostic pattern?
      • [Ian] This is similar to the issue I raised when we talked about EST: Critical failures vs. noncritical failures. Criticality isn't something that JTAG really has a concept of.
      • [Brian] Probably have to have some flag to indicate 'stop on first fail'.
      • [Brad] We're able to provide an adornment in a layer above the SVF or STAPL to help with this. And there are many cases where you do want to carry on and complete all the tests.
      • [Heiko] A simple example is checking the scan chain; there's little point in carrying on if that fails.
      • [Ian] I think we're going to have to continue this discussion; I think there's more to come out of this yet.
      • [Eric] Oh yes, we haven't really addressed multilayered diagnostics yet.
      • [Brad] Also, we now see systems with a lot of redundancy, so I can bypass a fault to keep running, maybe with reduced performance.
      • [Eric] We touched on that when we discussed autodial up in the event of a fault: Let the redundancy kick in, but contact base to report the fault. There's more room for discussion.
      • [Brad] When you get into network filters, with ganging of FPGAs and DSPs to boost performance, you can lose one but still operate, just maybe not so well.
      • [Ian] We can have multiple identical processing elements that can hand off tasks in the event of failure or reduce functionality.
      • [Brad] So can BScan help in these applications?
      • [Carl] We even have nested hierarchies which provide similar features.
      • [Brian] Do we need to address false failures? What about reconfiguring of an FPGA after a corruption?
      • [Brad] Maybe that's outside of our scope - a Diagnostics Manager.
      • [Brian] OK, I just wanted to see what we thought was in the scope of our Use Cases.
      • [Brad] Ultimate invocation may be in one of our Use Cases, but the decision taking may be a system issue.
      • [Ian] OK, I think we'll continue this discussion next week.
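
As a minimal sketch of the result-reporting ideas raised in this topic (report which vector failed and at which bit positions, keep an upper bound on stored diagnostic data, and optionally stop on the first failure), the Python fragment below may be useful. It is an illustration under assumptions and was not presented at the meeting: the names FailureRecord, DiagnosticLog and compare_vectors are hypothetical, vectors are modelled as plain integers, and don't-care bits are not handled.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class FailureRecord:
    """One failing vector: its index and the bit positions that differed."""
    vector_index: int
    failing_bits: List[int]


@dataclass
class DiagnosticLog:
    """Bounded store of failures (hypothetical structure, not from any standard)."""
    max_records: int = 8
    records: List[FailureRecord] = field(default_factory=list)
    overflowed: bool = False     # set when failures had to be dropped
    stopped_early: bool = False  # set when the run was halted at a failure


def compare_vectors(expected: Sequence[int],
                    actual: Sequence[int],
                    stop_on_first_fail: bool = False,
                    max_records: int = 8) -> DiagnosticLog:
    """Compare expected and actual response vectors and log the mismatches."""
    log = DiagnosticLog(max_records=max_records)
    for index, (exp, act) in enumerate(zip(expected, actual)):
        diff = exp ^ act  # a 1 bit marks a position where the response differs
        if diff == 0:
            continue
        bits = [b for b in range(diff.bit_length()) if (diff >> b) & 1]
        if len(log.records) < log.max_records:
            log.records.append(FailureRecord(index, bits))
        else:
            # Upper bound reached: note it so an alternate strategy can be used.
            log.overflowed = True
        if stop_on_first_fail:
            log.stopped_early = True
            break
    return log
```

Calling compare_vectors with stop_on_first_fail=True corresponds to the scan chain check example, where there is little point in continuing after a failure; leaving it False corresponds to the cases where all tests should run to completion. The overflowed flag is one possible way to signal that the upper bound was exceeded, as in the stuck clock example.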

 

    2. April Newsletter
      • [Ian] I've circulated a draft Newsletter. It's not due out until next week, so I'm not proposing that we approve this yet.
      • [Ian] I'd ask that you consider during the course of this week if there are any other items we ought to add in.

5. Schedule next meeting

Schedule for April 2009:
Monday Apr. 27, 2009, 10:30 AM EDT

Schedule for May 2009:
Monday May 4, 2009, 10:30 AM EDT
Monday May 11, 2009, 10:30 AM EDT
Monday May 18, 2009, 10:30 AM EDT

6. Any other business

None

7. Review new action items

None

8. Adjourn

Moved to adjourn at 11:38 AM EDT by Eric, seconded by Carl.

Thanks to Heiko for supplying additional notes.

Respectfully submitted,
Ian McIntosh