Minutes of Weekly Meeting, 2008-05-19

Meeting was called to order at 8:25am EDT

1. Roll Call (Participants):

Brad Van Treuren
Carl Nielsen
Carl Walker
Peter Horwood
Heiko Ehrenberg
Tim Pender

Proxy feedback provided by:
Ian McIntosh
Yingwu Li

2. Review and approve minutes

  1. 5/12/2008 minutes approved (Carl N. moved, Heiko seconded)

3. Review old action items:

4. Discussion Topics

  1. SJTAG Value Proposition – Configuration and Tuning
    • [Brad] Did not want to drop the conversation since there were a lot of proxy feedback statements
    • [Heiko] The P1149.7 group is also currently discussing language issues, trying to describe features and structure. What dot 7 introduces is the addressing mechanism for the star configuration. With a 2-pin or 4-pin interface, the addressing scheme uses a TAP controller address that needs to be specified in a reference designator context. Each instance of a device must have its own TAP controller address. Thus, the address cannot be part of any standard BSDL description because the information is different per instance. They are proposing using a separate file for assigning addresses to device reference designators on the Unit Under Test, and BSDL extensions for common dot 7 features.
    • [Tim] Doesn’t it also use bidirectional TDI and TDO?
    • [Heiko] Yes.
    • [Tim] What about devices that have multiple cores in a device?
    • [Heiko] Each core would have to have its own TAP controller, I think. Not its own TAP pins, but its own TAP controller.
    • [Heiko] Dot 7 does not specify how you obtain the addressing definition
    • [Brad] Was the discussion on this use case helpful for people and did you find you were able to gain a better understanding of what can be done with dot 1?
    • [Heiko] The discussion helped me get a clearer picture of this use case
    • [Ian (P)] The dot 7 issue raises the point I mentioned in a previous e-mail: Should SJTAG concentrate on current technologies or embrace new (future) technologies? I think dot 7 might be an "all or nothing" step, and therefore possibly a step too far. Nevertheless, the concept of being able to explicitly address individual devices is appealing, whether for config/tuning, programming or anything else.
  2. Root Cause Analysis
    • [Brad] Some cases where I see 1149.1 interrogations being useful are:
      1. Trending same failures on boards indicating a manufacturing problem, design problem, or possibly thermal hot spot problem in a chassis design
      2. Same failure of a device – especially the same code response from BIST operations. This is useful for identifying whether a failure is isolated to a particular lot of devices.
      3. The ability to dump the contents of a device configuration to identify if a configuration changed causing the failure. I had an example of an older Xilinx FPGA a while ago that was failing as the device would just stop working. Xilinx had us attempt to dump the contents of the FPGA programming using 1149.1 as much as we could to identify if a failure occurred in the configuration. With the data we captured, Xilinx was able to identify a process problem in a new foundry that was making that lot of devices and resolve the rather obscure problem fairly quickly. This was a good story about how a device vendor understood the importance of good tooling and how to leverage it and design for its use.
    • [Peter] In a simple sense, isolating a failure to a component rather than an FRU is what matters most here.
    • [Brad] Using SAMPLE can help identify the state of a board and may give clues about where a failure is when looking at signal states.
    • [Heiko] Fault Injection crosses over into this use case as well.
    • [Brad] Where have you found Boundary Scan of most use to identify failures in the field?
    • [Tim] SAMPLE, though, is just a static snapshot of the board state and does not really show you what is happening on the signals.
    • [Carl W.] There is a potential that customers may really appreciate this type of use case – especially in highly available systems where they want to know “why” it failed.
    • [Tim] The system level requires a special strategy to tap into extra signal lines that may be required for failure analysis. ChipScope and SignalTAP have been useful for troubleshooting and are more dynamic than SAMPLE.
    • [Tim] Designers like debug tools such as ChipScope, SignalTAP, and others at the device level, so these would be useful at the board or system level as well. We also brought back the appropriate lines for these technologies in our board-level TAP Mux devices so that we can use them at the board and system level. It lets us gather information about what is going on inside the device as well.
    • [Brad] There are more tools available for device-level failure analysis, but no complementary tools at the system level. The quasi-static SAMPLE cannot provide the granularity that may be needed to capture dynamic problems or failures.
    • [Brad] Some instrumentation interfaces may lend themselves to providing access to device-level functions that support board-level failure analysis (e.g. iBIST and some BERT embedded tooling).
    • [Brad] Given it is 9:00, I want to table our discussion on this topic and deal with the new business items.
    • [Yingwu (P)] We may use SAMPLE to capture static or very low-speed signals to help identify the state of a board. But most signals are high-speed, so SAMPLE is limited.
    • [Ian (P)] I think this Use Case is one I'll have trouble with. I guess the "value" is probably derived mainly from the application field and customer expectation within that field, and I'm not sure I can see the avionics industry getting too excited about this. It's probably a culture thing related to the relatively low ratio of flying hours versus service hours, so as long as spares are on hand to get the service complete, the question of a timely detailed analysis isn't too much of an issue. The Xilinx case you mention is one where I guess a systematic problem was being observed - a manifestation of a process problem - and I can see the benefits here, but for random failures, a detailed diagnostic is really only going to confirm /what/ failed, not /why/ it failed, and I think I can wait until a board gets back to a repair facility for that, and it stops being an SJTAG method at that point.

5. Schedule next meetings:

Wednesday, May 28, 2008, 8:15am EDT

6. Any other business

7. Review new action items

8. Adjourned at 9:16am EDT

(moved by Tim, seconded by Heiko)

I want to thank Heiko for his assistance in writing these minutes.

Respectfully submitted,