Minutes of Weekly Meeting, 2009-04-27

Meeting called to order at 10:35 AM EDT

1. Roll Call

Eric Cormack
Ian McIntosh
Brian Erickson
Carl Walker
Brad Van Treuren
Tim Pender
Adam Ley
Peter Horwood (joined 10:36)
Heiko Ehrenberg (joined 10:38)

Excused:
Patrick Au

2. Review and approve previous minutes:

4/20/2009 minutes:

  • Updated draft circulated on 22nd April.
  • No further corrections noted.

Eric moved to approve, Brad seconded, no objections.

3. Review old action items

  • Adam proposed we cover the following at the next meeting:
    • Establish consensus on goals and constraints
    • What are we trying to achieve?
    • What restrictions are we faced with?
  • Establish whether TRST needs to be addressed as a requirement in the ATCA specification if it is not going to be managed globally (All)
  • Adam: review ATCA standard document for FRU states
  • Patrick: contact Cadence for EDA support person.
  • All to consider what data items are missing from Data Elements diagram
  • All: do we feel SJTAG is requiring a new test language to obtain the information needed for diagnostics or is STAPL/SVF sufficient?
    see also Gunnar's presentation, in particular the new information he'd be looking for in a test language
    (http://files.sjtag.org/Ericsson-Nov2006/STAPL-Ideas.pdf)
  • Carl W/Andrew: Set up conference call to organise review of Vol. 3 - Ongoing
  • Andrew: Make contact with VXI Consortium/Charles Greenberg. - Ongoing
  • Ian/Brad: Draft "straw man" Volume 4 for review - Ongoing
  • All: Review "Role of Languages" in White Paper Volume 4 - Ongoing
  • All: Consider structure/content of survey - Ongoing
  • Harrison: Virtual system exploiting Configuration/Tuning/Instrumentation and Root Cause Analysis/Failure Mode Analysis Use Cases. - Ongoing
  • Brad: Virtual system exploiting POST and BIST Use Cases. - Ongoing
  • Ian: Virtual system exploiting Environmental Stress Test Use Cases. - Ongoing
  • Brad/Ian - Prepare draft survey for review by group. - Discussed in AOB.

4. Discussion Topics

    1. Group Focus - System Diagnostics (continued)
      • [Ian] Last week we still felt there was a lot to discuss on this subject. Adam indicated that he may have some things to raise that didn't fit a proxy response to the last minutes.
      • [Adam] That's correct.
      • [Ian] Eric, you also brought up multilayered diagnostics; can we start there?
      • [Eric] I've been thinking about some of the issues we've discussed recently with regard to remote sites, etc.: Things like redundancy in systems, and Brad mentioned systems with self-repair capability. I've also been thinking about banking systems with triple-redundancy.
      • [Eric] Also, there's the question of how deep diagnostics are expected to go. We're looking at systems, but do we also go to board diagnostics or device level in SJTAG? Brad, do you have any comment?
      • [Brad] Not right now.
      • [Ian] I feel that 'system' is a bit amorphous. What I call a system may not be a system to someone else and vice versa. But if you're looking at it abstractly, it probably ought not to matter whether you're considering a device, board or system.
      • [Eric] We have System-on-Chip, so surely we have to be at device level to deal with those.
      • [Ian] You'll get different perspectives. Generally, once you've built a system up, the field service guy probably only wants to know which FRU to swap when it fails. But the service depot might like more detail from the point of failure. Also you might want more detail if you're running the tests as part of manufacturing. I think you'll end up with tradeoffs.
      • [Brad] Yeah, there is a tradeoff; that's where I was trying to go. With more modularisation in systems, and the demands of availability, you sometimes can't wait until an FRU is replaced.
      • [Ian] The question might be what granularity do you need? Maybe it's enough to identify a failed functional block that you can work round.
      • [Brad] We're seeing highly parallel systems to achieve performance.
      • [Brad] Another aspect is that BSCAN is just one of many test and maintenance capabilities in a system, so we can't start mandating how SJTAG fits into all of these.
      • [Ian] That's true. We're not using embedded JTAG much just now, but we did some things back in the mid-90s, where JTAG wasn't used for itself, but as an adjunct to functional BIST. It was used to set signals or mode devices to support the functional tests, rather than run interconnect tests.
      • [Brad] Yeah, we need to get across that these other uses for JTAG exist.
      • [Eric] Don't you think that's already the case?
      • [Ian] No. I've come across many people who think JTAG is only for device programming.
      • [Brad] And I've seen some who think it's only for emulation. Do the tool vendors see this too?
      • [Brian] Yes, there's a perception among some people that JTAG can only do interconnect test, and among others that it can only do emulation and programming.
      • [Eric] I'm surprised.
      • [Ian] I'm not. I think it's because of a lack of a DFT perspective in some designers; no training in BSCAN. They get introduced to BSCAN by a device vendor using JTAG to program their parts and the designer assumes that is all it can be used for.
      • [Tim] Going back to low-level diagnostics: Getting diagnostic data doesn't mean that the fault report gets attached to the FRU. Or should the data be stored on the board?
      • [Brad] Ideally we want to get that data onto the UUT, but if you can't do that then you need some way to make that data downloadable.
      • [Ian] We produced an ASIC years ago to do that kind of job, which we fit to each replaceable item. It stores information like serial number and build state, but the tests can write time-tagged fault messages to nonvolatile storage that can be read back at a later time.
      • [Brad] In the ATCA world, the Board Management Controller on each board does a similar job, although that is really used more by the functional tests.
      • [Ian] The same applies in our case.
      • [Brad] The diagnostics for BSCAN tend to be much larger than for functional test.
      • [Brad] Tim is right; why gather test data if you're not preserving it? There has to be a conscious decision there.
      • [Ian] I can see the value in having stored fault data, having used it, but it can be difficult to convince others.
      • [Brad] That's part of the message on Return on Investment that needs to be put across.
      • [Ian] But maybe requiring storage of fault data is more a DFT recommendation than being within the SJTAG domain?
      • [Brad] I think it's the kind of thing that's worth having in the White Paper, although maybe not something that goes into the standard.
      • [Brad] We've had some cases where we've been able to leverage features of some, mainly programmable, devices to support test; repurposing the device after a power-up test means there's no extra hardware needed.
      • [Ian] Yes, if you do start adding hardware to support test, then the Reliability people will complain that the increased component count is reducing the MTBF.
      • [Brad] True.
      • [Tim] But at least you'll know it's failed.
      • [Carl] It's the cost of doing business; you need to invest or your ROI is zero. Some percentage of your system has to be dedicated to availability.
      • [Brad] We can come up with at least a list for Boundary Scan diagnostics, e.g. interconnect test results. Test data registers would be a low level element to consider.
      • [Ian] In the case of OEM boards where the vendor doesn't supply net data then you need at least a description of the TDR, so you can interpret the results.
      • [Brad] With dot 1 you have fixed length TDRs. Once you get into 1687 you get variable length TDRs. We could argue that an assembly is represented by the TDR, but that length will vary if the board chains get reconfigured.
      • [Ian] I think that's just part of dealing with systems, so SJTAG will need to cope with that.
      • [Brad] And I don't think we'll get away with expecting board vendors to provide netlists.
      • [Ian] In that case, it may not matter about the internal behaviour of the board. We just need to know how to control its boundary and run any built-in tests.
      • [Brad] Yes, delegation of the internal self-tests. Maybe in SJTAG we should settle on maximum delegation. We've got to deal with connectivity at the system level, and write tests that avoid conflicts.
      • [Ian] If you're not getting board design details, the details on possible conflicts need to be part of the board's data pack.
      • [Brad] You could get the situation where no drivers are active.
      • [Brad] Thinking about interfaces, where do we store data on a fail? If you have multiple sites reporting fails on a pin then there's a good chance that's where the fault is, but in other cases you may not have enough to narrow it down.
      • [Ian] It's important to indict the right item, but sometimes you can't get to an ambiguity group of one.
      • [Brad] Is it a bent pin, or has the circuit genuinely failed, or gone into some weird state that it can't recover from? Re-seat and re-power and the fault goes away.
      • [Tim] Counting how many times a reset is needed may be a way of indicating if there's a problem.
      • [Brad] Yeah, again these are things that could go in the White Paper.
      • [Adam] Some things I wanted to follow up on from last week; how Use Cases affect the diagnostics. The main thing I wanted to develop was a more complete picture of the diagnostic process; there are a lot of different dimensions to factor in.
      • [Adam] Firstly, there's always a context for diagnostics, e.g. certain tests that can't be run before others have been run and the results are known; infrastructure is the most obvious example.
      • [Adam] The intent of the test flavours the diagnostic you expect. Then what I'll call 'focal points': Are you really just interested in FRUs or is it some other class?
      • [Adam] In any diagnostic process there is an interaction between the intent, the stimulus, the expected response and the actual response. If you don't know the intent and the stimulus then the result is just 'bits'.
      • [Adam] Diagnostics involve a sort of reverse abstraction. You throw away concrete data in favour of an abstraction such as 'possible short at this point'. Otherwise, we preserve the data for offline diagnostics to the highest level.
      • [Brad] We could reduce the storage requirement by not holding all the data.
      • [Adam] You can always find ways to make information storage as efficient as possible.
      • [Brad] I mean that you can compress in ways that will allow you to reconstruct the original data set.
      • [Adam] You can record fail data rather than diagnostics. But as a system OEM you'd expect to have access to all the available diagnostic data.
      • [Adam] I wanted to form a few slides around these things, but time hasn't allowed that this week.
      • [Brad] Then there are configuration or state dependencies: If part of my system is handling a call then I need to wait for it to go offline before testing it.
      • [Adam] Depends on what kind of testing; some tests can be run online.
      • [Brad] Yes, Bit Error Rate for example.
      • [Adam] Or hardware based assertions, if you have those types of hook in your devices. These background checkers are running all the time, so you only need to test the flags that get thrown.
      • [Ian] I think there's still a bit to discuss on this, so I think we'll need to come back to this again.


    2. April Newsletter
      • [Ian] I sent out a slightly revised draft over the weekend.
      • [Ian] I didn't receive any other suggestions for the Newsletter. Since I spent a little time last week on refreshing the look of the website, I added something on that.
      • [Heiko] It looks good, Ian.
      • [Ian] The approval of last week's minutes will go in, but is there anything else?
      • {Silence}
      • [Ian] In that case, I'll take a motion to approve the Newsletter for release.
      • Peter moved to approve, seconded by Brad, no objections.
      • [Ian] OK, I'll get that issued by Thursday.

5. Schedule next meeting

Schedule for May 2009:
Monday May 4, 2009, 10:30 AM EDT
Monday May 11, 2009, 10:30 AM EDT
Monday May 18, 2009, 10:30 AM EDT
  • [Ian] Does anyone have a problem with the 4th May for the next meeting?
  • [Eric] Isn't it a Bank Holiday here?
  • [Ian] Yes, it's the May Day holiday, is that causing a problem?
  • [Eric] Not particularly for me, I'm just thinking about you guys.
  • [Ian] I'm OK. Anyone taking the holiday?
  • {Silence}
  • [Ian] Seems like 4th May is good.

6. Any other business

  • [Ian] I've put a post on the forums, to show the position Brad and I have got to with the survey (http://forums.sjtag.org/viewtopic.php?f=32&t=83). The post includes a link to the latest version of the spreadsheet (http://files.sjtag.org/wip/SurveyGrid_2009-04-27_imm.xls).
  • [Ian] Basically, everything is on the one sheet, but we probably want to sort these out into three blocks: The way I see the survey coming together is to have a general section to gather some information about the respondent, then split into either a technical or a managerial section.
  • [Ian] On the spreadsheet, I've put columns to mark which section you think the question belongs in: Brad pointed out that I've put in columns for 'General' and 'Common'. My idea was that 'Common' was for detailed questions that you might want to set to both groups and went beyond the scope of being general background information, but maybe it isn't a great idea.
  • [Ian] Brad, you also had some concerns about the question categories?
  • [Brad] The problem I was having on Friday was that some of the questions overlap more than one category. I'm unsure whether subcategories are beneficial, as you might see the same subject pop up in different places. We will have to be careful about how we use categories.
  • [Ian] Something I was reminded of just before the meeting: When we prepared last year's survey we didn't start out with any categories. We simply pulled the questions into the order that seemed to work best, and the categories sort of suggested themselves afterwards. Maybe we drop the categories for now and see what we end up with?
  • [Brad] I'm in favour of that.
  • [Ian] There are a couple of questions that still need some options added to them. I'd suggest a task for next week is for the team to propose some answer options for these two questions. {ACTION}
  • [Brad] The questions are the ones on rows 21 and 49.
  • [Ian] The second objective would be to look at each question and decide if it should be in the General, Technical or Managerial section of the survey.
  • [Brad] We haven't formally welcomed Brian to the group. I tried to do that at the end of the last meeting but I was on mute!
  • [Ian] Of course, welcome, Brian.
  • [Brian] The pleasure is mine, thank you for including me.

7. Review new action items

  • All: Propose answer options for the questions shown as needing completion.
  • All: Assess which section each question should be placed into.

Ian would prefer to receive feedback by close of business on Friday.

8. Adjourn

Peter moved to adjourn at 11:46 AM EDT, seconded by Brian.

Thanks to Heiko for additional notes.

Respectfully submitted,
Ian McIntosh