Minutes of Weekly Meeting, 2012-03-12

Meeting called to order: 11:06 AM EDT

1. Roll Call

Ian McIntosh
Carl Walker
Tim Pender
Harrison Miles
Brian Erickson
Patrick Au
Eric Cormack (joined 11:50)

Excused:
Heiko Ehrenberg
Brad Van Treuren

2. Review and approve previous minutes:

03/05/2012 minutes:

  • Draft circulated on 03/05/2012.
  • No corrections noted at this time.
  • Insufficient attendees to vote on approval.

3. Review old action items

  • Adam proposed we cover the following at the next meeting:
    • Establish consensus on goals and constraints
    • What are we trying to achieve?
    • What restrictions are we faced with?
  • All: do we feel SJTAG requires a new test language to obtain the information needed for diagnostics, or is STAPL/SVF sufficient? See also Gunnar's presentation, in particular the new information he'd be looking for in a test language.
  • Ian to consider an appeal to SJTAG email newsletter recipients for participation from the automotive industry in the next newsletter.

4. Discussion Topics

  1. Discussion: "SJTAG in a nutshell - What I think SJTAG is all about"
    Concluding with Brad's thoughts, views from other members.
    • [Ian] Unfortunately Brad can't join us today, so I'll ask if anyone else has views on what they hope to get out of SJTAG.
    • [Patrick] Well I hope that eventually it becomes a standard.
    • [Ian] I think we all hope for that.
    • [Carl] The importance of standardization can't be underplayed. In Cisco every business unit devises its own solution to using JTAG in the system.
    • [Ian] I think that's probably what Brad sees too; he's mentioned that each business unit will have its own reasons why it uses JTAG, so there's probably a correlation to using different solutions.
    • [Carl] I'm sure we could find a set of common denominators in why we use JTAG but the ways of implementing it are varied.
    • [Ian] From our perspective, the big driver is programming the system to bring it alive.
    • [Carl] I guess you don't use preprogrammed parts? For us anything under 5000 units is a small run so we use preprogrammed parts.
    • [Ian] A big run for us is maybe 100 radars in a year. We may sometimes only build two units to a given build standard. We want to have common hardware on the shelf that we can build into a radar and then program it for a specific customer.
    • [Harrison] We have several customers who are in the same position. Although a lot of people think it's only the Department of Defense that have these issues, it isn't.
    • [Carl] I think it's fairly common everywhere. It shifts the burden towards engineering and away from manufacturing. Plug it into the test fixture and tell it what it is.
    • [Tim] There's also the lack of test points. You might have a big flip-chip that covers most of the board.
    • [Harrison] If you take speed out of the equation then device size is generating the need to go in this direction.
    • [Tim] There are high speed interfaces that we can't put test points on. The ICT fixtures don't tend to have the multiple chains needed so you might end up with relays in the test fixture. So you start looking at alternative methods to use in the fixture.
    • [Ian] Are you saying that ICT designs are naive?
    • [Tim] Well high speed interfaces can't be easily tested on the conventional pin tests that were used in the past.
    • [Carl] We have the same issue, and have to provision for loopbacks. There's often a symmetry on SERDES links that makes that possible.
    • [Ian] Yes, but we often find a lack of symmetry where there may be more Tx channels than Rx channels or vice versa on a given board. So to create the loopbacks you're back to adding relays in the fixture.
    • [Harrison] There's an increasing use of memory, and that's a lot of single ended signals. At least with SERDES you have a matched return signal so there's symmetry there.
    • [Carl] But SERDES can have asymmetry in terms of the numbers of Tx and Rx links.
    • [Harrison] With memory there are many signals with a common signal return.
    • [Ian] Even with simple differential interfaces like RS422 you can get the situation where one line goes open circuit but the data still gets through. I remember having to add pull-downs to a test fixture to force the link to fail when there was an open circuit.
    • [Carl] And as the device manufacturers get the AC coupled links to work more reliably then they can continue to work in the presence of faults.
    • [Ian] That's actually a general point: We try to make things robust and fault tolerant, but then you have to know during test whether something is working because it's fault-free or because it's fault-tolerant.
    • [Carl] It depends on the defect. I'll concentrate on SERDES. It has instrumentation but it varies in what you can get out of the receiver. You can run a sampling pattern but that impacts on throughput.
    • [Harrison] The other piece of it is if the thing comes up. You've got the tuning phase before the operational phase. One thing I'd like to know is how far did I have to turn the knob?
    • [Carl] It can be worse than that because often it's dynamic.
    • [Ian] We tend to have a lot of stuff that needs compensation over temperature ranges because we can fairly quickly go from -55°C to +50°C or above. The question becomes how you discriminate between 'normal' variation over the range and abnormal variation in a specific case.
    • [Harrison] Instrumentation may need to be able to report 'the junction temperature was X while the outside temperature was Y'.
    • [Carl] Getting back to the original question, what does SJTAG do for all this? It allows me to collect all that information over a common interface.
    • [Harrison] Not only a common interface, but also a common method.
    • [Ian] Despite the discussion here, I'd wonder if we'd rather test things like SERDES using firmware rather than JTAG.
    • [Harrison] P1687 instruments are going to have to have vectors, and SJTAG can take advantage of that.
    • [Harrison] There are layers as I've said before: You can accommodate the system design, but you can have primitives at the lower levels that don't need to be aware of anything about the system.
    • [Ian] I think in essence that's similar to Brad's 2005 paper on the JTAG Plug'n'Play: The board carried vectors to allow the system to test the board but that probably doesn't extend to how that board can help to test the system.
    • [Harrison] The P1687 ICL can be brought up to the system level - it's about the connectivity.
    • [Ian] So are we saying that at the system level we want to find out if the constituent boards are working rather than testing the system?
    • [Carl] It's a chicken and egg situation: Sometimes you can only find out if certain parts are working properly once they're in a system.
    • [Ian] We'd probably rely on System BIT once we get to that point. BIT is usually mandatory and runs at power up, runs during operation and runs on demand in various forms. So it's there anyway once we bring the system alive.
    • [Carl] It depends on the types of device. For device level fault finding on high speed links you've got to have Dot6 on JTAG. I may see a higher than normal error rate, but I may need to run it for hours before I know that there's a problem.
    • [Ian] Yes, we'd have the same with False Alarm Rate on the radar. You may need to run several sorties before you see a trend indicating a problem.
    • [Carl] We've not actually seen a lot of electrical problems, they're mostly down to handling.
    • [Harrison] ESD problems?
    • [Carl] No. In some weak designs in the early stages maybe. More it's manufacturing saying 'when we handle these, they break' - capacitors being knocked off, etc.
    • [Ian] We've had boards in the past that had heavy MIL38999 connectors fitted on attached flexis. The weight of the connector tended to damage the flexi unless you were very careful. So after fitting into the box you wanted to check that everything was still connected up.
    • [Tim] The other thing is intermittent failures. During ICT they're only looking for a pass, so if it fails once and they retest and get a pass then it gets shipped. Then later on it fails again.
    • [Carl] And then it gets returned to test and is 'No Trouble Found'.
    • {Eric joined}
    • [Ian] We had an 'Aha moment' on testing a while back. We had a particular board that had an IC with a BIST mode, probably one of the Hotlinks devices: During test some boards would fail the BIST but pass on retest. After more checking we'd find that these boards would pass 9 times out of 10, then we found that the BIST result signal was pulsing with a 10:1 mark to space. The fail was only signalled at the end of the BIST, but it was running in an asynchronous loop and the fault indication was cleared when the next test ran, so whether or not we saw the fault all depended on when we sampled. So we had to add latching of the fault signal in the test fixture.
    • [Tim] We tend to latch a signal if it's ever passed and latch another signal if it's ever failed.
    • [Harrison] Instrumentation should help with that.
    • [Ian] Device instrumentation probably would, if it can be more intelligent about what it can tell you about faults.
    • [Carl] In ASICs I'd say yes, not so much for off-the-shelf parts.
    • [Ian] Well this was an off-the-shelf part and the BIST wasn't too helpful.
    • [Harrison] P1687 is just setting up and not helping yet. A lot of the current instrumentation is about testing the silicon floorplan. There's a hope that it'll help at the board level later. But there's maybe not the drive for the silicon people to make it useful for boards or systems.
    • [Carl] That's probably true in 90% of cases. It's not true where the customer has a lot of leverage, and says they need this.
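
The latching approach that Ian and Tim describe above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the fixture logic either of them used: the names StickyLatch and pulsed_fail_signal, and the one-in-ten sample pattern, are hypothetical and chosen to mirror the "pass 9 times out of 10" behaviour mentioned in the discussion.

```python
from dataclasses import dataclass


@dataclass
class StickyLatch:
    """Remembers whether a sampled result was ever a pass and ever a fail.

    Hypothetical helper for illustration; not taken from any real fixture.
    """
    ever_passed: bool = False
    ever_failed: bool = False

    def sample(self, fail_asserted: bool) -> None:
        # Once a latch is set it stays set, so a short fail pulse is not lost.
        if fail_asserted:
            self.ever_failed = True
        else:
            self.ever_passed = True

    def verdict(self) -> str:
        if self.ever_failed and self.ever_passed:
            return "intermittent"
        if self.ever_failed:
            return "fail"
        if self.ever_passed:
            return "pass"
        return "not sampled"


def pulsed_fail_signal(n_samples: int, period: int = 10) -> list[bool]:
    """Assumed fail line that is asserted for one sample in every `period`."""
    return [(i % period) == period - 1 for i in range(n_samples)]


signal = pulsed_fail_signal(30)

# A single unlatched sample taken at an arbitrary moment can miss the pulse.
single_sample = signal[0]

# Latching every sample catches the pulse regardless of when it occurs.
latch = StickyLatch()
for level in signal:
    latch.sample(level)

print(single_sample)    # False: this one-shot sample sees no fault
print(latch.verdict())  # intermittent
```

Keeping separate "ever passed" and "ever failed" latches, as Tim suggests, lets the tester distinguish a solid pass, a solid fail and an intermittent result, rather than reporting only the outcome of the last sample.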

5. Key Takeaway for today's meeting

  • [Ian] In testing at system level, it seems to be more important to be able to verify individual boards/devices than to test the system as a whole.
  • [Carl] That may just be our particular perspective.
  • [Ian] We have tried to get a broader consideration through the surveys.
  • [Harrison] You know Zoe Conroy? I have a slide deck from the iNEMI survey that has some interesting answers. It may be enlightening. Maybe we can even call Zoe in to a meeting?
  • [Ian] I can make a slot in next week's meeting for that.
  • [Tim] The situation where a fault can /only/ be detected in a system.
  • [Tim] It's good to be able to narrow down to the lowest possible level, also to reuse the board tests, but that maybe requires the gateways to be transparent.

6. Schedule next meeting

Next Meetings:
19th March - Note that UK/Europe will still be on GMT/CET
26th March

7. Any other business

  • [Ian] I should probably have mentioned this earlier: The 1149.1-2012 ballot group is being formed.
  • [Harrison] I think that maybe it closed yesterday?
  • [Ian] That's probably right - it was sometime around now.
  • [Tim] Do we know what's in the update?
  • [Ian] I know some, like the new Initialization stuff.
  • [Harrison] There's about half a dozen things identified in the section at the top of the Dot1 project page - http://grouper.ieee.org/groups/1149/1/

8. Review new action items


9. Adjourn

Patrick moved to adjourn at 12:04 PM EDT, seconded by Brian.

Respectfully submitted,
Ian McIntosh