Minutes of Weekly Meeting, 2009-05-18

Meeting called to order at 10:34 AM EDT

1. Roll Call

Eric Cormack
Brad Van Treuren
Ian McIntosh
Brian Erickson
Tim Pender
Carl Walker
Peter Horwood

Excused:
Heiko Ehrenberg
Patrick Au

2. Review and approve previous minutes:

5/11/2009 minutes:

  • Draft circulated on 11th May.
  • No corrections noted.

Eric moved to approve, Brad seconded, no objections.

3. Review old action items

  • Adam proposed we cover the following at the next meeting:
    • Establish consensus on goals and constraints
    • What are we trying to achieve?
    • What restrictions are we faced with?
  • Establish whether TRST needs to be addressed as a requirement in the ATCA specification if it is not going to be managed globally (All)
  • Adam: Review ATCA standard document for FRU states.
  • Patrick: Contact Cadence regarding an EDA support person.
  • All: Consider what data items are missing from the Data Elements diagram.
  • All: Do we feel SJTAG requires a new test language to obtain the information needed for diagnostics, or is STAPL/SVF sufficient?
    See also Gunnar's presentation, in particular the new information he'd be looking for in a test language
    (http://files.sjtag.org/Ericsson-Nov2006/STAPL-Ideas.pdf)
  • Carl W/Andrew: Set up conference call to organise review of Vol. 3 - Ongoing
  • Andrew: Make contact with VXI Consortium/Charles Greenberg. - Ongoing
  • Ian/Brad: Draft "straw man" Volume 4 for review - Ongoing
  • All: Review "Role of Languages" in White Paper Volume 4 - Ongoing
  • All: Consider structure/content of survey - Ongoing
  • Harrison: Virtual system exploiting Configuration/Tuning/Instrumentation and Root Cause Analysis/Failure Mode Analysis Use Cases. - Ongoing
  • Brad: Virtual system exploiting POST and BIST Use Cases. - Ongoing.
  • Ian: Virtual system exploiting Environmental Stress Test Use Cases. - Ongoing
  • Brad/Ian: Prepare draft survey for review by group. - Ongoing
  • All: Propose answer options for the questions shown as needing completion. - Ongoing
  • All: Assess which section each question should be placed into. - Ongoing
  • Ian/Brad: Construct new question(s) for row 21 based on Brad's previous graphic. - Ongoing.
  • Ian/Brad: Construct new question(s) on gateway devices (linkers, bridges, instrumentation gateways).
    Ian has added one additional question (labelled 'X3') only - Ongoing.

4. Discussion Topics

    1. System Diagnostics (Continuation)
      • [Ian] I sent out a list of the key points I picked up from the previous discussions on this subject.
      • [Ian] When we last discussed this subject, Adam was listing a few points that he'd been considering; it's unfortunate that Adam isn't on the call, as some of those thoughts may have been worth expanding on.
      • [Ian] However, there were a couple of points I picked up on that seem worth discussing: the first was the differences in data that may need to be preserved for offline diagnostics versus online diagnostics; the second was the matter of some tests being dependent on the state of the system or board.
      • [Ian] On preservation of data, Brian had highlighted that what was important were the anomalies in the response vectors, not the vector as a whole.
      • [Brian] Yes, you have an expected result vector, so if you just flag vectors that are in error then you can reconstruct the full set of vectors.
      • [Eric] Are we talking of something like a BIST where a result is fed into some sort of compression?
      • [Brad] The problem with compression is that it loses data. There's a tradeoff though; when you get a catastrophic failure you don't want to hold all the fault data that you would traditionally collect. Your diagnostic system needs to have adaptability to cope with differing classes of fault. For example, if we get more than 10 bits failing in a vector, we just save the vector as it indicates a fault where we likely need to change an FRU.
      • [Ian] In effect you've declared a threshold for a critical failure?
      • [Brad] It's a programmable value through the TFCL code, but we've honed it over the years and 10 bits seems to be about right for saying we can't recover or work around the faults.
      • [Eric] Is there something in the redundancy mechanisms we can make use of here?
      • [Brad] There's also a time factor to consider. One fail per month may not be something we'd say indicated that a replacement was needed.
      • [Ian] Mentioning fault tolerance reminds me of something else, although it's more related to functional test: Sometimes redundancy or error correcting features can mask faults, and you need to have a means to turn those features off or bypass them for test. JTAG may have an advantage in sometimes being able to tunnel behind those features. Some designers don't appreciate that "robust" and "testable" aren't the same thing.
      • [Brad] Getting back to Brian's point, we can note the minimum set of data: we need to know the cell or bit position, and that's about all, since the state obviously has to be the opposite of what's expected.
      • [Peter] You need to be thinking about portability. If you produce test vectors on one set of tooling, and try to diagnose on a different tool, then you need to know about the whole set of vectors.
      • [Peter] We did this at EBTW back in 2006 with Asset Intertech - you need to be able to share the test database. The question is, will people give that out?
      • [Brad] For diagnostics there are really two things: there's the Test Controller portion and then there's the Test Data Management. They are two different things.
      • [Peter] OK, I see you're trying to separate those.
      • [Brad] I have my own embedded Test Controller and I want to apply tests through my proprietary interface, even if they're being supplied externally.
      • [Brad] This is where there's a "big win" for SJTAG. What has been raised goes back to when we introduced the graphic for the data in a system: What data needs to be known to describe a system? How do we represent a test?
      • [Peter] You're describing a return format. After that, how it gets back into the ATPG for offline analysis is a matter for the user to manage.
      • [Brad] It's not as simple as that as there's a context; a dynamic associated with time.
      • [Peter] Yes, if you have multiple controllers, then they will need synchronisation.
      • [Brad] This could become more of an issue with dot 7. Even in P1687 this arose when trying to coordinate two instruments, even within a single device, because of the changing context. Need to sync with time. How do we get synchronisation? We know that problem domain is coming.
      • [Brad] The context of why the vector is being applied is what is missing from today's vector languages. A software emulation vector targets a specific processor register, which is very different from an interconnect test vector, and the handling of each vector differs depending on the context in which it is used. This is why I am partitioning things into the Test Controller space and the Test Manager space. The Test Controller is told to scan a set of bits through a sequence of TAP states as part of the vector description, but there are particular things it needs to know when dealing with these vectors. Should it preserve the response data or throw it away? The Test Controller needs to know which bits are important in the scan, both from a driving perspective (TDI) and from an observing perspective (TDO). This is why our languages have expected-value vectors and mask vectors to try to represent this. But the Test Controller may scan out only a portion of what the Test Manager regards as an overall scan vector, without understanding that the current scan is just one segment of an overall data transfer from the UUT. So we need to begin to understand how to reveal to the Test Controller the scan context it must understand in order to know what to do with the data being presented. This is part of the protocol that is missing from the hardware 1149.1 specification, and it is vital to the success of useful and efficient scan operations. It gets worse when you have to deal with a dynamically changing scan chain topology inside a device, where a P1687 gateway has just changed the length of the TDR in that device and now different bits are important for TDI and TDO.
         So if you have to ping-pong between two different instrument chain configurations because the length of each instrument TDR is excessive, you end up juggling significantly different vector contexts in an ordered time sequence to get the real job done.
      • [Brad] Ian, you wanted to classify tests that were dependent on system state?
      • [Ian] Yes.
      • [Brad] Some operations don't really have an expected return vector; it may depend on something several vectors earlier. There are also aspects of recovery operations to bring a board back to life after test.
      • [Brad] Checking the scan path should be nonintrusive, so it ought to be possible at any time. An interconnect test is intrusive, so the board has to be in an offline state. After the test, you have to restart the board, maybe by a reset or power cycle, as the testing will also have stimulated the cores.
      • [Brad] Inspecting a board state using SAMPLE can be done at any time.
      • [Ian] Other things, such as tests that report flags (like some BERR tests) or that utilise continuous BIST features within devices, are really just a further extension of the SAMPLE case.
      • [Brad] Sometimes a board has an Administration Block that manages it and reports the condition or state of that board back to the System Administration Unit. In that case, sampling condition signals on the board from a multi-drop interface can show that the board is operational even if the Administration Unit doesn't recognise it. This might be useful in allowing a circuit to limp through its service until a replacement could be installed.
      • [Brad] I actually had a case about 10 years ago where a prototype system was not responding through the administration links, but the functional connections seemed to be operational. My boundary-scan sampling program was able to validate that the functional circuit did indeed have the proper hardware state on its status signals, proving the circuit was usable.
         The admin module was redesigned to correct its design defect, but the boundary-scan test allowed development of the hardware and software on the system to continue during the redesign.
      • [Brad] The issue with Fault Injection is that you know it's an invasive test, so you need safeguards, and you usually wouldn't include it in released software. But it usually needs to run with the system in a functional mode.
      • [Ian] I guess I was falling into the trap of thinking of it as an offline operation, but clearly you need to be testing against the functional behaviour of the system.
      • [Brad] We have to be careful not to begin categorizing tests based solely on whether the test is an intrusive test or a non-intrusive test. Fault Injection is one good example of where an intrusive test must actually run on an active circuit. We need to look at the context of each test and what factors are affecting its application and response. There are constraints that must be applied. There are timing considerations regarding the order in which data is presented and observed. There are different bits that have importance on update and others that are important on capture. There are specific states of an application's functional mode that certain tests may be applied on (e.g., off-line, power-up). There are also specific states of a functional mode where tests cannot be applied. For example, a low power state where part of a circuit is sleeping to save power prohibits portions of the scan chain from operating. Thus, there needs to be context awareness of design constraints on the tests and the chain topology to know if a particular vector pattern can really be applied to a circuit or not in a system based on the current configuration and state of the system.
      • [Brad] We need to think about managing changes to the topology between vectors. There's more to it than "there's a vector and it's of length X".
      • [Ian] And maybe that blends us into the next topic.
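The fail-logging approach raised in the diagnostics discussion (record only the failing bit positions, compared against the expected and mask vectors, and keep the whole response vector once more than a programmable threshold of bits fail, quoted by Brad as about 10) can be illustrated with a short sketch. It is not an SJTAG, SVF or vendor format: the function and record names, the representation of vectors as '0'/'1' strings, and the bit ordering (index 0 is the first character) are all illustrative assumptions.

```python
"""Hypothetical sketch of compact scan-failure logging (not a standard format)."""
from dataclasses import dataclass
from typing import List, Optional

# Assumed default; the discussion describes this as programmable, with ~10 bits
# found in practice to indicate a fault that cannot be recovered or worked around.
CRITICAL_FAIL_BITS = 10


@dataclass
class ScanFailRecord:
    vector_index: int                   # position of the vector in the test sequence
    failing_bits: List[int]             # unmasked bit positions that differ from expected
    full_vector: Optional[str] = None   # whole captured vector, kept only above the threshold


def failing_positions(actual: str, expected: str, mask: str) -> List[int]:
    """Return the positions where mask is '1' and actual differs from expected."""
    if not len(actual) == len(expected) == len(mask):
        raise ValueError("actual, expected and mask must have equal length")
    return [i for i, (a, e, m) in enumerate(zip(actual, expected, mask))
            if m == "1" and a != e]


def log_result(index: int, actual: str, expected: str, mask: str,
               threshold: int = CRITICAL_FAIL_BITS) -> Optional[ScanFailRecord]:
    """Return None for a passing vector, otherwise a compact failure record."""
    bits = failing_positions(actual, expected, mask)
    if not bits:
        return None
    # More than `threshold` failing bits: keep the whole vector (likely an FRU change).
    full = actual if len(bits) > threshold else None
    return ScanFailRecord(index, bits, full)


def reconstruct(expected: str, record: ScanFailRecord) -> str:
    """Rebuild a captured vector from the expected vector plus the failure record.

    A failing bit must be the inverse of its expected value, so only positions
    need storing. Masked (don't-care) positions come back as their expected values.
    """
    if record.full_vector is not None:
        return record.full_vector
    bits = list(expected)
    for i in record.failing_bits:
        bits[i] = "0" if bits[i] == "1" else "1"
    return "".join(bits)


if __name__ == "__main__":
    expected = "10101010"
    mask = "11110111"      # bit 4 is don't-care
    captured = "10110010"  # bit 3 fails; bit 4 differs but is masked
    rec = log_result(0, captured, expected, mask)
    print(rec)
    print(reconstruct(expected, rec))  # '10111010': bit 4 returns its expected value
```

Reconstruction relies on Brad's observation that a failing bit must be the opposite of its expected value. Masked positions are never compared, so they cannot be recovered from the compact record; Peter's portability point still applies, since a different tool would need the expected and mask vectors to rebuild anything. The time factor Brad raised (an isolated failure per month may not justify replacing an FRU) would sit on top of this, in whatever aggregates the records over time.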

 

    2. Select new subject from Priority Objectives in 2008 Survey
      • [Ian] The next two biggest items from last year's survey were "Common Test Processes" and "Gateway Definition". The latter feels like something we could discuss: gateways for board access, scan path selectors/linkers, instrumentation gateways. Is that the range of things here?
      • [Brad] Or are they even in our scope at all?
      • [Ian] That's what we need to explore. At this point, I'm trying to get some direction for the survey on this subject, so we can see if other users feel this is part of our remit.
      • [Brad] There are two approaches I've seen with these devices on how you represent or manage a gateway.
      • [Brad] One is to supply something like XML that defines the attributes of the gateway, while others may provide some sort of interface description, but how you apply it is left to the user.
      • [Brad] Do users need to know the protocol behind making the connection through the gateway?
      • [Ian] I think that may come down to the sophistication of the user. Many people just want the tooling to do all the work for them based on their CAD data. I think, Brad, you've hand crafted a lot more for your embedded applications?
      • [Brad] Mainly because the tooling didn't really exist!
      • [Ian] Yes. Without trying to sound too hard on the tool vendors, board level applications are handled pretty well by ATPGs but some system connectivity issues aren't handled very well.
      • [Brian] Tooling has gotten a lot better in that respect. Maybe 90% of customers just want to push a button and get their tests, but a few want to have bit-level control. We need to service those differing levels of user sophistication.
      • [Tim] Some things I haven't really seen addressed up to now: You don't want to interrupt a system; you need to know which bits of the circuit are busy. You need to know the last state of the chain, whenever you switch chains. Do you manage that in hardware or software?
      • [Brad] Preconditioning is an extremely important piece. Do we need introspection; report back state, condition?
      • [Brian] Maybe need separate status registers for the system.
      • [Ian] JTAG is OK for controlling the states of things, but it's not so good at reporting states back.
      • [Brad] It's not really in the protocol.
      • [Brian] It needs to be managed at a higher level. But is this another White Paper or Design Guide topic, or is it part of an SJTAG standard?
      • [Brad] It needs to be captured in the use cases so we don't preclude these things.
      • [Brian] But we also don't want to promote any bad practices.
      • [Ian] We'll leave it there for now; there are a few other things I need to cover today.

 

    3. 2009 Survey

 

    4. May Newsletter
      • [Ian] Since we don't have a meeting next week, how should we handle approval of the newsletter this month?
      • [Eric] Can't we do this as an email vote?
      • [Ian] Yes, if we agree that's appropriate.
      • [Brad] The usual process is for you to present a draft, the group makes suggestions for revisions and then votes on the revised draft.
      • [Ian] To make that work, I need to avoid expecting anyone to be able to look at the draft over the holiday. So I guess I need to get the first draft out before the end of this week, then get comments by the middle of the following week. I can usually turn revisions round pretty quickly.
      • [Brad] That sounds like it'd work.
      • [Ian] OK, so I'll issue the draft by the 22nd. I'll expect comments by the 27th and will send out a redraft that evening. Then I'll take email votes up until the 29th. {ACTION}
      • {Brad moved to approve the proposed schedule, seconded by Brian, no objections}

5. Schedule next meeting

Schedule for June 2009:
Monday June 1, 2009, 10:30 AM EDT
Monday June 8, 2009, 10:30 AM EDT
Monday June 15, 2009, 10:30 AM EDT
Monday June 22, 2009, 10:30 AM EDT - Brian will be absent
Monday June 29, 2009, 10:30 AM EDT

6. Any other business

None.

7. Review new action items

  • Ian: Circulate draft May Newsletter to group by May 22nd.

8. Adjourn

Eric moved to adjourn at 11:34 AM EDT, seconded by Brad.

Respectfully submitted,
Ian McIntosh