Minutes of Weekly Meeting, 2009-05-18
Meeting called to order at 10:34 AM EDT
1. Roll Call
Eric Cormack
Brad Van Treuren
Ian McIntosh
Brian Erickson
Tim Pender
Carl Walker
Peter Horwood
Excused:
Heiko Ehrenberg
Patrick Au
2. Review and approve previous minutes:
5/11/2009 minutes:
- Draft circulated on 11th May:
- No corrections noted.
Eric moved to approve, Brad seconded, no objections.
3. Review old action items
- Adam proposed we cover the following at the next meeting:
- Establish consensus on goals and constraints
- What are we trying to achieve?
- What restrictions are we faced with?
- Establish whether TRST needs to be addressed as requirements in the ATCA
specification if it is not going to be managed globally (All)
- Adam review ATCA standard document for FRU's states
- Patrick contact Cadence for EDA support person.
- All to consider what data items are missing from Data Elements diagram
- All: do we feel SJTAG is requiring a new test language to obtain the
information needed for diagnostics or is STAPL/SVF sufficient?
see also Gunnar's presentation, in particular the new information he'd be
looking for in a test language
(http://files.sjtag.org/Ericsson-Nov2006/STAPL-Ideas.pdf)
- Carl W/Andrew: Set up conference call to organise review of Vol. 3 - Ongoing
- Andrew: Make contact with VXI Consortium/Charles Greenberg. - Ongoing
- Ian/Brad: Draft "straw man" Volume 4 for review - Ongoing
- All: Review "Role of Languages" in White Paper Volume 4 - Ongoing
- All: Consider structure/content of survey - Ongoing
- Harrison: Virtual system exploiting Configuration/Tuning/Instrumentation and
Root Cause Analysis/Failure Mode Analysis Use Cases. - Ongoing
- Brad: Virtual system exploiting POST and BIST Use Cases. - Ongoing.
- Ian: Virtual system exploiting Environmental Stress Test Use Cases. - Ongoing
- Brad/Ian - Prepare draft survey for review by group. - Ongoing
- All: Propose answer options for the questions shown as needing completion.
- Ongoing
- All: Assess which section each question should be placed into. - Ongoing
- Ian/Brad: Construct new question(s) for row 21 based on Brad's previous
graphic. - Ongoing.
- Ian/Brad: Construct new question(s) on gateway devices (linkers, bridges,
instrumentation gateways).
Ian has added one additional question (labelled 'X3') only - Ongoing.
4. Discussion Topics
- System Diagnostics (Continuation)
- [Ian] I sent out a list of the key points I picked up from the previous
discussions on this subject;
- [Ian] When we last discussed this subject, Adam was listing a few points that
he'd been considering; it's unfortunate that Adam isn't on the call as some
of those thoughts may have been worth expanding on.
- [Ian] However, there were a couple of points I picked up on which seem
to be worth discussing: The first was the differences in data that may need
to be preserved for offline diagnostics versus online diagnostics, the
second was the matter of some tests being dependent on the state of the system
or board.
- [Ian] On preservation of data, Brian had highlighted that what was
important was the anomalies in the response vectors, not the vector as a
whole.
- [Brian] Yes, you have an expected result vector, so if you just flag
vectors that are in error then you can reconstruct the full set of vectors.
- [Eric] Are we talking of something like a BIST where a result is fed into
some sort of compression?
- [Brad] The problem with compression is that it loses data. There's a
tradeoff though; when you get a catastrophic failure you don't want to hold
all the fault data that you would traditionally collect. Your diagnostic
system needs to have adaptability to cope with differing classes of fault.
For example, if we get more than 10 bits failing in a vector, we just save
the vector as it indicates a fault where we likely need to change an FRU.
- [Ian] In effect you've declared a threshold for a critical failure?
- [Brad] It's a programmable value through the TFCL code, but we've honed
it over the years and 10 bits seems to be about right for saying we can't
recover or work around the faults.
- [Eric] Is there something in the redundancy mechanisms we can make use
of here?
- [Brad] There's also a time factor to consider. One fail per month may
not be something we'd say indicated that a replacement was needed.
- [Ian] Mentioning fault tolerance reminds me of something else, although
it's more related to functional test: Sometimes redundancy or error correcting
features can mask faults, and you need to have a means to turn those features
off or bypass them for test. JTAG may have an advantage in sometimes being
able to tunnel behind those features. Some designers don't appreciate that
"robust" and "testable" aren't the same thing.
- [Brad] Getting back to Brian's point, we can note the minimum set of
data: We need to know the cell or bit position, that's about all since the
state obviously has to be the opposite of what's expected.
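A minimal sketch of the recording scheme Brian and Brad describe above, assuming
vectors are held as Python integers with bit 0 as the least significant bit. The
function names and record layout are hypothetical and are not taken from TFCL
or any tool mentioned in the discussion; the threshold of 10 is the value Brad
quoted, and the mask argument is an added assumption (1 = bit is compared).

```python
FAIL_THRESHOLD = 10  # programmable in practice; 10 is the figure quoted above


def failing_bits(expected, actual, mask):
    """Return the positions of compared bits that differ from the expected value."""
    diff = (expected ^ actual) & mask
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def record_result(expected, actual, mask, threshold=FAIL_THRESHOLD):
    """Store only failing bit positions, or the whole vector above the threshold."""
    bits = failing_bits(expected, actual, mask)
    if not bits:
        return None  # vector passed, nothing needs to be stored
    if len(bits) > threshold:
        # Too many failures to be worth itemising: keep the raw response.
        return {"kind": "full", "vector": actual}
    return {"kind": "bits", "positions": bits}


def reconstruct(expected, record):
    """Rebuild the response from the expected vector plus the stored record.

    For a "bits" record the result is exact only for the compared (masked-in)
    bits, because a failing bit must be the opposite of its expected value.
    """
    if record is None:
        return expected
    if record["kind"] == "full":
        return record["vector"]
    actual = expected
    for pos in record["positions"]:
        actual ^= 1 << pos
    return actual
```

With a full mask, a response that differs in two bits is stored as two positions
and can be rebuilt exactly from the expected vector; a response with more than
ten failing bits is stored whole.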
- [Peter] You need to be thinking about portability. If you produce test
vectors on one set of tooling, and try to diagnose on a different tool, then
you need to know about the whole set of vectors.
- [Peter] We did this at EBTW back in 2006 with ASSET InterTech - you need to be
able to share the test database. The question is, will people give that out?
- [Brad] For diagnostics there are really two things: There's the Test
Controller portion and then there's the Test Data Management. They are two
different things.
- [Peter] OK, I see you're trying to separate those.
- [Brad] I have my own embedded Test Controller and I want to apply tests
through my proprietary interface, even if they're being supplied externally.
- [Brad] This is where there's a "big win" for SJTAG. What has been
raised goes back to when we introduced the graphic for the data in a system:
What data needs to be known to describe a system? How do we represent a test?
- [Peter] You're describing a return format. After that, how it gets back
into the ATPG for offline analysis is a matter for the user to manage.
- [Brad] It's not as simple as that as there's a context; a dynamic
associated with time.
- [Peter] Yes, if you have multiple controllers, then they will need
synchronisation.
- [Brad] This could become more of an issue with dot 7. Even in P1687 this
arose when trying to coordinate two instruments, even within a single device,
because of the changing context. Need to sync with time. How do we get
synchronisation? We know that problem domain is coming.
- [Brad] The context of why the vector is being applied is what is missing
with today's vector languages. A software emulation vector has specific
targets to a specific processor register that is very different from an
interconnect test vector. The handling of each vector is different depending
on the context of how it is used. This is why I am partitioning things into
the Test Controller space and the Test Manager space. The Test Controller
is being told to scan a set of bits through a sequence of TAP states as part
of the vector description. However, there are particular things a Test
Controller needs to know when dealing with these vectors. Should it preserve
the response data or toss it away? The test controller needs to know what
bits are important in the scan, both from a driving perspective (TDI) and
the observing perspective (TDO). This is why our languages have the expected
value vectors and the mask vectors to try to represent this. But the Test
Controller is able to scan out a portion of an overall scan vector from the
perspective of the Test Manager and not understand that the current scan is
just a segment of an overall data transfer from the UUT. So we need to begin
to understand how we can reveal the information to the Test Controller as
to the scan context it must understand to know what to do with the data being
presented. This is part of the protocol that is missing from the hardware
1149.1 specification that is vital to the success of useful and efficient
scan operations. This gets worse when you have to deal with a dynamically
changing scan chain topology inside of a device where a P1687 gateway just
dynamically changed the length of the TDR in that device and now different
bits are important for TDI and TDO. So if you have to ping pong between two
different instrument chain configurations because the length of each
instrument TDR is excessive, you end up juggling significantly different
contexts of the vectors in an ordered time sequence to get the real job done.
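One way to picture the context Brad describes is a scan segment that carries,
alongside the TDI data, the expected and mask vectors plus some indication of
why it is being applied and where it sits in the overall transfer. The sketch
below is only an illustration under those assumptions: every field and function
name is hypothetical, and nothing here is defined by 1149.1, P1687, SVF or
STAPL.

```python
from dataclasses import dataclass


@dataclass
class ScanSegment:
    tdi: int             # bits to drive into the chain
    expected: int        # expected TDO bits
    mask: int            # 1 = this TDO bit is observed and compared
    length: int          # number of bits in this segment
    purpose: str         # hypothetical label, e.g. "interconnect" or "emulation"
    keep_response: bool  # should the Test Controller preserve the TDO data?
    index: int           # position of this segment in the overall transfer
    total: int           # number of segments in the overall transfer


def check_segment(seg, tdo):
    """Test Controller view: True when every compared bit matches."""
    return ((tdo ^ seg.expected) & seg.mask) == 0


def reassemble(segments, responses):
    """Test Manager view: join segment responses into one value.

    Segment 0 occupies the least significant bits. Segments not marked
    keep_response contribute zeros but still occupy their bit positions.
    """
    value, shift = 0, 0
    pairs = sorted(zip(segments, responses), key=lambda p: p[0].index)
    for seg, tdo in pairs:
        if seg.keep_response:
            value |= (tdo & ((1 << seg.length) - 1)) << shift
        shift += seg.length
    return value
```

The controller can judge each segment in isolation using only the mask and
expected value, while the manager needs the index and length to know that a
given scan was one piece of a longer transfer.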
- [Brad] Ian, you wanted to classify tests that were dependent on system
state?
- [Ian] Yes.
- [Brad] Some operations don't really have an expected return vector, it
may depend on something several vectors earlier. Also there are aspects of
recovery operations to bring a board back to life after test.
- [Brad] Checking the scan path should be non-intrusive, so it ought to be
possible at any time. An interconnect test is intrusive, so
the board has to be in an offline state. After the test, you have to restart
the board, maybe by a reset or power cycle, as the testing will also have
stimulated the cores.
- [Brad] Inspecting a board state using SAMPLE can be done at any time.
- [Ian] Some other things that report flags like some BERR test or that
utilise continuous BIST features within devices are really just a further
extension of the SAMPLE case.
- [Brad] Sometimes, in a case where there's an Administration Block
managing the board that reports back to the System Administration Unit the
condition or state of that board, sampling condition signals on a board from
a multi-drop interface can show that the board is operational even if the
Administration Unit doesn't recognise it. This might be useful in allowing
a circuit to limp through its service until a replacement could be installed.
- [Brad] I actually had a case about 10 years ago where a prototype system
was not responding through the administration links, but the functional
connections seemed to be operational. My boundary-scan sampling program was
able to validate the functional circuit did indeed have the proper hardware
state on its status signals to prove the circuit was usable.
The admin module was redesigned to correct its design defect, but the
boundary-scan test was able to be used so development of the hardware and
software on the system could continue during the redesign.
- [Brad] The issue with Fault Injection is that you know it's an invasive
test, so you need safeguards; usually wouldn't include it in released
software. But it usually needs to run with the system in a functional mode.
- [Ian] I guess I was falling into the trap of thinking of it as an offline
operation, but clearly you need to be testing against the functional behaviour
of the system.
- [Brad] We have to be careful not to begin categorizing
tests based solely on whether the test is an intrusive test or a non-intrusive
test. Fault Injection is one good example of where an intrusive test must
actually run on an active circuit. We need to look at the context of each
test and what factors are affecting its application and response. There are
constraints that must be applied. There are timing considerations regarding
the order in which data is presented and observed. There are different bits
that have importance on update and others that are important on capture. There
are specific states of an application's functional mode that certain tests
may be applied on (e.g., off-line, power-up). There are also specific states
of a functional mode where tests cannot be applied. For example, a low power
state where part of a circuit is sleeping to save power prohibits portions
of the scan chain from operating. Thus, there needs to be context awareness
of design constraints on the tests and the chain topology to know if a
particular vector pattern can really be applied to a circuit or not in a
system based on the current configuration and state of the system.
- [Brad] We need to think about managing changes to the topology between
vectors. There's more to it than "there's a vector and it's of length X".
- [Ian] And maybe that blends us into the next topic.
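The context awareness Brad argues for in this topic can be sketched as a set of
constraints attached to each test and checked against the current system state
before a vector is applied. This is an illustration only: the state names,
power-domain names and the can_apply function are hypothetical, and the
intrusive flag is deliberately not used in the decision, reflecting the point
that intrusiveness alone does not determine when a test may run.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VectorConstraints:
    name: str
    intrusive: bool            # informational; not sufficient to decide on its own
    allowed_states: frozenset  # functional states in which the test may be applied
    # Power domains whose part of the scan chain must be awake for the test.
    required_domains: frozenset = field(default_factory=frozenset)


def can_apply(test, system_state, awake_domains):
    """True when the system state and chain availability permit the test."""
    if system_state not in test.allowed_states:
        return False
    return test.required_domains <= set(awake_domains)


sample = VectorConstraints(
    "sample", False, frozenset({"offline", "power-up", "online"}))
interconnect = VectorConstraints(
    "interconnect", True, frozenset({"offline"}))
fault_injection = VectorConstraints(
    "fault-injection", True, frozenset({"online"}), frozenset({"core"}))
```

Under these assumed constraints, the interconnect test is refused while the
board is online, fault injection is allowed online only while the "core" domain
is awake, and SAMPLE is allowed in every listed state.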
- Select new subject from Priority Objectives in 2008 Survey
- [Ian] The next two biggest items from last year's survey were "Common Test
Processes" and "Gateway Definition". The latter feels like something we
could discuss: Gateways for board access, scan path selectors/linkers,
instrumentation gateways - is that the range of things here?
- [Brad] Or are they even in our scope at all.
- [Ian] That's what we need to explore. At this point, I'm trying to get
some direction for the survey on this subject, so we can see if other users
feel this is part of our remit.
- [Brad] There are two approaches I've seen with these devices on how you
represent or manage a gateway.
- [Brad] One is to supply maybe XML that defines attributes of the gateway,
while others may have some sort of interface description, but how you apply
it is left to the user.
- [Brad] Do users need to know the protocol behind making the connection
through the gateway?
- [Ian] I think that may come down to the sophistication of the user.
Many people just want the tooling to do all the work for them based on their
CAD data. I think, Brad, you've hand crafted a lot more for your embedded
applications?
- [Brad] Mainly because the tooling didn't really exist!
- [Ian] Yes. Without trying to sound too hard on the tool vendors, board
level applications are handled pretty well by ATPGs but some system
connectivity issues aren't handled very well.
- [Brian] Tooling has gotten a lot better in that respect. Maybe 90% of
customers just want to push a button and get their tests, but a few want to
have bit-level control. We need to service those differing levels of user
sophistication.
- [Tim] Some things I haven't really seen addressed up to now: You don't
want to interrupt a system; you need to know which bits of the circuit are
busy. You need to know the last state of the chain, whenever you switch
chains. Do you manage that in hardware or software?
- [Brad] Preconditioning is an extremely important piece. Do we need
introspection; report back state, condition?
- [Brian] Maybe need separate status registers for the system.
- [Ian] JTAG is OK for controlling the states of things, but it's not so
good at reporting states back.
- [Brad] It's not really in the protocol.
- [Brian] It needs to be managed at a higher level. But is this another
White Paper or Design Guide topic or is it part of an SJTAG standard?
- [Brad] It needs to be captured in the use cases so we don't preclude
these things.
- [Brian] But we also don't want to promote any bad practices.
- [Ian] We'll leave it there for now; there are a few other things I need
to cover today.
- 2009 Survey
- May Newsletter
- [Ian] Since we don't have a meeting next week, how should we handle
approval of the newsletter this month?
- [Eric] Can't we do this as an email vote?
- [Ian] Yes, if we agree that's appropriate.
- [Brad] The usual process is for you to present a draft, the group makes
suggestions for revisions and then votes on the revised draft.
- [Ian] To make that work I need to avoid expecting anyone to be able
to look at the draft over the holiday. So I guess I need to get the first
draft out before the end of this week, then get comments by middle of the
following week. I can usually turn revisions round pretty quickly.
- [Brad] That sounds like it'd work.
- [Ian] OK, so I'll issue draft by 22nd. I'll expect comments in by 27th and
I'll send out redraft that evening. Then I'll take email votes up until
29th. {ACTION}
- {Brad moved to approve the proposed schedule, seconded by Brian, no
objections}
5. Schedule next meeting
Schedule for June 2009:
Monday June 1, 2009, 10:30 AM EDT
Monday June 8, 2009, 10:30 AM EDT
Monday June 15, 2009, 10:30 AM EDT
Monday June 22, 2009, 10:30 AM EDT - Brian will be absent
Monday June 29, 2009, 10:30 AM EDT
6. Any other business
None.
7. Review new action items
- Ian: Circulate draft May Newsletter to group by May 22nd.
8. Adjourn
Eric moved to adjourn at 11:34 AM EDT, seconded by Brad.
Respectfully submitted,
Ian McIntosh