Minutes of Weekly Meeting, 2009-11-16
Meeting called to order at 10:36 AM EST
1. Roll Call
Brad Van Treuren
Eric Cormack
Ian McIntosh
Carl Walker
Michele Portolan
Adam Ley
Tim Pender
Excused:
Patrick Au
2. Review and approve previous minutes:
11/09/2009 minutes:
- Updated draft circulated on 12th November:
- No corrections noted.
- Insufficient attendees present to approve these minutes.
3. Review old action items
- Adam proposed we cover the following at the next meeting:
- Establish consensus on goals and constraints
- What are we trying to achieve?
- What restrictions are we faced with?
- Establish whether TRST needs to be addressed as a requirement in the ATCA
specification if it is not going to be managed globally (All)
- Adam: Review ATCA standard document for FRU states
- All to consider what data items are missing from Data Elements diagram
- All: do we feel SJTAG requires a new test language to obtain the
information needed for diagnostics or is STAPL/SVF sufficient?
See also Gunnar's presentation, in particular the new information he'd be
looking for in a test language
(http://files.sjtag.org/Ericsson-Nov2006/STAPL-Ideas.pdf)
- Ian/Brad: Draft "straw man" Volume 4 for review - Ongoing
- All: Review "Role of Languages" in White Paper Volume 4 - Ongoing
- All: Review 'straw man' virtual systems and notes on forums:
http://forums.sjtag.org/viewtopic.php?f=29&t=109 - Ongoing
- Ian: Check for mailer problems and resend survey invitations. - COMPLETE
- [Ian] Once I checked the mailer, I found that I had set the message as having
an RSS newsfeed, but there was no RSS feed associated with it. As a result,
since there was no news, the mailer decided it didn't need to send the message
to anyone. Once I removed the RSS flag, it worked OK.
4. Discussion Topics
- White Paper Review - Review of Virtual Systems
- [Ian] Brad added some proxy comments to last week's notes. In one he asked
if we could discuss a bit more of Adam's remark on defining requirements.
- [Brad] I was really trying to get some scope on the requirements space - it
can mean many different things.
- [Adam] As I recall, I was asked to comment on the breakdown of the
application model. I didn't say anything about the model itself, but was
asked to comment on where boundaries are drawn and whether these map onto
real-world applications. I think we need to stay focussed on the requirements: If
we see where we need to allow one application to interface with another,
then that's useful. I'm saying you shouldn't put artificial boundaries in
place before the applications are defined.
- [Brad] I agree, decomposing applications into solutions is the wrong way for
us to go. What you say is warranted and makes sense - identify the need for
the boundary.
- [Adam] I think we're very much on the same page. It's very early to be
mapping these virtual systems onto real applications.
- [Ian] Then Peter commented that embedded solutions may be stand-alone
circuitry in a corner of the board or may be hosted in some existing asset
of the design.
- [Brad] For POST, we've done that in hardware, similar to what Firecron and
Intellitech have shown. It is a very dedicated set of hardware for that.
- [Ian] And that's pretty much the way we've elected to go. It may not be the
most logical of reasons, but a major factor was the desire to avoid any need
for 'software' which would be implied by using a processor: That demarcation
opens up a whole load of issues, for organizational and process reasons.
- [Brad] It's interesting you mention that, because one of our product groups
has delegated the BScan diagnostics software back to the firmware people;
it's software but written by the hardware guys.
- [Ian] I suspect that's where we'll eventually get to, too. We have similar
issues on using soft processor cores in FPGAs.
- [Brad] One other thing: while we have a level of POST, we also have firmware
self-test, but that's not on-demand test. Do everything you can in boot,
then there are things you can test running concurrently with the software
loading, etc.
- [Ian] That's very similar to the PBIT-1 and PBIT-2 phases we have on some
products.
- [Brad] Then there are the Green initiatives, where we see a lot more blocks
powering up into low power states, which presents other issues over what can
be tested.
- [Ian] From a different angle, we're seeing similar requirements: To reduce
the load on an aircraft's auxiliary power unit, we may have to start in a
low power mode, but are still expected to report 'readiness'. It's hard to
confirm there are no faults if you haven't fully powered everything up.
- [Brad] There's also the thermal factor. We're trying to cram more equipment
into the confines of old facilities.
- [Carl] We have the same issue.
- [Brad] You have the same issues as we have; your systems may be distributed
amongst a number of boxes and you have to test the interconnects between
the boxes.
- [Ian] Often that's the hard part. They're not covered by the board JTAG, so
often it means in-process monitoring of the data. If you don't see data on
one link for a while you might start suspecting it has gone faulty.
- [Brad] We have piggybacked some BScan onto links like that. There was a paper
on a distributed base station[1] where one part was remote by maybe 20 km.
It's an extreme case of a distributed system.
- [Ian] Brad, can you provide a citation for that, so I can record it
correctly?
- [Brad] OK, I'll dig it out. {ACTION}
- [Brad] Can we map this back onto the virtual systems? How do we show
distribution? Do we need or want to? Are there other places we can have
distribution?
- [Ian] I think what you're suggesting is showing the hierarchical delegation
of the test control. How we look at this is that each board can run its own
suite of tests. The LRU's BIT system only needs to know how to run the board
tests and collect the results via some API; it doesn't need to know what the
tests are. That scales up to a system comprising several LRUs. But I'm not
sure if trying to show that wouldn't just add confusion.
- [Brad] I was reminded of Gunnar's paper[2] where the BScan Test Manager
resides on the FRU and has interfaces to a higher level that can apply tests
on demand.
- [Brad] This can get you into the philosophical debate of whether you have a
'push model' or a 'pull model'.
- [Brad] Some of our products have self-test, but also have multidrop
capability.
- [Ian] I think that's essential: All of these embedded solutions have some
dependency on a degree of functionality being present, so you need a 'back
door' if the unit is apparently dead. That's another issue with distributed
systems: The fault reporting has to be communicated over a mission bus. If
that fails, you might not be able to tell if it's the link or the remote unit
that's gone down.
- [Brad] Is this requiring that the link needs to be of some highly reliable
type?
- [Ian] Possibly, but maybe we need supplementary signals outside of JTAG. The
sort of thing we'll do is provide some critical discretes to show that power
is OK or that the main controller is alive. That can help you figure out
what is wrong if the BIT tests return nothing. We've even had a big debate
about whether or not the boxes should have 'power on' LEDs. I can see an
argument based on electrical noise but not on cost.
- [Brad] In the telecomm industry, certain LEDs are required to be on every
board and the colors are specified for particular indications.
- [Brad] There is an argument that if you know a board has a fault then you'll
need to exchange it anyway, so is there any point in conducting further
tests? But can you reproduce the fault? Will you be able to determine the
root cause? Do you need to take a snapshot? Was it a thermal overload or a
software glitch?
- [Ian] In that kind of vein, I've been looking at Single Event Upsets. These
are more likely to occur at high altitude than near sea level. Since they
affect SRAM and SRAM-based FPGAs, the effects can be either momentary or more
persistent. With some FPGAs you can detect an SEU by the configuration SRAM
CRC changing; in other cases it's difficult to tell it from a hard fault. It
just goes away after the power is cycled.
- [1] 'Testing and remote field update of distributed base stations in a
wireless network', Chen-Huan Chiang; Wheatley, P.J.; Ho, K.Y.; Cheung, K.L.
(Lucent Technol., Bell Labs., Holmdel, NJ, USA); ITC 2004 Proceedings.
- [2] 'Remote boundary-scan system test control for the ATCA standard',
Backstrom, D.; Carlsson, G.; Larsson, E. (Embedded Syst. Lab., Linkopings
Universitet, Linkoping); ITC 2005 Proceedings.
- 2009 Survey
- [Ian] So far, we've had maybe a half-dozen responses, but a few have
generated additional referrals. As people complete the survey, I'll delete
them from the mailing list. I'll let the survey run for a couple of weeks
then send a reminder to the people still on the list; that worked quite well
last year.
- [Brad] I'd like to know: are people trying to answer the whole questionnaire
or just sections?
- [Ian] People are pretty much filling out the whole thing. There are some
bits getting missed out, but not a lot.
- [Ian] I can post a link to the results page, but we have to treat the data
as 'privileged'.
- [Brad] Yeah, when we did the 2006 survey, Ben tried to keep the detailed data
to just the officers, and then give a summary, the way you did last year,
Ian. I'm just a bit cautious about privacy here.
- [Ian] That's a good point. What I can do is create a version of the results
page that removes names, companies and email addresses. {ACTION}
- [Brad] That would be good.
5. Schedule next meeting
Schedule for November 2009:
Monday November 23, 2009, 10:30 AM EST
Monday November 30, 2009, 10:30 AM EST
6. Any other business
- [Ian] I guess we should welcome Michele to the group.
- [Brad] I'd like to ask if anyone has a suggestion on a better way to move
forwards with Volume 3. I have the feeling we're going a bit stale now.
- [Ian] We have a lot of good content there; we maybe need to start by
reorganizing it a bit.
- [Ian] I have a thought that maybe we could start trying to set out headings
like we did with Volume 2, but I suspect it may not be as simple here: I
expect things will grow as we start to uncover more. That's probably not a bad
thing.
- [Ian] There are some things we've learned over the past few weeks that we need
to find a way to fit into the document: That JTAG is a 'plug-in' to a wider
test system, that we have 'stand-alone' and 'hosted' versions of JTAG within
the embedded solutions. These are things we hadn't really appreciated before.
- [Brad] I was thinking about requirements again. I guess what you put in the
headings in many cases will be 'it depends'. Looking at microTCA and what
happened when they added JTAG there - what most people wanted was multidrop,
but these were mezzanines with a directly connected TAP with no gateway. This
led to the concept of the JTAG switch: This was different and came after the
first version of the White Paper, so that version completely missed it.
- [Brad] The system requirements drove that. Perhaps we should look at
decomposing some existing systems, and look at what is available?
- [Ian] That sounds like a plan, but do we want to concentrate more on a 'best
practice' for future systems, rather than tying ourselves into legacy
architectures? I'm wary of being too retrospective.
- [Brad] I understand what you mean, but I can also see that there are ways to
draw an abstraction for legacy systems.
- [Ian] Do we look at the decompositions or the headings first? Is there a
preference?
- [Brad] I think the headings should be first; they give us a goal.
- [Tim] I don't know how this fits in, but what about the hardware state? If a
system is busy, you don't want to interrupt it. If you need to be in a Test
Mode, how do you get that across?
- [Carl] This is the online versus offline diagnostics debate.
- [Brad] We have to remember that we're governed by a higher-level process,
which should be aware of which states are critical.
- [Carl] And what constitutes 'disruptive'.
- [Brad] Nonintrusive tests can take place at any time. For other tests, you
can say that in order to run a given test you need to be in one of a defined
set of states.
- [Brad] If you decide to run a test outside of those states, then that's not
our problem. It gets back to the delegation issue.
7. Review new action items
- Brad: Provide citation for paper on distributed systems.
- Ian: Create sanitised survey results page and post link on private forums.
8. Adjourn
Tim moved to adjourn at 11:47 AM EST, seconded by Brad.
Respectfully submitted,
Ian McIntosh