Minutes of Weekly Meeting, 2011-02-21

Meeting called to order: 11:07 AM EST

1. Roll Call

Carl Walker
Eric Cormack
Ian McIntosh
Peter Horwood
Heiko Ehrenberg
Adam Ley (left at 11:52)
Brian Erickson
Brad Van Treuren (joined 11:22)

Patrick Au
Tim Pender

2. Review and approve previous minutes:

02/14/2011 minutes:

  • Draft circulated on 02/15/2011.
  • Three corrections noted, all near top of section 4a:
    • effor -> effort
    • [Bard] -> [Brad]
    • th -> the
  • Eric moved to approve with the above amendments, seconded by Heiko, no objections or abstentions.

3. Review old action items

  • Adam proposed we cover the following at the next meeting:
    • Establish consensus on goals and constraints
    • What are we trying to achieve?
    • What restrictions are we faced with?
  • All: Do we feel SJTAG requires a new test language to obtain the information needed for diagnostics, or is STAPL/SVF sufficient? See also Gunnar's presentation, in particular the new information he'd be looking for in a test language.
  • Ian/Brad: Condense gateway comments and queries into a concise set of questions. - Ongoing
  • All: Forward text file to Ian containing keywords from review of meeting minutes. - Ongoing.
  • Ian to extract list of frequently used words from search engine database. - COMPLETE.

4. Discussion Topics

  1. Newsletter
    - Next issue due at the end of February
    • [Ian] I haven’t had a chance to put together a draft during the last week. As is often the case, I’m struggling for news items to include. The update of the Working Group procedures would be worth mentioning, and perhaps we can mention the effort under way right now to extract and collate key takeaway points from past meetings, to help us move forward.
    • [Ian] Are there any other suggestions for the upcoming newsletter?
    • {silence}
    • [Ian] I'll prepare a draft during the early part of this week and if anyone comes up with ideas for the newsletter, please email them to me as soon as possible. We can review the draft next week as we'll need to get it out as soon as possible after that.
    • [Ian] I'll make that an action for me to prepare the draft {ACTION} and another on everyone else to email any topic suggestions to me {ACTION}.
  2. Identification of key "Take Away Points"
    - Progress on review of past minutes
    - Pre-selection of keywords
    - Further thoughts on tooling
    • {Ian shared the keyword spreadsheet he sent out earlier today}
    • [Ian] I got the table of keywords out of the database easily enough. The links tables indirectly give a count of how many times each word gets used. The database is actually organized into 16 links tables, but I exported them as two tables for ease of use in Excel. In Table_A and Table_B you can see the Keyword ID next to the Link ID - i.e. the page where the word was found. Weight was a measure of relevance, but since we were probably collecting words from different contexts it's probably no help to us.
    • {Brad joined the call}
    • [Ian] I ran a script to count the number of times each keyword ID in the keyword sheet appeared in either of the two link table sheets. I then sorted that by frequency.
    • [Ian] But I realized that wasn't going to help much as it didn't show other related words that could affect the ranking. I copied this to a new sheet and re-sorted alphabetically.
    • [Ian] I reduced the list slightly, for example by removing keywords with 0 search results, fixing some typos and merging a few duplicates. But we still have a list of over 7000 keywords. These still include some words that are misspelled, and variations of the same word (including singular and plural, present and past tense, etc.). There are also a few words that can take a different meaning depending upon the context.
    • [Ian] Brad, have you had a chance to look at this at all?
    • [Brad] Not really; I've browsed it very quickly. A lot of words are just nonsense and then there's things like you pointed out - "observe" and "observable".
    • [Ian] As there's still a lot of work to be done here, I think I should ask if we feel this is a good use of our time?
    • [Brad] We should go through pruning this list to identify words that return results from multiple domains. The list obviously includes a lot of words that we can remove.
    • [Brad] I have a concern that we can't determine the contexts from this list.
    • [Ian] True, but we can find the keyword IDs in the other tables, and that gives us the link IDs, which we can then check to find the usage.
    • [Ian] I think we can split this list into three categories: general terms that won’t really return any meaningful results, specific terms that will clearly return relevant search results, and terms that are maybe ambiguous and sit in a sort of "gray area" where we're not sure whether to keep them or not.
    • [Ian] We started off with about 9,200 rows. Having already taken out about 2,000 terms, we are now at about 7,200 terms, and looking at the list there are still a lot of terms that can be deleted.
    • [Ian] There will still be typos in there, which mess up the alphabetical sorting. Also names - maybe they're relevant search terms, maybe they're not. I can continue sifting through this but it'll take time.
    • [Brad] I think divide and conquer is the only way to get through this.
    • [Ian] OK, so who feels able to take on some of this? Once we see how many we have we can think how to split it up.
    • [Carl] I can probably look at a block.
    • [Brad] I don't really know how this week is going to go.
    • [Ian] I'll also take one. Anyone else?
    • {Silence}
    • [Ian] OK, taking half each isn't really going to work. I'm not sure how we deal with this - I was correcting spellings, re-sorting, then merging similar entries.
    • [Brad] We want to keep it simpler than that. Just mark up whether it's to be kept or not.
    • [Carl] How?
    • [Brad] Use fill color, say red for ones we want to toss.
    • [Brian] Rather than color, why not use an extra column, then you can sort on that column and it'll save on clicks compared to using color.
    • [Brad] That's a good idea. Maybe we can enumerate the options now?
    • [Brad] I'm seeing four: Toss, Questionable, Don't know (where we need to see the context) and Keep.
    • [Ian] If we leave "Keep" as blank then we number the others. Or maybe use letters.
    • [Brad] Letters may be more intuitive: T for "toss", Q for "questionable" and D for "don't know".
    • [Ian] Yes, those don't look easily confused, and they're far enough apart on the keyboard to avoid accidental mistyping.
    • [Ian] I've already skimmed through the A's and B's. Perhaps if we start at row 1000 of the table labeled "Sorted (2)" for the first block through to 1499 (Carl), the next block starts at row 1500 through to 1999 (Ian), and the third block starts at row 2000 through to 2499 (Brad).
    • [Ian] If anyone else finds they have some time, they can just email to get the next available block.
    • [Ian] Hopefully, we can get rid of the "tosses" quickly and can concentrate on the "questionables" and "don't knows".
    • [Brad] That's the idea.
    • [Ian] Once you've done your block, strip off the other sheets from the workbook and email me the resulting table and I'll merge it into the master. Use my home email address for this.
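
    Editorial illustration of the counting and marking steps discussed above. This is a minimal sketch in Python, not the script Ian ran against the Excel export. The table layout (rows of keyword ID and link ID), the function names, and the sample rows are all assumptions made for illustration; only the T/Q/D letters and the blank-means-keep convention come from the discussion.

```python
from collections import Counter

# Marker letters agreed in the meeting; a blank cell means "keep".
MARKS = {"": "keep", "T": "toss", "Q": "questionable", "D": "don't know"}


def count_keyword_links(keywords, *link_tables):
    """Count how often each keyword ID appears across the link tables.

    keywords: dict mapping keyword ID -> word (assumed layout).
    link_tables: iterables of (keyword_id, link_id) rows (assumed layout).
    Returns a dict mapping word -> number of link rows that reference it.
    """
    counts = Counter()
    for table in link_tables:
        for keyword_id, _link_id in table:
            counts[keyword_id] += 1
    return {word: counts[kid] for kid, word in keywords.items()}


def links_for(keyword_id, *link_tables):
    """Return the sorted link IDs for one keyword, to check its context."""
    return sorted(
        {link for table in link_tables for kid, link in table if kid == keyword_id}
    )


def triage(rows):
    """Group (word, mark) rows by the meaning of the marker column."""
    groups = {label: [] for label in MARKS.values()}
    for word, mark in rows:
        groups[MARKS[mark.strip().upper()]].append(word)
    return groups


# Hypothetical sample data standing in for the keyword sheet and link sheets.
keywords = {1: "observe", 2: "observable", 3: "zzz"}
table_a = [(1, 10), (2, 10), (1, 11)]
table_b = [(1, 12), (2, 13)]

counts = count_keyword_links(keywords, table_a, table_b)
# Drop keywords with no links, then sort alphabetically for review.
review_list = sorted(word for word, n in counts.items() if n > 0)

marked = [("observe", ""), ("asdf", "T"), ("gateway", "q"), ("Brad", "D")]
groups = triage(marked)
```

    Keeping the marker in its own column, as Brian suggested, is what makes the grouping step a simple lookup: the "toss" group can be dropped at once, and links_for gives the pages to check for the "don't know" entries.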
  3. White Paper
    - Volumes 1 and 2 are "stable": How do we progress Volumes 3 to 5?
    • {Not discussed due to lack of time}

5. Key Takeaway for today's meeting


6. Schedule next meeting

Schedule for February 2011:

Schedule for March 2011:
7th, 14th, 21st, 28th.

Patrick will be unavailable until March 7th.

7. Any other business


8. Review new action items

  • Ian: Prepare draft of Newsletter and circulate for review during this week.
  • All: Email suggestions for Newsletter items to Ian prior to the end of this week.

9. Adjourn

Eric moved to adjourn at 12:00 PM EST, seconded by Brian.

Thanks to Heiko for providing additional notes for this meeting.

Respectfully submitted,
Ian McIntosh