Minutes of Weekly Meeting, 2011-02-14

Meeting called to order: 11:09 AM EST

1. Roll Call

Carl Walker (left at 11:30)
Adam Ley (left at 11:51)
Eric Cormack
Ian McIntosh
Tim Pender
Heiko Ehrenberg
Brad Van Treuren

Patrick Au

2. Review and approve previous minutes:

02/07/2011 minutes:

3. Review old action items

4. Discussion Topics

  1. Identification of key "Take Away Points"
    - Progress on review of past minutes
    - Search tooling?
    • [Ian] I completed converting my notes into the new indented text format and brought all the records together into a single code block.
    • [Ian] At this point, I don't really see any other indications of progress. Brad sent me through some work he'd been doing on the database end, so I guess that's where his effort has been this week.
    • [Brad] I thought it was important that we had that reading facility, so we have a low risk there.
    • [Ian] I've received keywords in different formats but that was to be expected.
    • {shared text file of merged lists}
    • [Ian] I tried to keep some of the context, so I used a cut down version of the indented text format. Heiko and Eric gave me simpler lists.
    • [Ian] Heiko suggested that specific terms, such as ATCA or MicroTCA, don't necessarily need to be listed as keywords, since they can be found with a simple text search. Do we think that's what we'll do?
    • [Brad] With regard to a database, a simple text search is not really an option, as those terms would be missed from the database. I’d recommend we include such specific terms in the keyword list.
    • [Heiko] I don’t have an objection to including them.
    • [Ian] One of the issues coming up is UK and US spelling differences. Another concern is singular and plural, such as "requirement" and "requirements". Or hyphenated and compound words.
    • [Ian] We are trying to come up with a list of standard keywords. We can choose which forms we're accepting.
    • [Brad] That was the goal, so we get more uniformity in the reports.
    • [Ian] Anyway, I extracted all the keyphrases and sorted them alphabetically. That gave me a list of 177 phrases. Within that, there were clearly some duplicates. Deleting those got me down to 144.
    • [Ian] Some of those are words that are different in terms of being singular or plural, or different forms, but are generally equivalent to each other.
    • [Ian] I wonder if we can use wildcards to cope with the plurals, etc. No, that's maybe not a good idea.
    • [Brad] You might get that list down to 124 words. That's a big drop-down list.
    • [Ian] A drop-down list for keywords is maybe overwhelming. Listing contexts in a drop-down list should be feasible.
    • [Brad] I think we can decide on singular or plural for our first cut.
    • [Ian] Singular feels like it'd have the more general applicability.
    • [Brad] Yes, I was thinking that it would be more semantically correct. I make a motion that we accept singular use for keywords, as opposed to plural use.
    • [Carl] I second that.
    • [Ian] Any objections to that?
    • {none voiced}
    • [Ian] OK, so we’ll keep all keywords in singular form.
    • [Ian] How about hyphens, do we need to include both forms (with and without)?
    • [Brad] I see people use it both ways.
    • [Ian] We could convert all hyphens to spaces.
    • [Brad] That's not always going to work.
    • {Carl Walker had to leave the meeting}
    • [Heiko] We could look up the dictionary spellings and use those.
    • [Ian] Not all dictionaries agree.
    • [Brad] We need to arrive at a convention for these.
    • [Ian] What if we stripped out whitespace and hyphens, both when entering search terms and during database lookup?
    • [Brad] I see where you're going, but what if we wrote a piece of software that flagged up any keywords that are inconsistent? We could create a Python dictionary and add terms over time as needed. When we get a word flagged then we vote on whether or not it gets added to the dictionary.
    • [Tim] Are you thinking about having a drop-down list, with an "other" option?
    • [Ian] I think what Brad was planning was having a front end that parses the entire text record set, then outputs a list of the unknown words. You can then update the dictionary and re-run the parsing to generate the database tables completely fresh. I think that's what you intended, Brad?
    • [Brad] Yes, it was.
    • [Tim] OK, I've got it.
    • [Ian] How about the spellings, do we lean towards US spelling? That's probably the majority usage.
    • [Brad] We’ll have to go with whatever IEEE mandates.
    • [Ian] So, we can clean out the current list of keywords a little more. But it's still a big list. Even if we only take 30 seconds deciding on each word we're still looking at an hour's work. And in many cases we'll need to refer to contexts.
    • [Brad] It should be part of the review that each person will object to removal based on their knowledge of use.
    • [Ian] It's difficult without context.
    • [Brad] Some piece of software could be written to list the words with their contexts.
    • [Ian] That would create a bigger list, since the same word could appear in several contexts.
    • [Brad] Yes, there's a many-to-many relationship there.
    • [Tim] Will there be a hierarchy? Some kind of form where the list of keywords changes based on the selected context? You select a context then the next field has a list of keywords.
    • [Ian] It's possible.
    • [Brad] There will be cases where you want to do a wildcard search for a keyword in all contexts.
    • [Ian] I had an idea that keywords could possibly be validated as you entered them during a search. Maybe coloring words that aren't in the database.
    • [Brad] You mean like code completion? Something like what happens when you use a search engine?
    • [Ian] Something like that. I think our website search tool does that.
    • {Search tool demonstrated}
    • [Brad] We could post the dictionary next to the search engine.
    • {Adam had to leave the meeting}
    • [Ian] I'm wondering if we can use the search tool's database to give us the keywords.
    • [Brad] Can you specify the dictionary to be used in the search of a page?
    • [Ian] It has "categories" that can be extended, but they're maybe not doing what you'd hoped for. They more just limit the scope of the search.
    • [Brad] We could let software go through the comments and collect keywords, but that doesn't deal with the context where the appropriate keyword doesn't explicitly appear.
    • [Ian] Let's see what the database has.
    • {Opened administration panel for search tool}
    • [Ian] Well, we seem to have 9,249 words, with 59,875 relations. That isn't going to help too much.
    • [Brad] That becomes a really big list.
    • [Ian] I could maybe gather metrics on keyword usage? Find the big hitters.
    • [Brad] That could be very useful. I'm sure a lot of the words we're looking for will be used quite widely. After you get rid of common words like articles, etc.
    • [Ian] I'm pretty sure that the search tool is filtering those out anyway. I'll take an action to try to extract a list of big hitter words from the search engine database. {ACTION}
    • [Ian] I'll also filter down the manual word list, as a backup.
    • [Ian] We've run out of time again. I had hoped to talk a little about the tooling, but that will need to wait.
    • [Ian] I'd like to see some progress on filling out the review records.
    • [Brad] I'd like to ask if I should be holding off on writing any software to produce word lists?
    • [Ian] I'd say so. Don't spend effort that isn't necessary just now.
  2. White Paper
    - Volumes 1 and 2 are "stable": How do we progress Volumes 3 to 5?
    • {Not discussed due to lack of time}
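
The keyword handling discussed under topic 1 (comparing terms with whitespace and hyphens stripped, flagging words that are not yet in the dictionary, and counting usage to find the "big hitters") could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the group's tooling: the names normalize, flag_unknown, big_hitters and the starter set ACCEPTED_KEYWORDS are hypothetical, and lower-casing is an added assumption that was not discussed in the meeting.

```python
import re
from collections import Counter

# Hypothetical starter dictionary of accepted keywords (singular form).
ACCEPTED_KEYWORDS = frozenset({"requirement", "drop-down list", "MicroTCA"})


def normalize(term):
    """Lower-case a term and strip all whitespace and hyphens for comparison."""
    return re.sub(r"[\s\-]+", "", term.lower())


def flag_unknown(keywords, accepted=ACCEPTED_KEYWORDS):
    """Return, sorted, the keywords whose normalized form is not accepted.

    Flagged words are candidates for a vote on whether to add them
    to the dictionary; plurals are deliberately not folded here.
    """
    known = {normalize(k) for k in accepted}
    return sorted({k for k in keywords if normalize(k) not in known})


def big_hitters(keywords, top_n=3):
    """Count usage by normalized form and return the most common entries."""
    counts = Counter(normalize(k) for k in keywords)
    return counts.most_common(top_n)


if __name__ == "__main__":
    found = ["drop down list", "Drop-Down List", "requirements", "ATCA"]
    print(flag_unknown(found))
    print(big_hitters(["micro-TCA", "MicroTCA", "micro TCA", "requirement"]))
```

In this sketch "drop down list" and "Drop-Down List" both match the accepted "drop-down list", while "requirements" is flagged because the accepted form is singular. Re-running after updating the dictionary regenerates the flagged list from scratch, in line with the re-parse approach described in the discussion.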

5. Key Takeaway for today's meeting

6. Schedule next meeting

Schedule for February 2011:

Patrick will be unavailable until March 7th.

7. Any other business


8. Review new action items

9. Adjourn

Eric moved to adjourn at 12:08 PM EST, seconded by Brad.

Thanks to Heiko for providing additional notes for this meeting.

Respectfully submitted,
Ian McIntosh