Chat History with elmcity (#jonudell/$1534ca0d7395b7da)

Created on 2009-09-10 19:50:30.

Jon Udell: 19:02:44: ok, bailing on voice and trying chat
Diane: 19:03:00: sounds good
Jon Udell: 19:03:15: roll call?
Meghan McNeil: 19:03:41: here!
Nikita Pchelin: 19:03:46: Skype is unreliable indeed :)
Diane: 19:03:52: here
Jory Graham: 19:03:59: Present.
Jon Udell: 19:04:17: OK, good enough. I'll just go in order. Diane, what piece do you want to tackle?
Jon Udell: 19:04:58: Actually, everybody can just say what task appeals.
Diane: 19:05:02: Are you referring to the two projects for civic-minded students?
Jon Udell: 19:05:12: Or anything related.
Jory Graham: 19:06:01: I think that we could recreate FuseCal.
Diane: 19:06:17: I haven't had a chance to fully look into all the details of these pieces because I have been away for a few days. Just catching up now.
Meghan McNeil: 19:06:27: I was interested in working on the FuseCal replacement
Jon Udell: 19:06:27: That'd be excellent. I have some starter projects: warmup exercises for recreating FuseCal.
Jon Udell: 19:07:09: 1: MySpace. 2: LibraryThing.
Nikita Pchelin: 19:07:13: FuseCal seems to be the most appealing to me too, though I wouldn't mind working on the "site-specific scrapers", that seems to be useful too :)
Diane: 19:07:50: Yes, from what I've read so far, the site scrapers look very appealing.
Meghan McNeil: 19:07:56: I was the same way, I was interested in FuseCal but I think working on the second project would be really cool
Jon Udell: 19:08:02: So, here is the MySpace recipe I was using courtesy of FuseCal: http://blog.jonudell.net/2009/05/01/myspace-fusecal-awesome/
Jon Udell: 19:08:43: The task would be to recreate the ability to bookmark a MySpace site and have it automagically produce an iCalendar feed, optionally filtered by a keyword.
Jon Udell: 19:09:07: Same for LibraryThing, which has a rich events system that's missing iCalendar export.
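
A minimal sketch of the output half of that task, assuming the scraper has already produced a list of events. The event dictionary keys (uid, start, summary), the PRODID string, and the function names are hypothetical and are not taken from FuseCal or elmcity. It shows keyword filtering and serialization to iCalendar text; folding of long lines is omitted for brevity.

```python
from datetime import datetime, timezone


def escape_text(value):
    # iCalendar TEXT values escape backslash, semicolon, comma, and newline.
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def events_to_ical(events, keyword=None, now=None):
    """Serialize scraped events to iCalendar text, optionally keeping only
    events whose summary contains the keyword (case-insensitive)."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//scraper sketch//EN",  # hypothetical product id
    ]
    for ev in events:
        if keyword and keyword.lower() not in ev["summary"].lower():
            continue  # filtered out by keyword
        lines += [
            "BEGIN:VEVENT",
            "UID:" + ev["uid"],
            "DTSTAMP:" + stamp,
            # Floating local time: no time zone information is attached.
            "DTSTART:" + ev["start"].strftime("%Y%m%dT%H%M%S"),
            "SUMMARY:" + escape_text(ev["summary"]),
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"
```

Calling events_to_ical(events, keyword="jazz") would keep only events whose summary mentions jazz; the resulting string can be written to an .ics file on any web server.
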
Jon Udell: 19:09:31: From those two exercises, you could then begin to try abstracting and generalizing.
Jon Udell: 19:10:00: In addition to the screenscraping details, there's a point here about loosely coupled services.
Jon Udell: 19:10:41: We want curators to be able to signal to your service, via the Delicious metadata, that they want these feeds to happen. Then the aggregator will expect to pick them up at a conventionally-named location.
Jon Udell: 19:11:31: I guess you'll be using whatever infrastructure your schools provide to do this. There's no UI, and no realtime request/response requirement, so it's reasonably straightforward. Sound OK?
Jory Graham: 19:12:18: I'd like to ask about the Delicious component.
Jon Udell: 19:12:32: OK, ask.
Jory Graham: 19:13:04: Are the bookmarks on Delicious typically already in iCal format, or would the service exist in between Delicious and the ElmCity aggregator?
Jon Udell: 19:13:38: Yes, the bookmarks point to iCal feeds.
Jon Udell: 19:13:55: Which are really just .ICS files hosted on a webserver anywhere.
Jon Udell: 19:15:06: The service I'm envisioning would infer, from a bookmark pointing to MySpace, that it should find the calendar web page on MySpace, create a corresponding iCal feed, and publish it at a conventionally-named location. The name might be the band's name.
Jon Udell: 19:15:58: Oh, and obviously this would run on some schedule, like daily or every 8 hrs or whatever.
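
One way the "conventionally-named location" might be derived, sketched under assumptions: the naming scheme (site name as a directory, URL path as the file name) and the function name feed_name are hypothetical, and the chat only says the name might be the band's name. The point is that the service and the aggregator can compute the same location independently from the bookmarked URL.

```python
import re
from urllib.parse import urlparse


def feed_name(bookmark_url):
    """Map a bookmarked page URL to a relative path for its .ics feed.

    Hypothetical convention: <site>/<slug-of-url-path>.ics
    """
    parts = urlparse(bookmark_url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    site = host.split(".")[0]
    # Collapse anything that is not a letter or digit into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", parts.path.lower()).strip("-")
    return f"{site}/{slug or 'index'}.ics"
```

Under this convention a bookmark for http://www.myspace.com/someband (a made-up band page) would map to myspace/someband.ics. A scheduled job, such as a daily cron entry, would regenerate the file at that path.
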
Jon Udell: 19:16:40: Like I said, there's an analogous situation with LibraryThing and a few others. Do you want to do these individually or in teams?
Jory Graham: 19:18:16: It seems like the service component would only need to be done once, with the site specific stuff plugging into that architecture.
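
A rough sketch of that plug-in idea, assuming each site-specific scraper is a function registered against a hostname. Every name here (SCRAPERS, scraper, find_scraper, run_once) is hypothetical, and the scraper bodies are placeholders rather than working parsers.

```python
from urllib.parse import urlparse

# Registry mapping a hostname to the function that scrapes that site.
SCRAPERS = {}


def scraper(hostname):
    """Decorator that registers a site-specific scraper for a hostname."""
    def register(func):
        SCRAPERS[hostname] = func
        return func
    return register


@scraper("myspace.com")
def scrape_myspace(url):
    # Placeholder: a real scraper would fetch the page and parse its events.
    return []


@scraper("librarything.com")
def scrape_librarything(url):
    # Placeholder, as above.
    return []


def find_scraper(bookmark_url):
    """Return the registered scraper for the bookmark's host, or None."""
    host = urlparse(bookmark_url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return SCRAPERS.get(host)


def run_once(bookmark_urls):
    """One pass of the shared service: scrape every bookmark that has a
    registered scraper and skip the rest."""
    results = {}
    for url in bookmark_urls:
        handler = find_scraper(url)
        if handler is None:
            continue  # no site-specific scraper registered for this host
        results[url] = handler(url)
    return results
```

With this shape, supporting a new site means adding one decorated function; the scheduling and publishing code would not need to change.
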
Nikita Pchelin: 19:18:17: I think doing these in teams, at least in the beginning, will help us understand the project faster
Diane: 19:18:24: I would prefer to do these in teams.
Meghan McNeil: 19:18:37: I agree, teams would be useful
Jon Udell: 19:18:57: Or maybe one single team, given Jory's point about generalizing the service aspect from the get-go.
Diane: 19:19:57: That could work. I think we'd gain a lot more by working together.
Jon Udell: 19:20:27: And everyone (at least on this chat) was interested, correct? Who has done HTML scraping before?
Jory Graham: 19:21:21: I've done it recently in python, less recently in perl.
Nikita Pchelin: 19:22:06: I've done a bit of scraping, as part of one of my programming courses
dashrantic: 19:22:19: hello all
Jon Udell: 19:22:32: Hi Jack, we bailed on audio as you can see.
Jon Udell: 19:22:53: Anyway, scraping is nasty grunt work but there are two higher-order interesting aspects to this.
dashrantic: 19:22:56: ah alright
Jon Udell: 19:23:22: First: Abstracting toward a general-purpose thing like FuseCal. Interesting and /hard/.
Jon Udell: 19:24:38: Second: Avoiding the problem entirely. I'm not kidding. The highest and best solution here -- and the real purpose of the whole project -- is to educate people about publishing their own data feeds /as data/. I have already talked to folks at MySpace and LibraryThing about this, to no avail. But anybody who can convince either to produce a feed, and obviate the need to screenscrape, wins big.
Jon Udell: 19:25:52: In other words, the social hack would trump the technical hack. Just something to keep in mind.
Nikita Pchelin: 19:26:56: it's just the second one probably won't happen any time soon :)
dashrantic: 19:27:35: hrm, is there a preferred calendar standard that you would recommend to push people to use? seems like the social push might be easier for some smaller websites than the big ones
Jon Udell: 19:27:49: Nikita: You never know. It's all about whom you can meet, discuss with, and convince.
Jon Udell: 19:28:24: Jack: The standard is iCalendar; all popular apps export it: Google Calendar, Outlook, Apple iCal, Drupal, Windows Live, etc. The challenge is that nobody realizes this.
Jon Udell: 19:30:33: So it sounds like you folks want to go in this direction, which is great. You can start right away on the MySpace and LibraryThing exercises, I will look for other cases in broad use that will have impact. As you go through the exercises, be thinking about how to generalize.
Jon Udell: 19:32:07: Meghan, you mentioned the second project from the "projects for civic-minded students" entry. Want to discuss that some too?
Meghan McNeil: 19:32:17: sure
Jon Udell: 19:32:50: Depending on how you look at it, it's a natural-language parsing problem or a crowd-sourcing problem.
Jon Udell: 19:32:59: Or actually, I think it's both.
Jon Udell: 19:33:34: My worry is that it's just too open-ended.
Jon Udell: 19:33:44: What do you folks think?
Jory Graham: 19:35:02: I'd say that the social aspect almost outweighs the programmatic aspect, since any candidate event really requires interaction to verify it.
Jon Udell: 19:35:58: Yes. Orchestrating that workflow will be quite challenging.
Jon Udell: 19:36:31: BTW, the writeup is here: http://blog.jonudell.net/2009/08/10/two-projects-for-civic-minded-student-programmers/
Jon Udell: 19:37:41: There are lots of ways to think about this, though. Anybody seen/used Mechanical Turk?
dashrantic: 19:37:56: the Amazon thing?
Jon Udell: 19:37:59: Yes.
dashrantic: 19:38:09: Yeah, I've heard of it, haven't used it though.
Jon Udell: 19:38:39: Imagine writing code that calls what look like web services but are, on the other end, people.
Jon Udell: 19:39:06: Those people can do whatever tasks you can adequately define. One such task might be verification.
Jon Udell: 19:40:10: Anyway, let's table that for now. If you want to go in that direction, think about it some more and we'll discuss later.
Jon Udell: 19:40:57: Is everybody OK with doing a team project that starts with site-specific scrapers and aims toward a FuseCal-style generalization?
Nikita Pchelin: 19:41:09: yes
Jory Graham: 19:41:11: Sounds good to me.
dashrantic: 19:41:12: yup
Meghan McNeil: 19:41:14: yeah, that sounds good
Diane: 19:41:18: yes
Jon Udell: 19:42:31: Unanimous! Cool. OK, we're done. Let me know where you'll be blogging your design discussions and development narration, so I can aggregate those feeds and participate in commentary.
Jon Udell: 19:43:23: Questions/comments?
Nikita Pchelin: 19:43:59: nope
Meghan McNeil: 19:44:27: right now I'm good, I think once we get started I'll have more.
Diane: 19:45:07: yes definitely questions will come up as we get started...
dashrantic: 19:45:16: ya
Jory Graham: 19:46:14: I'll be blogging at jorygraham.com, though I don't have it set up quite yet.
Diane: 19:46:26: Is there a specific place you would prefer us to blog to?
Jon Udell: 19:46:39: OK, then adjourned. Thanks very much for agreeing to help, I'm really looking forward to working with you on this. Rather than clutter the main project FriendFeed room, I think I'll make another where I can aggregate your blogs and merge the discussion.
Jon Udell: 19:47:52: Diane: Doesn't matter where, so long as it produces a feed. And if the blog mixes elmcity and non-elmcity stuff, I'll need a tagged feed. Could be, say, WordPress native tags, or could be a Delicious layer on top of any system. This is the recursive aspect of the project: we use tagged feeds to develop a system based on tagged feeds :-)
Diane: 19:48:15: Sounds good!
Jory Graham: 19:48:23: I'll email out a url when I have one available.
Diane: 19:48:31: same
dashrantic: 19:48:44: yeah, I'll need to set up a new WordPress blog, so I'll email that out when I get it set up as well
Jon Udell: 19:48:57: OK, bye all. Talk to you later.
Diane: 19:49:05: Thank you.
Nikita Pchelin: 19:49:16: bye, thanks
Meghan McNeil: 19:49:42: good bye.