I recently converted to the new layout provided by Blogger. As part of the face-lift process, I thought it would be fun to label my posts. You can find all the labels I came up with on the right sidebar.
In reality, these labels represent clusters of blog posts that are similar to each other in some way. Posts can belong to multiple labels or belong to none. Some labels might be very similar (so many posts belong to both) -- some might even be nested labels (although that doesn't happen in this blog).
Furthermore, these labels are very view-dependent. For example, while I have separate labels for computer science and machine learning (which is primarily a computer science sub-area), I have one big label for science and technology in general. But given my particular interest in life extension, I decided that could be kept separate from the rest.
I started wondering about how one might automatically cluster my blog posts. It's possible that conventional clustering techniques will yield interesting results, but I find that unlikely. They might work OK on very structured, high-volume blogs like Overcoming Bias. But conventional techniques ignore the fact that these clusters are view-dependent, so for most blogs we probably need to leverage some amount of background knowledge (and possibly the network structure). In addition, we need a clustering model that can handle the fact that some posts belong to multiple labels and some posts belong to none. There is also a time-dependency factor that might play a big role.
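As a toy illustration of that multiple-or-none requirement (and not an actual clustering method), here is a minimal sketch that assigns a post every label whose vocabulary it resembles closely enough. Everything in it is an assumption made for illustration: the seed words in `LABEL_SEEDS`, the `threshold` value, and the choice of cosine similarity over raw word counts. A real approach would presumably draw the seed vocabulary from background knowledge rather than a hand-written dictionary.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_labels(post, label_seeds, threshold=0.2):
    """Return every label whose seed words are similar enough to the post.

    The result can hold several labels, one label, or none at all.
    """
    words = bag_of_words(post)
    return sorted(
        label
        for label, seed in label_seeds.items()
        if cosine(words, bag_of_words(seed)) >= threshold
    )


# Hypothetical seed vocabulary, made up for this example.
LABEL_SEEDS = {
    "machine learning": "learning clustering model algorithm",
    "math": "permutation probability cycle integers",
}

if __name__ == "__main__":
    print(assign_labels("a clustering algorithm over permutation cycle structure", LABEL_SEEDS))
    print(assign_labels("Barack Obama has put his foot down", LABEL_SEEDS))
```

Because each label is thresholded independently, nothing forces a post into exactly one cluster, which is the property hard-partitioning methods lack. The view dependence and time dependence mentioned above are not modeled here at all.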
Why is this interesting? First of all, I think it's compelling enough to be able to discover a person's view (or projection) of the global topic/discipline hierarchy. It might also help us discover more about ourselves and how we view the world (since the clusters discovered by any algorithm will inevitably be different from our own manually generated labels). From a technical standpoint, depending on the approach, tackling this problem might yield insight on designing new clustering techniques or utilizing the background information of the internet.
Like most ideas, this one probably won't lead to any interesting results. But it's fun to think about. In case anyone is interested, I harvested my blog posts (with labels and all), and the data is available here.
Tuesday, December 30, 2008
Sunday, December 28, 2008
Combinatorics Question
Suppose we are sampling permutations of the first N positive integers from a uniform distribution. For any two distinct integers x and y between 1 and N, what is the probability that they lie in the same cycle?
For example, suppose we sampled the following permutation using the first 5 positive integers:
2 3 1 5 4
Then {1,2,3} form a cycle because position 1 points to 2, position 2 points to 3, and position 3 points to 1. Likewise {4,5} also form a cycle.
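For anyone who wants to experiment before working it out on paper, here is a minimal sketch that splits a permutation into its cycles and, for small N, computes the probability exactly by brute-force enumeration. The function names are my own, and the enumeration visits all N! permutations, so it is only practical for small N.

```python
from fractions import Fraction
from itertools import permutations


def cycles(perm):
    """Split a permutation (1-based, one-line notation) into its cycles."""
    seen = set()
    result = []
    for start in range(1, len(perm) + 1):
        if start in seen:
            continue
        cycle = []
        pos = start
        while pos not in seen:
            seen.add(pos)
            cycle.append(pos)
            pos = perm[pos - 1]  # position `pos` points to perm[pos - 1]
        result.append(cycle)
    return result


def same_cycle(perm, x, y):
    """True if x and y lie in the same cycle of perm."""
    return any(x in c and y in c for c in cycles(perm))


def same_cycle_probability(n, x, y):
    """Exact probability over all n! permutations (small n only)."""
    perms = list(permutations(range(1, n + 1)))
    hits = sum(same_cycle(p, x, y) for p in perms)
    return Fraction(hits, len(perms))


if __name__ == "__main__":
    print(cycles([2, 3, 1, 5, 4]))
    for n in range(2, 7):
        print(n, same_cycle_probability(n, 1, 2))
```

Running the loop for a few values of N is a quick way to form a conjecture about the general answer, which can then be proved directly.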
Labels:
math
Saturday, December 27, 2008
Machine Learning Video Lectures
I've been spending a lot of time this week watching videos on videolectures.net (my way of celebrating Christmas). I found several that are very interesting as well as accessible to general audiences, and so I thought I'd share. The first few are short interviews with prominent machine learning researchers.

Bernhard Schölkopf (Max Planck Institute)

Tom Mitchell (CMU, chair of machine learning department)

Fei-Fei Li (UIUC -> Princeton)

Jon Kleinberg (Cornell)

Bernardo Huberman (HP Labs / Stanford)
Some themes harvested from these interviews:
1) as any field matures, mathematical rigor inevitably becomes more important
2) scientific disciplines must evolve in order to pose (and answer) more challenging questions...
3) ... as such, interdisciplinary research tends to be the most compelling
4) it's difficult to predict the future, so make sure to stay flexible
5) general AI is probably still very far away
The next two are invited talks at large machine learning conferences and are a bit longer. Both are fairly high level and very well presented.

Andrew Ng (Stanford) invited talk at ICML 2008

Jon Kleinberg (Cornell) invited talk at KDD 2007
Labels:
internet / networks,
machine learning
Sunday, December 21, 2008
NIPS 2008
I recently spent a week in the Vancouver area attending NIPS 2008 (which stands for Neural Information Processing Systems). While the name might be slightly misleading, it is in fact the largest machine learning conference in existence. The main conference is a marathon that stretches across four days, with each day starting from the early morning and ending in poster sessions that often last until midnight.
NIPS has a strong tradition of tying together machine learning and biology-related research. This year featured invited talks by Daniel Wolpert on computational methods for human motor control, Sebastian Seung on mapping every neural connection in the human brain, and Rebecca Saxe, whose talk I unfortunately missed. Attending research conferences can be quite invigorating. As one of the great enabling fields, machine learning offers remarkable promise in designing useful models for all types of application domains. I often find myself very inspired after listening to success stories.
After the main conference concluded, we boarded shuttle buses for a two-and-a-half-hour ride to the Whistler ski village, where the workshops are held. The NIPS workshops are the best research workshops I have ever attended or heard about. The location is great, the venue (the Hilton Ski Resort at Whistler) is fantastic, and the audience is full of prominent machine learning researchers. All in all, the NIPS workshops offer a great forum for discussion regarding exciting new machine learning research.
I'm currently working on algorithms that an information retrieval system (such as a search engine) might use for interactively learning from its users. My results are largely theoretical at the moment (i.e., proofs), but I hope to implement learning experiments on real search systems in the near future. You can catch a video of my workshop talk using the link below.
Labels:
adventures,
machine learning
Tuesday, December 02, 2008
Scooping the Poop
Barack Obama has put his foot down. He's not going to scoop the poop.
Labels:
humor