Scientific Blogging is currently holding a University Writing Competition, which invites graduate students from the "top science universities" to submit popular science blog articles on topics of their choosing.
I'm happy to announce that my submission, Self-Improving Systems that Learn Through Human Interaction, has been named one of the finalists. The final winner is determined by popular vote (that is, by people like you). So I encourage everyone to peruse all the submissions and, if motivated, cast your vote =)
(The voting mechanism is the grey box located across from the university seal in each of the submissions. It appears to be a bit buggy.)
If you know others who would enjoy reading my article (or any of the others), let them know!
Sunday, November 01, 2009
Sunday, October 18, 2009
The Internet: Our Telescope Into Human Society
Imagine that we've been developing telescopes to capture increasingly detailed snapshots of an alien civilization. At first, we could only detect their large structures such as buildings and roads. Based on how their cities are organized, we might infer that certain buildings or city regions are important. We might even be able to tease apart different communities just based on infrastructure information. From this, we start to formulate theories of their social behavior and organization.
As our telescopes grew more powerful, we became able to detect traces of activity, such as from transportation vehicles and eventually even from individual people. With this new wealth of information, we can further refine our understanding of this blossoming extraterrestrial society, and paint an amazingly vivid picture of their social dynamics.
Of course, we could just as easily tell the same story about our own society. Our telescope is the internet. In the early days, we relied on the relatively static hyperlink structure to infer which websites were authoritative or trustworthy. We could even detect online communities by analyzing link density.
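One well-known way to turn hyperlink structure into an authority score is a PageRank-style power iteration. The post does not name a specific method, so the following is only a minimal sketch: the toy `toy_links` graph, the `pagerank` function name, and the damping factor of 0.85 are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank-style scores by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: most pages point at "hub".
toy_links = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
    "hub": ["a"],
}

scores = pagerank(toy_links)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Pages that receive many links, especially from other well-linked pages, end up with higher scores, which is the sense in which a static link graph can hint at authority.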
As the internet (and our dependence on it) expanded, much of its content became more dynamic (e.g., blogs, forums, Yahoo! Answers). Nowadays, we can even trace real-time online activities on sites such as Google, Twitter, Facebook, Amazon, and many others. Our numerous online activities all leave digital footprints which reflect the fine-grained dynamics of our own society. Such information can be incredibly useful. For example, a Nature article published earlier this year showed how analyzing search behavior on Google can help detect influenza outbreaks faster. As another example, a recently published PNAS article showed how mobile phone data can be used to infer friendships. That is the power of the digital medium.
We now also have an opportunity to introduce new levels of empirical rigor to many social science disciplines. In years past, acquiring social data was a very labor-intensive task, often requiring months or years to collect a modestly sized dataset. Nowadays, sociologists can mine all of LiveJournal or Facebook to study the global structure of things like social influence and gossip. Companies like Google constantly run auctions to determine which ads to show whenever someone issues a query -- this is an economist's dream.
Here at Cornell, Professor Jon Kleinberg is one of the leaders in studying the convergence of social and technological networks. As part of an undergraduate course he and Professor David Easley have been teaching the past few years, they have written a new textbook. From their website: "Drawing on ideas from economics, sociology, computing and information science, and applied mathematics, it describes the emerging field of study that is growing at the interface of all these areas, addressing fundamental questions about how the social, economic, and technological worlds are connected."
Incidentally, this was the topic I initially wanted to write about for the University Writing Competition hosted by Scientific Blogging. But seeing as how any meager offering of mine would pale in comparison to Jon Kleinberg's fantastic CACM article, I decided to change topics to something I'm more of an "expert" on: Self-Improving Systems that Learn Through Human Interaction.
Labels:
computer science,
internet / networks
Friday, October 09, 2009
John Hopcroft Symposium
I spent most of today attending a symposium held in honor of John Hopcroft's 70th birthday. The symposium covered an incredible gamut of John's achievements as revealed through his former students and colleagues. It was a remarkable experience for the younger faculty and current students such as myself.
John has been a faculty member in the computer science department at Cornell since its early days. In many ways, he helped define the field of computer science as a separate discipline from mathematics and electrical engineering, and is best known for his seminal work on algorithms and data structures with Robert Tarjan (for which they won the Turing Award in 1986).
In addition to his early algorithmic exploits, John has also delved into a myriad of other areas such as complexity theory, scientific simulations, robotics, and now large-scale data processing. It was incredibly inspiring to hear others describe John's ability to repeatedly identify important emerging areas and the boldness he displays in rapidly adjusting his own research agenda. I have personally witnessed his most recent push to develop algorithms suitable for the information age (as evidenced by his recent talks on the subject). But I hadn't realized until today just how many times he'd successfully forayed into new research directions in the past.
In all, it was a wonderful event. Many thanks to Jon and Eva for organizing it.
Labels:
announcements,
computer science
Sunday, October 04, 2009
Our minds linked together: the case for being Borg
Those who've watched Star Trek: The Next Generation will know the Borg as a race of mindless cybernetic drones bound by a collective consciousness and lacking any trace of individuality. The Borg travel across the galaxy, forcefully assimilating any civilizations they encounter into their collective. As such, they must be mortal enemies of humanity, since our sense of self (and by extension our liberty) is a large part of what makes us human -- or so we believe.
What I find so intriguing about the Borg is the fact that they can communicate so efficiently. In essence, all their minds are directly connected to each other via some kind of super internet. The most optimistic among us believe that we might also achieve such an ability within our lifetimes.
I don't think anyone would argue that the explosion of information made available by the internet is a bad thing. But humans are limited both by how much information we can send, receive and manage, as well as by the amount of computation or reasoning we can perform (not to mention all the inherent biases embedded in our neural hardware). It's why advertising companies and politicians care so much about marketing slogans and grabbing mind share. It's why we can benefit from playing games with indirect social and status signaling. It's essentially why we still have so much misunderstanding in this world.
While we shouldn't hope to completely eliminate our limitations (as that would violate our current understanding of physics), there's no reason to think that we cannot create a more efficient and freer society by making it easier and faster to communicate and organize information. In fact, we are already experiencing a change in attitude regarding what we take for granted communication-wise. For instance, I have already developed an intrinsic feeling of being connected to others through the internet. When Gmail went down this past summer, I actually felt lonely (probably much like how Hugh felt in the Star Trek episode "I, Borg"). Whenever I don't have my phone with me, I often feel somehow naked or incomplete.
Contrary to how the Borg are portrayed, expanding communication bandwidths will likely increase our diversity of thought rather than suppress it. I think the general principle of having robustness (e.g., promoting diversity) is a lesson that's been well learned (with notable exceptions). Forward-thinking companies such as Google (where I interned this past summer) and Microsoft (three summers ago) consistently emphasize diversity of thought and creativity in problem solving. Having a more connected society will make it easier for the good ideas to surface.
There are, of course, concerns over whether we could or should integrate our minds with "machines". One reason we're supposed to find the Borg so repulsive is their cybernetic nature. Rather than debating what it means to be "human", I think it's sufficient for this discussion to note that such technologies can be a life-changer for people suffering from paralysis. In addition, there is still much left to accomplish with non-invasive interfaces (such as our phones), so we still have a ways to go before we need to cross that tricky bridge of having immersive virtual environments.
Saturday, September 26, 2009
Are most numbers big or small?
Consider the following statement:
For any fixed positive integer threshold K, there exists a finite number of positive integers less than K and an infinite number of positive integers greater than K.
One can plausibly interpret the above statement in two opposing ways.
1) Most numbers are big. Given any threshold K, there are always many more numbers bigger than K as opposed to less than K. So if we define big using some kind of absolute threshold, then most numbers are big.
2) Most numbers are small. For any number K, there are always more numbers bigger than K than smaller than K, so K ranks near the bottom of all numbers. So if we define big using some kind of percentile threshold, then any number we pick counts as small, and in that sense most numbers are small.
Just another little quirk of them good old countably infinite sets.
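A quick numerical sketch can make both readings concrete. It assumes we approximate "all positive integers" by the finite range 1..N and let N grow; the function name `fraction_below` and the sample values of K and N are arbitrary choices for illustration.

```python
def fraction_below(threshold, upper):
    """Fraction of the integers 1..upper that are strictly less than threshold."""
    below = max(0, min(threshold - 1, upper))
    return below / upper


K = 1000
for upper in (10**3, 10**6, 10**9):
    # As the range grows, the share of numbers below the fixed K shrinks.
    print(upper, fraction_below(K, upper))
```

With K held fixed, the share of numbers below K heads toward zero as N grows, which is reading 1. The same quantity is also K's percentile rank within 1..N, so any particular K slides toward the bottom of the ranking, which is reading 2.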
Labels:
math,
random ponderings
Friday, September 25, 2009
Turing Tests for the Internet Age
In a conventional Turing test, a human judge tries to discern which of two text chat conversations is with another human and which is with a computer program. Since it was first proposed in 1950, designing computer programs that can pass the Turing test has become a milestone target of sorts for artificial intelligence researchers.
Now that the World Wide Web has enabled us to spend significant time in online environments, web-based variants of the Turing test are quickly becoming research goals with immediate or near-term practical interest. For example, when we send emails to customer service and receive coherent (and hopefully helpful) responses, we typically assume we're being serviced by a human agent. As we all know, living in an information economy drives up the price of labor. A significantly cheaper alternative would be to have computer programs that can automatically understand and resolve each customer's specific issues while also providing a pleasant service experience. This would imply no more bitchy emails from disgruntled human workers, much faster turnaround time on service requests, and overall increased efficiency in this sector of society.
Currently, we are still quite far from having programs that are of practical value. One way to generate interest and spur innovation is through competitions, very much like how the X PRIZE Foundation awards prize money for demonstrating various feats of technological prowess. While we should, of course, also pursue other avenues of research, I echo John Langford's sentiment that competitions give people who know how to do things a chance to distinguish themselves. In fact, there already exists a Turing test competition for computer game bots within the gaming community.
Regardless of whether we promote this research direction through competitions or other means, it would certainly be useful to have automated online services with rich interactive components (such as the aforementioned customer support service). What might be some other interesting Turing test variants that we can/should focus on?
One idea is some kind of social networking bot. For example, one might have a profile on Facebook, MySpace, or OkCupid that is controlled automatically by a computer program. Consider the OkCupid scenario, which is reminiscent of the great Mark V. Shaney experiment but taken one step further. We can actually test two experimental settings in this space, male and female. In both settings, the learning component could, in principle, be controlled by some kind of reinforcement learning algorithm with a clever strategy exploration routine. I'd expect the male version to be more proactive since males are typically expected to initiate contact with females. One could start with the obvious reward function of maintaining message threads with other people and go from there. We even have some guidance on how we might craft a suitable strategy space.
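To make the "reward function plus strategy exploration" idea a little more concrete, here is a minimal sketch that assumes the crudest possible setup: an epsilon-greedy multi-armed bandit, where each "arm" is an opening-message strategy and the reward is 1 if the other person replies. The strategy names, the reply probabilities, and the simulated environment are all hypothetical; a real bot would need a far richer model than this.

```python
import random


def run_bandit(reply_prob, epsilon=0.1, rounds=5000, seed=0):
    """Epsilon-greedy bandit over message strategies.

    `reply_prob` maps each strategy name to the (simulated, made-up)
    probability that a message using it gets a reply.
    """
    rng = random.Random(seed)
    strategies = list(reply_prob)
    counts = {s: 0 for s in strategies}    # times each strategy was tried
    values = {s: 0.0 for s in strategies}  # running average reward

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: try a strategy at random.
            choice = rng.choice(strategies)
        else:
            # Exploit: use the strategy with the best average so far.
            choice = max(strategies, key=values.get)

        # Simulated reward: 1 if the message thread continues, else 0.
        reward = 1.0 if rng.random() < reply_prob[choice] else 0.0

        counts[choice] += 1
        # Incremental update of the running average for this strategy.
        values[choice] += (reward - values[choice]) / counts[choice]

    return counts, values


# Hypothetical strategies with made-up reply rates.
simulated = {
    "ask_question": 0.30,
    "generic_greeting": 0.05,
    "comment_on_profile": 0.20,
}

counts, values = run_bandit(simulated)
for name in simulated:
    print(name, counts[name], round(values[name], 3))
```

The `epsilon` parameter controls how often the bot tries something other than its current best guess, which is one simple stand-in for the exploration routine mentioned above.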
A second idea, which takes the game bot idea to its logical extreme, is a bot in a sustained virtual world that tricks human users into thinking it is also human. For example, consider World of Warcraft (WoW), which has a sustained virtual universe that features many social components that we also find in the real world (such as a vibrant trading economy). Suppose we could design a bot player which can learn to operate autonomously in WoW and interact naturally with human players. Suppose the bot player could go on quests, team up with other players, even join a guild, all while under the pretense of being a human player. This is certainly an exciting setting to experiment in since the penalty of failure is limited (we can just create another bot character), and there already exists an enormous amount of data from usage logs of normal humans that we can use to bootstrap the learning algorithm.
In both the aforementioned scenarios, it should be noted that some kind of agreement with the service provider (e.g., OkCupid or Blizzard Entertainment) is required. Even if we ultimately weren't allowed to run live experiments, I'd imagine there is already sufficient data collected that can be used to generate interesting simulations. That data is just locked up behind corporate walls.
