yesterday's technology tomorrow
Saturday, October 04, 2003
Search engines, information retrieval and recommendations
Since 1996, I've been working, to various degrees, on recommendation systems: first at Accenture's research labs with the team who created Bargain Finder, and since then at AgentArts, working on music and other entertainment recommendation systems. Most recently I built Blog Change Bot, a little IM bot that sends you a message when a site you are interested in is updated. (Aside: the AOL network has trouble with large buddy lists, in my case the subscriber list, so a rewrite of the app is ahead of me this weekend.)
In 1998 I attended a conference on Intelligent Agents and did a workshop on Information Retrieval and related topics. It's been interesting how this theme has reappeared consistently over the years.
Last week we got a call from a company interested in having us do movie recommendations for them. That got me thinking about how we could create a database of movie recommendations, and I remembered an available database of around 80,000 movie reviews and plot overviews. Traditionally, we have built recommendation databases by data mining people's usage patterns, e.g. songs downloaded, search terms, movies hired, but for this client none of that data was available. So I wondered whether we could take the reviews and plots, automatically extract the most distinctive words from each text, and match them against other movies' reviews and plots to generate a measure of similarity.
So for the last 3 days I've been beaverishly coding what is basically a fully fledged search engine: it parses 80,000 XML files, creates an inverted index of them and then matches a query to specific movies. Next week, with a few tweaks, we should have a comprehensive database of relationships between movies. As an example, I searched for "nuclear bomb comedy", thinking of Dr Strangelove, and got Space Cowboys, which was very accurate.
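The post doesn't say how the engine weights words, so here's a minimal Python sketch of the general idea, assuming TF-IDF weighting (a common way to pick out a text's "most distinctive words"), a small hand-picked stop-word list and cosine similarity for ranking. `MovieIndex`, its methods and the three one-line synopses are hypothetical, made up for illustration; they are not the actual engine or the 80,000-review database, and the XML parsing step is left out. `search()` handles free-text queries, while `similar()` reuses a movie's top distinctive words as the query, which is one way to get a movie-to-movie measure of similarity.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a tiny hand-picked stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "this", "for", "his", "her", "to"}


def tokenize(text):
    """Lower-case the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


class MovieIndex:
    """Hypothetical TF-IDF inverted index over movie review/plot texts."""

    def __init__(self, docs):
        # docs maps a movie title to its combined review and plot text.
        self.postings = defaultdict(dict)  # term -> {title: term frequency}
        for title, text in docs.items():
            for term, tf in Counter(tokenize(text)).items():
                self.postings[term][title] = tf
        n = len(docs)
        # Inverse document frequency: words found in fewer movies weigh more.
        self.idf = {t: math.log(n / len(p)) for t, p in self.postings.items()}
        # Length of each movie's TF-IDF vector, used for cosine normalisation.
        sq = defaultdict(float)
        for term, p in self.postings.items():
            for title, tf in p.items():
                sq[title] += (tf * self.idf[term]) ** 2
        self.norms = {title: math.sqrt(v) for title, v in sq.items()}

    def vector(self, title, top_n=None):
        """A movie's TF-IDF weights, optionally cut to its top_n most distinctive terms."""
        weights = {t: p[title] * self.idf[t]
                   for t, p in self.postings.items() if title in p}
        ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
        return dict(ranked[:top_n] if top_n else ranked)

    def distinctive_words(self, title, k=5):
        """The k highest-weighted terms for one movie."""
        return list(self.vector(title, k))

    def _rank(self, q_weights, k, exclude=None):
        """Score movies by cosine similarity to a weighted query via the postings lists."""
        q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
        if q_norm == 0:
            return []
        scores = defaultdict(float)
        for term, qw in q_weights.items():
            # Only movies that contain this term are visited.
            for title, tf in self.postings.get(term, {}).items():
                scores[title] += qw * tf * self.idf[term]
        results = [(title, s / (q_norm * self.norms[title]))
                   for title, s in scores.items() if s > 0 and title != exclude]
        results.sort(key=lambda pair: pair[1], reverse=True)
        return results[:k]

    def search(self, query, k=3):
        """Free-text search; words not in the index get zero weight."""
        q = {t: tf * self.idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
        return self._rank(q, k)

    def similar(self, title, k=3, top_n=10):
        """Recommend movies whose text best matches this movie's distinctive words."""
        return self._rank(self.vector(title, top_n), k, exclude=title)


# Toy one-line synopses written for illustration only.
docs = {
    "Dr. Strangelove": "A paranoid general launches a nuclear bomb attack and the "
                       "president tries to stop the war in this black comedy",
    "Space Cowboys": "Veteran astronauts fly into space to stop an old satellite "
                     "armed with nuclear missiles in this comedy adventure",
    "Finding Nemo": "A clownfish father crosses the ocean to find his lost son",
}
index = MovieIndex(docs)
print(index.search("nuclear bomb comedy"))
print(index.similar("Dr. Strangelove"))
```

Because scoring only walks the postings lists of the query's terms, a movie that shares no words with the query (Finding Nemo here) is never touched, which is what makes an inverted index practical at 80,000 documents rather than three.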
I'm pretty pleased with the effort and the results so far... not bad for three days' work. I'll have a demo available in a week or so and will point people to it then. Hopefully it won't be long before you can wander into your local video/DVD store and no longer have to hunt around for something you might like.
Random Pile of Ben
ben at neuronwave dot com