Monday, April 30, 2007

Improving the blog search interface

Anyone who has ever done regular research using blogs is likely to know how time-consuming and ultimately frustrating the task can be with existing tools.

Some additional assistance from automation would undoubtedly be welcome, but as I pointed out last month, the most assertive methodologies on offer tend to be founded on somewhat misdirected notions of influence and intelligence.

One of the pretentious books I keep close to my desk is a volume entitled Doing Internet Research from 1999, and I have only just realised that it has an interesting-looking, though as yet unread, chapter on Studying On-line Social Networks. I shall have to read this and report back, but in the meantime here are a few suggestions for both blogging and blog-browsing interfaces, which I think would make life a whole lot easier for communications professionals.

1) Print a blog post (with or without comments). Scott Adams's blog posts regularly attract several hundred comments, which makes every use of the browser's built-in print function an act of deforestation. Fixing this quickly surely can't be beyond Google.

2) Writers and readers can tag individual posts with keywords and phrases. The information-seeker should therefore be able to use either or both of these to conduct their search, independently of other indexation criteria/algorithms.

3) It might also be handy to be able to search comments independently.

4) Simple built-in tools for scoring relevance and favourability would allow the researcher to structure and categorise the information from the moment they choose to keep it.

5) Writers should also be able to add a description tag to their posts, rather like the description meta tag on non-blog Web pages.

6) Oh how useful it would be to conduct a search that would return the most linked/cited/commented posts relevant to a particular keyword string or set of tags.

There is a pervasive assumption that the basic unit of influence within the blogosphere is the blogger. Now, I've pointed out before that addressing a connected medium such as the blogosphere in terms of units of anything is a flawed approach borrowed from other spheres of investigation. But if you really had to identify the basic underlying element of blog influence, it would usually be more reasonable to suggest that it is the post itself rather than the individual who posted it.

The very nature of search means that the searcher must have some nodal unit in mind when their query is first constructed. This could be a blogger, blog post, commenter, comment, topic, tag, etc. In many cases, however, the searcher will want to move on quickly to reconstruct (and perhaps also visualise) the networked relationships from which the impact of this form of communication ultimately derives. These can be either explicit (links, comments) or implicit (unlinked citations, blogrolls, tags, topics, etc.).

The functionality described in 6) above would be all the more valuable if the searches could be saved and automated so that new additions to each blog mini-sphere could be picked up and assimilated.

One final suggestion before I throw this one open. In the same way that you can use Google to search and monitor an individual website, you should already be able to use Google Blog Search to carve up an individual blog any which way you want. For example:
  • Show me all posts matching a given tag or key phrase query
  • Now show me which ones have the most out-links/in-links
  • Now show me all posts matching the same query on other blogs this one is linked to (either in the text, the comments or via the blogroll).
and so on.
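To make that wished-for workflow a little more concrete, here is a rough sketch in Python of the three steps in the list above, run over a small in-memory collection of posts. Everything in it is hypothetical: the `Post` record, the function names and the dictionary of blogrolls are my own illustrative assumptions, not the API of Google Blog Search, Technorati or anyone else. For simplicity it treats a link as a link from one post to another and ignores comments altogether.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Post:
    """A hypothetical, minimal record for one blog post."""
    post_id: str
    blog: str
    text: str
    tags: set[str] = field(default_factory=set)
    links_to: set[str] = field(default_factory=set)  # post_ids this post links to


def matches(post: Post, query: str) -> bool:
    """True if the query equals one of the post's tags or appears in its text."""
    q = query.lower()
    return q in {t.lower() for t in post.tags} or q in post.text.lower()


def search_blog(posts: list[Post], blog: str, query: str) -> list[Post]:
    """Step 1: all posts on one blog matching a tag or key phrase."""
    return [p for p in posts if p.blog == blog and matches(p, query)]


def in_link_counts(posts: list[Post]) -> dict[str, int]:
    """Count how many posts in the collection link to each post."""
    counts = {p.post_id: 0 for p in posts}
    for p in posts:
        for target in p.links_to:
            if target in counts:
                counts[target] += 1
    return counts


def rank_by_links(results: list[Post], posts: list[Post]) -> list[Post]:
    """Step 2: order results by in-links, breaking ties by out-links."""
    counts = in_link_counts(posts)
    return sorted(
        results,
        key=lambda p: (counts.get(p.post_id, 0), len(p.links_to)),
        reverse=True,
    )


def linked_blogs(posts: list[Post], blog: str,
                 blogroll: dict[str, list[str]]) -> set[str]:
    """Blogs that `blog` links to from its posts or lists in its blogroll."""
    by_id = {p.post_id: p for p in posts}
    found = set(blogroll.get(blog, ()))
    for p in posts:
        if p.blog == blog:
            for target in p.links_to:
                if target in by_id:
                    found.add(by_id[target].blog)
    found.discard(blog)  # links to its own posts don't make a blog its own neighbour
    return found


def search_neighbours(posts: list[Post], blog: str, query: str,
                      blogroll: dict[str, list[str]]) -> list[Post]:
    """Step 3: the same query run over the blogs this one links to."""
    neighbours = linked_blogs(posts, blog, blogroll)
    return [p for p in posts if p.blog in neighbours and matches(p, query)]
```

Ranking by in-links first and out-links second is only one possible choice; counting comments, or weighting links from the searcher's own saved "mini-sphere" more heavily, would be equally reasonable variations on the same idea.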

Update: Technorati has gone some way towards the ideal by allowing the searcher to click to a page displaying the full set of links referencing each post returned by the original query.

But this sort of interface is still like squirrelling around a tree: you can run out along an interesting branch, but you have to come back to the main trunk in order to reach the other branches. When pressed, a squirrel in a hurry can jump between branches that are close enough for such acrobatics, but the Technorati squirrel can't do this because it literally can't see the other nearby branches in information space.
