Tuesday, April 12, 2011

Paper Reading #22, "DocuBrowse"

http://isthishci.blogspot.com/2011/04/paper-reading-19-from-documents-to.html
http://pfrithcsce436.blogspot.com/2011/04/paper-reading-21-supporting-exploratory.html#comments

DocuBrowse: Faceted Searching, Browsing, and Recommendations in an Enterprise Context

Andreas Girgensohn, Francine Chen, and Lynn Wilcox, FX Palo Alto Laboratory
Frank Shipman, TAMU Computer Science
Presented at IUI’10, February 7-10, 2010, Hong Kong, China

Summary
This paper describes DocuBrowse, a system to allow "easy and intuitive" enterprise searches. Enterprise searches are searches for documents inside a given organization; such documents lack the interlinks that make internet searching in the modern sense so effective.

The biggest advantage of DocuBrowse as a document organization system is that files can be in more than one directory; instead of having to find the one specific location you need, you can come in from any angle. Other features the authors attempted to implement include being able to see an entire tree in one query, retaining structure in results (rather than just a Google-esque list, see image), and a genre detector that tells us what type of document a file is. The last is accomplished based on estimation from images via a system known as GenIE (Genre Identification and Estimation) and presumably applies to scanned documents, since in the base case you can tell a .rtf from a .doc, etc., by the file extension.
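To make the "more than one directory" idea concrete, here is a minimal sketch of a faceted index in Python. It is not the authors' implementation; the class name FacetIndex, the facet names (department, genre), and the file names are all made up for illustration. The point is only that a document filed under several facets can be reached from any of them.

```python
from collections import defaultdict


class FacetIndex:
    """Toy index (hypothetical): a document can sit under several (facet, value) pairs."""

    def __init__(self):
        # Maps (facet, value) -> set of document ids filed under that pair.
        self._docs_by_facet = defaultdict(set)

    def add(self, doc_id, facets):
        # File the same document under every facet it has.
        for facet, value in facets.items():
            self._docs_by_facet[(facet, value)].add(doc_id)

    def find(self, **criteria):
        """Return the ids of documents that match ALL given facet=value pairs."""
        if not criteria:
            return set()
        matches = [self._docs_by_facet.get((f, v), set()) for f, v in criteria.items()]
        return set.intersection(*matches)


# Made-up example documents.
index = FacetIndex()
index.add("budget_2010.doc", {"department": "finance", "genre": "report"})
index.add("hiring_plan.rtf", {"department": "hr", "genre": "report"})
index.add("expense_form.pdf", {"department": "finance", "genre": "form"})

reports = index.find(genre="report")
finance_reports = index.find(department="finance", genre="report")
```

Here budget_2010.doc turns up whether you start from the finance department or from the report genre, which is the "come in from any angle" behavior in miniature.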

In determining relevance of documents to individuals, organizational structure and job class replace their access history. That is, instead of being pointed towards documents they have seen before, users are pointed towards documents that are relevant to their position or that others with the same or equivalent job titles have accessed.
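A rough sketch of that role-based idea in Python follows. The paper does not give the actual ranking method, so this is an assumption: it simply counts how often people with the same job title opened each document. The function name recommend_by_role, the log format, and the sample data are all hypothetical.

```python
from collections import Counter


def recommend_by_role(user, job_titles, access_log, top_n=3):
    """Suggest documents opened by other people who share the user's job title.

    job_titles: dict mapping person -> job title (hypothetical format)
    access_log: list of (person, document) pairs (hypothetical format)
    """
    role = job_titles[user]
    counts = Counter(
        doc
        for person, doc in access_log
        # Ignore the user's own history; only peers with the same title count.
        if person != user and job_titles.get(person) == role
    )
    return [doc for doc, _ in counts.most_common(top_n)]


# Made-up sample data.
job_titles = {"ana": "engineer", "bo": "engineer", "cy": "engineer", "di": "accountant"}
access_log = [
    ("bo", "design_spec.doc"),
    ("cy", "design_spec.doc"),
    ("cy", "test_plan.doc"),
    ("di", "ledger.xls"),
    ("ana", "notes.txt"),
]

suggestions = recommend_by_role("ana", job_titles, access_log)
```

Note that ana's own notes.txt and the accountant's ledger.xls never show up in her suggestions; only what her fellow engineers opened does.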

The authors state their next intended move in this research is testing with some kind of large organization, as they have only conducted in-house tests so far.




Discussion
This paper is significant because it would be difficult to imagine trying to find critical information without access to a modern search engine. Extending that kind of searching to documents inside an organization is an important goal.

The biggest weakness of the paper was that it neither notes how well the system worked in in-house testing nor explains GenIE in much detail. The biggest strength was its good diagrams.

The future research proposed seems solid, although they might also consider an auto-keywording system if they don't already have one. Currently the implication seems to be that keywording is all manual.

1 comment:

  1. I also read this paper and I definitely think they are heading in the right direction. But I agree, I think they need to mention more about the productivity of a real in-house test.
