Università di Pisa
Sistema bibliotecario di ateneo (University Library System)

A personalized search engine based on web-snippet hierarchical clustering

Ferragina, Paolo and Gullì, Antonio (2004) A personalized search engine based on web-snippet hierarchical clustering. Technical Report of the Dipartimento di Informatica. Università di Pisa, Pisa, IT.

Postscript (GZip), Published Version, 944 KB
Available under the Creative Commons Attribution-NoDerivatives license.


    Search engines provide the view of the Web, and their smart ranking algorithms are their point of view. To offer the best view, personalized ranking algorithms are currently flourishing. They focus on the users rather than on their submitted queries, by taking into account some contextual or profile information. In this paper we propose a personalized (meta-)search engine based on web-snippet hierarchical clustering technology (à la Vivisimo) that is fully adaptive and non-intrusive, both for the user and for the queried search engine(s). It works on top of 16 commodity search engines and fetches 200 (or more) results from them per user query. Our engine mines, on the fly, the fine-grained and varied "themes" behind these results and organizes them in a hierarchy of folders that offers, at various levels of detail, an up-to-date picture of these results. Users can therefore browse the hierarchy, select the themes that best match the "intention" behind their query, and ask our engine to personalize those query results on the fly according to their choices. In this way, lazy users are not limited to looking at the first ten results, but immediately acquire several points of view on a larger pool (about 200) of them! We claim that there exists a mutual reinforcement relationship between ranking and web-snippet clustering from which both may benefit. Our extensive experiments show that this form of personalization is very effective for informative queries, polysemous queries, and poor queries consisting of at most two terms (more than 80% of Web queries are of this type!). In these cases, in fact, one theme might be so popular on the Web that it unfortunately monopolizes the top-ten results of link-based ranking algorithms.
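As an illustration only, the personalization loop the abstract describes (group result snippets into themes, let the user pick themes, re-rank the results) can be sketched as below. This is not the authors' algorithm: the flat one-level "themes" built from terms shared by several snippets, the names `build_themes` and `personalize`, the tiny stopword list, and the "promote results in selected themes, otherwise keep the original order" rule are all hypothetical simplifications. The engine described above builds a full hierarchy of labelled folders over about 200 results merged from 16 engines.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    rank: int      # 1-based position in the merged result list
    title: str
    snippet: str


# Hypothetical, deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def terms(text):
    """Set of lowercased words in `text`, punctuation stripped, stopwords removed."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def build_themes(results, query, min_support=2):
    """Hypothetical flat stand-in for a theme hierarchy: a theme is any
    non-query term shared by at least `min_support` results (title + snippet)."""
    query_terms = terms(query)
    folders = defaultdict(list)
    for r in results:
        for t in terms(r.title + " " + r.snippet):
            if t not in query_terms:
                folders[t].append(r.rank)
    return {t: ranks for t, ranks in folders.items() if len(ranks) >= min_support}


def personalize(results, themes, selected):
    """Stable re-rank: results in any selected theme come first; the original
    order is preserved within the promoted and the non-promoted groups."""
    chosen = set()
    for t in selected:
        chosen.update(themes.get(t, []))
    return sorted(results, key=lambda r: (r.rank not in chosen, r.rank))


# Polysemous one-term query "jaguar" (car maker vs. animal); made-up sample data.
results = [
    Result(1, "Jaguar cars", "Jaguar luxury car dealer"),
    Result(2, "Jaguar XF review", "New car review of the Jaguar XF"),
    Result(3, "Jaguar facts", "The jaguar is a big cat of the Americas"),
    Result(4, "Jaguar habitat", "Rainforest cat habitat and diet"),
]
themes = build_themes(results, query="jaguar")
print(sorted(themes))  # ['car', 'cat']
print([r.rank for r in personalize(results, themes, ["cat"])])  # [3, 4, 1, 2]
```

Selecting the "cat" folder lifts the animal pages above the car pages, which is the kind of correction the abstract targets when one popular theme fills the top results. A real engine would use richer, multi-word labels, a hierarchy of folders, and its own scoring rather than this simple stable sort.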

    Item Type: Book
    Uncontrolled Keywords: Web Snippets Clustering, Search Engines, Information Extraction, New Search Applications and Interfaces, Personalized Web Ranking
    Subjects: Area01 - Mathematical and Computer Sciences > INF/01 - Computer Science
    Divisions: Departments (until 2012) > DIPARTIMENTO DI INFORMATICA (Department of Computer Science)
    Depositing User: Dr. Sandra Faita
    Date Deposited: 09 Dec 2014 10:55
    Last Modified: 09 Dec 2014 10:55
    URI: http://eprints.adm.unipi.it/id/eprint/2125
