Ranking algorithm based on purity of documents (17-Dec-2009)

IP.com Prior Art Database Disclosure (Source: IPCOM)
Disclosure Number IPCOM000191120D dated 17-Dec-2009
Originally published in Prior Art Database
Disclosed by: IBM
Country: Undisclosed
Disclosure File: 1 pages / 33.6 KB / English (United States)

Ranking text search results according to purity of documents calculated by clustering of document set.

This article describes a technique for ranking text search results according to the purity of documents, which is calculated by clustering the document set.

1. Cluster the text document set into subsets using a clustering algorithm such as Latent Semantic Analysis/Indexing (LSA/LSI) or Latent Dirichlet Allocation (LDA).

2. Represent every document by a vector in a vector space in which each cluster is regarded as a basis vector. The projection of a document's vector onto a basis vector is the score of that document in the corresponding cluster.

3. Store the document vectors in the search index as document metadata.

4. The user inputs search keywords. These keywords are regarded as a document, and their representing vector is evaluated.

5. Calculate the inner product of the search-keyword vector and the vector of every document in the index. Present the search results in descending order of this inner product value, so that documents with large inner product values are listed at the top of the search results.
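
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the disclosed implementation. The clustering step is not performed here: the table CLUSTER_WORD_WEIGHTS is a hypothetical stand-in for the output of LSA/LSI or LDA, with invented values. The function names (cluster_vector, build_index, search) are also hypothetical, and the sum-to-one normalization is an assumption, since the disclosure does not specify one.

```python
from collections import Counter

# Hypothetical per-cluster word weights standing in for the result of the
# clustering step (step 1). The values are invented for illustration.
CLUSTER_WORD_WEIGHTS = [
    {"engine": 0.5, "wheel": 0.3, "brake": 0.2},  # cluster 0
    {"flour": 0.4, "oven": 0.4, "sugar": 0.2},    # cluster 1
]


def cluster_vector(text):
    """Steps 2 and 4: one score per cluster for a document or a keyword query."""
    counts = Counter(text.lower().split())
    scores = [
        sum(weights.get(word, 0.0) * n for word, n in counts.items())
        for weights in CLUSTER_WORD_WEIGHTS
    ]
    total = sum(scores)
    # Assumption: normalize so the components sum to 1 when possible.
    return [s / total for s in scores] if total else scores


def build_index(documents):
    """Step 3: store each document's vector as metadata keyed by document id."""
    return {doc_id: cluster_vector(text) for doc_id, text in documents.items()}


def search(index, keywords):
    """Step 5: rank document ids by inner product with the query vector."""
    query = cluster_vector(keywords)
    scored = [
        (sum(q * d for q, d in zip(query, vec)), doc_id)
        for doc_id, vec in index.items()
    ]
    # Largest inner product first; ties keep index order.
    return [doc_id for _, doc_id in sorted(scored, key=lambda pair: -pair[0])]


DOCUMENTS = {
    "pure_car": "engine wheel brake engine",
    "mixed": "engine oven",
    "pure_bake": "flour oven sugar",
}
index = build_index(DOCUMENTS)
print(search(index, "engine brake"))  # ['pure_car', 'mixed', 'pure_bake']
```

In this toy data, the document that falls entirely within the query's cluster ranks above the document split across two clusters, which in turn ranks above the document from an unrelated cluster.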
