

Subject: Re: [ublas] sparse vector usage example
From: Jose (jmalv04_at_[hidden])
Date: 2008-12-13 11:26:17


Hi,

Cosine similarity is a measure of similarity between two vectors,
obtained by finding the cosine of the angle between them.

http://en.wikipedia.org/wiki/Cosine_similarity
http://en.wikipedia.org/wiki/Tf-idf

I am interested in using ublas for information retrieval and I assume
some people must already have tackled this problem (with ublas).

This requires sparse vectors: the vector space may have e.g. 10 million
dimensions, while a typical doc may have 1000 words (= a sparse vector
with 1000 non-zero components).

So the real need is to compute the dot product of the query vector with
a large number of sparse vectors (this equals the cosine similarity once
the vectors are normalized to unit length) and rank them by the result
(using Tf-idf - see wikipedia link above).
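
To make the computation concrete, here is a minimal sketch in plain
standard C++ rather than uBLAS itself. It assumes each document and the
query are already stored as index-sorted (term index, weight) pairs with
unique indices, and that the weights (e.g. Tf-idf) have been computed
elsewhere. The names SparseVec, sparse_dot, sparse_norm,
cosine_similarity and rank_documents are made up for illustration and are
not part of any library. With uBLAS sparse types such as
compressed_vector, the library functions inner_prod and norm_2 would
presumably play the roles of sparse_dot and sparse_norm.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sparse representation: (term index, weight) pairs,
// sorted by index, with each index appearing at most once.
using SparseVec = std::vector<std::pair<std::size_t, double>>;

// Dot product by merging two index-sorted lists.
// Cost is proportional to nnz(a) + nnz(b), not to the full dimension.
double sparse_dot(const SparseVec& a, const SparseVec& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    double sum = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].first < b[j].first) {
            ++i;
        } else if (b[j].first < a[i].first) {
            ++j;
        } else {  // same term index in both vectors
            sum += a[i].second * b[j].second;
            ++i;
            ++j;
        }
    }
    return sum;
}

// Euclidean (2-)norm of a sparse vector.
double sparse_norm(const SparseVec& v) {
    return std::sqrt(sparse_dot(v, v));
}

// cos(a, b) = (a . b) / (|a| |b|); defined here as 0 if either vector is empty.
double cosine_similarity(const SparseVec& a, const SparseVec& b) {
    const double denom = sparse_norm(a) * sparse_norm(b);
    return denom > 0.0 ? sparse_dot(a, b) / denom : 0.0;
}

// Returns document indices ordered by decreasing similarity to the query.
std::vector<std::size_t> rank_documents(const std::vector<SparseVec>& docs,
                                        const SparseVec& query) {
    std::vector<std::pair<double, std::size_t>> scored;
    scored.reserve(docs.size());
    for (std::size_t d = 0; d < docs.size(); ++d) {
        scored.emplace_back(cosine_similarity(docs[d], query), d);
    }
    std::stable_sort(scored.begin(), scored.end(),
                     [](const std::pair<double, std::size_t>& x,
                        const std::pair<double, std::size_t>& y) {
                         return x.first > y.first;
                     });
    std::vector<std::size_t> order;
    order.reserve(scored.size());
    for (const auto& s : scored) {
        order.push_back(s.second);
    }
    return order;
}
```

For a large collection it would likely pay to normalize every document
vector once up front, so that ranking needs only one dot product per
document.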

I've quickly looked at your page and I am planning to study everything
in it but I wanted to ask the list before reinventing the wheel.

regards
jose

On Sat, Dec 13, 2008 at 3:32 PM, Gunter Winkler <guwi17_at_[hidden]> wrote:
> On Friday, 12 December 2008 20:03, Jose wrote:
>> Hi,
>>
>> Is there an example of computing the cosine similarity of two sparse
>> vectors ?
>
> What is cosine similarity? I have never seen such a thing in uBLAS.
>
> Best regards
> Gunter
>
> PS: There are some other examples on my page: http://www.guwi17.de/ublas