Matt's Blog: MinHash for dummies

Please Visit: http://ift.tt/1ajReyV

1. Break down the document into a set of shingles.
2. Calculate the hash value for every shingle.
3. Store the minimum hash value found in step 2.
4. Repeat steps 2 and 3 with different hash algorithms 199 more times to get a total of 200 min hash values.
Rather than implementing 200 different hash algorithms, you can XOR the value returned by String.hashCode() with 199 random numbers to generate the 199 other hash code values. Just make sure that you use the same 199 random numbers across all the documents.
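
The steps above can be sketched in Java roughly as follows. This is only an illustration under some assumptions: the class name MinHashSketch, the three-word shingle size, and the fixed seed 42 are made up for the example, and the estimatedSimilarity helper (the fraction of matching min hash values, a common way to compare two signatures) is an addition that the steps themselves do not spell out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class MinHashSketch {
    static final int NUM_HASHES = 200;  // total min hash values per document (step 4)
    static final int SHINGLE_WORDS = 3; // assumed shingle size; the text does not specify one

    // 199 random numbers, drawn once from a fixed seed (an assumption) so that
    // every document is hashed with exactly the same values.
    static final int[] MASKS = new Random(42L).ints(NUM_HASHES - 1).toArray();

    // Step 1: break the document into a set of word shingles.
    // Documents shorter than SHINGLE_WORDS words produce an empty set.
    static Set<String> shingles(String text) {
        String[] words = text.trim().toLowerCase().split("\\s+");
        Set<String> result = new HashSet<>();
        for (int i = 0; i + SHINGLE_WORDS <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + SHINGLE_WORDS)));
        }
        return result;
    }

    // Steps 2-4: for each of the 200 hash functions, keep the minimum value over all shingles.
    // Hash 0 is String.hashCode(); hashes 1..199 are hashCode() XOR a random number.
    static int[] signature(Set<String> shingles) {
        int[] mins = new int[NUM_HASHES];
        Arrays.fill(mins, Integer.MAX_VALUE);
        for (String shingle : shingles) {
            int base = shingle.hashCode();
            mins[0] = Math.min(mins[0], base);
            for (int i = 0; i < MASKS.length; i++) {
                mins[i + 1] = Math.min(mins[i + 1], base ^ MASKS[i]);
            }
        }
        return mins;
    }

    // Fraction of positions where two signatures agree; a rough estimate of
    // how similar the two underlying shingle sets are.
    static double estimatedSimilarity(int[] a, int[] b) {
        int same = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) {
                same++;
            }
        }
        return (double) same / a.length;
    }
}
```

Calling MinHashSketch.estimatedSimilarity(MinHashSketch.signature(MinHashSketch.shingles(docA)), MinHashSketch.signature(MinHashSketch.shingles(docB))) then gives a value between 0 and 1 for two document strings docA and docB. Two identical documents always score 1.0.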

Locality-sensitive hashing (LSH) lets you precompute a hash code for each object. That code can then be compared quickly and easily with another precomputed LSH hash code to decide whether the two objects should be compared in more detail or discarded.

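The min hash values of each document are divided into bands, each made up of several rows. Any documents that have identical rows in at least one band should be compared for similarity.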
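
One possible way to find documents that share rows in a band is sketched below in Java. The class name LshBands, the method names, the rowsPerBand parameter, and the "idA|idB" pair format are all hypothetical choices for this example. The sketch builds a string key for each band and treats any two documents with the same key as candidates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class LshBands {
    // Splits a signature into bands of rowsPerBand values and returns one key per band.
    // The band index is part of the key, so rows only match within the same band.
    static List<String> bandKeys(int[] signature, int rowsPerBand) {
        List<String> keys = new ArrayList<>();
        for (int start = 0; start + rowsPerBand <= signature.length; start += rowsPerBand) {
            int[] rows = Arrays.copyOfRange(signature, start, start + rowsPerBand);
            keys.add((start / rowsPerBand) + ":" + Arrays.toString(rows));
        }
        return keys;
    }

    // Groups document ids by band key. Every pair of ids that lands in the same
    // bucket is returned as "idA|idB" (ids in sorted order) for a detailed comparison.
    static Set<String> candidatePairs(Map<String, int[]> signatures, int rowsPerBand) {
        Map<String, List<String>> buckets = new HashMap<>();
        for (Map.Entry<String, int[]> entry : signatures.entrySet()) {
            for (String key : bandKeys(entry.getValue(), rowsPerBand)) {
                buckets.computeIfAbsent(key, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        Set<String> pairs = new TreeSet<>();
        for (List<String> ids : buckets.values()) {
            for (int i = 0; i < ids.size(); i++) {
                for (int j = i + 1; j < ids.size(); j++) {
                    String a = ids.get(i);
                    String b = ids.get(j);
                    pairs.add(a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a);
                }
            }
        }
        return pairs;
    }
}
```

Fewer rows per band makes a match in some band more likely, so more pairs get the detailed comparison. More rows per band makes the filter stricter.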


