Library: boilerpipe




boilerpipe provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page.
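
To give a feel for what "detecting clutter" can mean in practice, here is a minimal Python sketch that splits a page into text blocks and keeps only blocks that are reasonably long and not dominated by link text. It is not boilerpipe's implementation or API: the names `BlockSplitter` and `extract_main_text`, the tag list, and the thresholds `min_words` and `max_link_density` are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries in this sketch (an assumed, incomplete list).
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "ul", "ol",
              "nav", "header", "footer", "article", "section"}
SKIP_TAGS = {"script", "style"}


class BlockSplitter(HTMLParser):
    """Split HTML into text blocks and count the words inside <a> tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # (text, total_words, linked_words)
        self._words = []
        self._linked = 0
        self._in_link = 0
        self._skip = 0

    def _flush(self):
        # Close the current block, if it contains any words.
        if self._words:
            self.blocks.append(
                (" ".join(self._words), len(self._words), self._linked))
        self._words, self._linked = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return
        words = data.split()
        self._words.extend(words)
        if self._in_link:
            self._linked += len(words)

    def close(self):
        super().close()
        self._flush()


def extract_main_text(html, min_words=10, max_link_density=0.33):
    """Keep blocks with enough words and a low share of linked words."""
    splitter = BlockSplitter()
    splitter.feed(html)
    splitter.close()
    kept = [text for text, total, linked in splitter.blocks
            if total >= min_words and linked / total <= max_link_density]
    return "\n".join(kept)


page = """
<html><body>
<div><a href="/">Home</a> <a href="/about">About</a> <a href="/contact">Contact</a></div>
<p>boilerpipe provides algorithms to detect and remove the surplus clutter
around the main textual content of a web page.</p>
<div>Post a Comment</div>
</body></html>
"""

print(extract_main_text(page))
```

On this sample page the navigation bar is dropped because all of its words are link text, and the short "Post a Comment" block is dropped for being under the word threshold. A real extractor has to cope with far messier markup, so treat the thresholds as starting points rather than tuned values.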

It can be used with Nutch and Solr when crawling web pages.

Just call http://ift.tt/1haNgjf to highlight the main content of an arbitrary URL.

http://ift.tt/106wMPw

http://ift.tt/1gX4FwZ





from Public RSS-Feed of Jeffery Yuan.

via LifeLong Community
