My thesis research was dedicated to web content extraction algorithms, for which I conducted a comprehensive study of existing solutions in this sparsely researched field.
Tag Archives: text extraction
UPDATE 11/6/2011: Added the summary and the results table. Lately I’ve been working on evaluating and comparing algorithms capable of extracting useful content from arbitrary HTML documents. Before continuing, I encourage you to read through some of my previous posts, just to …
In one of my previous posts, I compiled a fairly extensive list of software (and other resources) capable of extracting article content from an arbitrary HTML document. While I was gathering all the relevant papers and software, I kept …
In my two previous posts (both were featured on Hacker News, ReadWriteWeb and O’Reilly Radar), I covered a wide range of text extraction methods and related software. So before reading this one, I encourage you to read them to get …
UPDATE 21/3/2011: Added reader-contributed links to the software and API section. Following up on my overview of article text extractors, I’ll try to compile a list of research papers, articles, web APIs, libraries and other software that I encountered during …