HTML 2 RSS (web page scraping)

Have you ever wanted to read a web page you really like in Feedreader, only to find that it has no RSS feed? I have been in this situation quite a lot of times. So we did a little bit of hacking and released a tool called HTML2RSS to the public.

So what does this tool do? Basically it's a little web server that you can access at the URL http://localhost:8182. If you open this location, you will see a brief description of usage. I will copy the example here and explain it:

http://localhost:8182/?serverurl=http://bbc.co.uk&feedtitle=BBC%20scraped%20feed&linkfilter=news.bbc.co.uk&encoding=gb2312

  • serverurl : URL of the target web page (the one without an RSS feed)
  • feedtitle : title of the feed that you are creating (if you omit this variable, the feed title will be the web page URL)
  • linkfilter : return only links that contain the string provided in the linkfilter variable (if you omit this variable, all links are taken from the web page)
  • encoding : output encoding (must be the same as the encoding of the web page). If you omit this variable, the tool tries to detect the encoding itself, but this does not work every time.
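
If you would rather not type such a link by hand, it can be assembled with a few lines of Python. This is only a minimal sketch: the function name build_html2rss_url and its defaults are made up for illustration and are not part of HTML2RSS. Only the four variable names and the localhost address come from the example above.

```python
from urllib.parse import quote, urlencode


def build_html2rss_url(serverurl, feedtitle=None, linkfilter=None,
                       encoding=None, base="http://localhost:8182/"):
    """Build an HTML2RSS request link (hypothetical helper, not part of the tool)."""
    params = {"serverurl": serverurl}
    # Optional variables are simply left out when not given,
    # so the tool falls back to its own defaults.
    if feedtitle:
        params["feedtitle"] = feedtitle
    if linkfilter:
        params["linkfilter"] = linkfilter
    if encoding:
        params["encoding"] = encoding
    # quote() turns spaces into %20; ":" and "/" are kept readable
    # so the result looks like the example link above.
    return base + "?" + urlencode(params, quote_via=quote, safe=":/")


print(build_html2rss_url("http://bbc.co.uk",
                         feedtitle="BBC scraped feed",
                         linkfilter="news.bbc.co.uk",
                         encoding="gb2312"))
```

The printed link can be pasted into Feedreader as the feed address.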

So this is it. Just let the application run and add this example link to Feedreader. News will come in :).
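
To get a feeling for what the linkfilter variable does, here is a rough Python illustration of the idea: collect all links from a page and keep only those that contain a given string. This is not the actual HTML2RSS code, just an assumption about how such filtering could look. The names LinkCollector and filter_links are invented for this sketch.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def filter_links(html, linkfilter=None):
    """Return all links, or only those containing linkfilter if it is given."""
    collector = LinkCollector()
    collector.feed(html)
    if not linkfilter:
        return collector.links
    return [link for link in collector.links if linkfilter in link]


page = ('<a href="http://news.bbc.co.uk/1">Story</a>'
        '<a href="http://example.com/2">Other</a>')
print(filter_links(page, "news.bbc.co.uk"))
```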

HINT : This tool works with other RSS readers, too :).

DISCLAIMER : This is not an official product. This is just a tool that we are experimenting with. We will hopefully develop it a little bit further (tray option), but development priority can be hectic :).


Download
HTML2RSS can be downloaded from here.
 
A newer version of HTML2RSS that runs as a Windows service can be downloaded from here. Just install it from the command line with "html2rss_service.exe /install" and then start it from the Services control panel.
