HTML 2 RSS (web page scraping)


Have you ever found that a web page you would really like to read in Feedreader does not have an RSS feed? I have been in this situation many times. So we did a little hacking and released a tool called HTML2RSS to the public.

So what does this tool do? Basically, it is a small web server that you can access at http://localhost:8182. If you open this address, you will see a brief description of how to use it. I will copy the example here and explain it:

http://localhost:8182/?serverurl=http://bbc.co.uk&feedtitle=BBC%20scraped%20feed&linkfilter=news.bbc.co.uk&encoding=gb2312

  • serverurl : URL of the target web page (the one without an RSS feed)
  • feedtitle : title of the feed you are creating (if you omit this parameter, the feed title will be the web page URL)
  • linkfilter : return only links that contain the string given in linkfilter (if you omit this parameter, all links on the web page are taken)
  • encoding : output encoding (must match the encoding of the web page). If you omit this parameter, the tool tries to detect the encoding itself, but this does not always work.
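The source of HTML2RSS is not shown here, so the following is only a rough Python sketch of the behaviour the parameter list describes, not the tool's actual code. LinkCollector and html_to_rss are hypothetical names. Resolving relative links against serverurl and the exact RSS 2.0 layout are assumptions, and the encoding parameter is left out: the sketch takes page text that has already been decoded.

```python
# Rough sketch of "collect links, apply linkfilter, emit RSS".
# NOT HTML2RSS's source; all names here are hypothetical.
from html.parser import HTMLParser
from urllib.parse import urljoin
from xml.sax.saxutils import escape


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from every <a href="..."> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside
        self._text = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def html_to_rss(html, serverurl, feedtitle=None, linkfilter=None):
    """Turn the links in an already-decoded HTML page into an RSS 2.0 feed."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    title = feedtitle or serverurl  # no feedtitle: the page URL becomes the title
    items = []
    for href, text in collector.links:
        link = urljoin(serverurl, href)  # assumption: relative links are resolved
        if linkfilter and linkfilter not in link:
            continue  # no linkfilter: every link is kept
        items.append("<item><title>%s</title><link>%s</link></item>"
                     % (escape(text or link), escape(link)))
    return ('<?xml version="1.0"?>\n<rss version="2.0"><channel>'
            "<title>%s</title><link>%s</link><description>%s</description>"
            "%s</channel></rss>"
            % (escape(title), escape(serverurl), escape(title), "".join(items)))


# Usage: only the link containing "news.bbc.co.uk" ends up in the feed.
page = '<a href="http://news.bbc.co.uk/1">Story</a> <a href="/about">About</a>'
print(html_to_rss(page, "http://bbc.co.uk", linkfilter="news.bbc.co.uk"))
```

In this sketch, linkfilter is a plain substring test on the resolved link, which matches how the parameter is described above; the real tool may match differently.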

That's it. Just keep the application running and add the example link to Feedreader. The news will start coming in :).

HINT : This tool works with other RSS readers, too :).

DISCLAIMER : This is not an official product. It is just a tool that we are experimenting with. We hope to develop it a bit further (a tray option), but development priorities can be hectic :).


Download
HTML2RSS can be downloaded from here.
 
A newer version of HTML2RSS that runs as a Windows service can be downloaded from here. Just install it from the command line with "html2rss_service.exe /install" and then start it from the Services control panel.

Links are working again.

Links are working again. Feel free to download!

Link to HTML2RSS does not work

Hi,

Can you please provide the link to the HTML2RSS converter?

Thanks

Pages that require login confirmation

I was wondering whether there is a way to grab info from pages that require login confirmation.

Thanks for your time and answer.

cool

cool

I got it to work but have a question

I tried to scrape a local paper I don't feel like leafing through.

Here is my script

http://localhost:8182/?serverurl=http://news.mywebpal.com/index.cfm?pnpid=573&feedtitle=Howard County Times%20scraped%20feed&linkfilter=local&encoding

The linkfilter variable narrows the scope to local news only. I'm not interested in their other articles.

I wonder why I couldn't just use "http://news.mywebpal.com" as the serverurl variable. Will the other stuff (index.cfm?pnpid=57) be different each day?


Service doesn't load

When I click the EXE file, I see the service load for about a second, and then it disappears from the services list.

What do I do?

Question and answer...

Q: How do I add web pages that have an & symbol in the link?
A: Replace the & symbol with %26.
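Building the request URL by hand gets error-prone once serverurl itself contains "&" or the feed title contains spaces. Below is a minimal Python sketch of a helper that does the escaping for you. The html2rss_url function is hypothetical (it is not part of HTML2RSS). It assumes, based on the example URL and the answer above, that the tool accepts %20 for spaces and %26 for "&". The characters ":/?=" are deliberately left unescaped so the output looks like the hand-written example.

```python
# Hypothetical helper for composing an HTML2RSS request URL; not part of the tool.
from urllib.parse import quote, urlencode

# Address of the local HTML2RSS web server, as given in the post.
BASE = "http://localhost:8182/"


def html2rss_url(serverurl, feedtitle=None, linkfilter=None, encoding=None):
    params = {"serverurl": serverurl}
    # Optional parameters are omitted entirely when not given,
    # so the tool falls back to its own defaults.
    for name, value in (("feedtitle", feedtitle),
                        ("linkfilter", linkfilter),
                        ("encoding", encoding)):
        if value is not None:
            params[name] = value
    # quote() turns "&" into %26 and spaces into %20; ":/?=" are kept
    # as-is to match the style of the example URL.
    return BASE + "?" + urlencode(params, safe=":/?=", quote_via=quote)


# Reproduces the BBC example URL from the top of the post.
print(html2rss_url("http://bbc.co.uk", "BBC scraped feed",
                   "news.bbc.co.uk", "gb2312"))
# A page URL containing "&" gets it escaped as %26.
print(html2rss_url("http://example.com/p.php?a=1&b=2"))
```

Using quote instead of urlencode's default quote_plus keeps spaces as %20, as in the original example, rather than turning them into "+".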