Introduction to RexStuff Content Grabber
The RexStuff Content Grabber can be used to extract content from HTML pages and reformat it for REX.MAMY.TO,
so that it can then be downloaded to the Rex6000 using Intellisync for Rex.
There are three versions of the grabber:
- Simple Grabber is used to configure simple grabs. You specify the start and end of the section
of the page you want to harvest.
- List Grabber is a highly configurable grabber routine with lots of optional functionality. It is best used with
underlying pages that are in a "structured list" format. Many news pages are like this: each item has a heading and a body with a brief story in it.
The grabber pulls apart the HTML and reformats it into an index card, with all the headings as links, and a series of data
cards.
- Complex Grabber is a highly configurable grabber routine designed to grab a list
that is hyperlinked to a series of other simple pages, for instance a list of news items with just brief descriptions
and hyperlinks to the news pages. This grabber will reach out to the other pages (which must be consistently formatted)
and grab the stories from there. This grabber is not as flexible as List Grabber, but like List Grabber it
pulls apart the HTML and reformats it into an index card, with all the headings as links, and a series of data
cards.
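
The first two grab styles described above can be illustrated with a minimal Python sketch. This is not the RexStuff grabber code: the function names (simple_grab, strip_tags, list_grab), the marker strings, the h3/p page layout and the card dictionary fields are all hypothetical, chosen only to show the idea of cutting a section out of a page and splitting it into an index card plus data cards.

```python
import re


def simple_grab(html, start_marker, end_marker):
    """Return the text between the two markers, or None if either is missing."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end]


def strip_tags(fragment):
    """Crudely remove HTML tags and collapse whitespace (not a real HTML parser)."""
    text = re.sub(r"<[^>]+>", " ", fragment)
    return " ".join(text.split())


def list_grab(html, heading_tag="h3", body_tag="p"):
    """Pair each heading with the body element that directly follows it.

    Returns (index, cards): the index lists (heading, card id) pairs, and
    each card holds the heading and the plain text of its body.
    Assumes a hypothetical page layout of <h3>heading</h3><p>body</p> items.
    """
    pattern = re.compile(
        r"<{h}\b[^>]*>(.*?)</{h}>\s*<{b}\b[^>]*>(.*?)</{b}>".format(
            h=heading_tag, b=body_tag
        ),
        re.IGNORECASE | re.DOTALL,
    )
    index = []
    cards = []
    for number, match in enumerate(pattern.finditer(html), start=1):
        heading = strip_tags(match.group(1))
        body = strip_tags(match.group(2))
        index.append((heading, number))
        cards.append({"id": number, "title": heading, "text": body})
    return index, cards


# Hypothetical sample page, used only for illustration.
SAMPLE = """<html><body>
<!-- news start -->
<h3>First story</h3><p>Brief text of the <b>first</b> story.</p>
<h3>Second story</h3><p>Brief text of the second story.</p>
<!-- news end -->
</body></html>"""

section = simple_grab(SAMPLE, "<!-- news start -->", "<!-- news end -->")
index, cards = list_grab(section)
```

A grab in the style of Complex Grabber would add one more step under the same assumptions: for each list item, follow its hyperlink, fetch the linked page, and run a simple grab on it to fill the data card. That step is left out here because it depends on how the linked pages are formatted.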
Grabber scripts written by blackmanx