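Perl is probably the way to go. I did my last few screen scrapes in PHP, but the grunt work there was all done with preg_replace and the other preg_* functions, which use PCRE (Perl Compatible Regular Expressions), so you end up writing Perl-style regexes either way.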
Don't try to parse too much. If you do, even tiny changes to unrelated parts of the site will break your entire scrape. Just go for what you need.
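A minimal sketch of "just go for what you need", written in Python's `re` module (whose syntax for simple patterns like this is close to Perl's) rather than Perl or PHP. The HTML snippet, the `price` class name and `extract_price` are all hypothetical. The idea is to anchor one narrow pattern on the smallest stable landmark around the data instead of modelling the whole page:

```python
import re
from typing import Optional

# Hypothetical page: only the price matters; banner and footer may change any day.
html = """
<div class="header">Site banner that may change any day</div>
<span class="price">$19.99</span>
<footer>Copyright notice</footer>
"""

# Match only the landmark around the value we want, not the page structure.
PRICE_RE = re.compile(r'<span class="price">\s*\$([\d.,]+)\s*</span>')

def extract_price(page: str) -> Optional[str]:
    """Return the price text, or None if the landmark is missing."""
    match = PRICE_RE.search(page)
    return match.group(1) if match else None

print(extract_price(html))  # 19.99
```

Because nothing outside the `<span class="price">` element is referenced, redesigning the header or footer does not affect the result.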
I used an array of possible expressions for each thing I needed and tried them in order of how confident I was that each one was right, because different pages sometimes have slightly different formatting for no particular reason.
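The fallback approach described above can be sketched as follows, again in Python rather than Perl or PHP. The three title patterns and the `first_match` helper are hypothetical examples; the list is ordered from most to least trusted, and the helper also returns the index of the pattern that fired, which makes it easy to notice when pages start drifting to a less trusted format:

```python
import re
from typing import Optional, Sequence, Tuple

# Hypothetical patterns for the same field, most trusted first.
TITLE_PATTERNS = [
    re.compile(r'<h1 class="title">\s*(.*?)\s*</h1>', re.S),  # expected layout
    re.compile(r'<h1[^>]*>\s*(.*?)\s*</h1>', re.S),           # any <h1>
    re.compile(r'<title>\s*(.*?)\s*</title>', re.S | re.I),   # last resort
]

def first_match(page: str,
                patterns: Sequence[re.Pattern]) -> Optional[Tuple[int, str]]:
    """Try each pattern in order; return (pattern index, captured text) or None."""
    for index, pattern in enumerate(patterns):
        match = pattern.search(page)
        if match:
            return index, match.group(1)
    return None

pages = [
    '<h1 class="title">Widget</h1>',
    '<h1 id="main">Widget</h1>',
    '<html><title>Widget</title></html>',
]
for page in pages:
    print(first_match(page, TITLE_PATTERNS))
```

Logging the index rather than only the value lets you see how often the scraper is falling back to the looser patterns.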