captcha image not getting scraped from App1-Comp1-Site#2.
Reported by S. Shehzed | November 28th, 2010 @ 05:42 PM | in v0.20
see title.
Comments and changes to this ticket
-
S. Shehzed November 28th, 2010 @ 11:45 PM
The test code below works. If it works, why doesn't the script's version work? It uses exactly the same pattern.
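<?php
// Fetch the registration page directly over HTTP.
$str = file_get_contents('http://www.wordorigins.org/index.php/forums/member/register/');
// Match <img ... src="..."> pointing into images/captchas/ and ending in .jpg;
// group 2 holds the full image URL. The dot after "www" is escaped so it matches a literal ".".
$pattern = '(<img[a-z\s]+src=")((http://)?(www\.)?([a-z0-9-]+)([\.a-z]+/)(images/captchas/)([a-z0-9\.]+)(\.jpg))';
$pattern = '#'.$pattern.'#i'; // '#' delimiters, case-insensitive
// preg_match returns 1 on a match, 0 on no match, false on error.
$result = preg_match($pattern, $str, $match);
var_dump($result);
var_dump($match);
?>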
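To see how an identical pattern can succeed on one copy of a page and fail on another, here is a minimal offline sketch that runs the same regex against two hardcoded snippets instead of the live page. The markup, the captcha file name, and the helper `captcha_url()` are all hypothetical; the second snippet only adds an `alt` attribute before `src`.

```php
<?php
// Same pattern as the test code above, with delimiters and the /i flag applied.
$pattern = '#(<img[a-z\s]+src=")((http://)?(www\.)?([a-z0-9-]+)([\.a-z]+/)(images/captchas/)([a-z0-9\.]+)(\.jpg))#i';

// Hypothetical helper: returns the captcha URL (group 2) or null if nothing matched.
function captcha_url(string $html, string $pattern): ?string
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $html, $match) === 1) {
        return $match[2];
    }
    return null;
}

// Hypothetical markup: a plain tag, and the same tag with an extra attribute before src.
$plain   = '<img src="http://www.wordorigins.org/images/captchas/1290987654.1234.jpg" width="140" />';
$altered = '<img alt="captcha" src="http://www.wordorigins.org/images/captchas/1290987654.1234.jpg" />';

var_dump(captcha_url($plain, $pattern));   // prints the captcha URL
var_dump(captcha_url($altered, $pattern)); // prints NULL
```

Because `[a-z\s]+` only allows letters and whitespace between `<img` and `src="`, any attribute placed before `src` (or single quotes around the URL) is enough to make the match fail, so source that has been rewritten in transit can defeat a pattern that works on the browser's view of the page.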
-
S. Shehzed November 29th, 2010 @ 03:41 AM
- State changed from new to resolvedandnewticket
- Assigned user set to S. Shehzed
RESOLVED: Scraping with isearch returns altered source code that differs from what IE shows under View Source. So we're back to getting the page source by saving the page as an .htm file on the computer and reading the file's contents.
This creates a new problem, tracked in ticket #32.
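A minimal sketch of that workaround, assuming the page has already been saved from the browser and that the saved file still contains the original absolute `src` URL (a "complete" save may rewrite image paths to a local folder). The file name `register.htm` and the function `captcha_from_saved_page()` are hypothetical.

```php
<?php
// Same captcha pattern as in the test code, with delimiters and the /i flag applied.
const CAPTCHA_PATTERN = '#(<img[a-z\s]+src=")((http://)?(www\.)?([a-z0-9-]+)([\.a-z]+/)(images/captchas/)([a-z0-9\.]+)(\.jpg))#i';

// Hypothetical helper: read a locally saved page and return the captcha URL, or null.
function captcha_from_saved_page(string $path): ?string
{
    // file_get_contents returns false if the file cannot be read; @ hides the warning.
    $html = @file_get_contents($path);
    if ($html === false) {
        return null;
    }
    return preg_match(CAPTCHA_PATTERN, $html, $match) === 1 ? $match[2] : null;
}

$saved = 'register.htm'; // hypothetical: the registration page saved from IE
$url = captcha_from_saved_page($saved);
echo $url === null ? "no captcha found in $saved\n" : "captcha: $url\n";
```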
Referenced by
- #32 captcha image not appearing in in_memory: after ticket #30 got fixed, this problem is still around.