#30 ✓resolvedandnewticket
S. Shehzed

captcha image not getting scraped from App1-Comp1-Site#2.

Reported by S. Shehzed | November 28th, 2010 @ 05:42 PM | in v0.20

see title.

Comments and changes to this ticket

  • S. Shehzed

    S. Shehzed November 28th, 2010 @ 11:45 PM

    The test code below works. If it works, why doesn't the script's approach work? It uses exactly the same pattern.

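    <?php

    // Fetch the registration page that displays the captcha image.
    $str = file_get_contents('http://www.wordorigins.org/index.php/forums/member/register/');

    // Match an <img> tag whose src points at a .jpg under images/captchas/.
    // Group 2 holds the full image URL. The dot in "www." is escaped so it only matches a literal dot.
    $pattern = '(<img[a-z\s]+src=")((http://)?(www\.)?([a-z0-9-]+)([\.a-z]+/)(images/captchas/)([a-z0-9\.]+)(\.jpg))';
    $pattern = '#'.$pattern.'#i';

    // preg_match() returns 1 on a match, 0 on no match, and false on error.
    $result = preg_match($pattern, $str, $match);

    var_dump($result);
    var_dump($match);

    ?>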
    
  • S. Shehzed

    S. Shehzed November 29th, 2010 @ 03:41 AM

    • State changed from “new” to “resolvedandnewticket”
    • Assigned user set to “S. Shehzed”

    RESOLVED: Scraping with isearch returns odd source code; it differs from what IE shows in View Source. So I went back to getting the page source by saving the page as an .htm file on the computer and reading the file contents.

    This creates a new problem, tracked in ticket #32.
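
A rough sketch of the saved-file approach described in the resolution, for illustration only. The helper name `find_captcha_url`, the file name `register.htm`, and both sample strings are assumptions and do not come from the script. The second sample (unquoted `src`) is only a guess at how re-serialized source might differ from View Source; the ticket does not say what the actual difference was.

```php
<?php

// Hypothetical helper (not from the ticket): returns the captcha image URL
// found in $html, or null when the pattern does not match.
function find_captcha_url(string $html): ?string
{
    // Same pattern as the test code in this ticket, with the dot in "www." escaped.
    $pattern = '(<img[a-z\s]+src=")((http://)?(www\.)?([a-z0-9-]+)([\.a-z]+/)(images/captchas/)([a-z0-9\.]+)(\.jpg))';
    $pattern = '#' . $pattern . '#i';

    if (preg_match($pattern, $html, $match) === 1) {
        return $match[2]; // group 2 wraps the whole image URL
    }
    return null;
}

// Made-up markup in the shape the pattern expects: quoted src attribute.
$asViewed = '<img src="http://www.example.com/images/captchas/abc123.jpg" />';

// Made-up markup with the quotes dropped (assumed, not observed): no match.
$reserialized = '<IMG src=http://www.example.com/images/captchas/abc123.jpg>';

var_dump(find_captcha_url($asViewed));
var_dump(find_captcha_url($reserialized));

// Reading a page saved from the browser; 'register.htm' is a placeholder name.
$savedPage = 'register.htm';
if (is_file($savedPage)) {
    $html = file_get_contents($savedPage);
    if ($html !== false) {
        var_dump(find_captcha_url($html));
    }
}
```

If the saved file matches and the scraped source does not, dumping both strings around the `<img` tag should show which part of the pattern fails.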
