Hinnerk Altenburg

Web Developer in Hamburg, Germany

Archive for the ‘GitHub’ tag

Perl: Handle malformed UTF-8 strings with Encode::encode

without comments

Having the error message “Malformed UTF-8 character (fatal)” in my log files, I tried to handle this properly without letting the process die nor throwing away the whole string.
Having some research on Google I came up with following solution:

sub encode_utf_8 {
    my $string = @_;

    my $utf8_encoded = '';
    eval {
        $utf8_encoded = Encode::encode('UTF-8', $string, Encode::FB_CROAK);
    };
    if ($@) {
        # sanitize malformed UTF-8
        $utf8_encoded = '';
        my @chars = split(//, $string);
        foreach my $char (@chars) {
            my $utf_8_char = eval { Encode::encode('UTF-8', $char, Encode::FB_CROAK) }
                or next;
            $utf8_encoded .= $utf_8_char;
        }
    }
    return $utf8_encoded;
}

See also:
http://perldoc.perl.org/Encode.html#Handling-Malformed-Data
http://www.perlmonks.org/?node_id=839519

Written by Hinnerk

August 31st, 2011 at 4:58 pm

Posted in English

Tagged with , , , , ,

Set a custom HTTP User-Agent in Perl with WWW::Mechanize

with 2 comments

This is how you can dynamically set a custom HTTP User-Agent for your Perl requests to fake a device or browser for testing purpose or getting a device-specific version of a website.
WWW::Mechanize supports setting a custom user-agent with the constructor and after this gives a choice of 6 pre-defined basic user-agents ( $mech->agent_alias() ), only.

The following code demonstrates how to dynamically change the user-agent on a Mechanize object.

use WWW::Mechanize;

my $initial_user_agent = 'Mozilla/5.0 (Linux; U; Android 2.2; de-de; HTC Desire HD 1.18.161.2 Build/FRF91) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1';
my @user_agents = (
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; nl; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13',
    'Mozilla/5.0 (iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7D11',
    'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5',
);

# Set an initial custom header with the contructor
my $mech = WWW::Mechanize->new( agent => $initial_user_agent );

# get a page and print current URI (WWW::Mechanize follows redirections)
$mech->get( 'http://www.facebook.com' );
print sprintf( "User-Agent %s\n redirects to: %s\n\n", $initial_user_agent, $mech->uri() );

foreach my $http_user_agent (@user_agents) {
    # dynamically set custom HTTP User-agents
    $mech->add_header( 'User-agent' => $http_user_agent);

    $mech->get( 'http://www.facebook.com' );
    print sprintf( "User-Agent %s\n redirects to: %s\n\n", $http_user_agent, $mech->uri() );
}

# $ perl ./mechanize-user-agent.pl
# User-Agent Mozilla/5.0 (Linux; U; Android 2.2; de-de; HTC Desire HD 1.18.161.2 Build/FRF91) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1
# redirects to: http://m.facebook.com/?w2m&refsrc=http%3A%2F%2Fwww.facebook.com%2F&_rdr
#
# User-Agent Mozilla/5.0 (Windows; U; Windows NT 6.1; nl; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13
# redirects to: http://www.facebook.com
#
# User-Agent Mozilla/5.0 (iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7D11
# redirects to: http://m.facebook.com/?w2m&refsrc=http%3A%2F%2Fwww.facebook.com%2F&_rdr
#
# User-Agent Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5
# redirects to: http://m.facebook.com/?w2m&refsrc=http%3A%2F%2Fwww.facebook.com%2F&_rdr

Written by Hinnerk

Juni 29th, 2011 at 9:21 pm

Posted in English

Tagged with , , , , ,