Forum Moderators: coopster & phranque

Message Too Old, No Replies

Email attachment stripper

         

jehoshua

2:58 am on Feb 26, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I have been doing lots of research into using the Email::MIME::Attachment::Stripper module, but cannot get it to work. Here is the script:

#!/usr/bin/env perl
#
use strict;
use warnings;
use Data::Dumper;

use File::Slurp qw( read_file );
use Email::Address;
use Email::MIME::Attachment::Stripper;
use File::Slurp qw(slurp write_file);

my $infile = '/home/*****/Mail/.family.directory/1277709595.16641.pRLSb:2,S';
my $outfile = 'output4.txt';

my $intext = File::Slurp::read_file( $infile );

my $parser = Email::MIME->new($intext);
my @parts = $parser->parts;
my $stripper = Email::MIME::Attachment::Stripper->new($parser);

my $message = $stripper->message;
print $message;

foreach my $attachment ( $stripper->attachments ) {
write_file( $attachment->{filename},
{ buf_ref => \$attachment->{payload} } )
or die "Can't write $attachment->{filename}: $!\n";
}


This is what is printed

Email::MIME=HASH(0x1c3a418)Email::MIME=HASH(0x1d24cb8)


and the hash values are different each time ? The test email I'm using has 24 photos as attachments, and the code above successfully writes all the 24 attachments to disk.

However, it is not the attachments I want. I want the headers and the plain text/html part, the "first" part. After I get the headers and plain text part, I then want to use Email::Address to extract the name/email address from the file.

What am I doing wrong ?

explorador

10:55 pm on Mar 4, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So you want the first part of the mail, ok, the problem (as far as I can remember) is some mails present diff beginnings, I was doing something similar for a ticket like system, I needed to read the emails and extract the text regardless of the email format, it was tricky, there is a thread about it around here where Brett (as I remember) posted a lot of help as some others. Gosh I deleted my source code so I have no way to get an example but I'm sure the thread is around here...

explorador

10:56 pm on Mar 4, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hey there, check this out, hope it helps you:

[webmasterworld.com...]

jehoshua

1:17 am on Mar 12, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks for your replies. The reason I wanted to strip the attachments (selectively) was to be able to then parse through 'text/plain' and 'text/html' attachments, in addition to parsing through the headers. Here is the solution ...

#!/usr/bin/perl

#
# Name: multipart-mo.pl
#
# Extract names and email addresses from a maildir folder
#
# - Also displays the number of messages (files) found and the number of parts in each message
# - Displays structure of each message and then outputs the names and email addresses
# - Where the name/email address is found in the 'body' part of a message (i.e NOT in From:, To:, Cc: etc,etc),
# then usually only the email address will be returmed
#
# This script has been adapted from multipart.pl , which is part of the Mail::Box module , vers 2.118,
# by Mark Overmeer (http://http://search.cpan.org/dist/Mail-Box/ )
#
# This code can be used and modified without restriction.
#
# Usage: perl multipart-mo.pl Smith\,\ Bill\ \&\ Nancy/ (maildir folder name)
#

use warnings;
use strict;
use lib '..', '.';

use Mail::Box::Manager;

sub emails_from_body($);

#
# Get the command line arguments.
#

die "Usage: $0 folderfile\n"
unless @ARGV==1;

my $foldername = shift @ARGV;

#
# Open the folder
#

my $mgr = Mail::Box::Manager->new;

my $folder = $mgr->open($foldername, access => 'r')
or die "Cannot open $foldername: $!\n";

#
# List all messages in this folder.
#

print "Mail folder $foldername contains ", $folder->nrMessages, " messages:\n";

my %emails;

foreach my $message ($folder->messages)
{
my @parts = ($message, $message->parts('RECURSE'));
print $message->seqnr, ' has '.@parts." parts\n";
$message->printStructure;

foreach my $part (@parts)
{
foreach my $fieldname (qw/To Cc Bcc From Reply-To Sender/)
{ my $field = $part->study($fieldname) or next;
$emails{$_}++ for $field->addresses;
}

my $ct = $part->contentType || 'text/plain';
$ct eq 'text/plain' || $ct eq 'text/html' or next;

$emails{$_}++ for emails_from_body $part->body->decoded;
}

}

print "$_\n" for sort keys %emails;
$folder->close;
exit 0;


### HELPERS

sub emails_from_body($)
{
$_[0] =~ /([-\w.]+@([a-z0-9][a-z-0-9]+\.)+[a-z]{2,})/gi;
}