Forum Moderators: coopster & phranque


adding numbers in a string

need help with a perl script

         

RudyS

7:18 pm on Oct 25, 2008 (gmt 0)




Hi guys... I need a way to determine the quality of reads from a DNA sequencing machine. It generates lines like this:

HWI-EAS102_3:6:1:897:791:AATGTCAATCTGAGTTGCGATCGACCCTAGAAGTTT:40 40 40 40 40 40 22 23 40 40 40 31 15 25 24 27 29 22 23 5 23 11 21 7 10 11 12 16 -0 6 1 -3 9 -0 14 14

I have a little script that puts the space-separated numbers at the end of each line into a scalar. There are always 36 of them, one for each of the 36 A/C/G/T bases in the read:

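#!/usr/bin/perl
use strict;
use warnings;

open(my $seqs, '<', '10scarflines') or die "Cannot read 10scarflines: $!";
open(my $out,  '>', 'qlines')       or die "Cannot write qlines: $!";
while (my $read = <$seqs>) {
    chomp $read;
    # strip the read ID and base sequence, leaving only the quality values
    (my $value = $read) =~ s/^HWI-EAS102_\d+:\d+:\d+:\d+:\d+:\w+://;
    print $out "$value\n$read\n";
}
close($seqs);
close($out);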

So for the sequence read:

HWI-EAS102_3:6:1:897:791:AATGTCAATCTGAGTTGCGATCGACCCTAGAAGTTT:40 40 40 40 40 40 22 23 40 40 40 31 15 25 24 27 29 22 23 5 23 11 21 7 10 11 12 16 -0 6 1 -3 9 -0 14 14

$value is

40 40 40 40 40 40 22 23 40 40 40 31 15 25 24 27 29 22 23 5 23 11 21 7 10 11 12 16 -0 6 1 -3 9 -0 14 14

I would like to be able to manipulate those numbers, for example take the average of the 36, or determine whether there are negative numbers and how many. How do I deal with them in Perl?
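A minimal sketch of one way to do this, assuming $value holds the 36 space-separated quality values as above. Splitting the string turns it into an array, after which the average and the negative count are one-liners. The sub names parse_quals and summarize are hypothetical. Note that -0 is numerically zero, so a numeric test ($_ < 0) does not count it as negative, while a string test for a leading minus sign does; which one you want depends on what the -0 entries mean for your machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: split the quality string on whitespace into a list.
sub parse_quals {
    my ($value) = @_;
    return split ' ', $value;   # ' ' also ignores leading/trailing whitespace
}

# Hypothetical helper: average plus two ways of counting "negative" values.
sub summarize {
    my @q = @_;
    return unless @q;                      # avoid dividing by zero
    my $avg     = sum(@q) / @q;            # @q in numeric context is the count
    my $neg_num = grep { $_ < 0 } @q;      # numeric: -0 is not below zero
    my $neg_str = grep { /^-/ } @q;        # textual: any leading minus sign
    return ($avg, $neg_num, $neg_str);
}

my $value = '40 40 40 40 40 40 22 23 40 40 40 31 15 25 24 27 29 22 23 5 '
          . '23 11 21 7 10 11 12 16 -0 6 1 -3 9 -0 14 14';
my @q = parse_quals($value);
my ($avg, $neg_num, $neg_str) = summarize(@q);
printf "count=%d average=%.2f negative(numeric)=%d negative(leading minus)=%d\n",
    scalar(@q), $avg, $neg_num, $neg_str;
# prints: count=36 average=21.06 negative(numeric)=1 negative(leading minus)=3
```

For the example line, only -3 is below zero numerically, while three values (-0, -3, -0) carry a minus sign.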

(Ian, I'm using your hash script to great effect... thanks again)

RudyS

[edited by: phranque at 11:29 pm (utc) on Oct. 26, 2008]
[edit reason] disabled smileys ;) [/edit]

rocknbil

5:27 pm on Oct 26, 2008 (gmt 0)




I'd do it a little differently.

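#!/usr/bin/perl
use strict;
use warnings;
# declare and initialize up front to quiet warnings.
my ($tot, $tot_nums, $min, $max) = (0, 0, 0, 0);
open(my $seqs, '<', 'test-dna.txt') or die("Cannot read file: $!");
# read a line at a time; the loop ends cleanly at end of file
while (my $read = <$seqs>) {
    chomp($read);
    ## split on the :, all we need are the numbers.
    my @raw = split(/:/, $read);
    # pop returns the last item in the array, your nums
    my $nums = pop(@raw);
    # split on spaces
    my @nums = split(/\s+/, $nums);
    $tot_nums = @nums;   ## an array in scalar context gives its length
    # do the dirty work
    foreach my $n (@nums) {
        $min = ($n < $min) ? $n : $min;
        $max = ($n > $max) ? $n : $max;
        $tot += $n;
    }
    print "Total numbers: $tot_nums.\n";
    print "Min: $min Max: $max\n";
    ## don't divide by 0.
    if ($tot_nums > 0) { print "average: " . $tot/$tot_nums . "\n"; }
}
close($seqs);

Add the -w switch (or use warnings, as above) to catch mistakes. Putting chomp in the loop condition, as in while (chomp($read=<SEQS>)), triggers an uninitialized-value warning at end of file because chomp is then called on undef; it also skips a last line that has no trailing newline. Reading the line in the while condition and chomping inside the loop avoids both.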

[edited by: phranque at 11:29 pm (utc) on Oct. 26, 2008]
[edit reason] disabled smileys ;) [/edit]

phranque

11:37 pm on Oct 26, 2008 (gmt 0)




If you want to get into more exotic statistics, you should peruse the list of statistics-related Perl modules on CPAN:
[search.cpan.org...]
I would suggest starting with the Statistics::Descriptive module [search.cpan.org].
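A sketch of what that could look like for one read, assuming Statistics::Descriptive has been installed from CPAN (it is not part of core Perl). Its Full class computes count, mean, min, max, median and standard deviation for a list of values. The quality_stats helper and the hash keys it returns are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Statistics::Descriptive;   # from CPAN; not part of core Perl

# Hypothetical helper: descriptive stats for one line's quality values.
sub quality_stats {
    my ($line) = @_;
    my @q = split ' ', $line;
    my $stat = Statistics::Descriptive::Full->new();
    $stat->add_data(@q);
    return {
        count  => $stat->count(),
        mean   => $stat->mean(),
        min    => $stat->min(),
        max    => $stat->max(),
        median => $stat->median(),
        stddev => $stat->standard_deviation(),
    };
}

my $s = quality_stats('40 40 40 40 40 40 22 23 40 40 40 31 15 25 24 27 29 22 23 5 '
                    . '23 11 21 7 10 11 12 16 -0 6 1 -3 9 -0 14 14');
printf "n=%d mean=%.2f min=%s max=%s median=%s sd=%.2f\n",
    @{$s}{qw(count mean min max median stddev)};
```

Creating one object per line keeps each read's statistics separate; for file-wide statistics you would instead keep a single object and call add_data for every line.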

RudyS

5:09 pm on Oct 27, 2008 (gmt 0)




Nice, rocknbil.

I just had to set $tot = 0 above the foreach so that I get the average for the numbers in each line rather than a running average. You also helped me understand a little about arrays; now at least I understand the power of the split function. Since I will be running this on millions of reads, it had better be efficient. It works fine on 10 lines; I will give you a report on efficiency when I run it on the server.
And thanks, phranque, for the advice about the CPAN statistics modules. I think even Ian will agree that you don't want to be writing your own code for standard statistics? hehe
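A sketch of the per-line version described above, assuming the same ID:SEQUENCE:QUALITIES layout. Every statistic is computed fresh for each read (the effect of resetting $tot, applied here to min and max as well), and the file is streamed one line at a time so memory use does not grow with the number of reads. The summarize_reads helper, the file names, and its tab-separated output columns (read ID, sequence, count, min, max, average, number of negative values) are illustrative assumptions, not the script from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum min max);

# Hypothetical helper: writes one tab-separated summary line per read.
# Stats are computed fresh for every line (no running totals), and input
# is read a line at a time, so memory use stays flat on millions of reads.
sub summarize_reads {
    my ($in, $out) = @_;
    while (my $read = <$in>) {
        chomp $read;
        my @fields = split /:/, $read;
        next if @fields < 3;                 # skip blank or malformed lines
        my @q   = split ' ', pop @fields;    # quality values after the last ':'
        my $seq = pop @fields;               # the base sequence
        my $id  = join ':', @fields;         # e.g. HWI-EAS102_3:6:1:897:791
        next unless @q;
        my $avg = sum(@q) / @q;
        my $neg = grep { $_ < 0 } @q;        # numeric test: -0 is not negative
        printf {$out} "%s\t%s\t%d\t%s\t%s\t%.2f\t%d\n",
            $id, $seq, scalar(@q), min(@q), max(@q), $avg, $neg;
    }
}

# Usage (hypothetical file names): perl read_stats.pl reads.txt > stats.tsv
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot read $ARGV[0]: $!";
    summarize_reads($fh, \*STDOUT);
    close($fh);
}
```

Writing one short line per read keeps the output easy to sort or filter with other tools, for example to pull out reads whose average falls below a chosen cutoff.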

rudyS