HWI-EAS102_3:6:1:897:791:AATGTCAATCTGAGTTGCGATCGACCCTAGAAGTTT:40 40 40 40 40 40 22 23 40 40 40 31 15 25 24 27 29 22 23 5 23 11 21 7 10 11 12 16 -0 6 1 -3 9 -0 14 14
I have a little script that takes the space-separated numbers at the end of the string (there are always 36 of them, one for each of the 36 A/C/T/G bases) and puts them into a scalar:
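#!/usr/bin/perl
use strict;
use warnings;
open(my $seqs, '<', '10scarflines') or die "Cannot open 10scarflines: $!";
open(my $out, '>', 'qlines') or die "Cannot open qlines: $!";
while (my $read = <$seqs>) {
    chomp $read;
    my $value = $read;
    # strip the read ID and the base sequence, leaving only the quality scores
    $value =~ s/^HWI-EAS102_\d+:\d+:\d+:\d+:\d+:\w+://;
    print $out "$value\n$read\n";
}
close($out);
close($seqs);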
So, for the sequence read:
HWI-EAS102_3:6:1:897:791:AATGTCAATCTGAGTTGCGATCGACCCTAGAAGTTT:40 40 40 40 40 40 22 23 40 40 40 31 15 25 24 27 29 22 23 5 23 11 21 7 10 11 12 16 -0 6 1 -3 9 -0 14 14
$value is:
40 40 40 40 40 40 22 23 40 40 40 31 15 25 24 27 29 22 23 5 23 11 21 7 10 11 12 16 -0 6 1 -3 9 -0 14 14
I would like to be able to manipulate those numbers: for example, take the average of the 36, or determine whether there are any negative numbers and how many. How do I deal with them in Perl?
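Once the scores are in a scalar like $value, splitting on whitespace turns them into an array, and the rest is ordinary list processing. Here is a minimal sketch, assuming the scores are always whitespace-separated integers; the sub name quality_summary is just for illustration. Note that "-0" is numerically zero, so a numeric test such as $_ < 0 does not count it. If every score written with a leading minus sign should count, match /^-/ instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: summarize one line of space-separated quality scores.
sub quality_summary {
    my ($value) = @_;
    # split ' ' splits on runs of whitespace and ignores leading whitespace
    my @scores = split ' ', $value;
    return unless @scores;    # nothing to average

    my $total = 0;
    $total += $_ for @scores;

    # grep in scalar context returns the number of matching elements
    my $negative   = grep { $_ < 0 } @scores;    # numerically below zero
    my $minus_sign = grep { /^-/ } @scores;      # written with '-', including "-0"

    return (
        count      => scalar(@scores),
        average    => $total / @scores,          # @scores here is the element count
        negative   => $negative,
        minus_sign => $minus_sign,
    );
}

my $value = '40 40 40 40 40 40 22 23 40 40 40 31 15 25 24 27 29 22 23 5 '
          . '23 11 21 7 10 11 12 16 -0 6 1 -3 9 -0 14 14';
my %s = quality_summary($value);
printf "count=%d average=%.2f negative=%d minus_sign=%d\n",
    @s{qw(count average negative minus_sign)};
# prints: count=36 average=21.06 negative=1 minus_sign=3
```

For the sample read, only -3 is numerically negative, while three scores (-0, -3, -0) carry a minus sign, so it is worth deciding which count you actually want.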
(Ian, I'm using your hash script to great effect... thanks again.)
RudyS
[edited by: phranque at 11:29 pm (utc) on Oct. 26, 2008]
[edit reason] disabled smileys ;) [/edit]
#!/usr/bin/perl
# initialize everything up front to quiet warnings
$tot=$tot_nums=$min=$max=$n=0;
open(SEQS,"test-dna.txt") or die("Cannot read file $!");
# chomp and store while reading
while (chomp($read=<SEQS>)) {
    ## split on the :, all we need are the numbers.
    @raw = split(/:/,$read);
    # pop returns the last item in the array, your nums
    $nums = pop(@raw);
    # split on whitespace
    @nums = split(/\s+/,$nums);
    $tot_nums = $#nums+1; ## $#nums is the last index; arrays are zero-based
    # track the min, the max and the total
    foreach $n (@nums) {
        $min = ($n<$min)?$n:$min;
        $max = ($n>$max)?$n:$max;
        $tot+=$n;
    }
    print "Total numbers: $tot_nums.\n";
    print "Min: $min Max: $max\n";
    ## don't divide by 0.
    if ($tot_nums > 0) { print "average: " . $tot/$tot_nums . "\n"; }
}
close(SEQS);
With the -w switch added, I get an uninitialized value warning on line 7 and couldn't get rid of it. :-P
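The warning most likely comes from the loop condition while (chomp($read=<SEQS>)). At end of file <SEQS> returns undef, and chomp is then applied to that undef. chomp also returns the number of characters it removed, so a last line without a trailing newline makes the condition false and that line is never processed. The usual idiom is to let the while test the read and chomp inside the body, because while (my $line = <$fh>) is implicitly wrapped in defined(). A minimal sketch, reading from an in-memory filehandle so it runs without a data file; the sub name count_records and the sample records are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the records on a filehandle, chomping each line.
sub count_records {
    my ($fh) = @_;
    my $records = 0;
    # while (my $line = <$fh>) behaves like while (defined(my $line = <$fh>)),
    # so the loop stops cleanly at end of file without chomping undef.
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';    # skip blank lines
        $records++;
    }
    return $records;
}

# An in-memory filehandle stands in for the data file; the last record
# deliberately has no trailing newline.
my $data = "read1:40 40 -3\nread2:22 -0 14\nread3:5 6 7";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
print count_records($fh), "\n";    # prints 3
close($fh);
```

The chomp-in-condition version of the same loop stops after two records on this input, because chomp returns 0 for the final line.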
I just had to set $tot = 0 above the foreach so that I get the average for the numbers in each line rather than a running average. You also helped me understand a little about arrays; now at least I understand the power of the split function. Since I will be running this on millions of reads, it had better be efficient. It works fine on 10 lines, and I will report back on efficiency when I run it on the server.
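Resetting $tot fixes the average, but $min and $max in the reply's script have the same problem: they start at 0 and are never reset, so Min can never be above 0 and Max never below 0, whatever the line contains. Computing everything from the current line's array avoids carrying state between reads. A minimal sketch using the core List::Util module for sum, min and max; the sub name line_stats is hypothetical, and it assumes, as the scripts above do, that the scores are the last colon-separated field.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum min max);

# Hypothetical helper: per-read statistics for one "ID:...:SEQUENCE:scores" line.
# Assumes the quality scores are the last colon-separated field.
sub line_stats {
    my ($line) = @_;
    my $field  = (split /:/, $line)[-1];
    my @scores = split ' ', (defined $field ? $field : '');
    return unless @scores;
    # Everything is computed from this line's @scores only,
    # so nothing carries over from the previous read.
    return {
        count   => scalar(@scores),
        min     => min(@scores),
        max     => max(@scores),
        average => sum(@scores) / @scores,
    };
}

my $read = 'HWI-EAS102_3:6:1:897:791:AATGTCAATCTGAGTTGCGATCGACCCTAGAAGTTT:'
         . '40 40 40 40 40 40 22 23 40 40 40 31 15 25 24 27 29 22 23 5 '
         . '23 11 21 7 10 11 12 16 -0 6 1 -3 9 -0 14 14';
my $s = line_stats($read);
printf "Total numbers: %d. Min: %d Max: %d average: %.2f\n",
    @{$s}{qw(count min max average)};
# prints: Total numbers: 36. Min: -3 Max: 40 average: 21.06
```

Because it works on one line at a time, line_stats can be called from a line-by-line read loop without loading the whole file into memory.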
And thanks, phranque, for the advice about the CPAN statistics modules. I think even Ian will agree that you don't want to be writing your own code for standard statistics? Hehe.
rudyS