For example, my $string = "ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA"
should produce the substrings below, stored in an array:
@whatever =("ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA",
"GTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTGGTTGGAAATAA",
"ATTGGTTGGAAATAA");
This is part of my code, my best effort so far:
while ($seq =~ m/ATG|TTG|CTG|ATT|CTA|GTG/gi) {
    my $matchPosition = pos($seq) - 3;
    if (($matchPosition % 3) == 0) {
        push(@startsRF1, $matchPosition);
    }
}
while ($seq =~ m/TAG|TAA|TGA/gi) {
    my $matchPosition = pos($seq);
    if (($matchPosition % 3) == 0) {
        push(@stopsRF1, $matchPosition);
    }
}
my $codonRange = "";
my $startPosition = 0;
my $stopPosition = 0;
@startsRF1 = reverse(@startsRF1);
@stopsRF1 = reverse(@stopsRF1);
while (scalar(@startsRF1) > 0) {
    $codonRange = "";
    $startPosition = pop(@startsRF1);
    if ($startPosition < $stopPosition) {
        next;
    }
    my $ORFseq = "";
    while (scalar(@stopsRF1) > 0) {
        $stopPosition = pop(@stopsRF1);
        if ($stopPosition > $startPosition) {
            my $difF = $stopPosition - $startPosition;
            $ORFseq = substr($seq, $startPosition, (length($seq) - (length($seq) - $difF)));
            push(@arrayOfORFs, $ORFseq);
        }
    }
}
I'm not sure I understood your idea completely, because my attempt produces different results than the ones you give in your post.
use strict;
my $string = "ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA";
my @array = ();
my @starts = qw(ATG TTG CTG ATT CTA GTG);
my @stops = qw(TAG TAA TGA);
# Try every start/stop combination; (.*) is greedy, so each match runs
# from the first occurrence of $start to the last occurrence of $stop.
for my $start (@starts) {
    for my $stop (@stops) {
        while ($string =~ m/$start(.*)$stop/g) {
            push @array, $start . $1 . $stop;
        }
    }
}
print join("\n", @array);
results in
ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA
ATGAAAGTGAAAGGGAAAGGGGTGA
TTGGGTATTGGTTGGAAATAA
ATTGGTTGGAAATAA
GTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA
GTGAAAGGGAAAGGGGTGA
but maybe I got something wrong; I've never been into the bio stuff.
If that's not what you needed, please elaborate for a guy who knows he should have DNA somewhere in his body but not much more than that.
Also, did you check the modules available on CPAN? I hear there are quite a few for dealing with DNA. Maybe one of these can do the job much more cleanly: [search.cpan.org...]
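The original code filters match positions with % 3, which suggests only codons in one reading frame should count. Here is a minimal sketch of that idea, walking the sequence three letters at a time instead of using a regex. The helper name orfs_in_frame and the rule "each in-frame start codon runs to the first in-frame stop codon after it" are assumptions for illustration, not something confirmed by the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every substring that begins at an in-frame
# start codon and ends with the first in-frame stop codon after it.
sub orfs_in_frame {
    my ($seq, $frame) = @_;    # $frame is the offset of the frame: 0, 1 or 2
    my %is_start = map { $_ => 1 } qw(ATG TTG CTG ATT CTA GTG);
    my %is_stop  = map { $_ => 1 } qw(TAG TAA TGA);

    my @open;    # offsets of start codons still waiting for a stop codon
    my @orfs;
    for (my $i = $frame; $i + 3 <= length($seq); $i += 3) {
        my $codon = uc substr($seq, $i, 3);
        if ($is_stop{$codon}) {
            # Close every pending start here, stop codon included.
            push @orfs, substr($seq, $_, $i + 3 - $_) for @open;
            @open = ();
        }
        elsif ($is_start{$codon}) {
            push @open, $i;
        }
    }
    return @orfs;
}

my $string = "ATGAAAGTGAAAGGGAAAGGGGTGAGTGGGGGCGGGTTGGGTATTGGTTGGAAATAA";
print "$_\n" for orfs_in_frame($string, 0);
```

For the sample string this returns five substrings (starts at offsets 0, 6, 21, 36 and 42, all ending at the final TAA), which is more than the three listed in the question, so the exact selection rule still needs to be pinned down.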
I have a string of letters and would like to use a regex to check whether it occurs in a text.
The string is bbbb, followed by either cg, gc, cc or gg, followed by a t. So the regex should match any of these 4 possibilities:
bbbbcgt, bbbbgct, bbbbcct or bbbbggt.
Regards,
Emmanuel
in your case, /bbbb(cg|gc|cc|gg)t/ would work and, if the string is bbbbcgt, $1 would contain cg. the | in the parentheses tells the regexp engine that any one of those strings can match at this position.
if you don't need to know which of the four possibilities matched, you could also say /bbbb(?:cg|gc|cc|gg)t/ to indicate that you just want to group them, not save them.
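a small sketch of the difference between the capturing and the non-capturing group, with the alternation written using the ASCII pipe character |. the helper names matched_pair and has_pattern and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the two-letter pair found between "bbbb"
# and "t" (the content of $1), or undef if the text does not match.
sub matched_pair {
    my ($text) = @_;
    return $text =~ /bbbb(cg|gc|cc|gg)t/ ? $1 : undef;
}

# Same pattern with a non-capturing group: only reports whether it matched.
sub has_pattern {
    my ($text) = @_;
    return $text =~ /bbbb(?:cg|gc|cc|gg)t/ ? 1 : 0;
}

for my $text (qw(bbbbcgt xxbbbbggtyy bbbbtat)) {
    my $pair = matched_pair($text);
    printf "%-12s %s\n", $text,
        defined $pair ? "matched, \$1 = $pair" : "no match";
}
```

the pattern is not anchored, so it also matches when bbbbggt sits in the middle of a longer text.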
you should learn the basics of regular expressions:
[perldoc.perl.org...]
the knowledge is essential to Perl and can be transferred to many other disciplines.