千家信息网

perl如何提取miRNA前后500bp序列

发表于:2024-10-17 作者:千家信息网编辑
千家信息网最后更新 2024年10月17日,小编给大家分享一下perl如何提取miRNA前后500bp序列,相信大部分人都还不怎么了解,因此分享这篇文章给大家参考一下,希望大家阅读完这篇文章后大有收获,下面让我们一起去了解一下吧!在没有位置信息
千家信息网最后更新 2024年10月17日perl如何提取miRNA前后500bp序列

小编给大家分享一下perl如何提取miRNA前后500bp序列,相信大部分人都还不怎么了解,因此分享这篇文章给大家参考一下,希望大家阅读完这篇文章后大有收获,下面让我们一起去了解一下吧!

在没有位置信息时,提取miRNA前后500bp的序列

在提取miRNA前后500bp序列时,若没有其位置信息,提取会比较麻烦,可用以下方法提取:

1.首先利用blast软件将miRNA的前体序列与基因组比对,获取前体序列的位置信息,比对方法:

makeblastdb -in Donkey_Hic_genome.20180408.fa -dbtype nucl -title Donkey_Hic_genome.20180408.fablastall -i miRNA_Pre.fa -d Donkey_Hic_genome.20180408.fa -p blastn -e 0.01 -b 5 -v 5 -m 8 -o blast.out#Donkey_Hic_genome.20180408.fa  参考基因组染色体序列#miRNA_Pre.fa  miRNA的前体序列

由于序列可能较短,所以e-value值可调高一些。比对结果如下:

bta-miR-34c     Chr20   100.00  113     0       0       1       113     39385825        39385713        6e-57    224bta-miR-370     Chr07   100.00  109     0       0       1       109     70327160        70327268        1e-54    216bta-miR-34b     Chr20   100.00  112     0       0       1       112     39386420        39386309        2e-56    222bta-miR-383     Chr27   100.00  110     0       0       1       110     21958895        21959004        4e-55    218bta-miR-449a    Chr01   100.00  112     0       0       1       112     103603665       103603554       2e-56    222bta-miR-378     Chr09   100.00  111     0       0       1       111     51049164        51049054        9e-56    220bta-miR-378     Chr13   96.88   32      1       0       67      98      32631593        32631624        3e-06   56.0bta-miR-378     Chr13   93.94   33      2       0       62      94      29900614        29900582        2e-04   50.1bta-miR-378     Chr07   100.00  27      0       0       71      97      65072541        65072515        1e-05   54.0bta-miR-378     Chr30   96.55   29      1       0       69      97      28926810        28926782        2e-04   50.1bta-miR-378     Chr18   93.94   33      2       0       62      94      21721834        21721866        2e-04   50.1bta-miR-206     Chr08   100.00  112     0       0       1       112     78794660        78794771        2e-56    222bta-miR-1       Chr15   100.00  108     0       0       1       108     48711439        48711546        5e-54    214bta-miR-1       Chr07   88.89   63      7       0       32      94      26712206        26712268        2e-10   69.9bta-let-7c      Chr18   100.00  112     0       0       1       112     32412721        32412610        2e-56    222bta-let-7c      Chr20   89.19   74      8       0       18      91      29703021        29703094        1e-14   83.8

然后对比对结果筛选,尽量选择完全匹配,并且匹配长度最长的。筛选后的blast.out结果如下:

bta-miR-34c     Chr20   100.00  113     0       0       1       113     39385825        39385713        6e-57    224bta-miR-370     Chr07   100.00  109     0       0       1       109     70327160        70327268        1e-54    216bta-miR-34b     Chr20   100.00  112     0       0       1       112     39386420        39386309        2e-56    222bta-miR-383     Chr27   100.00  110     0       0       1       110     21958895        21959004        4e-55    218bta-miR-449a    Chr01   100.00  112     0       0       1       112     103603665       103603554       2e-56    222bta-miR-378     Chr09   100.00  111     0       0       1       111     51049164        51049054        9e-56    220bta-miR-206     Chr08   100.00  112     0       0       1       112     78794660        78794771        2e-56    222bta-miR-1       Chr15   100.00  108     0       0       1       108     48711439        48711546        5e-54    214bta-let-7c      Chr18   100.00  112     0       0       1       112     32412721        32412610        2e-56    222

3.接下来就可以提取序列啦,我有提供脚本哦,用法:

perl /share/work/wangq/script/lv/miRNA_500_fasta.pl -id blast.out  -fa Donkey_Hic_genome.20180408.fa  -out result.fa  -miR miRNA.fa -l 500

-id 后跟筛选的blast结果;-fa后跟参考基因组染色体序列;-miR后跟miRNA的序列文件;-l后跟提取miRNA前后多长序列,默认500。

提取结果(miRNA序列大写标注,其余小写):

>bta-miR-34cacaaggcacagcatcacccgccgccctgctgggaggaggccgccaccctccgcggcgaactgccgacccgaagcgtttgagaggagaaagctgcgcttcgagactggatgcgtcagcatttcttccgcgcggcgcggcgagcgcgtggtccgcgtgcgagcaaaccccctgcaaacgcaggcgggctgatgcttctagctggagttaaacagttagctatcactaggagtaaaagcaagattgggcgatcccatgtaaggaaagcaaaatctggggcgtttagtagctttaattgtaaggtgtaaagacatttgaattgctgggggaagccccctgtgtaacgttcagagctgtgagtcactgtgcctattttccattgtttatcagggtactcaccaatccaccaactaaagtgagtaccacggagccagtgatctgcctgtcacgacgcatgggggcaccaacttgagactgaagtttgtgatgaaagtaaagcttttttgctgtgagtctagttactAGGCAGTGTAGTTAGCTGATTGctaataataccaatcactaaccacacggccaggtaaaaagatttgggaattcatccagatgagctgcgtgtgcacaccagtgggtttgggggcaagaaggggattggaataccctaatagtacgcattgcctgtttatccatagctcagccaagagagaaatcagcattttagctgctaaatatacaacatatgtagtaaatatacatttttaacatataatttttcaatattccttcaggcgcttaaccaaaaaaattttcagatatattgaggaatagaggttttctctcttagctcatttatgtcattgtgtaaacttgtgattattttgaactatctgtaagactgtattactattttgaaagaataaagtgccctaaagtcataaattgtggtcattcatgtgtccattgcctttcctaagttggctttatgatgtccttctcagcgcctgccacagtaattataactttcctatcgctaaaatggttaaattgttcctagattaacaaaatctcccgggaaggctattcacaagcactttttagtatttttctaagaacacgtta>bta-miR-370tgtttattgttcgtgtctcagagttgctctgaggccagtgtgctgggcacccagcaggccatctgggaggggagaaaggaagctagccatgtatgtcagtaggggtcagcgtggaggcctttggggtttctgggaaacggttcgaacatggagaatctcgtgatgtggacgcccccgagggcagccccatttcatgaactggatcccttagtcggatttctgttttccagggcacttgtttaagctttcatgccgtgtccaaaaaaaagagcagatcccaagagtttcctgcccgaggcccgtcttgccagaagccctcggcttggcctgagcatcgagatcattcctctaatcagcctgggggtggtctgacccgcggggtgggcggcgtggggcggggaggggcagggcgtgtgcggggcgggagctgctgggggagctggcggcggttcctttcaccctcgccgtggacccgcgggggcgtggctgtcctcggtctacaaatcgcgcaagtcggggcacaagacagagaggccaggtcacgtctctgcagttacacagctcatgagtGCCTGCTGGGGTGGAACCTGGTctgtctgtctgtctagcaccacagctcgggcgctgctgcagagggaacaaagatttgggtgagggcctcagagacgggctggggaaggggtttactcgggctgactttgacatgaaaacaatagctaattctgcttggggcctagtgctgtgactatcttacagatgggaaacaggcacagacaggttagttaactttcctgatgctttccagctcatcaggattggaggctggatttggaggcaggcggtgtggttcgagcatatgtgctctaaccattgggaaacactgcctcctgactgtgagcacacgtagagatggcacatggagtccaagaatctgggtttgagtccttttaccactgcctggtgggtgtactgtccagtcaatcatacatatttcattcttcctcttgacagagtctctgcagaggcccagcgagtcgttggcccgtgggaaatatttttttaagaatgcatgtgtgtgatagaattttatatctatcgaaggggagggg

脚本代码如下:

#!/usr/bin/perl -wuse strict;use Getopt::Long;use Bio::SeqIO;use Bio::Seq;my $version = "1.3";my %opts;GetOptions(\%opts,  "id=s", "fa=s", "miR=s", "l=s","out=s","h");if(!defined($opts{out}) || !defined($opts{fa}) || !defined($opts{miR}) || !defined($opts{id}) ||defined($opts{h})){print <<"Usage End.";Description:$version:lefse analysisUsageForced parameter:-id           blast.out                         must be given-out          outfile                                must be given-fa      genome fasta file    must be given                -miR          miRNA fasta file              must be given                -l            length                                Other parameter:-h           Help documentUsage End.exit;}my $len = $opts{l};$len ||=500;my $in  = Bio::SeqIO->new(-file => "$opts{fa}" , -format => 'Fasta');my %fasta;while ( my $seq = $in->next_seq() ) {my($id,$sequence)=($seq->id,$seq->seq);$fasta{$id}=$sequence;}my $ina  = Bio::SeqIO->new(-file => "$opts{miR}" , -format => 'Fasta');my %mi;while ( my $seq = $ina->next_seq() ) {        my($id,$sequence)=($seq->id,$seq->seq);        $mi{$id}=$sequence;}open(IN,"$opts{id}") ||die "open file $opts{id} faild.\n";open(OUT,">$opts{out}") ||die "open file $opts{out} faild.\n";while(){chomp;my @line = split("\t");if($line[8] < $line[9]){my $miR = lc(substr( $fasta{$line[1]},$line[8]-1, $line[9] - $line[8]+1));my $small = lc($mi{$line[0]});$miR =~ s/$small/$mi{$line[0]}/;my $before = lc(substr( $fasta{$line[1]},$line[8]-$len-1, $len));my $laft = lc(substr( $fasta{$line[1]},$line[9], $len));print OUT ">$line[0]\n$before$miR$laft\n";}elsif($line[8] > $line[9]){my $miR = lc(substr( $fasta{$line[1]},$line[9]-1, $line[8] - $line[9]+1));my $before = lc(substr( $fasta{$line[1]},$line[9]-$len-1, $len));my $laft = lc(substr( $fasta{$line[1]},$line[8], $len));my $gene = $before.$miR.$laft;$gene = &reverse_complement_IUPAC($gene);my $small = lc($mi{$line[0]});$gene =~ s/$small/$mi{$line[0]}/;print OUT ">$line[0]\n$gene\n";}}close(IN);close(OUT);sub reverse_complement_IUPAC {        my $dna = shift;        # reverse the DNA sequence        my $revcomp = reverse($dna);        # complement the reversed DNA sequence        $revcomp =~ tr/ABCDGHMNRSTUVWXYabcdghmnrstuvwxy/TVGHCDKNYSAABWXRtvghcdknysaabwxr/;        return $revcomp;}

以上是"perl如何提取miRNA前后500bp序列"这篇文章的所有内容,感谢各位的阅读!相信大家都有了一定的了解,希望分享的内容对大家有所帮助,如果还想学习更多知识,欢迎关注行业资讯频道!

0