千家信息网

如何使用perl实现ncbi基因组数据格式转换

发表于:2025-01-20 作者:千家信息网编辑
千家信息网最后更新 2025年01月20日,小编给大家分享一下如何使用perl实现ncbi基因组数据格式转换,希望大家阅读完这篇文章之后都有所收获,下面让我们一起去探讨吧!ncbi基因组数据格式转换大家都知道从ncbi下载的基因组数据与正常的基
千家信息网最后更新 2025年01月20日如何使用perl实现ncbi基因组数据格式转换

小编给大家分享一下如何使用perl实现ncbi基因组数据格式转换,希望大家阅读完这篇文章之后都有所收获,下面让我们一起去探讨吧!

ncbi基因组数据格式转换

大家都知道从ncbi下载的基因组数据与正常的基因组数据有所不同,所有的基因转录本CDS的ID都是重新起的,用起来非常不方便,为此写了一perl脚本将其ID转换回来。

用法如下:

UsageForced parameter:-gff        genoma gff file              must be given-fa         genoma fasta file            must be given-o          output gff file              must be given-out        output fasta file            must be givenOther parameter:-h           Help document

代码如下:

use Getopt::Long;use strict;use Bio::SeqIO;use Bio::Seq;#get optsmy %opts;GetOptions(\%opts, "gff=s", "o=s", "fa=s", "out=s","h");if(!defined($opts{gff}) || !defined($opts{o}) || !defined($opts{fa}) || !defined($opts{out}) || defined($opts{h})){print <<"Usage End.";UsageForced parameter:-gff        genoma gff file           must be given-fa         genoma fasta file  must be given-o          output gff file  must be given-out        output fasta file  must be givenOther parameter:-h           Help documentUsage End.exit;}open(IN,"$opts{gff}") || die "open $opts{gff} failed\n";open(OUT,">$opts{o}") || die "open $opts{o} failed\n";my %chr;my %gene;my %gene_mrna_num;my %mrna;my %mrna_exon_num;while(){if(/^#/){print OUT $_;next;}chomp;my @line = split("\t");###########################################################################if($line[2] eq "region"){if($line[8] =~/chromosome/){$line[8] =~ /chromosome=([^;]*)/;my $chromosome = $1;$chr{$line[0]} = $chromosome;}else{$chr{$line[0]} = $line[0];}}###########################################################################if($line[2] eq "gene"){$line[8] =~ /ID=([^;]*);Name=([^;]*)/;my $geneid = $1;my $genename = $2;$line[8] =~ s/$geneid/$genename/;$gene{$geneid} = $genename;$gene_mrna_num{$geneid} = 0;}###########################################################################if($line[2] eq "mRNA"){$line[8] =~ /ID=([^;]*);Parent=([^;]*)/;my $mrnaid = $1;my $geneid = $2;$line[8] =~ s/$geneid/$gene{$geneid}/;$gene_mrna_num{$geneid}++;$line[8] =~ s/$mrnaid/$gene{$geneid}\.$gene_mrna_num{$geneid}/;$mrna{$mrnaid} = "$gene{$geneid}.$gene_mrna_num{$geneid}";$mrna_exon_num{$mrnaid} = 0;}###########################################################################if($line[2] eq "exon"){$line[8] =~ /ID=([^;]*);Parent=([^;]*)/;my $exonid = $1;my $mrnaid = $2;$mrna_exon_num{$mrnaid}++;$line[8] =~ s/$mrnaid/$mrna{$mrnaid};Name=$mrna{$mrnaid}\.exon$mrna_exon_num{$mrnaid}/;$line[8] =~ s/ID=$exonid;//;}###########################################################################if($line[2] eq "CDS"){$line[8] =~ /ID=([^;]*);Parent=([^;]*)/;my $cdsid = $1;my $mrnaid = $2;$line[8] =~ s/$mrnaid/$mrna{$mrnaid}/;$line[8] =~ s/$cdsid/$mrna{$mrnaid}/;}$line[0] = $chr{$line[0]};print OUT join("\t",@line)."\n";}close(IN);close(OUT);my $in  = Bio::SeqIO->new(-file => "$opts{fa}" , -format => 'Fasta');my $out = Bio::SeqIO->new(-file => ">$opts{out}" , -format => 'fasta');while ( my $seq = $in->next_seq() ) {#my($id,$sequence)=($seq->id,$seq->seq);my($id,$sequence,$desc)=($seq->id,$seq->seq,$seq->desc);my $newSeqobj = Bio::Seq->new(-seq => $sequence,       -desc => $desc,       -id => $chr{$id}, ); $out->write_seq($newSeqobj);}

看完了这篇文章,相信你对"如何使用perl实现ncbi基因组数据格式转换"有了一定的了解,如果想了解更多相关知识,欢迎关注行业资讯频道,感谢各位的阅读!

0