千家信息网

How to merge the exon positions of each transcript in merged.gtf

Published: 2025-02-01  Author: 千家信息网 editor

This article explains how to merge the exon positions that belong to the same transcript in merged.gtf.

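The merged.gtf file (the merged assembly written by Cufflinks' cuffmerge) lists every exon as a separate record. The Perl script below reads this file and collects all exon positions of each transcript onto a single line.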

Example of merged.gtf:

Chr00   Cufflinks       exon    37990   38333   .       +       .       gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1";
Chr00   Cufflinks       exon    38607   38710   .       +       .       gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1";
Chr00   Cufflinks       exon    38814   38898   .       +       .       gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "3"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1";
Chr00   Cufflinks       exon    42611   42713   .       +       .       gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "4"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1";
Chr00   Cufflinks       exon    42906   43203   .       +       .       gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "5"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1";
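In the sample above, the ninth column of each record packs the attributes as key "value"; pairs. The script extracts gene_id, gene_name and transcript_id with one regex each. As an alternative, the sketch below reads every pair into a hash in one pass. The helper name parse_gtf_attributes is hypothetical and not part of the original script. It assumes every value is double-quoted, as in the sample.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a GTF attribute column such as
#   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1";
# into a hash reference (gene_id => "XLOC_000001", ...).
# Assumes every value is enclosed in double quotes.
sub parse_gtf_attributes {
    my ($attr) = @_;
    my %h;
    # Each match captures one key and its quoted value
    while ($attr =~ /(\w+)\s+"([^"]*)"/g) {
        $h{$1} = $2;
    }
    return \%h;
}

my $attr = 'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "MD00G1000200";';
my $fields = parse_gtf_attributes($attr);
# prints XLOC_000001, MD00G1000200 and TCONS_00000001, tab-separated
print join("\t", $fields->{gene_id}, $fields->{gene_name}, $fields->{transcript_id}), "\n";
```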

Example output:

Chr00   +       XLOC_000001     MD00G1000200    TCONS_00000001  exon    37990-38333     38607-38710     38814-38898     42611-42713     42906-43203
Chr00   +       XLOC_000001     MD00G1000200    TCONS_00000002  exon    38005-38333     38607-38710     38814-38898     42611-42726     42906-43167
Chr00   +       XLOC_000002     MD00G1000400    TCONS_00000003  exon    50386-50877
Chr00   +       XLOC_000003     MD00G1000500    TCONS_00000004  exon    76659-76991     77468-77544     77649-77715     77889-77970     78355-78424
Chr00   +       XLOC_000004     MD00G1000600    TCONS_00000005  exon    101951-102138   102228-102398   102957-103004   103099-103138   103227-103327
Chr00   +       XLOC_000004     MD00G1000600    TCONS_00000006  exon    102003-102138   102228-102398   102957-103004   103099-103138   103227-103327
Chr00   +       XLOC_000005     MD00G1000700    TCONS_00000007  exon    105542-105626   105926-106541   108356-108832
Chr00   +       XLOC_000005     MD00G1000700    TCONS_00000009  exon    105542-105626   105926-106541   108902-109696
Chr00   +       XLOC_000005     MD00G1000700    TCONS_00000008  exon    105542-105626   105926-106541   108949-109696
Chr00   +       XLOC_000006     MD00G1001100    TCONS_00000010  exon    276592-277221   280928-280975

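Column 1 is the chromosome, column 2 the strand, column 3 the gene_id, column 4 the gene_name and column 5 the transcript ID. Column 6 is the feature type (exon), and each remaining column is the start-end position of one exon.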
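To read the merged file back, each row can be split on tabs. This yields the five identifier columns, the feature name and the list of exon ranges. The minimal sketch below does this. The sub name parse_row and its hash keys are hypothetical choices for illustration.

```perl
use strict;
use warnings;

# Hypothetical reader for one row of the merged output:
# chr, strand, gene_id, gene_name, transcript_id, feature, then start-end ranges
sub parse_row {
    my ($row) = @_;
    chomp $row;
    my ($chr, $strand, $gene_id, $gene_name, $trans_id, $feature, @ranges) = split /\t/, $row;
    # Each "start-end" range becomes a [start, end] pair
    my @exons = map { [ split /-/, $_ ] } @ranges;
    return {
        chr           => $chr,
        strand        => $strand,
        gene_id       => $gene_id,
        gene_name     => $gene_name,
        transcript_id => $trans_id,
        feature       => $feature,
        exons         => \@exons,
    };
}

my $r = parse_row("Chr00\t+\tXLOC_000002\tMD00G1000400\tTCONS_00000003\texon\t50386-50877");
print "$r->{transcript_id}: ", scalar @{ $r->{exons} }, " exon(s)\n";   # prints "TCONS_00000003: 1 exon(s)"
```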

The code is as follows:

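#!/usr/bin/perl -w
use strict;
use warnings;
use Getopt::Long;
use Cwd qw(abs_path);

my $version = "1.2";

## prepare parameters ##########################################################
## -----------------------------------------------------------------------------
## GetOptions
my %opts;
GetOptions(\%opts, "gtf=s", "od=s", "h");
die "Usage: perl $0 -gtf merged.gtf -od outdir\n"
    if ($opts{h} || !$opts{gtf} || !$opts{od});

# Create the output directory before resolving it: abs_path() may return undef
# for a path that does not exist yet
my $od = $opts{od};
mkdir $od unless (-d $od);
$od = abs_path($od);

# Pass 1: write one tab-separated line per exon to merged.tpm:
# chr, feature, start, end, strand, gene_id, gene_name, transcript_id
open(my $in, '<', $opts{gtf}) || die "open file $opts{gtf} failed: $!";
open(my $out, '>', "$od/merged.tpm") || die "open file $od/merged.tpm failed: $!";
while (<$in>) {
    next if (/^#/);
    chomp;
    my ($chr, $source, $feature, $start, $end, $score, $strand, $frame, $attr) = split("\t", $_);
    next unless (defined $attr && $feature eq 'exon');
    # List assignment leaves the variable undef when a key is absent,
    # instead of silently reusing $1 from an earlier match
    my ($gene_id)   = $attr =~ /gene_id "([^"]*)"/;
    my ($gene_name) = $attr =~ /gene_name "([^"]*)"/;
    my ($trans_id)  = $attr =~ /transcript_id "([^"]*)"/;
    next unless defined $trans_id;
    $gene_id   = '-' unless defined $gene_id;
    $gene_name = '-' unless defined $gene_name;   # novel loci may have no gene_name
    print $out join("\t", $chr, $feature, $start, $end, $strand, $gene_id, $gene_name, $trans_id) . "\n";
}
close($in);
close($out);

# Pass 2: join consecutive lines of the same transcript into one row:
# chr, strand, gene_id, gene_name, transcript_id, feature, start-end ...
# This assumes all exons of a transcript sit on adjacent lines, as in merged.gtf.
# NOTE: the result is written to $od/merged.gtf; if -od is the directory that
# holds the input file, the input is overwritten.
open($in, '<', "$od/merged.tpm") || die "open file $od/merged.tpm failed: $!";
open($out, '>', "$od/merged.gtf") || die "open file $od/merged.gtf failed: $!";
my $cmd = "";
my $key = "";
while (<$in>) {
    chomp;
    my ($chr, $feature, $start, $end, $strand, $gene_id, $gene_name, $trans_id) = split("\t", $_);
    if ($key eq $trans_id) {
        # Same transcript as the previous line: append this exon's range
        $cmd .= "\t" . $start . "-" . $end;
    }
    else {
        # New transcript: flush the previous row and start a new one
        print $out $cmd . "\n" if ($cmd ne "");
        $key = $trans_id;
        $cmd = join("\t", $chr, $strand, $gene_id, $gene_name, $trans_id, $feature, $start . "-" . $end);
    }
}
# Flush the last transcript, which the loop itself never prints
print $out $cmd . "\n" if ($cmd ne "");
close($in);
close($out);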
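The script above depends on all exons of a transcript being on adjacent lines, and it goes through a temporary merged.tpm file. The single-pass variant below is a sketch, not the author's method. It collects exons in a hash keyed by transcript_id, so input order no longer matters, and it sorts each transcript's exons by start position. The sub name merge_exons is hypothetical. Taking the GTF path as the first command-line argument and printing rows to STDOUT is also an assumption of this sketch, replacing the original -gtf/-od options.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group exon records by transcript_id in memory (no temp file and no need for a
# transcript's exons to be adjacent). Returns one tab-separated row per
# transcript, in the order transcripts are first seen.
sub merge_exons {
    my ($fh) = @_;
    my (%info, %exons, @order);
    while (my $line = <$fh>) {
        next if $line =~ /^#/;
        chomp $line;
        my @f = split /\t/, $line;
        next unless @f >= 9 && $f[2] eq 'exon';
        my ($tid) = $f[8] =~ /transcript_id "([^"]*)"/;
        next unless defined $tid;
        my ($gid)   = $f[8] =~ /gene_id "([^"]*)"/;
        my ($gname) = $f[8] =~ /gene_name "([^"]*)"/;
        unless (exists $info{$tid}) {
            push @order, $tid;   # remember first-seen order
            # chr, strand, gene_id, gene_name ('-' when absent)
            $info{$tid} = [ $f[0], $f[6], $gid // '-', $gname // '-' ];
        }
        push @{ $exons{$tid} }, [ $f[3], $f[4] ];
    }
    my @rows;
    for my $tid (@order) {
        # Sort exons by start coordinate and format them as start-end
        my @ranges = map { "$_->[0]-$_->[1]" }
                     sort { $a->[0] <=> $b->[0] } @{ $exons{$tid} };
        push @rows, join("\t", @{ $info{$tid} }, $tid, 'exon', @ranges);
    }
    return @rows;
}

# Usage: perl merge_exons.pl merged.gtf > merged_exons.txt
if (@ARGV) {
    open(my $in, '<', $ARGV[0]) or die "open file $ARGV[0] failed: $!";
    print "$_\n" for merge_exons($in);
    close($in);
}
```

The trade-off is that every exon is held in memory until the file has been read, whereas the original streams through its input line by line.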

