Introduction to the BAM/SAM data format (Part 2)

5. Detailed explanation

Example:

E00606:11:H2CC3CCXY:8:1101:7172:14195 77 * 0 0 * * 0 0 CTACGAGTCATTTAGCACCGGGTTCTCCACAAACTTGCGGTGCGTCTCCAGAGAGGGGCGGCACTCGTTCGGCCGCACCCCGGTCCAGTCACGAACGGCTCTCCACACCGGCCGGCCCCGGGGGGTCGACCGGCTATCCCAGGCCAATCA AAFFFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ<JFJJJJ<FJJJJJJJJJJJJJJJJJ)FJ<JJJJJJFJJJJJJJJJJFJJ< XM:i:0
E00606:11:H2CC3CCXY:8:1101:7172:14195 2:N:0:ATCACG 141 * 0 0 * * 0 0 AGACATTTGGTGCGTGTGCTTGGCTGAGGAGCCACTGGTGCGAAGCTACCATCTGTGGGATTATGACTGAACGCCTCTAAGTCAGAATCCCGCCTAAACGTAACGATACCGCAGCGCCGCGGGACTTTGATTGGCCTGGGATAGCCGGTC AAAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJFJJJJ<JAJJJJJJJ<JJJJJ<JJJJJJJJJF7JJFFFJJJJJFJAJJJAJFJJJJ7JJFJJFFA-A7FFJJJJF-AFJJJJJJJJ XM:i:0
E00606:11:H2CC3CCXY:8:1101:6400:14195 77 * 0 0 * * 0 0 GCGGGATGCAGGCCGCTCACCATGGCGACGGAGCTGGAGGCGTGGCTCATGTATGAGGATGTCTGGGGCAGCGGATACGTCACCACCTCCAGTACATCATGAGAGCTGCGCTTGAAGCGGTTATTACTGGGCAGCGGCAGCAGGGGGCAG AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ XM:i:0
E00606:11:H2CC3CCXY:8:1101:7963:14195 2:N:0:ATCACG 141 * 0 0 * * 0 0 GAGTCTAACGCACGCGCGAGTCAAAGGGTGTCTCCGAGCCCCCACGGCGCAATGAAGGTGAAGGCCGGCGCTCGCCGGCCCAGGTGGGATCCCCCCGCCCCGGCGGGGGGCGCACCACCGGCCCGTCTCGCCCGCACCGCCGGGCAGGTG AAAFFJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAJJJJJJJFJFJJFJJJJJ<JJJJJJJ<FJJJFJJJFFFJJJJFJJJFJJJJJJJJJJJJJ)AFFJJJJFJJJFFJJJJJJJJ)7<JF--<FJFJ)< XM:i:0

1)QNAME

The query name, which is usually the read name, e.g. E00606:11:H2CC3CCXY:8:1101:7172:14195

2)FLAG  


The following information comes from: http://www.cnblogs.com/xudongliang/p/5437850.html

/*! @abstract the read is paired in sequencing, no matter whether it is mapped in a pair */
#define BAM_FPAIRED        1
/*! @abstract the read is mapped in a proper pair */
#define BAM_FPROPER_PAIR   2
/*! @abstract the read itself is unmapped; conflictive with BAM_FPROPER_PAIR */
#define BAM_FUNMAP         4
/*! @abstract the mate is unmapped */
#define BAM_FMUNMAP        8
/*! @abstract the read is mapped to the reverse strand */
#define BAM_FREVERSE      16
/*! @abstract the mate is mapped to the reverse strand */
#define BAM_FMREVERSE     32
/*! @abstract this is read1 */
#define BAM_FREAD1        64
/*! @abstract this is read2 */
#define BAM_FREAD2       128
/*! @abstract not primary alignment */
#define BAM_FSECONDARY   256
/*! @abstract QC failure */
#define BAM_FQCFAIL      512
/*! @abstract optical or PCR duplicate */
#define BAM_FDUP        1024
/*! @abstract supplementary alignment */
#define BAM_FSUPPLEMENTARY 2048

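1: the read comes from paired-end (PE) sequencing

2: the read is mapped in a proper pair, i.e. both mates are aligned with the orientation and insert size the aligner expects. This flag says nothing about mismatches or indels

4: the read is not mapped to the reference

8: the mate of this read is not mapped to the reference; for example, if this read is R1, its corresponding R2 read is not aligned

16: the read is mapped to the reverse strand of the reference

32: the mate of this read is mapped to the reverse strand of the reference

64: the read is the R1 read, read1

128: the read is the R2 read, read2

256: the alignment is not the primary one; a read may align to several positions on the reference, only one of which is the primary alignment, and all the others are secondary

512: the read failed QC and was filtered out (# this flag is not commonly used)

1024: the read is a PCR or optical duplicate (# typically set only after a duplicate-marking step)

2048: the alignment is supplementary, i.e. it is one part of a chimeric (split) alignment in which one segment is the representative record and the others are flagged as supplementary (# this flag is not commonly seen)

All of the flag values above are powers of 2. Such a set has a useful property: the sum of any subset is unique, so a FLAG value can be decomposed in exactly one way. For example, 65 can only be 1 + 64, which means the read comes from paired-end sequencing and is read1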
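Because each flag occupies its own bit, decoding a FLAG value amounts to testing each bit with a bitwise AND. The following is a minimal sketch of that idea in Python; the table FLAG_BITS and the function decode_flag are illustrative names made up for this example (they are not part of samtools), and the short flag names are assumed to follow the #define list above with the BAM_F prefix dropped.

```python
# Bit values copied from the #define block above; the short names are an
# assumption made for this sketch (BAM_F prefix dropped).
FLAG_BITS = {
    1: "PAIRED",
    2: "PROPER_PAIR",
    4: "UNMAP",
    8: "MUNMAP",
    16: "REVERSE",
    32: "MREVERSE",
    64: "READ1",
    128: "READ2",
    256: "SECONDARY",
    512: "QCFAIL",
    1024: "DUP",
    2048: "SUPPLEMENTARY",
}


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    # flag & bit is non-zero only when that particular bit is set.
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


print(decode_flag(65))  # ['PAIRED', 'READ1']
```

Since no two flags share a bit, the list returned for a given value is the only possible decomposition, which is the uniqueness property described above.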

The samtools flags command shows the details of a FLAG value, for example:

$samtools flags 77
0x4d    77      PAIRED,UNMAP,MUNMAP,READ1
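The FLAG value is 77.
PAIRED means the read comes from paired-end sequencing; its value is 1.
UNMAP means the read is not mapped to the reference; its value is 4.
MUNMAP means the mate of this read is not mapped to the reference; its value is 8.
READ1 means the read is the R1 read; its value is 64.
These values sum to 77.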

$samtools flags 141
0x8d    141     PAIRED,UNMAP,MUNMAP,READ2
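The FLAG value is 141.
PAIRED means the read comes from paired-end sequencing; its value is 1.
UNMAP means the read is not mapped to the reference; its value is 4.
MUNMAP means the mate of this read is not mapped to the reference; its value is 8.
READ2 means the read is the R2 read; its value is 128.
These values sum to 141.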
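The arithmetic in the two samtools examples can also be checked in the other direction, by combining the individual bits. This is a small sketch, assuming only the bit values listed earlier; the variable names are chosen for this example.

```python
# Bit values taken from the flag list above.
PAIRED, UNMAP, MUNMAP, READ1, READ2 = 1, 4, 8, 64, 128

# Combine bits with bitwise OR; for distinct bits this equals plain addition.
flag_r1 = PAIRED | UNMAP | MUNMAP | READ1
flag_r2 = PAIRED | UNMAP | MUNMAP | READ2

print(flag_r1, hex(flag_r1))  # 77 0x4d
print(flag_r2, hex(flag_r2))  # 141 0x8d
```

The hexadecimal values match the first column printed by samtools flags, and the two records of an unmapped pair differ only in the READ1/READ2 bit.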

3)RNAME

reference sequence name

This is usually the name of a chromosome in the reference genome; if the read is not aligned, it is shown as *
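To tie the first three fields together, here is a minimal Python sketch that reads QNAME, FLAG and RNAME from one alignment line. It assumes the fields are separated by tabs, as the SAM format specifies (the records shown above appear with spaces, presumably from page rendering). The function name parse_first_fields is hypothetical, and the record is the first example line with SEQ and QUAL shortened to their first 10 characters for readability.

```python
def parse_first_fields(sam_line):
    """Split a tab-separated SAM alignment line; return QNAME, FLAG, RNAME."""
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-3 of a SAM record: QNAME (string), FLAG (integer), RNAME (string).
    return fields[0], int(fields[1]), fields[2]


# First example record; SEQ and QUAL are truncated here for illustration only.
record = "\t".join([
    "E00606:11:H2CC3CCXY:8:1101:7172:14195", "77", "*", "0", "0",
    "*", "*", "0", "0", "CTACGAGTCA", "AAFFFFFJJJ", "XM:i:0",
])

qname, flag, rname = parse_first_fields(record)
is_unmapped = bool(flag & 4)  # 4 is BAM_FUNMAP

print(qname, flag, rname, is_unmapped)
```

For this record the UNMAP bit is set and RNAME is *, which is consistent with the description of unaligned reads above.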