Converting between SAM, BAM, and CRAM with samtools

1. sam <=> bam

samtools view -h NA12878.bam > NA12878_2.sam
samtools view -b -S NA12878.sam > NA12878_2.bam

2. cram => bam

samtools view -bS artificial.sam >artificial.bam
samtools view -bS artificial.cram >artificial2.bam 

Problem encountered:

hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/GIH/NA21144/alignment$ samtools view -b -S NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage.cram >NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage.bam
[E::cram_populate_ref] mismatching md5sum for downloaded reference.
Failed to populate reference for id 0
Unable to fetch reference #0 9998..119239
Failure to decode slice
[main_samview] truncated file.
hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/GIH/NA21144/alignment$ 
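
The error above suggests that samtools tried to download the reference sequence for this CRAM and that the downloaded copy failed its MD5 check, so decoding stopped partway through. One common way around this is to pass the reference FASTA explicitly with `-T`. The sketch below is only an illustration under that assumption: `cram_to_bam` is a hypothetical helper name, and the reference path is a placeholder for whatever FASTA the CRAM was actually encoded against.

```shell
# Sketch: decode a CRAM against an explicitly supplied reference FASTA.
# cram_to_bam is a hypothetical helper, not part of samtools.
cram_to_bam() {
    local ref=$1 src=$2 dst=$3

    # Fail early instead of letting samtools try to fetch the reference.
    if [ ! -f "$ref" ]; then
        echo "reference FASTA not found: $ref" >&2
        return 1
    fi

    # samtools needs a .fai index next to the FASTA; build it if missing.
    [ -f "$ref.fai" ] || samtools faidx "$ref" || return 1

    # -b: write BAM, -T: reference FASTA, -o: output file
    samtools view -b -T "$ref" -o "$dst" "$src"
}

# Example call (ref.fa is a placeholder path):
# cram_to_bam ref.fa input.cram output.bam
```

If the reference cannot be supplied this way, the htslib environment variables `REF_PATH` and `REF_CACHE` are the other place to look.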

hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/GIH/NA21144/alignment$ ll
total 6991788
drwxr-xr-x 2 hadoop hadoop       4096  3月  9 22:38 ./
drwxr-xr-x 4 hadoop hadoop       4096  3月  8 21:39 ../
-rw-rw-r-- 1 hadoop hadoop     116162  3月  9 22:30 NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage.bam
-rw-r--r-- 1 hadoop hadoop        877  3月  8 14:19 NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage.bam.bas
-rw-r--r-- 1 hadoop hadoop 7158968591  3月  8 14:31 NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage.cram
-rw-r--r-- 1 hadoop hadoop     482019  3月  9 22:38 NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage.cram.crai


hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/GIH/NA21144/alignment$ samtools flagstat NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage2.bam
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated.
248958 + 0 in total (QC-passed reads + QC-failed reads)
481 + 0 secondary
0 + 0 supplementary
4660 + 0 duplicates
247408 + 0 mapped (99.38% : N/A)
248477 + 0 paired in sequencing
124288 + 0 read1
124189 + 0 read2
227967 + 0 properly paired (91.75% : N/A)
245377 + 0 with itself and mate mapped
1550 + 0 singletons (0.62% : N/A)
3892 + 0 with mate mapped to a different chr
1109 + 0 with mate mapped to a different chr (mapQ>=5)
hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/GIH/NA21144/alignment$ samtools flagstat NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage3.bam
97098407 + 0 in total (QC-passed reads + QC-failed reads)
179635 + 0 secondary
0 + 0 supplementary
2634031 + 0 duplicates
96638779 + 0 mapped (99.53% : N/A)
96918772 + 0 paired in sequencing
48457840 + 0 read1
48460932 + 0 read2
93116714 + 0 properly paired (96.08% : N/A)
95999516 + 0 with itself and mate mapped
459628 + 0 singletons (0.47% : N/A)
1495404 + 0 with mate mapped to a different chr
565190 + 0 with mate mapped to a different chr (mapQ>=5)
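
The first flagstat run above warns that the EOF marker is absent and counts only 248958 reads, while the second file has 97098407, so the first BAM appears to be a truncated conversion. Before trusting a converted BAM, it can help to check for the 28-byte BGZF EOF block that the SAM/BAM specification puts at the end of every complete file. Newer samtools releases provide `samtools quickcheck` for this; the plain-shell sketch below does the same byte comparison by hand, with `has_bgzf_eof` being a hypothetical helper name.

```shell
# Sketch: succeed only if the file ends with the BGZF EOF marker block.
has_bgzf_eof() {
    # The fixed 28-byte empty BGZF block, written as hex in 4-byte groups.
    local expected="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"
    local actual

    # Dump the last 28 bytes as hex and strip all whitespace.
    actual=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d '[:space:]')

    [ "$actual" = "$expected" ]
}

# Example call:
# has_bgzf_eof converted.bam || echo "converted.bam looks truncated"
```

This only confirms that the file was closed properly; it says nothing about whether the records inside are correct.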




3. bam <=> cram:

samtools view -C -T ref.fa aln.bam > aln.cram

java -jar cramtools-3.0.jar bam -O yeast.bam -I yeast.cram -R yeast.fasta
java -jar cramtools-3.0.jar cram -O yeast2.cram -I yeast.bam -R yeast.fasta
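
As a samtools-only counterpart to the commands above, the conversions in this post can be folded into one small wrapper that picks the `samtools view` output flag from the target file extension. This is a rough sketch: `view_flag_for` and `convert_aln` are hypothetical helper names, and the optional third argument is assumed to be the reference FASTA needed whenever CRAM is read or written.

```shell
# Map the output file extension to the matching samtools view flag.
view_flag_for() {
    case "$1" in
        *.sam)  printf '%s\n' "-h" ;;   # SAM text, header included
        *.bam)  printf '%s\n' "-b" ;;   # BAM
        *.cram) printf '%s\n' "-C" ;;   # CRAM
        *) echo "unsupported extension: $1" >&2; return 1 ;;
    esac
}

# convert_aln SRC DST [REF]: convert SRC to the format implied by DST.
convert_aln() {
    local src=$1 dst=$2 ref=${3:-}
    local flag
    flag=$(view_flag_for "$dst") || return 1

    if [ -n "$ref" ]; then
        # -T names the reference FASTA used for CRAM encoding/decoding
        samtools view "$flag" -T "$ref" -o "$dst" "$src"
    else
        samtools view "$flag" -o "$dst" "$src"
    fi
}

# Example calls:
# convert_aln NA12878.bam NA12878_2.sam
# convert_aln aln.bam aln.cram ref.fa
```

Writing with `-o` instead of shell redirection avoids leaving a half-written output file name behind when the command is mistyped.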


Run log:

hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ java -jar cramtools-3.0.jar bam -O yeast.bam -I yeast.cram -R yeast.fasta 
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ls
cramtools-3.0.jar  yeast.bam  yeast.cram  yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ll
total 19008
drwxrwxr-x 2 hadoop hadoop     4096  3月 10 15:23 ./
drwxrwxr-x 3 hadoop hadoop     4096  3月 10 15:01 ../
-rw-rw-r-- 1 hadoop hadoop  3986091  3月 10 15:01 cramtools-3.0.jar
-rw-rw-r-- 1 hadoop hadoop  2130246  3月 10 15:25 yeast.bam
-rw-rw-r-- 1 hadoop hadoop   967382  3月 10 15:01 yeast.cram
-rw-rw-r-- 1 hadoop hadoop 12360755  3月 10 15:23 yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ java -jar cramtools-3.0.jar bam -O yeast2.bam -I yeast.cram -R yeast.fasta 
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ls
cramtools-3.0.jar  yeast2.bam  yeast.bam  yeast.cram  yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ll
total 21092
drwxrwxr-x 2 hadoop hadoop     4096  3月 10 15:26 ./
drwxrwxr-x 3 hadoop hadoop     4096  3月 10 15:01 ../
-rw-rw-r-- 1 hadoop hadoop  3986091  3月 10 15:01 cramtools-3.0.jar
-rw-rw-r-- 1 hadoop hadoop  2130242  3月 10 15:26 yeast2.bam
-rw-rw-r-- 1 hadoop hadoop  2130246  3月 10 15:25 yeast.bam
-rw-rw-r-- 1 hadoop hadoop   967382  3月 10 15:01 yeast.cram
-rw-rw-r-- 1 hadoop hadoop 12360755  3月 10 15:23 yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ java -jar cramtools-3.0.jar bam -O yeast3.bam -I yeast.cram -R yeast.fasta 
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ll
total 23176
drwxrwxr-x 2 hadoop hadoop     4096  3月 10 15:28 ./
drwxrwxr-x 3 hadoop hadoop     4096  3月 10 15:01 ../
-rw-rw-r-- 1 hadoop hadoop  3986091  3月 10 15:01 cramtools-3.0.jar
-rw-rw-r-- 1 hadoop hadoop  2130242  3月 10 15:26 yeast2.bam
-rw-rw-r-- 1 hadoop hadoop  2130242  3月 10 15:28 yeast3.bam
-rw-rw-r-- 1 hadoop hadoop  2130246  3月 10 15:25 yeast.bam
-rw-rw-r-- 1 hadoop hadoop   967382  3月 10 15:01 yeast.cram
-rw-rw-r-- 1 hadoop hadoop 12360755  3月 10 15:23 yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ java -jar cramtools-3.0.jar bam -O yeast4.bam -I yeast.cram -R yeast.fasta 
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ll
total 25260
drwxrwxr-x 2 hadoop hadoop     4096  3月 10 15:30 ./
drwxrwxr-x 3 hadoop hadoop     4096  3月 10 15:01 ../
-rw-rw-r-- 1 hadoop hadoop  3986091  3月 10 15:01 cramtools-3.0.jar
-rw-rw-r-- 1 hadoop hadoop  2130242  3月 10 15:26 yeast2.bam
-rw-rw-r-- 1 hadoop hadoop  2130242  3月 10 15:28 yeast3.bam
-rw-rw-r-- 1 hadoop hadoop  2130242  3月 10 15:31 yeast4.bam
-rw-rw-r-- 1 hadoop hadoop  2130246  3月 10 15:25 yeast.bam
-rw-rw-r-- 1 hadoop hadoop   967382  3月 10 15:01 yeast.cram
-rw-rw-r-- 1 hadoop hadoop 12360755  3月 10 15:23 yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ java -jar cramtools-3.0.jar 
Version 3.0-b48

Usage: cramtools [options] [command] [command options]
  Options:    -h, --help  Print help and quit (default: false)
  Commands:
    bam         CRAM to BAM conversion. 
    cram        BAM to CRAM converter. 
    index       BAM/CRAM indexer. 
    merge       Tool to merge CRAM or BAM files. 
    fastq       CRAM to FastQ dump conversion. 
    fixheader   A tool to fix CRAM header without re-writing the whole file.
    getref      Download reference sequences.
    qstat       Quality score statistics.

hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ java -jar cramtools-3.0.jar cram
Version 3.0-b48

Usage: <main class> [options]
 
  Options:    --capture-all-tags              Capture all tags. (default: false)
    --capture-tags                  Capture the tags listed, for example 'OQ:XA:XB' (default: )
    --encrypt                       Encrypt the CRAM file. (default: false)
    --ignore-md5-mismatch           Fail on MD5 mismatch if true, or correct (overwrite) the checksums and continue if false. (default: false)
    --ignore-tags                   Ignore the tags listed, for example 'OQ:XA:XB' (default: )
    --inject-sq-uri                 Inject or change the @SQ:UR header fields to point to ENA reference service.  (default: false)
    --input-bam-file, -I            Path to a BAM file to be converted to CRAM. Omit if standard input (pipe).
    --input-is-sam                  Input is in SAM format. (default: false)
    --lossless-quality-score, -Q    Preserve all quality scores. Overwrites '--lossless-quality-score'. (default: false)
    --lossy-quality-score-spec, -L  A string specifying what quality scores should be preserved. (default: )
    --max-records                   Stop after compressing this many records.  (default: 9223372036854775807)
    --output-cram-file, -O          The path for the output CRAM file. Omit if standard output (pipe).
    --preserve-read-names, -n       Preserve all read names. (default: false)
    --reference-fasta-file, -R      The reference fasta file, uncompressed and indexed (.fai file, use 'samtools faidx'). 
    -h, --help                      Print help and quit (default: false)
    -l, --log-level                 Change log level: DEBUG, INFO, WARNING, ERROR. (default: ERROR)

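After converting with the commands above, the data is noticeably compressed (the re-encoded CRAM files are about 510 KB, versus 967 KB for the original yeast.cram):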
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ll
total 26760
drwxrwxr-x 2 hadoop hadoop     4096  3月 10 16:49 ./
drwxrwxr-x 3 hadoop hadoop     4096  3月 10 15:01 ../
-rw-rw-r-- 1 hadoop hadoop  3986091  3月 10 15:01 cramtools-3.0.jar
-rw-rw-r-- 1 hadoop hadoop  2130242  3月 10 15:26 yeast2.bam
-rw-rw-r-- 1 hadoop hadoop   510298  3月 10 15:33 yeast2.cram
-rw-rw-r-- 1 hadoop hadoop  2130242  3月 10 15:28 yeast3.bam
-rw-rw-r-- 1 hadoop hadoop   510301  3月 10 15:40 yeast3.cram
-rw-rw-r-- 1 hadoop hadoop  2130242  3月 10 15:31 yeast4.bam
-rw-rw-r-- 1 hadoop hadoop   510301  3月 10 16:50 yeast5.cram
-rw-rw-r-- 1 hadoop hadoop  2130246  3月 10 15:25 yeast.bam
-rw-rw-r-- 1 hadoop hadoop   967382  3月 10 15:01 yeast.cram
-rw-rw-r-- 1 hadoop hadoop 12360755  3月 10 15:23 yeast.fasta


hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ samtools view yeast5.cram |head -20
SRR507778.19213	147	I	62	60	36M	=	3183	3087	ATCCTAACACTACCCTAACACAGCCCTAATCTAACC	*	MD:Z:36	NM:i:0
SRR507778.12312	147	I	205	60	36M	=	3626	3387	CCACTCACCCACCGTTACCCTCCAATTACCCATATC	*	MD:Z:36	NM:i:0
SRR507778.11604	83	I	402	60	36M	=	3869	3503	CTCACTTGTATACTGATTTTACGTACGCACACGGAT	*	MD:Z:36	NM:i:0
SRR507778.10609	83	I	2661	40	36M	=	6131	3506	TGAATTCGTACAACATTAAACGTGTGTTGGGAGTCG	*	MD:Z:36	NM:i:0
SRR507778.6249	147	I	2925	60	36M	=	6404	3445	TTTCTAAGTGGGATTTTTCTTAATCCTTGGATTCTT	*	MD:Z:36	NM:i:0
SRR507778.14609	129	I	3048	60	36M	IV	1525643	0	AAAAGTAGCCGTTCATTTCCCTTCCGATTTCATTCC	*	MD:Z:36	NM:i:0
SRR507778.20233	83	I	3132	60	36M	=	6388	3292	TATTTGTGTCCCATTCTCGTAGATAAAATTCTTGGA	*	MD:Z:36	NM:i:0
SRR507778.19213	99	I	3183	60	36M	=	62	-3087	ATTTTCTTCATAAAGAAGCTTTCAAGATATAAGATA	*	MD:Z:36	NM:i:0
SRR507778.20882	73	I	3259	60	36M	=	3259	0	CAAAAAGGAAAGCATGGAGGGAAACAGTAAACAGTG	*	MD:Z:36	NM:i:0
SRR507778.20882	133	I	3259	0	*	=	3259	0	GTGGTGTGTGTGGGTGAGGTGTGGGTGTGGGGAGGG	*
SRR507778.12312	99	I	3626	60	36M	=	205	-3387	GTATCTGATGTTTTTTTAGTAATTTCTTTGTAAATA	*	MD:Z:36	NM:i:0
SRR507778.11604	163	I	3869	60	36M	=	402	-3503	TTTTTGAAAATATTCTGAGGTAAAAGCCATTAAGGT	*	MD:Z:36	NM:i:0
SRR507778.24515	83	I	4004	60	36M	=	7814	3846	GATGTTTCAAGGCCTGAAGTTTGAATATTTATGTAG	*	MD:Z:36	NM:i:0
SRR507778.19471	83	I	4627	60	36M	=	8153	3562	GGCAGAGTTTCCAAAAAAAATTGTTAATCGACAAAG	*	MD:Z:36	NM:i:0
SRR507778.15626	83	I	4748	60	36M	=	8861	4149	TTTAAATTGTATTGAGTGCTTCAGTCATTGCAAAAT	*	MD:Z:36	NM:i:0
SRR507778.7265	147	I	4894	60	36M	=	8228	3300	TATCTATCACAAAGGAGACAAAATCGTTGATAAAAA	*	MD:Z:36	NM:i:0
SRR507778.14364	83	I	5516	60	36M	=	9133	3653	TATGATATAAAAACTCGGACCCTGTTTTACTTCTTT	*	MD:Z:36	NM:i:0
SRR507778.10609	163	I	6131	60	36M	=	2661	-3506	CATACGTTGATTAGTACTGTTGGTCTCTCATTGAAA	*	MD:Z:36	NM:i:0
SRR507778.20233	163	I	6388	60	36M	=	3132	-3292	ACCAATTTGACGTTAATTTTAAATGCGTTCTGAAGT	*	MD:Z:36	NM:i:0
SRR507778.6249	99	I	6404	60	36M	=	2925	-3445	TTTTAAATGCGTTCTGAAGTTTCTTAAATAACCCGG	*	MD:Z:36	NM:i:0
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ samtools view yeast.cram |head -20
SRR507778.19213	147	I	62	60	36M	=	3183	3087	ATCCTAACACTACCCTAACACAGCCCTAATCTAACC	15=@9:@C3<CBGGGDGDBGDFCC?>GGG<GGGDGG	AS:i:36	XS:i:19	MD:Z:36	NM:i:0
SRR507778.12312	147	I	205	60	36M	=	3626	3387	CCACTCACCCACCGTTACCCTCCAATTACCCATATC	GD<B?B>DGGBGGGGGCIIIIGIIIIIEIIGIIIII	AS:i:36	XS:i:26	MD:Z:36	NM:i:0
SRR507778.11604	83	I	402	60	36M	=	3869	3433	CTCACTTGTATACTGATTTTACGTACGCACACGGAT	GHGHIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIII	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.10609	83	I	2661	40	36M	=	6131	3436	TGAATTCGTACAACATTAAACGTGTGTTGGGAGTCG	IIIGFIIGIIHIIIIGIIIIIIIIIIHIIIIIIGII	AS:i:36	XS:i:36	MD:Z:36	NM:i:0
SRR507778.6249	147	I	2925	60	36M	=	6404	3445	TTTCTAAGTGGGATTTTTCTTAATCCTTGGATTCTT	GGGGADIGIHIIHHHIEHIHIHI<DIIIIIIIGIIF	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.14609	129	I	3048	60	36M	IV	1525643	0	AAAAGTAGCCGTTCATTTCCCTTCCGATTTCATTCC	>5833+?=8>B@FBF?9B7AGGGB<G@BGGGGEGD>	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.20233	83	I	3132	60	36M	=	6388	3222	TATTTGTGTCCCATTCTCGTAGATAAAATTCTTGGA	IIIIIIIGGIIIIIIIIIIIIIIIIIIIIIIBIIII	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.19213	99	I	3183	60	36M	=	62	-3087	ATTTTCTTCATAAAGAAGCTTTCAAGATATAAGATA	HHHGHGDAHHHHHEHHHHHHHHGHEGBDGGGGG<GE	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.20882	73	I	3259	60	36M	=	3259	0	CAAAAAGGAAAGCATGGAGGGAAACAGTAAACAGTG	@GGGGGG>GGBD4DDGGEDGDDG@GAA1CBEEEE3D	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.20882	133	I	3259	0	*	=	3259	0	GTGGTGTGTGTGGGTGAGGTGTGGGTGTGGGGAGGG	EGBG8GCB8BBBB#######################	AS:i:0	XS:i:0
SRR507778.12312	99	I	3626	60	36M	=	205	-3387	GTATCTGATGTTTTTTTAGTAATTTCTTTGTAAATA	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHI	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.11604	163	I	3869	60	36M	=	402	-3433	TTTTTGAAAATATTCTGAGGTAAAAGCCATTAAGGT	IIIIIIHIIIIIIIIIIIIIIIIIIIIHIIIIHIIE	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.24515	83	I	4004	60	36M	=	7814	3776	GATGTTTCAAGGCCTGAAGTTTGAATATTTATGTAG	IHIIIIIIIIIIIIHIIIIHIIIIIIIIIIIIIIII	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.19471	83	I	4627	60	36M	=	8153	3492	GGCAGAGTTTCCAAAAAAAATTGTTAATCGACAAAG	HGHBHGGHGHHHGHHHHHHHHGGGGGDDD=BDGGGG	AS:i:36	XS:i:20	MD:Z:36	NM:i:0
SRR507778.15626	83	I	4748	60	36M	=	8861	4079	TTTAAATTGTATTGAGTGCTTCAGTCATTGCAAAAT	IIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIIIIII	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.7265	147	I	4894	60	36M	=	8228	3300	TATCTATCACAAAGGAGACAAAATCGTTGATAAAAA	GGBDGGIIIFIGHHIIDDGGGGGDGDIIDIIEIIII	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.14364	83	I	5516	60	36M	=	9133	3583	TATGATATAAAAACTCGGACCCTGTTTTACTTCTTT	IIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.10609	163	I	6131	60	36M	=	2661	-3436	CATACGTTGATTAGTACTGTTGGTCTCTCATTGAAA	HIHIHIGIIIHIIIHIHIIIGIIIGIEHDIIIHIHG	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.20233	163	I	6388	60	36M	=	3132	-3222	ACCAATTTGACGTTAATTTTAAATGCGTTCTGAAGT	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHII	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.6249	99	I	6404	60	36M	=	2925	-3445	TTTTAAATGCGTTCTGAAGTTTCTTAAATAACCCGG	GGGGGGGGGGHHHHHHGDHHHHGHHFGHHHGFGGGG	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ 
This probably requires setting a compression level, but I am not sure.
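
Another possible explanation comes from comparing the two `samtools view` dumps above: in yeast5.cram the quality column is `*` and the `AS`/`XS` tags are gone, while yeast.cram still has both. The `cramtools cram` help output lists `--lossless-quality-score` and `--capture-all-tags` as off by default, so the size difference may come from discarded data rather than from a compression level. The sketch below re-runs the conversion with those two flags turned on; `cram_lossless` is a hypothetical wrapper name, the output file name is made up for illustration, and whether the result matches the original file size has not been checked here.

```shell
# Sketch: BAM -> CRAM with cramtools, keeping quality scores and tags.
# Both long flags are taken from the `cramtools cram` help text above.
cram_lossless() {
    local jar=$1 bam=$2 ref=$3 out=$4

    if [ ! -f "$jar" ]; then
        echo "cramtools jar not found: $jar" >&2
        return 1
    fi

    java -jar "$jar" cram -I "$bam" -R "$ref" -O "$out" \
        --lossless-quality-score --capture-all-tags
}

# Example call (yeast_lossless.cram is an illustrative output name):
# cram_lossless cramtools-3.0.jar yeast.bam yeast.fasta yeast_lossless.cram
```

Running `samtools view yeast_lossless.cram | head` afterwards would show whether the quality strings and tags survived.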

