Genomic Data Processing 101: SparkBWA Local-Mode Configuration and Example

1. Edit Makefile.common:

LIBBWA_LIBS = -lrt
Change it to

LIBBWA_LIBS = -lrt -lz

Otherwise libbwa.so is built without zlib and the run fails with Error 【5】 (see the appendix).
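The one-line Makefile change can also be scripted with sed. The sketch below is only an illustration: the function name add_zlib_to_libbwa is made up here, and the demo runs on a throwaway file standing in for Makefile.common (its `CC = gcc` line is filler, not taken from SparkBWA).

```shell
# Hypothetical helper: append -lz to the LIBBWA_LIBS line of the file $1.
# Only a line that is exactly "LIBBWA_LIBS = -lrt" is rewritten, so running
# the helper twice does not add -lz a second time.
add_zlib_to_libbwa() {
    makefile="$1"
    tmp="$makefile.tmp.$$"
    sed 's/^LIBBWA_LIBS = -lrt$/LIBBWA_LIBS = -lrt -lz/' "$makefile" > "$tmp" &&
        mv "$tmp" "$makefile"
}

# Demonstration on a temporary stand-in for Makefile.common.
demo_dir=$(mktemp -d)
printf 'CC = gcc\nLIBBWA_LIBS = -lrt\n' > "$demo_dir/Makefile.common"
add_zlib_to_libbwa "$demo_dir/Makefile.common"
patched_line=$(grep '^LIBBWA_LIBS' "$demo_dir/Makefile.common")
echo "$patched_line"
rm -rf "$demo_dir"
```

To use it for real, call add_zlib_to_libbwa on the Makefile.common in the SparkBWA source tree and then run make again.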

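2. After running make, set java.library.path

Steps (on Linux the JVM includes LD_LIBRARY_PATH in its default java.library.path, so the build directory is exported there):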

vi /etc/profile

Add

export LD_LIBRARY_PATH=/home/hadoop/xubo/tools/SparkBWA/build:$LD_LIBRARY_PATH

Apply the change:

source /etc/profile
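A quick way to confirm that the export took effect in the current shell is to look for the build directory among the LD_LIBRARY_PATH entries. This is a minimal sketch; the helper name path_list_contains is an assumption introduced for illustration.

```shell
# Hypothetical helper: succeed if directory $1 is one of the colon-separated
# entries of the path list $2 (for example the value of LD_LIBRARY_PATH).
path_list_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Build directory taken from the export line above.
bwa_build_dir=/home/hadoop/xubo/tools/SparkBWA/build

if path_list_contains "$bwa_build_dir" "${LD_LIBRARY_PATH:-}"; then
    echo "LD_LIBRARY_PATH contains $bwa_build_dir"
else
    echo "LD_LIBRARY_PATH does not contain $bwa_build_dir; run: source /etc/profile"
fi
```

Matching on ":dir:" rather than a plain substring avoids a false positive when a longer path merely starts with the same prefix.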

3. Script for the local run:

spark-submit --class SparkBWA \
--master local \
SparkBWA.jar \
-algorithm mem -reads paired \
-index /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta \
-partitions 3 \
/xubo/alignment/sparkBWA/GRCH38chr1L3556522N10L50paired1.fastq /xubo/alignment/sparkBWA/GRCH38chr1L3556522N10L50paired2.fastq \
/xubo/alignment/output/sparkBWA/datatestLocalGRCH38chr1L3556522N10L50paired12
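Before submitting, it can help to check that the BWA index passed to -index is complete. `bwa index` normally writes five companion files next to the FASTA (.amb, .ann, .bwt, .pac, .sa). The sketch below lists any that are missing; the helper name missing_bwa_index_files is hypothetical.

```shell
# Hypothetical helper: print each BWA index file that is missing for the
# reference prefix $1. The five suffixes are the ones "bwa index" writes.
missing_bwa_index_files() {
    for suffix in amb ann bwt pac sa; do
        [ -f "$1.$suffix" ] || echo "$1.$suffix"
    done
}

# Index prefix taken from the -index argument in the script above.
index_prefix=/home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta
missing=$(missing_bwa_index_files "$index_prefix")
if [ -z "$missing" ]; then
    echo "all BWA index files found for $index_prefix"
else
    echo "missing index files:"
    echo "$missing"
fi
```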

4. Run output:

hadoop@Master:~/xubo/tools/SparkBWA/build$ ./pairedGRCH38L1Local.sh 
[Java_BwaJni_bwa_1jni] Arg 0 'bwa'
[Java_BwaJni_bwa_1jni] Algorithm found 1 'mem'
[Java_BwaJni_bwa_1jni] Arg 1 'mem'
[Java_BwaJni_bwa_1jni] Filename parameter -f found 2 '-f'
[Java_BwaJni_bwa_1jni] Arg 2 '-f'
[Java_BwaJni_bwa_1jni] Filename found 3 '/home/hadoop/cloud/workspace/tmpSparkBWA_GRCH38chr1L3556522N10L50paired1.fastq-3-NoSort-local-1466761250475-0.sam'
[Java_BwaJni_bwa_1jni] Arg 3 '/home/hadoop/cloud/workspace/tmpSparkBWA_GRCH38chr1L3556522N10L50paired1.fastq-3-NoSort-local-1466761250475-0.sam'
[Java_BwaJni_bwa_1jni] Arg 4 '/home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta'
[Java_BwaJni_bwa_1jni] Arg 5 '/home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_1'
[Java_BwaJni_bwa_1jni] Arg 6 '/home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_2'
[Java_BwaJni_bwa_1jni] option[0]: bwa.
[Java_BwaJni_bwa_1jni] option[1]: mem.
[Java_BwaJni_bwa_1jni] option[2]: /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta.
[Java_BwaJni_bwa_1jni] option[3]: /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_1.
[Java_BwaJni_bwa_1jni] option[4]: /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_2.
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 6 sequences (300 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 2, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 6 reads in 0.002 CPU sec, 0.002 real sec
[main] Version: 0.7.12-r1044
[main] CMD: bwa mem /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_1 /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_2
[main] Real time: 0.438 sec; CPU: 9.435 sec
[Java_BwaJni_bwa_1jni] Return code from BWA 0.
[Stage 3:>                                                          (0 + 1) / 3][Java_BwaJni_bwa_1jni] Arg 0 'bwa'
[Java_BwaJni_bwa_1jni] Algorithm found 1 'mem'
[Java_BwaJni_bwa_1jni] Arg 1 'mem'
[Java_BwaJni_bwa_1jni] Filename parameter -f found 2 '-f'
[Java_BwaJni_bwa_1jni] Arg 2 '-f'
[Java_BwaJni_bwa_1jni] Filename found 3 '/home/hadoop/cloud/workspace/tmpSparkBWA_GRCH38chr1L3556522N10L50paired1.fastq-3-NoSort-local-1466761250475-1.sam'
[Java_BwaJni_bwa_1jni] Arg 3 '/home/hadoop/cloud/workspace/tmpSparkBWA_GRCH38chr1L3556522N10L50paired1.fastq-3-NoSort-local-1466761250475-1.sam'
[Java_BwaJni_bwa_1jni] Arg 4 '/home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta'
[Java_BwaJni_bwa_1jni] Arg 5 '/home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_1'
[Java_BwaJni_bwa_1jni] Arg 6 '/home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_2'
[Java_BwaJni_bwa_1jni] option[0]: bwa.
[Java_BwaJni_bwa_1jni] option[1]: mem.
[Java_BwaJni_bwa_1jni] option[2]: /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta.
[Java_BwaJni_bwa_1jni] option[3]: /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_1.
[Java_BwaJni_bwa_1jni] option[4]: /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_2.
[Stage 3:===================>                                       (1 + 1) / 3][M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 8 sequences (400 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 4, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 8 reads in 0.002 CPU sec, 0.001 real sec
[main] Version: 0.7.12-r1044
[main] CMD: bwa mem /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_1 /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_2
[main] Real time: 0.440 sec; CPU: 10.097 sec
[Java_BwaJni_bwa_1jni] Return code from BWA 0.
[Java_BwaJni_bwa_1jni] Arg 0 'bwa'
[Java_BwaJni_bwa_1jni] Algorithm found 1 'mem'
[Java_BwaJni_bwa_1jni] Arg 1 'mem'
[Java_BwaJni_bwa_1jni] Filename parameter -f found 2 '-f'
[Java_BwaJni_bwa_1jni] Arg 2 '-f'
[Java_BwaJni_bwa_1jni] Filename found 3 '/home/hadoop/cloud/workspace/tmpSparkBWA_GRCH38chr1L3556522N10L50paired1.fastq-3-NoSort-local-1466761250475-2.sam'
[Java_BwaJni_bwa_1jni] Arg 3 '/home/hadoop/cloud/workspace/tmpSparkBWA_GRCH38chr1L3556522N10L50paired1.fastq-3-NoSort-local-1466761250475-2.sam'
[Java_BwaJni_bwa_1jni] Arg 4 '/home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta'
[Java_BwaJni_bwa_1jni] Arg 5 '/home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_1'
[Java_BwaJni_bwa_1jni] Arg 6 '/home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_2'
[Java_BwaJni_bwa_1jni] option[0]: bwa.
[Java_BwaJni_bwa_1jni] option[1]: mem.
[Java_BwaJni_bwa_1jni] option[2]: /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta.
[Java_BwaJni_bwa_1jni] option[3]: /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_1.
[Java_BwaJni_bwa_1jni] option[4]: /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_2.
[Stage 3:=======================================>                   (2 + 1) / 3][M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 6 sequences (300 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 1, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 6 reads in 0.002 CPU sec, 0.002 real sec
[main] Version: 0.7.12-r1044
[main] CMD: bwa mem /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_1 /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_2
[main] Real time: 0.429 sec; CPU: 10.584 sec
[Java_BwaJni_bwa_1jni] Return code from BWA 0.
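The log above shows one BWA invocation per partition (RDD0, RDD1, RDD2), each ending with "Return code from BWA 0.", and the three runs read 6 + 8 + 6 = 20 sequences, which matches the 10 read pairs in the input. If the console output is captured to a file, the success lines can be counted and compared with the -partitions value. A minimal sketch, with the helper name count_bwa_success and the sample log file as assumptions:

```shell
# Hypothetical helper: count the lines in log file $1 that report a zero
# return code from the BWA JNI call.
count_bwa_success() {
    grep -c 'Return code from BWA 0\.$' "$1"
}

# Sample log: the three "Return code" lines from the run above plus one
# unrelated line, written to a temporary file for the demonstration.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
[main] Real time: 0.438 sec; CPU: 9.435 sec
[Java_BwaJni_bwa_1jni] Return code from BWA 0.
[Java_BwaJni_bwa_1jni] Return code from BWA 0.
[Java_BwaJni_bwa_1jni] Return code from BWA 0.
EOF

partitions=3   # value passed to -partitions in step 3
succeeded=$(count_bwa_success "$sample_log")
rm -f "$sample_log"

if [ "$succeeded" -eq "$partitions" ]; then
    echo "BWA returned 0 for all $partitions partitions"
else
    echo "only $succeeded of $partitions partitions report return code 0"
fi
```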

5. Input data:

paired1:

@chr1_114727112_114727587_0:0:0_0:0:0_0/1
AGTACTTGAACTGTGCTAGATCATACACCAAATTATCCTGCATTGTTAAG
+
22222222222222222222222222222222222222222222222222
@chr1_59526946_59527392_0:0:0_1:0:0_1/1
ACTGAAGGTAGAGATGCAGGAATACAGTTACCTGTGCAACTATGACTCTA
+
22222222222222222222222222222222222222222222222222
@chr1_21862138_21862675_1:0:0_2:0:0_2/1
GCCCCAGCCATTAGGCCAAATTTACCAGAAGCCTTTCAGGGTTGCAATCC
+
22222222222222222222222222222222222222222222222222
@chr1_192732894_192733437_1:0:0_1:0:0_3/1
AATAGACAACACGAAGAACAGCTGTGAGCAATACAATTAGAAACTTTTTT
+
22222222222222222222222222222222222222222222222222
@chr1_246769496_246770014_0:0:0_2:0:0_4/1
TAATCTGTACGACAAACCCCCATGTCACTTTACCTCTATAACAAACCTGG
+
22222222222222222222222222222222222222222222222222
@chr1_89997487_89998004_0:0:0_2:0:0_5/1
CCTCAGCCTCCCTAGTAGCTGGGACTACAGGCACGCACCACCAGGCCCGG
+
22222222222222222222222222222222222222222222222222
@chr1_100741557_100742038_2:0:0_1:0:0_6/1
GTTCTCTTATATATTCTGAATAGACATTCTTTATGGAAAATACATTTAGC
+
22222222222222222222222222222222222222222222222222
@chr1_6197792_6198312_2:0:0_0:0:0_7/1
TTCAGTCAGTCGGAAAAACAAGATTAACATAACCAGAAACGTCCTAGGTA
+
22222222222222222222222222222222222222222222222222
@chr1_216218212_216218601_0:0:0_3:0:0_8/1
AATATAGTAAGATAACTTTAGTGCAACTTAAATTTCTTGGACCCAAAGTT
+
22222222222222222222222222222222222222222222222222
@chr1_2670698_2671257_2:0:0_0:0:0_9/1
ACCCACACGCCCATGTGAGCCTCTGACAGCCTGGAACAGCACGCGCAAGC
+
22222222222222222222222222222222222222222222222222

paired2:

@chr1_114727112_114727587_0:0:0_0:0:0_0/2
CAGAATGAAAACAATCTCAAGAACAAAAACCAATAAAAACAACTATAGTT
+
22222222222222222222222222222222222222222222222222
@chr1_59526946_59527392_0:0:0_1:0:0_1/2
CATTGTAAGCACTCAACAAGTGTTAGCTACTCCCAGTTGGAAGCTAGAAT
+
22222222222222222222222222222222222222222222222222
@chr1_21862138_21862675_1:0:0_2:0:0_2/2
GATGGATGTAGTTTTTAATTTATCCAATTGCCTATTCATGGATGTTTAGG
+
22222222222222222222222222222222222222222222222222
@chr1_192732894_192733437_1:0:0_1:0:0_3/2
AATAAAATATCATAGGACTAGGAGGCTTAAACAACATTTATTCCTCGCAG
+
22222222222222222222222222222222222222222222222222
@chr1_246769496_246770014_0:0:0_2:0:0_4/2
ACTTGGATTAGTGCCTGGCACATGGTGTAAGCACTTACTAAGTTTCAACG
+
22222222222222222222222222222222222222222222222222
@chr1_89997487_89998004_0:0:0_2:0:0_5/2
ACCCAATTATCTGCAAAACTGAGCATATTTAAAACAAATAATTACCAATA
+
22222222222222222222222222222222222222222222222222
@chr1_100741557_100742038_2:0:0_1:0:0_6/2
AAATGCTTGAACCCGGGAGGCAGAGGTTGCAGTGAGCCAAGATCATGCCA
+
22222222222222222222222222222222222222222222222222
@chr1_6197792_6198312_2:0:0_0:0:0_7/2
GGACAGGCACCTAGAATACTTTGCAACTCCCTTCATGAGAAGGTGGATGA
+
22222222222222222222222222222222222222222222222222
@chr1_216218212_216218601_0:0:0_3:0:0_8/2
TCATTTATCAGCCTTTTTAAGGGTGTTGAATGATTACCCAAGAGATCAGA
+
22222222222222222222222222222222222222222222222222
@chr1_2670698_2671257_2:0:0_0:0:0_9/2
TGTTACAGGCTGTCAGAGGCTCACCTGGGCATGTGGGTGCTGTTCCAGTC
+
22222222222222222222222222222222222222222222222222
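For paired-end input, both FASTQ files should hold the same number of records, in the same order, with names that differ only in the trailing /1 or /2. The sketch below checks this with awk, assuming the plain four-lines-per-record layout shown above. The helper names are hypothetical, and the demo uses only the first two records of each file.

```shell
# Hypothetical helper: print the number of records in FASTQ file $1,
# assuming exactly four lines per record.
fastq_record_count() {
    awk 'END { print NR / 4 }' "$1"
}

# Hypothetical helper: print the header lines of $1 with the trailing
# /1 or /2 removed, so the two files of a pair can be compared.
fastq_read_names() {
    awk 'NR % 4 == 1 { sub(/\/[12]$/, ""); print }' "$1"
}

reads1=$(mktemp)
reads2=$(mktemp)
cat > "$reads1" <<'EOF'
@chr1_114727112_114727587_0:0:0_0:0:0_0/1
AGTACTTGAACTGTGCTAGATCATACACCAAATTATCCTGCATTGTTAAG
+
22222222222222222222222222222222222222222222222222
@chr1_59526946_59527392_0:0:0_1:0:0_1/1
ACTGAAGGTAGAGATGCAGGAATACAGTTACCTGTGCAACTATGACTCTA
+
22222222222222222222222222222222222222222222222222
EOF
cat > "$reads2" <<'EOF'
@chr1_114727112_114727587_0:0:0_0:0:0_0/2
CAGAATGAAAACAATCTCAAGAACAAAAACCAATAAAAACAACTATAGTT
+
22222222222222222222222222222222222222222222222222
@chr1_59526946_59527392_0:0:0_1:0:0_1/2
CATTGTAAGCACTCAACAAGTGTTAGCTACTCCCAGTTGGAAGCTAGAAT
+
22222222222222222222222222222222222222222222222222
EOF

count1=$(fastq_record_count "$reads1")
count2=$(fastq_record_count "$reads2")
if [ "$(fastq_read_names "$reads1")" = "$(fastq_read_names "$reads2")" ]; then
    names_match=yes
else
    names_match=no
fi
rm -f "$reads1" "$reads2"
echo "records: $count1 and $count2; names match: $names_match"
```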

6. Partial results:

@SQ SN:chr1 LN:248956422
@PG ID:bwa  PN:bwa  VN:0.7.12-r1044 CL:bwa mem /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_1 /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD0_2
chr1_100741557_100742038_2:0:0_1:0:0_6  65  chr1    100741557   60  50M =   35207765    -65533793   GTTCTCTTATATATTCTGAATAGACATTCTTTATGGAAAATACATTTAGC  22222222222222222222222222222222222222222222222222  NM:i:2  MD:Z:24A13T11   AS:i:40 XS:i:0
chr1_100741557_100742038_2:0:0_1:0:0_6  129 chr1    35207765    0   50M =   100741557   65533793    AAATGCTTGAACCCGGGAGGCAGAGGTTGCAGTGAGCCAAGATCATGCCA  22222222222222222222222222222222222222222222222222  NM:i:1  MD:Z:2T47   AS:i:47 XS:i:47
chr1_114727112_114727587_0:0:0_0:0:0_0  81  chr1    114727538   60  50M =   114727112   -476    CTTAACAATGCAGGATAATTTGGTGTATGATCTAGCACAGTTCAAGTACT  22222222222222222222222222222222222222222222222222  NM:i:0  MD:Z:50 AS:i:50 XS:i:0
chr1_114727112_114727587_0:0:0_0:0:0_0  161 chr1    114727112   60  50M =   114727538   476 CAGAATGAAAACAATCTCAAGAACAAAAACCAATAAAAACAACTATAGTT  22222222222222222222222222222222222222222222222222  NM:i:0  MD:Z:50 AS:i:50 XS:i:0
chr1_192732894_192733437_1:0:0_1:0:0_3  81  chr1    192733388   60  50M =   192732894   -544    AAAAAAGTTTCTAATTGTATTGCTCACAGCTGTTCTTCGTGTTGTCTATT  22222222222222222222222222222222222222222222222222  NM:i:1  MD:Z:21T28  AS:i:45 XS:i:0
chr1_192732894_192733437_1:0:0_1:0:0_3  161 chr1    192732894   60  50M =   192733388   544 AATAAAATATCATAGGACTAGGAGGCTTAAACAACATTTATTCCTCGCAG  22222222222222222222222222222222222222222222222222  NM:i:1  MD:Z:22T27  AS:i:45 XS:i:0

Part 2:

@SQ SN:chr1 LN:248956422
@PG ID:bwa  PN:bwa  VN:0.7.12-r1044 CL:bwa mem /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_1 /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD1_2
chr1_216218212_216218601_0:0:0_3:0:0_8  97  chr1    216218212   60  50M =   216218552   387 AATATAGTAAGATAACTTTAGTGCAACTTAAATTTCTTGGACCCAAAGTT  22222222222222222222222222222222222222222222222222  NM:i:0  MD:Z:50 AS:i:50 XS:i:0
chr1_216218212_216218601_0:0:0_3:0:0_8  145 chr1    216218552   60  47M3S   =   216218212   -387    TCTGATCTCTTGGGTAATCATTCAACACCCTTAAAAAGGCTGATAAATGA  22222222222222222222222222222222222222222222222222  NM:i:1  MD:Z:1G45   AS:i:45 XS:i:0
chr1_6197792_6198312_2:0:0_0:0:0_7  97  chr1    6197792 60  50M =   6198263 521 TTCAGTCAGTCGGAAAAACAAGATTAACATAACCAGAAACGTCCTAGGTA  22222222222222222222222222222222222222222222222222  NM:i:2  MD:Z:29G13A6    AS:i:40 XS:i:0
chr1_6197792_6198312_2:0:0_0:0:0_7  145 chr1    6198263 60  50M =   6197792 -521    TCATCCACCTTCTCATGAAGGGAGTTGCAAAGTATTCTAGGTGCCTGTCC  22222222222222222222222222222222222222222222222222  NM:i:0  MD:Z:50 AS:i:50 XS:i:0
chr1_59526946_59527392_0:0:0_1:0:0_1    81  chr1    59527343    60  50M =   59526946    -447    TAGAGTCATAGTTGCACAGGTAACTGTATTCCTGCATCTCTACCTTCAGT  22222222222222222222222222222222222222222222222222  NM:i:1  MD:Z:20A29  AS:i:45 XS:i:0
chr1_59526946_59527392_0:0:0_1:0:0_1    161 chr1    59526946    60  50M =   59527343    447 CATTGTAAGCACTCAACAAGTGTTAGCTACTCCCAGTTGGAAGCTAGAAT  22222222222222222222222222222222222222222222222222  NM:i:0  MD:Z:50 AS:i:50 XS:i:19
chr1_246769496_246770014_0:0:0_2:0:0_4  81  chr1    246769965   51  50M =   246769496   -519    CCAGGTTTGTTATAGAGGTAAAGTGACATGGGGGTTTGTCGTACAGATTA  22222222222222222222222222222222222222222222222222  NM:i:2  MD:Z:0G13T35    AS:i:44 XS:i:28
chr1_246769496_246770014_0:0:0_2:0:0_4  161 chr1    246769496   60  50M =   246769965   519 ACTTGGATTAGTGCCTGGCACATGGTGTAAGCACTTACTAAGTTTCAACG  22222222222222222222222222222222222222222222222222  NM:i:0  MD:Z:50 AS:i:50 XS:i:19

Part 3:

@SQ SN:chr1 LN:248956422
@PG ID:bwa  PN:bwa  VN:0.7.12-r1044 CL:bwa mem /home/hadoop/xubo/ref/GRCH38L1Index/GRCH38chr1L3556522.fasta /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_1 /home/hadoop/cloud/workspace/tmplocal-1466761250475-RDD2_2
chr1_89997487_89998004_0:0:0_2:0:0_5    97  chr1    89997487    22  50M =   89997955    518 CCTCAGCCTCCCTAGTAGCTGGGACTACAGGCACGCACCACCAGGCCCGG  22222222222222222222222222222222222222222222222222  NM:i:0  MD:Z:50 AS:i:50 XS:i:43 XA:Z:chr1,-26514021,50M,2;
chr1_89997487_89998004_0:0:0_2:0:0_5    145 chr1    89997955    60  50M =   89997487    -518    TATTGGTAATTATTTGTTTTAAATATGCTCAGTTTTGCAGATAATTGGGT  22222222222222222222222222222222222222222222222222  NM:i:2  MD:Z:22C3T23    AS:i:40 XS:i:19
chr1_2670698_2671257_2:0:0_0:0:0_9  97  chr1    2679088 0   50M =   2666592 -12448  ACCCACACGCCCATGTGAGCCTCTGACAGCCTGGAACAGCACGCGCAAGC  22222222222222222222222222222222222222222222222222  NM:i:2  MD:Z:13G34C1    AS:i:43 XS:i:43
chr1_2670698_2671257_2:0:0_0:0:0_9  145 chr1    2666592 0   50M =   2679088 12448   GACTGGAACAGCACCCACATGCCCAGGTGAGCCTCTGACAGCCTGTAACA  22222222222222222222222222222222222222222222222222  NM:i:0  MD:Z:50 AS:i:50 XS:i:50
chr1_21862138_21862675_1:0:0_2:0:0_2    97  chr1    21862138    60  50M =   21862626    538 GCCCCAGCCATTAGGCCAAATTTACCAGAAGCCTTTCAGGGTTGCAATCC  22222222222222222222222222222222222222222222222222  NM:i:1  MD:Z:2A47   AS:i:47 XS:i:0
chr1_21862138_21862675_1:0:0_2:0:0_2    145 chr1    21862626    60  50M =   21862138    -538    CCTAAACATCCATGAATAGGCAATTGGATAAATTAAAAACTACATCCATC  22222222222222222222222222222222222222222222222222  NM:i:2  MD:Z:20G18G10   AS:i:40 XS:i:0
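The second column of each alignment line is the SAM FLAG, a bit field. The values in the output above (65, 129, 81, 161, 97, 145) can be decoded with shell arithmetic. This is a small sketch: the function name decode_sam_flag and the short bit labels are chosen here for illustration, while the bit values follow the SAM specification.

```shell
# Hypothetical helper: print the names of the SAM flag bits set in the
# integer $1, separated by spaces.
decode_sam_flag() {
    flag=$1
    names=""
    for entry in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                 16:reverse 32:mate_reverse 64:first_in_pair 128:second_in_pair \
                 256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
        bit=${entry%%:*}    # numeric bit value before the colon
        name=${entry#*:}    # label after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            names="$names $name"
        fi
    done
    echo "${names# }"
}

# Decode every flag value that appears in the results above.
for f in 65 129 81 161 97 145; do
    echo "$f: $(decode_sam_flag "$f")"
done
```

None of these flags has the proper_pair bit (2) set, which seems consistent with the "skip orientation ... as there are not enough pairs" lines in the run log: with so few reads per partition BWA does not estimate an insert-size distribution.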

Appendix
Errors:
Error 【5】:

libbwa.so: undefined symbol: gzdopen

Error 【6】:

java.lang.UnsatisfiedLinkError: no bwa in java.library.path
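Both errors can be narrowed down by inspecting the native library itself. The sketch below assumes libbwa.so sits in the build directory exported in step 2 (the source does not state its exact location) and uses a made-up helper name, links_against_zlib. It reports whether the library exists and whether ldd lists zlib among its dependencies.

```shell
# Hypothetical helper: return 0 if shared library $1 lists libz among its
# ldd dependencies, 1 if it does not, and 2 if the file does not exist.
links_against_zlib() {
    [ -f "$1" ] || return 2
    ldd "$1" 2>/dev/null | grep -q 'libz\.so'
}

# Assumed location: the build directory from step 2 plus libbwa.so.
lib=/home/hadoop/xubo/tools/SparkBWA/build/libbwa.so

status=0
links_against_zlib "$lib" || status=$?
case $status in
    0) echo "$lib links against zlib" ;;
    1) echo "$lib has no zlib dependency; rebuild with -lz (Error 【5】)" ;;
    *) echo "$lib not found; check the build directory and LD_LIBRARY_PATH (Error 【6】)" ;;
esac
```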

References

【1】https://github.com/xubo245/AdamLearning
【2】https://github.com/bigdatagenomics/adam/ 
【3】https://github.com/xubo245/SparkLearning
【4】http://spark.apache.org
【5】http://stackoverflow.com/questions/28166667/how-to-pass-d-parameter-or-environment-variable-to-spark-job  
【6】http://stackoverflow.com/questions/28840438/how-to-override-sparks-log4j-properties-per-driver

Publications:

【1】 [BIBM] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Chao Wang, and Xuehai Zhou, "Distributed Gene Clinical Decision Support System Based on Cloud Computing", in IEEE International Conference on Bioinformatics and Biomedicine. (BIBM 2017, CCF B)
【2】 [IEEE CLOUD] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Xuehai Zhou. Efficient Distributed Smith-Waterman Algorithm Based on Apache Spark (CLOUD 2017, CCF-C).
【3】 [CCGrid] Bo Xu, Changlong Li, Hang Zhuang, Jiali Wang, Qingfeng Wang, Jinhong Zhou, Xuehai Zhou. DSA: Scalable Distributed Sequence Alignment System Using SIMD Instructions. (CCGrid 2017, CCF-C).
【4】more: https://github.com/xubo245/Publications

Help

If you have any questions or suggestions, please open an issue in this project or e-mail me at: xubo245@mail.ustc.edu.cn
Wechat: xu601450868
QQ: 601450868