Adam Learning, Part 7: Improving the kmer.scala Code (Statistics and SaveAsFile)

For more code, see: https://github.com/xubo245/SparkLearning



Code:

package testAdam

import java.text.SimpleDateFormat
import java.util.Date

import org.apache.spark._
import org.bdgenomics.adam.rdd.ADAMContext
import org.bdgenomics.adam.projections.{AlignmentRecordField, Projection}

object kmer {
  def main(args: Array[String]) {
    // Run locally; use the commented variant (no setMaster) when submitting to a cluster.
    val conf = new SparkConf().setAppName("test Adam kmer").setMaster("local")
    // val conf = new SparkConf().setAppName("test Adam kmer")
    val sc = new SparkContext(conf)
    val ac = new ADAMContext(sc)

    // Load alignments from disk. "Master" is a placeholder for the real IP of the master node.
    // val reads = ac.loadAlignments("/data/NA21144.chrom11.ILLUMINA.adam",
    // val reads = ac.loadAlignments("/xubo/adam/output/small.adam",
    val reads = ac.loadAlignments(
      "hdfs://Master:9000/xubo/adam/output/small.adam",
      projection = Some(Projection(
        AlignmentRecordField.sequence,
        AlignmentRecordField.readMapped,
        AlignmentRecordField.mapq)))

    // Generate, count and sort 21-mers; after the swap each element is (count, kmer)
    val kmers = reads
      .flatMap(_.getSequence.sliding(21).map(k => (k, 1L)))
      .reduceByKey(_ + _)
      .map(_.swap)
      .sortByKey(ascending = false)

    // Print the top 10 most common 21-mers
    kmers.take(10).foreach(println)

    // SaveAsFile: write the result as text under a timestamped directory
    val iString = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(new Date())
    val soutput = "hdfs://Master:9000/xubo/adam/output/kmer/" + iString + "/smallkmers21.adam"

    // Number of distinct 21-mers (after reduceByKey)
    println("kmers.count(reduceByKey):" + kmers.count)
    kmers.saveAsTextFile(soutput)

    // Total number of 21-mers: the sum of all counts, i.e. the size before reduceByKey
    val sum0 = for ((a, b) <- kmers) yield a
    println("kmers.count(no reduce):" + sum0.sum)
    sc.stop()
  }
}
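
To see what the flatMap / reduceByKey / swap / sortByKey chain computes without a Spark cluster, here is a minimal sketch on plain Scala collections. The object name KmerSketch, the helper countKmers, and the toy reads are assumptions made for illustration; they are not part of ADAM or of the code above, and groupBy stands in for reduceByKey.

```scala
object KmerSketch {
  // Count k-mers across reads and return (count, kmer) pairs, most frequent first.
  // Plain-collection stand-in for flatMap + reduceByKey + swap + sortByKey.
  def countKmers(reads: Seq[String], k: Int): Seq[(Long, String)] =
    reads
      .flatMap(_.sliding(k))                            // every window of length k
      .groupBy(identity)                                // kmer -> all of its occurrences
      .toSeq                                            // leave Map before building (count, kmer)
      .map { case (kmer, hits) => (hits.size.toLong, kmer) }
      .sortBy { case (count, kmer) => (-count, kmer) }  // descending count, ties by kmer

  def main(args: Array[String]): Unit = {
    val reads = Seq("ACGTAC", "CGTACG")                 // toy reads, not real data
    val kmers = countKmers(reads, 3)
    kmers.foreach(println)
    println("distinct k-mers: " + kmers.size)
    println("total k-mers: " + kmers.map(_._1).sum)
  }
}
```

One detail worth knowing: Scala's sliding returns a single shorter window when the input is shorter than k (for example "AC".sliding(3) yields just "AC"), so reads shorter than 21 bases would contribute one short "k-mer" unless they are filtered out first.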


"Master" must be replaced with the real IP address.
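
One way to avoid hard-coding the address is to pass the host in and build both paths from it. This is a minimal sketch, assuming a hypothetical helper object HdfsPaths and a placeholder host 192.0.2.10 (a documentation-only address); port 9000 and the directory layout are taken from the code above.

```scala
import java.text.SimpleDateFormat
import java.util.Date

object HdfsPaths {
  // Build an hdfs:// URI from a host (hypothetical helper, not an ADAM or Spark API).
  def hdfsUri(host: String, path: String, port: Int = 9000): String =
    "hdfs://" + host + ":" + port + path

  // Timestamped output location, following the same pattern as soutput in the code above.
  def kmerOutput(host: String, now: Date = new Date()): String = {
    val stamp = new SimpleDateFormat("yyyyMMddHHmmssSSS").format(now)
    hdfsUri(host, "/xubo/adam/output/kmer/" + stamp + "/smallkmers21.adam")
  }

  def main(args: Array[String]): Unit = {
    // Take the host from the first argument, else fall back to the placeholder address.
    val host = if (args.nonEmpty) args(0) else "192.0.2.10"
    println(hdfsUri(host, "/xubo/adam/output/small.adam"))
    println(kmerOutput(host))
  }
}
```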


Output:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/G:/149/jar%e9%87%8d%e8%a6%81/spark-assembly-1.5.2-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/1win7/java/otherJar/adam-cli_2.10-0.18.3-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2016-03-07 11:13:28 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-03-07 11:13:34 WARN  MetricsSystem:71 - Using default name DAGScheduler for source because spark.app.id is not set.
2016-03-07 11:13:38 WARN  :139 - Your hostname, xubo-PC resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:c0a8:16c%17, but we couldn't find any external IP address!
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
(4,TCTTTCTTTCTTTCTTTCTTT)
(4,TTTCTTTCTTTCTTTCTTTCT)
(3,CTTTCTTTCTTTCTTTCTTTC)
(3,TTCTTTCTTTCTTTCTTTCTT)
(2,TCTTTTTCTTTCTTTCTTTCT)
(2,TTCTTTTTCTTTCTTTCTTTC)
(2,TTTCTTTTTCTTTCTTTCTTT)
(1,ATTGGATATCCTCCCAAATTT)
(1,AGGCATGAGGCACCGCGCCTG)
(1,CTACTGCCCAACAAGTCCCTA)
kmers.count(reduceByKey):1087
kmers.count(no reduce):1100.0
2016-3-7 11:14:15 INFO: org.apache.parquet.hadoop.ParquetInputFormat: Total input paths to process : 1
2016-3-7 11:14:16 WARNING: org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
2016-3-7 11:14:17 INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 20 records.
2016-3-7 11:14:17 INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
2016-3-7 11:14:17 INFO: org.apache.parquet.hadoop.InternalParquetRecordReader: block read in memory in 69 ms. row count = 20
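
The two counts can be cross-checked against the log, which reports 20 records. A read of length L contributes L - 21 + 1 windows, so if all 20 reads had the same length (an assumption; the read lengths are not stated here), 1100 windows would mean 55 per read, i.e. 75-base reads. The difference 1100 - 1087 = 13 is the number of repeated occurrences merged by reduceByKey, and it matches the counts above 1 in the top-10 list. The sketch below only redoes this arithmetic; KmerCountCheck and windows are hypothetical names.

```scala
object KmerCountCheck {
  // Number of full-length k windows in a read of the given length.
  def windows(readLength: Int, k: Int): Int = {
    require(readLength >= k, "read must be at least k bases long")
    readLength - k + 1
  }

  def main(args: Array[String]): Unit = {
    val total = 1100L       // "kmers.count(no reduce)" from the output
    val distinct = 1087L    // "kmers.count(reduceByKey)" from the output
    val records = 20L       // record count reported in the Parquet log
    val perRead = total / records          // windows per read, assuming equal-length reads
    val readLength = perRead + 21 - 1      // implied read length under the same assumption
    val repeatedCounts = Seq(4L, 4L, 3L, 3L, 2L, 2L, 2L)  // counts above 1 in the top 10
    println("windows per read: " + perRead)
    println("implied read length: " + readLength)
    println("duplicates merged: " + (total - distinct))
    println("explained by top 10: " + repeatedCounts.map(_ - 1).sum)
  }
}
```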

The saved file can be viewed in HDFS; it is fairly long, so it is not listed here.
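
Since saveAsTextFile writes one element per line using its toString, each line of the saved part files should look like the tuples printed in the output, for example (4,TCTTTCTTTCTTTCTTTCTTT). A minimal sketch for reading such lines back into (count, kmer) pairs follows; KmerLineParser is a hypothetical helper, not part of Spark or ADAM.

```scala
object KmerLineParser {
  // Matches a whole line of the form "(<digits>,<kmer>)".
  private val Line = """\((\d+),([^)]*)\)""".r

  // Parse one saved line back into (count, kmer); None if the line has another shape.
  def parse(line: String): Option[(Long, String)] = line.trim match {
    case Line(count, kmer) => Some((count.toLong, kmer))
    case _                 => None
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("(4,TCTTTCTTTCTTTCTTTCTTT)", "(1,ATTGGATATCCTCCCAAATTT)", "not a tuple")
    // Lines that fail to parse are dropped.
    sample.flatMap(line => parse(line)).foreach(println)
  }
}
```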