Setting up a Spark development environment with IDEA, Scala, and Maven
1. Installing and configuring the Windows development environment
Download and install IDEA; free installation guides are easy to find online.
2. Creating and configuring an IDEA Maven project
1) Configure Maven
2) Create a new Project
3) Select a Maven archetype
4) Name the project
5) Set the Maven home path
6) Generate the Maven project
7) Select the Scala version
8) Create the java and scala source directories
9) Edit the pom.xml file
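For reference, a minimal set of pom.xml dependencies for this project might look like the following. This is a sketch: the version numbers here are assumptions and must be matched to the Spark and Scala versions actually used on your cluster.

```xml
<dependencies>
  <!-- Scala standard library; the version must match the Spark build -->
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.8</version>
  </dependency>
  <!-- Spark core -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.0</version>
  </dependency>
  <!-- Spark SQL (SparkSession lives here) -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
  </dependency>
</dependencies>
```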
3. Developing and locally testing a Spark application
1) Write the WordCount program in IDEA
package com.spark.test

import org.apache.spark.sql.SparkSession

/**
  * Created by z on 2018/4/18.
  */
object test {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("HdfsTest")
      .master("local[2]") // run locally inside IDEA; remove when submitting to a cluster
      .getOrCreate()

    val filePart = "E://stu.txt"

    import spark.implicits._
    // Split each line into words, tag each word with 1, then count occurrences per word
    spark.read.textFile(filePart)
      .flatMap(x => x.split(" "))
      .map(x => (x, 1))
      .groupBy("_1")
      .count()
      .show()

    spark.stop()
  }
}
Result:
18/04/18 23:49:55 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
18/04/18 23:49:55 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
18/04/18 23:49:55 INFO Executor: Finished task 50.0 in stage 9.0 (TID 200). 2540 bytes result sent to driver
18/04/18 23:49:55 INFO TaskSetManager: Finished task 50.0 in stage 9.0 (TID 200) in 4 ms on localhost (executor driver) (75/75)
18/04/18 23:49:55 INFO TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool
18/04/18 23:49:55 INFO DAGScheduler: ResultStage 9 (show at test.scala:22) finished in 0.363 s
18/04/18 23:49:55 INFO DAGScheduler: Job 4 finished: show at test.scala:22, took 0.372613 s
18/04/18 23:49:55 INFO CodeGenerator: Code generated in 5.24118 ms
+--------+-----+
| _1|count|
+--------+-----+
| dfd| 2|
| ha| 3|
| hh| 3|
|dsfsdfsd| 1|
| sdfdsf| 1|
+--------+-----+
18/04/18 23:49:55 INFO SparkContext: Invoking stop() from shutdown hook
18/04/18 23:49:55 INFO SparkUI: Stopped Spark web UI at http://192.168.143.1:4040
18/04/18 23:49:55 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/04/18 23:49:55 INFO MemoryStore: MemoryStore cleared
18/04/18 23:49:55 INFO BlockManager: BlockManager stopped
18/04/18 23:49:55 INFO BlockManagerMaster: BlockManagerMaster stopped
18/04/18 23:49:55 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/04/18 23:49:55 INFO SparkContext: Successfully stopped SparkContext
18/04/18 23:49:55 INFO ShutdownHookManager: Shutdown hook called
18/04/18 23:49:55 INFO ShutdownHookManager: Deleting directory C:\Users\wangz\AppData\Local\Temp\spark-bce58627-23a7-4a65-8552-ceeb8112fba2
Process finished with exit code 0
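For comparison, the same word count can also be written with the classic RDD API instead of the Dataset API. The following is a minimal sketch (a hypothetical alternative, assuming the same Spark version and the same local input file), where reduceByKey sums the per-word counts:

```scala
package com.spark.test

import org.apache.spark.sql.SparkSession

// RDD-based WordCount sketch
object WordCountRdd {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("WordCountRdd")
      .master("local[2]") // local testing only
      .getOrCreate()

    val counts = spark.sparkContext.textFile("E://stu.txt")
      .flatMap(_.split(" "))   // lines -> words
      .map((_, 1))             // word -> (word, 1)
      .reduceByKey(_ + _)      // sum the 1s per word

    counts.collect().foreach(println)
    spark.stop()
  }
}
```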
4. Packaging the Spark application
1) Build the project jar, using the packaging approach covered earlier
2) Submit the job with spark-submit
bin/spark-submit --master local[2] --class com.spark.test.test /opt/jars/sparkStu.jar hdfs://bigdata-pro01.kfk.com:9000/user/data/stu.txt
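Note that the WordCount program above hardcodes filePart = "E://stu.txt", so the HDFS path passed on the spark-submit command line is ignored. A small adjustment (a sketch, not the original code) reads the input path from the first command-line argument, so the same jar works both locally and against HDFS:

```scala
package com.spark.test

import org.apache.spark.sql.SparkSession

object test {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("HdfsTest").getOrCreate()

    // Take the input path from the command line, falling back to the local file
    val filePath = if (args.nonEmpty) args(0) else "E://stu.txt"

    import spark.implicits._
    spark.read.textFile(filePath)
      .flatMap(x => x.split(" "))
      .map(x => (x, 1))
      .groupBy("_1").count()
      .show()

    spark.stop()
  }
}
```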