IDEA + Spark + Scala + Maven Environment Setup

1. Installing and configuring the Windows development environment

Download and install IntelliJ IDEA; free installation guides are easy to find online.

2. Creating and configuring a Maven project in IDEA

1) Configure Maven

2) Create a new Project

3) Select the Maven archetype

4) Name the project

5) Set the Maven installation path

6) Generate the Maven project

7) Select the Scala SDK version

8) Create the java and scala source directories

9) Edit pom.xml
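The post does not show the pom.xml contents. A minimal sketch of the dependency section is below; the version numbers are illustrative and should match your installed Scala and Spark versions (the code in the next section uses SparkSession, so spark-sql is required in addition to spark-core):

```xml
<!-- Illustrative versions; align these with your cluster's Spark/Scala build -->
<properties>
  <scala.version>2.11.8</scala.version>
  <spark.version>2.2.0</spark.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
```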

3. Developing a Spark application and testing it locally

1) Write the WordCount program in IDEA

package com.spark.test

import org.apache.spark.sql.SparkSession

/**
  * Created by z on 2018/4/18.
  */
object test {
  def main(args: Array[String]): Unit = {
    // Run locally with two threads; remove .master(...) when
    // submitting to a cluster with spark-submit.
    val spark = SparkSession
      .builder
      .master("local[2]")
      .appName("HdfsTest")
      .getOrCreate()

    // Take the input path from the command line if given (e.g. an HDFS
    // path when submitted with spark-submit), otherwise use the local test file.
    val filePath = if (args.nonEmpty) args(0) else "E://stu.txt"

    import spark.implicits._
    // Split each line on spaces, pair each word with 1, and count per word.
    spark.read.textFile(filePath)
      .flatMap(x => x.split(" "))
      .map(x => (x, 1))
      .groupBy("_1").count()
      .show()

    spark.stop()
  }
}


Result (console output):


18/04/18 23:49:55 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
18/04/18 23:49:55 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
18/04/18 23:49:55 INFO Executor: Finished task 50.0 in stage 9.0 (TID 200). 2540 bytes result sent to driver
18/04/18 23:49:55 INFO TaskSetManager: Finished task 50.0 in stage 9.0 (TID 200) in 4 ms on localhost (executor driver) (75/75)
18/04/18 23:49:55 INFO TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool 
18/04/18 23:49:55 INFO DAGScheduler: ResultStage 9 (show at test.scala:22) finished in 0.363 s
18/04/18 23:49:55 INFO DAGScheduler: Job 4 finished: show at test.scala:22, took 0.372613 s
18/04/18 23:49:55 INFO CodeGenerator: Code generated in 5.24118 ms
+--------+-----+
|      _1|count|
+--------+-----+
|     dfd|    2|
|      ha|    3|
|      hh|    3|
|dsfsdfsd|    1|
|  sdfdsf|    1|
+--------+-----+


18/04/18 23:49:55 INFO SparkContext: Invoking stop() from shutdown hook
18/04/18 23:49:55 INFO SparkUI: Stopped Spark web UI at http://192.168.143.1:4040
18/04/18 23:49:55 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/04/18 23:49:55 INFO MemoryStore: MemoryStore cleared
18/04/18 23:49:55 INFO BlockManager: BlockManager stopped
18/04/18 23:49:55 INFO BlockManagerMaster: BlockManagerMaster stopped
18/04/18 23:49:55 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/04/18 23:49:55 INFO SparkContext: Successfully stopped SparkContext
18/04/18 23:49:55 INFO ShutdownHookManager: Shutdown hook called
18/04/18 23:49:55 INFO ShutdownHookManager: Deleting directory C:\Users\wangz\AppData\Local\Temp\spark-bce58627-23a7-4a65-8552-ceeb8112fba2


Process finished with exit code 0
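The same word count can also be expressed with Spark's RDD API instead of the Dataset API, using reduceByKey to aggregate counts per word. A minimal sketch; the object name and paths are illustrative:

```scala
package com.spark.test

import org.apache.spark.sql.SparkSession

// RDD-based word count: reduceByKey sums the 1s emitted per word,
// avoiding the Dataset/DataFrame groupBy used above.
object WordCountRdd {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[2]")
      .appName("WordCountRdd")
      .getOrCreate()

    spark.sparkContext.textFile("E://stu.txt")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)

    spark.stop()
  }
}
```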


4. Packaging the Spark application

1) Package the project into a jar, following the packaging steps covered earlier.
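With a standard Maven layout the jar can also be built from the command line; a minimal sketch, assuming the Scala compilation plugin is configured in pom.xml:

```shell
# Compile and package from the project root; the jar lands in target/.
mvn clean package
```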

2) Submit the job with spark-submit

bin/spark-submit --master local[2] /opt/jars/sparkStu.jar hdfs://bigdata-pro01.kfk.com:9000/user/data/stu.txt
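If the jar's manifest does not declare a main class, spark-submit needs it passed explicitly with --class. A sketch using the package and object name from the example above:

```shell
# --class names the entry point; the trailing argument is read as args(0).
bin/spark-submit \
  --master local[2] \
  --class com.spark.test.test \
  /opt/jars/sparkStu.jar \
  hdfs://bigdata-pro01.kfk.com:9000/user/data/stu.txt
```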