Learning Big Data Development from Scratch: Spark Learning Resources
This series starts from Spark 1.6.0, the latest release at the time of writing. Spark is evolving quickly, so it is worth recording the version number. Source: segmentfault
1. Books
- Learning Spark
- Mastering Apache Spark
2. Websites
- official site
- user mailing list
- spark channel on youtube
- spark summit
- meetup
- spark third party packages
- databricks blog
- databricks docs
- databricks training
- cloudera blog about spark
- https://0x0fff.com
- http://techsuppdiva.github.io/
- **** Spark knowledge base
- 过往记忆
3. Articles and blogs
- RDD paper (English version)
- RDD paper (Chinese translation)
- An Architecture for Fast and General Data Processing on Large Clusters
- How-to: Tune Your Apache Spark Jobs (Part 1)
- How-to: Tune Your Apache Spark Jobs (Part 2)
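- Speed Up Spark 45x with Redis
- 量化派's big data risk-control architecture based on Hadoop, Spark, and Storm
- A heterogeneous distributed deep learning platform based on Spark
- How well do you know the Hadoop and Spark ecosystems?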
- Hadoop vs Spark
- Yahoo open-sources CaffeOnSpark: distributed deep learning on Hadoop/Spark
- Second Shanghai Spark meetup of 2016: 1. spark_meetup.pdf
- Second Shanghai Spark meetup of 2016: 2. Flink_ An unified stream engine.pdf
- Second Shanghai Spark meetup of 2016: 3. Spark在计算广告领域的应用实践.pdf
- Second Shanghai Spark meetup of 2016: 4. splunk_spark.pdf
- Big data for healthcare and finance based on Spark
4. Videos
- YouTube: what is apache spark
- Introduction to Spark Architecture
- Top 5 Mistakes When Writing Spark Applications
- slide Top 5 mistakes when writing Spark applications
- Tuning and Debugging Apache Spark
- slide Tuning and Debugging Apache Spark
- A Deeper Understanding of Spark Internals - Aaron Davidson (Databricks)
- slide A Deeper Understanding of Spark Internals - Aaron Davidson (Databricks)
- Building, Debugging, and Tuning Spark Machine Learning Pipelines - Joseph Bradley (Databricks)
- slide Building, Debugging, and Tuning Spark Machine Learning Pipelines
- Spark DataFrames Simple and Fast Analysis of Structured Data - Michael Armbrust (Databricks)
- slide Spark DataFrames Simple and Fast Analysis of Structured Data - Michael Armbrust (Databricks)
- Spark Tuning for Enterprise System Administrators
- slide Spark Tuning for Enterprise System Administrators
- Structuring Spark: DataFrames, Datasets, and Streaming
- slide Structuring Spark: DataFrames, Datasets, and Streaming
- Spark in Production: Lessons from 100+ Production Users
- slide Spark in Production: Lessons from 100+ Production Users
- Production Spark and Tachyon use Cases
- slide Production Spark and Tachyon use Cases
- SparkUI Visualization
- slide SparkUI Visualization
- Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San Jose 2015
- slide Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San Jose 2015
- Large Scale Distributed Machine Learning on Apache Spark
- Securing your Spark Applications
- slide Securing your Spark Applications
- Building a REST Job Server for Interactive Spark as a Service
- slide Building a REST Job Server for Interactive Spark as a Service
- Exploiting GPUs for Columnar DataFrame Operations
- slide Exploiting GPUs for Columnar DataFrame Operations
- Easy JSON Data Manipulation in Spark - Yin Huai (Databricks)
- slide Easy JSON Data Manipulation in Spark - Yin Huai (Databricks)
- Sparkling: Speculative Partition of Data for Spark Applications - Peilong Li
- slide Sparkling: Speculative Partition of Data for Spark Applications - Peilong Li
- Advanced Spark Internals and Tuning – Reynold Xin
- slide Advanced Spark Internals and Tuning – Reynold Xin
- The Future of Real Time in Spark