Spark Summit 2017 SF Notes

 

Some personal takeaways after going through part of the Spark Summit 2017 slide decks:

  • Apache Kylin: Speed Up Cubing with Apache Spark (Luke Han and Shaofeng Shi): Kylin uses Spark to speed up cube building, which previously ran on MapReduce.
  • Many talks on Spark applications for IoT, mainly ETL and real-time analysis.
  • A Deep Dive into Spark SQL's Catalyst Optimizer (Yin Huai): the optimizations Spark SQL applies to query plans.
  • Apache Spark and Apache Ignite: Where Fast Data Meets the IoT (Denis Magda): Ignite is a distributed in-memory system for SQL analytics over big data.
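  • Best Practices for Using Alluxio with Spark: Alluxio caches a single copy of a file. Multiple Spark applications then avoid reading it repeatedly and holding duplicate copies in memory.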
  • Cost-Based Optimizer in Apache Spark 2.2: is this where Spark's CBO was first introduced?
  • Demystifying DataFrame and Dataset (Kazuaki Ishizaki): approaches to speeding up Datasets in Spark 2.2, covering data conversion (boxing/unboxing), serialization, and bytecode generation.
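For the Kylin talk, "cube building" means precomputing an aggregate for every combination of dimensions. Each combination is called a cuboid. Below is a minimal pure-Python sketch of that computation on a toy in-memory table. It assumes a single SUM measure, and the function and column names are hypothetical. This is not Kylin's Spark cubing code, which distributes this kind of work across executors.

```python
from collections import defaultdict
from itertools import combinations

def build_cube(rows, dimensions, measure):
    """Return {cuboid: {key: sum}} for every subset of `dimensions` (hypothetical helper).

    A cuboid is a tuple of dimension names; each key is the tuple of those
    dimensions' values in a row. The empty cuboid () holds the grand total.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for cuboid in combinations(dimensions, size):
            agg = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in cuboid)
                agg[key] += row[measure]
            cube[cuboid] = dict(agg)
    return cube

# Toy fact table with two dimensions and one measure (made-up data).
sales = [
    {"region": "east", "product": "a", "amount": 10},
    {"region": "east", "product": "b", "amount": 5},
    {"region": "west", "product": "a", "amount": 7},
]
cube = build_cube(sales, ["region", "product"], "amount")
print(cube[()])            # {(): 22}
print(cube[("region",)])   # {('east',): 15, ('west',): 7}
```

With n dimensions this yields 2^n cuboids. The naive version above rescans the base table for each one. A real cube engine would typically derive a child cuboid from an already-aggregated parent instead.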
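For the Catalyst talk, the core idea is that queries are trees. Optimizations are rules that pattern-match a subtree and rewrite it, and they run repeatedly until the tree stops changing. Below is a small Python sketch of a constant-folding rule, modeled loosely on Catalyst's `ConstantFolding`. The `Lit`/`Attr`/`Add` node classes are hypothetical stand-ins for Catalyst's Scala `TreeNode` expressions and their `transformUp` method.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Lit:          # hypothetical literal node
    value: int

@dataclass(frozen=True)
class Attr:         # hypothetical column reference
    name: str

@dataclass(frozen=True)
class Add:          # hypothetical binary addition
    left: "Expr"
    right: "Expr"

Expr = Union[Lit, Attr, Add]

def transform_up(expr, rule):
    """Apply `rule` bottom-up: rewrite children first, then the node itself."""
    if isinstance(expr, Add):
        expr = Add(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)

def constant_folding(expr):
    """Replace Add(Lit, Lit) with one Lit; leave every other node unchanged."""
    if isinstance(expr, Add) and isinstance(expr.left, Lit) and isinstance(expr.right, Lit):
        return Lit(expr.left.value + expr.right.value)
    return expr

def optimize(expr, rules, max_iterations=100):
    """Run all rules repeatedly until the tree reaches a fixed point."""
    for _ in range(max_iterations):
        new_expr = expr
        for rule in rules:
            new_expr = transform_up(new_expr, rule)
        if new_expr == expr:
            break
        expr = new_expr
    return expr

# x + (1 + 2)  becomes  x + 3
tree = Add(Attr("x"), Add(Lit(1), Lit(2)))
print(optimize(tree, [constant_folding]))
```

Because the nodes are immutable, a rule returns a new tree rather than mutating the old one. That is also what makes the "did anything change?" fixed-point check a simple equality test.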
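For the CBO talk, a cost-based optimizer needs row-count estimates. It derives them from column statistics such as min, max, and number of distinct values (NDV), which Spark 2.2 collects with `ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS`. The sketch below uses two textbook estimates:

- range-predicate selectivity under a uniform-distribution assumption;
- equi-join cardinality as |L|·|R| / max(NDV_L, NDV_R).

The `ColumnStats` class and both functions are hypothetical illustrations, not Spark's estimator classes.

```python
from dataclasses import dataclass

@dataclass
class ColumnStats:   # hypothetical per-column statistics
    min_value: float
    max_value: float
    ndv: int         # number of distinct values

def less_than_selectivity(stats, value):
    """Estimated fraction of rows with col < value, assuming uniform values in [min, max]."""
    if value <= stats.min_value:
        return 0.0
    if value > stats.max_value:
        return 1.0
    return (value - stats.min_value) / (stats.max_value - stats.min_value)

def equi_join_rows(left_rows, left_stats, right_rows, right_stats):
    """Estimated output rows of left.k = right.k: |L| * |R| / max(ndv_L, ndv_R)."""
    return left_rows * right_rows / max(left_stats.ndv, right_stats.ndv)

age = ColumnStats(min_value=0, max_value=100, ndv=101)
print(1_000_000 * less_than_selectivity(age, 25))   # 250000.0

cust_id = ColumnStats(min_value=1, max_value=1_000_000, ndv=1_000_000)
print(equi_join_rows(10_000_000, cust_id, 1_000_000, cust_id))   # 10000000.0
```

A join-reordering step would compare estimates like these across candidate join orders and pick the cheapest. The quality of the plan therefore depends directly on how fresh and accurate the collected statistics are.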
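For the DataFrame/Dataset talk, much of the cost being discussed is converting between a compact binary row format and language-level objects (serialization, boxing/unboxing). The Python analogy below uses `struct` to pack records into a fixed-width binary buffer. One function reads a single field at a known offset; the other decodes every row into tuples first. The 16-byte layout is a hypothetical assumption, not Spark's actual `UnsafeRow` format. In Python each value read still becomes an object, so this only illustrates the layout idea, not the JVM-level savings.

```python
import struct

# Hypothetical fixed-width row layout: (id: int64, price: float64), 16 bytes, no padding.
ROW = struct.Struct("<qd")

def encode(records):
    """Serialize (id, price) tuples into one contiguous little-endian buffer."""
    buf = bytearray(ROW.size * len(records))
    for i, rec in enumerate(records):
        ROW.pack_into(buf, i * ROW.size, *rec)
    return bytes(buf)

def sum_prices_binary(buf):
    """Sum the price field by reading it at a fixed offset, without decoding whole rows."""
    price_offset = 8  # the int64 id occupies the first 8 bytes of each row
    return sum(
        struct.unpack_from("<d", buf, off + price_offset)[0]
        for off in range(0, len(buf), ROW.size)
    )

def sum_prices_objects(buf):
    """Decode every row into a tuple first (the 'deserialize, then compute' path)."""
    rows = [ROW.unpack_from(buf, off) for off in range(0, len(buf), ROW.size)]
    return sum(price for _id, price in rows)

buf = encode([(1, 2.5), (2, 4.0), (3, 1.5)])
print(len(buf))                # 48
print(sum_prices_binary(buf))  # 8.0
```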