Contents

Click any link below to jump to that section.

Setting Up

​Spark Environment Setup​
​JDK setup​
​Scala IDE​
​Summary​
​Dockerfile​
​Hadoop configuration​
​Hadoop setup​
​HDFS​
​Start Hadoop​
​Install Hive​
Hive home
​Initialize hive schema​
​Hive client​
​Setup Apache Spark​
​Spark Home​

Python and Scala Prep

​Python 3 Warm Up​

​Basics​
​Iterables/Collections​
​Strings​
​List​
​Tuple​
​Dictionary​
​Set​
​Conditional statement​
​Functions and methods​
​map and filter​
​lambda​
​Data structure​
​Input and if statement​
​Input from a file​
​Output to a file​
​Python coding exercise​

​Scala Warm Up​

​Scala Data Type​
​Array in Scala​
​Methods​
​Class​
​Objects​
​Trait​
​Scala if statement​
​Scala for loop​
Scala while loop
​Scala coding exercise​

Spark Core

​Spark Core Introduction​

​Spark and Scala Version​
​Basic Spark Package​
​RDD Operations​
​RDD Action Functions​

Spark SQL

Spark SQL Introduction

​SQL​
Datasets vs DataFrames
​SparkSession​
​Creating DataFrames​
​Creating Datasets​
​Interoperating with RDD​
​Save Mode​
​Apache Arrow​
​Example: Enrich JSON​

Spark Streaming

​map(func)​
​filter(func)​
​union(otherStream)​
​reduce(func)​
​count()​
​countByValue()​
​transform(func)​
​updateStateByKey(func)​
​Window DStream print(n)​
​foreachRDD(func)​

Spark GraphX

​Spark Graph Computing​

GraphX

​Edge Class​
​EdgeContext Class​
​EdgeDirection Class​
​EdgeRDD Class​
​EdgeTriplet Class​
​Graph Class​
​GraphLoader Object​
​GraphOps Class​
​GraphXUtils Object​
​PartitionStrategy Trait​
​Pregel Object​
​TripletFields Class​
​VertexRDD Class​
​EdgeRDDImpl Class​
​Class GraphImpl​
​Class VertexRDDImpl​
​Class LabelPropagation​
​Class PageRank​
​Class ShortestPaths​
​Class SVDPlusPlus​
​Class SVDPlusPlus.Conf​
​Class TriangleCount​
​Class BytecodeUtils​
​Class GraphGenerators​
GraphX Example 1
GraphX Example 2
GraphX Example 3

Spark Machine Learning

​Binary Classification​

​Regression​

​Correlation​

​Image Data Source​
​ML Transformer​
​ML Estimator​
​ML Pipeline​
​TF-IDF​
​Word2Vec​
​FeatureHasher​
​Tokenizer​
​CountVectorizer​
StopWordsRemover
​n-gram​
​Binarizer​
​PCA​
​PolynomialExpansion​
​StringIndexer​
​One-hot encoding​
​StandardScaler​
​IndexToString​
​VectorIndexer​
​Interaction​
​Normalizer​
​MinMaxScaler​
MaxAbsScaler
​Bucketizer​
​ElementwiseProduct​
​SQLTransformer​
​VectorAssembler​
​VectorSizeHint​
​QuantileDiscretizer​
​Imputer​
​VectorSlicer​
​RFormula​
​ChiSqSelector​
​LogisticRegression​
​OneVsRest​
​Naive Bayes classifiers​
​Decision trees​
​Random forests​
​Linear Regression​
​Isotonic regression​
​Decision Tree Regression​
​Random Forest Regression​
​Survival regression​

​Clustering​

​k-means​
​Bisecting k-means​

Gaussian Mixture Model

​Collaborative filtering​
​Frequent Pattern Mining​
​FP-Growth​
​PrefixSpan​

​Cross-Validation​

​Train-Validation Split​

Appendix

​Video presentation​

​References​