Contents
Click any link below to jump to that section.

Setting Up

​JDK setup​
​Scala IDE​
​Summary​
​Dockerfile​
​Hadoop setup​
​HDFS​
​Start Hadoop​
​Install Hive​
Hive Home
​Hive client​
​Setup Apache Spark​
​Spark Home​

Python and Scala Prep

​Python 3 Warm Up​

​Basics​
​Strings​
​List​
​Tuple​
​Dictionary​
​Set​
​map and filter​
​lambda​
​Data structure​
​Input from a file​
​Output to a file​

​Scala Warm Up​

​Scala Data Type​
​Array in Scala​
​Methods​
​Class​
​Objects​
​Trait​
​Scala if statement​
​Scala for loop​
​Scala While Loop​

Spark Core

​RDD Operations​

Spark SQL

​SQL​
​SparkSession​
​Creating Datasets​
​Save Mode​
​Apache Arrow​

Spark Streaming

​map(func)​
​filter(func)​
​union(otherStream)​
​reduce(func)​
​count()​
​countByValue()​
​transform(func)​
​foreachRDD(func)​

Spark GraphX

GraphX

​Edge Class​
​EdgeContext Class​
​EdgeRDD Class​
​EdgeTriplet Class​
​Graph Class​
​GraphLoader Object​
​GraphOps Class​
​GraphXUtils Object​
​Pregel Object​
​VertexRDD Class​
​EdgeRDDImpl Class​
​Class GraphImpl​
​Class PageRank​
​Class SVDPlusPlus​
GraphX Example 1
GraphX Example 2
GraphX Example 3

Spark Machine Learning

​Regression​

​Correlation​

​Image Data Source​
​ML Transformer​
​ML Estimator​
​ML Pipeline​
​TF-IDF​
​Word2Vec​
​FeatureHasher​
​Tokenizer​
​CountVectorizer​
StopWordsRemover
​n-gram​
​Binarizer​
​PCA​
​StringIndexer​
​One-hot encoding​
​StandardScaler​
​IndexToString​
​VectorIndexer​
​Interaction​
​Normalizer​
​MinMaxScaler​
MaxAbsScaler
​Bucketizer​
​ElementwiseProduct​
​SQLTransformer​
​VectorAssembler​
​VectorSizeHint​
​Imputer​
​VectorSlicer​
​RFormula​
​ChiSqSelector​
​LogisticRegression​
​OneVsRest​
​Decision trees​
​Random forests​
​Linear Regression​

​Clustering​

​k-means​
​Bisecting k-means​
​FP-Growth​
​PrefixSpan​

​Cross-Validation​

Appendix

​Video presentation​

​References​
