Saturday, 19 January 2019

How to configure Maven in IntelliJ IDEA to run Spark programs?

Run IntelliJ IDEA.

File - New - Project
Maven - Project SDK : 1.8 - Next
GroupId : SparkExample
ArtifactId : BigData
Finish

Start - Run - CMD as Administrator
spark-shell
Spark version 2.4.0
Scala version 2.11.12
Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_191

// Make a note of the above version numbers. We need to pick the corresponding Maven packages from the Maven repository using this version information.
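If the banner has already scrolled past, the same details can be read back from inside spark-shell itself (a quick sketch; sc is the SparkContext that spark-shell creates automatically):

// Run inside spark-shell to confirm the versions the pom.xml must match
sc.version                            // Spark version, e.g. 2.4.0
scala.util.Properties.versionString   // Scala version, e.g. version 2.11.12
System.getProperty("java.version")    // JVM version, e.g. 1.8.0_191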

Do a Google search for : maven repository spark
Central:
https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11/2.4.0

Copy the following from there:
Maven :
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.0</version>
</dependency>

Add a repository too. Put everything into pom.xml, which will then look like the following. Note that the _2.11 suffix in the artifactId must match the Scala version reported by spark-shell (2.11.12), and the dependency version must match the Spark version (2.4.0).

pom.xml
-------
The sections taken from the Maven repository sit inside the <project> element that the New Project wizard generated (groupId SparkExample, artifactId BigData; 1.0-SNAPSHOT is the wizard's default version), so the complete file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>SparkExample</groupId>
    <artifactId>BigData</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.0</version>
        </dependency>
    </dependencies>
    <repositories>
        <repository>
            <id>my-repo1</id>
            <name>cloudera-repo</name>
            <url>https://repository.cloudera.com/content/repositories/releases/</url>
        </repository>
    </repositories>
</project>
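Once the Scala SDK is attached in the next step, a tiny throwaway object can confirm that Maven actually resolved the dependency. This is just a sketch: the object name VersionCheck is made up here, but org.apache.spark.SPARK_VERSION is a real constant shipped in spark-core:

object VersionCheck {
  def main(args: Array[String]): Unit =
    // Should print 2.4.0 if the pom.xml dependency resolved correctly.
    println("Spark on classpath: " + org.apache.spark.SPARK_VERSION)
}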


// Add Scala framework support to the IntelliJ IDEA project

Right-click the project -> Add Framework Support
[x] Scala
Use Library : scala-sdk-2.11.12 - OK

// Create a package
Expand src - main - java - Right-click on java - New - Package
Enter package name : com.spark.learning

// Create a class
Right-click on com.spark.learning - New - Scala Class - Name : demo
  - Kind : Object
 
At the bottom, when prompted - Enable Auto-Import

demo.scala code:
---------------
package com.spark.learning

import org.apache.spark.{SparkConf, SparkContext}

object demo {
  def main(args: Array[String]): Unit = {
    // Run Spark locally (no cluster) and give the application a name.
    val conf = new SparkConf()
    conf.set("spark.master", "local")
    conf.set("spark.app.name", "sampleApp")

    val sc = new SparkContext(conf)

    // Read the local text file into an RDD of lines.
    val rd1 = sc.textFile("E:\\IQ mine.txt")

    // Bring all lines back to the driver and print them.
    rd1.collect.foreach(println)

    // Write the RDD out; Spark creates the E:\IQOutput directory itself.
    rd1.saveAsTextFile("E:\\IQOutput")

    sc.stop()
  }
}
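One caveat worth knowing: saveAsTextFile refuses to overwrite, so a second run of demo fails if E:\IQOutput still exists. A minimal sketch of one way to make the job re-runnable, using the Hadoop FileSystem API that spark-core already pulls in (place it just before the saveAsTextFile call):

import org.apache.hadoop.fs.{FileSystem, Path}

// Delete any leftover output directory from a previous run (no-op if absent).
val fs = FileSystem.get(sc.hadoopConfiguration)
fs.delete(new Path("E:\\IQOutput"), true)   // true = delete recursively

The saved output itself appears as part files (part-00000, ...) inside E:\IQOutput, one per partition.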



Right-click inside demo.scala - Run 'demo'

// The output will be displayed in the console
