Apache Spark is an open-source data processing framework for handling large volumes of data from multiple sources. It is used in distributed computing for machine learning, data analytics, and graph-parallel processing, on single-node machines or clusters.

Thanks to its fast processing speed, scalability, and programmability, Spark has become one of the most widely used distributed frameworks for scalable big data processing.

Thousands of companies, including tech giants such as Apple, Facebook, IBM, and Microsoft, use Apache Spark. Installing Spark is straightforward and can be done in a number of ways. It provides native bindings for Java, Scala, Python, and R.

This guide walks you through installing Apache Spark step by step.

Apache Spark Installation Steps

Prerequisites:

  • A Linux system (the commands in this guide use a Unix-style shell)
  • A user account with administrator (root/sudo) privileges, required to install software, change file permissions, and change the system PATH
  • Access to a command line or terminal
  • The tar utility (or an equivalent tool) to extract .tgz archives

Step 1: Verify Java installation

To install Apache Spark, you must have Java 8 or a later version installed on your system.

Try this command to check your Java version:

$ java -version

If your system already has Java installed, you will get the following output:

java version "1.7.0_71"

Java(TM) SE Runtime Environment (build 1.7.0_71-b13)

Java HotSpot(TM) Client VM (build 25.0-b02, mixed mode)

If you do not have Java installed, download it from https://java.com/en/download/ and install it before proceeding to the next step.
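If your system uses a package manager, you can install Java from there instead of the website. For example, on a Debian/Ubuntu-based system (an assumption; the package name differs on other distributions), the following would install OpenJDK 8:

$ sudo apt-get update

$ sudo apt-get install openjdk-8-jdk

$ java -version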

Step 2: Verify your Scala installation

To deploy Apache Spark, you must have the Scala language installed on your system. Check your Scala installation with the following command:

$ scala -version

If you already have Scala installed, you will see the following response:

Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL

If Scala is already installed, you can skip ahead to Step 5. If not, proceed to the next step to download and install it.

Step 3: Download Scala

Download the latest version of Scala from http://www.scala-lang.org/download/.

In this tutorial, we are using Scala version 2.11.6.

Once the download is complete, you will find the Scala tar file in the Downloads folder.
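If you prefer to download from the command line, you can fetch the archive with wget. The mirror URL below follows the usual Scala download layout but is an assumption; verify it against the download page:

$ wget https://downloads.lightbend.com/scala/2.11.6/scala-2.11.6.tgz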

Step 4: Install Scala

Follow these steps to install Scala:

  • Extract the Scala tar file

Use the command below to extract the Scala tar file.

$ tar xvf scala-2.11.6.tgz

  • Move the Scala software files

To move the Scala software files to the appropriate directory (/usr/local/scala), use the following commands:

$ su -

Password:

# cd /home/Hadoop/Downloads/

# mv scala-2.11.6 /usr/local/scala

# exit

The command to set PATH for Scala is:

$ export PATH=$PATH:/usr/local/scala/bin
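Note that export only affects the current shell session. To make the setting permanent, a common approach is to append the line to your shell's startup file, assumed here to be ~/.bashrc:

$ echo 'export PATH=$PATH:/usr/local/scala/bin' >> ~/.bashrc

$ source ~/.bashrc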

  • Verifying the Scala installation

After installing Scala, verify that it works. Use the command below to check your Scala installation:

$ scala -version

You should see the following output:

Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL


Step 5: Download Apache Spark

Open a browser and go to the Apache Spark download page at https://spark.apache.org/downloads.html.

For this tutorial, we are using the spark-1.3.1-bin-hadoop2.6 version.

Under the "Download Apache Spark" heading, make selections from the two drop-down menus:

  • In the "Select Spark Release" drop-down menu, select 1.3.1.
  • In the "Select package type" drop-down menu, select Pre-built for Apache Hadoop 2.6.

Click the spark-1.3.1-bin-hadoop2.6.tgz link to download Spark. Once the download is complete, you will find the Spark tar file in the Downloads folder.
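Alternatively, you can download the archive from the command line. The URL below points at the Apache release archive and is illustrative; current releases are served from the mirrors linked on the download page:

$ wget https://archive.apache.org/dist/spark/spark-1.3.1/spark-1.3.1-bin-hadoop2.6.tgz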

You can verify the integrity of your downloaded Spark software file by checking the checksum of the file. This step ensures that you are running unmodified, undamaged software.
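For example, assuming your system has the GNU coreutils sha512sum tool (older Spark releases may publish md5 or sha1 digests instead, in which case use the matching tool), you can compute the digest and compare it with the one published on the download page:

$ sha512sum spark-1.3.1-bin-hadoop2.6.tgz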

Step 6: Install Spark

Here are the steps required to install Apache Spark:

  • Extract the Spark tar file

The command to extract the Spark tar file is:

$ tar xvf spark-1.3.1-bin-hadoop2.6.tgz

  • Move the Spark software files to the desired location

Type the following commands to move the Spark software files to the appropriate directory (/usr/local/spark).

$ su -

Password:

# cd /home/Hadoop/Downloads/

# mv spark-1.3.1-bin-hadoop2.6 /usr/local/spark

# exit

  • Environment setup for Spark

Add the following line to the ~/.bashrc file to add the location of the Spark binaries to the PATH variable:

export PATH=$PATH:/usr/local/spark/bin

Then reload the ~/.bashrc file so the change takes effect in the current session:

$ source ~/.bashrc
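Many Spark-related tools also look for a SPARK_HOME environment variable. Setting it in ~/.bashrc alongside PATH is a common convention, though not strictly required for this tutorial:

export SPARK_HOME=/usr/local/spark

export PATH=$PATH:$SPARK_HOME/bin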


Step 7: Verify the Spark installation

Open the Spark shell using the following command:

$ spark-shell

If you have installed Spark successfully, the system will display many lines showing the status of the application. Depending on your system, a Java firewall pop-up may appear; select "Allow Access" to continue.

The Spark logo will then appear and the prompt will show the Scala shell.

You should see the following display:

Spark assembly has been built with Hive, including Datanucleus jars on classpath

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

15/06/04 15:25:22 INFO SecurityManager: Changing view acls to: hadoop

15/06/04 15:25:22 INFO SecurityManager: Changing modify acls to: hadoop

15/06/04 15:25:22 INFO SecurityManager: SecurityManager: authentication disabled;

ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)

15/06/04 15:25:22 INFO HttpServer: Starting HTTP server

15/06/04 15:25:23 INFO Utils: Successfully started service 'HTTP class server' on port 43292.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-bit Server VM, Java 1.7.0_71)

Type in expressions to have them evaluated.

Spark context available as sc.

scala>

Then open a web browser and go to http://localhost:4040/ (replace localhost with your system's hostname if you are accessing it from another machine).

An Apache Spark shell web interface will be displayed on the screen.
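As a quick sanity check (an illustrative example, not part of the original steps), you can evaluate a small computation at the scala> prompt using the predefined Spark context sc; this parallelizes the numbers 1 through 100 and sums them:

scala> sc.parallelize(1 to 100).sum()

res0: Double = 5050.0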

You can exit Spark and close the Scala shell by pressing Ctrl+D in the terminal window.
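Alternatively, you can type the REPL command :quit at the scala> prompt:

scala> :quit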

Source: https://www.simplilearn.com/tutorials/apache-spark-tutorial/install-spark
