Spark grew out of the Hadoop ecosystem, so it is best installed on a Linux-based system. The following steps show how to install Apache Spark.
Step 1: Verifying Java Installation
Java is a mandatory prerequisite for installing Spark. Try the following command to verify the Java version.
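$ java -version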
If Java is already installed on your system, you will see a response similar to the following (the version details will vary with your installation) −
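java version "1.8.0_71"
Java(TM) SE Runtime Environment (build 1.8.0_71-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.71-b15, mixed mode)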
If you do not have Java installed on your system, install Java before proceeding to the next step.
Step 2: Verifying Scala installation
Spark requires the Scala language, so let us verify the Scala installation using the following command.
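$ scala -version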
If Scala is already installed on your system, you will see a response similar to the following (your version may differ) −
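Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL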
If you do not have Scala installed on your system, proceed to the next step for Scala installation.
Step 3: Downloading Scala
Download the latest version of Scala by visiting the following link: Download Scala. For this tutorial, we use the scala-2.11.6 version. After downloading, you will find the Scala tar file in the download folder.
Step 4: Installing Scala
Follow the steps given below to install Scala.
Extract the Scala tar file
Type the following command to extract the Scala tar file (adjust the filename if your download differs).
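$ tar xvf scala-2.11.6.tgz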
Move Scala software files

Use the following command to move the Scala software files to their target directory (/usr/local/scala).
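This assumes the extracted scala-2.11.6 folder is in your current directory; superuser rights are needed to write to /usr/local.

$ sudo mv scala-2.11.6 /usr/local/scala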
Set PATH for Scala
Use the following command to set the PATH for Scala.
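$ export PATH=$PATH:/usr/local/scala/bin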
Verifying Scala Installation
After installation, it is better to verify it. Use the following command to verify the Scala installation.
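$ scala -version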
If Scala is installed correctly, you will see a response similar to the following −
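Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL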
Step 5: Downloading Apache Spark
Download the latest version of Spark by visiting the following link: Download Spark. For this tutorial, we use the spark-1.3.1-bin-hadoop2.6 version. After downloading it, you will find the Spark tar file in the download folder.
Step 6: Installing Spark
Follow the steps given below to install Spark.
Extracting Spark tar
Use the following command to extract the Spark tar file (adjust the filename if your download differs).
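$ tar xvf spark-1.3.1-bin-hadoop2.6.tgz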
Moving Spark software files
Use the following command to move the Spark software files to their target directory (/usr/local/spark).
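As with Scala, this assumes the extracted folder is in your current directory:

$ sudo mv spark-1.3.1-bin-hadoop2.6 /usr/local/spark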
Setting up the environment for Spark
Add the following line to the ~/.bashrc file. It adds the location of the Spark binaries to the PATH variable.
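export PATH=$PATH:/usr/local/spark/bin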
Use the following command to source the ~/.bashrc file.
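$ source ~/.bashrc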
Step 7: Verifying the Spark Installation
Type the following command to open the Spark shell.
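$ spark-shell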
If Spark is installed successfully, you will see output similar to the following (abbreviated here; the exact banner depends on your Spark and Scala versions).
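Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_71)
Spark context available as sc.
scala>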
install.spark {SparkR} | R Documentation
Download and Install Apache Spark to a Local Directory
Description
install.spark downloads and installs Spark to a local directory if it is not found. If SPARK_HOME is set in the environment and that directory is found, it is returned. The Spark version used is the same as the SparkR version. Users can specify a desired Hadoop version, the remote mirror site, and the directory where the package is installed locally.
Usage
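The call signature shown here follows the SparkR 2.x documentation; defaults may differ in other releases.

install.spark(hadoopVersion = "2.7", mirrorUrl = NULL, localDir = NULL,
  overwrite = FALSE)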
Arguments
hadoopVersion | Version of Hadoop to install. Default is "2.7". |
mirrorUrl | Base URL of the repositories to use. The directory layout should follow Apache mirrors. |
localDir | A local directory where Spark is installed. The directory contains version-specific folders of Spark packages. Default is the path to the user's cache directory. |
overwrite | If TRUE, download and overwrite the existing tar file in localDir and force re-installation of Spark. |
Details
The full URL of the remote file is inferred from mirrorUrl and hadoopVersion. mirrorUrl specifies the remote path to a Spark folder. It is followed by a subfolder named after the Spark version (which corresponds to the SparkR version), and then the tar filename. The filename is composed of four parts, i.e. [Spark version]-bin-[Hadoop version].tgz. For example, the full path for a Spark 2.0.0 package for Hadoop 2.7 from http://apache.osuosl.org has path: http://apache.osuosl.org/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz. For hadoopVersion = 'without', [Hadoop version] in the filename is without-hadoop.
Value
the (invisible) local directory where Spark is found or installed
Note
install.spark since 2.1.0
See Also
See available Hadoop versions: Apache Spark
Examples
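A minimal sketch, assuming SparkR is loaded and an Apache mirror is reachable; the mirror URL and local directory below are illustrative values, not defaults.

# Download and install the default Spark build into the cache directory
install.spark()

# Install a 'Hadoop free' build from a specific mirror into a chosen
# directory (both values are hypothetical examples)
install.spark(hadoopVersion = "without",
              mirrorUrl = "http://apache.osuosl.org/spark",
              localDir = "/opt/spark-cache")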
