Note: I recommend using the Hortonworks Sandbox instead.
First, check whether Homebrew is installed. I will skip how to install Homebrew here.
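A quick way to check (assuming brew is already on your PATH):

```shell
# Prints the Homebrew version if it is installed;
# "command not found" means you need to install it first.
brew --version
```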
Then, install Hadoop with the brew command:
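The install is a one-liner; note that brew will fetch whatever version it currently packages (this guide was written against 2.8.2):

```shell
brew install hadoop
```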
The installation location of Hadoop is /usr/local/Cellar/hadoop, and the configuration files live under /usr/local/Cellar/hadoop/2.8.2/libexec/etc/hadoop. Under this folder, you will need to modify four files.
Next, configure the HDFS address and port number: open core-site.xml and add the following content.
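A typical single-node configuration looks like this; hdfs://localhost:9000 is the conventional choice, so adjust the port if it clashes with something on your machine:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```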
To configure the JobTracker address and port number for MapReduce, first run sudo cp mapred-site.xml.template mapred-site.xml to make a copy of mapred-site.xml, then open it and add the following.
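For a local setup the JobTracker points at localhost; the port 9010 below is just a common choice in similar single-node guides, not a requirement:

```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>
```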
Set the HDFS replication factor: the default value is 3, but for a single-node setup we should change it to 1. Open hdfs-site.xml and add the following.
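The replication setting looks like this:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```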
Check whether the file ~/.ssh/id_rsa.pub exists to verify whether SSH to localhost is already set up. If it does not exist, run the following command to generate a key pair:
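A common way to generate the key pair; the empty passphrase (-P "") is a convenience choice for local use:

```shell
# Generate an RSA key pair with no passphrase at the default location
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
```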
Enable Remote Login under System Preferences -> Sharing: just check "Remote Login".
Then authorize the generated SSH key:
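Appending your public key to authorized_keys is what lets ssh accept your own key:

```shell
# Allow logins to this machine with the key generated above
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```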
Test ssh at localhost:
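Simply:

```shell
ssh localhost
```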
If it succeeds, you will be logged in to localhost without a password prompt. If it fails, you will see an error such as "ssh: connect to host localhost port 22: Connection refused".
Format the distributed file system with the command below before starting the Hadoop daemons, so that we can put our data sources into HDFS while performing MapReduce jobs.
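With Hadoop 2.x the command is the one below (older guides use the deprecated form hadoop namenode -format):

```shell
# One-time initialization of the local HDFS namenode
hdfs namenode -format
```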
Now, we need to go to /usr/local/Cellar/hadoop/2.8.2/sbin/ every time to start and stop the Hadoop services, which is quite inconvenient. To make it easier, we can create aliases: open ~/.bash_profile and add them there.
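For example (hstart and hstop are arbitrary names of my choosing, and the version number in the path must match your install):

```shell
# Shortcuts for starting/stopping all Hadoop daemons
alias hstart="/usr/local/Cellar/hadoop/2.8.2/sbin/start-all.sh"
alias hstop="/usr/local/Cellar/hadoop/2.8.2/sbin/stop-all.sh"
```

Run source ~/.bash_profile afterwards so the aliases take effect.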
Start Hadoop with the start-up script:
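Running the script directly (or through an alias if you created one):

```shell
/usr/local/Cellar/hadoop/2.8.2/sbin/start-all.sh
```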
In the browser, open http://localhost:50070 and you will see the NameNode status page.
You can also see the JobTracker: go to http://localhost:8088. Specific node information is at http://localhost:8042.
To stop Hadoop, just run:
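The stop script mirrors the start script:

```shell
/usr/local/Cellar/hadoop/2.8.2/sbin/stop-all.sh
```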
Thanks to these two articles, which helped me figure out this problem.
It is much more convenient to install Spark on a Mac.
First, download the latest Spark from http://spark.apache.org/downloads.html
Then you are done! :-) Just kidding, haha. Unpack the archive and move it to a directory you like; as for me, I moved it to the home directory.
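For example, assuming the downloaded file is named spark-2.2.0-bin-hadoop2.7.tgz (your version may differ):

```shell
# Unpack the Spark release and move it to the home directory
tar -xzf spark-2.2.0-bin-hadoop2.7.tgz
mv spark-2.2.0-bin-hadoop2.7 ~/
```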
Open your .bash_profile and add the following to this file.
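Something like the lines below; the Spark directory name and the Jupyter driver settings are assumptions based on my own setup (the last two variables are what make pyspark open a Jupyter Notebook):

```shell
# Point SPARK_HOME at wherever you moved the unpacked release
export SPARK_HOME=$HOME/spark-2.2.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH

# Launch pyspark inside Jupyter Notebook instead of the plain REPL
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
```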
Then run source ~/.bash_profile. You are done! I am not kidding. :P
You can now start to program with Spark in Jupyter Notebook. :-)
Useful link: http://spark.apache.org/docs/latest/index.html