Guidelines for bootstraping ASAP cluster and running the WIND use case

Introduction

The following guide directs the users to setup an ASAP cluster and run the sociometer use case provided by the WIND.

Create VMs

Prepare environment in the host machines:

ASAP_MASTER_IMAGE=<path to downloaded image>
RAM=<memory in MB>
VCORES=<number of virtual cores>
VM=asap-master
sudo virt-install \
--network bridge=br0 \
--name $VM  \
-r $RAM \
--vcpus $VCORES \
--disk path=$ASAP_MASTER_IMAGE,device=disk,bus=virtio \
--graphics vnc,listen=0.0.0.0,password=12345 \
--connect qemu:///system \
--virt-type kvm \
--noautoconsole \
--import -d
  • Check VM status:
$ sudo virsh list --all
 Id    Name                           State
----------------------------------------------------
28    asap-master                    running
  • Start VM (if not running already):
$ sudo virsh start asap-master
Domain asap-master started
  • Login in asap-master using a VNC client (password 12345) as asap user with password: raiding536&ivory
  • Note the acquired IP address:
$ ifconfig
ens3      Link encap:Ethernet  HWaddr 52:54:00:e3:f0:24
          inet addr:192.168.1.62  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fee3:f024/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2039 errors:0 dropped:0 overruns:0 frame:0
          TX packets:124 errors:0 dropped:0 overruns:0 carrier:0
          collisions:527 txqueuelen:1000
          RX bytes:104691 (104.6 KB)  TX bytes:16689 (16.6 KB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:229 errors:0 dropped:0 overruns:0 frame:0
          TX packets:229 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:31684 (31.6 KB)  TX bytes:31684 (31.6 KB)
  • Download asap-worker image (50GB).

  • Create several asap-worker VMs from the same image using backing files:
    • Change ownership to readonly:
    $ sudo chmod u-w asap-worker-readonly.qcow2
    
    • Create VMs:
    $ for i in 1 2 3 4 5;
    do;
    VM=asap-worker-$i
    sudo qemu-img create -f qcow2 -b asap-worker-readonly.qcow2 $VM.qcow2
    sudo virt-install \
    --network bridge=br0 \
    --name $VM \
    --ram <memory in MB> \
    --vcpus <number of cores> \
    --disk path=/tmp/$VM.qcow2,device=disk,bus=virtio \
    --graphics vnc,listen=0.0.0.0,password=12345 \
    --connect qemu:///system \
    --virt-type kvm \
    --noautoconsole \
    --import -d;
    done
    
    • Check VM statuses
    $ sudo virsh list --all
    
    • Login in the worker VMs using a VNC client (password 12345) as asap user with password: raiding536&ivory
    • Note the acquired IP address as done for the asap-master

Rename hostnames

Since the asap-workers were created using the same image we need to rename them. In order to do so:

declare -A workers=( ["192.168.1.194"]="asap-worker-1" ["192.168.1.70"]="asap-worker-2" ["192.168.1.157"]="asap-worker-3")
for ip in "${!workers[@]}"; do ssh -t asap@$ip sudo sed -ie s/asap1/${workers[$ip]}/g /etc/hostname; done

And then restart the VMs:

for i in 1 2 3; do sudo virsh shutdown asap-worker-$i; done
for i in 1 2 3; do sudo virsh start asap-worker-$i; done

Configure VMs

  • Update appropriately the guest IPs in the following configuration files in asap-master & asap-workers:
    • /etc/hosts
    • ~/asap/hadoop-2.7.1/etc/hadoop/yarn-site.xml
    • ~/asap/hadoop-2.7.1/etc/hadoop/core-site.xml
    • ~/asap/hadoop-2.7.1/etc/hadoop/mapred-site.xml
    • ~/asap/spark-nested/conf/spark-env.sh
    • ~/asap/workflow/src/main.coffee
    • ~/asap/spark-nested/conf/slaves
    • ~/fabric-scripts/hadoop_yarn/fabfile.py

Reformat HDFS

Delete old repositories:

for i in 1 2 3; do ssh asap-worker-$i rm -rf ~/asap/hdfs; done

Reformat HDFS:

cd ~/fabric-scripts/hadoop_yarn/ && fab formatHdfs

Start services

  • Start hadoop:
$ cd ~/fabric-scripts/hadoop_yarn/ && fab start
  • Start mongo DB:
$ sudo service mongod start
  • Start IReS:
$ cd ~/asap/IReS-Platform && ./sbin/ires.sh start
  • Start Spark:
$ cd ~/asap/spark-nested && ./sbin/start-all.sh
  • Rebuild WMT and restart nginx:
$ cd ~/asap/workflow/ && grunt && sudo service nginx reload
  • Access web services from your desktop
    • Forward port 3128 of the asap-master in your desktop local port 3128:
    $ ssh -L3128:<asap-master IP>:3128 <host machine> -N
    
    • Configure your web browser to use the proxy in localhost:3128

The available Web services are:

Hadoop:http://<asap-master IP>:50070
HDFS explorer:http://<asap-master IP>:50070/explorer.html
Yarn cluster:http://<asap-master IP>:8088/cluster
Workflow Management Tool:
 http://<asap-master IP>:8888/main.html
IReS:http://<asap-master IP>:1323/web/main
Spark master:http://<asap-master IP>:8080

Run sociometer WIND use case

The descriptions of the abstract operators assembling the use case (Wind_Latest_User_Profiling, Wind_Latest_Kmeans, Wind_Latest_Stereo_Type_Classification, Wind_Latest_Peak_Detection, Wind_Latest_OD_Matrix, Wind_Latest_Socio_Publisher, Wind_Latest_Peak_Publisher, Wind_Latest_ODMatrix_Publisher, Wind_Latest_Weblyzard_Uploader) and the input dataset (dataset_simulated) are already defined in IReS. Moreover, the respective materialized operators are also defined in the IReS. In order to run the sociometer workflow and get start with the ASAP do the following:

  • Save this file in your disk: sociometer.json
  • Navigate to the WMT Web UI.
  • From the navigation bar, click “Load workflow” and in the prompt window select the previously downladed JSON file.
  • The graph will appear in the workflow board:
_images/sociometer-WMT.png
  • Click on the several operators (rectangulars) and datasets (ellipses) in order to see their task descriptions in the taskboard. Using WMT one can modify the graph: e.g. change the dataset input.
  • Finally, click “Upload workflow” in the navigation bar; the workflow will be send to the IReS and will be listed among the other abstract workflows in the respective tab of the IReS Web UI. (The workflow will be overridden if already exists).
  • Follow the specific link and you will be navigated in the abstract workflow view, depicted in the following image:
_images/sociometer-abstract.png
  • The workflow now can be materialized by clicking the “Materialize Workflow” button and the materialized graph will be loaded. This workflow has two alternative materialized operators for the Wind_Latest_Kmeans abstract operator; one impemented using the KMeans implementation of the Spark MLlib library and one using the Swan KMeans clustering further described here: swan.html#k-means-clustering. There are also two alternative materialized operators for the Wind_Latest_Peak_Detection abstract operator; one impemented using pySpark and one using the nested RDD reduction further described here: spark.html#introduction.
_images/sociometer-materialized.png
  • Finally you can instract IReS to execute the workflow by clicking in the “Execute workflow” button.
  • IReS then will instruct Yarn to start a new application. The progress of the execution can be checked via the Execution view of the IReS and via the tracking URL pointing to the application on the Yarn cluster.
_images/sociometer-execute.png