
# B629 CLOUD COMPUTING PROJECT 2 Kaushik Chandrasekaran by dandanhuanghuang

B629 CLOUD COMPUTING
PROJECT 2
Kaushik Chandrasekaran, Nabeel Ahamed Akheel

GOAL

The goal of this project is to learn the concepts of the iterative MapReduce
programming model and to try out one implementation of it, Twister K-means
with multiple reducers. In this project we are required to implement an
automated Twister K-means program that runs with different initial centroids
and selects the best result.

PROJECT DESCRIPTION

The main objective of this project is to execute the K-means data generation to
create 10 centroid files initially. Once these centroid files are obtained, we need to
run K-means with each of them and find the least objective value among those 10
runs. Hence a total of 10 x 10 runs would be required.

STEPS INVOLVED

1.) Follow the instructions given in the initial document and have
Twister started and running.

2.) Generate the initial 10 centroid files using the following command:

    ./gen_data.sh [init clusters file] [num of clusters] [vector length] [sub dir] [data file prefix] [number of files to generate] [number of data points]

3.) Create the Partition files

4.) Obtain the final centroid points for each file.

5.) Calculate the Euclidean distance between the data points and the final centroids.

6.) Find the sum of the Euclidean distances over all points for each file.

7.) Repeat step 6 for all the files and find the least summation value.

8.) The minimum value obtained is the least objective function value.
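
Steps 5 to 8 can be sketched in a few lines of Python. This is a minimal illustration, not the Twister code used in the project: the name `objective_value` is hypothetical, the toy data is made up, and the sketch assumes that each point contributes its distance to the nearest final centroid (the usual K-means objective, which the steps above do not spell out).

```python
import math

def objective_value(points, centroids):
    """Sum of Euclidean distances from each point to its nearest centroid.

    Assumption: every point is charged to its nearest centroid.
    """
    return sum(min(math.dist(p, c) for c in centroids) for p in points)

# Toy data, purely illustrative
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
final_centroid_sets = [
    [(5.0, 0.0), (5.0, 2.0)],    # final centroids from one run
    [(0.0, 1.0), (10.0, 1.0)],   # final centroids from another run
]

# Steps 6 and 7: one summed distance per file, then the least of them
values = [objective_value(points, cs) for cs in final_centroid_sets]
best_index = min(range(len(values)), key=values.__getitem__)
print(values, best_index)
```

In the project itself there would be 10 entries in the list, one per centroid file, and the points would come from the partition files.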

COMPARISON WITH THE EXISTING SYSTEM

The algorithm followed is similar to the existing algorithm. The main differences
are as follows:

1.) The calculation is performed on 10 initial centroid files.
2.) The objective function value is calculated for each file.
3.) The least of the 10 objective function values is selected.

OBSERVATIONS

a.) The sequential complexity per iteration for K centers and N points is O(NK). Since
80 map tasks are used in this program, the time complexity per map task is O(NK/80).

b.) The time complexity of the Reduce task would be 80 * N * (K/80) = O(NK).

c.) The speedup should improve as the value of N increases. Because Twister is
designed for iterative MapReduce computations, we expect the performance gain
to grow with N.

d.) We obtain the objective function value for each of the centroid files we have
(10 in our case). For each file we calculate the sum of the distances between its
final centroids and the 80,000 points in the partition file; the minimum value
obtained is the best solution.

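
The counts in observation (a) can be checked with simple arithmetic. The sketch below uses the 80 map tasks and the 80,000 points mentioned in this report; the value of K is a hypothetical placeholder, since the report does not state the number of clusters, and the function name is made up for illustration. It counts point-to-centroid distance computations only, assumes the points are split evenly, and ignores communication and reduce costs.

```python
def distance_computations(num_points, num_centroids, num_mappers=1):
    """Distance computations per iteration for a single task,
    assuming the points are split evenly across the map tasks."""
    points_per_mapper = num_points / num_mappers
    return points_per_mapper * num_centroids

N = 80_000    # points in the partition files (from the report)
K = 10        # hypothetical number of clusters
MAPPERS = 80  # map tasks (from the report)

sequential = distance_computations(N, K)
per_mapper = distance_computations(N, K, MAPPERS)

# The ratio is the ideal (upper-bound) speedup of the map phase
print(sequential, per_mapper, sequential / per_mapper)
```

Under these assumptions each map task does 1/80 of the sequential work, so 80 is an upper bound on the map-phase speedup rather than a measured result.
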
FLOW CHART

Start

Generate initial cluster files

Generate partition file

Run K-means for the generated input cluster files

Get the final centroids of each input cluster file

Calculate the Euclidean distance

Sum the Euclidean distances for each point in each file

Done for 10 files? (No: repeat for the next file. Yes: continue.)

Find the least of the summed Euclidean distances

Print the centroids with the least summed Euclidean distance

Stop
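
The loop in the flow chart can be illustrated with a small sequential driver. This is only a sketch under stated assumptions: `kmeans` below is a plain single-machine Lloyd's iteration standing in for a Twister K-means run, the fixed iteration count is arbitrary, and all function names and the toy data are hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Plain sequential K-means, standing in for one Twister run."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # New centroid = mean of its group; an empty group keeps its old centroid
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def select_best(points, initial_centroid_sets):
    """Run K-means from every initial centroid set; keep the lowest summed distance."""
    best = None
    for init in initial_centroid_sets:
        final = kmeans(points, init)
        total = sum(math.dist(p, final[nearest(p, final)]) for p in points)
        if best is None or total < best[0]:
            best = (total, final)
    return best

# Toy data: two initial centroid sets that converge to different results
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
inits = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(0.0, 1.0), (9.0, 1.0)],
]
print(select_best(points, inits))
```

On this toy data the two starting points converge to different final centroids with different summed distances, which is the reason for trying several initial centroid files and keeping the best.
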