Implementation of Package Management
in a Cluster Environment
So, Jung-ki
Distributed Computing System LAB
School of Computer Science and Engineering
Seoul National University
Introduction Related Work Design Evaluation Conclusion
Introduction (1/2)
Supercomputer
High performance processor / high network bandwidth
Expensive, but a Beowulf system is cost-effective
Motivation
Focus on Cluster system
Cluster Management system
Manual method / add-on method / integrated method
Registry
Central repository of information about all aspects of the computer
So, Jung-ki (SNU DCS Lab) 2 / 20
Introduction (2/2)
Challenge
Integrated method has low availability and reliability
Computation nodes can’t be managed separately
When a failure occurs, the system can’t be rejuvenated
Goal ( using Registry )
Improve availability and reliability of integrated method
Administrator can manage a cluster system easily
Restore cluster system with a backup snapshot
Supercomputer
Domestic Supercomputer
Quantity : 14
Cluster : 4
MPP : 4
Constellation : 6
※ SNU : 2 (51/413)
60.8%
Cluster Management System
Manual approach
System administrator brings up entire system manually
Add-on method
Bring up a frontend node, then add cluster packages
OSCAR / Warewulf / OpenMosix
Integrated method
Cluster packages are installed and configured during the initial installation
Rocks / Scyld
Cluster Management System
Software Stack
[Figure: cluster software stack, listed from top to bottom]
Parallel code / Grid / computer lab …
Message passing / communication layer
Job scheduling and launching
Cluster state management / monitoring
Cluster software management
Linux environment
HPC device drivers
Linux kernel
Group labels in the figure : Application, SGE, HPC, OS (Linux)
Rocks Overview
Identity
System to build and manage a Linux Cluster
Free : Open source project
Goal
Make clusters easy
Philosophy
Computation nodes are 100% automatically installed
Roll : set of packages
Graph / Kickstart
Runs on heterogeneous system architectures
Does not attempt incremental software updates
Rocks system
Architecture
[Figure: the front-end node reaches the Internet through eth1 and the local network through eth0; each of the four compute nodes shown attaches to the local network through its own eth0]
What is Registry ?
Central repository of info about all aspects of the computer
Hardware, OS, applications, users information
Function
Retrieve system information
Update / add / delete software
Backup & restore system
Advantage
Makes it easier for applications to access system information
Stores large amounts of structured data (system info)
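The backup and restore function listed above can be illustrated with a small sketch. This is not the implementation from the talk: it uses Python's built-in sqlite3 module as an in-memory stand-in for the registry database, and the `packages` table, its columns and the sample row are hypothetical.

```python
import sqlite3


def snapshot(registry: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the whole registry into a new in-memory database (the backup)."""
    backup = sqlite3.connect(":memory:")
    registry.backup(backup)  # sqlite3.Connection.backup, Python 3.7+
    return backup


def restore(backup: sqlite3.Connection) -> sqlite3.Connection:
    """Build a fresh registry database from a snapshot."""
    restored = sqlite3.connect(":memory:")
    backup.backup(restored)
    return restored


def package_names(registry: sqlite3.Connection) -> list:
    """Return the package names currently stored in the registry."""
    rows = registry.execute("SELECT name FROM packages ORDER BY name")
    return [name for (name,) in rows]


# Hypothetical registry content, for illustration only.
registry = sqlite3.connect(":memory:")
registry.execute("CREATE TABLE packages (node TEXT, name TEXT)")
registry.execute("INSERT INTO packages VALUES ('compute-0-0', 'amanda')")
registry.commit()

saved = snapshot(registry)

# Simulate a failure that wipes the registry, then restore from the snapshot.
registry.execute("DELETE FROM packages")
registry.commit()
registry = restore(saved)
print(package_names(registry))
```

The snapshot is a full copy rather than a log of changes, which keeps the restore step trivial at the cost of storing the whole registry each time.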
Registry Design
Original relational schema (H/W information)
Aliases : ID (primary key), Node, Name
Network : ID (primary key), Node, MAC, IP, Gateway, Name, Device, Module
Nodes : ID (primary key), Name, Membership, CPUs, Rack, Rank, Comment
Appliances : ID (primary key), Name, Graph, Node
Memberships : ID (primary key), Name, Appliance, Distribution
Distribution : ID (primary key), Name, Release, Lang
Appended relation (S/W information)
Package : ID (primary key), Node, Name, Version, Release, Install
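As a rough illustration of how the appended Package relation could hang off the original Nodes table, the sketch below declares those two tables with Python's sqlite3 module (an in-memory stand-in for the MySQL registry). The column types, the lowercase table names, the `install` value and the sample rows are assumptions; the slide only gives field names.

```python
import sqlite3

# Two of the registry tables; types are guesses, field names follow the slide.
SCHEMA = """
CREATE TABLE nodes (
    id         INTEGER PRIMARY KEY,
    name       TEXT,
    membership INTEGER,
    cpus       INTEGER,
    rack       INTEGER,
    "rank"     INTEGER,
    comment    TEXT
);
CREATE TABLE packages (
    id        INTEGER PRIMARY KEY,
    node      INTEGER REFERENCES nodes(id),
    name      TEXT,
    version   TEXT,
    "release" TEXT,
    install   TEXT
);
"""


def packages_on(con, node_name):
    """List (name, version, release) for every package recorded on a node."""
    rows = con.execute(
        'SELECT p.name, p.version, p."release" FROM packages AS p '
        "JOIN nodes AS n ON n.id = p.node WHERE n.name = ? ORDER BY p.name",
        (node_name,),
    )
    return rows.fetchall()


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
# Hypothetical rows, for illustration only.
con.execute("INSERT INTO nodes (id, name, cpus) VALUES (1, 'compute-0-0', 1)")
con.execute(
    'INSERT INTO packages (node, name, version, "release", install) '
    "VALUES (1, 'amanda', '2.4.5', '2', 'yes')"
)
print(packages_on(con, "compute-0-0"))
```

Keying each package row on the node ID is what lets a single join answer "which packages does this node have", the kind of lookup the registry is meant to make easy.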
Strategy of management
Rocks Setup
Minimum modification
Take advantage of original Rocks system
Deploy cluster system easily
Modify related source code
insert-ethers, kickstart.cgi, Kpp, Kgen, Rgen
Running System
Apply package modification
Package management program : add / update / delete packages
DB consistency management program
Collection Method
[Figure: collection flow. Rgen is the appended component; the figure also labels package variables and registry variables]
Modification Method
Insert command
Instruction : Add / update / delete
add -c=compute-0-0 -i=amanda-2.4.5-2.i386
add -c=all -i=all
del -c=compute-0-0 -i=amanda-2.4.5-2.i386
del -c=all -i=all
[Figure: the commands add / delete / update packages on the compute nodes and in the Packages table, which holds package name / version / release]
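The command syntax above can be sketched as a small parser. This is an illustrative reading, not the tool from the talk: the `update` keyword is assumed (the slide only shows `add` and `del`), the type and function names are hypothetical, and the package string is split following the usual RPM `name-version-release.arch` convention.

```python
from typing import NamedTuple, Optional


class Package(NamedTuple):
    name: str
    version: str
    release: str
    arch: str


class Command(NamedTuple):
    action: str                 # "add", "update" (assumed) or "del"
    node: str                   # a node name, or "all"
    package: Optional[Package]  # None means "all packages"


def parse_package(spec: str) -> Package:
    """Split an RPM-style 'name-version-release.arch' string."""
    stem, arch = spec.rsplit(".", 1)
    name, version, release = stem.rsplit("-", 2)
    return Package(name, version, release, arch)


def parse_command(line: str) -> Command:
    # Tolerate en dashes typed in place of hyphens.
    action, *flags = line.replace("\u2013", "-").split()
    if action not in {"add", "update", "del"}:
        raise ValueError(f"unknown instruction: {action}")
    # Turn "-c=compute-0-0" into the pair ("c", "compute-0-0").
    options = dict(flag.lstrip("-").split("=", 1) for flag in flags)
    spec = options["i"]
    package = None if spec == "all" else parse_package(spec)
    return Command(action, options["c"], package)


cmd = parse_command("add -c=compute-0-0 -i=amanda-2.4.5-2.i386")
print(cmd)
```

Splitting from the right keeps package names that themselves contain hyphens intact, since only the last two hyphen-separated fields are taken as version and release.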
Registry consistency
Setup time
When the frontend node removes / updates a computation node
Dependency : change node table → change package table
Modify Kickstart.cgi / kgen
Apply cascading table changes
※ MySQL (as deployed here) does not support transactions
Running system
Package install / delete / update
Compute node rpm information = frontend node’s registry DB
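The running-system invariant above (a compute node's rpm information equals the frontend's registry DB) can be checked with a simple set comparison. The sketch below is an assumption about how such a check might look; the function name and the sample package lists are hypothetical.

```python
def registry_drift(node_rpms, registry_rows):
    """Compare a node's installed packages with its registry entries.

    Returns two sorted lists: packages installed on the node but missing
    from the registry, and packages recorded in the registry but no longer
    installed on the node.
    """
    on_node = set(node_rpms)
    in_registry = set(registry_rows)
    missing_from_registry = sorted(on_node - in_registry)
    stale_in_registry = sorted(in_registry - on_node)
    return missing_from_registry, stale_in_registry


# Hypothetical data, for illustration only.
node_rpms = ["amanda-2.4.5-2.i386", "amanda-client-2.4.5-2.i386"]
registry_rows = ["amanda-2.4.5-2.i386", "amanda-server-2.4.5-2.i386"]

missing, stale = registry_drift(node_rpms, registry_rows)
print("add to registry:", missing)
print("remove from registry:", stale)
```

On a real node the first list could come from `rpm -qa`; the registry is consistent for that node when both returned lists are empty.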
Experiment Setup
Frontend node : Rocks.snu.ac.kr (on the public Ethernet), CPU 800 MHz, RAM 768 MB, HDD 40 GB
Compute nodes (14) : Compute-0-(1~14), CPU 850 MHz, RAM 1 GB, HDD 10 GB
Experiment Data
name         capacity   volume
amanda       468 KB     3
HPC          117 MB     53
Rocks roll   1.5 GB     479
Original Rocks Evaluation
Average service time : 18 min 14 sec
Average transmit time : 11 min 28 sec
[Figure labels: network card, DHCP request]
Amanda Packages Evaluation
Average install time : 6.62 sec
Average delete time : 5.57 sec
HPC Roll Evaluation
Average install time : 3 min 38 sec
Average delete time : 1 min 18 sec
Conclusion
A Registry benefits the cluster system
Improve availability and reliability using Registry
Administrator can manage cluster systems easily
Restore cluster systems with backup snapshots
Q&A
Questions or Comments ?
Thank you !