Experimental Use of a Large

GPGPU-Enhanced Linux Cluster





16 June 2009

Computational Sciences Robert F. Lucas

rflucas@isi.edu

(310)448-9449

Approved for public release; distribution is unlimited.

Joshua DHPI









20 of 28 racks

Overview



• Background and DHPI Proposal



• Initial Configuration and Results



• GPU Training and Early Research



• Transition to Production Use



• Summary

JFCOM Background



• U.S. Joint Forces Command (JFCOM)

• One of DoD’s combatant commands

• Key role in transforming defense capabilities

• Two Directorates using agent-based simulations

• J7 Training

• J9 Experimentation

• Simulations are characterized by

• Interactive use by hundreds of personnel

• Distributed trans-continentally

• Typical users are uniformed warfighters

Simulation Federates



• Rule-based agent models of entity behavior

• Individual entities run sequentially

• Can be interactive and run in real time

• Large compute clusters required to run big ensemble

• HLA RTI communication (IEEE 1516)

• Fault-tolerant

• Publish/Subscribe model

• Relevant Examples:

• OneSAF

• Joint Semi-Automated Forces (JSAF)

• “Culture”, simplified civilian derivative

• Simulation of the Location and Attack of Mobile Enemy Missiles

(SLAMEM)

Heterogeneous Ensemble

[Diagram. Federates shown: Culture, JSAF, SLAMEM, SOAR, OOS, each with its own MARCI, Logger, Log DB, and SQL DB. Other component labels: MARCI Front End, Event Control, Logger Pucker, plus additional Logger, Log DB, and SQL DB instances.]

Model of a City

Experimental Sensor

Architecture

DHPI Proposal Basics





NEED - 24x7x365 enhanced, distributed and scalable compute resources to enable joint warfighters at JFCOM … to develop, explore, test, and validate 21st century battlespace concepts … to enhance global-scale, computer-generated military experimentation by sustaining more than 2,000,000 entities on appropriate terrain with valid phenomenology.

APPROACH - Enable further growth in entity count, entity complexity, and environmental/infrastructure settings by employing a large Linux cluster with General Purpose GPUs (GPGPUs) on each node to aid in line-of-sight, route planning, and plume representation, all capable of running faster than real time.

CHALLENGES - Effectively implementing the hardware configuration to provide a stable and useful platform, motivating and training operators to use GPGPUs, and programming simulations to take advantage of GPGPUs.

Cluster Configuration

as Originally Specified



• 256 Nodes

    Node:      (2) AMD Santa Rosa 2220 2.8 GHz dual-core processors
    GPU:       (1) NVIDIA 7950 video card
    Chassis:   4U
    Memory:    16 GB DDR2-667 DIMM per node

• GigE Internode Communications

• Delivery to:

Joint Advanced Tactics and Training Laboratory

Suffolk, VA

Modification Requested





• Shortly after award, NVIDIA announced CUDA

• Only available on 8800 model cards

• Programming with CUDA seen as:

• Easier

• More extensible

• More accessible by C programmers

• Requested and received upgrade

Management Rack

Cluster Row Installed in Suffolk, Virginia

Exceeding the Goal

10 Million SAF Entities

Perspective

[Figure: Entity Growth vs. Time. Vertical axis: number and complexity of JSAF entities. Annotation: experiments continue to require orders of magnitude larger and more complex battlespaces (scale and fidelity).]

Platform labels: SPP Proof of Principle (DARPA / Caltech); DC Clusters at MHPCC & ASC MSRC; DHPI GPU-Enhanced Cluster.

Plotted entity counts: 3,600; 12,000; 50,000; 107,000; 250,000; 1,000,000; 1,400,00; 10,000,000.

Time-axis labels: UE 98-1 (1997); SAF Express (1997); J9901 (1999); AO-00 (2000); JSAF/SPP Urban Resolve (2004); JSAF/SPP Tests (2004); JSAF/SPP Capability (2006); JSAF/SPP Joshua (2008).

Why GPUs?



• GPU performance can be 100x that of the host CPUs

• This differential is expected to grow

• Early OneSAF work (UNC & SAIC)

• Line of Sight

• Route Finding

• Collision Detection

• ISI verified that these are also bottlenecks in JSAF

Route Planning

Performance Impact



Time Spent in Route Planning is Critical Bottleneck

Fortran vs CUDA

Fortran:

    do j = jl, jr
      do i = jr + 1, ld
        x = 0.0
        do k = jl, j - 1
          x = x + s(i, k) * s(k, j)
        end do
        s(i, j) = s(i, j) - x
      end do
    end do

CUDA:

    ip = 0;
    for (j = jl; j <= jr; j++) {
      if (ltid <= (j-1)-jl) {
        gpulskj(ip+ltid) = s[IDXS(jl+ltid,j)];
      }
      ip = ip + (j - 1) - jl + 1;
    }
    __syncthreads();
    for (i = jr + 1 + tid; i <= ld; i += GPUL_THREAD_COUNT) {
      for (j = jl; j <= jr; j++) {
        gpuls(j-jl,ltid) = s[IDXS(i,j)];
      }
      ip = 0;
      for (j = jl; j <= jr; j++) {
        x = 0.0f;
        for (k = jl; k <= (j-1); k++) {
          x = x + gpuls(k-jl,ltid) * gpulskj(ip);
          ip = ip + 1;
        }
        gpuls(j-jl,ltid) -= x;
      }
      for (j = jl; j <= jr; j++) {
        s[IDXS(i,j)] = gpuls(j-jl,ltid);
      }
    }
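The two listings above differ mainly in loop order: the Fortran version iterates over columns j outermost, while the CUDA version hands each row i to a thread and walks the columns inside it. The plain C sketch below shows why that reordering is safe. It is not the authors' code: the `IDXS` macro here is an assumed column-major, 1-based mapping (the deck does not show its definition), `ld` is assumed to be both the row count and the leading dimension, and both function names are hypothetical.

```c
#include <stddef.h>

/* Assumed column-major, 1-based index mapping. The deck's own IDXS macro
   is not shown, so this definition is an illustration only. */
#define IDXS(i, j, ld) ((size_t)((j) - 1) * (size_t)(ld) + (size_t)((i) - 1))

/* Transcription of the Fortran loop nest: columns j outermost. */
void update_fortran_order(float *s, int ld, int jl, int jr)
{
    for (int j = jl; j <= jr; j++) {
        for (int i = jr + 1; i <= ld; i++) {
            float x = 0.0f;
            for (int k = jl; k <= j - 1; k++)
                x += s[IDXS(i, k, ld)] * s[IDXS(k, j, ld)];
            s[IDXS(i, j, ld)] -= x;
        }
    }
}

/* Rows i outermost, as in the CUDA listing. Row i reads only its own
   entries s(i, jl..j-1) and the rows jl..jr-1, which are never written
   here, so rows are independent and each could be owned by one thread. */
void update_row_order(float *s, int ld, int jl, int jr)
{
    for (int i = jr + 1; i <= ld; i++) {
        for (int j = jl; j <= jr; j++) {
            float x = 0.0f;
            for (int k = jl; k <= j - 1; k++)
                x += s[IDXS(i, k, ld)] * s[IDXS(k, j, ld)];
            s[IDXS(i, j, ld)] -= x;
        }
    }
}
```

Because each row performs the same floating-point operations in the same order under either nesting, the two functions should produce identical arrays when given identical input.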

Early GPU Programming



• Trained ISI staff with Sparse Matrix Solver

• Then examined JSAF kernels

• Line-of-sight

• Illumination

• Route planning

• Route planning appeared easiest to integrate

• Route planning work published at I/ITSEC

Tran, J.J., Lucas, R.F., Yao, K-T., Davis, D.M., Wagenbreth, G., & Bakeman, D.J. (2008), A High Performance Route-Planning Technique for Dense Urban Simulations, WSC05-2008 I/ITSEC Conference, Orlando, FL

Timing Comparison CPU vs. GPU

Single Source Shortest Path
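For readers unfamiliar with the single-source shortest-path problem timed here, the sketch below shows one common formulation in plain C: repeated sweeps over an edge list that relax each edge until no distance improves (Bellman-Ford style). This is an illustration only and is not taken from the I/ITSEC paper; the `edge` record, the `sssp_relax` name, and the assumptions of non-negative integer costs small enough not to overflow are all hypothetical.

```c
#include <limits.h>

#define SSSP_INF INT_MAX   /* marks a node not yet reached */

/* Hypothetical edge record; the deck does not describe its graph format. */
struct edge { int from, to, cost; };

/* Fills dist[0..n_nodes-1] with the cheapest path cost from source.
   Assumes non-negative costs. Returns the number of sweeps performed. */
int sssp_relax(const struct edge *edges, int n_edges,
               int n_nodes, int source, int *dist)
{
    for (int v = 0; v < n_nodes; v++)
        dist[v] = SSSP_INF;
    dist[source] = 0;

    int sweeps = 0;
    int changed = 1;
    while (changed && sweeps < n_nodes) {
        changed = 0;
        for (int e = 0; e < n_edges; e++) {
            int u = edges[e].from;
            if (dist[u] == SSSP_INF)
                continue;                      /* start node not reached yet */
            int candidate = dist[u] + edges[e].cost;
            if (candidate < dist[edges[e].to]) {
                dist[edges[e].to] = candidate; /* found a cheaper route */
                changed = 1;
            }
        }
        sweeps++;
    }
    return sweeps;
}
```

Sweep-based relaxation is shown because every edge in a sweep applies the same small operation, a pattern often chosen for data-parallel hardware. A CPU implementation might prefer a priority-queue method such as Dijkstra's algorithm.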

CUDA Training

PET Courses



• Conceived and run by Dr. David Pratt

• HPCMP FAPOC for FMS

• Location & Dates:

• SAIC facility, Suffolk, VA, 23-25 October 2007

• ISI, Marina del Rey, 21-23 October 2008

• UCSD, San Diego, 5-6 March 2009

• Attendees: total ~ 60 HPCMP users

• Web resource and trial test beds

• Under construction

NVIDIA Instructors









Patrick LeGresley

Paulius Micikevicius

Views of CUDA Classes

Practicum and

Programming Sessions







Patrick LeGresley helps

with implementation









Gene Wagenbreth gives

programming exercises

Transition to Production Use





• JFCOM “customers” often run classified

• Most software development is unclassified

• Issue arose last fall

• JFCOM decided to split Joshua in two

• Secret (GenSer)

• Unclassified

• The partition is now complete

Partitioning of Joshua

Joint Integrated Persistent

Surveillance (JIPS) Project

The requirement for Joshua is currently driven by an identified military problem: the Joint Force Commander (JFC) requires adequate capability to rapidly integrate and focus collection assets to achieve persistent surveillance.

A major goal is to improve and integrate JIPS by developing Tactics, Techniques and Procedures (TTPs), Concepts of Operations (CONOPS), and architectures that maximize tipping, cueing and communications.

Another goal is the efficient and effective use of sensors to achieve optimum persistence in restricted and denied areas.

A third goal is to improve doctrine, organization, and TTPs to enable the JFC to better command and support Joint Intelligence, Surveillance and Reconnaissance (JISR) operations through:

(1) effective capability apportionment and management,

(2) timely and responsive analytic support, and

(3) fast, reliable tactical Command, Control, and Communications (C3).

The last goal is to enhance the employment, coordination and optimization of ISR assets by improving management processes, tools, and CONOPS.

JIPS User Interface

Experimental Schedule

Summary

Return on Investment

JFCOM needs to realistically simulate urban settings with thousands of combatants and millions of civilians.

Some benefits are not easily quantified. For example, finding out which tactics or sensors would succeed in different cities could not practically be done live, for reasons of security, public resistance to combat troops in their city, and the diplomatic issues that would arise from practicing in cities of potential conflict.

Other costs can be estimated. Keeping an Army Division (around 10K soldiers) in the field costs about $20M per day. JFCOM, using the DHPI cluster, can run such a program with around 100 technicians (one one-hundredth of the personnel, with less expensive support costs, equipment and energy use, less loss of life, …); without the cluster, this could not be done. Cost savings may be ~$19.5M each day.

At one recent “exercise,” the Lieutenant General in charge pointed out that it was probably the only time in his career he would have an opportunity to command so large a unit, yet he may be called on to do just that at any time.

The cluster is incorporated into the DREN, allowing large numbers of soldiers to participate: 1,500 in one recent Urban Resolve experiment.
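The cost comparison on this slide can be checked with a few lines of arithmetic. The C sketch below is illustrative only: the function names are hypothetical, and the deck does not state what the simulation itself costs per day. Taking the slide's own figures, a $20M field cost and a ~$19.5M saving imply a simulation cost of roughly $0.5M per day. Scaling purely by headcount (100 of 10,000 people) would give $0.2M, so the slide's estimate appears to allow for costs beyond personnel.

```c
/* All amounts are in millions of US dollars per day. */

/* Saving from simulating instead of fielding the unit. */
double daily_saving_musd(double field_cost, double sim_cost)
{
    return field_cost - sim_cost;
}

/* Cost if it scaled linearly with headcount. This is an illustrative
   assumption only; the slide does not claim linear scaling. */
double headcount_scaled_cost_musd(double field_cost,
                                  int field_people, int sim_people)
{
    return field_cost * (double)sim_people / (double)field_people;
}
```

For example, `daily_saving_musd(20.0, 0.5)` reproduces the slide's 19.5, while `headcount_scaled_cost_musd(20.0, 10000, 100)` gives the 0.2 lower bound on the simulation's cost under the linear assumption.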

Summary

Technical Merit

• Challenges laid out in the proposal have all been met

• Joshua deployed and in service

• Creating data for JIPS Experiment

• Two million entity goal exceeded

• Capability of GPU demonstrated

• Developers trained to use GPUs

• Route planning kernel implemented

• Joshua has changed the J9 culture

• New code being developed using client/server model

Summary

Computational Merit

• Joshua dedicated to a uniquely military problem

• Modeling of operations in urban terrain

• Users are often uniformed warfighters

• Hosts a large, heterogeneous ensemble of SAFs

• 1024 cores needed to model urban population

• HLA RTI fault tolerant communication model

• Anonymous publish/subscribe

• Novel research challenges

• Scalable interest management to bound messages

• Scaling individual behavior models

• Mining distributed data logs to analyze results

Summary

Current Progress



• Successfully deployed and accepted at JFCOM

• Exceeded technical goal of hosting 2M entities

• Classification issues led to partitioning

• Joshua is now fully engaged in day-to-day

simulation experiments at JFCOM

• Running ensembles of SLAMEM simulations

• Ops-tempo is expected to continue and increase

• Human-in-the-loop experiments in FY10

Summary

Appropriateness

• Dedicated system

• Required by classified, interactive use

• Linux cluster

• Easy migration path for users

• Low risk given prior experience with DC Linux clusters

• Low-cost GigE network adequate for current SAFs

• GPUs were inexpensive upgrade ($50K)

• Joshua has met JFCOM’s requirements

• In service creating data for JIPS

Research Funded by

JFCOM and AFRL



This material is based on research sponsored by the

Air Force Research Laboratory under agreement

numbers F30602-02-C-0213 and FA8750-05-2-0204. The

U.S. Government is authorized to reproduce and

distribute reprints for Governmental purposes,

notwithstanding any copyright notation appearing

thereon. The views and conclusions contained herein

are those of the authors and should not be interpreted

as necessarily representing the official policies or

endorsements, either expressed or implied, of the Air

Force Research Laboratory or the U.S. Government.


