FTB-Enabled InfiniBand Monitoring Software



Karthik Gopalakrishnan

Ohio State University

InfiniBand and FTB: Current State and Future Plans

System Components, Libraries, Applications and Autonomics
(MPI, Parallel File Systems, Chkpt/Rstrt, etc.)

Fault Tolerance Backplane (FTB)

User-Transparent Recovery
(dynamic and adaptive reconfiguration)

Network Fault Prevention
(alternate paths using LMC, APM, etc.)

Network Fault Monitoring
(link, switch, SM, topology change)

Network Fault Prediction
(port counter, congestion, history)
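The "alternate paths using LMC" item rests on a property of InfiniBand addressing: a port configured with a LID Mask Control (LMC) value of n answers to 2^n consecutive LIDs starting at its base LID, so each LID can be routed along a different path. A minimal Python sketch of that arithmetic follows; the function name is chosen here for illustration and does not come from the slides.

```python
from typing import List


def lids_for_port(base_lid: int, lmc: int) -> List[int]:
    """Return every LID a port answers to, given its base LID and LMC value."""
    # LMC is a 3-bit field, so valid values are 0 through 7.
    if not 0 <= lmc <= 7:
        raise ValueError("LMC must be between 0 and 7")
    count = 1 << lmc  # 2**lmc LIDs are assigned to the port
    # The base LID has its low `lmc` bits set to zero.
    if base_lid % count != 0:
        raise ValueError("base LID must be aligned to 2**lmc")
    return list(range(base_lid, base_lid + count))
```

With LMC 2 and base LID 16, the port would answer to LIDs 16 through 19, giving up to four candidate paths to the same destination.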

FTB-IB 1.0 (Network Fault Monitoring): release done on 11/10/08

Fault Tolerant InfiniBand Component

Monitored Events

– FTB_IB_ADAPTER_AVAILABLE
– FTB_IB_ADAPTER_UNAVAILABLE
– FTB_IB_ADAPTER_INFO
– FTB_IB_PORT_INFO
– FTB_IB_EVENT_PORT_ACTIVE
– FTB_IB_EVENT_PORT_ERR
– FTB_IB_EVENT_LID_CHANGE
– FTB_IB_EVENT_CLIENT_REREGISTER
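The four FTB_IB_EVENT_* names closely resemble asynchronous event names in the InfiniBand verbs library (IBV_EVENT_PORT_ACTIVE, IBV_EVENT_PORT_ERR, IBV_EVENT_LID_CHANGE, IBV_EVENT_CLIENT_REREGISTER). The Python sketch below shows what a name translation of that kind could look like. The mapping and the function name `to_ftb_event` are assumptions made for illustration; the slides do not say how FTB-IB derives its events.

```python
from typing import Optional

# Hypothetical mapping from verbs-style asynchronous event names to the
# FTB-IB event names listed on the slide. The pairing is an assumption
# based only on the similarity of the names.
VERBS_TO_FTB_IB = {
    "IBV_EVENT_PORT_ACTIVE": "FTB_IB_EVENT_PORT_ACTIVE",
    "IBV_EVENT_PORT_ERR": "FTB_IB_EVENT_PORT_ERR",
    "IBV_EVENT_LID_CHANGE": "FTB_IB_EVENT_LID_CHANGE",
    "IBV_EVENT_CLIENT_REREGISTER": "FTB_IB_EVENT_CLIENT_REREGISTER",
}


def to_ftb_event(verbs_event: str) -> Optional[str]:
    """Return the FTB-IB event name for a verbs event, or None if unmapped."""
    return VERBS_TO_FTB_IB.get(verbs_event)
```

An event outside the table, such as a QP error, returns None in this sketch.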

Fault Tolerant InfiniBand Component

[Figure: animation frames showing the IB HCA, FTB-IB, the FTB Agent, and an FTB Enabled Component. Two frames are labeled with example events: "Port Down" and "Adapter Unavailable".]
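FTB is a publish/subscribe backplane, and the component diagram above suggests the flow: FTB-IB observes the HCA, publishes an event to the FTB Agent, and the agent delivers it to FTB-enabled components that subscribed. The Python sketch below is a toy in-process model of that flow. The class name, method names, and payload string are hypothetical and are not the FTB API; pairing "Port Down" with FTB_IB_EVENT_PORT_ERR is also an assumption.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Callback = Callable[[str, str], None]


class ToyAgent:
    """Toy stand-in for an FTB agent: routes events to subscribers by name."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, event_name: str, callback: Callback) -> None:
        """Register a component callback for one event name."""
        self._subs[event_name].append(callback)

    def publish(self, event_name: str, payload: str) -> int:
        """Deliver an event to its subscribers; return how many received it."""
        callbacks = self._subs.get(event_name, [])
        for callback in callbacks:
            callback(event_name, payload)
        return len(callbacks)


# An FTB-enabled component (for example, an MPI library) subscribes.
received: List[Tuple[str, str]] = []
agent = ToyAgent()
agent.subscribe(
    "FTB_IB_EVENT_PORT_ERR",
    lambda name, payload: received.append((name, payload)),
)

# The monitoring side publishes when it sees the port go down.
delivered = agent.publish("FTB_IB_EVENT_PORT_ERR", "port 1 down")
```

Events with no subscriber are simply dropped in this model, which keeps the monitor decoupled from whoever consumes the events.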

Fault Tolerant InfiniBand Component

• Future Plans
– Library-based design to support affiliated events related to QP, SRQ and CQ errors
– Network-Fault Prediction
– Network-Fault Prevention
– User-Transparent Recovery
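Network-fault prediction is listed only as a future plan, and the slides do not describe a method. Purely as an illustration of predicting from port counters and history, the Python sketch below flags a port when a cumulative error counter grows by more than a threshold across a window of samples. The window, threshold, and function name are all hypothetical.

```python
from typing import Sequence


def port_at_risk(
    error_counts: Sequence[int], window: int = 3, threshold: int = 10
) -> bool:
    """Flag a port whose error counter grew too fast over recent samples.

    error_counts holds cumulative counter samples, oldest first.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    # Not enough history yet to compare across the full window.
    if len(error_counts) <= window:
        return False
    growth = error_counts[-1] - error_counts[-1 - window]
    return growth > threshold
```

A real predictor would likely combine several counters and congestion data; this sketch only shows the shape of a history-based check.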





• Availability
– FTB-IB 1.0 release can be downloaded from http://nowlab.cse.ohio-state.edu/projects/ftb-ib/
– Also available from the CIFTS Software page at http://www.mcs.anl.gov/research/cifts/software/index.php


