“The Perfect Storm” – Learning How to Be A Quality Service Provider
Dr. John S. Wasileski
Ann F. Harbor
Tom Crafton
CSMS, Charleston, SC
March 22, 2005

Our Plan for Today is to …
• Give you a sense of the kind of institution we are
• Provide understanding of what it takes to be a service provider
• Discuss the project plan & implementation
• Have a frank discussion with you about what worked well and what did not

University of Memphis Institutional Profile …
• Regional, Metropolitan, Research, Doctoral Granting
• Multiple campuses with ~21,000 students
• 9 Colleges & 5 Administrative Divisions with ~2,400 faculty & staff
• Desire to be a leading metropolitan university
• FY05 Budget ~$260M
• Member of Tennessee Board of Regents

Information Technology facts
• Shrinking resources (human & fiscal)
• Heightened campus expectations for support
• Strong and stable infrastructure
• Little experience in hosting systems

RODP Hosting Services Bid RFP
• TBR announced that it would be asking TBR institutions to respond to an RFP for hosting services. This was due, primarily, to problems with their current provider.
• Services sought through RFP
  – 7x24 Helpdesk
  – 7x24 uptime during semesters
  – TBR financial savings

Memphis Management position
• President wanted to host
• ITD Executive Management wanted to satisfy the president
• ITD operational management wanted no part of hosting
• Executive management ordered the bid amount to be cut in half (first sign of pending storms)

Memphis’ response to RFP
What our response included
  – Robust mirrored design for platform
  – SAN storage of all course data
  – Backup strategy
  – Multiple internet paths to RODP
  – Leveraging the current Help Desk, expanded to 7x24 and staffed with students
  – Reporting capabilities not available from current provider
  – Priced at a bit over $800,000

Winning the bid & Planning the Project
• Bid awarded in September 2003 for delivery of system on January 7, 2004
• Project management approach was to jointly create a master plan with linked sub-plans created by the individuals responsible

Building the Plan by Visioning
By Dec. 20th we will be fully prepared to deliver RODP
  – Hardware installed & tested
  – Software installed & certified
  – Back-up procedures in place
  – ALL courses migrated
  – Hired & trained staff for Help Desk
  – Reporting ready for use immediately
  – Customers will be recognized & given special handling
  – Problems resolved within one business day
  – Faculty will work easily with WebCT
  – The contract will be extended

Looking at the resulting project plan

How the Plan Worked
• Time frame was aggressive but we made it
• Implementation methodology was well-developed
• Training of Help Desk was a problem
• Data conversion was difficult
• Platform certification was problematic for software vendor (2nd sign of brewing storms)
• Go-live happened on time with much hand-holding, and the semester finished well

The Perfect Storm
At the beginning of the Summer semester, the system suffered a complete meltdown. Here is the sequence:
• 6/6-9/04. Slow response
• 6/9/04. Node 1 unresponsive. Node 2 started but the WebCT application did not start

The Perfect Storm cont’d
• 6/16/04. During rebuild, Node 2 production system erased
• 6/17/04. Rebuilding of erased files begins. Recovery delayed by an overzealous client demanding too many meetings

How we recovered
• Immediate SWAT team formed with ITD and both vendors
• Hardware vendor sent one of their best engineers to help on-site (important move!)
• This engineer identified the problems and, bit by bit, helped recover course files
• System unavailable for 11 days, with a huge impact on students’ deadlines etc.
• Personal contact with each customer

The Impacts
• Impacts on students
• TBR/RODP Office reactions
• Impacts on ITD staff

What We Discovered Later
• Conditions that created the storms, as we later discovered, were:
  – Hardware vendor had created an incorrect and undetectable (by us) configuration
  – Early signs of system behavioral trouble were ignored by ITD
    • Difficulty with course mgmt. file structure
    • Slow response
    • Failure to check back-ups
    • Etc.
  – When the system ground to a halt, all technical staff worked round the clock for at least two days.
    This led to a …
  – Staff mistake: the production system was accidentally erased
  – The original configuration error (drives and mirrors with identical names) had actually confused our backup software, and we had failed to verify that complete data was actually being written to backup tapes

Post action review
• Parties involved
• Findings
• Management reaction
• Executive reaction
• Customer reactions
  – Client
  – Enrolled students

2nd year Contract renegotiation
• Memphis still better than former provider
• Recognition of additional resources needed
• Demonstrating improved customer focus

Lessons Learned
• Bid more wisely (know the water you’re planning to swim in)
• Work with vendors more wisely (look every which way)

Lessons Learned cont’d
• Check/test back-ups (know how deep you’re in)
• Know what it REALLY means to be a service provider (be sure you can swim before getting in too deep)

Questions & Comments

Thank You!
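
Supplementary sketch: Check/test back-ups
The "Check/test back-ups" lesson can be illustrated with a small script. This is a minimal sketch, not the procedure used at Memphis: it assumes a backup has been test-restored to a scratch directory and compares that restore against the live course data file by file. The function names (file_digest, verify_restore) and the directory arguments are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(source_dir, restored_dir):
    """Compare a source tree against a test restore of its backup.

    Returns (missing, mismatched): relative paths that are absent from
    the restore, and relative paths whose contents differ.
    """
    source = Path(source_dir)
    restored = Path(restored_dir)
    missing, mismatched = [], []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = restored / rel
        if not dst.is_file():
            # File never made it onto the backup (or into the restore).
            missing.append(rel.as_posix())
        elif file_digest(src) != file_digest(dst):
            # File exists in the restore but its contents differ.
            mismatched.append(rel.as_posix())
    return missing, mismatched
```

A check of this kind only helps if it runs against an actual restore on a regular schedule; a backup job that reports success says nothing about whether complete data reached the media.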