Emergency Database Failover:
Impacts & Recovery Plan



Trey Felton – ERCOT IT
Synopsis


 [Diagram: database architecture. Boxes shown:]
 • Market DB (Taylor)
 • Market DB Physical Standby (Austin)
 • Logical Standby (RSS)
 • ISM (EDW)
 • LodeStar
 • Paperfree
 • Siebel

 ISM – Information Services Master Database
 DB – Database
 EDW – Electronic Data Warehouse

Synopsis


 [Diagram: Failover from Market DB (Taylor) to Market DB Physical Standby (Austin). Logical Standby (RSS) labeled "Out of synch (24 hrs)". Also shown: ISM (EDW), LodeStar, Paperfree, Siebel]

 – Emergency DB failover on April 21st, 2008
     • Market DB (which feeds ISM) became unresponsive
         – Data could not be written/read
 – Synchronization issues caused a 24 hr gap in data
     • Propagated through to ISM

Synopsis


 [Diagram: Failover from Market DB (Taylor) to Market DB Physical Standby (Austin). Labels: Source Data; Re-created ISM (for Recovery); Recovered Extracts. Also shown: Logical Standby (RSS), ISM (EDW), LodeStar, Paperfree, Siebel]

 – Physical Standby brought online
 – ISM rebuilt through Source data to recover affected extracts

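
As a rough illustration of what rebuilding from source data involves, the toy Python sketch below restores a baseline copy and then replays logged changes in order. The `Change` record, the `roll_forward` function, and all sample values are hypothetical; this is a generic sketch of the idea, not ERCOT's actual tooling or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One logged change: set `key` to `value` at sequence number `seq`."""
    seq: int
    key: str
    value: float


def roll_forward(baseline, baseline_seq, log):
    """Return a copy of `baseline` with every change after `baseline_seq` applied in order."""
    state = dict(baseline)  # copy, so the restored baseline itself is left untouched
    for change in sorted(log, key=lambda c: c.seq):
        if change.seq > baseline_seq:
            state[change.key] = change.value
    return state


# Hypothetical data: a baseline restored as of sequence 2, plus the full change log.
baseline = {"price_a": 10.0, "price_b": 20.0}
log = [
    Change(1, "price_a", 10.0),
    Change(2, "price_b", 20.0),
    Change(3, "price_a", 12.5),  # change made after the baseline was taken
    Change(4, "price_c", 7.0),   # change made after the baseline was taken
]

rebuilt = roll_forward(baseline, baseline_seq=2, log=log)
print(rebuilt)  # {'price_a': 12.5, 'price_b': 20.0, 'price_c': 7.0}
```

The sketch also shows why the baseline matters: changes can only be replayed on top of a copy that is known to be consistent, so the further back that copy was taken, the more has to be replayed.
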
Impacts

•    Impacts:
      – Market transactions were prevented from updating ISM through Logical Standby
          • Market DB utilizes a standby to prevent outages / performance degradations
      – Logical Standby (RSS) became out of synch with Physical Standby by 24 hrs
          • April 21 at 10:44am through April 22 at 11:14am
          • Other DBs feeding ISM continued normally (only Market DB was out of synch)
      – Priority of rebuild led to the Standby being rebuilt before the RSS
          • Market DB has to be kept up
          • This prolonged the outage to the EDW and affected extracts
      – Prices had to be recalculated and extracts restored from Source
          • Price adjustments for NSRS were completed June 5th
          • Missing extracts for April 21 - April 30 completed on July 1st

•    Why did recovery take so long?
      – ISM generates 25-35 GB of data per day
      – Data restored from Source back to April 1st
          • 120 Terabytes had to be restored in order to roll-forward through transaction gap
          • Archive log changes applied during 24-hour gap
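
The figures above can be sanity-checked with a short Python sketch. It computes the length of the gap from the two timestamps listed under Impacts, read in chronological order, and estimates how long moving 120 terabytes would take at a sustained rate. The `restore_days` helper and the 200 MB/s throughput are assumptions chosen only for illustration; the slides do not state an actual restore rate.

```python
from datetime import datetime

# Gap window from the timestamps listed under Impacts, in chronological order.
gap_start = datetime(2008, 4, 21, 10, 44)
gap_end = datetime(2008, 4, 22, 11, 14)
gap_hours = (gap_end - gap_start).total_seconds() / 3600


def restore_days(volume_tb, throughput_mb_per_s):
    """Days needed to move `volume_tb` decimal terabytes at a sustained rate."""
    seconds = (volume_tb * 1_000_000) / throughput_mb_per_s
    return seconds / 86_400


# 200 MB/s is an assumed figure for illustration only; it is not from the slides.
days = restore_days(120, 200)
print(f"gap: {gap_hours:.1f} h, restore: {days:.1f} days")
# prints "gap: 24.5 h, restore: 6.9 days"
```

Under that assumed rate, the raw data movement alone would take about a week before any archive log changes are applied.
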




Emergency Database Failover


 • All data was restored with 100% accuracy

 • The affected market systems that caused the April failure:
         • Run the balancing energy and ancillary services markets
         • Are not used for wholesale batch or the retail markets


 • ERCOT considers this to be an isolated incident and not a systemic
   problem




Going Forward


 • Actions to prevent future occurrences:
    – Nodal market DBs will utilize newer Hardware
        • More fault tolerance
        • Redundancy
    – Change of architecture in the replication process for Nodal
        • Proof of Concept recently introduced into the Nodal market systems
        • Testing underway
    – ERCOT is conducting a risk/cost analysis of several options for
      these Zonal systems
        • To be presented to TAC in August
    – New Backups / Recovery Procedures
        • Project initiated to stabilize our database backup procedures
        • Shorter recovery time



Data Recovery


 NOTICE DATE: July 1, 2008
 NOTICE TYPE: W-A042308-48 UPDATE Extracts - Wholesale
 CLASSIFICATION: Public
 SHORT DESCRIPTION: ERCOT has completed recovery of the missing data for April 21 through
    April 30, 2008.
 INTENDED AUDIENCE: QSEs
 DAY AFFECTED: April 21 through April 30, 2008
 LONG DESCRIPTION: ERCOT conducted an emergency database failover on April 21, 2008
    following a hardware failure. This database failover resulted in an out-of-synch data problem from
    April 21 through April 30. ERCOT developed a phased process to attempt to thoroughly recover
    the missing data. The missing data has been recovered for the following extracts. A market
    notice will be sent when the extracts are expected to be posted.

 Act_Res_Output
 Ancillary_Services_Daily
 Bids_and_Schedules_Daily
 Forecast_Data_Daily
 Market_Information_Daily
 Sched_and_Actual_Load
 Self_Sch_Energy_Services
 ASDEPLOYMENTS



