Presentation at: AsiaSTAR2004, Canberra, Australia, 7 Sep 2004


A Brave New Frontier:
Testing Live Production Applications

Dr Kelvin Ross, Steve Woodyatt, Dr Steven Butler
SMART Testing Technologies Pty Ltd
Roadmap

• Avoiding Production Problems
•   Testing for Service Level Management
•   Case Study
•   Considerations Unique to Production Testing
•   Information for SLM
•   Implementation Choices
•   Wrap-Up
Why Test On
Production
• Despite best efforts to test an application prior
  to deployment, post-deployment problems still
  occur frequently:
   –   Server offline
   –   No response
   –   Functions not available
   –   Incorrect response
   –   Slow response
   –   Security breach
   –   Data out-of-date
The user experience

• What will the user experience in dealing with
  our application

• E.g. Airline Reservation business process:
   –   Search for flights
   –   Make a reservation
   –   Pay with credit card
   –   Obtain electronic ticket reservation code
   –   Confirmation by email with matching details
   –   Reservation details reported in frequent flyer account
Distributed Airline
Architecture

[Diagram: information flow through the distributed architecture – users reach
the Web Application over the Internet through a firewall; the Web Application
connects to ERP, Mainframe, Email Gateway and Prices Gateway systems, which in
turn reach an external remote payment gateway and email delivery over the
Internet.]
Roadmap

• Avoiding Production Problems
• Testing for Service Level Management
•   Case Study
•   Considerations Unique to Production Testing
•   Information for SLM
•   Implementation Choices
•   Wrap-Up
Service Level
Management
• Service Level Management (SLM)
   – “set of people and systems that allows the
     organisation to ensure that SLAs are being met and
     that the necessary resources are being provided
     efficiently”
• Service Level Agreement (SLA)
   – “contracts between service providers and customers
     that define the services provided, the metrics
     associated with these services, acceptable and
     unacceptable service levels, liabilities on the part
     of the service provider and the customer, and
     actions to be taken in specific circumstances”


         Definitions from IEC, “Service Level Management” tutorial, www.iec.org
SLM in the context of
ITIL
SLA KPIs

Availability
• End-to-end, not just components
   – No. and duration of outages
   – Total uptime/downtime

Security
• Exposure
   – No. of breaches
   – Vulnerabilities detected
   – Viruses

Accuracy
• Correct results
• Processes followed

Performance
• Responsiveness
   – Response time for web request
   – Data transfer / throughput
   – MTTR
   – No. of incidents
   – Service degradation
Approaches

• Passive – listen into transactions and analyse logs
• Active – transactions are synthesised

End User (observes user experience)
   – Passive: End-to-End, Topaz, NetIQ, …
   – Active: SMART Cat, Topaz, Keynote, Netmechanic, …

Component (focuses on servers and backend processes)
   – Passive: Web Trends, …
   – Active: HP OpenView, IBM Tivoli, CA Unicentre, BMC Patrol, …
Business Process
Auditing (BPA)
• Business Process Auditing – automated, real-time checks of:
   – Performance
   – Availability
   – Security
   – Functionality
   – Correctness
   – Accuracy
   – Completion

• Outcomes feed Reporting, Alerting, and Fault Diagnosis & Remedy as part of
  Service Level Management
Post-Deployment
Testing and SLM
• Testing can be used to synthesise business
  transactions
   – Interact with system through various interfaces
   – Collect and report metrics


• A transfer of technology that has predominantly
  been used pre-deployment
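
A minimal sketch of such a synthesised business transaction, using only the
Python standard library; the flight-search URL, parameters and content check
below are hypothetical, not the SMART Cat implementation.

# Synthesise one searchFlight transaction, time it and report a simple metric.
import time
import urllib.parse
import urllib.request

SEARCH_URL = "https://example.com/search.jsp"   # hypothetical endpoint

def run_search_transaction(depart, arrive):
    params = urllib.parse.urlencode({"depart": depart, "arrive": arrive})
    start = time.monotonic()
    try:
        with urllib.request.urlopen(f"{SEARCH_URL}?{params}", timeout=30) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            status = "PASS" if "flight" in body.lower() else "FAIL"  # crude content check
    except Exception as exc:
        body, status = str(exc), "FAIL"
    elapsed = time.monotonic() - start
    return {"transaction": "searchFlight", "status": status,
            "response_secs": round(elapsed, 3)}

if __name__ == "__main__":
    print(run_search_transaction("SYD", "MEL"))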
Problems Detected
• Problems detected
   – End-To-End processes not available
   – Responses slow
   – Incorrect data
• Problems not detected
   – Issues localised to individual clients
   – Actual response times to all clients
Who Owns Production
Testing
•   The testing group?
•   The support group?
•   The operations group?
•   The application owners?
•   Marketing?

• Marriage of skills and technology required for
  efficiency


      “We don’t call that testing” syndrome
Which applications
most benefit
• Those with real-time dependence on the
  completion of vital business processes
   – High risk & dependence
       • Financial
       • Market reputation
   – Probity, Accountability and Liability
   – Potentially unreliable or difficult to manage
     technology dependencies
       • increasingly complex linkages
       • distributed application architectures
       • history of failure, problems


• Risk assessment:
   – SEVERITY X LIKELIHOOD
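
The SEVERITY x LIKELIHOOD calculation can be applied directly to rank which
business processes to monitor first; the process names and 1-5 scales below
are illustrative assumptions only.

# Illustrative severity x likelihood prioritisation.
processes = {
    "Make online booking":      {"severity": 5, "likelihood": 3},
    "Search available flights": {"severity": 4, "likelihood": 4},
    "Change booking":           {"severity": 3, "likelihood": 2},
}

ranked = sorted(processes.items(),
                key=lambda item: item[1]["severity"] * item[1]["likelihood"],
                reverse=True)
for name, scores in ranked:
    print(name, "risk =", scores["severity"] * scores["likelihood"])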
Roadmap

• Avoiding Production Problems
• Testing for Service Level Management
• Case Study
•   Considerations Unique to Production Testing
•   Information for SLM
•   Implementation Choices
•   Wrap-Up
Objectives

BPA Planning checklist:
 – What are the critical business processes
 – Who are the users
 – What is the user experience
 – How can success be determined
 – How can the test be automated
Airline Reservation
Case Study
• Critical business processes
   –   Search available flights
   –   Make online booking
   –   Change booking
   –   Cancel booking
   –   Etc.
• Users
   – Consumers
   – Travel agents
   – Call centre
Airline Reservation
Case Study
• What is the user experience
   – Search for flights
      • Available
          – Function accessible
          – Response returned
      • Correct
          – Correct flights, source and destination, time,
            etc.
      • Complete
          – No missing flights with available seats
      • Responsive
          – With tolerable response times
• How can success be determined
   – What is the source of truth
Airline Reservation
Case Study
• Choose what to monitor based on risk
   – SEVERITY x LIKELIHOOD
   – Previous operational reliability problems, complex
     dynamic behaviour


• What was previously tested and will continue
  to function
   – Are there problems with distributed components
     continuing to run appropriately, e.g. Tuxedo services,
     LDAP authentication, payment gateway not
     accessible
   – Are there problems with timely propagation/retrieval
     of data, e.g. flight data not retrieved consistently,
     bookings not updated in a timely manner
Test Frameworks

• Outcomes have to be reported at business
  level, not application object level
   – Object level – Too Low Level for Audience
      getURL search.jsp
      saveForm, submitflight
      setParam, submitflight, startime, 200412011100
      …
      submitForm, submitflight
   – Business level – Appropriate for Audience
      searchFlight, return, 20041201110000, SYD, …


• “Action Word” approaches recommended
   – See Carl Nagle or Hans Buwalda’s work
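
A minimal sketch of the action-word idea: a business-level keyword such as
searchFlight wraps the object-level driver calls, so only the business step
reaches the report. The driver functions here are hypothetical stand-ins, not
any specific tool's API.

# Hypothetical action-word layer: the report records the business-level step,
# while object-level driver calls stay hidden inside the keyword.
report = []

def _get_url(path): print("object:", "getURL", path)
def _set_param(form, field, value): print("object:", "setParam", form, field, value)
def _submit_form(form): print("object:", "submitForm", form)

def search_flight(trip_type, depart_time, origin, destination):
    """Business-level action word 'searchFlight'."""
    report.append(("searchFlight", trip_type, depart_time, origin, destination))
    _get_url("search.jsp")
    _set_param("submitflight", "starttime", depart_time)
    _set_param("submitflight", "from", origin)
    _set_param("submitflight", "to", destination)
    _submit_form("submitflight")

search_flight("return", "200412011100", "SYD", "MEL")
print("report:", report)   # only the business-level step appears here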
Dynamic behaviour
• searchFlight, return, 200412011100, SYD, …
   – Won’t remain useful for long as production data is
     dynamic


• Dynamic input data
   searchFlight
   Type = return
   DepartTime = today()@10am + 1 month
   ReturnTime = today()@10am + 1 month + 5 days
   Depart = Sydney
   Arrive = Melbourne


• May even want to randomise data
   – Vary depart and arrive on successive runs
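
A sketch of generating that dynamic input data with the standard library;
"one month" is approximated as 30 days, and the airport codes used for
randomisation are assumptions.

import random
from datetime import datetime, timedelta

def search_flight_inputs():
    # today()@10am + ~1 month, returning 5 days later
    depart_time = (datetime.now() + timedelta(days=30)).replace(
        hour=10, minute=0, second=0, microsecond=0)
    return_time = depart_time + timedelta(days=5)
    origin, destination = random.sample(["SYD", "MEL", "BNE", "PER"], 2)
    return {
        "Type": "return",
        "DepartTime": depart_time.strftime("%Y%m%d%H%M"),
        "ReturnTime": return_time.strftime("%Y%m%d%H%M"),
        "Depart": origin,
        "Arrive": destination,
    }

print(search_flight_inputs())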
The Test Oracle

• Mechanisms for determining correct response
   – Get any response
   – Get a response containing predefined expected
     values
   – Expected values are checked using an oracle
      • E.g. formula determining whether valid date
         returned
   – Results are compared to reference data
      • 3rd party data feed
      • Trusted internal source, e.g. Mainframe
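
Two of these oracle styles in miniature: a formula-based check on a returned
departure date, and a comparison against reference data from a trusted
source. The data formats and function names are assumptions for illustration.

from datetime import datetime

def valid_departure(returned, requested):
    """Formula oracle: the returned departure parses as a date, matches the
    request and lies in the future."""
    dt = datetime.strptime(returned, "%Y%m%d%H%M")
    return returned == requested and dt > datetime.now()

def matches_reference(web_flights, reference_flights):
    """Reference-data oracle: web results compared to a trusted source,
    e.g. a mainframe API query or 3rd party feed."""
    return sorted(web_flights) == sorted(reference_flights)

print(valid_departure("203512011000", "203512011000"))            # True
print(matches_reference(["QF401", "QF405"], ["QF405", "QF401"]))  # True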
3rd Party Reference

• Trending against price data



[Chart: application prices trended against the 3rd party price feed – the
price trend matches the feed until the point marked ×, where the application's
prices have frozen.]
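
A sketch of that trend check: if the 3rd party feed keeps moving while the
application's prices stop changing, flag them as frozen. The list-of-
observations format and window size are assumptions.

def prices_frozen(app_prices, reference_prices, window=5):
    """Both arguments are lists of successive observed prices."""
    recent_app = app_prices[-window:]
    recent_ref = reference_prices[-window:]
    app_static = len(set(recent_app)) == 1
    ref_moving = len(set(recent_ref)) > 1
    return app_static and ref_moving

print(prices_frozen([199, 199, 199, 199, 199],
                    [201, 205, 199, 210, 215]))   # True -> raise an alert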
Airline Reservation
Case Study
• Verification failures for searchFlight response

Availability
   – No response received: FAIL – notify Ops support immediately (no test
     oracle required)

Response time
   – >= 8 secs, < 20 secs: WARN – notify App support if sustained more than
     15 minutes (no test oracle required)
   – >= 20 secs: FAIL – notify Ops and App support immediately (no test
     oracle required)

Correctness
   – Gateway connection error page: FAIL – notify Ops support immediately
     (no test oracle required)
   – Unexpected content: FAIL – notify App support immediately (no test
     oracle required)
   – Flight data isn't for intended routes and dates: FAIL – notify App
     support immediately; test oracle: confirm flights correct – flight code
     lookup table, dates consistent
   – No flights found: FAIL – notify App support immediately (no test oracle
     required)
   – Flight availability and pricing incorrect: FAIL – notify App support
     immediately; test oracle: confirm against flight availability and
     pricing in Reservations Mainframe using API query
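
These rules translate almost directly into code; the sketch below encodes the
response-time thresholds and notification targets exactly as listed above
(the function signature itself is an assumption).

def classify_search_response(received, response_secs, content_ok):
    """Classify one searchFlight response per the verification table."""
    if not received:
        return "FAIL", "Ops support immediately"
    if response_secs >= 20:
        return "FAIL", "Ops and App support immediately"
    if not content_ok:
        return "FAIL", "App support immediately"
    if response_secs >= 8:
        return "WARN", "App support if sustained more than 15 minutes"
    return "PASS", None

print(classify_search_response(True, 9.2, True))
# ('WARN', 'App support if sustained more than 15 minutes')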
Roadmap

• Avoiding Production Problems
• Testing for Service Level Management
• Case Study
• Considerations Unique to Production Testing
• Information for SLM
• Implementation Choices
• Wrap-Up
Scheduling the test

• How often
   – 1 minute, 5 minutes, hourly, daily, weekly
   – Depends on how quickly support can respond
• What business hours
   – 24x7, 9 to 5, higher frequency at certain events
• What about scheduled outages
   – Planned outages, public holidays
• Coordinating tests
   – Locking to prevent simultaneous tests
   – E.g. don’t check prices or submit orders unless
     logged in
   – Semaphores
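
One simple way to implement that locking is an exclusive lock file, so two
scheduled runs of the same business process never overlap; the lock path and
the sleep placeholder below are assumptions.

import os
import sys
import time

LOCK_PATH = "/tmp/bpa_booking.lock"   # assumed location

def acquire_lock():
    try:
        fd = os.open(LOCK_PATH, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(fd, str(os.getpid()).encode())
        return fd
    except FileExistsError:
        return None                    # another test run holds the lock

fd = acquire_lock()
if fd is None:
    sys.exit("Another booking test is already running; skipping this cycle.")
try:
    time.sleep(1)                      # ... run the booking test here ...
finally:
    os.close(fd)
    os.remove(LOCK_PATH)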
Sensitive Data
• Frequently there may be sensitive information stored in
  scripts and test logs
    – Logins and passwords
    – Credit card ids
    – Personal details, e.g. phone numbers, ABNs, etc

• Where possible avoid
    – Use dummy accounts
    – Don't log sensitive information
       • Can be difficult to control, e.g. a failure may save a screen
         shot that then displays credentials
• Use encryption
    – Sensitive data is stored encrypted, but the test engine still
      requires the key to send it
    – At least it is obfuscated
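
A sketch of keeping script credentials encrypted at rest, here using the
third-party cryptography package; as noted above, the test engine still needs
the key at run time, so this is obfuscation rather than full protection.

from cryptography.fernet import Fernet

key = Fernet.generate_key()            # in practice, kept outside the script repository
cipher = Fernet(key)

# What gets stored in the script/config:
stored_password = cipher.encrypt(b"dummy-account-password")
print(stored_password)

# Decrypted only at the moment it must be sent to the application:
print(cipher.decrypt(stored_password).decode())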
Where tests should be
run from
• Many tools allow tests to be run from multiple locations
    – Simulate users of different geographies
    – Different connection speeds to report on a variety of user
      experiences
• Inside/outside firewall
    – Probably the largest concern
    – Consumer users outside, Corporate users inside
    – To provide end-to-end scenarios, may need combination
       • Scenario initiated internally, and end results are
          propagated to external, or vice-versa
       • External view of web may be verified using Test Oracle
          data that is internal
    – Agents may be deployed internal and external to run tests
Problems to Avoid

Need to be aware of impact of testing:
• Performance hits
• Volatile features
• Intrusive tests
• Biased results
• Compliance restrictions
• Impact on Business KPIs



• Taking measurements may distort the system
  being measured
Minimising the Effect
of Transactions
• Cost of Transaction
   – Financial – purchase flight may incur credit card
     merchant fee
   – Resource – seats unavailable until refund provided,
     searching places additional load on resource pool
• Reversing the transaction
   – Providing a refund, merchant fee may still apply
• What if the transaction is incomplete
   – What happens if refund process doesn’t
     occur/complete
• Compliance issues
   – Corporate
   – Legislative
Managing the Test
Impact
• Modifications to the application under test to
  cleanup data or control test effects
   – Manual fallback may be convenient option
• Test Objects
   – Dummy frequent flyer accounts
   – Dummy cost centres
• Testing the tests
   – Access to test environment pre-deployment
   – Endurance test that can be part of application test
     strategy
       • Transfer of load, stress and endurance test
         scripts
Roadmap

•   Avoiding Production Problems
•   Testing for Service Level Management
•   Case Study
•   Considerations Unique to Production Testing
• Information for SLM
• Implementation Choices
• Wrap-Up
Effective Reporting

• Who are the users of the reports – different users
  have different expectations of presentation/content
   –   Business/Application Manager
   –   Operations
   –   Development
   –   Support
   –   SLM
• How do they access reports?
   – Web, email, Thick client
   – Which reports are real-time or batched
   – Is data summarised, or is original data accessible
 Historic Reporting
 •    Service level reports
 •    Trends
 •    Progress
 •    Post Mortem Analysis


Counts
Count = 525
Pass = 513 (97.71%)
Fail = 12 (2.29%)




Latency
Min = 4.339 sec
Avg = 8.253 sec
Max = 87.708 sec
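
The count and latency summaries above are straightforward to derive from
stored test outcomes; the record format in this sketch is an assumption.

results = [
    {"status": "PASS", "latency_secs": 4.339},
    {"status": "FAIL", "latency_secs": 87.708},
    {"status": "PASS", "latency_secs": 8.100},
]

count = len(results)
passed = sum(1 for r in results if r["status"] == "PASS")
latencies = [r["latency_secs"] for r in results]

print(f"Count = {count}")
print(f"Pass = {passed} ({passed / count:.2%})")
print(f"Fail = {count - passed} ({(count - passed) / count:.2%})")
print(f"Latency min/avg/max = {min(latencies)} / {sum(latencies) / count:.3f} / {max(latencies)} sec")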
Realtime Reporting
• Alerts
• Current status
• Diagnosis
Diagnosing root cause
and remedies
• Accessing fault and failure data for multiple
  components
   – Pinpoint failures
• Correlation is a skill
   – manual, expert analysis required
   – Variety of support:
      • Saved actual results
          – Unattended collection for debugging
      • Correlation with component performance analysis
• Automated correlation with component failure
  modes
   – Sophisticated “expert system”
   – Rules that correlate tested events to arrive at
     diagnosis of root cause(s)
Fault Analysis

[Fault tree: "Can't connect to OT agents" breaks down into "Can't connect to
OT" (can't connect to internet; can't resolve IP of OT correctly; can't
connect to OT gateway), "Can't connect to OT Test Agent" (test agent server
failed; requests can't pass via OT firewall to Abbot agent; test agent not
processing connections; SSH port forward to test agents has failed) and
"Can't connect to OT RefData Agent" (RefData agent servers failed; requests
can't pass via OT firewall to RefData agent; RefData agent not accepting
connections).]
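
A toy version of such rule-based correlation, in the spirit of the fault tree
above: each rule maps a combination of observed test outcomes to a candidate
root cause. The observation names and rules are illustrative only.

RULES = [
    ({"internet_reachable": False}, "Can't connect to the internet"),
    ({"internet_reachable": True, "dns_resolves": False},
     "Can't resolve IP of the gateway"),
    ({"internet_reachable": True, "dns_resolves": True, "agent_responds": False},
     "Agent not accepting connections (check SSH port forward)"),
]

def diagnose(observations):
    for conditions, cause in RULES:
        if all(observations.get(k) == v for k, v in conditions.items()):
            return cause
    return "No rule matched - manual analysis required"

print(diagnose({"internet_reachable": True, "dns_resolves": True,
                "agent_responds": False}))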
Roadmap

•   Avoiding Production Problems
•   Testing for Service Level Management
•   Case Study
•   Considerations Unique to Production Testing
•   Information for SLM
• Implementation Choices
• Wrap-Up
Tool Requirements
• Evaluation Checklist
    –   Test script can interact with a variety of systems
          • GUI, Terminal, APIs, HTTP, SOAP, POP/SMTP, etc.
    –   Test script can respond to dynamic behaviour
    –   Agents can be deployed internal/external of the WAN
    –   Ability to control frequency
    –   Time based functions can be used to control execution
    –   Functions available for data manipulation for dynamic responses (time,
        extraction, etc.)
    –   Inter-process coordination between tests using locking/semaphores
    –   Test steps can be reported on business process steps, object actions can
        be hidden in reports
    –   Test outcomes saved to repository for later analysis
    –   Ability to export data for other purposes, e.g. trending, visualisation, etc.
    –   Reporting capability on stored data
    –   Online ability to drill into test data for problem diagnosis
    –   Alerting mechanisms to email, SMS, online dashboards
    –   Alerting can be controlled, i.e. escalation, filtering
• Apply weighting to each criterion according to need
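
Weighting the checklist reduces to a weighted sum per tool; the criteria,
weights and 0-5 scores below are placeholders showing the mechanics only.

criteria_weights = {"protocol coverage": 5, "dynamic data handling": 4,
                    "alerting/escalation": 3, "reporting/drill-down": 3}

tool_scores = {
    "Tool A": {"protocol coverage": 4, "dynamic data handling": 3,
               "alerting/escalation": 5, "reporting/drill-down": 2},
    "Tool B": {"protocol coverage": 5, "dynamic data handling": 4,
               "alerting/escalation": 2, "reporting/drill-down": 4},
}

for tool, scores in tool_scores.items():
    total = sum(criteria_weights[c] * scores[c] for c in criteria_weights)
    print(tool, "weighted score =", total)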
Implementation
Choices
• Available Commercial Tools/Services
   –   SmartTestTech – SMARTCat
   –   Mercury – Topaz
   –   Compuware – Vantage
   –   Keynote
   –   To a lesser extent, enterprise monitoring tools:
        • BMC Patrol, IBM Tivoli, HP OpenView
• Home Brew Tools
   – Extensive support for testing protocols in open source
     frameworks
       • E.g. Java/JUnit, .NET/NUnit, Perl/Ruby/Python
• Extend Existing In-house Regression Test Suites
   – Automated scripts may be adapted
      • Robot, QARun, WinRunner, Silk
   – Post results to Database
   – Provide reporting capability
      • e.g. Crystal Reports, Cognos, etc
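
A sketch of posting adapted regression-test outcomes to a database for later
reporting; SQLite and the table schema here are assumptions for illustration.

import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("bpa_results.db")
conn.execute("""CREATE TABLE IF NOT EXISTS results (
                   run_at TEXT, transaction_name TEXT,
                   status TEXT, response_secs REAL)""")
conn.execute("INSERT INTO results VALUES (?, ?, ?, ?)",
             (datetime.now(timezone.utc).isoformat(),
              "searchFlight", "PASS", 6.2))
conn.commit()

for row in conn.execute("SELECT * FROM results ORDER BY run_at DESC LIMIT 5"):
    print(row)
conn.close()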
Roadmap

•   Avoiding Production Problems
•   Testing for Service Level Management
•   Case Study
•   Considerations Unique to Production Testing
•   Information for SLM
•   Implementation Choices
• Wrap-Up
Wrap-up

• Strong business case
   – Benefit in bringing testing to the production world
   – Small percentage increase in availability translates to a
     large dollar impact
   – Manages reputational risk with user base
   – Large investment in SLM
   – SLAs are often ad hoc and not measured
   – Uses tests to provide SLM reports to Business /
     Application Managers
   – Leveraging the investment in test resources
   – Protects overall investment
     Questions &
     Answers

Contact details:
   Dr Kelvin Ross
   SMART Testing Technologies Pty Ltd
   PO Box 131, West Burleigh, Q4219
   AUSTRALIA
   Ph: +61 7 5522 5131
   Email: kelvinr@smarttesttech.com

				