Automated Testing of Massively Multi-Player Games
Lessons Learned from The Sims Online
Larry Mellon
Spring 2003
Context: What Is Automated Testing?
Classes Of Testing
• Feature Regression
• QA
• Developer
• System Stress
• Load
• Random Input
Automation Components
[Diagram: three components sit over the System Under Test: Startup & Control; Repeatable, Sync’ed Test Inputs; Collection & Analysis]
What Was Not Automated?
Startup & Control, repeatable synchronized inputs, and results analysis were automated; visual effects were not.
Lessons Learned: Automated Testing (Talk Outline, 60 minutes)
• Design & Initial Implementation (1/3): Architecture, Scripting Tests, Test Client, Initial Results
• Fielding: Analysis & Adaptations (1/3)
• Wrap-up & Questions (1/3): What worked best, what didn’t; Tabula Rasa: MMP / SSP
Design Constraints
[Diagram: Load and Regression testing both demand Automation (repeatable, synchronized input; data management); the code churn rate demands Strong Abstraction]
Single, Data Driven Test Client
[Diagram: Regression and Load share reusable scripts & data, feeding one test client through a single API]
Data Driven Test Client
[Diagram: Regression (“testing feature correctness”) and Load (“testing system performance”) drive the test client through a single API, sharing reusable scripts & data; configurable logs & metrics report key game states, pass/fail, and responsiveness]
    Problem: Testing Accuracy
• Load & Regression: inputs must be
  – Accurate
  – Repeatable
• Churn rate: logic/data in constant motion
  – How to keep testing client accurate?
• Solution: game client becomes test client
  – Exact mimicry
  – Lower maintenance costs
Test Client == Game Client
[Diagram: the test client and game client differ only at the top: Test Control vs Game GUI. Both hold state and issue commands through the shared Presentation Layer into client-side game logic]
Game Client: How Much To Keep?
[Diagram: the game client in three layers: View, Presentation Layer, Logic]
What Level To Test At?
[Diagram: driving the client with mouse clicks at the View level]
Regression: too brittle (pixel shift). Load: too bulky.
What Level To Test At?
[Diagram: driving the client with internal events at the Logic level]
Regression: too brittle (churn rate vs logic & data).
Gameplay: Semantic Abstractions
Basic gameplay changes less frequently than UI or protocol implementations.
[Diagram: the NullView client strips the View (~3/4 of the client), leaving the Logic (~1/4) driven through Presentation Layer primitives such as Buy Lot, Enter Lot, Buy Object, Use Object]
   Scriptable User Play Sessions
• SimScript
  – Collection: Presentation Layer “primitives”
  – Synchronization: wait_until, remote_command
  – State probes: arbitrary game state
    • Avatar’s body skill, lamp on/off, …
• Test Scripts: Specific / ordered inputs
  – Single user play session
  – Multiple user play session
  Scriptable User Play Sessions
• Scriptable play sessions: big win
  – Load: tunable based on actual play
  – Regression: constantly repeat hundreds
    of play sessions, validating correctness
• Gameplay semantics: very stable
  – UI / protocols shifted constantly
  – Game play remained (about) the same
  SimScript: Abstract User Actions

include_script     setup_for_test.txt
enter_lot          $alpha_chimp
wait_until         game_state inlot

chat        I’m an Alpha Chimp, in a Lot.
log_message Testing object purchase.
log_objects
buy_object chair            10 10
log_objects
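Script lines like the ones above map naturally onto a data-driven dispatcher: each line is a verb plus arguments, routed to a presentation-layer primitive. A minimal Python sketch of that idea follows; all class and method names here are hypothetical illustrations, not the real TSO API.

```python
# Minimal sketch of a data-driven script runner: each script line is a verb
# plus arguments, dispatched to a presentation-layer primitive by name.
# All names are hypothetical stand-ins for the proprietary SimScript system.

class PresentationLayer:
    """Stand-in for the game's presentation-layer primitives."""
    def __init__(self):
        self.log = []

    def chat(self, *words):
        self.log.append("chat: " + " ".join(words))

    def buy_object(self, name, x, y):
        self.log.append(f"bought {name} at ({x}, {y})")

def run_script(layer, script_text):
    """Dispatch each non-empty line to the matching primitive."""
    for line in script_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        verb, args = parts[0], parts[1:]
        getattr(layer, verb)(*args)   # data-driven: the verb names the primitive

layer = PresentationLayer()
run_script(layer, """
chat Testing object purchase.
buy_object chair 10 10
""")
print(layer.log)
```

Because the script is plain data, the same runner serves both regression (scripts replayed for correctness) and load (scripts replayed at scale).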
     SimScript: Control & Sync

# Have a remote client use the chair
remote_cmd $monkey_bot
             use_object chair sit

set_data      avatar   reading_skill 80
set_data      book     unlock
use_object    book     read
wait_until    avatar   reading_skill 100
set_recording on
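The `wait_until` primitive above is essentially a poll-with-timeout against a game-state probe. A small Python sketch of one plausible implementation (the probe and its numbers are invented for the demo):

```python
# Sketch of a wait_until synchronization primitive: poll a game-state probe
# until it reports the expected value, or fail after a timeout.
import time

def wait_until(probe, expected, timeout=5.0, interval=0.01):
    """Block the script until probe() == expected, or raise on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe() == expected:
            return True
        time.sleep(interval)
    raise TimeoutError(f"state never reached {expected!r}")

# Demo: a fake avatar skill that "trains" a little on every poll.
state = {"reading_skill": 80}
def probe():
    state["reading_skill"] = min(100, state["reading_skill"] + 10)
    return state["reading_skill"]

print(wait_until(probe, 100))   # True once the skill reaches 100
```

`remote_cmd` would be the distributed analogue: send the same verb-plus-arguments line to another client and let its dispatcher run it.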
Client Implementation: Composable Client
[Diagram: Event Generators (Scripts, Cheat Console, GUI) feed the Presentation Layer, which drives the Game Logic]
Composable Client
[Diagram: Event Generators (Scripts, Console, GUI) and Viewing Systems (Console, Lurker, GUI) both plug into the Presentation Layer above the Game Logic]
Any / all components may be loaded per instance.
Lesson: View & Logic Entangled
[Diagram: view and logic code intertwined throughout the game client]
Few Clean Separation Points
[Diagram: a jagged boundary where a Presentation Layer could divide view from logic]
Solution: Refactored for Isolation
[Diagram: view and logic cleanly separated by the Presentation Layer]
Lesson: NullView Debugging
Without the (legacy) view system attached, tracing was “difficult”.
[Diagram: the Presentation Layer and Logic running with no view to observe them]
Solution: Embedded Diagnostics
[Diagram: diagnostic modules and timeout handlers embedded above the Presentation Layer and Logic, standing in for the missing view]
Talk Outline: Automated Testing (60 minutes)
• Design & Initial Implementation (1/3): Architecture & Design, Test Client, Initial Results
• Lessons Learned: Fielding (1/3)
• Wrap-up & Questions (1/3)
Mean Time Between Failure
• Random Event, Log & Execute
• Record client lifetime / RAM
• Worked: just not relevant in early stages of development
  – Most failures / leaks found were not high-priority at that time, when weighed against server crashes
           Monkey Tests
• Constant repetition of simple,
  isolated actions against servers
• Very useful:
  – Direct observation of servers while
    under constant, simple input
  – Server processes “aged” all day
• Examples:
  – Login / Logout
  – Enter House / Leave House
QA Test Suite Regression
• High false positive rate & high maintenance
  – New bugs / old bugs
  – Shifting game design
  – “Unknown” failures
Not helping in day-to-day work.
Talk Outline: Automated Testing (60 minutes)
• Design & Initial Implementation (1/4)
• Fielding: Analysis & Adaptations (1/2): Non-Determinism, Maintenance Overhead, Solutions & Results (Monkey / Sniff / Load / Harness)
• Wrap-up & Questions (1/4)
Analysis: Testing Isolated Features
Analysis: Critical Path
Test Case: Can an Avatar Sit in a Chair?
  use_object()
    buy_object()
      enter_house()
        buy_house()
          create_avatar()
            login()
Failures on the Critical Path block access to much of the game.
Solution: Monkey Tests
• Primitives placed in Monkey Tests
  – Isolate as much as possible, repeat 400x
  – Report only aggregate results
    • Create Avatar: 93% pass (375 of 400)
• “Poor Man’s” Unit Test
  – Feature based, not class based
  – Limited isolation
  – Easy failure analysis / reporting
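The aggregate-only reporting style above is easy to sketch: run one isolated action hundreds of times and emit a single pass-rate line. A minimal Python illustration, where the action function and its success rate are invented stand-ins:

```python
# Sketch of a "poor man's unit test": repeat one isolated feature action many
# times and report only the aggregate pass rate, e.g. "Create Avatar: 93%
# pass (375 of 400)". The action here is a fake stand-in, not real game code.
import random

def monkey_test(name, action, repetitions=400):
    """Run `action` repeatedly; return a one-line aggregate summary."""
    passes = sum(1 for _ in range(repetitions) if action())
    pct = round(100 * passes / repetitions)
    return f"{name}: {pct}% pass ({passes} of {repetitions})"

random.seed(0)
def create_avatar():
    return random.random() < 0.93   # fake action with a ~93% success rate

report = monkey_test("Create Avatar", create_avatar)
print(report)
```

One line per feature keeps failure analysis cheap even when hundreds of monkeys run all day.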
Talk Outline: Automated Testing (60 minutes)
• Design & Initial Implementation (1/3)
• Lessons Learned: Fielding (1/3): Non-Determinism, Maintenance Costs, Solution Approaches (Monkey / Sniff / Load / Harness)
• Wrap-up & Questions (1/3)
Analysis: Maintenance Cost
• High defect rate in game code
  – Code Coupling: “side effects”
  – Churn Rate: frequent changes
• Critical Path: fatal dependencies
• High debugging cost
  – Non-deterministic, distributed logic
Turnaround Time
Tests were too far removed from the introduction of defects.
[Chart: a bug introduced during development is checked in, built, then caught days later by smoke and regression tests; the cost of detection grows with the time to fix]
Critical Path Defects Were Very Costly
[Chart: the same timeline, with a critical-path defect’s impact on other developers growing over the days between checkin and detection]
Solution: Sniff Test
Pre-Checkin Regression: don’t let broken code into Mainline.
[Diagram: candidate code must pass the Sniff test (pass / fail, diagnostics) before checkin; only working code reaches Mainline, where smoke and regression tests still run]
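A sniff test gate reduces to: run a fast build-and-test sequence, and refuse the checkin on the first failure. A small Python sketch of such a gate; the step labels and commands are hypothetical (TSO's real build and test commands are not public), with `sys.executable` stand-ins so the demo runs anywhere:

```python
# Sketch of a pre-checkin "sniff test" gate: run each build/test step and
# refuse the checkin on the first failure. Step commands are stand-ins.
import subprocess
import sys

def sniff_test(steps):
    """Run each (label, command) step; stop and fail on the first error."""
    for label, cmd in steps:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"SNIFF FAIL [{label}]: checkin rejected")
            return False
    print("SNIFF PASS: code may go to Mainline")
    return True

ok = sniff_test([
    ("build",   [sys.executable, "-c", "pass"]),      # stand-in for a build
    ("regress", [sys.executable, "-c", "exit(0)"]),   # stand-in for fast tests
])
```

Wired into the checkin tool itself, the gate makes "broken code in Mainline" an opt-out rather than a daily event.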
  Solution: Hourly Diagnostics

• SniffTest Stability Checker
  – Emulates a developer
  – Every hour, sync / build / test
• Critical Path monkeys ran non-stop
  – Constant “baseline”
• Traffic Generation
  – Keep the pipes full & servers aging
  – Keep the DB growing
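The stability checker's loop is simple to express: sync, build, and test on a fixed cadence, keeping a running baseline of results. A Python sketch, with the three steps as hypothetical callables and a short demo run so it executes instantly:

```python
# Sketch of the hourly stability checker: emulate a developer by syncing,
# building, and testing on a fixed cadence. The step functions are
# hypothetical stand-ins for real source-control and build commands.
import time

def stability_checker(sync, build, test, cycles, period_s=3600.0):
    """Run sync/build/test `cycles` times, sleeping `period_s` between runs."""
    results = []
    for i in range(cycles):
        ok = sync() and build() and test()   # short-circuit on first failure
        results.append(ok)
        if i + 1 < cycles:
            time.sleep(period_s)             # one hour between checks in production
    return results

# Demo: one cycle with instant fake steps, so no real sleeping happens.
results = stability_checker(lambda: True, lambda: True, lambda: True, cycles=1)
print(results)   # [True]
```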
Analysis: CONSTANT SHOUTING IS REALLY IRRITATING
• Bugs spawned many, many emails
• Solution: Report Managers
  – Aggregates / correlates across tests
  – Filters known defects
  – Translates common failure reports to their root causes
• Solution: Data Managers
  – Information Overload: automated workflow tools are mandatory
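The report manager's core job, collapsing a flood of repeated failures into one line per root cause while filtering known defects, is a few lines of aggregation. A Python sketch; the failure strings and known-defect list are invented examples:

```python
# Sketch of a report manager: aggregate raw failure reports by root cause
# and filter already-filed defects, so one defect yields one summary line
# instead of a flood of emails. All failure strings here are examples.
from collections import Counter

KNOWN_DEFECTS = {"login timeout"}   # already-filed bugs to filter out

def summarize(failures):
    """Collapse raw failure causes into one line each, most frequent first."""
    counts = Counter(f for f in failures if f not in KNOWN_DEFECTS)
    return [f"{cause}: {n} occurrence(s)" for cause, n in counts.most_common()]

raw = ["db deadlock", "login timeout", "db deadlock",
       "login timeout", "chair missing"]
for line in summarize(raw):
    print(line)
```

Five raw failures become two lines, and the known "login timeout" never reaches anyone's inbox.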
ToolKit Usability
• Workflow automation
• Information management
• Developer / Tester “push button” ease of use
• XP flavour: increasingly easy to run tests
  – Must be easier to run tests than to avoid running them
  – Must solve problems “on the ground now”
Sample Testing Harness Views
[Screenshots of the testing harness UI]
Load Testing: Goals
• Expose issues that only occur at scale
• Establish hardware requirements
• Establish that response times remain playable at scale
• Emulate user behaviour
  – Use server-side metrics to tune test scripts against observed Beta behaviour
• Run full-scale load tests daily
Load Testing: Data Flow
[Diagram: the Load Testing Team watches resource metrics, client metrics, and debugging data. A Load Control Rig drives banks of test clients, many per test driver CPU; their game traffic hits the server cluster, which is watched by system monitors and internal probes]
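The control rig's bookkeeping, fanning a target client count out across driver CPUs and rolling per-driver metrics back up, can be sketched in a few functions. The client counts and metric fields below are illustrative; only the 4,000-client scale comes from the talk:

```python
# Sketch of load-control-rig bookkeeping: split a target client count across
# test driver CPUs, then roll per-driver response-time samples up into one
# rig-level summary. Field names and sample values are illustrative.

def plan_load(total_clients, drivers):
    """Split total_clients as evenly as possible across driver CPUs."""
    base, extra = divmod(total_clients, drivers)
    return [base + (1 if i < extra else 0) for i in range(drivers)]

def aggregate_metrics(per_driver):
    """Combine per-driver response-time samples into one summary dict."""
    samples = [t for driver in per_driver for t in driver]
    return {"clients": len(samples),
            "avg_ms": sum(samples) / len(samples),
            "worst_ms": max(samples)}

plan = plan_load(4000, 3)           # 4,000 clients over 3 driver CPUs
print(plan)                         # [1334, 1333, 1333]
summary = aggregate_metrics([[120, 80], [200], [90, 110]])
print(summary["worst_ms"])          # 200
```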
Load Testing: Lessons Learned
• Very successful
  – “Scale & Break”: up to 4,000 clients
• Some conflicting requirements with Regression
  – Continue on fail
  – Transaction tracking
  – NullView client a little “chunky”
           Current Work
• QA test suite automation
• Workflow tools
• Integrating testing into the new
  features design/development
  process
• Planned work
  – Extend Esper Toolkit for general use
  – Port to other Maxis projects
Talk Outline: Automated Testing (60 minutes)
• Design & Initial Implementation (1/3)
• Lessons Learned: Fielding (1/3)
• Wrap-up & Questions (1/3): Biggest Wins / Losses, Reuse, Tabula Rasa: MMP & SSP
Biggest Wins
• Presentation Layer Abstraction
  – NullView client
  – Scripted play sessions: powerful for regression & load
• Pre-Checkin Snifftest
• Load Testing
• Continual Usability Enhancements
• Team
  – Upper Management Commitment
  – Focused Group, Senior Developers
Biggest Issues
• Order Of Testing
  – MTBF / QA Test Suites should have come last
  – Not relevant early on, when the game was still too unstable
  – Find / Fix Lag: too distant from Development
• Changing TSO’s Development Process
  – Tool adoption was slow unless mandated
• Noise
  – Constant Flood Of Test Results
  – Number of Game Defects, Testing Defects
  – Non-Determinism / False Positives
Tabula Rasa
How Would I Start The Next Project?

Tabula Rasa
PreCheckin Sniff Test
There’s just no reason to let code break.
Tabula Rasa
PreCheckin SniffTest: Keep Mainline working
Hourly Monkey Tests
Useful baseline & keeps servers aging.
Tabula Rasa
PreCheckin SniffTest: Keep Mainline working
Hourly Stability Checkers: Baseline for Developers
Dedicated Tools Group
Continual usability enhancements adapted tools to meet “on the ground” conditions.
Tabula Rasa
PreCheckin SniffTest: Keep Mainline working
Hourly Stability Checkers: Baseline for Developers
Dedicated Tools Group: Easy to Use == Used
Executive Level Support
Mandates were required to shift how entire teams operated.
Tabula Rasa
PreCheckin SniffTest: Keep Mainline working
Hourly Stability Checkers: Baseline for Developers
Dedicated Tools Group: Easy to Use == Used
Executive Support: Radical Shifts in Process
Load Test: Early & Often
Tabula Rasa
PreCheckin SniffTest: Keep Mainline working
Hourly Stability Checkers: Baseline for Developers
Dedicated Tools Group: Easy to Use == Used
Executive Support: Radical Shifts in Process
Load Test, Early & Often: Break it before Live
Distribute Test Development & Ownership Across the Full Team
Next Project: Basic Infrastructure
[Diagram: a control harness for clients & components and a regression engine wrap a reference client; its reference feature doubles as self test and living documentation]
Building Features: NullView First
[Diagram: the same harness, with the NullView client as the reference client]
Build The Tests With The Code
[Diagram: control harness and regression engine drive the NullView client; each reference feature (e.g. Login) ships with its monkey test]
Nothing gets checked in without a working monkey test.
                  Conclusion
• Estimated Impact on MMP: High
  – Sniff Test: kept developers working
  – Load Test: ID’d critical failures pre-launch
  – Presentation Layer: scriptable play sessions
• Cost To Implement: Medium
  – Much Lower for SSP Games

    Repeatable, coordinated inputs @ scale
    and pre-checkin regression were very
    significant schedule accelerators.
Conclusion
Go For It…
Talk Outline: Automated Testing (60 minutes)
• Design & Initial Implementation (1/3)
• Lessons Learned: Fielding (1/3)
• Wrap-up & Questions (1/3)

				