Automated Testing

Automated Testing of Massively Multi-Player Games
Lessons Learned from The Sims Online
Larry Mellon Spring 2003
Context: What Is Automated Testing?
Classes Of Testing
Feature Regression
QA
System Stress
Load Random Input
Developer
Automation Components
Startup & Control
Repeatable, Synced Test Inputs
Collection & Analysis
System Under Test (x3)
What Was Not Automated?

Startup & Control
Repeatable, Synchronized Inputs
Results Analysis
Visual Effects
Lessons Learned: Automated Testing

Design & Initial Implementation
Architecture, Scripting Tests, Test Client, Initial Results
1/3
Fielding: Analysis & Adaptations
1/3
Wrap-up & Questions

What worked best, what didn't
Tabula Rasa: MMP / SPG
1/3
Time (60 Minutes)
Design Constraints
Load
Automation
(Repeatable, Synchronized Input) (Data Management)
Regression Churn Rate
Strong Abstraction
Single, Data Driven Test Client

Regression
Reusable Scripts & Data Single API
Load
Test Client
Data Driven Test Client

Testing feature correctness Testing system performance
Regression
Reusable Scripts & Data Single API
Load
Test Client
Single API Key Game States
Configurable Logs & Metrics
Pass/Fail Responsiveness
Problem: Testing Accuracy

Load & Regression: inputs must be
Accurate
Repeatable
Churn rate: logic/data in constant motion

How to keep testing client accurate?
Solution: game client becomes test client

Exact mimicry
Lower maintenance costs
Test Client == Game Client

Test Client Game Client
Test Control
Game GUI
State
State
Commands
Presentation Layer
Client-Side Game Logic
Game Client: How Much To Keep?

Game Client
View
Presentation Layer
Logic
What Level To Test At?

Mouse Clicks
Game Client View

Presentation Layer
Logic
Regression: Too Brittle (pixel shift)
Load: Too Bulky
What Level To Test At?

Game Client View
Internal Events
Presentation Layer
Logic
Regression: Too Brittle
(Churn Rate vs Logic & Data)
Gameplay: Semantic Abstractions

Basic gameplay changes less frequently than UI or protocol implementations.
View
Logic
NullView Client
Presentation Layer
Buy Lot ~ Enter Lot
Buy Object ~ Use Object
Scriptable User Play Sessions

SimScript
Collection: Presentation Layer primitives
Synchronization: wait_until, remote_command
State probes: arbitrary game state
Avatar's body skill, lamp on/off, ...
Test Scripts: Specific / ordered inputs

Single user play session
Multiple user play session
Scriptable User Play Sessions

Scriptable play sessions: big win
Load: tunable based on actual play
Regression: constantly repeat hundreds of play sessions, validating correctness
Gameplay semantics: very stable

UI / protocols shifted constantly
Game play remained (about) the same
SimScript: Abstract User Actions

include_script  setup_for_test.txt
enter_lot       $alpha_chimp
wait_until      game_state inlot
chat            I'm an Alpha Chimp, in a Lot.
log_message     Testing object purchase.
log_objects
buy_object      chair 10 10
log_objects
SimScript: Control & Sync

# Have a remote client use the chair
remote_cmd      $monkey_bot use_object chair sit

set_data        avatar reading_skill 80
set_data        book unlock
use_object      book read
wait_until      avatar reading_skill 100
set_recording   on
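The scripts above drive the game through a small set of semantic commands. A minimal sketch of how such a script could be interpreted follows, assuming one command per line and a hypothetical FakeGame object standing in for the Presentation Layer. The class, its methods and the variable table are illustrative assumptions, not the SimScript implementation.

```python
class FakeGame:
    """Hypothetical stand-in for the Presentation Layer (not the TSO API)."""

    def __init__(self):
        self.state = {"game_state": "login"}
        self.log = []

    def tick(self):
        """Advance the game one step; a no-op in this fake."""

    def enter_lot(self, lot):
        self.state["game_state"] = "inlot"
        self.log.append(f"entered {lot}")

    def buy_object(self, name, x, y):
        self.log.append(f"bought {name} at ({x}, {y})")


def run_script(text, game, variables, max_polls=100):
    """Execute one command per line; lines starting with '#' are comments."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        cmd, *args = line.split()
        args = [variables.get(a, a) for a in args]  # expand $variables
        if cmd == "enter_lot":
            game.enter_lot(args[0])
        elif cmd == "buy_object":
            game.buy_object(args[0], int(args[1]), int(args[2]))
        elif cmd == "wait_until":
            key, wanted = args
            for _ in range(max_polls):  # poll until the state probe matches
                if game.state.get(key) == wanted:
                    break
                game.tick()
            else:
                raise TimeoutError(f"wait_until {key} {wanted}")
        elif cmd == "log_message":
            game.log.append(" ".join(args))
        else:
            raise ValueError(f"unknown command: {cmd}")


SCRIPT = """
# Buy a chair once the avatar is in the lot
enter_lot $alpha_chimp
wait_until game_state inlot
log_message Testing object purchase.
buy_object chair 10 10
"""

game = FakeGame()
run_script(SCRIPT, game, {"$alpha_chimp": "lot_42"})
print(game.log)
```

Because the script names gameplay actions rather than clicks or packets, the same text could keep working while the UI and protocol underneath change.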
Client Implementation
Composable Client
Event Generators
- Scripts
- Cheat Console
- GUI
Presentation Layer
Game Logic
Composable Client
Event Generators
- Scripts
- Console
- GUI
Viewing Systems
- Console
- Lurker
- GUI
Presentation Layer
Game Logic
Any / all components may be loaded per instance
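One way to picture the composable client is as a core that accepts any mix of event generators and viewing systems at construction time. The sketch below is an assumption-laden illustration: the class names (NullView, ConsoleView, ScriptGenerator, Client) and their methods are hypothetical, and the "game logic" is reduced to a list of handled events.

```python
class NullView:
    """Viewing system that renders nothing, so the client can run headless."""

    def show(self, event):
        pass


class ConsoleView:
    """Viewing system that records a text line per event."""

    def __init__(self):
        self.lines = []

    def show(self, event):
        self.lines.append(f"event: {event}")


class ScriptGenerator:
    """Event generator that replays a fixed list of commands."""

    def __init__(self, commands):
        self.commands = list(commands)

    def events(self):
        yield from self.commands


class Client:
    """Core client: any / all generators and views may be attached."""

    def __init__(self, generators, views):
        self.generators = generators
        self.views = views
        self.handled = []  # stands in for the game logic

    def run(self):
        for generator in self.generators:
            for event in generator.events():
                self.handled.append(event)  # presentation layer -> logic
                for view in self.views:
                    view.show(event)


# A load-test instance: scripted input, no rendering.
headless = Client([ScriptGenerator(["enter_lot", "use_object"])], [NullView()])
headless.run()
print(headless.handled)
```

Swapping NullView for ConsoleView (or attaching both) changes what is displayed without touching the logic path, which is the property the slide describes.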
Lesson: View & Logic Entangled

Game Client View Logic
Few Clean Separation Points

Game Client View
Presentation Layer
Logic
Solution: Refactored for Isolation

Game Client View
Presentation Layer
Logic
Lesson: NullView Debugging

Without (legacy) view system attached, tracing was difficult.
?
Presentation Layer Logic
Solution: Embedded Diagnostics

Diagnostics
Timeout Handlers
Presentation Layer Logic
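A timeout handler with embedded diagnostics might look like the sketch below: a wait that gives up after a deadline and attaches a snapshot of game state to the error, so a headless failure can still be traced. The function name, the WaitTimeout exception and the snapshot callback are assumptions made for illustration.

```python
import time


class WaitTimeout(Exception):
    """Raised when a wait expires; carries a diagnostic snapshot."""

    def __init__(self, message, snapshot):
        super().__init__(message)
        self.snapshot = snapshot


def wait_until(probe, expected, timeout_s, snapshot, poll_s=0.005):
    """Poll probe() until it returns expected, or raise WaitTimeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        seen = probe()
        if seen == expected:
            return seen
        if time.monotonic() >= deadline:
            # Timeout handler: capture state at the moment of failure.
            raise WaitTimeout(
                f"expected {expected!r}, last saw {seen!r}", snapshot()
            )
        time.sleep(poll_s)


state = {"game_state": "loading", "lot": "lot_42"}
try:
    wait_until(lambda: state["game_state"], "inlot", 0.02, lambda: dict(state))
except WaitTimeout as err:
    print(err, err.snapshot)
```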
Talk Outline: Automated Testing

Architecture & Design
Test Client
Initial Results
1/3
Lessons Learned: Fielding
1/3
Wrap-up & Questions
1/3
Time (60 Minutes)
Mean Time Between Failure

Random Event, Log & Execute
Record client lifetime / RAM
Worked: just not relevant in early stages of development
Most failures / leaks found were not high-priority at that time, when weighed against server crashes
Monkey Tests
Constant repetition of simple, isolated actions against servers
Very useful:
Direct observation of servers while under constant, simple input
Server processes aged all day
Examples:
Login / Logout
Enter House / Leave House
QA Test Suite Regression

High false positive rate & high maintenance
New bugs / old bugs
Shifting game design
Unknown failures
Not helping in day to day work.

Fielding: Analysis & Adaptations
Non-Determinism
Maintenance Overhead
Solutions & Results
Monkey / Sniff / Load / Harness

Time (60 Minutes)
Wrap-up & Questions
Analysis: Testing Isolated Features
Analysis: Critical Path

Test Case: Can an Avatar Sit in a Chair?
use_object()
buy_object()
enter_house()
buy_house()
create_avatar()
login()
Failures on the Critical Path block access to much of the game.
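The blocking effect can be sketched as a chain of dependent steps, assuming each call depends on the one listed after it on the slide (so login() runs first and use_object() last). The runner and the fake actions below are hypothetical; they only show how one early failure hides every later feature from testing.

```python
# Assumed execution order: the reverse of the stack shown on the slide.
CRITICAL_PATH = [
    "login", "create_avatar", "buy_house",
    "enter_house", "buy_object", "use_object",
]


def run_critical_path(steps, actions):
    """Run steps in order; everything after the first failure is blocked."""
    results = {}
    blocked = False
    for name in steps:
        if blocked:
            results[name] = "blocked"
            continue
        ok = actions[name]()
        results[name] = "pass" if ok else "fail"
        blocked = not ok
    return results


# Fake actions: everything works except buy_house.
actions = {name: (lambda: True) for name in CRITICAL_PATH}
actions["buy_house"] = lambda: False

for step, outcome in run_critical_path(CRITICAL_PATH, actions).items():
    print(f"{step}: {outcome}")
```

With buy_house failing, the chair test never runs at all, even though the chair code itself may be fine.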
Solution: Monkey Tests

Primitives placed in Monkey Tests
Isolate as much as possible, repeat 400x
Report only aggregate results
Create Avatar: 93% pass (375 of 400)
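A monkey test of this kind can be sketched as a loop that repeats one primitive and reports only the aggregate. The function names and the fake create_avatar action are assumptions for illustration; the percentage is truncated to a whole number, which is one way to arrive at the 93% shown for 375 of 400.

```python
def run_monkey(action, repeats=400):
    """Repeat one isolated action; return (passed, total)."""
    passed = 0
    for attempt in range(repeats):
        try:
            if action(attempt):
                passed += 1
        except Exception:
            pass  # a crash in the action counts as a failure
    return passed, repeats


def summarize(name, passed, total):
    """Aggregate-only report line; percentage truncated to an integer."""
    percent = 100 * passed // total
    return f"{name}: {percent}% pass ({passed} of {total})"


def fake_create_avatar(attempt):
    # Hypothetical flaky primitive: fails on every 16th attempt.
    return attempt % 16 != 0


passed, total = run_monkey(fake_create_avatar)
print(summarize("Create Avatar", passed, total))
```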
Poor Man's Unit Test

Feature based, not class based
Limited isolation
Easy failure analysis / reporting

1/3
Lessons Learned: Fielding

Non-Determinism
Maintenance Costs
Solution Approaches
Monkey / Sniff / Load / Harness
1/3
Wrap-up & Questions
1/3
Time (60 Minutes)
Analysis: Maintenance Cost

High defect rate in game code
Code Coupling: side effects
Churn Rate: frequent changes
Critical Path: fatal dependencies
High debugging cost
Non-deterministic, distributed logic
Turnaround Time
Tests were too far removed from introduction of defects.
[Figure: timeline with stages Bug Introduced, Development, Checkin, Build, Smoke, Regression (scale: days); axes: Time to Fix, Cost of Detection]
Critical Path Defects Were Very Costly

[Figure: timeline with stages Bug Introduced, Development, Checkin, Build, Smoke, Regression (scale: days), annotated with Impact on Others; axes: Time to Fix, Cost of Detection]
Solution: Sniff Test

Pre-Checkin Regression: don't let broken code into Mainline.
Smoke Checkin
Working Code
Regression
Candidate Code
Sniff
Pass / Fail, Diagnostics
Development
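The pre-checkin gate can be sketched as a function that runs a set of checks against candidate code and only admits it to Mainline on a pass, returning diagnostics otherwise. Everything named here (sniff_test, try_checkin, check_login, the dict used as a "build") is a hypothetical illustration, not the tool used on the project.

```python
def sniff_test(candidate, checks):
    """Run each (name, check) pair; return (passed, diagnostics)."""
    diagnostics = []
    for name, check in checks:
        try:
            check(candidate)
        except Exception as err:  # any exception counts as a failed check
            diagnostics.append(f"{name}: {err}")
    return not diagnostics, diagnostics


def try_checkin(candidate, mainline, checks):
    """Append candidate to mainline only when the sniff test passes."""
    passed, diagnostics = sniff_test(candidate, checks)
    if passed:
        mainline.append(candidate)
    return passed, diagnostics


def check_login(build):
    # Hypothetical check: the "build" is a dict of feature results.
    if not build.get("login_works"):
        raise RuntimeError("login failed")


mainline = []
good = {"name": "change-1", "login_works": True}
bad = {"name": "change-2", "login_works": False}
print(try_checkin(good, mainline, [("login", check_login)]))
print(try_checkin(bad, mainline, [("login", check_login)]))
print([build["name"] for build in mainline])
```

The broken candidate never reaches the shared line; its author gets a pass / fail result plus diagnostics while still in development.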
Solution: Hourly Diagnostics

SniffTest Stability Checker
Emulates a developer
Every hour, sync / build / test
Critical Path monkeys ran non-stop
Constant baseline
Traffic Generation
Keep the pipes full & servers aging
Keep the DB growing
Analysis: CONSTANT SHOUTING

IS REALLY IRRITATING
Bugs spawned many, many emails
Solution: Report Managers

Aggregates / correlates across tests
Filters known defects
Translates common failure reports to their root causes
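The three Report Manager duties can be sketched in a few lines: drop failures tied to known defects, map common symptoms to a root cause, and count what remains across tests. The defect IDs, symptom strings and tuple layout are made up for this example.

```python
from collections import Counter

# Hypothetical data: known defect IDs and a symptom -> root cause table.
KNOWN_DEFECTS = {"BUG-101"}
ROOT_CAUSES = {
    "login timeout": "auth server down",
    "connection refused": "auth server down",
}


def build_report(failures, known_defects, root_causes):
    """failures: iterable of (test_name, symptom, defect_id or None)."""
    counts = Counter()
    for _test_name, symptom, defect_id in failures:
        if defect_id in known_defects:
            continue  # filter known defects
        cause = root_causes.get(symptom, symptom)  # translate to root cause
        counts[cause] += 1  # aggregate across tests
    return dict(counts)


failures = [
    ("login_monkey", "login timeout", None),
    ("enter_house_monkey", "connection refused", None),
    ("sit_in_chair", "chair missing", "BUG-101"),
    ("buy_lot", "lot not found", None),
]
print(build_report(failures, KNOWN_DEFECTS, ROOT_CAUSES))
```

Four raw failures collapse into two report lines, which is the kind of noise reduction the slide is after.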
Solution: Data Managers

Information Overload: Automated workflow tools mandatory
ToolKit Usability
Workflow automation
Information management
Developer / Tester push-button ease of use
XP flavour: increasingly easy to run tests
Must be easier to run than to avoid running
Must solve problems on the ground now
Sample Testing Harness Views
Load Testing: Goals

Expose issues that only occur at scale
Establish hardware requirements
Establish response is playable @ scale
Emulate user behaviour
Use server-side metrics to tune test scripts against observed Beta behaviour
Run full scale load tests daily
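Tuning scripts against observed behaviour could be approximated by sampling actions in proportion to measured frequencies, with a fixed seed so each simulated session stays repeatable. The action mix below is invented for illustration and is not Beta data; build_session is a hypothetical helper.

```python
import random

# Made-up action frequencies standing in for server-side Beta metrics.
OBSERVED_MIX = {
    "chat": 0.50,
    "use_object": 0.30,
    "buy_object": 0.15,
    "enter_lot": 0.05,
}


def build_session(mix, length, seed):
    """Return a repeatable list of actions weighted by the observed mix."""
    rng = random.Random(seed)  # per-session seed keeps inputs repeatable
    actions = list(mix)
    weights = [mix[action] for action in actions]
    return rng.choices(actions, weights=weights, k=length)


# One scripted session per simulated client; the seed identifies the client.
sessions = [build_session(OBSERVED_MIX, 20, seed) for seed in range(3)]
for session in sessions:
    print(session[:5])
```

Re-running with the same seeds reproduces the same sessions, so a failure seen under load can be replayed.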
Load Testing: Data Flow

[Figure: data-flow diagram. Labels: Load Testing Team; Resource Metrics; Client Metrics; Debugging Data; Load Control Rig; three Test Driver CPUs, each hosting three Test Clients; Game Traffic; System Monitors; Server Cluster; Internal Probes]
Load Testing: Lessons Learned

Very successful
Scale & Break: up to 4,000 clients
Some conflicting requirements w/ Regression

Continue on fail
Transaction tracking
NullView client a little chunky
Current Work
QA test suite automation
Workflow tools
Integrating testing into the new features design/development process
Planned Work
Extend Esper Toolkit for general use
Port to other Maxis projects

1/3
1/3
Wrap-up & Questions

Biggest Wins / Losses
Reuse
Tabula Rasa: MMP & SSP
1/3
Time (60 Minutes)
Biggest Wins
Presentation Layer Abstraction
NullView client
Scripted play sessions: powerful for regression & load
Pre-Checkin SniffTest
Load Testing
Continual Usability Enhancements
Team
Upper Management Commitment
Focused Group, Senior Developers
Biggest Issues
Order Of Testing
MTBF / QA Test Suites should have come last
Not relevant when early & game too unstable
Find / Fix Lag: too distant from Development
Changing TSO's Development Process
Tool adoption was slow, unless mandated
Noise
Constant Flood Of Test Results
Number of Game Defects, Testing Defects
Non-Determinism / False Positives
Tabula Rasa
How Would I Start The Next Project?
Tabula Rasa
PreCheckin Sniff Test
There's just no reason to let code break.
Tabula Rasa
PreCheckin SniffTest         Keep Mainline working
Hourly Monkey Tests

Useful baseline & keeps servers aging.
Tabula Rasa
PreCheckin SniffTest         Keep Mainline working
Hourly Stability Checkers    Baseline for Developers
Dedicated Tools Group

Continual usability enhancements adapted tools to meet on-the-ground conditions.
Tabula Rasa
PreCheckin SniffTest         Keep Mainline working
Hourly Stability Checkers    Baseline for Developers
Dedicated Tools Group        Easy to Use == Used
Executive Level Support

Mandates required to shift how entire teams operated.
Tabula Rasa
PreCheckin SniffTest         Keep Mainline working
Hourly Stability Checkers    Baseline for Developers
Dedicated Tools Group        Easy to Use == Used
Executive Support            Radical Shifts in Process
Load Test: Early & Often
Tabula Rasa
PreCheckin SniffTest         Keep Mainline working
Hourly Stability Checkers    Baseline for Developers
Dedicated Tools Group        Easy to Use == Used
Executive Support            Radical Shifts in Process
Load Test: Early & Often     Break it before Live
Distribute Test Development & Ownership Across Full Team
Next Project: Basic Infrastructure

Control Harness For Clients & Components
Regression Engine
Reference Client
Reference Feature
Self Test
Living Doc
Building Features: NullView First

Reference Client
Control Harness
Reference Feature Self Test Living Doc
NullView Client
Regression Engine
Build The Tests With The Code

Control Harness
Regression Engine
Reference Client
Reference Feature
Self Test
NullView Client
Login Monkey Test
Nothing Gets Checked In Without A Working Monkey Test.
Conclusion
Estimated Impact on MMP: High
Sniff Test: kept developers working
Load Test: ID'd critical failures pre-launch
Presentation Layer: scriptable play sessions
Cost To Implement: Medium

Much Lower for SSP Games
Repeatable, coordinated inputs @ scale and pre-checkin regression were very significant schedule accelerators.
Conclusion
Go For It

1/3
1/3
Wrap-up
Questions
1/3
Time (60 Minutes)
