Wednesday, 17 June 2009
Björn Knafla
… but will show you some methods and approaches that can help you exploit parallelism and push your AI further.
Outline
1. Parallel Hardware
2. Example AI
3. Parallelization Approaches
4. Conclusion
Figure: memory hierarchy of a multi-core CPU with one L1 cache per core (six in the final build), three L2 caches each shared by two cores, and main memory.
Sharing
Prefetching
SMT -> throughput
Maximize throughput: maximize locality vs. minimize sharing
Reactive behavior: steering, locomotion, acting, animation
Figure: one core updates agents 1, 2, 3, 4, 5, … one after another, and the update of agent 1 includes computing Freestep Path 1. The next build shows two cores (1 and 2).
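To make the single-core baseline concrete, here is a minimal sketch of a sequential frame update, assuming a made-up `Agent` record and a hypothetical `find_path` stand-in for a Freestep path query (it just returns a straight line). The point it illustrates: while the path query runs inline, this core cannot update any other agent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Agent:
    position: Point
    goal: Optional[Point] = None
    path: Optional[List[Point]] = None
    trace: List[str] = field(default_factory=list)  # which stages ran, for illustration


def find_path(start: Point, goal: Point) -> List[Point]:
    # Hypothetical stand-in for a Freestep path query: a straight line.
    return [start, goal]


def update_agent(agent: Agent) -> None:
    if agent.goal is not None and agent.path is None:
        # The path query runs inline, so this core can do nothing else meanwhile.
        agent.path = find_path(agent.position, agent.goal)
        agent.trace.append("path")
    for stage in ("steering", "locomotion", "acting", "animation"):
        agent.trace.append(stage)  # placeholder for the real reactive stages


def update_frame(agents: List[Agent]) -> None:
    # Single core: agent 1, then agent 2, then agent 3, ...
    for agent in agents:
        update_agent(agent)
```

Adding a second core does not help this loop by itself; the work first has to be cut into pieces that can run independently.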
Tasks
Creates the threads it manages once; the user submits tasks and never interacts directly with raw threads
Enables scaling and automatic load balancing
Central queue (thread pool) or central queue + one queue per thread (task pool)
Tasks can spawn (sub-)tasks into their thread's local queue
Work stealing: idle threads take tasks from other threads' queues
Task Pool
Figure: a task pool with Thread 1, Thread 2, …, Thread n and a queue of enqueued tasks. The agent updates (Agent 1 and Agent 2 shown) enqueue tasks 1 to 4; the threads dequeue them, so Thread 1 computes Freestep Path 1 and then Freestep Path 4, Thread 2 computes Freestep Path 2, and Thread n computes Freestep path 3.
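The task pool described above can be sketched as follows. This is an illustration under simplifying assumptions, not a production pool: the class name `TaskPool` and its methods are hypothetical, and a single coarse lock guards all queues where real task pools use lock-free deques. It does show the structure from the slides: threads created once, a central queue for tasks submitted from outside, a local deque per worker for sub-tasks, and work stealing by idle workers.

```python
import collections
import random
import threading


class TaskPool:
    """Sketch: workers are created once; outside submissions go to a central
    queue, submissions from inside a task go to that worker's local deque,
    and idle workers steal from the other workers' deques."""

    def __init__(self, num_threads=4):
        # One coarse lock guards all queues; real pools use lock-free deques.
        self._lock = threading.Lock()
        self._work_available = threading.Condition(self._lock)
        self._all_done = threading.Condition(self._lock)
        self._central = collections.deque()
        self._local = [collections.deque() for _ in range(num_threads)]
        self._pending = 0              # submitted but not yet finished
        self._shutdown = False
        self._tls = threading.local()  # remembers which worker we are on
        self._threads = [threading.Thread(target=self._worker, args=(i,), daemon=True)
                         for i in range(num_threads)]
        for t in self._threads:
            t.start()

    def submit(self, fn, *args):
        with self._lock:
            self._pending += 1
            index = getattr(self._tls, "index", None)
            if index is None:
                self._central.append((fn, args))      # from outside the pool
            else:
                self._local[index].append((fn, args))  # sub-task: local deque
            self._work_available.notify()

    def _take(self, index):
        # Caller holds the lock: own deque (LIFO), then central queue (FIFO),
        # then steal the oldest task from a random non-empty victim.
        if self._local[index]:
            return self._local[index].pop()
        if self._central:
            return self._central.popleft()
        victims = [i for i, q in enumerate(self._local) if i != index and q]
        if victims:
            return self._local[random.choice(victims)].popleft()
        return None

    def _worker(self, index):
        self._tls.index = index
        while True:
            with self._lock:
                task = self._take(index)
                while task is None and not self._shutdown:
                    self._work_available.wait()
                    task = self._take(index)
                if task is None:
                    return  # shut down and no work left
            fn, args = task
            try:
                fn(*args)  # tasks are assumed not to raise in this sketch
            finally:
                with self._lock:
                    self._pending -= 1
                    if self._pending == 0:
                        self._all_done.notify_all()

    def wait_all(self):
        with self._lock:
            while self._pending:
                self._all_done.wait()

    def shutdown(self):
        with self._lock:
            self._shutdown = True
            self._work_available.notify_all()
        for t in self._threads:
            t.join()
```

In a frame, one would submit one task per agent and let each agent task submit its path queries; those land in the running worker's local deque, where idle workers can steal them.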
Futures
Events / messages
Polling of a pathfinding system
etc.
Figure: the agent instance holds a future for its path; the Freestep path computed by a task is delivered through that future.
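A minimal sketch of the future-based hand-off, using Python's `concurrent.futures` in place of the talk's task pool. `compute_path` is a hypothetical stand-in for a Freestep path query. The agent requests its path once, never blocks on it, and polls `done()` in later updates until it can move the result out of the future.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_path(start, goal):
    # Hypothetical stand-in for a Freestep path query: a straight line.
    return [start, goal]


class Agent:
    def __init__(self, position, goal):
        self.position = position
        self.goal = goal
        self.path_future = None  # future for a requested, not yet consumed path
        self.path = None

    def update(self, pool):
        if self.path is None and self.path_future is None:
            # Request the path and keep going; never block on it this frame.
            self.path_future = pool.submit(compute_path, self.position, self.goal)
        elif self.path_future is not None and self.path_future.done():
            # The path is ready: move it out of the future into the agent.
            self.path = self.path_future.result()
            self.path_future = None
        # Steering, locomotion, ... would continue here with whatever path exists.
```

Events / messages or polling the pathfinding system would replace the `done()` check with a different delivery mechanism; the agent-side logic stays similar.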
Pros: scaling (with Task Pool); potentially better memory locality
Cons: determinism needs care; get syncing right
Figure: each time step runs one update per agent instance (agents 1 to 4). Parallelization potential! Updates 1 and 3 can run on one core while updates 2 and 4 run on another.
Shared data structures like spatial data, and accessing or using other game world entities:
-> unorganized access: bad for bandwidth and locality, and not deterministic
-> syncing needed -> convoying -> contention -> serialization; blocking is bad for task pools
… if the sequential solution was order dependent, the parallel one is non-deterministic and needs refactoring
Figure: consecutive frames, each running updates 1 to 4 in parallel.
Split instance data into public data (read-only) and private data (writable only by the instance owner)
Duplication of state
Public data represents the state from the last time step / frame
Annotation: add a random number generator instance to each agent's private data
Figure: during a frame, updates 1 to 4 read the public data from Frame-1 (the previous frame) and write their own private data; a collection stage follows the updates.
Private and public data work a bit like double buffering, and could also be handled with double buffering…
On to the next time step
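The public/private split can be sketched like this, assuming a one-dimensional position and an invented behavior ("move halfway toward the other agents, plus a little random jitter"). Each update reads only last frame's public snapshots and writes only its own private data; a collection stage then publishes the private data for the next frame. Together with one `random.Random` per agent, the result should not depend on how the updates are scheduled, which the test checks by comparing one worker against four.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicState:
    # Read-only snapshot of an agent, as of the previous frame.
    position: float


class Agent:
    def __init__(self, index, position, seed):
        self.index = index                    # position in the agents list
        self.public = PublicState(position)   # what others may read this frame
        self.private_position = position      # written only by this agent
        self.rng = random.Random(seed)        # per-agent RNG keeps results order independent

    def update(self, world_public):
        # Read other agents only through last frame's public snapshots.
        others = [s.position for i, s in enumerate(world_public) if i != self.index]
        center = sum(others) / len(others)
        jitter = self.rng.uniform(-0.1, 0.1)
        # Write only this agent's private data.
        self.private_position = (self.public.position
                                 + 0.5 * (center - self.public.position) + jitter)


def step(agents, pool):
    world_public = [a.public for a in agents]                 # Frame-1 snapshot
    list(pool.map(lambda a: a.update(world_public), agents))  # parallel updates
    for a in agents:                                          # collection stage
        a.public = PublicState(a.private_position)
```

No locks are needed during the updates: the implicit sync point is the collection stage between frames.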
Parallel Agents
Pros: scaling possible; implicit syncing
Cons: bad locality
Unsolved problem from the last approach: bad locality and bandwidth usage because of random / unordered access to data structures -> let's improve this
The approach builds upon the last approach
The approach is supported by Valve’s Source Engine and used by Insomniac, and by Guerrilla Games for Killzone 2
Figure: the collection stage and the instance update stage; the per-agent update is split into the aspects logic, sensing, and pathfinding.
Aspects
Group aspect processing and data, and handle each aspect with a specialized system
Figure: agent instances, entity instances, the sensor spatial data structure, and the pathfinding data are regrouped into a pathfinding system, a sensor system, and an AI logic system.
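A minimal sketch of regrouping per-agent objects into per-system data. All names are hypothetical, the "sensor" just counts nearby agents, and the "pathfinding" just returns a direction toward a target. For brevity the sketch regroups every frame; the idea on the slide is that each system owns its data permanently. Each system walks its own contiguous arrays in one pass, and the AI logic system consumes the other systems' outputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentInstance:
    # Object-per-agent layout: all aspects of an agent live together.
    position: float
    target: float


def regroup(agents: List[AgentInstance]):
    # Structure of arrays: each system gets its own contiguous data.
    positions = [a.position for a in agents]  # sensor system data
    targets = [a.target for a in agents]      # pathfinding system data
    return positions, targets


def sensor_system(positions: List[float], radius: float) -> List[int]:
    # One pass over all agents: how many others are within radius.
    return [sum(1 for j, q in enumerate(positions) if j != i and abs(q - p) <= radius)
            for i, p in enumerate(positions)]


def pathfinding_system(positions: List[float], targets: List[float]) -> List[int]:
    # Stand-in for path queries: direction (-1, 0, +1) toward each target.
    return [(t > p) - (t < p) for p, t in zip(positions, targets)]


def ai_logic_system(neighbor_counts: List[int], directions: List[int]) -> List[str]:
    # Consumes the other systems' outputs, produces one decision per agent.
    return ["wait" if n > 1 else ("move" if d != 0 else "idle")
            for n, d in zip(neighbor_counts, directions)]


def frame(agents: List[AgentInstance], radius: float = 1.0) -> List[str]:
    positions, targets = regroup(agents)
    counts = sensor_system(positions, radius)
    directions = pathfinding_system(positions, targets)
    return ai_logic_system(counts, directions)
```

Because every element of a system's loop is independent, each system could split its index range into tasks for a task pool.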
Innards of a system
Figure: the sensor system's data; the system collects data updates and sensing requests.
Exploit the system's internal data access pattern by allowing the injection of Insomniac-style “SPU” shaders that generate other data that might be useful.
Full control of the context for system and “SPU” shader -> no side effects
-> easy to exploit the parallelism offered by the system developer
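A sketch of a system's innards under stated assumptions: a hypothetical `SensorSystem` collects position updates and sensing requests during the frame, then applies and answers them in one batched pass over its own data. "Shaders" are modeled as plain callables that receive an explicit, frozen `SensorContext` and return extra data. This only loosely mirrors Insomniac's SPU shaders, and Python cannot enforce the "no side effects" rule; here it is a convention.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SensorContext:
    # Explicit, read-only context handed to each injected "shader".
    agent: int
    position: float
    visible: Tuple[int, ...]


@dataclass
class SensorSystem:
    radius: float
    positions: Dict[int, float] = field(default_factory=dict)  # system-owned data
    shaders: Dict[str, Callable[[SensorContext], Any]] = field(default_factory=dict)
    _updates: List[Tuple[int, float]] = field(default_factory=list, init=False)
    _requests: List[int] = field(default_factory=list, init=False)

    def update_position(self, entity: int, position: float) -> None:
        self._updates.append((entity, position))  # collected now, applied in run()

    def request_sensing(self, agent: int) -> None:
        self._requests.append(agent)

    def run(self) -> Dict[int, Tuple[Tuple[int, ...], Dict[str, Any]]]:
        # 1. Apply all collected data updates.
        for entity, position in self._updates:
            self.positions[entity] = position
        self._updates.clear()
        # 2. Answer all sensing requests in one pass over the system's own data.
        entities = sorted(self.positions.items())
        results = {}
        for agent in sorted(self._requests):
            p = self.positions[agent]
            visible = tuple(e for e, q in entities if e != agent and abs(q - p) <= self.radius)
            ctx = SensorContext(agent, p, visible)
            # 3. Injected shaders only see ctx and return extra data.
            extra = {name: shader(ctx) for name, shader in self.shaders.items()}
            results[agent] = (visible, extra)
        self._requests.clear()
        return results
```

Since a shader only sees its context, the system developer is free to process requests in any order or in parallel.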
Data Parallel Systems
Data exchange between systems -> walk through the system outputs and the inputs they need, from the last system to the first, and design the systems and their data accordingly (Insomniac’s advice).
Porting system by system is possible
Pros: lean data structures; locality; scaling; deterministic; implicit synchronization; explicit context; possibilities: “shaders”
Twitter: @bjoernknafla