CUDA
- Architecture and programming model
- Strengths and limitations of the GPU
- Example applications
OpenCL
- Architecture and programming model
- Comparison with CUDA
- Example applications
References
OpenCL Specification
- http://www.khronos.org/registry/cl/specs/opencl-1.0.29.pdf
http://www.nvidia.com/object/cuda_get.html
CUDA Driver
- Software to communicate with the GPU
CUDA Toolkit
- Compiler, libraries, emulator, development tools
CUDA SDK
- Example programs
Hardware Architecture
High-latency, high-bandwidth
No global synchronization*
Thread Structure
2 main parts:
- Host programs execute on the host
- Kernels execute on one or more OpenCL devices
Synchronization
Data parallel
- One-to-one mapping between work-items and elements in a memory object
- Work-groups can be defined explicitly (like CUDA) or implicitly (specify the
number of work-items and OpenCL creates the work-groups)
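A minimal sketch of the data-parallel mapping described above, emulated on the CPU in plain Python rather than through the OpenCL API. The helper `run_ndrange` and the kernel `vector_add` are hypothetical names introduced for illustration; the rule used to pick an implicit work-group size (largest divisor of the global size up to 64) is an assumption, since a real OpenCL implementation chooses its own size.

```python
def run_ndrange(kernel, global_size, local_size=None):
    """Emulate a 1D index space on the CPU (hypothetical helper, not an OpenCL call)."""
    if local_size is None:
        # Implicit case: only the number of work-items is given, so the
        # "runtime" picks a work-group size. Assumed rule: the largest
        # divisor of global_size that is at most 64.
        local_size = next(
            s for s in range(min(64, global_size), 0, -1) if global_size % s == 0
        )
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")

    num_groups = global_size // local_size
    for group_id in range(num_groups):
        for local_id in range(local_size):
            # Same index arithmetic a work-item would see on a device.
            global_id = group_id * local_size + local_id
            kernel(global_id, group_id, local_id)
    return local_size


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
out = [0.0] * len(a)


def vector_add(global_id, group_id, local_id):
    # One work-item handles exactly one element of the memory object.
    out[global_id] = a[global_id] + b[global_id]


explicit_size = run_ndrange(vector_add, len(a), local_size=2)  # explicit work-groups
implicit_size = run_ndrange(vector_add, len(a))                # implicit work-groups
```

Both calls produce the same result; only the grouping of work-items differs, which matters once work-items in a group need to share data or synchronize.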
Task parallel
- Kernel is executed independent of an index space
- Other ways to express parallelism: enqueueing multiple tasks, using device-specific
vector types, etc.
Synchronization
- Possible between items in a work-group
- Possible between commands enqueued to command queues in the same context
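The two synchronization levels above can be sketched with a sum reduction, again emulated on the CPU in Python. The names `work_group_sum` and `sum_all` are hypothetical. Each pass of the `while` loop stands for the code between two work-group barriers (on a device this would be `barrier(CLK_LOCAL_MEM_FENCE)` in OpenCL C), and combining the partial sums afterwards stands for a later command that runs only after the kernel has finished. The power-of-two group size is an assumption made to keep the sketch short.

```python
def work_group_sum(values):
    """Tree reduction over one emulated work-group (hypothetical helper)."""
    local_size = len(values)
    if local_size == 0 or local_size & (local_size - 1):
        raise ValueError("work-group size must be a power of two")

    scratch = list(values)  # stands in for local (per-work-group) memory
    stride = local_size // 2
    while stride > 0:
        # Work-items with local_id < stride are active in this phase.
        for local_id in range(stride):
            scratch[local_id] += scratch[local_id + stride]
        # End of phase: on a device, all work-items in the group would
        # wait at a barrier here before the next phase reads scratch.
        stride //= 2
    return scratch[0]


def sum_all(data, local_size):
    """Reduce each work-group, then combine the per-group results."""
    if len(data) % local_size != 0:
        raise ValueError("data length must be a multiple of local_size")
    # There is no barrier across work-groups, so each group only
    # produces a partial sum.
    partial = [
        work_group_sum(data[start:start + local_size])
        for start in range(0, len(data), local_size)
    ]
    # Combining the partial sums corresponds to a later command that is
    # ordered after the kernel in the command queue.
    return partial, sum(partial)


partial_sums, total = sum_all(list(range(1, 17)), local_size=4)
```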
OpenCL Program Flow
Tomosynthesis mammography
3D Cardiac CT
Vascular segmentation
Hyperspectral imaging
Image manipulation
(convolution, filtering)
Phase unwrapping
Ray tracing
Compiler optimizations
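Of the applications listed above, image convolution shows most directly why this kind of work maps well to a GPU: every output pixel depends only on a small neighborhood of the input, so one work-item per pixel needs no synchronization. A minimal CPU sketch in Python follows. The function names are hypothetical, clamping coordinates at the image border is an assumed border policy, and the filter is applied without flipping, which gives the same result as a true convolution for the symmetric filter used here.

```python
def filter_pixel(image, weights, x, y):
    """Compute one output pixel; this is the per-work-item body."""
    height, width = len(image), len(image[0])
    radius = len(weights) // 2
    acc = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Assumed border policy: clamp coordinates to the image edge.
            sy = min(max(y + dy, 0), height - 1)
            sx = min(max(x + dx, 0), width - 1)
            acc += image[sy][sx] * weights[dy + radius][dx + radius]
    return acc


def filter_image(image, weights):
    """Apply the filter to every pixel.

    The two loops stand in for a 2D index space: each (x, y) pair is
    independent of all others and could run as its own work-item.
    """
    height, width = len(image), len(image[0])
    return [
        [filter_pixel(image, weights, x, y) for x in range(width)]
        for y in range(height)
    ]


# Unnormalized 3x3 box filter: each output is the sum of its neighborhood.
box = [[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]]

impulse = [[0, 0, 0],
           [0, 1, 0],
           [0, 0, 0]]
constant = [[2, 2, 2, 2],
            [2, 2, 2, 2]]

impulse_result = filter_image(impulse, box)
constant_result = filter_image(constant, box)
```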
Thank you!