
5/11/2018 Matching SM architectures (CUDA arch and CUDA gencode) for various NVIDIA cards - Blame Arnon

Posted by Arnon Shimoni
11/11/2016

Matching SM architectures (CUDA arch and CUDA gencode) for various NVIDIA cards

I’ve seen some confusion regarding NVIDIA’s nvcc sm flags and what they’re used for:
When compiling with NVCC, the arch flag (‘-arch’) specifies the name of the NVIDIA GPU architecture that the CUDA files will be compiled for.
Gencodes (‘-gencode’) allow for more PTX generations, and can be repeated many times for different architectures.

When should different ‘gencodes’ or ‘cuda arch’ be used?
When you compile CUDA code, you should always compile with only one ‘-arch’ flag that matches your most used GPU cards. This enables a faster runtime, because code generation occurs during compilation.
If you only mention ‘-gencode’, but omit the ‘-arch’ flag, the GPU code generation will be done at run time by the JIT compiler in the CUDA driver.

Sometimes, you would also like to enable some backwards compatibility. In those cases, you want to add some more ‘-gencode’ flags.
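
As a rough illustration of adding several ‘-gencode’ flags, here is a minimal Python sketch that builds the flag list from a set of SM versions. The function name `gencode_flags` and its `ptx_for_newest` parameter are made up for this example and are not part of nvcc or any NVIDIA tool. It assumes you want one machine-code target per architecture, plus PTX for the newest one so that later GPUs can still run the binary through the driver's JIT compiler.

```python
def gencode_flags(sm_versions, ptx_for_newest=True):
    """Build nvcc -gencode flags for SM versions such as [30, 50, 52].

    Hypothetical helper for illustration only; it just formats strings.
    """
    versions = sorted(set(sm_versions))
    if not versions:
        raise ValueError("at least one SM version is required")

    # One machine-code (cubin) target per requested architecture.
    flags = [f"-gencode=arch=compute_{v},code=sm_{v}" for v in versions]

    if ptx_for_newest:
        newest = versions[-1]
        # Also embed PTX for the newest target, so the driver can
        # JIT-compile it for GPUs released after the build.
        flags.append(f"-gencode=arch=compute_{newest},code=compute_{newest}")
    return flags


if __name__ == "__main__":
    # Print the flags one per line, joined with shell line continuations.
    print(" \\\n".join(gencode_flags([30, 50, 52])))
```

The resulting list can be passed to nvcc next to your chosen ‘-arch’ flag. Which SM versions to include depends on the cards you actually need to support.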

Find out which GPU you have, and which CUDA version you have first.
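
One way to check the CUDA version from a script is to read the output of `nvcc --version`. The sketch below assumes that output contains a phrase like "release 9.0", which is what the toolkits I have seen print. The function names `parse_cuda_release` and `installed_cuda_release` are invented for this example.

```python
import re
import subprocess


def parse_cuda_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` text, or None if absent.

    Assumes the text contains something like "release 9.0".
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc --version`; return None if nvcc is missing or fails."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_cuda_release(result.stdout)
```

Calling `installed_cuda_release()` on a machine with the toolkit on the PATH should give a tuple such as `(9, 0)`. It reports the toolkit version only, not which GPU is installed.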

Supported SM and Gencode variations



Below are the supported sm variations and sample cards from each generation.

Supported on CUDA 7 and later

Fermi (CUDA 3.2 and later, deprecated from CUDA 9):
SM20 or SM_20, compute_20 – Older cards such as GeForce 400, 500, 600, GT-630

Kepler (CUDA 5 and later):

SM30 or SM_30, compute_30 – Kepler architecture (generic – Tesla K40/K80, GeForce 700, GT-730)
Adds support for unified memory programming

SM35 or SM_35, compute_35 – More specific Tesla K40


Adds support for dynamic parallelism. Shows no real benefit over SM30 in my experience.

SM37 or SM_37, compute_37 – More specific Tesla K80


Adds a few more registers. Shows no real benefit over SM30 in my experience.

http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/ 1/4

Maxwell (CUDA 6 and later):


SM50 or SM_50, compute_50 – Tesla/Quadro M series

SM52 or SM_52, compute_52 – Quadro M6000, GeForce 900, GTX-970, GTX-980, GTX Titan X

SM53 or SM_53, compute_53 – Tegra (Jetson) TX1 / Tegra X1

Pascal (CUDA 8 and later):


SM60 or SM_60, compute_60 – GP100/Tesla P100 – DGX-1 (Generic Pascal)

SM61 or SM_61, compute_61 – GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030, Titan Xp,
Tesla P40, Tesla P4

SM62 or SM_62, compute_62 – Drive-PX2, Tegra (Jetson) TX2, Denver-based GPU

Volta (CUDA 9 and later):


SM70 or SM_70, compute_70 – Tesla V100

SM71 or SM_71, compute_71 – probably not implemented

SM72 or SM_72, compute_72 – currently unknown
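
The list above can be written down as a small lookup table. The Python sketch below is only a transcription of that list: the names `SM_TABLE` and `sms_for_cuda` are invented for this example, the minimum CUDA versions are the ones stated per generation above, and SM71 and SM72 are left out because their status is unclear.

```python
# (SM version, architecture, first CUDA release as (major, minor)),
# transcribed from the list above. Not an official NVIDIA table.
SM_TABLE = [
    (20, "Fermi", (3, 2)),
    (30, "Kepler", (5, 0)),
    (35, "Kepler", (5, 0)),
    (37, "Kepler", (5, 0)),
    (50, "Maxwell", (6, 0)),
    (52, "Maxwell", (6, 0)),
    (53, "Maxwell", (6, 0)),
    (60, "Pascal", (8, 0)),
    (61, "Pascal", (8, 0)),
    (62, "Pascal", (8, 0)),
    (70, "Volta", (9, 0)),
]

# Fermi targets are no longer available from CUDA 9 (see the list above).
FERMI_REMOVED_IN = (9, 0)


def sms_for_cuda(cuda_version):
    """Return the SM versions from SM_TABLE usable with a CUDA release.

    `cuda_version` is a (major, minor) tuple such as (8, 0).
    """
    usable = []
    for sm, arch, first_cuda in SM_TABLE:
        if cuda_version < first_cuda:
            continue  # this toolkit predates the architecture
        if arch == "Fermi" and cuda_version >= FERMI_REMOVED_IN:
            continue  # Fermi was dropped in this toolkit
        usable.append(sm)
    return usable
```

For example, `sms_for_cuda((9, 0))` leaves out SM20 and includes SM70. Treat the result as an upper bound on what you could target, not as a recommendation to compile for all of them.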

Sample flags
According to NVIDIA:

The arch= clause of the -gencode= command-line option to nvcc specifies the front-end
compilation target and must always be a PTX version. The code= clause specifies the back-
end compilation target and can either be cubin or PTX or both. Only the back-end target
version(s) specified by the code= clause will be retained in the resulting binary; at least one
must be PTX to provide Volta compatibility.
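
To make the arch= and code= clauses concrete, here is a minimal Python sketch that splits a ‘-gencode’ flag into its two parts and checks the rules from the quote: arch= must be a PTX (compute_XX) target, and at least one code= target should be PTX. The helper names `parse_gencode` and `has_ptx` are hypothetical, and the parser only handles the `-gencode=arch=...,code=...` spelling, with optional double quotes around a comma-separated code= list.

```python
def parse_gencode(flag):
    """Split '-gencode=arch=compute_52,code=sm_52' into (arch, [codes]).

    Illustrative parser only; it does not cover every spelling nvcc accepts.
    """
    prefix = "-gencode="
    if not flag.startswith(prefix):
        raise ValueError(f"not a -gencode flag: {flag!r}")

    arch_part, sep, code_part = flag[len(prefix):].partition(",code=")
    if not sep or not arch_part.startswith("arch="):
        raise ValueError(f"expected arch=...,code=... in {flag!r}")

    arch = arch_part[len("arch="):]
    # Per the quoted rule, the front-end target must be a PTX version.
    if not arch.startswith("compute_"):
        raise ValueError(f"arch= must be a compute_XX target, got {arch!r}")

    # code= may list several targets, optionally wrapped in double quotes.
    codes = [c for c in code_part.strip('"').split(",") if c]
    if not codes:
        raise ValueError(f"no code= targets in {flag!r}")
    return arch, codes


def has_ptx(flags):
    """True if any flag keeps PTX (a compute_XX code= target) in the binary."""
    return any(
        code.startswith("compute_")
        for flag in flags
        for code in parse_gencode(flag)[1]
    )
```

A flag set for which `has_ptx` returns False contains machine code only, so it cannot be JIT-compiled for a newer architecture.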

Sample flags for generation on CUDA 7 for maximum compatibility:

-arch=sm_30 \
-gencode=arch=compute_20,code=sm_20 \
-gencode=arch=compute_30,code=sm_30 \
-gencode=arch=compute_50,code=sm_50 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_52,code=compute_52

Sample flags for generation on CUDA 8 for maximum compatibility:

-arch=sm_30 \
-gencode=arch=compute_20,code=sm_20 \
-gencode=arch=compute_30,code=sm_30 \
-gencode=arch=compute_50,code=sm_50 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61 \
-gencode=arch=compute_61,code=compute_61

Sample flags for generation on CUDA 9 for maximum compatibility. Note the removed SM_20:

-arch=sm_30 \
-gencode=arch=compute_30,code=sm_30 \
-gencode=arch=compute_50,code=sm_50 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61 \
-gencode=arch=compute_62,code=sm_62 \
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_70,code=compute_70


Alexander Stohr says: 02/12/2016 at 09:13


i can not find any hard information for term “SM62” on the web.
at least some are speculating that it is meant for Tegra.
what are your sources for your statements on “SM62”?

Arnon Shimoni says: 02/12/2016 at 11:38


You could be right… I’m not entirely sure

Yan says: 08/03/2017 at 10:00


Hi,
Then what happens if I only use the following at compile time
-gencode arch=compute_20,code=\"sm_20,compute_20\"
but run the compiled code on a 5.0 card? The JIT compiler will generate the GPU code, but is it going to compile with
-gencode arch=compute_50,code=\"sm_50,compute_50\"
I’ve been searching the web, but couldn’t find anything. Please advise.
Thanks,
Ian

Arnon Shimoni says: 09/03/2017 at 03:22


Hey Ian
If you’re compiling for a 5.0 card, the second option you suggested is better. If you
have to have cross-compatibility, I’d recommend the first.

jg says: 24/05/2017 at 04:14


Thank you, very useful, what about sm_37 ?

Arnon Shimoni says: 24/05/2017 at 08:00


`sm_37` is for the Tesla K80 cards, but our experience proves that it’s not effective
to compile for it specifically. sm_30 gives the same results and is better if you also
have K40s or similar.

LostWorld says: 19/06/2017 at 17:57


kindly help me to find SM for GTX950 and compute_????

Arnon Shimoni says: 20/06/2017 at 01:30


-gencode=arch=compute_52,code=sm_52 – make sure you have CUDA 6.5 at least.

LostWorld says: 20/06/2017 at 03:26


thank you. so nice of u

Mandar Gogate says: 23/07/2017 at 15:46


Thank you.


Copyright © 2014 Arnon Shimoni Powered by ExpressCurate.
