DECLARING CUDA VARIABLES
University of Electronic Science and Technology of China
Variable declaration                        Memory     Scope    Lifetime
int LocalVar;                               register   thread   thread
__device__ __shared__ int SharedVar;        shared     block    block
__device__ int GlobalVar;                   global     grid     application
__device__ __constant__ int ConstantVar;    constant   grid     application

▪ __device__ is optional when used with __shared__ or __constant__
▪ Automatic variables reside in registers
▪ Exception: per-thread arrays, which reside in global memory (more precisely, in thread-private local memory, which is physically located in off-chip device memory alongside global memory)
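The scope and lifetime columns of the table can be observed with a small experiment. The following is a minimal sketch, not part of the original slides: the kernels `countThreads` and `readGlobal` and the host helper `runScopeDemo` are hypothetical names, and the initial value 3 for `ConstantVar` is an arbitrary choice. Each block accumulates into its own `SharedVar`, every thread in the grid increments `GlobalVar`, and a second kernel launch reads `GlobalVar` to show that it outlives the first kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int GlobalVar;              // global memory, grid scope, application lifetime
__constant__ int ConstantVar = 3;      // constant memory, read-only inside kernels (value is arbitrary)

__global__ void countThreads(int *perBlock) {
    int LocalVar = ConstantVar;        // automatic scalar: private to this thread
    __shared__ int SharedVar;          // one copy per thread block
    if (threadIdx.x == 0) SharedVar = 0;
    __syncthreads();                   // make the initial value visible to the whole block
    atomicAdd(&SharedVar, LocalVar);   // only this block's threads contribute
    atomicAdd(&GlobalVar, 1);          // every thread in the grid contributes
    __syncthreads();                   // wait until all additions to SharedVar are done
    if (threadIdx.x == 0) perBlock[blockIdx.x] = SharedVar;
}

// A second launch still sees GlobalVar: it outlives the first kernel.
__global__ void readGlobal(int *out) { *out = GlobalVar; }

// Hypothetical host helper: returns 0 on success and fills the two outputs.
int runScopeDemo(int blocks, int threads, int *firstBlockSum, int *globalCount) {
    int zero = 0, *d_perBlock = nullptr, *d_global = nullptr;
    if (cudaMemcpyToSymbol(GlobalVar, &zero, sizeof(int)) != cudaSuccess) return 1;
    cudaMalloc(&d_perBlock, blocks * sizeof(int));
    cudaMalloc(&d_global, sizeof(int));
    countThreads<<<blocks, threads>>>(d_perBlock);
    readGlobal<<<1, 1>>>(d_global);
    cudaMemcpy(firstBlockSum, d_perBlock, sizeof(int), cudaMemcpyDeviceToHost);
    cudaError_t err = cudaMemcpy(globalCount, d_global, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_perBlock);
    cudaFree(d_global);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    int blockSum = 0, total = 0;
    if (runScopeDemo(4, 64, &blockSum, &total) != 0) { printf("CUDA error\n"); return 1; }
    printf("block 0 sum = %d, grid total = %d\n", blockSum, total);
    return 0;
}
```

With 4 blocks of 64 threads, block 0 should report 64 × 3 = 192 (its own threads only), while the grid total should be 256 (all threads).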
EXAMPLE: SHARED MEMORY VARIABLE DECLARATION
__global__ void blurKernel(unsigned char *in, unsigned char *out, int w, int h)
{
    __shared__ float ds_in[TILE_WIDTH][TILE_WIDTH];
    …
}
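To show how a tile declared this way might be used, here is a hedged sketch of a 3×3 box blur. It is not the body elided on the slide: the kernel name `tileBlur3x3`, the helper `countBlurMismatches`, the choice `TILE_WIDTH = 16`, and the strategy of reading out-of-tile neighbors directly from global memory are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE_WIDTH 16   // assumed tile size; the slide does not give a value

// Hypothetical kernel: 3x3 box blur that stages one tile in shared memory.
__global__ void tileBlur3x3(const unsigned char *in, unsigned char *out, int w, int h) {
    __shared__ float ds_in[TILE_WIDTH][TILE_WIDTH];
    int col = blockIdx.x * TILE_WIDTH + threadIdx.x;
    int row = blockIdx.y * TILE_WIDTH + threadIdx.y;

    // Each thread stages one pixel; threads outside the image store 0.
    ds_in[threadIdx.y][threadIdx.x] = (row < h && col < w) ? (float)in[row * w + col] : 0.0f;
    __syncthreads();                       // the tile is complete past this point
    if (row >= h || col >= w) return;

    float sum = 0.0f;
    int count = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int r = row + dy, c = col + dx;
            if (r < 0 || r >= h || c < 0 || c >= w) continue;   // skip pixels outside the image
            int ty = (int)threadIdx.y + dy, tx = (int)threadIdx.x + dx;
            bool inTile = ty >= 0 && ty < TILE_WIDTH && tx >= 0 && tx < TILE_WIDTH;
            // In-tile neighbors come from shared memory, the rest from global memory.
            sum += inTile ? ds_in[ty][tx] : (float)in[r * w + c];
            ++count;
        }
    }
    out[row * w + col] = (unsigned char)(sum / count);
}

// CPU reference for a single output pixel.
static unsigned char blurRef(const std::vector<unsigned char> &img, int w, int h, int row, int col) {
    float sum = 0.0f;
    int count = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int r = row + dy, c = col + dx;
            if (r < 0 || r >= h || c < 0 || c >= w) continue;
            sum += (float)img[r * w + c];
            ++count;
        }
    return (unsigned char)(sum / count);
}

// Hypothetical helper: returns the number of pixels that differ from the CPU result (-1 on CUDA error).
int countBlurMismatches(int w, int h) {
    std::vector<unsigned char> in(w * h), out(w * h);
    for (int i = 0; i < w * h; ++i) in[i] = (unsigned char)((i * 37) % 256);
    unsigned char *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, w * h);
    cudaMalloc(&d_out, w * h);
    cudaMemcpy(d_in, in.data(), w * h, cudaMemcpyHostToDevice);
    dim3 block(TILE_WIDTH, TILE_WIDTH);
    dim3 grid((w + TILE_WIDTH - 1) / TILE_WIDTH, (h + TILE_WIDTH - 1) / TILE_WIDTH);
    tileBlur3x3<<<grid, block>>>(d_in, d_out, w, h);
    cudaError_t err = cudaMemcpy(out.data(), d_out, w * h, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;
    int bad = 0;
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c) bad += (out[r * w + c] != blurRef(in, w, h, r, c));
    return bad;
}

int main() {
    printf("mismatching pixels: %d\n", countBlurMismatches(50, 37));
    return 0;
}
```

The image size in `main` is deliberately not a multiple of the tile width, so the boundary checks are exercised. Loading a halo around the tile into shared memory would avoid the remaining global reads, at the cost of a more involved indexing scheme.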
WHERE TO DECLARE VARIABLES?
Can host access it?    Memory type           Where to declare
yes                    global, constant      outside of any function
no                     register, shared      in the kernel
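The host-accessible row of this table can be illustrated with the CUDA runtime calls `cudaMemcpyToSymbol` and `cudaMemcpyFromSymbol`. This is a minimal sketch under stated assumptions: the variables `Scale` and `Processed`, the kernel `scaleKernel`, and the helper `scaleOnDevice` are hypothetical names introduced here, not taken from the slides.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Declared outside any function, so the host can reach both by symbol.
__constant__ float Scale;      // constant memory
__device__ int Processed;      // global memory

__global__ void scaleKernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // register variable: invisible to the host
    if (i < n) {
        out[i] = Scale * in[i];
        atomicAdd(&Processed, 1);
    }
}

// Hypothetical helper: returns the element count reported by the device (-1 on CUDA error).
int scaleOnDevice(float scale, const float *h_in, float *h_out, int n) {
    int zero = 0, processed = -1;
    float *d_in = nullptr, *d_out = nullptr;
    cudaMemcpyToSymbol(Scale, &scale, sizeof(float));      // host writes constant memory
    cudaMemcpyToSymbol(Processed, &zero, sizeof(int));     // host writes a __device__ variable
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    scaleKernel<<<(n + 127) / 128, 128>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    // Host reads the __device__ variable back after the kernel has finished.
    cudaError_t err = cudaMemcpyFromSymbol(&processed, Processed, sizeof(int));
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? processed : -1;
}

int main() {
    float in[5] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f}, out[5] = {0};
    int processed = scaleOnDevice(2.0f, in, out, 5);
    printf("processed = %d, out[4] = %.1f\n", processed, out[4]);
    return 0;
}
```

There is no equivalent call for register or shared variables: their values reach the host only if the kernel copies them into global memory first.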
SHARED MEMORY IN CUDA
▪ A special type of memory whose contents are explicitly defined and used in the kernel source code
▪ One in each SM
▪ Accessed at much higher speed (in both latency and throughput) than global memory
▪ Scope of access and sharing: thread block
▪ Lifetime: thread block; contents disappear after the corresponding thread block terminates execution
▪ Accessed by memory load/store instructions
▪ A form of scratchpad memory in computer architecture
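The scratchpad role and the block lifetime can be seen in a block-level sum. The sketch below is illustrative only: the kernel `blockSum`, the helper `sumOnes`, and the block size of 256 are assumptions. Each block reduces its slice of the input in shared memory and must write the result to global memory before it terminates, because the shared contents are gone afterwards.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 256   // assumed block size (a power of two, required by the loop below)

__global__ void blockSum(const float *in, float *partial, int n) {
    __shared__ float scratch[BLOCK];            // explicitly managed scratchpad, one per block
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    scratch[tid] = (i < n) ? in[i] : 0.0f;      // stage data from global memory
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) scratch[tid] += scratch[tid + stride];
        __syncthreads();                        // all threads reach this barrier every iteration
    }
    // Shared memory vanishes when the block ends, so the result goes to global memory.
    if (tid == 0) partial[blockIdx.x] = scratch[0];
}

// Hypothetical helper: sums n ones on the device and returns the total (-1 on CUDA error).
float sumOnes(int n) {
    int blocks = (n + BLOCK - 1) / BLOCK;
    std::vector<float> h_in(n, 1.0f), h_partial(blocks);
    float *d_in = nullptr, *d_partial = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    blockSum<<<blocks, BLOCK>>>(d_in, d_partial, n);
    cudaError_t err = cudaMemcpy(h_partial.data(), d_partial, blocks * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);
    if (err != cudaSuccess) return -1.0f;
    float total = 0.0f;
    for (float p : h_partial) total += p;       // combine per-block results on the host
    return total;
}

int main() {
    printf("sum of 1000 ones = %.1f\n", sumOnes(1000));
    return 0;
}
```

Blocks cannot see each other's `scratch` array, which is why the per-block partial sums are combined on the host here.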
HARDWARE VIEW OF CUDA MEMORIES
[Figure: block diagram of a processor (SM) containing a control unit (PC, IR), a processing unit (ALU, register file) and shared memory; global memory and I/O sit outside the processor.]
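The sizes of the memories in this hardware view can be queried at run time. The following sketch uses `cudaGetDeviceProperties` from the CUDA runtime; the helper `queryMemories` and the choice of device 0 are assumptions for illustration, and the printed values depend on the GPU in use.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: fills prop for device 0, returns 0 on success.
int queryMemories(cudaDeviceProp *prop) {
    return cudaGetDeviceProperties(prop, 0) == cudaSuccess ? 0 : 1;
}

int main() {
    cudaDeviceProp p;
    if (queryMemories(&p) != 0) { printf("no CUDA device found\n"); return 1; }
    printf("device:                  %s\n", p.name);
    printf("processors (SMs):        %d\n", p.multiProcessorCount);
    printf("registers per SM:        %d\n", p.regsPerMultiprocessor);
    printf("shared memory per SM:    %zu bytes\n", p.sharedMemPerMultiprocessor);
    printf("shared memory per block: %zu bytes\n", p.sharedMemPerBlock);
    printf("constant memory:         %zu bytes\n", p.totalConstMem);
    printf("global memory:           %zu bytes\n", p.totalGlobalMem);
    return 0;
}
```

On typical devices the per-SM register file and shared memory are measured in kilobytes while global memory is measured in gigabytes, which matches the on-chip versus off-chip split in the figure.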