Kernel Program
void MatrixMultiplication(float* M, float* N, float* P, int Width)
{
    int size = Width * Width * sizeof(float);
    float *Md, *Nd, *Pd;   // each device pointer needs its own '*'

    // 1. Transfer M and N to device memory
    cudaMalloc((void**)&Md, size);
    cudaMemcpy(Md, M, size, cudaMemcpyHostToDevice);
    cudaMalloc((void**)&Nd, size);
    cudaMemcpy(Nd, N, size, cudaMemcpyHostToDevice);

    // Allocate P on the device
    cudaMalloc((void**)&Pd, size);

    // 2. Kernel invocation code - to be shown later

    // 3. Transfer P from device to host
    cudaMemcpy(P, Pd, size, cudaMemcpyDeviceToHost);

    // Free device matrices
    cudaFree(Md);
    cudaFree(Nd);
    cudaFree(Pd);
}
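Step 2 of MatrixMultiplication is deferred in the slides. The following is a minimal sketch of one way it could be filled in, not the slides' own kernel. The names MatrixMulKernel and MatrixMulOnDevice are hypothetical, and the launch shape is an assumption: a single block of Width x Width threads, one thread per element of Pd. Current GPUs allow at most 1024 threads per block, so this only works for Width <= 32. It also assumes row-major storage, as in C. Error checking is added so that a failed allocation, copy, or launch is reported to the caller.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread computes one element of Pd as the dot
// product of one row of Md and one column of Nd (row-major storage assumed).
// Assumes a single block of Width x Width threads, so Width <= 32.
__global__ void MatrixMulKernel(const float* Md, const float* Nd, float* Pd, int Width)
{
    int col = threadIdx.x;  // column of Pd handled by this thread
    int row = threadIdx.y;  // row of Pd handled by this thread
    float Pvalue = 0.0f;
    for (int k = 0; k < Width; ++k)
        Pvalue += Md[row * Width + k] * Nd[k * Width + col];
    Pd[row * Width + col] = Pvalue;
}

// Same host flow as MatrixMultiplication, with step 2 filled in.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t MatrixMulOnDevice(const float* M, const float* N, float* P, int Width)
{
    size_t size = (size_t)Width * Width * sizeof(float);
    float *Md = nullptr, *Nd = nullptr, *Pd = nullptr;

    // 1. Allocate device matrices and transfer M and N to the device
    cudaError_t err = cudaMalloc((void**)&Md, size);
    if (err == cudaSuccess) err = cudaMalloc((void**)&Nd, size);
    if (err == cudaSuccess) err = cudaMalloc((void**)&Pd, size);
    if (err == cudaSuccess) err = cudaMemcpy(Md, M, size, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(Nd, N, size, cudaMemcpyHostToDevice);

    // 2. Kernel invocation: one block of Width x Width threads
    if (err == cudaSuccess) {
        dim3 dimBlock(Width, Width);
        dim3 dimGrid(1, 1);
        MatrixMulKernel<<<dimGrid, dimBlock>>>(Md, Nd, Pd, Width);
        err = cudaGetLastError();  // reports launch-configuration errors
    }

    // 3. Transfer P back; cudaMemcpy waits for the kernel to finish first
    if (err == cudaSuccess) err = cudaMemcpy(P, Pd, size, cudaMemcpyDeviceToHost);

    // Free device matrices (cudaFree on a null pointer does nothing)
    cudaFree(Md);
    cudaFree(Nd);
    cudaFree(Pd);
    return err;
}
```

For example, calling MatrixMulOnDevice(M, N, P, 2) with the row-major 2 x 2 inputs M = {1, 2, 3, 4} and N = {5, 6, 7, 8} should leave P = {19, 22, 43, 50}. A single block keeps the indexing simple. Larger matrices would need a grid of multiple blocks, with the row and column computed from blockIdx as well as threadIdx.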
Calculating a Dot
[Figure: the Md, Nd, and Pd matrices drawn as grids of elements labeled by index (Md0,0, Md1,0, Md2,0, ...; Nd0,1, Nd1,1, ...; Pd0,0, Pd1,0, Pd2,1, ...)]
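The figure's computation can also be written as a formula. The figure labels elements with index subscripts. To avoid ambiguity about which index is the row, the formula below names the row and column explicitly. It also assumes that the Width x Width matrices are stored contiguously in row-major order, as in C. Under those assumptions, each element of Pd is the dot product of one row of Md and one column of Nd:

```latex
Pd_{row,col} = \sum_{k=0}^{Width-1} Md_{row,k} \, Nd_{k,col},
\qquad
Md_{row,k} = \mathtt{Md[row*Width + k]}, \quad
Nd_{k,col} = \mathtt{Nd[k*Width + col]}
```

For Width = 2, this gives Pd_{0,1} = Md_{0,0} Nd_{0,1} + Md_{0,1} Nd_{1,1}. Each element is an independent dot product, so one GPU thread can compute each element of Pd.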