OpenCL: High level wrapper + tools for PB coders

Started by Petr Schreiber, June 29, 2010, 08:37:32 PM

Petr Schreiber

Dear friends,

as promised, here is an include file that helps programmers develop OpenCL applications in an easy and safe way; at least, that was the original purpose.

I am again attaching sample code that sums two vectors, as it is very illustrative.

OpenCL itself is beautiful, but it has one problem - the code to run a simple kernel on a GPU like this:

__kernel void vectorAdd(__global const float * a, __global const float * b, __global float * c)
{
// Vector element index
int nIndex = get_global_id(0);
c[nIndex] = a[nIndex] + b[nIndex];
}

means 100+ lines of OpenCL API calls just to invoke it.
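
For anyone who wants to check the numbers without a GPU, here is a minimal C sketch of what the kernel computes. The function names are made up for illustration and are not part of the include file; the loop index simply stands in for get_global_id(0), and on a real device the work-items may run in parallel instead of one after another.

```c
#include <assert.h>
#include <stddef.h>

/* CPU stand-in for the kernel body: one call handles one work-item.
   n_index plays the role of get_global_id(0). */
static void vector_add_item(const float *a, const float *b, float *c, size_t n_index)
{
    c[n_index] = a[n_index] + b[n_index];
}

/* Emulates a 1-D launch by visiting every global index in turn. */
static void vector_add_reference(const float *a, const float *b, float *c, size_t global_size)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < global_size; ++i) {
        vector_add_item(a, b, c, i);
    }
}
```

Calling vector_add_reference with the inputs 1, 2, 3 and 4, 5, 6 and a global size of 3 fills the output with 5, 7, 9, which is what the GPU version should return as well.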

To eliminate this disaster, I designed the high-level wrapper mentioned above, and as a bonus I am attaching a ThinBASIC application in EXE form that generates 95% of the code for you by analyzing the OpenCL C kernel code.
It looks for the kernel parameters, their types, the number of problem dimensions... The programmer only has to specify the input and output data.
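
To give a rough idea of what such an analysis can look like, here is a small C sketch. It is not the ThinBASIC generator's actual logic, which is not shown in this post; both function names are hypothetical, and the heuristics are deliberately naive (they assume a single kernel, no nested parentheses in the parameter list, and literal indices written as get_global_id(0), get_global_id(1) or get_global_id(2)).

```c
#include <assert.h>
#include <string.h>

/* Counts the parameters of the first kernel in src by counting commas
   inside its parameter list. Returns -1 if no parameter list is found.
   Naive: a "(void)" list would be reported as one parameter. */
static int count_kernel_params(const char *src)
{
    assert(src != NULL);
    const char *kernel = strstr(src, "__kernel");
    if (kernel == NULL) return -1;
    const char *open = strchr(kernel, '(');
    if (open == NULL) return -1;
    const char *close = strchr(open, ')');
    if (close == NULL) return -1;

    int commas = 0;
    int has_text = 0;
    for (const char *p = open + 1; p < close; ++p) {
        if (*p == ',') ++commas;
        else if (*p != ' ' && *p != '\t' && *p != '\n') has_text = 1;
    }
    return has_text ? commas + 1 : 0;
}

/* Guesses the number of problem dimensions from the highest literal
   index passed to get_global_id. Returns 0 if none is found. */
static int count_problem_dims(const char *src)
{
    assert(src != NULL);
    const char *needle = "get_global_id(";
    int dims = 0;
    for (const char *p = strstr(src, needle); p != NULL; p = strstr(p + 1, needle)) {
        char digit = p[strlen(needle)];
        if (digit >= '0' && digit <= '2' && (digit - '0') + 1 > dims) {
            dims = (digit - '0') + 1;
        }
    }
    return dims;
}
```

For the vectorAdd kernel above, these heuristics report three parameters and one dimension. A real generator would also need each parameter's type and address space to emit the matching buffer code.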

Here is an example of how the code looks; keep in mind that 95% of it has been generated for you:

#COMPILE EXE
#DIM ALL

#INCLUDE "OpenCL_Highlevel.inc"

%elements = 3

FUNCTION PBMAIN()

  LOCAL infoCL AS OpenCL_Info
  LOCAL pk AS OpenCL_ProgramKernel

  OpenCL_EnableErrorReport(EXE.PATH$+"CLog.txt", 1)
  OpenCL_BeginWork(infoCL, %CL_DEVICE_TYPE_GPU)
  OpenCL_CreateProgramAndKernel(pk, infoCL, EXE.PATH$+"VectorAdd.cl", "vectorAdd")

  DIM arrayA(1 TO %elements) AS SINGLE
  DIM arrayB(1 TO %elements) AS SINGLE
  DIM arrayC(1 TO %elements) AS SINGLE

  ArrayA(1) = 1
  ArrayA(2) = 2
  ArrayA(3) = 3

  ArrayB(1) = 4
  ArrayB(2) = 5
  ArrayB(3) = 6

  vectorAdd_Execute(pk, infoCL, arrayA(), arrayB(), arrayC() )
  OpenCL_EndWork(infoCL)

  MSGBOX FORMAT$(ArrayC(1))+", "+FORMAT$(ArrayC(2))+", "+FORMAT$(ArrayC(3))

END FUNCTION


FUNCTION vectorAdd_Execute(program_kernelCL AS OpenCL_ProgramKernel, infoCL AS OpenCL_Info, arrayA() AS SINGLE , arrayB() AS SINGLE , arrayC() AS SINGLE) AS LONG
  LOCAL errorCL AS LONG
  LOCAL queueCL AS DWORD

  queueCL = OpenCL_CreateQueue(infoCL)

  '----------------------------------------
  DIM globalDim(1) AS DWORD
  globalDim(1) = %elements
  '----------------------------------------

  LOCAL buffer_a, buffer_b, buffer_c AS DWORD

  '----------------------------------------

  buffer_a = clCreateBuffer(infoCL.context, %CL_MEM_READ_ONLY OR %CL_MEM_COPY_HOST_PTR, SizeOf_Single * %elements, arrayA(1), errorCL)

  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to create buffer_a!", errorCL)
  END IF

  buffer_b = clCreateBuffer(infoCL.context, %CL_MEM_READ_ONLY OR %CL_MEM_COPY_HOST_PTR, SizeOf_Single * %elements, arrayB(1), errorCL)

  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to create buffer_b!", errorCL)
  END IF

  buffer_c = clCreateBuffer(infoCL.context, %CL_MEM_WRITE_ONLY OR %CL_MEM_COPY_HOST_PTR, SizeOf_Single * %elements, arrayC(1), errorCL)

  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to create buffer_c!", errorCL)
  END IF

  '----------------------------------------

  ' Assign each result directly: OR-ing status codes together would mix
  ' an earlier failure into the value reported here
  errorCL = clSetKernelArg(program_kernelCL.kernel, 0, 4, buffer_a)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to set param a!", errorCL)
  END IF

  errorCL = clSetKernelArg(program_kernelCL.kernel, 1, 4, buffer_b)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to set param b!", errorCL)
  END IF

  errorCL = clSetKernelArg(program_kernelCL.kernel, 2, 4, buffer_c)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to set param c!", errorCL)
  END IF

  '----------------------------------------

  errorCL = clEnqueueNDRangeKernel(queueCL, program_kernelCL.kernel, 1, BYVAL 0, globalDim(1), BYVAL 0, 0, BYVAL 0, BYVAL 0)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to execute kernel!", errorCL)
  END IF
  '----------------------------------------

  errorCL = clEnqueueReadBuffer(queueCL, buffer_c, %CL_TRUE, 0, %elements * SizeOf_Single, arrayC(1), 0, BYVAL 0, BYVAL 0)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to read back buffer buffer_c!", errorCL)
  END IF

  '----------------------------------------

  errorCL = clReleaseMemObject(buffer_a)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to release buffer_a!", errorCL)
  END IF

  errorCL = clReleaseMemObject(buffer_b)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to release buffer_b!", errorCL)
  END IF

  errorCL = clReleaseMemObject(buffer_c)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to release buffer_c!", errorCL)
  END IF

  '----------------------------------------

  errorCL = OpenCL_ReleaseQueue(queueCL)
  FUNCTION = errorCL
END FUNCTION


Of course, the include file is built so that everything is logged, so you can easily detect what you are doing wrong.
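
To illustrate why a log with status codes helps, here is a small C sketch that turns a few OpenCL status values into readable log lines. The helper names and the line format are hypothetical; the actual output of OpenCL_ReportError is not shown in this post. The numeric values are the ones from the Khronos cl.h header, repeated here so the sketch compiles without an OpenCL SDK.

```c
#include <assert.h>
#include <stdio.h>

/* Status values as defined in the Khronos cl.h header. */
#define CL_SUCCESS                         0
#define CL_MEM_OBJECT_ALLOCATION_FAILURE  -4
#define CL_INVALID_VALUE                 -30
#define CL_INVALID_MEM_OBJECT            -38
#define CL_INVALID_KERNEL_ARGS           -52

/* Maps a handful of status codes to their symbolic names. */
static const char *cl_status_name(int status)
{
    switch (status) {
    case CL_SUCCESS:                       return "CL_SUCCESS";
    case CL_MEM_OBJECT_ALLOCATION_FAILURE: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case CL_INVALID_VALUE:                 return "CL_INVALID_VALUE";
    case CL_INVALID_MEM_OBJECT:            return "CL_INVALID_MEM_OBJECT";
    case CL_INVALID_KERNEL_ARGS:           return "CL_INVALID_KERNEL_ARGS";
    default:                               return "unknown status";
    }
}

/* Hypothetical helper: writes one log line into out and returns the
   number of characters snprintf would have written. */
static int format_log_line(char *out, size_t out_size, const char *message, int status)
{
    assert(out != NULL && message != NULL);
    return snprintf(out, out_size, "%s (%s, code %d)",
                    message, cl_status_name(status), status);
}
```

With this, a failed buffer creation would show up as something like "Failed to create buffer_a! (CL_INVALID_VALUE, code -30)", which is much easier to act on than a bare number.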

The whole system is built on top of José Roca's high-quality headers, derived from the Khronos C originals.

I hope you will like it; this is the future!*


Petr

* Summing two vectors is the future? Of course not :D But the OpenCL technology itself is something to keep an eye on.

Target compiler: PowerBASIC for Windows 9.x
Target hardware: NVIDIA GeForce 8xxx and newer, Radeon HD 4xxx and newer
AMD Sempron 3400+ | 1GB RAM @ 533MHz | GeForce 6200 / GeForce 9500GT | 32bit Windows XP SP3

psch.thinbasic.com

Patrice Terrier

Patrice Terrier
GDImage (advanced graphic addon)
http://www.zapsolution.com

Jürgen Huhn

#2
Thank you for sharing!

I'm curious about OpenCL.

Useful link:

http://www.khronos.org/registry/cl/
...

.¸.•'´¯)¸.•'´¯)¸.•'´¯)¸.•'´¯)
¤ª"˜¨¨¯¯¨¨˜"ª¤....¤ ª"˜¨