Dear friends,
as promised, here is an include file that helps programmers develop OpenCL applications in an easy and safe way; at least that was the original purpose.
I attach the sample code for summing two vectors again, as it is very illustrative.
OpenCL itself is beautiful, but it has one problem: the code needed to run a simple kernel on the GPU, like this one:
__kernel void vectorAdd(__global const float * a, __global const float * b, __global float * c)
{
    // Vector element index
    int nIndex = get_global_id(0);
    c[nIndex] = a[nIndex] + b[nIndex];
}
takes 100+ lines of OpenCL API calls just to invoke it.
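For anyone who wants to check the kernel's arithmetic without a GPU, here is a minimal CPU sketch in plain C. It is not part of the include file, and the function names are made up for illustration. The loop index stands in for get_global_id(0), so each iteration does what one work item would do.

```c
#include <assert.h>
#include <stddef.h>

/* CPU stand-in for one work item of the vectorAdd kernel.
   "index" plays the role of get_global_id(0). */
static void vector_add_item(const float *a, const float *b, float *c, size_t index)
{
    c[index] = a[index] + b[index];
}

/* Runs the "kernel" once per element, the way a 1-D global range of
   n work items would, only sequentially instead of in parallel.
   Hypothetical helper name, used here for illustration only. */
void vector_add_reference(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; ++i) {
        vector_add_item(a, b, c, i);
    }
}
```

With the inputs 1, 2, 3 and 4, 5, 6 this fills the output with 5, 7, 9, which is what the GPU version should report as well.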
To eliminate this disaster, I designed the high-level wrapper mentioned above, and as a bonus I attach a ThinBASIC application in EXE form that generates 95% of the code for you by analyzing the OpenCL C kernel code.
It looks for the kernel parameters, their types, the number of problem dimensions... The programmer only has to specify the input and output data.
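To give a rough idea of what analyzing a kernel signature can involve, here is a small sketch in plain C. It is an assumption about the general approach, not the actual generator (which is a ThinBASIC program), and both function names are hypothetical. It only handles the simple case: the first parenthesised list after the first __kernel keyword, with no nested parentheses, attributes or comments in between.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Locates the first "(...)" after the first "__kernel" keyword.
   Returns 1 and sets *open / *close on success, 0 otherwise. */
static int find_param_list(const char *src, const char **open, const char **close)
{
    const char *kw = strstr(src, "__kernel");
    if (kw == NULL) return 0;
    *open = strchr(kw, '(');
    if (*open == NULL) return 0;
    *close = strchr(*open, ')');
    return *close != NULL;
}

/* Counts the parameters of the first kernel found in src.
   Returns -1 if no kernel signature is found, 0 for an empty list.
   Limitation of this sketch: "(void)" is counted as one parameter. */
int count_kernel_params(const char *src)
{
    const char *open = NULL, *close = NULL;
    int commas = 0, has_text = 0;
    assert(src != NULL);
    if (!find_param_list(src, &open, &close)) return -1;
    for (const char *p = open + 1; p < close; ++p) {
        if (*p == ',') ++commas;
        else if (strchr(" \t\r\n", *p) == NULL) has_text = 1;
    }
    return has_text ? commas + 1 : 0;
}

/* Counts how many of those parameters are marked __global,
   i.e. the ones that need a buffer on the host side. */
int count_global_params(const char *src)
{
    const char *open = NULL, *close = NULL;
    int count = 0;
    assert(src != NULL);
    if (!find_param_list(src, &open, &close)) return -1;
    for (const char *p = strstr(open, "__global");
         p != NULL && p < close;
         p = strstr(p + 8, "__global")) {
        ++count;
    }
    return count;
}
```

For the vectorAdd signature both functions return 3: three parameters, all of them __global pointers that need a buffer.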
Here is an example of how the code looks; keep in mind that 95% of it has been generated for you:
#COMPILE EXE
#DIM ALL

#INCLUDE "OpenCL_Highlevel.inc"

%elements = 3

FUNCTION PBMAIN()

  LOCAL infoCL AS OpenCL_Info
  LOCAL pk AS OpenCL_ProgramKernel

  OpenCL_EnableErrorReport(EXE.PATH$+"CLog.txt", 1)
  OpenCL_BeginWork(infoCL, %CL_DEVICE_TYPE_GPU)
  OpenCL_CreateProgramAndKernel(pk, infoCL, EXE.PATH$+"VectorAdd.cl", "vectorAdd")

  DIM arrayA(1 TO %elements) AS SINGLE
  DIM arrayB(1 TO %elements) AS SINGLE
  DIM arrayC(1 TO %elements) AS SINGLE

  ArrayA(1) = 1
  ArrayA(2) = 2
  ArrayA(3) = 3

  ArrayB(1) = 4
  ArrayB(2) = 5
  ArrayB(3) = 6

  vectorAdd_Execute(pk, infoCL, arrayA(), arrayB(), arrayC() )

  OpenCL_EndWork(infoCL)

  MSGBOX FORMAT$(ArrayC(1))+", "+FORMAT$(ArrayC(2))+", "+FORMAT$(ArrayC(3))

END FUNCTION

FUNCTION vectorAdd_Execute(program_kernelCL AS OpenCL_ProgramKernel, infoCL AS OpenCL_Info, arrayA() AS SINGLE , arrayB() AS SINGLE , arrayC() AS SINGLE) AS LONG

  LOCAL errorCL AS LONG
  LOCAL queueCL AS DWORD

  queueCL = OpenCL_CreateQueue(infoCL)

  '----------------------------------------
  ' One-dimensional problem: one work item per array element
  DIM globalDim(1) AS DWORD
  globalDim(1) = %elements

  '----------------------------------------
  LOCAL buffer_a, buffer_b, buffer_c AS DWORD

  '----------------------------------------
  ' Create the device buffers and copy the host arrays into them
  buffer_a = clCreateBuffer(infoCL.context, %CL_MEM_READ_ONLY OR %CL_MEM_COPY_HOST_PTR, SizeOf_Single * %elements, arrayA(1), errorCL)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to create buffer_a!", errorCL)
  END IF

  buffer_b = clCreateBuffer(infoCL.context, %CL_MEM_READ_ONLY OR %CL_MEM_COPY_HOST_PTR, SizeOf_Single * %elements, arrayB(1), errorCL)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to create buffer_b!", errorCL)
  END IF

  buffer_c = clCreateBuffer(infoCL.context, %CL_MEM_WRITE_ONLY OR %CL_MEM_COPY_HOST_PTR, SizeOf_Single * %elements, arrayC(1), errorCL)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to create buffer_c!", errorCL)
  END IF

  '----------------------------------------
  ' Bind the buffers to kernel arguments 0..2.
  ' Each result is assigned directly, so the logged code is the one returned by that call.
  errorCL = clSetKernelArg(program_kernelCL.kernel, 0, 4, buffer_a)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to set param a!", errorCL)
  END IF

  errorCL = clSetKernelArg(program_kernelCL.kernel, 1, 4, buffer_b)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to set param b!", errorCL)
  END IF

  errorCL = clSetKernelArg(program_kernelCL.kernel, 2, 4, buffer_c)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to set param c!", errorCL)
  END IF

  '----------------------------------------
  ' Launch the kernel over the 1-D global range
  errorCL = clEnqueueNDRangeKernel(queueCL, program_kernelCL.kernel, 1, BYVAL 0, globalDim(1), BYVAL 0, 0, BYVAL 0, BYVAL 0)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to execute kernel!", errorCL)
  END IF

  '----------------------------------------
  ' Blocking read of the result back into arrayC()
  errorCL = clEnqueueReadBuffer(queueCL, buffer_c, %CL_TRUE, 0, %elements * SizeOf_Single, arrayC(1), 0, BYVAL 0, BYVAL 0)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to read back buffer buffer_c!", errorCL)
  END IF

  '----------------------------------------
  ' Release the device buffers
  errorCL = clReleaseMemObject(buffer_a)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to release buffer_a!", errorCL)
  END IF

  errorCL = clReleaseMemObject(buffer_b)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to release buffer_b!", errorCL)
  END IF

  errorCL = clReleaseMemObject(buffer_c)
  IF (errorCL <> %CL_SUCCESS) THEN
    OpenCL_ReportError("Failed to release buffer_c!", errorCL)
  END IF

  '----------------------------------------
  errorCL = OpenCL_ReleaseQueue(queueCL)

  FUNCTION = errorCL

END FUNCTION

Of course, the include file is built so that everything is logged, so you can easily detect what you are doing wrong.
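When reading such a log it helps to know what the numeric codes mean. Below is a small sketch in plain C that maps a few of the standard OpenCL error values (as defined in the Khronos cl.h header) to their names. The two function names and the message layout are assumptions for illustration; the actual format written by OpenCL_ReportError may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Maps a subset of the standard OpenCL status codes to their names.
   The numeric values are those defined in the Khronos cl.h header. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -34: return "CL_INVALID_CONTEXT";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -46: return "CL_INVALID_KERNEL_NAME";
    case -52: return "CL_INVALID_KERNEL_ARGS";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "unknown OpenCL error";
    }
}

/* Hypothetical log-line formatter: "<message> (<name>, <code>)".
   Returns the number of characters snprintf wanted to write. */
int format_cl_error(char *out, size_t out_size, const char *message, int code)
{
    assert(out != NULL && message != NULL);
    return snprintf(out, out_size, "%s (%s, %d)", message, cl_error_name(code), code);
}
```

For example, a failed buffer creation with code -30 would come out as "Failed to create buffer_a! (CL_INVALID_VALUE, -30)".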
The whole system is built on top of the quality José Roca headers, which are derived from the Khronos C originals.
I hope you will like it, this is the future!*
Petr
* Summing two vectors is the future? Of course not :D But the OpenCL technology itself is something to keep an eye on.
Target compiler: PowerBASIC for Windows 9.x
Target hardware: NVIDIA GeForce 8xxx and newer, Radeon HD 4xxx and newer
Thank you Petr!
...
Thank you for sharing!
I'm curious about OpenCL.
Useful link:
http://www.khronos.org/registry/cl/
...