PGI User Forum
Nested loops and zeroed variables
Jon Lester



Joined: 14 Sep 2010
Posts: 6

Posted: Tue May 07, 2013 7:39 am    Post subject: Nested loops and zeroed variables

I am currently working with #pragma acc directives on a CUDA accelerator. It generally works smoothly, but recently I have run into some curious behavior. The code has at least 5-6 levels of nested loops, but from the penultimate loop onward the computations see all variables zeroed, so the algorithm outputs zeros. Without the #pragma acc directives the code runs fine. I tried -Mvect=levels:<n>, but it does not help, while -Mconcur=levels:<n> makes the code crash.

Could you provide any hint?

Thanks in advance.
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Tue May 07, 2013 11:25 am    Post subject:

Hi Jon,

We currently max out at 7 loop levels (though we are in the process of expanding this), but since you're only at 5-6 levels, this shouldn't matter. Something else is going on.

Can you please post or send to PGI Customer Service (trs@pgroup.com) a reproducing example?

If not, what is the output from "-Minfo=accel"? How are the loops being scheduled?

-Mat
Jon Lester



Joined: 14 Sep 2010
Posts: 6

Posted: Wed May 08, 2013 1:26 am    Post subject:

Hi mkcolg,

Thank you for the prompt answer. I cannot share the code, but I can provide the compilation output. Please note that this is a MEX function for Matlab, and the environment I built to compile and run such Matlab extensions works properly. The loops at lines 966 and 1035 are the ones that do not work: they zero the variables computed earlier in the code.

>> mex -g -DCUDA -DDEBUG_MODE addTotalClutter_rain_mex.c RainClutter.c

PGC-W-0095-Type cast required for this conversion (addTotalClutter_rain_mex.c: 183)
PGC-W-0095-Type cast required for this conversion (addTotalClutter_rain_mex.c: 183)
PGC-W-0095-Type cast required for this conversion (addTotalClutter_rain_mex.c: 183)
PGC/x86-64-Extractor Windows 12.10-0: completed with warnings

"C:/Program Files/PGI/win64/12.10/bin\pgc_ex.EXE" addTotalClutter_rain_mex.c -debug -x 120 0x200 -opt 2 -x 120 0x80000000 -x 59 4 -x 19 0x400000 -x 28 0x40000 -x 119 0x4a10400 -x 122 0x40 -x 123 0x1000 -x 127 0x15 -x 129 0x10 -quad -y 80 0x1000 -x 80 0x10800000 -tp nehalem -vect 56 -y 34 16 -x 34 0x8 -x 32 12582912 -y 19 8 -y 35 0 -x 30 10 -x 42 0x30 -x 39 0x40 -x 39 0x80 -x 34 0x400000 -x 149 1 -x 150 1 -x 70 0x8000 -x 122 1 -x 125 0x20000 -x 120 0x10 -astype 0 -x 121 1 -stdinc "C:/Program Files/PGI/win64/12.10/include;C:/Program Files/PGI/Microsoft Open Tools 10/include/sys;C:/Program Files/PGI/Microsoft Open Tools 10/include;C:/Program Files/PGI/Microsoft Open Tools 10/PlatformSDK/include" -def _M_AMD64 -def _MT -def _WIN32 -def __WIN32 -def __WIN32__ -def _WIN64 -def __WIN64 -def __WIN64__ -def __x86_64__ -def __X86_64__ -def __unaligned= -def _INTEGRAL_MAX_BITS=64 -def __extension__= -def __amd64__ -def __SSE__ -def __MMX__ -def __SSE2__ -def __SSE3__ -def __SSSE3__ -def fastcall= -def __PGI_TOOLS10 -predicate "#machine(i386) #lint(off) #system(unix) #system(winnt) #cpu(i386)" -idir "C:\Program Files\MATLAB\R2011b\extern\include" -idir "C:\Program Files\MATLAB\R2011b\simulink\include" -def CUDA -def DEBUG_MODE -def _ACCEL=201003 -def _OPENACC=201111 -def MATLAB_MEX_FILE -def PGI_COMPILER -def MX_COMPAT_32 -x 123 0x80000000 -x 123 4 -x 119 0x20 -alwaysinline "C:/Program Files/PGI/win64/12.10/lib\libintrinsics.il" 4 -autoinl 10 -x 168 100 -x 174 8000 -x 14 0x200000 -x 120 0x200000 -x 186 0x80000 -x 180 0x400 -x 180 0x4000000 -x 163 0x1 -x 186 2 -accel nvidia -x 176 0x140000 -x 177 0x0202007f -x 0 0x1000000 -x 2 0x100000 -x 0 0x2000000 -x 161 53239 -x 162 53239 -x 9 1 -x 42 0x14200000 -x 72 0x1 -x 136 0x11 -x 80 0x800000 -quad -x 119 0x10000000 -x 129 0x40000000 -x 129 2 -x 164 0x1000 -x 14 0x80 -exlib C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc2azazblGriZM-K.ext

"C:/Program Files/PGI/win64/12.10/bin\pgc.EXE" addTotalClutter_rain_mex.c -debug -x 120 0x200 -opt 2 -x 120 0x80000000 -x 59 4 -x 19 0x400000 -x 28 0x40000 -x 119 0x4a10400 -x 122 0x40 -x 123 0x1000 -x 127 0x15 -x 129 0x10 -quad -y 80 0x1000 -x 80 0x10800000 -tp nehalem -vect 56 -y 34 16 -x 34 0x8 -x 32 12582912 -y 19 8 -y 35 0 -x 30 10 -x 42 0x30 -x 39 0x40 -x 39 0x80 -x 34 0x400000 -x 149 1 -x 150 1 -x 70 0x8000 -x 122 1 -x 125 0x20000 -x 120 0x10 -astype 0 -x 121 1 -stdinc "C:/Program Files/PGI/win64/12.10/include;C:/Program Files/PGI/Microsoft Open Tools 10/include/sys;C:/Program Files/PGI/Microsoft Open Tools 10/include;C:/Program Files/PGI/Microsoft Open Tools 10/PlatformSDK/include" -def _M_AMD64 -def _MT -def _WIN32 -def __WIN32 -def __WIN32__ -def _WIN64 -def __WIN64 -def __WIN64__ -def __x86_64__ -def __X86_64__ -def __unaligned= -def _INTEGRAL_MAX_BITS=64 -def __extension__= -def __amd64__ -def __SSE__ -def __MMX__ -def __SSE2__ -def __SSE3__ -def __SSSE3__ -def fastcall= -def __PGI_TOOLS10 -predicate "#machine(i386) #lint(off) #system(unix) #system(winnt) #cpu(i386)" -idir "C:\Program Files\MATLAB\R2011b\extern\include" -idir "C:\Program Files\MATLAB\R2011b\simulink\include" -def CUDA -def DEBUG_MODE -def _ACCEL=201003 -def _OPENACC=201111 -def MATLAB_MEX_FILE -def PGI_COMPILER -def MX_COMPAT_32 -cmdline "+pgcc addTotalClutter_rain_mex.c -m64 -DCUDA -DDEBUG_MODE -c -acc -Minfo=all -Minline -Mvect=levels:10 -fast -Mvect=sse -Mscalarsse -Mcache_align -Mflushz -Mpre -DMATLAB_MEX_FILE -DPGI_COMPILER -v -IC:\Program Files\MATLAB\R2011b\extern\include -IC:\Program Files\MATLAB\R2011b\simulink\include -g -DMX_COMPAT_32" -inlib C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc2azazblGriZM-K.ext -x 14 32 -x 123 0x80000000 -x 123 4 -x 119 0x20 -alwaysinline "C:/Program Files/PGI/win64/12.10/lib\libintrinsics.il" 4 -autoinl 10 -x 168 100 -x 174 8000 -x 14 0x200000 -x 120 0x200000 -x 186 0x80000 -x 180 0x400 -x 180 0x4000000 -x 163 0x1 -x 186 2 -accel nvidia -x 176 
0x140000 -x 177 0x0202007f -x 0 0x1000000 -x 2 0x100000 -x 0 0x2000000 -x 161 53239 -x 162 53239 -x 9 1 -x 42 0x14200000 -x 72 0x1 -x 136 0x11 -x 80 0x800000 -quad -x 119 0x10000000 -x 129 0x40000000 PGC-W-0095-Type cast required for this conversion (addTotalClutter_rain_mex.c: 183)
PGC-W-0095-Type cast required for this conversion (addTotalClutter_rain_mex.c: 183)
PGC-W-0095-Type cast required for this conversion (addTotalClutter_rain_mex.c: 183)
mexFunction:
127, Loop not vectorized/parallelized: contains call
141, Loop not vectorized/parallelized: contains call
155, Memory copy idiom, loop replaced by call to __c_mcopy8
PGC/x86-64 Windows 12.10-0: compilation completed with warnings
-x 129 2 -x 164 0x1000 -asm C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc3bHazbJH9AYu9w.sm

"C:/Program Files/PGI/win64/12.10/bin\pgsmart.EXE" -agg 0x62000020 -o C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc4cPazb7Tjh8Hyl.s C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc3bHazbJH9AYu9w.sm

"C:/Program Files/PGI/win64/12.10/bin\as64.EXE" C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc4cPazb7Tjh8Hyl.s "-IC:\Program Files\MATLAB\R2011b\extern\include/" "-IC:\Program Files\MATLAB\R2011b\simulink\include/" -o C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc5dXazbtD34R-da.obj

"C:/Program Files/PGI/win64/12.10/bin\pgcnv.EXE" C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc5dXazbtD34R-da.obj addTotalClutter_rain_mex.obj
Unlinking C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc3bHazbJH9AYu9w.sm
Unlinking C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc4cPazb7Tjh8Hyl.s
Unlinking C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc5dXazbtD34R-da.obj
Unlinking directory C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc2azazblGriZM-K.ext
PGC/x86-64-Extractor Windows 12.10-0: completed

"C:/Program Files/PGI/win64/12.10/bin\pgc_ex.EXE" RainClutter.c -debug -x 120 0x200 -opt 2 -x 120 0x80000000 -x 59 4 -x 19 0x400000 -x 28 0x40000 -x 119 0x4a10400 -x 122 0x40 -x 123 0x1000 -x 127 0x15 -x 129 0x10 -quad -y 80 0x1000 -x 80 0x10800000 -tp nehalem -vect 56 -y 34 16 -x 34 0x8 -x 32 12582912 -y 19 8 -y 35 0 -x 30 10 -x 42 0x30 -x 39 0x40 -x 39 0x80 -x 34 0x400000 -x 149 1 -x 150 1 -x 70 0x8000 -x 122 1 -x 125 0x20000 -x 120 0x10 -astype 0 -x 121 1 -stdinc "C:/Program Files/PGI/win64/12.10/include;C:/Program Files/PGI/Microsoft Open Tools 10/include/sys;C:/Program Files/PGI/Microsoft Open Tools 10/include;C:/Program Files/PGI/Microsoft Open Tools 10/PlatformSDK/include" -def _M_AMD64 -def _MT -def _WIN32 -def __WIN32 -def __WIN32__ -def _WIN64 -def __WIN64 -def __WIN64__ -def __x86_64__ -def __X86_64__ -def __unaligned= -def _INTEGRAL_MAX_BITS=64 -def __extension__= -def __amd64__ -def __SSE__ -def __MMX__ -def __SSE2__ -def __SSE3__ -def __SSSE3__ -def fastcall= -def __PGI_TOOLS10 -predicate "#machine(i386) #lint(off) #system(unix) #system(winnt) #cpu(i386)" -idir "C:\Program Files\MATLAB\R2011b\extern\include" -idir "C:\Program Files\MATLAB\R2011b\simulink\include" -def CUDA -def DEBUG_MODE -def _ACCEL=201003 -def _OPENACC=201111 -def MATLAB_MEX_FILE -def PGI_COMPILER -def MX_COMPAT_32 -x 123 0x80000000 -x 123 4 -x 119 0x20 -alwaysinline "C:/Program Files/PGI/win64/12.10/lib\libintrinsics.il" 4 -autoinl 10 -x 168 100 -x 174 8000 -x 14 0x200000 -x 120 0x200000 -x 186 0x80000 -x 180 0x400 -x 180 0x4000000 -x 163 0x1 -x 186 2 -accel nvidia -x 176 0x140000 -x 177 0x0202007f -x 0 0x1000000 -x 2 0x100000 -x 0 0x2000000 -x 161 53239 -x 162 53239 -x 9 1 -x 42 0x14200000 -x 72 0x1 -x 136 0x11 -x 80 0x800000 -quad -x 119 0x10000000 -x 129 0x40000000 -x 129 2 -x 164 0x1000 -x 14 0x80 -exlib C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc2ajWzbBe552NLl.ext

"C:/Program Files/PGI/win64/12.10/bin\pgc.EXE" RainClutter.c -debug -x 120 0x200 -opt 2 -x 120 0x80000000 -x 59 4 -x 19 0x400000 -x 28 0x40000 -x 119 0x4a10400 -x 122 0x40 -x 123 0x1000 -x 127 0x15 -x 129 0x10 -quad -y 80 0x1000 -x 80 0x10800000 -tp nehalem -vect 56 -y 34 16 -x 34 0x8 -x 32 12582912 -y 19 8 -y 35 0 -x 30 10 -x 42 0x30 -x 39 0x40 -x 39 0x80 -x 34 0x400000 -x 149 1 -x 150 1 -x 70 0x8000 -x 122 1 -x 125 0x20000 -x 120 0x10 -astype 0 -x 121 1 -stdinc "C:/Program Files/PGI/win64/12.10/include;C:/Program Files/PGI/Microsoft Open Tools 10/include/sys;C:/Program Files/PGI/Microsoft Open Tools 10/include;C:/Program Files/PGI/Microsoft Open Tools 10/PlatformSDK/include" -def _M_AMD64 -def _MT -def _WIN32 -def __WIN32 -def __WIN32__ -def _WIN64 -def __WIN64 -def __WIN64__ -def __x86_64__ -def __X86_64__ -def __unaligned= -def _INTEGRAL_MAX_BITS=64 -def __extension__= -def __amd64__ -def __SSE__ -def __MMX__ -def __SSE2__ -def __SSE3__ -def __SSSE3__ -def fastcall= -def __PGI_TOOLS10 -predicate "#machine(i386) #lint(off) #system(unix) #system(winnt) #cpu(i386)" -idir "C:\Program Files\MATLAB\R2011b\extern\include" -idir "C:\Program Files\MATLAB\R2011b\simulink\include" -def CUDA -def DEBUG_MODE -def _ACCEL=201003 -def _OPENACC=201111 -def MATLAB_MEX_FILE -def PGI_COMPILER -def MX_COMPAT_32 -cmdline "+pgcc RainClutter.c -m64 -DCUDA -DDEBUG_MODE -c -acc -Minfo=all -Minline -Mvect=levels:10 -fast -Mvect=sse -Mscalarsse -Mcache_align -Mflushz -Mpre -DMATLAB_MEX_FILE -DPGI_COMPILER -v -IC:\Program Files\MATLAB\R2011b\extern\include -IC:\Program Files\MATLAB\R2011b\simulink\include -g -DMX_COMPAT_32" -inlib C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc2ajWzbBe552NLl.ext -x 14 32 -x 123 0x80000000 -x 123 4 -x 119 0x20 -alwaysinline "C:/Program Files/PGI/win64/12.10/lib\libintrinsics.il" 4 -autoinl 10 -x 168 100 -x 174 8000 -x 14 0x200000 -x 120 0x200000 -x 186 0x80000 -x 180 0x400 -x 180 0x4000000 -x 163 0x1 -x 186 2 -accel nvidia -x 176 0x140000 -x 177 0x0202007f -x 
0 0x1000000 -x 2 0x100000 -x 0 0x2000000 -x 161 53239 -x 162 53239 -x 9 1 -x 42 0x14200000 -x 72 0x1 -x 136 0x11 -x 80 0x800000 -quad -x 119 0x10000000 -x 129 0x40000000 -x 129 2 -x 164 0x1000 -asm C:\Users\ADexecuting C:/Program Files/PGI/win64/12.10/bin/pgnvd C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc2a0G7CRm6WoML.gpu -computecap=13 -ptx C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc3buG78gYvxywW.ptx -o C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc4c0G7C_cE03ib.bin -ptxinfo C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc5duG78XA4qzqi.info -4.1
executing C:/Program Files/PGI/win64/12.10/bin/pgnvd C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc2a0G7CRm6WoML.gpu -computecap=20 -ptx C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc6e0G7CbX13it4.ptx -o C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc7fuG78m5TuQRy.bin -ptxinfo C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgacc8g0G7C0c3WriS.info -4.1
RainClutter:
208, Loop not vectorized: may not be beneficial
Unrolled inner loop 4 times
Used combined stores for 1 stores
233, Loop not vectorized: may not be beneficial
Generated an alternate version of the loop
Unrolled inner loop 4 times
Used combined stores for 1 stores
243, Generating present_or_create(numGridpoints)
Generating present_or_create(power)
Generating present_or_create(thisCellPower)
Generating present_or_create(num_ones)
Generating present_or_create(temp)
Generating present_or_create(iDM)
Generating present_or_create(iDMax)
Generating present_or_create(iDMin)
Generating present_or_create(iC)
Generating present_or_create(k2)
Generating present_or_create(k1)
Generating present_or_create(kk2)
Generating present_or_create(ATT_RAIN)
Generating present_or_copy(Vol_pos_s[0:])
Generating present_or_copyin(mpos_s1[0:])
Generating present_or_copy(Vol_pos_s_no_m[0:])
Generating present_or_copyin(DCM_s_to_be[0:3][0:])
Generating present_or_copy(rel_pos_norm_be_bar[0:])
Generating present_or_copyin(M_ant[0:][0:])
Generating present_or_copy(dir_s_norm[0:])
Generating present_or_copy(Volume_dir_ant[0:])
Generating present_or_copyin(azimuths1[0:179])
Generating present_or_copyin(rain_tab[0:][0:])
Generating present_or_copy(test1[0:24][0:2])
Generating present_or_copy(r[0:][0:])
Generating present_or_copyin(el_range[0:])
Generating present_or_copy(antel[0:])
Generating present_or_copyin(az_range[0:])
Generating present_or_copyin(sum_data2[0:][0:][0:])
Generating present_or_copy(p1[0:])
Generating present_or_copy(p2[0:])
Generating present_or_copy(p3[0:])
Generating present_or_copy(p4[0:])
Generating present_or_copy(complexGain[0:])
Generating present_or_copyin(m_vel_ant[0:])
Generating present_or_copyin(Range_vector[0:nr])
Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
249, Loop is parallelizable
Accelerator kernel generated
249, #pragma acc loop gang /* blockIdx.x */
CC 1.3 : 108 registers; 136 shared, 836 constant, 40 local memory bytes
CC 2.0 : 63 registers; 120 shared, 736 constant, 0 local memory bytes
368, #pragma acc loop vector(128) /* threadIdx.x */
272, Loop is parallelizable
356, Loop is parallelizable
368, Loop is parallelizable
470, Loop is parallelizable
570, Loop is parallelizable
644, Loop is parallelizable
651, Loop is parallelizable
768, Loop is parallelizable
769, Loop is parallelizable
774, Loop is parallelizable
785, Loop is parallelizable
790, Loop is parallelizable
966, Loop is parallelizable
1035, Loop is parallelizable
PGC/x86-64 Windows 12.10-0: compilation successful
MINI~1.SD\AppData\Local\Temp\pgcc3brWzbZNHXXTGU.sm

"C:/Program Files/PGI/win64/12.10/bin\pgsmart.EXE" -agg 0x62000020 -o C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc4czWzblMoQTYVE.s C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc3brWzbZNHXXTGU.sm

"C:/Program Files/PGI/win64/12.10/bin\as64.EXE" C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc4czWzblMoQTYVE.s "-IC:\Program Files\MATLAB\R2011b\extern\include/" "-IC:\Program Files\MATLAB\R2011b\simulink\include/" -o C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc5dHWzbJxJm4gUH.obj

"C:/Program Files/PGI/win64/12.10/bin\pgcnv.EXE" C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc5dHWzbJxJm4gUH.obj RainClutter.obj
Unlinking C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc3brWzbZNHXXTGU.sm
Unlinking C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc4czWzblMoQTYVE.s
Unlinking C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc5dHWzbJxJm4gUH.obj
Unlinking directory C:\Users\ADMINI~1.SD\AppData\Local\Temp\pgcc2ajWzbBe552NLl.ext
File with unknown suffix passed to linker: /DLL
File with unknown suffix passed to linker: /export:mexFunction
File with unknown suffix passed to linker: /implib:C:\Users\ADMINI~1.SD\AppData\Local\Temp\mex_BPwzBU\templib.x
File with unknown suffix passed to linker: /MACHINE:X64
File with unknown suffix passed to linker: /LIBPATH:C:\Program Files\MATLAB\R2011b\extern\lib\win64\microsoft;C:\Program Files\PGI\win64\12.10\lib
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Wed May 08, 2013 9:52 am    Post subject:

Hi Jon,
Quote:

Please, note that is a mex function for Matlab and all the environment I built up is properly working to get such Matlab extensions to properly run.
Interesting. I have a background project to write an article on using OpenACC in Matlab, but unfortunately I have gotten sidetracked with other projects, so I haven't had the opportunity to work on it. Glad to see that you are experimenting with it.


I'm not liking the schedule being generated:

Quote:
249, Loop is parallelizable
Accelerator kernel generated
249, #pragma acc loop gang /* blockIdx.x */
CC 1.3 : 108 registers; 136 shared, 836 constant, 40 local memory bytes
CC 2.0 : 63 registers; 120 shared, 736 constant, 0 local memory bytes
368, #pragma acc loop vector(128) /* threadIdx.x */
272, Loop is parallelizable
356, Loop is parallelizable
368, Loop is parallelizable
470, Loop is parallelizable
570, Loop is parallelizable
644, Loop is parallelizable
651, Loop is parallelizable
768, Loop is parallelizable
769, Loop is parallelizable
774, Loop is parallelizable
785, Loop is parallelizable
790, Loop is parallelizable
966, Loop is parallelizable
1035, Loop is parallelizable


It looks to me like you're using the "parallel" construct and only have loop directives around the loops at lines 249 and 368. The rest of the loops are parallelizable, but are being executed sequentially within the "gang".

What I'd like you to try is to change to using the "kernels" construct and remove any loop directives. This will allow the compiler to generate what it thinks is the best schedule. I'm not sure this will fix the problem, but I'm curious what it comes up with.

- Mat
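
The difference between the two constructs can be illustrated with a minimal sketch. This is not the poster's code: the function names `scale_parallel` and `scale_kernels`, the flattened array layout, and the trivial loop body are all assumptions made for illustration. In the first variant only the loops that carry a directive are distributed across gangs and vector lanes; in the second the compiler is left to choose the schedule itself.

```c
#include <assert.h>
#include <stddef.h>

/* Variant 1: "parallel" construct with explicit loop directives.
   Only loops that carry a loop directive are spread over gangs and
   vector lanes; any other loop in the region would run sequentially
   within each gang. (Hypothetical example, not the poster's code.) */
void scale_parallel(float *a, int n, int m, float s)
{
    assert(a != NULL && n > 0 && m > 0);
    #pragma acc parallel vector_length(128) copy(a[0:n*m])
    {
        #pragma acc loop gang
        for (int i = 0; i < n; ++i) {
            #pragma acc loop vector
            for (int j = 0; j < m; ++j)
                a[i * m + j] *= s;
        }
    }
}

/* Variant 2: "kernels" construct with no loop directives.
   The compiler analyses the loop nest and picks its own schedule. */
void scale_kernels(float *a, int n, int m, float s)
{
    assert(a != NULL && n > 0 && m > 0);
    #pragma acc kernels copy(a[0:n*m])
    {
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < m; ++j)
                a[i * m + j] *= s;
    }
}
```

Both functions compute the same result; compiling each with -Minfo=accel shows which schedule the compiler actually generated. A compiler without OpenACC support ignores the pragmas and runs the loops sequentially on the host.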
Jon Lester



Joined: 14 Sep 2010
Posts: 6

Posted: Tue May 14, 2013 7:32 am    Post subject:

Dear mkcolg,

The code now runs almost perfectly. The problem was that the nested loops were not rectangular. This led to strange behavior: the loop bounds were set to zero, so the loops never executed.
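
The thread does not show how the loops were made rectangular. One common approach, sketched here under that assumption, is to give the inner loop fixed bounds and move the dependence on the outer index into a guard. The function names, the lower-triangular row sum, and the flattened array layout are hypothetical and chosen only to keep the example small.

```c
#include <assert.h>
#include <stddef.h>

/* Triangular nest: the inner trip count depends on i, so the
   iteration space is not a rectangle. Host-only reference version. */
void row_sums_triangular(const float *a, float *out, int n)
{
    assert(a != NULL && out != NULL && n > 0);
    for (int i = 0; i < n; ++i) {
        float s = 0.0f;
        for (int j = 0; j <= i; ++j)
            s += a[i * n + j];
        out[i] = s;
    }
}

/* Rectangular rewrite: both loops have fixed bounds, and the
   triangular condition becomes a guard inside the loop body. */
void row_sums_rectangular(const float *a, float *out, int n)
{
    assert(a != NULL && out != NULL && n > 0);
    #pragma acc kernels copyin(a[0:n*n]) copyout(out[0:n])
    {
        for (int i = 0; i < n; ++i) {
            float s = 0.0f;
            for (int j = 0; j < n; ++j) {
                if (j <= i)
                    s += a[i * n + j];
            }
            out[i] = s;
        }
    }
}
```

The rectangular version does some wasted work in the guarded-out iterations, but its loop bounds no longer depend on an enclosing loop index.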

With that fixed, I now have the problem that an array that I initialize to zero before the accelerated region is no longer initialized inside the region. As a result, the summation r[i][j] += k accumulates a lot of garbage and produces Inf as output instead of the correct result. This is the only remaining problem; otherwise the code performs really well, with an exceptional speedup of about two orders of magnitude over plain Matlab code.
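
This page of the thread does not identify the cause of the uninitialized accumulator. One possible workaround, offered only as a sketch, is to stop relying on the host-side initialization and zero the accumulator inside the accelerated region, with explicit array bounds in the data clauses. The function name `accumulate`, the array `k`, and the flattened layout of `r` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Adds every k[t] into each element of r (n*m floats, row-major).
   The running sum starts from zero inside the region, so the result
   does not depend on what r held before the region was entered.
   (Hypothetical example, not the poster's code.) */
void accumulate(float *r, int n, int m, const float *k, int nk)
{
    assert(r != NULL && k != NULL && n > 0 && m > 0 && nk > 0);
    #pragma acc kernels copyout(r[0:n*m]) copyin(k[0:nk])
    {
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < m; ++j) {
                float sum = 0.0f;          /* initialized on the device */
                for (int t = 0; t < nk; ++t)
                    sum += k[t];
                r[i * m + j] = sum;        /* every element is written */
            }
        }
    }
}
```

Because every element of `r` is written inside the region, `copyout` is enough and no host-to-device transfer of `r` is needed. If the host values of `r` must be preserved and added to, a `copy` clause with explicit bounds would be used instead.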

MEX functions built with the PGI compilers work and perform well. It is time for MathWorks to support PGI compilers.

Jon
All times are GMT - 7 Hours
Page 1 of 2