PGI User Forum


Run-time error for multi-gpu programming with openmp (pgfort
 
Praveen B K



Joined: 04 Jan 2012
Posts: 6

Posted: Wed May 15, 2013 7:27 am    Post subject: Run-time error for multi-gpu programming with openmp (pgfort

Hi All,
We are using the pgfortran compiler, version 12.9 (64-bit), on a Windows system.

We spawn 4 OpenMP threads, and each thread is supposed to run on its own GPU. This OpenMP parallel region is executed once per step, for NSTEPS steps.

With a larger NSTEPS value (NSTEPS = 100000), the program is more likely to fail at run time.

With a smaller NSTEPS value (NSTEPS = 100), the program is less likely to fail, and most of the time it finishes successfully.

The compiled executable fails with varying run-time errors, such as:
0: DEV_MKDESC: copyin Memcpy FAILED:11(invalid argument)
0: DEV_MKDESC: allocate FAILED:30(unknown error)
0: DEV_MKDESC: allocate FAILED:30(unknown error)
0: ALLOCATE: 4000 bytes requested; status = 30(unknown error)
0: DEV_MKDESC: copyin Memcpy FAILED:11(invalid argument)

Please find the code below.
We used the following script to compile and run it:
set OMP_NUM_THREADS=4
pgfortran -Mcuda -mp test.CUF
test.exe

Thanks and regards,
Praveen

Code:


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!11
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!11  KERNEL
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!11

module m_Kernels
use cudafor
   real, device, allocatable :: VAR_1_D(:)
       
contains

ATTRIBUTES(GLOBAL) SUBROUTINE ProcessArray_Kernel_1(VAR_1_D)
        real , device :: VAR_1_D(:)
       
END SUBROUTINE ProcessArray_Kernel_1

end module m_Kernels

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!11
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!11  main
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!11

program main
        use omp_lib
        use cudafor
        use m_Kernels
       
        integer :: threadsPerBlock, numOfThreads,numOfBlocks, ierr, array_size, num_of_omp_threads, NSTEPS, omp_thread_id
       
       
        real , allocatable :: VAR_1(:)
        !$OMP THREADPRIVATE(VAR_1)
       
        ! openmp shared variables
        num_of_omp_threads = 4
        NSTEPS = 100000
        array_size=1000
       
       
        ! setting number of threads
        CALL omp_set_num_threads(num_of_omp_threads)


        !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        !running omp parallel region NSTEPS times
        DO NSTEP = 1, NSTEPS
      !omp parallel region that will be called NSTEPS times
      !$OMP Parallel Do &
      !$OMP SHARED( num_of_omp_threads, NSTEPS,array_size) &
      !$OMP PRIVATE( VAR_1_D, VAR_2_D, threadsPerBlock, numOfThreads,numOfBlocks)
      DO omp_thread_id = 1, num_of_omp_threads
      
                   !setting device
                   ierr =  CUDASETDEVICE((omp_thread_id-1))
                   if ( ierr /= cudaSuccess ) then
                   write (* ,*) cudaGetErrorString ( ierr )
                   else
                   write (* ,*) 'device was set to:', (omp_thread_id-1), 'nstep is:',nstep
                   end if          
                   
                   !allocating arrays
                   allocate(VAR_1_D(array_size))
                   allocate(VAR_1(array_size))
                   
                   !CPU to GPU copy
                   VAR_1=1.0
                   VAR_1_D = VAR_1
                   
                   !Kernel Call
                   threadsPerBlock = 512
                   numOfThreads = array_size
                   numOfBlocks = CEILING(real(numOfThreads) / threadsPerBlock)
                   call ProcessArray_Kernel_1<<<numOfBlocks, threadsPerBlock>>>(VAR_1_D)
                   ierr=cudaThreadSynchronize()
                   if ( ierr /= cudaSuccess ) write (* ,*) cudaGetErrorString ( ierr )
                 
                   !deallocating arrays
                   deallocate(VAR_1_D)
                   deallocate(VAR_1)
                   
      END DO
      !$OMP END Parallel Do         
        END DO
       
end program

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!11
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!11
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!11
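Since the reported failures come from device allocation and copy-in, one way to narrow down which thread and device fails is to check every status explicitly. The following is a minimal sketch under stated assumptions, not a fix for the underlying problem: the names `alloc_check`, `try_alloc`, `a_d` and `n` are hypothetical and do not appear in the original code, and it assumes that `stat=` on a device `allocate` reports a failure the same way it does for host arrays. The device array is a local variable of the subroutine, so each calling thread works on its own copy without needing a PRIVATE clause.

```fortran
! Sketch only: report which device fails to allocate or copy.
! alloc_check, try_alloc, a_d and n are hypothetical names.
module alloc_check
   use cudafor
   implicit none
contains
   subroutine try_alloc(dev, n)
      integer, intent(in) :: dev, n
      ! Local (non-SAVE) device array: one instance per calling thread.
      real, device, allocatable :: a_d(:)
      integer :: ierr, istat

      ! Bind the calling host thread to the requested GPU.
      ierr = cudaSetDevice(dev)
      if (ierr /= cudaSuccess) then
         write (*,*) 'device', dev, ': ', cudaGetErrorString(ierr)
         return
      end if

      ! Assumption: stat= returns non-zero if the device allocation fails.
      allocate (a_d(n), stat=istat)
      if (istat /= 0) then
         write (*,*) 'device', dev, ': allocate failed, stat =', istat
         return
      end if

      ! Fill the device array from the host, then query the last CUDA error.
      a_d = 1.0
      ierr = cudaGetLastError()
      if (ierr /= cudaSuccess) then
         write (*,*) 'device', dev, ': ', cudaGetErrorString(ierr)
      end if

      deallocate (a_d)
   end subroutine try_alloc
end module alloc_check

program main
   use omp_lib
   use cudafor
   use alloc_check
   implicit none
   integer :: ndev, ierr

   ! Use one OpenMP thread per visible GPU instead of a hard-coded 4.
   ierr = cudaGetDeviceCount(ndev)
   if (ierr /= cudaSuccess .or. ndev < 1) stop 'no CUDA device found'

   !$omp parallel num_threads(ndev)
   call try_alloc(omp_get_thread_num(), 1000)
   !$omp end parallel
end program main
```

This would be compiled the same way as the original (`pgfortran -Mcuda -mp`). Wrapping the parallel region in the NSTEPS loop from the original code would show whether a particular device, or a particular step count, is associated with the first failure.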
mkcolg



Joined: 30 Jun 2004
Posts: 6122
Location: The Portland Group Inc.

Posted: Wed May 15, 2013 3:20 pm

Hi Praveen,

I was able to recreate the errors here but unfortunately wasn't able to determine the cause. I'll need to pass this on to a compiler engineer, since whatever is wrong seems to occur in the run-time libraries, either PGI's or NVIDIA's.

I filed this as TPR#19033.

Thanks,
Mat
Praveen B K



Joined: 04 Jan 2012
Posts: 6

Posted: Wed May 15, 2013 10:27 pm

Thanks Mat
Praveen B K



Joined: 04 Jan 2012
Posts: 6

Posted: Wed May 15, 2013 10:52 pm

Dear Mat,
Where can I check the bug you filed with ID TPR#19033?
Sorry, I don't know where to look for it.
mkcolg



Joined: 30 Jun 2004
Posts: 6122
Location: The Portland Group Inc.

Posted: Thu May 16, 2013 10:12 am

Quote:
Where can I check the bug you filed with ID TPR#19033?
We don't have an external view into our issue tracker (pgroup.com is on its own network, which can't access our internal network), so for updates, please either post on the User Forum or send a note to customer service. Customer service typically contacts end users once issues have been resolved, and any fixed TPRs appear in the release notes.

Best Regards,
Mat
All times are GMT - 7 Hours
Page 1 of 3

 