PGI User Forum
out of memory
PGI User Forum Forum Index -> Accelerator Programming
BL_user



Joined: 27 Jan 2011
Posts: 13

Posted: Sun Apr 03, 2011 2:53 pm    Post subject: out of memory

Hello,

I'm using PGI Accelerator with Fortran. Whenever I run the executable, the program consumes system RAM. In some cases it exhausts all available memory and the program stops, sometimes with an 'out of memory' message. I thought the accelerated program was supposed to use GPU memory rather than all of the system RAM. Any idea what is causing this?

Program info: The program works with several matrices running several iterations. Once results are received from the GPU, one of the matrices is updated with a new value and then all the matrices are sent to the GPU for a new calculation. This process is done several times.

Thank you for any help
BL
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Mon Apr 04, 2011 8:58 am

Hi BL,

My best guess is that you have a memory leak somewhere. What I'd do is compile without the Accelerator directives enabled, then run the program under Valgrind (www.valgrind.org). You could also have an uninitialized variable that is being used as the size of an array (allocatable, automatic, or an implicit compiler-generated temporary array). Valgrind can help with this as well.

- Mat
BL_user



Joined: 27 Jan 2011
Posts: 13

Posted: Sat Apr 09, 2011 2:45 pm

Mat,

Thanks for the reply. I do not have a machine to use Valgrind on. Here is a small sample of what my code structure is like:

Code:

PROGRAM test
implicit none

real,allocatable:: A(:),B(:),C(:)     !  arrays

integer i,j,k

!--------------------------------------------------
allocate(A(10*10*10))
allocate(B(10*10*10))
allocate(C(10*10*10))
!--------------------------------------------------

A=0.0; B=1.0; C=2.0
do k=1,5000
    write(*,*) 'step',k
    !$acc region
    do i=1,1000
        do j=1,1000
            A(i)=A(i)+B(j)+C(j)
        enddo
    enddo
    !$acc end region
    A=A/1000.0
enddo
write(*,*) 'press Enter key'
read(*,*)
stop
end


When I compile it, I get the following:

Code:

C:\Desktop>pgfortran test.f90 -ta=nvidia,time -Minfo
test:
     14, Memory zero idiom, loop replaced by call to __c_mzero4
     17, Generating copy(a(1:1000))
         Generating copyin(b(1:1000))
         Generating copyin(c(1:1000))
         Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     18, Loop is parallelizable
     19, Complex loop carried dependence of 'a' prevents parallelization
         Loop carried dependence of 'a' prevents parallelization
         Loop carried backward dependence of 'a' prevents vectorization
         Inner sequential loop scheduled on accelerator
         Accelerator kernel generated
         18, !$acc do parallel, vector(256) ! blockidx%x threadidx%x
             Using register for 'a'
         19, !$acc do seq(256)
             Cached references to size [256] block of 'b'
             Cached references to size [256] block of 'c'
             CC 1.0 : 9 registers; 2100 shared, 12 constant, 0 local memory bytes; 100% occupancy
             CC 1.3 : 9 registers; 2100 shared, 12 constant, 0 local memory bytes; 100% occupancy
             CC 2.0 : 23 registers; 2060 shared, 56 constant, 0 local memory bytes; 83% occupancy


The code I have posted is meaningless, but its structure is not: a loop calls the kernel many times. When I run this on the CPU, memory usage stays constant. On the GPU, however, Task Manager shows the program's memory usage growing steadily. If the number of kernel calls is high enough, the program stops executing. Could the cause of this still be a memory leak? Thank you for the help.

Regards
BL
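
For scale, the data clauses in the -Minfo output above can be turned into byte counts. The following is a minimal sketch in Python, not part of the original posts. It assumes the default `real` is 4 bytes (consistent with the `__c_mzero4` call the compiler reports) and that each pass through the region transfers exactly what the `copy`/`copyin` messages list. All names in it are illustrative.

```python
# Rough transfer sizes for the posted sample, derived from the -Minfo messages.
# Assumption: default REAL is 4 bytes; the names below are illustrative only.
BYTES_PER_REAL = 4
N = 10 * 10 * 10          # elements per array, from the allocate statements
STEPS = 5000              # trip count of the outer k loop

# copy(a) moves A in both directions; copyin(b) and copyin(c) move B and C
# to the GPU only.
to_device = {"a": N, "b": N, "c": N}
from_device = {"a": N}

bytes_in_per_step = sum(to_device.values()) * BYTES_PER_REAL
bytes_out_per_step = sum(from_device.values()) * BYTES_PER_REAL
bytes_in_total = bytes_in_per_step * STEPS

print("host -> device per step:", bytes_in_per_step, "bytes")
print("device -> host per step:", bytes_out_per_step, "bytes")
print("host -> device over all steps:", bytes_in_total, "bytes")
```

Under these assumptions each step moves about 12 KB to the device, so the arrays themselves are far too small to exhaust system RAM. Any steady growth would have to come from something that accumulates across the 5000 region entries, which is consistent with Mat's suggestion to look for a leak rather than at the array sizes.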
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Mon Apr 11, 2011 12:00 pm

Hi BL,

I was unable to recreate your issue. On Windows the code ran without taskmgr showing any additional memory usage. On Linux, Valgrind showed no memory problems. Hence, it is unclear why you are getting this error.

What is the output from the 'pgaccelinfo' command? What compiler version are you using? What version of Windows? Also, please post the exact error you are getting.

Thanks,
Mat
BL_user



Joined: 27 Jan 2011
Posts: 13

Posted: Mon Apr 11, 2011 5:10 pm

Mat,

Thank you for verifying the code in Valgrind and on your machine. The system I use has Windows Server 2008 with SP2. I am using PGI Workstation with Command Shells 11.1. Is this also the compiler version? If not, how can I check it? The command pgaccelinfo returns the following message:
Code:

CUDA Driver Version:           3020

Device Number:                 0
Device Name:                   GeForce GTX 285
Device Revision Number:        1.3
Global Memory Size:            1046151168
Number of Multiprocessors:     30
Number of Cores:               240
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 16384
Registers per Block:           16384
Warp Size:                     32
Maximum Threads per Block:     512
Maximum Block Dimensions:      512, 512, 64
Maximum Grid Dimensions:       65535 x 65535 x 1
Maximum Memory Pitch:          2147483647B
Texture Alignment:             256B
Clock Rate:                    1476 MHz
Current free memory:           1007550464
Upload time (4MB):                7 microseconds (   1 ms pinned)
Download time:                    3 microseconds (   2 ms pinned)
Upload bandwidth:              599186 MB/sec (4194304 MB/sec pinned)
Download bandwidth:            1398101 MB/sec (2097152 MB/sec pinned)

I could not get the error message to show. Usually the execution simply stops before the program can finish. I have also tried running the executable on a GeForce 9600 GT card under Windows XP 64-bit SP2.
Thank you again for your help.

Regards
BL
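
The pgaccelinfo figures above can be cross-checked with a little arithmetic. This Python sketch is not part of the original posts. It assumes the tool derives its "MB/sec" values by dividing the 4 MB transfer size in bytes by the elapsed time in microseconds, which is a guess that happens to reproduce the printed numbers.

```python
# Sanity-check the pgaccelinfo figures quoted above.
# Assumption: "MB/sec" is bytes moved divided by elapsed microseconds.
MIB = 1024 * 1024

global_memory = 1046151168      # "Global Memory Size", in bytes
free_memory = 1007550464        # "Current free memory", in bytes
transfer_bytes = 4 * MIB        # size used for "Upload time (4MB)"

print("global memory: %.1f MiB" % (global_memory / MIB))
print("free memory:   %.1f MiB" % (free_memory / MIB))

# Integer division reproduces the bandwidth values printed by the tool.
upload_bw = transfer_bytes // 7     # upload took 7 microseconds
download_bw = transfer_bytes // 3   # download took 3 microseconds
print("upload bandwidth:  ", upload_bw, "MB/sec")
print("download bandwidth:", download_bw, "MB/sec")
```

If that reading is right, the device has roughly 1 GB of memory with almost all of it free, and the reported bandwidths of about 600 GB/sec and 1400 GB/sec are far beyond what a PCIe link can deliver. The timing lines therefore probably do not reflect real transfers on this system. That is an observation about the tool output, not a diagnosis of the memory growth.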
All times are GMT - 7 Hours
Page 1 of 3