PGI User Forum
Allocate memory in structure of arrays (CUDA Fortran)

 
brute11k



Joined: 22 Apr 2011
Posts: 9

Posted: Thu Dec 08, 2011 3:36 am    Post subject: Allocate memory in structure of arrays (CUDA Fortran)

Hello.

I'm trying to compile and run the following code:

1.cuf :

Code:
module fluid

  type omg
    real, allocatable :: sad(:)
    real, allocatable :: saf
  end type

end module

program broken_derivative_types
  use cudafor
  use cudadevice
  use fluid
  implicit none
  integer :: istat
  type (omg) :: h
  type (omg), device :: d
  real, allocatable, device :: d_z(:)
  real, allocatable :: h_z(:)

print *, "dbg"
  istat = cudaMalloc(d_z, 10)
  allocate(h_z(10))
  d_z = 10.0
  h_z = d_z
  print *, h_z
print *, "dbg2"
  allocate(h%sad(10))
print *, "dbg3"
  istat = cudaMalloc(d%sad, 10)
  allocate(d%sad(10))
print *, "istat: ", istat
  d%sad = 10.0
print *, "dbg4"
  h%sad = d%sad
print *, h%sad
end program broken_derivative_types


Here is the output:

Code:
 dbg
    10.00000        10.00000        10.00000        10.00000     
    10.00000        10.00000        10.00000        10.00000     
    10.00000        10.00000   
 dbg2
 dbg3
0: copyout Memcpy (host=0xbfc5de10, dev=0x809d3c4, size=76) FAILED: 11(invalid argument)


And if I uncomment this line:

Code:
!  istat = cudaMalloc(d%sad, 10)


Then I get the following error from the compiler:

Code:
PGF90-S-0155-Could not resolve generic procedure cudamalloc (1.cuf: 30)
  0 inform,   0 warnings,   1 severes, 0 fatal for broken_derivative_types


The pgfortran version is 11.9:

Code:
pgfortran 11.9-0 32-bit target on x86 Linux -tp core2
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2011, STMicroelectronics, Inc.  All Rights Reserved.
PGF90/x86 Linux 11.9-0
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2011, STMicroelectronics, Inc.  All Rights Reserved.


If possible, I would like to know where I made the mistake, and to see correct code for managing memory with derived types.
Thanks in advance.
brute11k



Joined: 22 Apr 2011
Posts: 9

Posted: Sun Dec 25, 2011 7:45 am

Okay, that's too much.
When I use a structure with these variables in it (here dimen = 1 and prk = 4):

Code:
  type Fluidstemp
    sequence
    real(kind=prk) :: energy, density
    real(kind=prk) :: U(dimen)
    real(kind=prk) :: r(dimen)
  end type


and then add one more variable, like this:

Code:
  type Fluidstemp
    sequence
    real(kind=prk) :: z, energy, density
    real(kind=prk) :: U(dimen)
    real(kind=prk) :: r(dimen)
  end type


I run into a problem that is most simply described by these two plots:

Structure without additional variable: http://i.imgur.com/qVNAC.png
With: http://i.imgur.com/SD2C3.png

Here is full code: http://pastebin.com/pYzvcZyy

pgf90 -V:

Code:
pgf90 11.9-0 32-bit target on x86 Linux -tp core2
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2011, STMicroelectronics, Inc.  All Rights Reserved.


Compiling with:
pgf90 -rc=rc4.0 -m32 coursework_2011_cuda_tesla_1dv2.cuf csort.o -o b.out

rc4.0:

Code:
set CUDAROOT=/opt/cuda;
set CUDAVERSION=4.0;


csort.o was obtained by executing:

nvcc -m32 -c -arch sm_13 csort.cu

csort.cu: http://pastebin.com/Xcb5RRT8

I think that's a fairly serious bug.
Thanks in advance.
mkcolg



Joined: 30 Jun 2004
Posts: 6214
Location: The Portland Group Inc.

Posted: Tue Dec 27, 2011 12:50 pm

Hi Brute11k,

In your first example, this is a known limitation of CUDA Fortran. The size of a device type must be known upon allocation. Hence, user-defined types used as device types can only be composed of basic types or fixed-size arrays.
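As a rough illustration of that restriction, here is a minimal sketch of the first example rewritten with a fixed-size array, so the size of the type is known at compile time. The names fluid_fixed, omg_fixed and n are made up for this sketch, and it assumes the compiler accepts whole-structure assignment between a host and a device variable of the same type; this has not been checked against 11.9.

```fortran
module fluid_fixed
  implicit none
  integer, parameter :: n = 10     ! hypothetical compile-time array size

  type omg_fixed
    real :: sad(n)                 ! fixed-size array instead of allocatable
    real :: saf                    ! plain scalar instead of allocatable
  end type

end module

program fixed_size_device_type
  use cudafor
  use fluid_fixed
  implicit none
  type (omg_fixed) :: h            ! host copy
  type (omg_fixed), device :: d    ! device copy, size known at compile time

  h%sad = 10.0
  h%saf = 1.0
  d = h                            ! host-to-device copy of the whole structure
  h%sad = 0.0                      ! clear the host copy to make the round trip visible
  h = d                            ! device-to-host copy of the whole structure
  print *, h%sad
end program fixed_size_device_type
```

No cudaMalloc or allocate call is needed here, because nothing inside the type is allocatable.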

As for the second issue, I was able to compile and run your program. However, I'm not sure how to determine if I'm getting wrong answers. The program outputs to ~9000 text files but I'm not sure what to do with them. Can I compare a particular file or do I need to plot them?

I did compare the two resulting CUDA C sources (-Mcuda=keepgpu), with and without "z", but don't see anything obvious.

Finally, what are the contents of your "rc4.0" file?

Thanks,
Mat
brute11k



Joined: 22 Apr 2011
Posts: 9

Posted: Tue Dec 27, 2011 3:57 pm

mkcolg wrote:

In your first example, this is a known limitation of CUDA Fortran. The size of device types must be known upon allocation.


Okay, that's clear to me now. Still, is it possible to lift this limitation somehow, or to work around it (other than by using only basic types or hard-coding array bounds), in the near future?
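One possible way around it, sketched under assumptions: keep the structure itself on the host and put the device attribute on its allocatable members, so only the arrays live in device memory. The names fluid_soa, omg_dev and h_sad are made up for this sketch, and it assumes a compiler release that accepts the device attribute on allocatable components; whether 11.9 does has not been checked.

```fortran
module fluid_soa
  implicit none

  type omg_dev
    ! The member lives in device memory; the structure itself stays on the host.
    real, allocatable, device :: sad(:)
  end type

end module

program device_members_sketch
  use cudafor
  use fluid_soa
  implicit none
  type (omg_dev) :: d              ! note: no device attribute on the variable
  real, allocatable :: h_sad(:)    ! ordinary host array for the copy back

  allocate(h_sad(10))
  allocate(d%sad(10))              ! allocates device memory for the member
  d%sad = 10.0                     ! fill the device array from the host
  h_sad = d%sad                    ! device-to-host copy by assignment
  print *, h_sad
  deallocate(d%sad)
  deallocate(h_sad)
end program device_members_sketch
```

In this form a kernel would receive the member array (d%sad) as its argument rather than the whole structure.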

mkcolg wrote:

The program outputs to ~9000 text files but I'm not sure what to do with them. Can I compare a particular file or do I need to plot them?


Oh, that's my bad. I'm really sorry about so much output.
Generally, you need gnuplot or any program that understands two-column plot data (just "x" and "y" values). Then you need to plot all the values (p_, V_, Rho_ with the same number) in one graph. I've written a small .sh script that will help you generate the output easily, if you have a Linux environment with gnuplot installed: http://pastebin.com/aBWwDjw4
Just place it in the directory where your output is generated and run it. It should generate 3000 *.png files (that should be enough to see the evolution). Here is the crucial line that produces the correct x-axis limits:

Code:
xrangemx=`cat coursework_2011_cuda_tesla_1dv2.cuf | grep inparams%dx | awk '{print $3}'`*`cat coursework_2011_cuda_tesla_1dv2.cuf | grep 'integer, parameter :: grid' | awk '{print $6}'`;


I hope you are familiar with bash scripting; if not, just replace "coursework_2011_cuda_tesla_1dv2.cuf" here with the name of your program file.

Then you can copy, for example, "plot0140.png" to another location, change the Fluidstemp structure as described in my previous post, run the tests again, generate the graphs, and compare the new "plot0140.png" with the old one. They should be exactly the same (because, in fact, we didn't change anything except the structure's memory alignment).
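To make the layout difference concrete, here is a small host-only sketch that prints the size of both variants of the structure. The type names Fluidstemp_a and Fluidstemp_b are made up so that both versions can sit in one program, and it assumes a compiler that provides the Fortran 2008 storage_size intrinsic (11.9 may not). It only shows the sizes and says nothing about where the wrong values come from.

```fortran
program fluidstemp_layout
  implicit none
  integer, parameter :: dimen = 1, prk = 4

  ! Variant without the extra variable
  type Fluidstemp_a
    sequence
    real(kind=prk) :: energy, density
    real(kind=prk) :: U(dimen)
    real(kind=prk) :: r(dimen)
  end type

  ! Variant with the extra variable "z"
  type Fluidstemp_b
    sequence
    real(kind=prk) :: z, energy, density
    real(kind=prk) :: U(dimen)
    real(kind=prk) :: r(dimen)
  end type

  type (Fluidstemp_a) :: a
  type (Fluidstemp_b) :: b

  ! storage_size returns the size in bits; divide by 8 to get bytes
  print *, "without z (bytes): ", storage_size(a) / 8
  print *, "with z (bytes):    ", storage_size(b) / 8
end program fluidstemp_layout
```

With 4-byte reals and no padding, the first variant holds four reals and the second holds five, so the two sizes should differ by one real.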

mkcolg wrote:
I did compare the two resulting CUDA C sources (-Mcuda=keepgpu), with and without "z", but don't see anything obvious.

Well, I can tell you the error happens in the global subroutine called 'cuda_calculate_flux_tvd'.
This really bugs me. I can't tell why the variable Fl%density (or Fr%density) doesn't keep its value (and just contains garbage in the output) in this case (change the corresponding lines to get the buggy output):

571: Fr%density = 0.0
590: outfluxes(dims)%density(idx) = Fr%density ! for example
! dims = 1 anyway, you can just change it to outfluxes(1)

And... the output in outfluxes(1)%density(:) will NOT be 0.0.
The debug code is in lines 1046 -- 1049.

mkcolg wrote:

Finally what is the contents of your "rc4.0" file?

If you read my previous post you can see it:

Code:
set CUDAROOT=/opt/cuda;
set CUDAVERSION=4.0;


mkcolg wrote:

Thanks,
Mat


Thank you too! I'm really glad you've answered me.
mkcolg



Joined: 30 Jun 2004
Posts: 6214
Location: The Portland Group Inc.

Posted: Tue Jan 03, 2012 12:10 pm

Hi Brute11k,

FYI, now that I'm back in the office I'll have more time to figure this one out. For some reason, the "non-z" version of the program keeps hanging for me around iteration 5512254. Not sure why.

I'll keep you posted as I investigate. I think my next step is to remove thrust and put in my own sort routines. This will allow me to run the program in emulation mode and the debugger.

- Mat