doubt about attributes(global/device)

OpenACC and CUDA Fortran
Henrique Rennó
Posts: 10
Joined: Nov 21 2019

Post by Henrique Rennó » Wed Dec 04, 2019 12:05 pm

Hello,

I'm working with a scientific model that contains 3 routines that I want to speed up with CUDA Fortran. I wrote a first version that only uses global kernels, one for each routine, and it is working. However, I wrote another version where each routine is a device subroutine called from a global kernel, but the final results of the model are wrong.

Below is a simplified scheme of both versions, where mainsub() is called from the main program, all subroutines are inside modules, and each uses a module that contains all device variables. The routines must be called sequentially because sub2 depends on sub1, sub3 on sub2, and sub1 on sub3 (in the first call, sub1 uses initial values). I added syncthreads() because I think global kernels have implicit synchronization, whereas device subroutines don't.

Working version:

subroutine mainsub()
    call modelsub()
end subroutine

subroutine modelsub()
    do while(condition)
        call modelsub1<<<blocks,threads>>>()
        call modelsub2<<<blocks,threads>>>()
        call modelsub3<<<blocks,threads>>>()
    enddo
end subroutine

attributes(global) subroutine modelsub1()
end subroutine

attributes(global) subroutine modelsub2()
end subroutine

attributes(global) subroutine modelsub3()
end subroutine

Not working version:

subroutine mainsub()
    call modelsub<<<blocks,threads>>>()
end subroutine

attributes(global) subroutine modelsub()
    do while(condition)
        call modelsub1()
        call syncthreads()
        call modelsub2()
        call syncthreads()
        call modelsub3()
        call syncthreads()
    enddo
end subroutine

attributes(device) subroutine modelsub1()
end subroutine

attributes(device) subroutine modelsub2()
end subroutine

attributes(device) subroutine modelsub3()
end subroutine

What could cause the second version not to work like the first version?

Thanks
Last edited by Henrique Rennó on Wed Dec 04, 2019 2:18 pm, edited 1 time in total.
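
One point worth checking, offered as a possible explanation rather than a confirmed diagnosis: syncthreads() is a barrier only for the threads of a single thread block, while consecutive kernel launches on the same stream run in order, so the working version effectively gets a grid-wide barrier between routines that the fused version does not. The sketch below illustrates the multi-kernel pattern; phase1, phase2, a_d, b_d and the sizes are hypothetical names chosen for the illustration, not taken from the model.

```fortran
module phases
    use cudafor
    implicit none
contains
    ! Phase 1: every thread writes its own element of a.
    attributes(global) subroutine phase1(a, n)
        integer, value :: n
        real :: a(n)
        integer :: i
        i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
        if (i <= n) a(i) = real(i)
    end subroutine phase1

    ! Phase 2: every thread reads an element that was usually written
    ! by a thread in a different block.
    attributes(global) subroutine phase2(a, b, n)
        integer, value :: n
        real :: a(n), b(n)
        integer :: i
        i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
        if (i <= n) b(i) = a(n - i + 1)
    end subroutine phase2
end module phases

program sync_scope
    use cudafor
    use phases
    implicit none
    integer, parameter :: n = 4096, threads = 256
    integer, parameter :: blocks = (n + threads - 1) / threads
    real, device :: a_d(n), b_d(n)
    real :: b(n)
    integer :: i, istat

    ! Two launches on the default stream: phase2 starts only after
    ! every block of phase1 has finished.
    call phase1<<<blocks, threads>>>(a_d, n)
    call phase2<<<blocks, threads>>>(a_d, b_d, n)
    istat = cudaDeviceSynchronize()

    b = b_d
    if (all(b == [(real(n - i + 1), i = 1, n)])) then
        print *, 'phase 2 saw all phase 1 writes'
    else
        print *, 'mismatch'
    end if
end program sync_scope
```

If the two phases were instead fused into one kernel with call syncthreads() between them, the barrier would order only threads of the same block, so the read of a(n - i + 1) could race with a write from another block whenever more than one block is launched.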

aglobus
Posts: 171
Joined: Jun 11 2018

Re: doubt about attributes(global/device)

Post by aglobus » Wed Dec 04, 2019 1:26 pm

Hi Henrique,

My knowledge of CUF is a little limited, but modelsub is not a kernel (it is missing the attributes statement), yet you are calling it like one with the chevron syntax.

subroutine mainsub()
    call modelsub<<<blocks,threads>>>()
end subroutine

subroutine modelsub()
...

"The device attribute, specified on the subroutine or function statement, declares that the subprogram is to be executed on the device; such a routine must be called from a subprogram with the global or device attribute."
- https://www.pgroup.com/resources/docs/1 ... ttr-device

Henrique Rennó
Posts: 10
Joined: Nov 21 2019

Re: doubt about attributes(global/device)

Post by Henrique Rennó » Wed Dec 04, 2019 2:25 pm

Sorry, I forgot to add that after copying and pasting. Just edited the code in my message.

I'm still debugging the code trying to find the problem, and one weird thing that occurs while debugging is that depending on how many variables are printed inside the kernels, the results that are printed change, as if the print command affects the computations.

aglobus
Posts: 171
Joined: Jun 11 2018

Re: doubt about attributes(global/device)

Post by aglobus » Wed Dec 04, 2019 2:57 pm

Hi Henrique,

Looking at it again, the callers need an explicit interface for those subroutines. Putting them into a module and bringing it in with `use` will work, since module procedures get an explicit interface automatically.

module m
contains
    attributes(device) subroutine modelsub1()
    ...
    end subroutine
    ...
end module

...
attributes(global) subroutine modelsub()
    use m
    ...
end subroutine
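
For reference, the module layout described above could look like the following complete program. This is a minimal sketch under assumptions: model_mod, field_d, the index argument i, and the launch configuration are hypothetical and stand in for the real model's data and routines.

```fortran
module model_mod
    use cudafor
    implicit none
    ! Device-resident data visible to every routine in the module
    ! (hypothetical stand-in for the model's device variables).
    real, device, allocatable :: field_d(:)
contains
    ! Device subroutine: callable only from global or device code.
    attributes(device) subroutine modelsub1(i)
        integer, value :: i
        field_d(i) = field_d(i) + 1.0
    end subroutine modelsub1

    ! Kernel: launched from the host with the chevron syntax.
    attributes(global) subroutine modelsub(n)
        integer, value :: n
        integer :: i
        i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
        if (i <= n) call modelsub1(i)
    end subroutine modelsub
end module model_mod

program main
    use cudafor
    use model_mod
    implicit none
    integer, parameter :: n = 1024
    real :: field(n)
    integer :: istat

    allocate(field_d(n))
    field_d = 0.0
    call modelsub<<<(n + 255) / 256, 256>>>(n)
    istat = cudaDeviceSynchronize()

    ! Copy back and inspect; each element was incremented once.
    field = field_d
    print *, 'min =', minval(field), ' max =', maxval(field)
end program main
```

Because modelsub1 is a module procedure, the kernel sees its explicit interface without any separate interface block; a caller in another program unit would get the same through `use model_mod`.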

Henrique Rennó
Posts: 10
Joined: Nov 21 2019

Re: doubt about attributes(global/device)

Post by Henrique Rennó » Wed Dec 04, 2019 5:24 pm

I didn't write the modules here to save space, but all routines are inside modules, and the callers have a "use" for each module that contains a routine to be called. I'll try changing the code to keep all routines in only one module instead of each routine inside its own module.
