PGI User Forum

reshape call with LONG execution times

swsides



Joined: 07 Mar 2005
Posts: 2

Posted: Mon Mar 07, 2005 5:54 pm    Post subject: reshape call with LONG execution times

My code makes lots of calls to the reshape() intrinsic
in F90. The Portland compilers seem to be taking
FAR longer to execute these calls than the Intel compilers
at Sandia. Are there special optimizations I should be
using to get these intrinsic calls to run more quickly?

Scott Sides
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Tue Mar 08, 2005 12:07 pm

Hi Scott,

One possible cause is excessive page swapping. Experimenting with "-Mcache_align", "-Mprefetch", and/or "-Mvect=sse,prefetch" might help. If you're on a 64-bit system, try compiling for 32 bits (unless you really need 64 bits) using "-tp k8-32". Does the same issue show up on other systems?

This is just a guess; without seeing the code it's very difficult to determine the cause. If you're able to send a test case to trs@pgroup.com, that would help a lot, but of course I understand if you're unable to. What optimizations are you using now? What type of system are you using? What version of the compilers are you using? I'll try to create my own test case, but it may take me a few days to free up some time.

Thanks,
Mat
swsides



Joined: 07 Mar 2005
Posts: 2

Posted: Thu Mar 10, 2005 2:53 pm    Post subject: long execution for reshape

Thanks for the reply.

I've timed these calls on another machine with the
Lahey-Fujitsu compiler. The execution times are longer
than at Sandia but far shorter than the times taken
with the Portland compiler at UCSB.

Also, I tried -Mcache_align and there was no change.
The other compiler options you mentioned wouldn't
even compile.

I tried running very small systems and there was no change
in the fraction of time taken by the reshape calls, so I don't
*think* it could be some sort of cache size issue.

Any other ideas?

Thanks again for the help,
Scott
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Thu Mar 10, 2005 6:04 pm

Hi Scott,


I created the following toy program that uses the RESHAPE intrinsic. I ran it on both an AMD Opteron and an Intel Xeon with EM64T, using PGI 5.2-4 and ifort Version 8.1 Build 20050203. I also varied the number of iterations and the number of elements. In all cases that I tested, PGI outperformed Intel. Granted, this code is not a good measure of performance and may not even be relevant to your code, so please let me know how I can tweak it to best reproduce the performance difference you're seeing.

Is it possible that the performance disparity can be attributed to some other factor? Try profiling your application by compiling with your flags plus "-Mprof=lines" and then view the results with PGPROF.

Thanks,
Mat

r.f90:
Code:
      program test_reshape

      integer, parameter :: n=1000
      integer :: i = 0
      real(kind(1d0)), dimension(n,n) :: A, rA(n*n)

      call random_seed()
     
      do i=1,1000
         call random_number(rA)
         A = reshape(rA,(/n,n/))
      end do
      end
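
Note that the toy program above also spends time in random_number on every iteration, so its total runtime is not a measure of RESHAPE alone. As a rough sketch of how the RESHAPE cost could be isolated, the variant below times the intrinsic against an explicit loop copy using system_clock. The program name, the array size n, the iteration count iters, and the chk accumulator are illustrative assumptions, not values from the runs in this thread.

```fortran
      program time_reshape
      ! Sketch only: compares RESHAPE with an equivalent explicit loop copy.
      ! n and iters are assumed values chosen for a quick run.
      implicit none
      integer, parameter :: dp = kind(1d0)
      integer, parameter :: n = 500, iters = 20
      real(dp) :: A(n,n), B(n,n), rA(n*n)
      real(dp) :: chk
      integer :: it, i, j
      integer :: c0, c1, c2, rate

      call random_seed()
      call random_number(rA)      ! fill the source once, outside the timed loops
      chk = 0

      call system_clock(c0, rate)
      do it = 1, iters
         A = reshape(rA, (/n,n/))
         ! read one element so the copy cannot be discarded as dead code
         chk = chk + A(1 + mod(it,n), n)
      end do
      call system_clock(c1)

      do it = 1, iters
         ! same column-major element order that RESHAPE uses
         do j = 1, n
            do i = 1, n
               B(i,j) = rA(i + (j-1)*n)
            end do
         end do
         chk = chk + B(1 + mod(it,n), n)
      end do
      call system_clock(c2)

      print *, 'reshape   (s):', real(c1-c0,dp)/rate
      print *, 'loop copy (s):', real(c2-c1,dp)/rate
      print *, 'results match:', all(A == B), chk
      end program time_reshape
```

If the two timings are close, the slowdown in the real application probably comes from something other than the intrinsic itself; if RESHAPE is much slower, the sketch could serve as a starting point for a reproducible test case.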


Runtimes from an AMD Opteron
Code:
% pgf90 -V -fastsse ~/tmp/r.f90 -o pr.out

pgf90 5.2-4
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2004, STMicroelectronics, Inc.  All Rights Reserved.
PGF90/any Linux/x86-64 5.2-4
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2004, STMicroelectronics, Inc.  All Rights Reserved.
PGF90/x86 Linux/x86-64 5.2-4
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2004, STMicroelectronics, Inc.  All Rights Reserved.
arroyo:/tmp/qa% time pr.out
22.063u 0.222s 0:22.28 100.0%   0+0k 0+0io 151pf+0w

% ifort -V -O3 -xW ~/tmp/r.f90 -o ir.out
Intel(R) Fortran Compiler for Intel(R) EM64T-based applications, Version 8.1    Build 20050203 Package ID: l_fce_pc_8.1.025
Copyright (C) 1985-2005 Intel Corporation.  All rights reserved.

~/tmp/r.f90(11) : (col. 9) remark: LOOP WAS VECTORIZED.
GNU ld version 2.14.90.0.5 20030722 (SuSE Linux)
  Supported emulations:
   elf_x86_64
   elf_i386
   i386linux
% time ir.out
27.124u 0.227s 0:27.34 100.0%   0+0k 0+0io 285pf+0w


Runs from Intel Xeon with EM64T:
Code:
% pgf90 -V -fastsse ~/tmp/r.f90 -o pr.out

pgf90 5.2-4
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2004, STMicroelectronics, Inc.  All Rights Reserved.
PGF90/any Linux/x86-64 5.2-4
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2004, STMicroelectronics, Inc.  All Rights Reserved.
PGF90/x86 Linux/x86-64 5.2-4
Copyright 1989-2000, The Portland Group, Inc.  All Rights Reserved.
Copyright 2000-2004, STMicroelectronics, Inc.  All Rights Reserved.
% time pr.out
29.843u 0.017s 0:29.85 100.0%   0+0k 0+0io 0pf+0w

% ifort -V -fast ~/tmp/r.f90 -o ir.out
Intel(R) Fortran Compiler for Intel(R) EM64T-based applications, Version 8.1    Build 20050203 Package ID: l_fce_pc_8.1.025
Copyright (C) 1985-2005 Intel Corporation.  All rights reserved.

IPO: performing single-file optimizations
IPO: generating object file ipo_ifortnqYca.o
~/tmp/r.f90(11) : (col. 9) remark: LOOP WAS VECTORIZED.
GNU ld version 2.15.90.0.1.1 20040303 (SuSE Linux)
  Supported emulations:
   elf_x86_64
   elf_i386
   i386linux
/opt/intel_fce_81/lib/libifcore.a(for_open_proc.o)(.text+0x3d86): In function `for__compute_filename':
: warning: Using 'getpwnam' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/opt/intel_fce_81/lib/libifcore.a(for_open_proc.o)(.text+0x3e9d): In function `for__compute_filename':
: warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
% time ir.out
44.591u 0.016s 0:44.60 100.0%   0+0k 0+0io 0pf+0w
All times are GMT - 7 Hours