adityak
Joined: 19 Mar 2007 Posts: 10
|
Posted: Mon Mar 19, 2007 12:14 pm Post subject: Shared memory error using the mpirun |
|
|
In the sample code that I am trying to run with mpirun, mpi_init fails with the message "error attaching to the shared memory object as a slave; Permission denied".
Is this because I killed the hanging processes on the cluster nodes? Does killing them not release the shared memory?
Why is it that ipcs does not show any shared memory segments?
Is there a way to release the shared memory and clean up so I can run the mpirun command?
Thanks
Aditya |
|
mkcolg
Joined: 30 Jun 2004 Posts: 5001 Location: The Portland Group Inc.
|
Posted: Mon Mar 19, 2007 3:14 pm Post subject: |
|
|
Hi Aditya,
| Quote: | Is this because I killed the hanging processes on the cluster nodes. Does this not clear the shared memory usage? |
Killing the process does not clear the shared memory.
| Quote: | Why is it that ipcs does not show any shared memory segments? |
The ipcs command should show all shared memory used.
| Quote: | Is there a way to release the shared memory and clean up so I can run the mpirun command? |
The ipcrm command will remove the shared memory segment.
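For illustration only, here is a minimal Python sketch of the ipcs/ipcrm cleanup described above. It is not part of the original reply. It assumes the column layout printed by the Linux util-linux `ipcs -m` (key, shmid, owner, perms, bytes, nattch), the sample output is made up, and the function names are hypothetical. Treating "owned by you and nattch is 0" as the test for a leftover segment is also an assumption, so review the list before removing anything. System V segments are local to each machine, so on a cluster this would have to be run on every node.

```python
import subprocess

# Illustrative `ipcs -m` output in the util-linux layout; the values are made up.
SAMPLE_IPCS_OUTPUT = """
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 65537      adityak    600        4194304    0
0x0052e2c1 32768      postgres   600        56         6
0x00000000 98306      adityak    600        4194304    0
"""


def stale_segment_ids(ipcs_output, owner):
    """Return shmids owned by `owner` that have no attached processes."""
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows start with a hexadecimal key; headers and blank lines do not.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        shmid, seg_owner, nattch = fields[1], fields[2], fields[5]
        if seg_owner == owner and nattch == "0":
            ids.append(int(shmid))
    return ids


def ipcrm_commands(shmids):
    """Build one `ipcrm -m <shmid>` command line per segment id."""
    return [f"ipcrm -m {shmid}" for shmid in shmids]


def commands_for_this_node(owner):
    """Run `ipcs -m` on the local machine and return the suggested commands."""
    result = subprocess.run(
        ["ipcs", "-m"], capture_output=True, text=True, check=True
    )
    return ipcrm_commands(stale_segment_ids(result.stdout, owner))
```

Calling `commands_for_this_node("adityak")` on a node only prints nothing and removes nothing by itself; it returns the command lines so they can be checked before being run by hand.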
As for the initial error, we're not sure. Which MPI version are you using and how do you have it configured? Which sample code exhibits the behavior and what is the command you use to run it? What OS are you using? Which version of the PGI compilers are you using?
- Mat |
|
adityak
Joined: 19 Mar 2007 Posts: 10
|
Posted: Tue Mar 20, 2007 8:36 am Post subject: |
|
|
Hi Mat,
Thanks for the reply!
I am using the PGI compilers, version 6.2-4 for Linux/x86-64, on a 64-bit Linux machine. The error occurred when I ran the mpihello code with mpirun, i.e. mpirun -np 4 mpihello. I am trying to run this on a cluster.
This is probably why the fault occurred:
I initially had some processes hanging on the node (since I was not able to access the nodes due to password settings). So I explicitly killed the hanging processes on the nodes. As you said this did not clear the shared memory.
Since then I have been getting the shared memory error. It is surprising that I keep getting this error even though ipcs does not show any shared memory segments. Do you have any idea why this might be, and how I can clear the shared memory? |
|
mkcolg
Joined: 30 Jun 2004 Posts: 5001 Location: The Portland Group Inc.
|
Posted: Tue Mar 20, 2007 12:26 pm Post subject: |
|
|
Hi Aditya,
While I don't know specifically what's wrong, it seems that you have a fundamental problem with your MPI installation, your cluster, or your OS. Which MPI do you have and how was it built?
Note that we do have a pre-built MPICH-1 library available for download if you want to try it: http://www.pgroup.com/support/downloads.php?release=620
- Mat |
|
adityak
Joined: 19 Mar 2007 Posts: 10
|
Posted: Tue Mar 20, 2007 2:41 pm Post subject: |
|
|
Hi Mat,
The MPI I have came with the PGI CDK software that my institution bought from The Portland Group. The package contains MPI version 1.2.7.
Actually, I did not understand why you think the installation might be wrong. Isn't this something that was supposed to happen when I killed the processes? The only inconsistent part is that ipcs shows no shared memory segments. Is that why you said that the installation might not be right? |
|