9.26.9 Troubleshooting supplementary groups access
If a user receives unexpected access denied errors when using supplementary groups, use the following
procedures and information to troubleshoot the problem:
• Ensure that ssh access is set up correctly (refer to the Configuring supplementary groups section in
Chapter 9 of the HP StorageWorks Scalable File Share System Installation and Upgrade Guide for
more information).
• Verify that the MDS server is accessing the group servers regularly (through the ssh utility).
If it is, sshd messages appear in the /var/log/messages file on the group servers. You may need
to wait for the cache timeout period to expire before a successful ssh access is reported.
• Enter the hpls_getgroups command on the group servers and verify that it returns the correct
group information for the UID of the specified user.
• Examine the event log during the time that the denied user attempts to access the file.
• Force the upcall mechanism to start, and then examine the event log again. Because the group
information is cached, you must wait until the cache timeout period between access attempts
has expired before you force the upcall mechanism to start.
To force the upcall mechanism to start, and to examine the event log, enter the commands shown in
the following example, where south-mds1 is the MDS service, and 538 is a user UID:
# /usr/opt/hpls/bin/hpls_groups_upcall south-mds1 538
# sfsmgr show log recent
.
.
.
Nov 17 10:56:38 south2-adm hpls_groups_upcall: write(148) failed: Invalid argument
The log contains an error like the one shown in the example above. This error is returned because
the MDS service did not request the hpls_groups_upcall script to run, and it is normal in this
context. Any other error messages are not normal and may indicate why the user is being denied
access.
• The five-second period allowed for the upcall/ssh process is specified by the
/proc/fs/lustre/mds/filesystem_name-mdsnumber/group_acquire_expire file. You can
determine the allowed period for a file system as shown in this example, where the file system is
called data and has an MDS service called mds5:
# cat /proc/fs/lustre/mds/data-mds5/group_acquire_expire
5
If you suspect that the upcall/ssh process is failing because it is taking too long, you can increase
the interval as follows:
# echo 20 > /proc/fs/lustre/mds/data-mds5/group_acquire_expire
However, you must make this change each time the MDS service starts or restarts.
• Look in the event log for events such as the following:
group_upcall: Failed to get groups for uid 100, timeout waiting to connect to
172.100.100.100
You can search for such events using this example query:
sfs> show log facility=lustre && data contains "timeout waiting to connect to"
If you find such events, increase the value of the lustre.groups_ssh_timeout attribute, as
described in the Configuring supplementary groups section in Chapter 9 of the HP StorageWorks
Scalable File Share System Installation and Upgrade Guide.
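Because the group_acquire_expire setting must be reapplied each time the MDS service starts or restarts, it may be convenient to wrap the change in a small shell function. The following is a minimal sketch, not part of the product: the function name set_group_acquire_expire is hypothetical, and the sketch assumes that the file accepts a plain integer number of seconds, as in the examples above.

```shell
# Hypothetical helper (not shipped with the product): write a new timeout,
# in seconds, to a group_acquire_expire file and report the old and new values.
# Usage: set_group_acquire_expire FILE SECONDS
set_group_acquire_expire() {
    gae_file=$1
    gae_seconds=$2

    # Accept only a non-negative integer number of seconds.
    case $gae_seconds in
        ''|*[!0-9]*)
            echo "seconds must be a non-negative integer" >&2
            return 2
            ;;
    esac

    # The file must exist and be writable (typically this requires root).
    if [ ! -w "$gae_file" ]; then
        echo "cannot write to $gae_file" >&2
        return 1
    fi

    gae_old=$(cat "$gae_file") || return 1
    echo "$gae_seconds" > "$gae_file" || return 1
    echo "group_acquire_expire: $gae_old -> $(cat "$gae_file")"
}
```

For the data file system and mds5 service used in the example above, the call would be set_group_acquire_expire /proc/fs/lustre/mds/data-mds5/group_acquire_expire 20, run as root on the MDS server after the MDS service has started.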