HP StorageWorks Scalable File Share User Guide

Troubleshooting 9–66
5. If you know which file system is causing the problem, start that file system; otherwise, start all file
systems, but wait until each file system goes to the started state before starting the next file system.
If all file systems start normally, you can skip the remaining steps. If the MDS service crashes again after you
start a file system, continue with the remaining steps.
6. If the MDS server crashes again, stop all file systems (you can enter the stop filesystem
filesystem_name command on the administration server while you wait for the MDS server to
reboot). The stop filesystem filesystem_name command may appear to fail; however,
because the MDS service will not start when the MDS server is rebooted, the file system is effectively
stopped.
7. Unmount all Lustre file systems on every client node. If you cannot unmount a file system on a client
node, reboot the client node.
8. Check that the /proc/fs/lustre file does not exist on any client node. If the file exists on a client
node, reboot the client node.
9. Attempt to mount one client node.
The mount operation may hang or fail with an Input/output error message.
If the mount operation hangs, do not abort the operation; wait for ten minutes.
If the mount operation fails, repeat the mount attempt every few minutes.
If the mount operation succeeds, the MDS service will go to the running state within ten minutes. If
the MDS server crashes again or the MDS service fails to go to the running state after ten minutes
or after repeated attempts to mount the file system, contact your HP Customer Support representative
for more information.
When you have finished this procedure, if the mount operation succeeded and the MDS service is in the
running state, enable the administration server using the enable server server_name command.
The corresponding commands for steps 5 through 9 are sketched below.
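The following sketch pulls the commands for steps 5 through 9 and the final enable step into one
sequence. Only the stop filesystem and enable server commands are taken verbatim from this procedure;
the start filesystem command, the file system name (data), the client mount point (/mnt/lustre), and the
use of an existing /etc/fstab entry on the client are illustrative assumptions, so substitute the names and
mount syntax used at your site.

# On the administration server: start one file system at a time (step 5)
# and wait for it to reach the started state before starting the next.
# (start filesystem is assumed here to mirror the stop filesystem command.)
start filesystem data

# If the MDS server crashes again, stop every file system (step 6):
stop filesystem data

# On every client node: unmount all Lustre file systems (step 7) and
# verify that /proc/fs/lustre is gone (step 8); reboot the client node
# if either operation fails.
umount /mnt/lustre
ls /proc/fs/lustre        # should report: No such file or directory

# On one client node: attempt the mount (step 9). This assumes an
# /etc/fstab entry for the Lustre file system exists on the client; if
# the mount hangs, wait ten minutes rather than aborting it.
mount /mnt/lustre

# On the administration server, once the MDS service is running:
enable server server_name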
9.40 Rebuilding logical drives after disk failures
If you see messages similar to the following in the EVL logs for a server attached to SFS20 storage, it is
possible that a logical drive on an SFS20 array has failed:
cciss(105,48): cmd 43d86420 ctlr_info = 0x482c0000 has CHECK CONDITION, sense key = 0x4
cciss(105,48): cmd 43d86c78 ctlr_info = 0x482c0000 has CHECK CONDITION, sense key = 0x4
cciss(105,48): cmd 43d874d0 ctlr_info = 0x482c0000 has CHECK CONDITION, sense key = 0x4
cciss(105,48): cmd 43d87d28 ctlr_info = 0x482c0000 has CHECK CONDITION, sense key = 0x4
cciss(105,48): cmd 43d88580 ctlr_info = 0x482c0000 has CHECK CONDITION, sense key = 0x4
cciss(105,48): cmd 43d88dd8 ctlr_info = 0x482c0000 has CHECK CONDITION, sense key = 0x4
cciss(105,48): cmd 43d89630 ctlr_info = 0x482c0000 has CHECK CONDITION, sense key = 0x4
cciss(105,48): cmd 43d89e88 ctlr_info = 0x482c0000 has CHECK CONDITION, sense key = 0x4
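If you prefer to scan for these messages rather than read the logs by eye, a simple pattern match is
enough. The log path below is an assumption: cciss driver messages of this kind are normally also
written to the kernel log or syslog, but the exact location on your servers may differ.

# Search the server's system log for cciss CHECK CONDITION messages.
# /var/log/messages is an assumed location; adjust it to wherever your
# EVL or syslog output is stored.
grep "CHECK CONDITION" /var/log/messages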
For such a failure to occur, at least three disks (if ADG redundancy is used) or two disks (if RAID5
redundancy is used) in the logical drive must have failed at the same time. Such a failure is catastrophic,
and it is highly unlikely that you will be able to access the original data stored on the LUN associated
with the logical drive.
If a logical drive fails as described here, use the show array array_number command to examine the
status of the individual disks on the array. (For information on using the show array array_number
command, see Section 4.5.) Replace the failed disks (see Section 8.1 for procedures for replacing
hardware components).
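As a concrete illustration of the check described above, the command below is entered on the
administration server; the array number (3) is a hypothetical example, so use the number of the array
reported in the log messages.

# Examine the status of the individual disks in the affected array
# (the array number 3 is a placeholder for your failed array):
show array 3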