Document revision history

  • 1.1.1 — Adjustments for CIS hardening using backup_usr2

  • 1.1.0 — Update refresh_spare_usr2 and disk rotation for CIS hardening; improve rotation steps

  • 1.0.2 — Require a minimum age for disk being refreshed

  • 1.0.1 — Bring refresh_spare_usr2 up to date for CIS hardened systems

  • 1.0.0 — Initial release

1. Introduction

These notes are intended to cover, albeit tersely, the major issues for RAID operations with FSL11 (see the FS Linux 11 Installation Guide document). The scripts have been updated since FSL10 to handle the occasional reversals of the assignment of sda and sdb relative to the controller numbering. Some other minor improvements are included as well.

All operations and scripts in this document require root privileges unless otherwise indicated.

2. Guidelines for RAID operations

The FSL11 RAID configuration normally uses two disks configured according to the FSL11 installation instructions (see the FS Linux 11 Installation Guide document). Mandatory and recommended guidelines are given below.

2.1. Mandatory practices

These practices are necessary for the procedures in this document.

  1. Make sure there are no SATA devices on lower numbered controllers than the primary and secondary disks. The primary disk must be on a lower numbered controller than the secondary. Putting the primary disk on controller 0 and the secondary on 1 is usually a good choice. This may require changing the internal cabling.

    Note
    Which disk is the primary and which is the secondary is determined by which controllers they are on and so is fixed by the slots. Although the primary will usually be sda and the secondary sdb, the designations may occasionally be reversed. The scripts used in this document take that into account and will work correctly for the primary and secondary slots.
  2. Set your BIOS to allow hot swapping of disks for both the primary and secondary controllers. This is necessary to use the RAID procedures described in this document.

  3. Never mix disks from different computers in one computer.

  4. Never split up a RAID pair unless already synced. To enforce this, use the rotation_shutdown command for shutdowns whenever two disks are in use in the RAID.

    This command will only shut down the system if the RAID is synced. This script can also be useful in other cases when you are splitting the disks and want to make sure they are synced first.

    A RAID pair (kept together and in order) can be removed/reinserted or moved between computers if need be. A disk rotation, recoverable testing, and initializing a new disk are the only routine reasons to split a pair.

    Note

    When booting a disk from a RAID by itself, you may see ~20 volume group not found/processed error message pairs and an mdadm: /dev/md/0 assembled from 1 drive out of 2, but not started message (typically after the second pair of the volume group messages), after which the machine will boot. These error messages only appear like this the first time a disk from a RAID is booted without its partner.

    When the single disk is booted subsequently (or booted normally with its partner) there may be a couple of volume group not found/processed error message pairs.

2.2. Recommended practices

These recommendations are intended to provide consistent procedures and make it easier to understand any problems, if they occur.

  1. Make the upper (or left) slot the primary, the lower (or right) slot the secondary. This may require changing the internal cabling.

  2. Label the slots as primary and secondary as appropriate.

  3. Always boot for a refresh/blank with the primary slot turned on and the secondary slot turned off, so it is clear which is the active disk.

  4. Label the disks (so visible when in use) with the system name and number them 1, 2, 3, …

  5. Label the disks (so visible when in use) with their serial numbers, determined either from mdstat when only one disk inserted or by examining the disk

  6. For reference, place the disk serial numbers in a file with their corresponding numbers, e.g.:

    /root/DISKS.txt
    1=ZC1B1YCC
    2=ZC1A6WZ1
    3=ZC1AHENM
  7. When rotating disks, keep the disks in cyclical order (primary, secondary, shelf): 1, 2, 3; then 2, 3, 1; then 3, 1, 2; then 1, 2, 3; and so on.

  8. Rotate disks for a given computer at least once a month, and before any updates

  9. If you have a spare computer (and/or additional systems), keep the numbered order of the disks the same on all computers.

    Occasionally, extra rotations may be needed to re-sync the order of the disks in the computers.

  10. Do not turn a disk off while the system is running. The only time a key switch state should be changed while the system is running is to add a disk for a blank or refresh operation.

3. Disk rotation

This section describes the disk rotation procedure. It is used to make periodic updates of the shelf disk.

Note
For systems with AUID accounts (i.e., CIS hardened systems), both of the commands in this section can typically be run from any AUID account. Usually, the user will be prompted for their AUID account password.
  1. From the root (or an AUID) account, shut the system down with:

    rotation_shutdown

    This command will check the status of the RAID and proceed to shutting down only if the RAID is synced. There are three errors that can prevent shutting down:

    • If the FS is running: you should terminate it before trying again.

    • If the RAID is recovering: you will need to wait until the recovery is finished before shutting down. You can check the progress with the mdstat command (or use rotation_shutdown -p for a progress meter).

    • If the RAID is degraded: seek expert advice.

  2. Take the disk from the primary slot, put it on the shelf.

    We recommend that you label the disk immediately, including the date (and possibly the time). In addition to getting the disk labeled before it is put away, this will reduce the chances that it will be confused with the old shelf disk.

  3. Move the disk from the secondary slot to the primary slot and turn the slot on.

  4. Boot.

  5. Login as root (or with an AUID account) and run:

    refresh_secondary
  6. When the script says it is waiting for the secondary disk to be loaded:

    • Move the old shelf disk to the secondary slot and turn the slot on.

  7. If the script rejects the disk (it will stop with an error): seek expert advice.

    Be sure to note any messages so they can be reported.

  8. If the disk is accepted: let the refresh run to completion.

    You can check its progress with mdstat (or use rotation_shutdown -p for a progress meter). The system can be used for operations while the refresh is in progress, but may be a little slow.
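
For reference, the progress check in the final step can be done in any of the following ways, using only commands described in this document:

mdstat                   # show the RAID state, including recovery progress
cat /proc/mdstat         # the kernel's view of the same information
rotation_shutdown -p     # progress meter only; it will not shut the system down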

4. Recoverable testing

Seek expert advice before using this method.

This section describes a method for testing updates in a way that provides a relatively easy recovery option if a problem occurs. Should that recovery fail for some reason, it is still possible to recover with the shelf disk as described in the Recover from a shelf disk section below.

The basic plan is given in the three subsections below. The first covers Setup and testing, the final two cover what to do If the update is deemed successful or If the update is deemed to have failed.

4.1. Setup and testing

Note
Your BIOS must be set to allow hot swapping of disks for both the primary and secondary controllers.
  1. If a rotation hasn’t just been completed, perform one (as an extra backup) according to Disk rotation above.

  2. Shut the system down with the rotation_shutdown command.

    Tip

    If an update is relatively minor or the envisaged testing is intended to be of short duration and success is likely, expert users may wish to make use of the drop_primary script to split the RAID pairing in place of the reboot cycle method described here. Note that some (hopefully minor) data loss is possible on the primary (backup) disk as it is removed from the RAID whilst all the file systems are still mounted read/write. Hence this script should only be used on an unloaded or single-user system. The main advantage of using this script is that, if the test is successful, no manipulation of the key switches is required.

    Warning
    Do NOT use the drop_primary script for testing kernel updates or any other testing that could affect grub and/or require you to reboot in order to evaluate the success thereof.
  3. Key-off the primary slot

  4. Reboot (primary keyed-off, secondary keyed-on)

  5. Install and test the update

    The update and testing will occur on the secondary disk only.

  6. Proceed to one of the two subsections below, If the update is deemed successful or If the update is deemed to have failed, as appropriate.

4.2. If the update is deemed successful

The other disk can be updated:

  1. Key-on the primary slot

  2. Run recover_raid to add the primary slot disk back into the RAID.

    The recover_raid script will fail if the disk hasn’t spun up and been recognized by the kernel. It is perfectly fine to try several times until it succeeds.

  3. Once the recovery completes (this may only take a few minutes), the system has been successfully updated.
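
As a sketch of the sequence above, once the primary slot has been keyed back on:

recover_raid    # repeat if the disk has not yet spun up and been recognized by the kernel
mdstat          # confirm the RAID is recovering and, later, that the recovery has finished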

4.3. If the update is deemed to have failed

The system can be recovered as follows:

  1. Shutdown the system, e.g., shutdown -h now

  2. Key-off the secondary slot

  3. Key-on the primary slot

  4. Reboot (primary keyed-on, secondary keyed-off)

  5. Run blank_secondary

  6. Key-on the secondary slot when prompted

  7. Answer y to blank

  8. Run refresh_secondary

  9. Once the refresh is complete (this may take several hours), you have recovered to the original state.

5. Recover from a shelf disk

This section describes how to recover from a good shelf disk. This might be needed, e.g., if it is discovered that a problem has developed on the RAID pair since the last disk rotation, perhaps due to a bad update of some type or some other problem.

Tip
Before using this procedure, it should be considered whether the damage is extensive enough to require starting over from the shelf disk or whether it can be reasonably repaired in place.
Important
This will only produce a good result if the shelf disk is a good copy.
Warning
Do not use this procedure if a problem with the computer caused the damage to the RAID.
Note
Your BIOS must be set to allow hot swapping of disks, particularly for the secondary controller (it should also be set for the primary controller).
  1. Shutdown the system, e.g., shutdown -h now

  2. Take the disks from both the primary and secondary slots, set them aside.

  3. Insert the good shelf disk in the primary slot, keyed-on.

  4. Insert the disk that is next in cyclic order (from the ones set aside) in the secondary slot, keyed-off.

  5. Reboot (primary keyed-on, secondary keyed-off)

  6. Run blank_secondary

  7. Key-on the secondary slot when prompted

  8. Answer y to blank

  9. Run refresh_secondary

    Once the refresh has entered the recovery phase, the system can be used for operations, if need be. In that case, the rest of this procedure can be completed when time allows.

  10. Wait until the RAID is not recovering, check with mdstat

  11. Shut the system down with the rotation_shutdown command.

  12. Take the disk from primary slot, put it back on the shelf

  13. Move the disk from the secondary slot to the primary slot, keyed-on

  14. Insert the remaining disk, that was set aside, in the secondary slot, keyed-off.

  15. Reboot (primary keyed-on, secondary keyed-off)

  16. Run blank_secondary

  17. Key-on the secondary slot when prompted

  18. Answer y to blank

  19. Run refresh_secondary

  20. When the refresh is complete, you have recovered to the state of the previous good shelf disk.

6. Initialize a new disk

If one or more of the disks in the set for the RAID fails, you can initialize new ones to replace them.

Important
The new disks should be at least as large as the smallest of the remaining disks.

The subsections below cover various scenarios for initializing one new disk to complete a set of three, i.e., one of three disks in a set has failed. It is assumed that you want to maintain the cyclic numbering of the disks for rotations (but that is not required). It should be straightforward to adapt the procedures for other cases.

If you need to initialize more than one disk, please follow the instructions in the Setup additional disks subsection of the FS Linux 11 Installation Guide document.

6.1. Currently two disks are running in the RAID

This case corresponds to not having a good shelf disk.

  1. Shut the system down with the rotation_shutdown command.

If the disks are in cyclical order (i.e., primary, secondary are numbered in order: 1, 2, or 2, 3, or 3, 1), you should:

  1. Take the disk from primary slot, put it on the shelf, labeled with the date

  2. Move the disk from the secondary slot to the primary slot, keyed-on

If the disks are not in cyclical order (i.e., primary, secondary are numbered in order: 1, 3, or 2, 1, or 3, 2), you should:

  1. Take the disk from secondary slot, put it on the shelf

In either case, finish with:

  1. Put the new disk in the secondary slot, keyed-off.

  2. Boot (primary keyed-on, secondary keyed-off)

  3. Run blank_secondary

  4. Key-on the secondary slot when prompted

  5. Answer y to blank

  6. Run refresh_secondary

  7. Once the refresh is complete, the disk can be used normally.

  8. Label the new disk with its system name, number, and serial number.

6.2. Currently one disk is running in the RAID, but two are installed

In this case, there is a good shelf disk. The strategy used avoids overwriting it until there are three functional disks again.

  1. Use mdstat to determine which disk is running; compare the serial number to those shown on the labels or inspect the disks to determine their serial numbers.

  2. Shutdown the system, e.g., shutdown -h now

  3. Remove the non-working disk.

  4. Move the working disk to the primary slot, if it isn’t already there, keyed-on.

  5. Put the new disk in the secondary slot, keyed-off.

  6. Boot (primary keyed-on, secondary keyed-off)

  7. Run blank_secondary

  8. Key-on the secondary slot when prompted

  9. Answer y to blank

  10. Run refresh_secondary

  11. Once the refresh is complete, the disk can be used normally.

  12. Label the new disk with its system name, number, and serial number.

If the disks are not in cyclical order (i.e., primary, secondary are numbered in order: 1, 3, or 2, 1, or 3, 2), then on the next disk rotation you should move the secondary disk to the shelf instead of moving the primary.

6.3. Currently one disk is installed and running

In this case, the shelf disk is assumed to be healthy, but older. Again, the strategy is to avoid overwriting it until there is a full complement of disks available.

If the working disk is not in the primary slot:

  1. Shutdown the system, e.g., shutdown -h now

  2. Move the working disk to the primary slot, keyed-on.

  3. Boot (primary keyed-on, secondary empty)

Then in any event:

  1. Put the new disk in the secondary slot, keyed-off.

  2. Run blank_secondary

  3. Key-on the secondary slot when prompted

  4. Answer y to blank

  5. Run refresh_secondary

  6. Once the refresh is complete, the disk can be used normally.

  7. Label the new disk with its system name, number, and serial number.

If the disks are not in cyclical order (i.e., primary, secondary are numbered in order: 1, 3, or 2, 1, or 3, 2), then on the next disk rotation you should move the secondary to the shelf instead of the primary.

7. Script descriptions

This section describes the various scripts that are used for RAID maintenance.

7.1. mdstat

This script can be used by any user (not just root) to check the status of the RAID. It is most useful for checking whether a recovery is in process or has ended, but is also useful for showing the current state of the RAID, including any anomalies.

The script also lists various useful details for all block devices (such as disks) that are currently connected, including the controller they are on, their model, and their serial numbers, where applicable.
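
For comparison, a rough sketch of stock commands that report similar information (the mdstat script remains the supported interface, and its exact output differs):

cat /proc/mdstat                   # kernel view of RAID state and any recovery progress
mdadm --detail /dev/md0            # per-member detail for the FSL11 RAID
lsblk -o NAME,HCTL,MODEL,SERIAL    # controller position (HCTL), model, and serial for each block device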

7.2. rotation_shutdown

This script can be used to shut the system down if the RAID is in a state that allows a disk rotation to be performed, i.e., synced. The RAID must not be recovering and not be degraded. Otherwise, an appropriate error message is printed. If the RAID is recovering, you will need to wait until the recovery is finished before shutting down; you can check the progress with the mdstat command. If it is degraded, seek expert advice.

The script will also not shut down the system if the FS is in use. To override this, the -F option can be used, but this is not recommended. It is better to terminate the FS.

The script includes a -p option to display a progress meter for a recovery if one is active. Whether there is an active recovery or not, there will not be a shutdown if -p is used. This makes the command useful for starting a progress meter after a recovery has been started.
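
A brief usage sketch, limited to the options described above:

rotation_shutdown       # shut down, but only if the RAID is synced and the FS is not running
rotation_shutdown -p    # show a progress meter for an active recovery; never shuts down
rotation_shutdown -F    # override the FS check (not recommended; terminate the FS instead)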

7.3. refresh_secondary

This can be used to refresh a shelf disk for the RAID as a new secondary disk as part of a standard three (or more) disk rotation.

Initially, the script performs some sanity checks to confirm that the RAID /dev/md0:

  1. Exists.

  2. Is not in a clean state, i.e., it needs recovery.

  3. Is not already recovering, i.e., is in a recoverable state.

Additional checks are performed to confirm that the content the script intends to copy is where it expects it to be and has the right form. The script will reject any primary disk that:

  1. Is not part of the RAID (md0)

  2. Has a boot scheme other than the BIOS or UEFI set up as described in the FSL11 Installation Document.

To ensure that only an old shelf disk for this system is overwritten, the script will also reject any secondary disk that:

  1. Was loaded (slot keyed-on) before starting the script

    Unless overridden by -A or previously loaded by this or the blank_secondary script (see below).

  2. Is already part of RAID md0

    Which should only happen if run incorrectly with -A (or other interfering commands have been executed) or the disk has fallen out of the RAID due to failure.

  3. Has a RAID from a different computer, i.e., foreign

    Technically this could also be another RAID from the same computer, but not of a properly set up FSL11 computer, which should have only the one RAID

  4. Has any part already mounted

    Again catching misuse of the -A option.

  5. Has a different boot scheme than the primary

    And hence is probably from a different computer.

  6. Has a different RAID UUID

    This would be a disk from a different computer. Though whether this check can actually trigger after the test for a foreign RAID above remains to be seen.

  7. Was last booted at a future TIME (possibly due to a mis-set clock or clocks)

  8. Has a higher EVENT count, i.e., is newer

    Warning
    The check on the EVENT counter is intended to prevent accidentally using the shelf disk to overwrite a newer disk from the RAID. This check can be defeated if the primary has run for a considerable period of time before the refresh is attempted. This should not be an issue if the refresh is attempted promptly after the shelf disk is booted for the first time by itself and the RAID was run on the other disks for more than a trivial amount of time beforehand.
  9. Has been used (booted) separately by itself

  10. Was last used less than 24 hours ago (a mis-set clock or clocks can invalidate this check).

    This is intended to prevent accidentally refreshing a new shelf disk.

  11. Has a different partition layout from the primary

  12. Is smaller than the size of the RAID on the primary disk.

If any of the checks reject the disk, we recommend you seek expert advice; please record the error so it can be reported.

The checks are included to make the refresh process as safe as possible, particularly at a station with more than one FSLx computer. We believe all the most common errors are trapped, but the script should still be used with care.

If the disk being refreshed is from the same computer and has just been on the shelf unused since it was last rotated, it is safe to refresh and should be accepted by all the checks. In other words, normal disk rotation should work with no problems.

If the primary and/or secondary disks are removable, the user will be provided with some information about the disks and given an opportunity to continue with Enter or abort with Ctrl+C. Typically, if a USB disk is identified as the primary or secondary, one would not want to continue. However for some machines, the SATA disks that are the primary and/or secondary may be marked removable if they are hot swappable, but would still be appropriate to use.

This script requires the secondary disk to not be loaded, i.e., the slot turned off, when the script is started. However, it has an option, -A (use only with expert advice), to “Allow” an already loaded disk to be used. It is intended to make remote operation possible and must be used with extra care.

If the disk is turned on (when prompted) during the script, it will automatically be “Allowed” by both this script and blank_secondary, which also supports this feature. This allows (expert use only), after a failed refresh_secondary, running blank_secondary and then rerunning refresh_secondary, all without having to shut down, turn the disk off, reboot, start the script, and turn the disk on for each script.

The refresh will take several hours. You can check the progress with mdstat. If you prefer, you can run the script with the -p option to display a progress meter. The system can be used normally while it is refreshing, but it may be a little slow.

The system can be rebooted while the refresh is still active, as long as neither disk is removed until it is finished. The refresh will resume automatically after the reboot.
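
A sketch of a normal rotation-time refresh, using only the behavior described above:

refresh_secondary       # start the refresh; key on the old shelf disk when prompted
refresh_secondary -p    # alternatively, start it with a progress meter
mdstat                  # or check the progress later, e.g., from another login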

Note

If the primary disk has a larger capacity than the secondary and the latter is new or has been blanked (typically with blank_secondary), you may see a warning like:

Caution! Secondary header was placed beyond the disk's limits! Moving the
header, but other problems may occur!

In this case, the message is benign and can be ignored if the primary disk has a partition layout that will fit on the smaller disk. This should be the case if the system was set up initially as described in the FS Linux 11 Installation Guide document. This situation can occur if one (or more) of the disks is larger than the smallest one, perhaps because it was obtained as a replacement for a failed disk.

7.4. blank_secondary

This script should only be used with expert advice.

It can be used to make any secondary disk refreshable, if it is big enough. It must be used with care and only on a secondary disk that you know is safe to erase. Generally speaking, you don’t want to use it with a disk from a different FSLx computer, except in very unusual circumstances; see the Recovery scenarios section below for some example cases. It will ask you to confirm before blanking.

It will reject any secondary disk that:

  1. Was loaded (slot keyed-on) before starting the script

    Unless you have just loaded it through refresh_secondary's auspices or used the -A option to “Allow” it (see below).

  2. Is still part of the RAID md0

    Which should only happen if run incorrectly with -A (or other interfering commands have been executed).

  3. Has any partition already mounted

    Again catching misuse of the -A option.

  4. Has a partition that is in RAID md0

    This is essentially redundant with the “Is still part of the RAID md0” check above, but is included out of an abundance of caution.

  5. Has a partition that is included in any RAID.

  6. Is smaller in size than the primary disk

    This may be relaxed with the -A option, if the script is being used to blank a disk that will not be used in this RAID.

If the secondary disk is removable, the user will be provided with some information about the disk and given an opportunity to continue with Enter or abort with Ctrl+C. Typically, if a USB disk is identified as the secondary, one would not want to continue. However for some machines the SATA disk that is the secondary may be marked removable if it is hot swappable, but would still be appropriate to use.

This script requires the secondary disk to not be loaded, i.e., the slot turned off, when the script is started. However, it has an option, -A (use only with expert advice), to “Allow” an already loaded disk to be used. It is intended to make remote operation possible and must be used with extra care.

If the disk is turned on (when prompted) during the script, it will automatically be “Allowed” by both this script and refresh_secondary, which also supports this feature. This allows you to then run refresh_secondary immediately without having to shutdown, turn the disk off, reboot, start the script, and turn the disk on.

The -A option will also allow blanking of a disk that is too small to support the current RAID. This might be used to initialize a disk that will not be used in the current RAID. As before, use the -A option only with expert advice.

The -Z option (for expert use only) will “zap” the partition table and the start of each individual partition with 1 MiB of zeros. Each additional -Z specified will double the number of zeros written to the individual partitions. This option may be useful to force a disk into a state that the installer can handle.
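
A sketch of typical (expert-advised) use, relying only on the options described above:

blank_secondary          # key on the disk when prompted and answer y to blank it
refresh_secondary        # a disk just loaded via blank_secondary is automatically “Allowed”
blank_secondary -Z -Z    # expert use only: also zap the partition table and write 2 MiB of zeros per partition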

7.5. drop_primary

This script is only for use with expert advice.

This script can be used to drop a primary disk out of a RAID pair (by marking it as failed) so that it can act as a safety backup during testing of upgrades or other significant changes.

Initially, the script performs some sanity checks to confirm that the RAID /dev/md0:

  1. Exists.

  2. Is in a clean state, i.e., both disks are present and no recovery is currently in progress.

  3. Contains the primary disk as a member.

If the primary disk is removable, the user will be provided with some information about the disk and given an opportunity to continue with Enter or abort with Ctrl+C. Typically, if a USB disk is identified as the primary, one would not want to continue. However for some machines the SATA disk that is the primary may be marked removable if it is hot swappable, but would still be appropriate to use.

Note
This script is non-destructive in nature and its effect can easily be reversed by running the recover_raid script mentioned below.
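
A sketch of how this script fits into a short, expert-only test cycle, using only the scripts described in this document:

drop_primary     # drop the primary out of md0 (marked as failed) to keep it as a fallback
# ...install and test the update, which now touches only the secondary disk...
recover_raid     # if the test succeeded, re-add the primary; the RAID then recovers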

7.6. recover_raid

This script is only for use with expert advice.

This script can be used to recover a disk (primary or secondary) that has fallen out of the RAID array, becoming inactive. (The disk the system is then running on is referred to as the active disk.) A disk can fall out of the array for several possible reasons, including:

  1. A real disk fault of some sort, including one caused by turning it off whilst it is still in use.

  2. Using the mdadm command with -f option to mark it as faulty.

    Caution
    Using -f is risky and is for experts only. Using it on a disk that is being refreshed (or is synced) should be relatively easy to recover from with recover_raid. Using it on the disk that is being recovered from can cause problems (including possibly crashing the system). If -f has been used in that way, the system should be rebooted, at which point it should restart recovering the RAID. This is in contrast to having a hard failure of the disk being recovered from. In that case, you will need to use the Recover from a shelf disk procedure with the remaining working disk.
  3. Turning it off whilst the system is shutdown and booting without it.

  4. Using the drop_primary script.

This script is designed to be used only with a set of disks that were most recently used together in an active RAID. It is recommended only to use this script if the key switches for the disks have not been manipulated since the inactive disk fell out of the RAID; in this case it should always be safe. The script normally works on md0, but a different md device can be specified as the first argument.

Important
This script must NOT be used if the inactive disk has been changed in any way e.g., by being used (booted) separately (which is caught by the script) or refreshed against some other disk, or if the active disk has been used to refresh any other disk in the interim. In particular, this script must NOT be used to refresh a shelf disk — only use refresh_secondary for that purpose.
Note
The inactive disk is either failed or missing. It is failed if it was either marked failed by hand or dropped out of the RAID due to disk errors. It is missing if either the system was rebooted with the disk failed or physically missing or it was manually marked removed. You can check which state an inactive disk is in with mdadm --detail /dev/md0 — which lists failed as faulty but a missing disk will not appear at all.
Tip
It is okay to use this script even if the inactive disk fell out of the RAID a (long) long time ago (in a galaxy far, far away) and/or there have been extensive changes to the active disk. It is also okay to use if the system was rebooted (even multiple times) or the active disk was used (booted) separately by itself since the inactive disk fell out of the RAID.
Note
In extreme cases, the changes since the inactive disk fell out of the RAID may be too extensive to allow for a recovery with this script. You may get a message similar to mdadm: --re-add for … to device /dev/md0 is not possible. If this happens, seek expert advice. It should be possible to recover by blanking and then refreshing the inactive disk. (If the inactive disk is in the primary slot, it will be necessary to reboot with the active disk installed in the primary slot, then run blank_secondary and refresh_secondary, and finally shut down, reverse the disks between the slots, and reboot.) Alternatively, it should be possible to use the --add option of the mdadm command to add the inactive disk to the RAID; this will take as long as a refresh_secondary.

The script will refuse to recover the RAID if the RAID:

  1. Does not need recovery

  2. Is not in a recoverable state, e.g., is already recovering

or if any missing disk:

  1. Has a later modification TIME than the active disk

  2. Has a higher EVENT count, i.e., is newer, than the active disk

  3. Has been used (booted) separately (as mentioned above in the IMPORTANT item)

or if no matching missing disk can be found.

The recovery may be fairly quick, as short as a few minutes, if the inactive disk is relatively fresh. You can check the progress with mdstat. If you prefer, you can run the script with the -p option to display a progress meter. The system can be used normally while it is recovering, but it may be a little slow.
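
A usage sketch based on the behavior described above; the /dev/md1 argument is purely hypothetical, for a machine that happens to have a second md device:

mdadm --detail /dev/md0    # see whether the inactive disk shows as faulty or is missing entirely
recover_raid               # recover md0, the default
recover_raid -p            # the same, with a progress meter
recover_raid /dev/md1      # hypothetical: name a different md device as the first argument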

7.7. raid-events

The mdmonitor service can be configured to use the raid-events script to send email reports on RAID rebuilds and checks. This is most useful for getting reports for the start and end of a RAID build triggered by refresh_secondary. The script will also report on the start and end of any other RAID rebuilds, including those triggered by the recover_raid script. Checks are triggered periodically to verify the integrity of the RAIDs.

The emails are sent to root, then typically redirected to oper, and then forwarded to off-system accounts that may have their email read more frequently. There are four different possible subject lines used in the emails:

  • Rebuild Running on device

    Note
    Sometimes for a rebuild started by refresh_secondary, this message may be sent about 20 minutes after the rebuild has started. The cause of this is not entirely understood, but the message is eventually sent.
  • Rebuild Ended state on device

  • Check Running on device

  • Check Ended state on device

where:

  • device is the RAID device, e.g., /dev/md/0

  • state is OKAY if the final state was not degraded; DEGRADED, if it was degraded.

The body of each email is the output of the mdstat script at the time the message was sent.

7.7.1. Checks

The checking process is triggered by /etc/cron.d/mdadm on the first Sunday of each month. It uses the /usr/share/mdadm/checkarray script and takes a similar amount of time as a rebuild of the RAID triggered by refresh_secondary.

7.7.2. Installing raid-events

To install the script, use the following commands as root:

cd /usr/local/sbin
cp ~/fsl11/RAID/raid-events .
chmod u+x raid-events
cat <<EOF >>/etc/mdadm/mdadm.conf

PROGRAM /usr/local/sbin/raid-events
EOF

And then reboot.
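
If you want to confirm the hook is wired in, stock mdadm can send a test alert for each array to the configured PROGRAM (and MAILADDR, if set). This is standard mdadm behavior rather than part of the FSL11 instructions, and whether raid-events reports the resulting TestMessage event depends on how it filters events:

mdadm --monitor --scan --oneshot --test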

7.7.3. Disabling checking

If the checking process causes performance problems at inconvenient times, there are at least three options for dealing with it:

  • Disable the AUTOCHECK option in /etc/default/mdadm (see the sketch after this list)

    This is suitable if the RAID is rebuilt monthly using refresh_secondary. In this case, the check is superfluous.

  • Change the time at which it runs as configured in /etc/cron.d/mdadm

  • Cancel a running check, with:

    /usr/share/mdadm/checkarray --cancel --all
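
For the first option above, a minimal sketch assuming the stock Debian packaging, where /etc/default/mdadm defines an AUTOCHECK variable:

# in /etc/default/mdadm
AUTOCHECK=false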

7.8. refresh_spare_usr2

This script is not part of RAID operations per se, but is included in this document for completeness. In a two system configuration (operational and spare), it is used to make a copy of the operational system’s /usr2 partition on the spare system. Normally this partition holds all the operational FS programs and data.

A full description of the features of the script is available from the refresh_spare_usr2 -h output.

Important
This script should be installed on the spare system only.
Caution
For this script to work most usefully, the operational and spare systems should have the same set-up, particularly the same user accounts and groups (though the UIDs and GIDs don’t need to be the same) for owners of files on /usr2, as well as other OS set-up information the FS may depend on, such as /etc/hosts and /etc/ntp.conf.
Tip

A recommended monthly backup strategy is to do a disk rotation on both systems. Once the RAIDs on both systems are recovering, you can log out of both systems and then log in to the spare system again to start refresh_spare_usr2.

While refresh_spare_usr2 with two nearly synchronized /usr2 partitions is fairly fast, the recovery of the RAIDs may increase the amount of time required by about a factor of three.

Once refresh_spare_usr2 completes, it is safe to reboot, even if a recovery is still ongoing. The only requirement is to reboot the spare system before the FS is run on it again.

A feature of this approach is that it will make the spare system shelf disk a deeper back-up than the spare system RAID disks.
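
A sketch of the recommended monthly sequence from the tip above (commands only; the physical disk moves are as described in the Disk rotation section):

# On the operational system and on the spare system:
rotation_shutdown        # shut down, rotate the disks, boot, and then run:
refresh_secondary        # starts the RAID recovery onto the old shelf disk
# Once both RAIDs are recovering, log out of both systems, then on the spare system only:
refresh_spare_usr2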

7.8.1. Installing refresh_spare_usr2

Note
For CIS hardened systems, please see the Installing backup_usr2 with CIS hardening section of the Additional items for FS operations appendix of the CIS hardening for FSL11 document.

All the steps below must be performed as root on the specified system. You should read all of each step and sub-step before following it.

  1. On the operational system:

    1. Temporarily set sshd to allow root login:

      1. Edit /etc/ssh/sshd_config

        Add an uncommented line (or change an existing line) for PermitRootLogin to set it to yes

      2. Restart sshd. Execute:

        systemctl restart sshd
  2. On the spare system:

    1. Make sure the operational system is represented in the /etc/hosts file.

      If it is not already there, add it. It is recommended that it be given a simple alias for routine use.

    2. Install refresh_spare_usr2. Execute:

      ~/fsl11/RAID/install_refresh_spare_usr2
    3. Customize refresh_spare_usr2, following the directions in the comments in the script (repeated here):

      1. Comment-out the lines (add leading #s):

        echo "This script must be customized before use.  See script for details."
        exit 1
      2. Change the operational in the line:

        remote_node=operational

        to the alias (preferred), FQDN, or IP address of your operational system.

    4. Create and copy a key for root. Execute:

      Tip
      If root already has a key, you need only use the second command below, to copy it to the operational system.
      Caution
      You should not set a passphrase.
      ssh-keygen
      ssh-copy-id root@operational

      where operational is the alias, name, or IP of your operational system.

  3. On the operational system:

    1. Set the root account to only allow a forced command with ssh:

      1. Replace the ssh-rsa at the start of the line (probably the only one) in ~root/.ssh/authorized_keys that corresponds to the root key copied from the spare system with:

        command="rrsync -ro /usr2" ssh-rsa

        Tip
        If your spare system is registered with DNS, you can provide some additional security by adding from="node"  (note the trailing space) at the start of the line, where node is the FQDN or IP address of the spare system. It may be necessary to provide the FQDN, IP address, and/or alias of the spare system in a comma separated list in place of node to get reliable operation.
      2. Set sshd to only allow forced commands for root by replacing yes with forced-commands-only on the uncommented PermitRootLogin line.

      3. Restart sshd. Execute:

        systemctl restart sshd
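
For reference, after these steps the modified line in ~root/.ssh/authorized_keys on the operational system might look roughly like the following; the key material and comment shown are purely illustrative:

command="rrsync -ro /usr2" ssh-rsa AAAAB3NzaC1yc2E...rest-of-key... root@spare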

7.8.2. Using refresh_spare_usr2

Note
For CIS hardened systems, you should use the instructions in the Using backup_usr2 with CIS hardening section in the Additional items for FS operations appendix of the CIS hardening for FSL11 document.

As part of a monthly backup, you would usually start a disk rotation on both the operational and spare systems first. Once both systems are recovering, you should log out of both systems. You can also use refresh_spare_usr2 at other times to “freshen” /usr2 on the spare system.

  1. Start with no one logged into either system.

    Important
    Before proceeding, make sure that no one is logged into either system and that no processes are running on /usr2 on either system, particularly the FS.
  2. Login on the spare system.

    The best choice for this is as root on a local virtual console text terminal.

    Tip

    Logging in as a non-root user will also work. Any available means can be used: a text console, ssh from another system (preferably not the operational system), or the graphics X11 display. You must then promote to root using su.

    Caution
    If you use the -I option (which would not normally be used) of refresh_spare_usr2, you must change your working directory to be somewhere off of /usr2, e.g., /tmp, before using su to promote to root. We have made an effort to make this reliable, but there still may be a chance that the script will fail with the error umount: /usr2: target is busy.. If this happens, you can try to recover by simply rerunning the script. This should work because although the error happens in the critical phase (see refresh_spare_usr2 -h), the /usr2 partition does not get unmounted when it occurs. It might take more than one try of rerunning to achieve success.
  3. Execute the script on the spare system:

    refresh_spare_usr2

    Answer y to the question if it is safe to proceed.

  4. Log out of the spare system.

  5. Wait until the refresh_spare_usr2 script has finished before logging in again and resuming other activities on the systems.

    An email will be sent to root when the script finishes. If your email to root is being forwarded to a mailbox off the system, you can use receipt of that message (and that it shows no errors) as the indication that it finished successfully.

    Alternatively, you can examine the logs (before starting the script) in /root/refresh_spare_usr2_logs on the spare system to see how long the script typically takes. When at least that much time has elapsed, you can log in to the spare system and check the new log to verify that it has finished.

    Caution

    Generally speaking, it is best to not login to either the spare or operational system while the script is running. Under normal circumstances the script should run quickly enough that this does not cause a significant burden. If it is necessary to login to either system, the following paragraphs in this CAUTION cover the relevant considerations.

    If you do login to the spare system, it is best to not use an account with a home directory on the /usr2 partition (logging in as root on a text console is okay) or otherwise access that partition while the script is running. In any event, activity on /usr2 should be minimized.

    It is possible to use the operational system while the script is running if necessary, but this should be avoided if possible and activity on the /usr2 partition should be minimized. You should not expect any changes on the operational system /usr2 that occur after the script starts to be propagated to the spare system. If any files are deleted before they can be transferred, there will be a warning, file has vanished: "file", for each such file, and a summary warning that starts with rsync warning: some files vanished before they could be transferred. Provided there are no other warnings or errors, the transfer should otherwise be successful.

    In case you have logged into either system while the script is running, you can touch-up the copy on the spare system, by rerunning the script after logging out.

  6. If the refresh_spare_usr2 script finished with no problems, you can reboot the spare system as soon as is convenient. You may reboot even if the RAID is recovering, but you can wait until the recovery is complete. The only requirement is to reboot before the FS is run again on the spare system.

8. Multiple computer set-up

You may have more than one FSL11 computer at a site, either an operational and spare computer for one system and/or additional computers for additional systems. In this case, we recommend that you do a full setup of each computer from scratch from the FSL11 installation notes. The main, but not only, reason for this is to make sure each RAID has a unique UUID, so the refresh_secondary script will be able to help you avoid accidentally mixing disks while doing a refresh. While in principle it is possible to do one set-up and clone the configuration to more disks and then customize for each computer, we are not providing detailed instructions on how to do that at this time.

It is recommended that the network configuration on each machine be made independent of the MAC address of the hardware. This will make it possible to move a RAID pair to a different computer and have it work on the network. Please note that the IP address and host name are tied to the disks and not the computers. For information on how to configure this, please see the (optional) Stabilize network configuration section of the FS Linux 11 Installation Guide document.

The configuration of the system outside of the /usr2 partition should be maintained in parallel between the operational and spare computers so that the same capabilities are available on both. In particular, any packages installed on one should also be installed on the other. It should not be necessary to maintain parallelism with OS updates, but that is recommended as well. For simplicity, it is also recommended to maintain parallelism with other independent operational/spare systems at a site. This may enable additional recovery options in extreme cases.

9. Recovery scenarios

The FSL11 setup provides several layers of recovery in case of problems with the computers or the disks. Each system has a shelf disk, which can serve as a back-up. Additionally, if there is a spare computer for each operational computer, there are further recovery options. If there are other FSL11 computers at the site, it may be possible in extreme cases to press those computers and/or disks into service, particularly if they have been maintained in parallel.

A few example recovery scenarios are described below in rough order of likelihood of being needed. None of them are very likely to be needed, particularly those beyond the first two.

Important
In any scenario, if disks and/or a computer have failed, they should be repaired or replaced as soon as feasible.

9.1. Operational computer failure

This might be caused by a power supply or other hardware failure. If the contents of the operational RAID are not damaged, the RAID pair can be moved to the spare computer until the operational computer is repaired. Once the RAID has been moved, whether the contents have been damaged can be assessed. It will be necessary to move connections for any serial/GPIB devices to the spare computer as well.

Tip

If the disks do not connect to network after first booting in a different computer:

  1. Shut the system down.

  2. Remove the power cord.

  3. Press and hold the power button for 15 or more seconds.

    The goal is to drain any residual energy in the computer in order to completely reset the NIC.

  4. Reboot and try again.

This has been seen to solve the problem, perhaps because it forces the NIC to re-register with ARP. Waiting longer may also solve the problem.

9.2. One disk in the operational computer RAID fails

This should not interrupt operations. The computer should continue to run seamlessly on the remaining disk. If the system is rebooted in this state, it should use the working disk. At the first opportunity, usually after operations, the recover_raid script can be tried to restore the disk to the RAID. If that doesn’t work, the disk may have failed and may need to be replaced (it may be worthwhile to try blanking and refreshing it first). If the disk has failed, it should be removed and a disk rotation should be performed (with the still good disk in the primary slot) to refresh the shelf disk and make a working RAID. The failed disk should be repaired or replaced with a new disk that is at least as large. The blank_secondary script should be used to erase the new disk before it is introduced into the rotation sequence. See the Initialize a new disk section above for full details on initializing a new disk.
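
A sketch of the recovery attempts described above, in the order given; seek expert advice before the blanking step:

recover_raid         # at the first opportunity, try to restore the disk to the RAID
# If that fails, it may be worth blanking and refreshing the disk before replacing it.
# This requires a reboot with the suspect disk in the secondary slot, keyed off:
blank_secondary
refresh_secondary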

9.3. Operational computer RAID corrupted

As well as large-scale corruption, this can include recovery from accidental loss of important non-volatile files. This would generally not include .skd, .snp, and .prc files; those can be more easily restored by generating them again. It also can be used to recover from a bad OS patch (which is extremely unlikely). That is easier to manage if the patches were applied just after a disk rotation (see also the Recoverable testing section).

In this case, the shelf disk can be used to restore the system to the state at the time of the most recent rotation. To do this, follow the procedure in the Recover from a shelf disk section above. The system can be used for operations once the RAID is recovering during the first refresh in the procedure. All needed volatile operational files that were created/modified after the last disk rotation will need to be recreated. Then, as time allows, the other disk can be recovered by finishing the procedure in the Recover from a shelf disk section.

If the first disk that is tried for blanking and recovery doesn’t work, the other one can be tried. If neither works, it should be possible to run on just what was the shelf disk until a fuller recovery is possible, probably with replacements for the malfunctioning disks.

This approach could also be used for a similar problem with the spare computer and using its shelf disk for recovery.

The approach of this section should not be used if a problem with the operational computer caused the damage to its RAID. In that case, follow the Operational computer RAID corrupted and operational computer failure subsection below.

9.4. Operational computer RAID corrupted and operational computer failure

This might happen if the operational computer is exposed to fire and/or water. In this case, there are two options. One is switching to using the spare computer as in the Loss of operational computer and all its disks subsection below. The other is to use the operational computer’s shelf disk in the spare computer, either by itself or by making an ersatz RAID by blanking the spare computer’s shelf disk and refreshing it from the operational computer’s shelf disk.

In the latter scenario, be sure to preserve the original working RAID from the spare computer. All needed volatile operational files that were created/modified after the last operational computer disk rotation will need to be recreated. It will be necessary to move connections for any serial/GPIB devices to the spare computer as well. However, it will not be necessary to enable any daemons like metserver and metclient, as it would be in the former scenario; this may be a significant time saver.

9.5. Loss of all operational computer disks

If the RAID and shelf disk on the operational computer are beyond recovery, the RAID pair from the spare computer can be moved to the operational computer. All needed volatile operational files that were created/modified after the last refresh_spare_usr2 will need to be recreated. If daemons like metserver and metclient are needed, they will need to be enabled.

This approach should not be used if a problem with the operational computer caused the damage to its RAID. In that case, follow the Operational computer RAID corrupted and operational computer failure subsection above.

9.6. Loss of operational computer and all its disks

In this case, operations should be moved to the spare computer until the operational computer is repaired or replaced. It will be necessary to move connections for any serial/GPIB devices to the spare computer as well. If daemons like metserver and metclient are needed, they will need to be enabled. All needed volatile operational files that were created/modified after the last refresh_spare_usr2 will need to be recreated.