raidctl(8) - NetBSD Manual Pages

RAIDCTL(8)              NetBSD System Manager's Manual              RAIDCTL(8)


NAME

     raidctl -- configuration utility for the RAIDframe disk driver


SYNOPSIS

     raidctl dev command [arg [...]]
     raidctl [-v] -A [yes | no | forceroot | softroot] dev
     raidctl [-v] -a component dev
     raidctl [-v] -B dev
     raidctl [-v] -C config_file dev
     raidctl [-v] -c config_file dev
     raidctl [-v] -F component dev
     raidctl [-v] -f component dev
     raidctl [-v] -G dev
     raidctl [-v] -g component dev
     raidctl [-v] -I serial_number dev
     raidctl [-v] -i dev
     raidctl [-v] -L dev
     raidctl [-v] -M [yes | no | set params] dev
     raidctl [-v] -m dev
     raidctl [-v] -P dev
     raidctl [-v] -p dev
     raidctl [-v] -R component dev
     raidctl [-v] -r component dev
     raidctl [-v] -S dev
     raidctl [-v] -s dev
     raidctl [-v] -t config_file
     raidctl [-v] -U unit dev
     raidctl [-v] -u dev


DESCRIPTION

     raidctl is the user-land control program for raid(4), the RAIDframe disk
     device.  raidctl is primarily used to dynamically configure and unconfig-
     ure RAIDframe disk devices.  For more information about the RAIDframe
     disk device, see raid(4).

     This document assumes the reader has at least rudimentary knowledge of
     RAID and RAID concepts.

     The simplified command-line options for raidctl are as follows:

     create level component1 component2 ...
             where level specifies the RAID level and is one of 0, 1 (or
             mirror), or 5, and each componentN specifies a device to be
             configured into the RAID set.

     The advanced command-line options for raidctl are as follows:

     -A yes dev
             Make the RAID set auto-configurable.  The RAID set will be auto-
             matically configured at boot before the root file system is
             mounted.  Note that all components of the set must be of type
             RAID in the disklabel.

     -A no dev
             Turn off auto-configuration for the RAID set.

     -A forceroot dev
             Make the RAID set auto-configurable, and also mark the set as
             being eligible to be the root partition.  A RAID set configured
             this way will override the use of the boot disk as the root
             device.  All components of the set must be of type RAID in the
             disklabel.  Note that only certain architectures (currently arc,
             alpha, amd64, bebox, cobalt, emips, evbarm, i386, landisk, ofppc,
             pmax, riscv, sandpoint, sgimips, sparc, sparc64, and vax) support
             booting a kernel directly from a RAID set.  Please note that
             forceroot mode was referred to as root mode on earlier versions
             of NetBSD.  For compatibility reasons, root can be used as an
             alias for forceroot.

     -A softroot dev
             Like forceroot, but only change the root device if the boot
             device is part of the RAID set.

     -a component dev
             Add component as a hot spare for the device dev.  Component
             labels (which identify the location of a given component within a
             particular RAID set) are automatically added to the hot spare
             after it has been used and are not required for component before
             it is used.

     -B dev  Initiate a copyback of reconstructed data from a spare disk to
             its original disk.  This is performed after a component has
             failed, and the failed drive has been reconstructed onto a spare
             drive.

     -C config_file dev
             As for -c, but forces the configuration to take place.  Fatal
             errors due to uninitialized components are ignored.  This is
             required the first time a RAID set is configured.

     -c config_file dev
             Configure the RAIDframe device dev according to the configuration
             given in config_file.  A description of the contents of
             config_file is given later.

     -F component dev
             Fail the specified component of the device, and immediately
             begin a reconstruction of the failed disk onto an available hot
             spare.  This is one of the mechanisms used to start the recon-
             struction process if a component does have a hardware failure.

     -f component dev
             This marks the specified component as having failed, but does not
             initiate a reconstruction of that component.

     -G dev  Generate the configuration of the RAIDframe device in a format
             suitable for use with the -c or -C options.

     -g component dev
             Get the component label for the specified component.

     -I serial_number dev
             Initialize the component labels on each component of the device.
             serial_number is used as one of the keys in determining whether a
             particular set of components belong to the same RAID set.  While
             not strictly enforced, different serial numbers should be used
             for different RAID sets.  This step MUST be performed when a new
             RAID set is created.

     -i dev  Initialize the RAID device.  In particular, (re-)write the parity
             on the selected device.  This MUST be done for all RAID sets
             before the RAID device is labeled and before file systems are
             created on the RAID device.

     -L dev  Rescan all devices on the system, looking for RAID sets that can
             be auto-configured.  The RAID device provided here has to be a
             valid device, but does not need to be configured.  (e.g.

                   raidctl -L raid0

             is all that is needed to perform a rescan.)

     -M yes dev
             Enable the use of a parity map on the RAID set; this is the
             default, and greatly reduces the time taken to check parity after
             unclean shutdowns at the cost of some very slight overhead during
             normal operation.  Changes to this setting will take effect the
             next time the set is configured.  Note that RAID-0 sets, having
             no parity, will not use a parity map in any case.

     -M no dev
             Disable the use of a parity map on the RAID set; doing this is
             not recommended.  This will take effect the next time the set is
             configured.

     -M set cooldown tickms regions dev
             Alter the parameters of the parity map; parameters to leave
             unchanged can be given as 0, and trailing zeroes may be omitted.
             The RAID set is divided into regions regions; each region is
             marked dirty for at most cooldown intervals of tickms millisec-
             onds each after a write to it, and at least cooldown - 1 such
             intervals.  Changes to regions take effect the next time the set
             is configured, while changes to the other parameters are applied
             immediately.  The default parameters are expected to be
             reasonable for most workloads.

     -m dev  Display status information about the parity map on the RAID set,
             if any.  If used with -v then the current contents of the parity
             map will be output (in hexadecimal format) as well.

     -P dev  Check the status of the parity on the RAID set, and initialize
             (re-write) the parity if the parity is not known to be up-to-
             date.  This is normally used after a system crash (and before a
             fsck(8)) to ensure the integrity of the parity.

     -p dev  Check the status of the parity on the RAID set.  Displays a sta-
             tus message, and returns successfully if the parity is up-to-
             date.

     -R component dev
             Fails the specified component, if necessary, and immediately
             begins a reconstruction back to component.  This is useful for
             reconstructing back onto a component after it has been replaced
             following a failure.

     -r component dev
             Remove the specified component from the RAID set.  The component
             must be in the failed, spare, or spared state in order to be
             removed.

     -S dev  Check the status of parity re-writing, component reconstruction,
             and component copyback.  The output indicates the amount of
             progress achieved in each of these areas.

     -s dev  Display the status of the RAIDframe device for each of the compo-
             nents and spares.

     -t config_file
             Read and parse the config_file, reporting any errors, then exit.
             No RAIDframe operations are performed.

     -U unit dev
             Set the last_unit field in all components of the RAID set, so
             that the set uses that unit the next time it is autoconfigured.

     -u dev  Unconfigure the RAIDframe device.  This does not remove any com-
             ponent labels or change any configuration settings (e.g. auto-
             configuration settings) for the RAID set.

     -v      Be more verbose.  For operations such as reconstructions, parity
             re-writing, and copybacks, provide a progress indicator.

     The device used by raidctl is specified by dev.  dev may be either the
     full name of the device, e.g., /dev/rraid0d, for the i386 architecture,
     or /dev/rraid0c for many others, or just simply raid0 (for
     /dev/rraid0[cd]).  It is recommended that the partitions used to repre-
     sent the RAID device are not used for file systems.
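
     As an illustration of the timing bounds given for -M set above, the
     following sketch computes how long a region can remain marked dirty
     after a write.  The function name pm_dirty_window and the sample values
     (cooldown 8, tickms 125) are assumptions chosen for the example, not
     documented defaults:

```shell
# Sketch: bounds, in milliseconds, on how long a parity map region stays
# marked dirty after a write.  cooldown and tickms correspond to the
# arguments of -M set; the values passed below are illustrative only.
pm_dirty_window() {
    cooldown=$1
    tickms=$2
    min_ms=$(( (cooldown - 1) * tickms ))   # at least cooldown - 1 intervals
    max_ms=$(( cooldown * tickms ))         # at most cooldown intervals
    echo "$min_ms $max_ms"
}

pm_dirty_window 8 125    # prints "875 1000"
```

     With those hypothetical values, a command such as raidctl -M set 8 125
     raid0 would leave regions unchanged, since trailing zeroes may be
     omitted.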

   Simple RAID configuration
     For simple RAID configurations using RAID levels 0 (simple striping), 1
     (mirroring), or 5 (striping with distributed parity) raidctl supports
     command-line configuration of RAID setups without the use of a configura-
     tion file.  For example,

           raidctl raid0 create 0 /dev/wd0e /dev/wd1e /dev/wd2e

     will create a RAID level 0 set on the device named raid0 using the compo-
     nents /dev/wd0e, /dev/wd1e, and /dev/wd2e.  Similarly,

           raidctl raid0 create mirror absent /dev/wd1e

     will create a RAID level 1 (mirror) set with an absent first component
     and /dev/wd1e as the second component.  In all cases the resulting RAID
     device will be marked as auto-configurable, will have a serial number set
     (based on the current time), and parity will be initialized (if the RAID
     level has parity and sufficient components are present).  Reasonable
     performance values are automatically used by default for other parameters
     normally specified in the configuration file.

   Configuration file
     The format of the configuration file is complex, and only an abbreviated
     treatment is given here.  In the configuration files, a `#' indicates the
     beginning of a comment.

     There are 4 required sections of a configuration file, and 2 optional
     sections.  Each section begins with a `START', followed by the section
     name, and the configuration parameters associated with that section.  The
     first section is the `array' section, and it specifies the number of col-
     umns, and spare disks in the RAID set.  For example:

           START array
           3 0

     indicates an array with 3 columns, and 0 spare disks.  Old configurations
     specified a 3rd value in front of the number of columns and spare disks.
     This old value, if provided, must be specified as 1:

           START array
           1 3 0

     The second section, the `disks' section, specifies the actual components
     of the device.  For example:

           START disks
           /dev/sd0e
           /dev/sd1e
           /dev/sd2e

     specifies the three component disks to be used in the RAID device.  Disk
     wedges may also be specified with the NAME=<wedge name> syntax.  If any
     of the specified drives cannot be found when the RAID device is config-
     ured, then they will be marked as `failed', and the system will operate
     in degraded mode.  Note that it is imperative that the order of the com-
     ponents in the configuration file does not change between configurations
     of a RAID device.  Changing the order of the components will result in
     data loss if the set is configured with the -C option.  In normal circum-
     stances, the RAID set will not configure if only -c is specified, and the
     components are out-of-order.

     The next section, which is the `spare' section, is optional, and, if
     present, specifies the devices to be used as `hot spares' -- devices
     which are on-line, but are not actively used by the RAID driver unless
     one of the main components fails.  A simple `spare' section might be:

           START spare
           /dev/sd3e

     for a configuration with a single spare component.  If no spare drives
     are to be used in the configuration, then the `spare' section may be
     omitted.

     The next section is the `layout' section.  This section describes the
     general layout parameters for the RAID device, and provides such informa-
     tion as sectors per stripe unit, stripe units per parity unit, stripe
     units per reconstruction unit, and the parity configuration to use.  This
     section might look like:

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
           32 1 1 5

     The sectors per stripe unit specifies, in blocks, the interleave factor;
     i.e., the number of contiguous sectors to be written to each component
     for a single stripe.  Appropriate selection of this value (32 in this
     example) is the subject of much research in RAID architectures.  The
     stripe units per parity unit and stripe units per reconstruction unit are
     normally each set to 1.  While certain values above 1 are permitted, a
     discussion of valid values and the consequences of using anything other
     than 1 are outside the scope of this document.  The last value in this
     section (5 in this example) indicates the parity configuration desired.
     Valid entries include:

     0     RAID level 0.  No parity, only simple striping.

     1     RAID level 1.  Mirroring.  The parity is the mirror.

     4     RAID level 4.  Striping across components, with parity stored on
           the last component.

     5     RAID level 5.  Striping across components, parity distributed
           across all components.

     There are other valid entries here, including those for Even-Odd parity,
     RAID level 5 with rotated sparing, Chained declustering, and Interleaved
     declustering, but as of this writing the code for those parity operations
     has not been tested with NetBSD.

     The next required section is the `queue' section.  This is most often
     specified as:

           START queue
           fifo 100

     where the queuing method is specified as fifo (first-in, first-out), and
     the size of the per-component queue is limited to 100 requests.  Other
     queuing methods may also be specified, but a discussion of them is beyond
     the scope of this document.

     The final section, the `debug' section, is optional.  For more details on
     this the reader is referred to the RAIDframe documentation discussed in
     the HISTORY section.

     Since NetBSD 10, RAIDframe has been capable of autoconfiguring components
     originally configured on opposite-endian systems.  The current label
     endianness will be retained.

     See EXAMPLES for a more complete configuration file example.


FILES

     /dev/{,r}raid*  raid device special files.


EXAMPLES

     The examples given in this section are for more complex setups than can
     be configured with the simplified command-line configuration option
     described earlier.

     It is highly recommended that, before using the RAID driver for real
     file systems, the system administrator(s) become quite familiar with the
     use of raidctl, and that they understand how the component reconstruction
     process works.  The examples in this section will focus on configuring a
     number of different RAID sets of varying degrees of redundancy.  By work-
     ing through these examples, administrators should be able to develop a
     good feel for how to configure a RAID set, and how to initiate recon-
     struction of failed components.

     In the following examples `raid0' will be used to denote the RAID device.
     Depending on the architecture, /dev/rraid0c or /dev/rraid0d may be used
     in place of raid0.

   Initialization and Configuration
     The initial step in configuring a RAID set is to identify the components
     that will be used in the RAID set.  All components should be the same
     size.  Each component should have a disklabel type of FS_RAID, and a typ-
     ical disklabel entry for a RAID component might look like:

           f:  1800000  200495     RAID              # (Cyl.  405*- 4041*)

     While FS_BSDFFS will also work as the component type, the type FS_RAID is
     preferred for RAIDframe use, as it is required for features such as auto-
     configuration.  As part of the initial configuration of each RAID set,
     each component will be given a `component label'.  A `component label'
     contains important information about the component, including a user-
     specified serial number, the column of that component in the RAID set,
     the redundancy level of the RAID set, a `modification counter', and
     whether the parity information (if any) on that component is known to be
     correct.  Component labels are an integral part of the RAID set, since
     they are used to ensure that components are configured in the correct
     order, and used to keep track of other vital information about the RAID
     set.  Component labels are also required for the auto-detection and auto-
     configuration of RAID sets at boot time.  For a component label to be
     considered valid, that particular component label must be in agreement
     with the other component labels in the set.  For example, the serial num-
     ber, `modification counter', and number of columns must all be in agree-
     ment.  If any of these are different, then the component is not consid-
     ered to be part of the set.  See raid(4) for more information about com-
     ponent labels.

     Once the components have been identified, and the disks have appropriate
     labels, raidctl is then used to configure the raid(4) device.  To config-
     ure the device, a configuration file which looks something like:

           START array
           # numCol numSpare
           3 1

           START disks
           /dev/sd1e
           /dev/sd2e
           /dev/sd3e

           START spare
           /dev/sd4e

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
           32 1 1 5

           START queue
           fifo 100

     is created in a file.  The above configuration file specifies a RAID 5
     set consisting of the components /dev/sd1e, /dev/sd2e, and /dev/sd3e,
     with /dev/sd4e available as a `hot spare' in case one of the three main
     drives should fail.  A RAID 0 set would be specified in a similar way:

           START array
           # numCol numSpare
           4 0

           START disks
           /dev/sd10e
           /dev/sd11e
           /dev/sd12e
           /dev/sd13e

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
           64 1 1 0

           START queue
           fifo 100

     In this case, devices /dev/sd10e, /dev/sd11e, /dev/sd12e, and /dev/sd13e
     are the components that make up this RAID set.  Note that there are no
     hot spares for a RAID 0 set, since there is no way to recover data if any
     of the components fail.

     For a RAID 1 (mirror) set, the following configuration might be used:

           START array
           # numCol numSpare
           2 0

           START disks
           /dev/sd20e
           /dev/sd21e

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
           128 1 1 1

           START queue
           fifo 100

     In this case, /dev/sd20e and /dev/sd21e are the two components of the
     mirror set.  While no hot spares have been specified in this configura-
     tion, they easily could be, just as they were specified in the RAID 5
     case above.  Note as well that RAID 1 sets are currently limited to only
     2 components.  At present, n-way mirroring is not possible.
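
     The three configuration files above differ only in their component
     list, stripe unit size, and RAID level.  As a minimal sketch, a small
     shell function can emit such a file; gen_raid_conf is a hypothetical
     helper (not part of raidctl), and it assumes no spares and the fifo 100
     queue used throughout this page:

```shell
# Sketch: write a RAIDframe configuration file to standard output.
# gen_raid_conf is a hypothetical helper, not part of raidctl.
# usage: gen_raid_conf level sectPerSU component...
gen_raid_conf() {
    level=$1
    sect_per_su=$2
    shift 2
    # RAID 1 sets are limited to two components (see the text above).
    if [ "$level" -eq 1 ] && [ "$#" -ne 2 ]; then
        echo "RAID 1 requires exactly 2 components" >&2
        return 1
    fi
    # numCol is simply the number of components given; no spares here.
    printf 'START array\n# numCol numSpare\n%d 0\n\n' "$#"
    printf 'START disks\n'
    printf '%s\n' "$@"
    printf '\nSTART layout\n'
    printf '# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level\n'
    printf '%d 1 1 %d\n\n' "$sect_per_su" "$level"
    printf 'START queue\nfifo 100\n'
}

gen_raid_conf 5 32 /dev/sd1e /dev/sd2e /dev/sd3e
```

     The output can be redirected to a file and checked with raidctl -t
     before it is used to configure a device.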

     The first time a RAID set is configured, the -C option must be used:

           raidctl -C raid0.conf raid0

     where raid0.conf is the name of the RAID configuration file.  The -C
     forces the configuration to succeed, even if any of the component labels
     are incorrect.  The -C option should not be used lightly in situations
     other than initial configurations: if the system is refusing to configure
     a RAID set, there is probably a very good reason for it.  After
     the initial configuration is done (and appropriate component labels are
     added with the -I option) then raid0 can be configured normally with:

           raidctl -c raid0.conf raid0

     When the RAID set is configured for the first time, it is necessary to
     initialize the component labels, and to initialize the parity on the RAID
     set.  Initializing the component labels is done with:

           raidctl -I 112341 raid0

     where `112341' is a user-specified serial number for the RAID set.  This
     initialization step is required for all RAID sets.  As well, using dif-
     ferent serial numbers between RAID sets is strongly encouraged, as using
     the same serial number for all RAID sets will only serve to decrease the
     usefulness of the component label checking.

     Initializing the RAID set is done via the -i option.  This initialization
     MUST be done for all RAID sets, since among other things it verifies that
     the parity (if any) on the RAID set is correct.  Since this initializa-
     tion may be quite time-consuming, the -v option may be also used in con-
     junction with -i:

           raidctl -iv raid0

     This will give more verbose output on the status of the initialization:

           Initiating re-write of parity
           Parity Re-write status:
            10% |****                                   | ETA:    06:03 /

     The output provides a `Percent Complete' in both a numeric and graphical
     format, as well as an estimated time to completion of the operation.

     Since it is the parity that provides the `redundancy' part of RAID, it is
     critical that the parity be correct as much of the time as possible.  If
     the parity is not correct, then there is no guarantee that data will not
     be lost if a component fails.

     Once the parity is known to be correct, it is then safe to perform
     disklabel(8), newfs(8), or fsck(8) on the device or its file systems, and
     then to mount the file systems for use.
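
     The first-time sequence described above (-C, then -I, then -i) can be
     collected into one function.  This is only a sketch: the helper names
     run and raid_first_time and the DRY_RUN variable are assumptions made
     for the example.  By default the commands are printed rather than
     executed:

```shell
# Sketch: first-time configuration of a RAID set as one function.
# run, raid_first_time and DRY_RUN are illustrative names, not part of
# raidctl.  With DRY_RUN unset or 1, commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

raid_first_time() {
    conf=$1
    serial=$2
    dev=$3
    # 1. force the initial configuration
    # 2. write the component labels with the chosen serial number
    # 3. initialize the parity, showing progress
    run raidctl -C "$conf" "$dev" &&
    run raidctl -I "$serial" "$dev" &&
    run raidctl -iv "$dev"
}

raid_first_time raid0.conf 112341 raid0
```

     Setting DRY_RUN=0 would execute the commands; each step runs only if
     the previous one succeeded.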

     Under certain circumstances (e.g., the additional component has not
     arrived, or data is being migrated off of a disk destined to become a
     component) it may be desirable to configure a RAID 1 set with only a sin-
     gle component.  This can be achieved by using the word ``absent'' to
     indicate that a particular component is not present.  In the following:

           START array
           # numCol numSpare
           2 0

           START disks
           absent
           /dev/sd0e

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
           128 1 1 1

           START queue
           fifo 100

     /dev/sd0e is the real component, and will be the second disk of a RAID 1
     set.  The first component is simply marked as being absent.  Configura-
     tion (using -C and -I 12345 as above) proceeds normally, but initializa-
     tion of the RAID set will have to wait until all physical components are
     present.  After configuration, this set can be used normally, but will be
     operating in degraded mode.  Once a second physical component is
     obtained, it can be hot-added, the existing data mirrored, and normal
     operation resumed.
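
     A sketch of that final step follows.  It only prints the commands that
     would be used.  The device /dev/sd1e stands for the newly obtained
     disk, and the name component0 for the absent first component is an
     assumption made by analogy with the componentN names reported for
     missing components; check the output of raidctl -s for the actual name
     before running anything:

```shell
# Sketch: print the commands to complete a mirror that was configured
# with an absent first component.  mirror_complete_plan is a hypothetical
# helper; component0 is an assumed name for the absent slot.
mirror_complete_plan() {
    new=$1
    dev=$2
    echo "raidctl -a $new $dev"          # hot-add the new disk as a spare
    echo "raidctl -F component0 $dev"    # rebuild the absent slot onto it
    echo "raidctl -S $dev"               # check reconstruction progress
}

mirror_complete_plan /dev/sd1e raid0
```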

     The size of the resulting RAID set will depend on the number of data com-
     ponents in the set.  Space is automatically reserved for the component
     labels, and the actual amount of space used for data on a component will
     be rounded down to the largest possible multiple of the sectors per
     stripe unit (sectPerSU) value.  Thus, the amount of space provided by the
     RAID set will be less than the sum of the size of the components.
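
     The rounding described above can be sketched with shell arithmetic.
     The per-component reservation of 64 sectors used here is an assumption;
     it happens to agree with the sample figures on this page (a
     1800000-sector partition reported with numBlocks 1799936).  The result
     is an approximation of the space available, not the driver's exact
     calculation:

```shell
# Sketch: approximate usable size of a RAID set, in sectors.
# raid_capacity is a hypothetical helper.  reserved is the space kept on
# each component for the component label (64 below is an assumption).
raid_capacity() {
    comp_size=$1
    reserved=$2
    sect_per_su=$3
    data_cols=$4
    # Round the usable space down to a multiple of sectPerSU.
    usable=$(( (comp_size - reserved) / sect_per_su * sect_per_su ))
    echo "$usable $(( usable * data_cols ))"
}

# RAID 5 over 3 components holds 2 components' worth of data.
raid_capacity 1800000 64 32 2    # prints "1799936 3599872"
```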

   Maintenance of the RAID set
     After the parity has been initialized for the first time, the command:

           raidctl -p raid0

     can be used to check the current status of the parity.  To check the
     parity and rebuild it if necessary (for example, after an unclean
     shutdown) the command:

           raidctl -P raid0

     is used.  Note that re-writing the parity can be done while other opera-
     tions on the RAID set are taking place (e.g., while doing a fsck(8) on a
     file system on the RAID set).  However, for maximum effectiveness of the
     RAID set, the parity should be known to be correct before any data on the
     set is modified.
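
     Because -p returns successfully only when the parity is up-to-date, its
     exit status can drive a script.  The sketch below rewrites the parity
     only when -p reports a problem; -P alone would achieve the same result,
     so this form is mainly useful where the caller wants to log the
     outcome.  The function name ensure_parity and the RAIDCTL override
     variable are assumptions made for the example:

```shell
# Sketch: check parity and rewrite it only when it is not known clean.
# ensure_parity is a hypothetical helper.  RAIDCTL is an illustrative
# override so the logic can be exercised without a real RAID device.
ensure_parity() {
    dev=$1
    if ${RAIDCTL:-raidctl} -p "$dev"; then
        echo "parity on $dev is up-to-date"
    else
        ${RAIDCTL:-raidctl} -P "$dev"
    fi
}
```

     A typical call would be ensure_parity raid0.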

     To see how the RAID set is doing, the following command can be used to
     show the RAID set's status:

           raidctl -s raid0

     The output will look something like:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: optimal
                      /dev/sd3e: optimal
           Spares:
                      /dev/sd4e: spare
           Component label for /dev/sd1e:
              Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
              Version: 2 Serial Number: 13432 Mod Counter: 65
              Clean: No Status: 0
              sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
              RAID Level: 5  blocksize: 512 numBlocks: 1799936
              Autoconfig: No
              Last configured as: raid0
           Component label for /dev/sd2e:
              Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
              Version: 2 Serial Number: 13432 Mod Counter: 65
              Clean: No Status: 0
              sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
              RAID Level: 5  blocksize: 512 numBlocks: 1799936
              Autoconfig: No
              Last configured as: raid0
           Component label for /dev/sd3e:
              Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
              Version: 2 Serial Number: 13432 Mod Counter: 65
              Clean: No Status: 0
              sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
              RAID Level: 5  blocksize: 512 numBlocks: 1799936
              Autoconfig: No
              Last configured as: raid0
           Parity status: clean
           Reconstruction is 100% complete.
           Parity Re-write is 100% complete.
           Copyback is 100% complete.

     This indicates that all is well with the RAID set.  Of importance here
     are the component lines which read `optimal', and the `Parity status'
     line.  `Parity status: clean' indicates that the parity is up-to-date for
     this RAID set, whether or not the RAID set is in redundant or degraded
     mode.  `Parity status: DIRTY' indicates that it is not known if the par-
     ity information is consistent with the data, and that the parity informa-
     tion needs to be checked.  Note that if there are file systems open on
     the RAID set, the individual components will not be `clean' but the set
     as a whole can still be clean.

     To check the component label of /dev/sd1e, the following is used:

           raidctl -g /dev/sd1e raid0

     The output of this command will look something like:

           Component label for /dev/sd1e:
              Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
              Version: 2 Serial Number: 13432 Mod Counter: 65
              Clean: No Status: 0
              sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
              RAID Level: 5  blocksize: 512 numBlocks: 1799936
              Autoconfig: No
              Last configured as: raid0
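
     Since the serial number, modification counter, and number of columns
     must agree across a set, it can be handy to pull just those fields out
     of the label output.  The following sketch assumes the output layout
     shown above; label_key is a hypothetical helper:

```shell
# Sketch: extract "serial modcounter numcolumns" from component label
# output read on standard input.  Assumes the layout shown above.
label_key() {
    awk '
        /Num Columns:/   { cols = $NF }
        /Serial Number:/ { serial = $5; mod = $8 }
        END              { print serial, mod, cols }
    '
}

label_key <<'EOF'
Component label for /dev/sd1e:
   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
EOF
# prints "13432 65 3"
```

     Piping raidctl -g for two components through label_key and comparing
     the results gives a quick check that their labels agree.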

   Dealing with Component Failures
     If for some reason (perhaps to test reconstruction) it is necessary to
     pretend a drive has failed, the following will perform that function:

           raidctl -f /dev/sd2e raid0

     The system will then be performing all operations in degraded mode, where
     missing data is re-computed from existing data and the parity.  In this
     case, obtaining the status of raid0 will return (in part):

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: failed
                      /dev/sd3e: optimal
           Spares:
                      /dev/sd4e: spare

     Note that with the use of -f a reconstruction has not been started.  To
     both fail the disk and start a reconstruction, the -F option must be
     used:

           raidctl -F /dev/sd2e raid0

     The -f option may be used first, and then the -F option used later, on
     the same disk, if desired.  Immediately after the reconstruction is
     started, the status will report:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: reconstructing
                      /dev/sd3e: optimal
           Spares:
                      /dev/sd4e: used_spare
           [...]
           Parity status: clean
           Reconstruction is 10% complete.
           Parity Re-write is 100% complete.
           Copyback is 100% complete.

     This indicates that a reconstruction is in progress.  To find out how the
     reconstruction is progressing the -S option may be used.  This will indi-
     cate the progress in terms of the percentage of the reconstruction that
     is completed.  When the reconstruction is finished the -s option will
     show:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd4e: optimal
                      /dev/sd3e: optimal
           No spares.
           [...]
           Parity status: clean
           Reconstruction is 100% complete.
           Parity Re-write is 100% complete.
           Copyback is 100% complete.

     as /dev/sd2e has been removed and replaced with /dev/sd4e.
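
     For monitoring, the component list in the status output can be reduced
     to the entries that are not `optimal'.  This sketch assumes the layout
     of the status output shown above; degraded_components is a
     hypothetical helper:

```shell
# Sketch: list components whose state is not "optimal", reading the
# output of raidctl -s on standard input.  Assumes the layout shown above.
degraded_components() {
    awk '
        $1 == "Components:" { in_comp = 1; next }
        # Anything that is not a "name: state" line ends the component list
        # (for example "Spares:" or "No spares.").
        in_comp && (NF != 2 || $1 !~ /:$/) { in_comp = 0 }
        in_comp && $2 != "optimal" { sub(/:$/, "", $1); print $1, $2 }
    '
}

degraded_components <<'EOF'
Components:
           /dev/sd1e: optimal
           /dev/sd2e: reconstructing
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: used_spare
EOF
# prints "/dev/sd2e reconstructing"
```

     An empty result means every component is optimal.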

     If a component fails and there are no hot spares available on-line, the
     status of the RAID set might (in part) look like:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: failed
                      /dev/sd3e: optimal
           No spares.

     In this case there are a number of options.  The first option is to add a
     hot spare using:

           raidctl -a /dev/sd4e raid0

     After the hot add, the status would then be:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: failed
                      /dev/sd3e: optimal
           Spares:
                      /dev/sd4e: spare

     Reconstruction could then take place using -F as described above.

     A second option is to rebuild directly onto /dev/sd2e.  Once the disk
     containing /dev/sd2e has been replaced, one can simply use:

           raidctl -R /dev/sd2e raid0

     to rebuild the /dev/sd2e component.  As the rebuilding is in progress,
     the status will be:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: reconstructing
                      /dev/sd3e: optimal
           No spares.

     and when completed, will be:

           Components:
                      /dev/sd1e: optimal
                      /dev/sd2e: optimal
                      /dev/sd3e: optimal
           No spares.

     In circumstances where a particular component is completely unavailable
     after a reboot, a special component name will be used to indicate the
     missing component.  For example:

           Components:
                      /dev/sd2e: optimal
                     component1: failed
           No spares.

     indicates that the second component of this RAID set was not detected at
     all by the auto-configuration code.  The name `component1' can be used
     anywhere a normal component name would be used.  For example, to add a
     hot spare to the above set, and rebuild to that hot spare, the following
     could be done:

           raidctl -a /dev/sd3e raid0
           raidctl -F component1 raid0

     at which point the data missing from `component1' would be reconstructed
     onto /dev/sd3e.
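
     A script that reacts to failures needs the names of the failed
     components, including placeholder names such as `component1'.  The
     following is a minimal sketch that assumes the `name: state' layout shown
     in the status examples above.  The helper name failed_components is
     hypothetical:

```shell
# Print the name of every component whose state is "failed".
# Assumes status lines of the form "<name>: <state>".
failed_components() {
    sed -n 's/^[[:space:]]*\([^[:space:]:][^[:space:]:]*\):[[:space:]]*failed[[:space:]]*$/\1/p'
}
```

     It might be used as:

           raidctl -s raid0 | failed_components

     Each name printed can then be given to -F or -R as described above.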

     When more than one component is marked as `failed' due to a non-component
     hardware failure (e.g., loss of power to two components, adapter prob-
     lems, termination problems, or cabling issues) it is quite possible to
     recover the data on the RAID set.  The first thing to be aware of is that
     the first disk to fail will almost certainly be out-of-sync with the
     remainder of the array.  If any IO was performed between the time the
     first component is considered `failed' and when the second component is
     considered `failed', then the first component to fail will not contain
     correct data, and should be ignored.  When the second component is marked
     as failed, however, the RAID device will (currently) panic the system.
     At this point the data on the RAID set (not including the first failed
     component) is still self-consistent, and will be in no worse state of
     repair than had the power gone out in the middle of a write to a file
     system on a non-RAID device.  The problem, however, is that the component
     labels may now have 3 different `modification counters' (one value on the
     first component that failed, one value on the second component that
     failed, and a third value on the remaining components).  In such a situa-
     tion, the RAID set will not autoconfigure, and can only be forcibly re-
     configured with the -C option.  To recover the RAID set, one must first
     remedy whatever physical problem caused the multiple-component failure.
     After that is done, the RAID set can be restored by forcibly configuring
     the RAID set without the component that failed first.  For example, if
     /dev/sd1e and /dev/sd2e fail (in that order) in a RAID set of the follow-
     ing configuration:

           START array
           4 0

           START disks
           /dev/sd1e
           /dev/sd2e
           /dev/sd3e
           /dev/sd4e

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
           64 1 1 5

           START queue
           fifo 100


     then the following configuration (say "recover_raid0.conf")

           START array
           4 0

           START disks
           absent
           /dev/sd2e
           /dev/sd3e
           /dev/sd4e

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
           64 1 1 5

           START queue
           fifo 100

     can be used with

           raidctl -C recover_raid0.conf raid0

     to force the configuration of raid0.  A

           raidctl -I 12345 raid0

     will be required in order to synchronize the component labels.  At this
     point the file systems on the RAID set can then be checked and corrected.
     To complete the re-construction of the RAID set, /dev/sd1e is simply hot-
     added back into the array, and reconstructed as described earlier.
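
     The recovery configuration differs from the original only in that the
     component that failed first is replaced by the word `absent' in the disks
     section.  Rather than editing by hand, the file can be derived from the
     original.  The following is a minimal sketch; the helper name mark_absent
     is hypothetical, and it assumes each component is listed on its own line:

```shell
# Copy a RAID configuration from standard input to standard output,
# replacing the component named in $1 with "absent".
# Only lines inside the "START disks" section are considered.
mark_absent() {
    awk -v dev="$1" '
        $1 == "START" { in_disks = ($2 == "disks") }
        in_disks && $1 == dev { print "absent"; next }
        { print }
    '
}
```

     For the example above it might be used as:

           mark_absent /dev/sd1e < raid0.conf > recover_raid0.conf

     The result should be inspected before it is used with -C.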

   RAID on RAID
     RAID sets can be layered to create more complex and much larger RAID
     sets.  A RAID 0 set, for example, could be constructed from four RAID 5
     sets.  The following configuration file shows such a setup:

           START array
           # numCol numSpare
           4 0

           START disks
           /dev/raid1e
           /dev/raid2e
           /dev/raid3e
           /dev/raid4e

           START layout
           # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
           128 1 1 0

           START queue
           fifo 100

     A similar configuration file might be used for a RAID 0 set constructed
     from components on RAID 1 sets.  In such a configuration, the mirroring
     provides a high degree of redundancy, while the striping provides addi-
     tional speed benefits.

   Auto-configuration and Root on RAID
     RAID sets can also be auto-configured at boot.  To make a set auto-con-
     figurable, simply prepare the RAID set as above, and then do a:

           raidctl -A yes raid0

     to turn on auto-configuration for that set.  To turn off auto-configura-
     tion, use:

           raidctl -A no raid0

     RAID sets which are auto-configurable will be configured before the root
     file system is mounted.  These RAID sets are thus available for use as a
     root file system, or for any other file system.  A primary advantage of
     using auto-configuration is that RAID components become more independent
     of the disks they reside on.  For example, SCSI IDs can change, but
     auto-configured sets will always be configured correctly, even if the
     SCSI IDs of the component disks have become scrambled.

     Having a system's root file system (/) on a RAID set is also allowed,
     with the `a' partition of such a RAID set being used for /.  To use
     raid0a as the root file system, simply use:

           raidctl -A forceroot raid0

     To return raid0 to being an ordinary auto-configuring set that is no
     longer forced to be root, use the -A yes arguments.

     Note that kernels can only be directly read from RAID 1 components on
     architectures that support that (currently alpha, i386, pmax, sandpoint,
     sparc, sparc64, and vax).  On those architectures, the FS_RAID file sys-
     tem is recognized by the bootblocks, and will properly load the kernel
     directly from a RAID 1 component.  For other architectures, or to support
     the root file system on other RAID sets, some other mechanism must be
     used to get a kernel booting.  For example, a small partition containing
     only the secondary boot-blocks and an alternate kernel (or two) could be
     used.  Once a kernel is booting, however, and an auto-configuring RAID
     set is found that is eligible to be root, that RAID set will be auto-
     configured and used as the root device.  If two or more RAID sets claim
     to be root devices, then the user will be prompted to select the root
     device.  At this time, RAID 0, 1, 4, and 5 sets are all supported as root
     devices.

     A typical RAID 1 setup with root on RAID might be as follows:

     1.   wd0a - a small partition, which contains a complete, bootable, basic
          NetBSD installation.

     2.   wd1a - also contains a complete, bootable, basic NetBSD installa-
          tion.

     3.   wd0e and wd1e - a RAID 1 set, raid0, used for the root file system.

     4.   wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
          swap space.

     5.   wd0g and wd1g - a RAID 1 set, raid2, used for /usr, /home, or other
          data, if desired.

     6.   wd0h and wd1h - a RAID 1 set, raid3, if desired.

     RAID sets raid0, raid1, and raid2 are all marked as auto-configurable.
     raid0 is marked as being a root file system.  When new kernels are
     installed, the kernel is not only copied to /, but also to wd0a and wd1a.
     The kernel on wd0a is required, since that is the kernel the system boots
     from.  The kernel on wd1a is also required, since that will be the kernel
     used should wd0 fail.  The important point here is to have redundant
     copies of the kernel available, in the event that one of the drives fails.

     There is no requirement that the root file system be on the same disk as
     the kernel.  For example, obtaining the kernel from wd0a, and using sd0e
     and sd1e for raid0, and the root file system, is fine.  It is critical,
     however, that there be multiple kernels available, in the event of media
     failure.

     Multi-layered RAID devices (such as a RAID 0 set made up of RAID 1 sets)
     are not supported as root devices or auto-configurable devices at this
     point.  (Multi-layered RAID devices are supported in general, however, as
     mentioned earlier.)  Note that in order to enable component auto-detec-
     tion and auto-configuration of RAID devices, the line:

           options    RAID_AUTOCONFIG

     must be in the kernel configuration file.  See raid(4) for more details.

   Swapping on RAID
     A RAID device can be used as a swap device.  In order to ensure that a
     RAID device used as a swap device is correctly unconfigured when the
     system is shut down or rebooted, it is recommended that the line

           swapoff=YES

     be added to /etc/rc.conf.

   Unconfiguration
     The final operation performed by raidctl is to unconfigure a raid(4)
     device.  This is accomplished via a simple:

           raidctl -u raid0

     at which point the device is ready to be reconfigured.

   Performance Tuning
     Selection of the various parameter values which result in the best per-
     formance can be quite tricky, and often requires a bit of trial-and-error
     to get those values most appropriate for a given system.  A whole range
     of factors come into play, including:

     1.   Types of components (e.g., SCSI vs. IDE) and their bandwidth

     2.   Types of controller cards and their bandwidth

     3.   Distribution of components among controllers

     4.   IO bandwidth

     5.   File system access patterns

     6.   CPU speed

     As with most performance tuning, benchmarking under real-life loads may
     be the only way to measure expected performance.  Understanding some of
     the underlying technology is also useful in tuning.  The goal of this
     section is to provide pointers to those parameters which may make signif-
     icant differences in performance.

     For a RAID 1 set, a SectPerSU value of 64 or 128 is typically sufficient.
     Since data in a RAID 1 set is arranged in a linear fashion on each compo-
     nent, selecting an appropriate stripe size is somewhat less critical than
     it is for a RAID 5 set.  However, a stripe size that is too small will
     cause large IOs to be broken up into a number of smaller ones, hurting
     performance.  At the same time, a large stripe size may cause problems
     with concurrent accesses to stripes, which may also affect performance.
     Thus values in the range of 32 to 128 are often the most effective.

     Tuning RAID 5 sets is trickier.  In the best case, IO is presented to the
     RAID set one stripe at a time.  Since the entire stripe is available at
     the beginning of the IO, the parity of that stripe can be calculated
     before the stripe is written, and then the stripe data and parity can be
     written in parallel.  When the amount of data being written is less than
     a full stripe worth, the `small write' problem occurs.  Since a `small
     write' means only a portion of the stripe on the components is going to
     change, the data (and parity) on the components must be updated slightly
     differently.  First, the `old parity' and `old data' must be read from
     the components.  Then the new parity is constructed, using the new data
     to be written, and the old data and old parity.  Finally, the new data
     and new parity are written.  All this extra data shuffling results in a
     serious loss of performance, and is typically 2 to 4 times slower than a
     full stripe write (or read).  To combat this problem in the real world,
     it may be useful to ensure that stripe sizes are small enough that a
     `large IO' from the system will use exactly one large stripe write.  As
     is seen later, there are some file system dependencies which may come
     into play here as well.
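
     The read-modify-write sequence works because RAID 5 parity is the bitwise
     exclusive-or of the data in the stripe, so the old data can be cancelled
     out of the old parity and the new data folded in.  The following is a
     minimal sketch using shell arithmetic, in which a single integer stands
     in for a whole stripe unit.  The function names are illustrative and do
     not correspond to anything in RAIDframe:

```shell
# Parity for a full stripe write: exclusive-or of every data unit.
full_stripe_parity() {
    p=0
    for d in "$@"; do
        p=$(( p ^ d ))
    done
    echo "$p"
}

# Parity for a small write, computed from what must first be read back:
#   $1 = old parity, $2 = old data, $3 = new data
small_write_parity() {
    echo $(( $1 ^ $2 ^ $3 ))
}
```

     Both routes give the same parity, but the small write needs two reads
     (old data and old parity) before its two writes, which is where the
     performance loss comes from.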

     Since the size of a `large IO' is often (currently) only 32K or 64K, on a
     5-drive RAID 5 set it may be desirable to select a SectPerSU value of 16
     blocks (8K) or 32 blocks (16K).  Since there are 4 data stripe units per
     stripe, the maximum data per stripe is 64 blocks (32K) or 128 blocks
     (64K).  Again, empirical measurement will provide the best indicators of
     which values will yield better performance.
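
     The arithmetic above can be repeated for other set sizes.  The following
     is a minimal sketch that assumes 512-byte blocks, as implied by the
     figures above, and one parity stripe unit per stripe.  The function name
     raid5_stripe_kb is illustrative:

```shell
# Data held by one full RAID 5 stripe, in kilobytes.
#   $1 = SectPerSU (blocks per stripe unit)
#   $2 = number of components in the set
# One stripe unit in each stripe holds parity, so only $2 - 1 hold data.
# Assumes 512-byte blocks.
raid5_stripe_kb() {
    echo $(( $1 * ($2 - 1) * 512 / 1024 ))
}
```

     For the 5-drive set discussed above, `raid5_stripe_kb 16 5' gives 32 and
     `raid5_stripe_kb 32 5' gives 64, matching the 32K and 64K `large IO'
     sizes.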

     The parameters used for the file system are also critical to good perfor-
     mance.  For newfs(8), for example, increasing the block size to 32K or
     64K may improve performance dramatically.  As well, changing the cylin-
     ders-per-group parameter from 16 to 32 or higher is often not only neces-
     sary for larger file systems, but may also have positive performance
     implications.

   Summary
     Despite the length of this man page, configuring a RAID set is a
     relatively straightforward process.  The steps are as follows:

     1.   Use disklabel(8) to create the components (of type RAID).

     2.   Construct a RAID configuration file: e.g., raid0.conf

     3.   Configure the RAID set with:

                raidctl -C raid0.conf raid0

     4.   Initialize the component labels with:

                raidctl -I 123456 raid0

     5.   Initialize other important parts of the set with:

                raidctl -i raid0

     6.   Get the default label for the RAID set:

                disklabel raid0 > /tmp/label

     7.   Edit the label:

                vi /tmp/label

     8.   Put the new label on the RAID set:

                disklabel -R -r raid0 /tmp/label

     9.   Create the file system:

                newfs /dev/rraid0e

     10.  Mount the file system:

                mount /dev/raid0e /mnt

     11.  To re-configure the RAID set the next time it is needed, use:

                raidctl -c raid0.conf raid0

          Alternatively, put raid0.conf into /etc so that the set is
          configured automatically by the /etc/rc.d scripts.


SEE ALSO

     ccd(4), raid(4), rc(8)


HISTORY

     RAIDframe is a framework for rapid prototyping of RAID structures devel-
     oped by the folks at the Parallel Data Laboratory at Carnegie Mellon Uni-
     versity (CMU).  A more complete description of the internals and func-
     tionality of RAIDframe is found in the paper "RAIDframe: A Rapid Proto-
     typing Tool for RAID Systems", by William V. Courtright II, Garth Gibson,
     Mark Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
     Parallel Data Laboratory of Carnegie Mellon University.  The raidctl com-
     mand first appeared as a program in CMU's RAIDframe v1.1 distribution.
     This version of raidctl is a complete re-write, and first appeared in
     NetBSD 1.4.


COPYRIGHT

     The RAIDframe Copyright is as follows:

     Copyright (c) 1994-1996 Carnegie-Mellon University.
     All rights reserved.

     Permission to use, copy, modify and distribute this software and
     its documentation is hereby granted, provided that both the copyright
     notice and this permission notice appear in all copies of the
     software, derivative works or modified versions, and any portions
     thereof, and that both notices appear in supporting documentation.

     CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
     CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
     FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

     Carnegie Mellon requests users of this software to return to

      Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
      School of Computer Science
      Carnegie Mellon University
      Pittsburgh PA 15213-3890

     any improvements or extensions that they make and grant Carnegie the
     rights to redistribute these changes.


WARNINGS

     Certain RAID levels (1, 4, 5, 6, and others) can protect against some
     data loss due to component failure.  However, the loss of two components
     of a RAID 4 or 5 system, or the loss of a single component of a RAID 0
     system will result in the entire file system being lost.  RAID is NOT a
     substitute for good backup practices.

     Recomputation of parity MUST be performed whenever there is a chance that
     it may have been compromised.  This includes after system crashes, or
     before a RAID device has been used for the first time.  Failure to keep
     parity correct will be catastrophic should a component ever fail -- it is
     better to use RAID 0 and get the additional space and speed, than it is
     to use parity, but not keep the parity correct.  At least with RAID 0
     there is no perception of increased data security.

     When replacing a failed component of a RAID set, it is a good idea to
     zero out the first 64 blocks of the new component to ensure the RAIDframe
     driver doesn't erroneously detect a component label in the new component.
     This is particularly true on RAID 1 sets because there is at most one
     correct component label in a failed RAID 1 installation, and the RAID-
     frame driver picks the component label with the highest serial number and
     modification value as the authoritative source for the failed RAID set
     when choosing which component label to use to configure the RAID set.
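
     Zeroing the start of a replacement component can be done with dd(1).  The
     following is a minimal sketch that assumes 512-byte blocks.  The function
     name zero_label_area is illustrative, and the device name in the usage
     line is only an example.  The command destroys whatever is stored in the
     first 64 blocks of the target, so the target must be checked carefully
     before it is run:

```shell
# Overwrite the first 64 blocks of the named target with zeros.
# Assumes 512-byte blocks.  Any data in that region is destroyed.
# conv=notrunc leaves the rest of a regular file intact; it has no
# effect on a disk device.
zero_label_area() {
    dd if=/dev/zero of="$1" bs=512 count=64 conv=notrunc 2>/dev/null
}
```

     For a replacement component on sd4e it might be used as:

           zero_label_area /dev/rsd4e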


BUGS

     Hot-spare removal is currently not available.

NetBSD 10.1                   September 20, 2023                   NetBSD 10.1