Merge pull request ceph#51572 from zdover23/wip-doc-2023-04-19-rados-operations-devices

doc/rados: line-edit devices.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
zdover23 committed May 18, 2023
2 parents 0345083 + 8d589b4 commit 0b7d770
Showing 1 changed file with 17 additions and 19 deletions: doc/rados/operations/devices.rst
@@ -27,7 +27,7 @@ the following forms:
ceph device ls-by-daemon <daemon>
ceph device ls-by-host <host>
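
For example, hypothetical invocations (the daemon and host names below are
placeholders, not part of this change) might look like this::

   ceph device ls-by-daemon osd.0
   ceph device ls-by-host node1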

-To see information about the location of an individual device and about how the
+To see information about the location of a specific device and about how the
device is being consumed, run a command of the following form:

.. prompt:: bash $
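
   # The command body falls outside this hunk; a sketch, assuming the standard
   # "ceph device info" subcommand reports a device's location and consumers:
   ceph device info <devid>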
@@ -43,9 +43,9 @@ command of the following form::

device light on|off <devid> [ident|fault] [--force]

-.. note:: In some situations (depending on your kernel revision or your SES
-   firmware or the setup of your HBA), using this command to blink the lights
-   will not work.
+.. note:: Using this command to blink the lights might not work. Whether it
+   works will depend upon such factors as your kernel revision, your SES
+   firmware, or the setup of your HBA.
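
As a purely hypothetical illustration (the device id below is a placeholder),
turning a drive's identification LED on and then off might look like this::

   ceph device light on SEAGATE_ST12000NM0007_ZJV0XXXX ident
   ceph device light off SEAGATE_ST12000NM0007_ZJV0XXXX ident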

The ``<devid>`` parameter is the device identification. To retrieve this
information, run the following command:
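
The command itself falls outside this hunk; assuming the standard listing
subcommand, it would be::

   ceph device ls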
@@ -118,8 +118,7 @@ form:

By default, device metrics are scraped once every 24 hours.
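
To poll on a different schedule, the interval can be changed; a sketch,
assuming the standard ``mgr/devicehealth/scrape_frequency`` module option
(value in seconds)::

   ceph config set mgr mgr/devicehealth/scrape_frequency 43200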


-To manually scrape all devices , run the following command:
+To manually scrape all devices, run the following command:

.. prompt:: bash $
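
   # Not shown in this hunk; a sketch, assuming the standard
   # scrape-health-metrics subcommand (append a <devid> to scrape one device):
   ceph device scrape-health-metrics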

@@ -151,7 +150,7 @@ Ceph can predict drive life expectancy and device failures by analyzing the
health metrics that it collects. The prediction modes are as follows:

* *none*: disable device failure prediction.
-* *local*: use a pre-trained prediction model from the ``ceph-mgr`` daemon
+* *local*: use a pre-trained prediction model from the ``ceph-mgr`` daemon.

To configure the prediction mode, run a command of the following form:
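
The command is outside this hunk; as a sketch, assuming the standard
``device_failure_prediction_mode`` option, enabling the local predictor would
look something like this::

   ceph config set global device_failure_prediction_mode local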

@@ -174,16 +173,16 @@ To see the metadata of a specific device, run a command of the following form:

ceph device info <devid>

-To explicitly force prediction of a device's life expectancy, run a command of
-the following form:
+To explicitly force prediction of a specific device's life expectancy, run a
+command of the following form:

.. prompt:: bash $

ceph device predict-life-expectancy <devid>

In addition to Ceph's internal device failure prediction, you might have an
external source of information about device failures. To inform Ceph of a
-device's life expectancy, run a command of the following form:
+specific device's life expectancy, run a command of the following form:

.. prompt:: bash $
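
   # The command body is outside this hunk; a sketch, assuming the standard
   # set-life-expectancy subcommand (<from> and <to> are dates; <to> is optional):
   ceph device set-life-expectancy <devid> <from> [<to>]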

@@ -198,8 +197,8 @@ Health alerts
-------------

The ``mgr/devicehealth/warn_threshold`` configuration option controls the
-health check for an expected device failure. If the device failure is expected
-to occur within the specified time interval, an alert is raised.
+health check for an expected device failure. If the device is expected to fail
+within the specified time interval, an alert is raised.
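
A sketch of adjusting this threshold, assuming it is set on the manager like
other ``devicehealth`` module options (the value is in seconds; two weeks here
is only an example)::

   ceph config set mgr mgr/devicehealth/warn_threshold 1209600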

To check the stored life expectancy of all devices and generate any appropriate
health alert, run the following command:
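
The command is not shown in this hunk; assuming the standard subcommand, it
would be::

   ceph device check-health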
@@ -216,13 +215,12 @@ migrates data away from devices that are expected to fail soon. If this option
is enabled, the module marks such devices ``out`` so that automatic migration
will occur.
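
A sketch of toggling this behavior, assuming the standard
``mgr/devicehealth/self_heal`` module option::

   ceph config set mgr mgr/devicehealth/self_heal false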

-.. note:: The ``mon_osd_min_up_ratio`` can help to prevent this process from
-   cascading to total failure. In a situation in which the "self heal" module
-   marks out a number of OSDs sufficient to exceed the ratio set by
-   ``mon_osd_min_up_ratio``, the cluster raises the ``DEVICE_HEALTH_TOOMANY``
-   health state. See
-   :ref:`DEVICE_HEALTH_TOOMANY<rados_health_checks_device_health_toomany>` for
-   instructions on what to do in this situation.
+.. note:: The ``mon_osd_min_up_ratio`` configuration option can help prevent
+   this process from cascading to total failure. If the "self heal" module
+   marks ``out`` so many OSDs that the ratio value of ``mon_osd_min_up_ratio``
+   is exceeded, then the cluster raises the ``DEVICE_HEALTH_TOOMANY`` health
+   check. For instructions on what to do in this situation, see
+   :ref:`DEVICE_HEALTH_TOOMANY<rados_health_checks_device_health_toomany>`.

The ``mgr/devicehealth/mark_out_threshold`` configuration option specifies the
time interval for automatic migration. If a device is expected to fail within
