Monitoring HDD S.M.A.R.T. with Zabbix on Windows

S.M.A.R.T. + Zabbix | Windows

I came across an article by the sysadmin Zerox about monitoring disk S.M.A.R.T. data with Zabbix, but I could not get it working by following his write-up. So here I describe my own experience of setting up this essential tool.

If you are here because you have started mining Chia, feel free to contact me (see my Contacts). I can offer to deploy this solution on your farms, and I already have experience doing so.

We will deploy the solution from GitHub. Essentially, this post is just a translation with a few explanations. 🙂

I have packed all the required components into an archive that you can download from Yandex.Disk (if the link is broken, leave a comment, send me an email, or look on GitHub).

Solution capabilities

The solution addresses the following tasks:

Preparing the Zabbix server

All you need to do is add this excellent template to your Zabbix.

Preparing the Zabbix agent on Windows

Installing smartmontools

Nothing unusual here: just install smartmontools like any other program. One caveat: I do not recommend changing the installation path. If you do, you will have to change it in both the agent config and the script.

Configuring the agent

Create a scripts folder and put the smartctl-disks-discovery.ps1 script in it.
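The discovery script ships in the archive, and its contents are not reproduced here. To show roughly what a disk-discovery step does, here is a minimal Python sketch. It is not the author's PowerShell script. It assumes smartmontools 7.0 or later, where `smartctl --scan -j` prints JSON containing a `devices` array. The LLD macro names `{#DISKNAME}` and `{#DISKTYPE}` are hypothetical.

```python
import json
import subprocess
from typing import Any


def to_lld(scan: dict[str, Any]) -> dict[str, list[dict[str, str]]]:
    """Turn parsed `smartctl --scan -j` output into Zabbix LLD JSON.

    The macro names {#DISKNAME} and {#DISKTYPE} are hypothetical examples.
    """
    rows = [
        {"{#DISKNAME}": dev.get("name", ""), "{#DISKTYPE}": dev.get("type", "")}
        for dev in scan.get("devices", [])
    ]
    # The {"data": [...]} wrapper is the classic LLD format; Zabbix 4.2+
    # also accepts a bare JSON array.
    return {"data": rows}


def discover(smartctl: str = "smartctl") -> str:
    """Run `smartctl --scan -j` and return LLD JSON as a string."""
    # smartctl's exit status is a bit mask, so stdout is parsed without check=True.
    result = subprocess.run(
        [smartctl, "--scan", "-j"], capture_output=True, text=True, check=False
    )
    return json.dumps(to_lld(json.loads(result.stdout)))
```

On a host where smartctl is installed, `print(discover(r"C:\Program Files\smartmontools\bin\smartctl.exe"))` would print the LLD JSON. That path is an assumption based on the installer's default location. Keeping the parsing in `to_lld` separate from the subprocess call means you can check it against saved scan output without any disks attached.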

Open zabbix_agentd.conf and edit it.

Then add a user parameter (a custom check).
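The exact lines to add come with the archive and are not shown here. For orientation, a Zabbix agent user parameter has the form `UserParameter=<key>,<command>`. The Python sketch below builds such a line for a PowerShell script. The key `smartctl.discovery` and the script path in the usage example are hypothetical placeholders, so take the real ones from the archive.

```python
def user_parameter(key: str, script_path: str) -> str:
    """Build a zabbix_agentd.conf UserParameter line that runs a PowerShell script.

    Both the key and the path are placeholders; use the values from the archive.
    """
    if "," in key:
        # The first comma separates the key from the command.
        raise ValueError("key must not contain a comma")
    command = (
        "powershell.exe -NoProfile -ExecutionPolicy Bypass "
        f'-File "{script_path}"'
    )
    return f"UserParameter={key},{command}"
```

For example, `user_parameter("smartctl.discovery", r"C:\zabbix\scripts\smartctl-disks-discovery.ps1")` yields a single line that can be pasted into zabbix_agentd.conf. `-ExecutionPolicy Bypass` is included because unsigned scripts are often blocked by the default execution policy.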

All that is left is to restart the agent and link the host to the template.

Data should arrive in about an hour. (For debugging you can shorten the discovery interval: I usually set it to 10 minutes by changing 1h to 10m. Just don't forget to change it back.)
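Zabbix intervals such as `1h` and `10m` use time suffixes (s, m, h, d, w). The illustrative Python sketch below converts them to seconds. It shows that the debug setting is 600 s instead of 3600 s, so discovery runs six times as often.

```python
import re

# Seconds per Zabbix time suffix; a bare number means seconds.
_SUFFIX_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def interval_seconds(value: str) -> int:
    """Convert an interval such as '1h' or '10m' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdw]?)", value.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {value!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIX_SECONDS[suffix or "s"]
```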


Result

With this, we have set up monitoring of SSD and HDD drives. The solution has worked well for me in production. For critical disks you can build informative graphs like this one. I like it 🙂


Troubleshooting

I ran into this problem when I had forgotten to install smartmontools.

Source

Zabbix + S.M.A.R.T.


S.M.A.R.T.

S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system included in computer hard disk drives (HDDs), solid-state drives (SSDs), and eMMC drives.

Available solutions

Also available for: 5.0

SMART by Zabbix agent 2

Overview

For Zabbix version: 5.4 and higher
The template for monitoring S.M.A.R.T. attributes of physical disks that works without any external scripts.
It collects metrics by Zabbix agent 2 version 5.0 and later with Smartmontools version 7.1 and later.
Disk discovery LLD rule finds all HDD, SSD, NVMe disks with S.M.A.R.T. enabled. Attribute discovery LLD rule finds all Vendor Specific Attributes
for each disk. If you want to skip some attributes, please set regular expressions with disk names in {$SMART.DISK.NAME.MATCHES}
and with attribute IDs in {$SMART.ATTRIBUTE.ID.MATCHES} macros on the host level.
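The macro-based filtering keeps only disks whose names match one regular expression and only attributes whose IDs match another. The Python sketch below imitates that behaviour outside Zabbix, for illustration only. The real matching happens in the template's LLD overrides, and the sample disk names and attribute IDs are made up.

```python
import re
from collections.abc import Iterable


def filter_discovered(
    disks: Iterable[str],
    attributes: Iterable[tuple[str, int]],
    disk_regex: str = ".*",
    attr_id_regex: str = ".*",
) -> tuple[list[str], list[tuple[str, int]]]:
    """Keep disks and (disk, attribute ID) pairs that match the given regexes.

    re.search is used, so an unanchored pattern may match anywhere in the
    value; anchor it with ^ and $ to require a whole-value match.
    """
    disk_re = re.compile(disk_regex)
    attr_re = re.compile(attr_id_regex)
    kept_disks = [d for d in disks if disk_re.search(d)]
    kept_attrs = [
        (d, a) for d, a in attributes
        if d in kept_disks and attr_re.search(str(a))
    ]
    return kept_disks, kept_attrs
```

For example, `filter_discovered(["sda", "sdb", "nvme0"], [("sda", 5), ("sda", 194), ("sdb", 5)], r"^sd", r"^(5|197|198)$")` keeps both SATA disks but only attribute 5. Attribute 5 is commonly the reallocated sector count.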

This template was tested on:

Setup

Install Zabbix agent 2 and Smartmontools 7.1 or later.

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

This macro is used in overrides of attribute discovery for filtering IDs. It can be overridden on the host or linked template level.

This macro is used in overrides of attribute discovery for filtering IDs. It can be overridden on the host or linked template level.

This macro is used for trigger expression. It can be overridden on the host or linked template level.

This macro is used for trigger expression. It can be overridden on the host or linked template level.

Template links

There are no template links in this template.

Discovery rules

Discovery SMART disks.
Type: ZABBIX_PASSIVE. Key: smart.disk.discovery

Discovery SMART Vendor Specific Attributes of disks.
Type: ZABBIX_PASSIVE. Key: smart.attribute.discovery

Items collected

Group: Zabbix_raw_items. SMART: Get attributes.
Type: DEPENDENT. Key: smart.disk.model[{#NAME}]

Type: DEPENDENT. Key: smart.disk.sn[{#NAME}]

Whether the disk has passed the SMART self-test.
Type: DEPENDENT. Key: smart.disk.test[{#NAME}]

Current drive temperature.
Type: DEPENDENT. Key: smart.disk.temperature[{#NAME}]

Count of hours in power-on state. The raw value of this attribute shows the total count of hours (or minutes, or seconds, depending on the manufacturer) in power-on state. "By default, the total expected lifetime of a hard disk in perfect condition is defined as 5 years (running every day and night on all days). This is equal to 1825 days in 24/7 mode or 43800 hours." On some pre-2005 drives, this raw value may advance erratically and/or "wrap around" (reset to zero periodically). https://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes
Type: DEPENDENT. Key: smart.disk.hours[{#NAME}]
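The lifetime figure quoted above is simple arithmetic: 5 years × 365 days = 1825 days, and 1825 × 24 = 43800 hours. The hedged Python sketch below relates a power-on reading to that nominal figure. Because some vendors report minutes or seconds, it takes the unit of the raw value as a parameter and does not assume hours.

```python
NOMINAL_YEARS = 5
NOMINAL_DAYS = NOMINAL_YEARS * 365   # 1825 days
NOMINAL_HOURS = NOMINAL_DAYS * 24    # 43800 hours


def lifetime_fraction(raw_value: float, unit_seconds: int = 3600) -> float:
    """Fraction of the nominal 5-year (24/7) lifetime already spent.

    unit_seconds is the unit of the raw value: 3600 for hours,
    60 for minutes, 1 for seconds.
    """
    hours = raw_value * unit_seconds / 3600
    return hours / NOMINAL_HOURS
```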

Contains a vendor-specific estimate of the percentage of NVM subsystem life used, based on the actual usage and the manufacturer’s prediction of NVM life. A value of 100 indicates that the estimated endurance of the NVM in the NVM subsystem has been consumed, but may not indicate an NVM subsystem failure. The value is allowed to exceed 100. Percentages greater than 254 shall be represented as 255. This value shall be updated once per power-on hour (when the controller is not in a sleep state).
Type: DEPENDENT. Key: smart.disk.percentage_used[{#NAME}]
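As described above, Percentage Used can exceed 100 and saturates at 255. The minimal Python sketch below shows one way to classify a reading using the 90% threshold from the template's NVMe trigger. The category labels are hypothetical.

```python
def classify_percentage_used(value: int) -> str:
    """Classify the NVMe 'Percentage Used' value (0..255, saturating at 255)."""
    if not 0 <= value <= 255:
        raise ValueError("Percentage Used must be in 0..255")
    if value >= 100:
        # Estimated endurance is consumed; this is not necessarily a failure.
        return "endurance consumed"
    if value > 90:
        # Same condition as the template trigger: last(...)>90
        return "over 90%"
    return "ok"
```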

This field indicates critical warnings for the state of the controller.
Type: DEPENDENT. Key: smart.disk.critical_warning[{#NAME}]
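The critical warning value is a bit field. In the NVMe base specification, bits 0 to 4 flag, in order: low available spare, a temperature threshold condition, degraded NVM subsystem reliability, media placed in read-only mode, and a failed volatile memory backup device. The Python sketch below decodes those bits. Later revisions of the specification define additional bits, and this sketch reports those only as unknown.

```python
CRITICAL_WARNING_BITS = {
    0: "available spare below threshold",
    1: "temperature threshold condition",
    2: "NVM subsystem reliability degraded",
    3: "media placed in read-only mode",
    4: "volatile memory backup device failed",
}


def decode_critical_warning(value: int) -> list[str]:
    """Describe every bit set in the NVMe critical warning byte."""
    flags = []
    for bit in range(8):
        if value & (1 << bit):
            flags.append(CRITICAL_WARNING_BITS.get(bit, f"unknown bit {bit}"))
    return flags
```

A non-zero value is the signal to look at. The decoded list says which condition the controller is reporting.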

Contains the number of occurrences where the controller detected an unrecovered data integrity error. Errors such as uncorrectable ECC, CRC checksum failure, or LBA tag mismatch are included in this field.
Type: DEPENDENT. Key: smart.disk.media_errors[{#NAME}]

Type: DEPENDENT. Key: smart.disk.error[{#NAME},{#ID}]

Type: DEPENDENT. Key: smart.disk.attr.raw[{#NAME},{#ID}]

Triggers

Name | Expression | Severity
SMART [{#NAME}]: Disk has been replaced (new serial number received) | last(/SMART by Zabbix agent 2/smart.disk.sn[{#NAME}],#1)<>last(/SMART by Zabbix agent 2/smart.disk.sn[{#NAME}],#2) and length(last(/SMART by Zabbix agent 2/smart.disk.sn[{#NAME}]))>0 | INFO
| last(/SMART by Zabbix agent 2/smart.disk.test[{#NAME}])="false" | HIGH
SMART [{#NAME}]: Average disk temperature is too high (over {$SMART.TEMPERATURE.MAX.WARN}°C for 5m) | avg(/SMART by Zabbix agent 2/smart.disk.temperature[{#NAME}],5m)>{$SMART.TEMPERATURE.MAX.WARN} | AVERAGE
SMART [{#NAME}]: NVMe disk percentage using is over 90% of estimated endurance | last(/SMART by Zabbix agent 2/smart.disk.percentage_used[{#NAME}])>90 | AVERAGE
SMART [{#NAME}]: Attribute {#ID} {#ATTRNAME} is failed | last(/SMART by Zabbix agent 2/smart.disk.error[{#NAME},{#ID}]) | WARNING

Dependencies and additional info:
Disk has been replaced: Device serial number has changed. Ack to close.
Average disk temperature is too high: depends on SMART [{#NAME}]: Average disk temperature is critical (over {$SMART.TEMPERATURE.MAX.CRIT}°C for 5m).
Attribute {#ID} {#ATTRNAME} is failed: The value should be greater than THRESH.
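The Python sketch below re-implements three of these checks over plain lists of values, to show what each one tests:

- the serial-number comparison of the two latest values;
- the 5-minute average temperature compared with a warning threshold;
- the attribute rule described as "the value should be greater than THRESH".

It is illustrative only: Zabbix evaluates the real expressions itself, and the function names are hypothetical.

```python
from statistics import mean


def disk_replaced(serials: list[str]) -> bool:
    """Latest serial (#1) differs from the previous one (#2) and is non-empty."""
    if len(serials) < 2:
        return False
    latest, previous = serials[-1], serials[-2]
    return latest != previous and len(latest) > 0


def temperature_too_high(samples: list[tuple[float, float]], now: float,
                         warn: float, window: float = 300.0) -> bool:
    """Average of (timestamp, °C) samples from the last `window` seconds exceeds warn."""
    recent = [temp for ts, temp in samples if now - window < ts <= now]
    return bool(recent) and mean(recent) > warn


def attribute_failed(value: int, thresh: int) -> bool:
    """A SMART attribute is healthy while its normalized value is greater than THRESH."""
    return value <= thresh
```

The warning threshold is passed in as a parameter. In the template it comes from {$SMART.TEMPERATURE.MAX.WARN}, so any number used in a call such as `temperature_too_high(samples, now, warn=50)` is a hypothetical example value.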

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help with it on the ZABBIX forums.


Also available for: 5.4

Template Module SMART by Zabbix agent 2

Overview

For Zabbix version: 5.0 and higher
The template for monitoring S.M.A.R.T. attributes of physical disks that works without any external scripts.
It collects metrics by Zabbix agent 2 version 5.0 and later with Smartmontools version 7.1 and later.
Disk discovery LLD rule finds all HDD, SSD, NVMe disks with S.M.A.R.T. enabled. Attribute discovery LLD rule finds all Vendor Specific Attributes
for each disk. If you want to skip some attributes, please set regular expressions with disk names in {$SMART.DISK.NAME.MATCHES}
and with attribute IDs in {$SMART.ATTRIBUTE.ID.MATCHES} macros on the host level.

This template was tested on:

Setup

Install Zabbix agent 2 and Smartmontools 7.1 or later.

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

This macro is used in overrides of attribute discovery for filtering IDs. It can be overridden on the host or linked template level.

This macro is used in overrides of attribute discovery for filtering IDs. It can be overridden on the host or linked template level.

This macro is used for trigger expression. It can be overridden on the host or linked template level.

This macro is used for trigger expression. It can be overridden on the host or linked template level.

Template links

There are no template links in this template.

Discovery rules

Discovery SMART disks.
Type: ZABBIX_PASSIVE. Key: smart.disk.discovery

Discovery SMART Vendor Specific Attributes of disks.
Type: ZABBIX_PASSIVE. Key: smart.attribute.discovery

Items collected

Group: Zabbix_raw_items. SMART: Get attributes.
Type: DEPENDENT. Key: smart.disk.model[{#NAME}]

Type: DEPENDENT. Key: smart.disk.sn[{#NAME}]

Whether the disk has passed the SMART self-test.
Type: DEPENDENT. Key: smart.disk.test[{#NAME}]

Current drive temperature.
Type: DEPENDENT. Key: smart.disk.temperature[{#NAME}]

Count of hours in power-on state. The raw value of this attribute shows the total count of hours (or minutes, or seconds, depending on the manufacturer) in power-on state. "By default, the total expected lifetime of a hard disk in perfect condition is defined as 5 years (running every day and night on all days). This is equal to 1825 days in 24/7 mode or 43800 hours." On some pre-2005 drives, this raw value may advance erratically and/or "wrap around" (reset to zero periodically). https://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes
Type: DEPENDENT. Key: smart.disk.hours[{#NAME}]

Contains a vendor-specific estimate of the percentage of NVM subsystem life used, based on the actual usage and the manufacturer’s prediction of NVM life. A value of 100 indicates that the estimated endurance of the NVM in the NVM subsystem has been consumed, but may not indicate an NVM subsystem failure. The value is allowed to exceed 100. Percentages greater than 254 shall be represented as 255. This value shall be updated once per power-on hour (when the controller is not in a sleep state).
Type: DEPENDENT. Key: smart.disk.percentage_used[{#NAME}]

This field indicates critical warnings for the state of the controller.
Type: DEPENDENT. Key: smart.disk.critical_warning[{#NAME}]

Contains the number of occurrences where the controller detected an unrecovered data integrity error. Errors such as uncorrectable ECC, CRC checksum failure, or LBA tag mismatch are included in this field.
Type: DEPENDENT. Key: smart.disk.media_errors[{#NAME}]

Type: DEPENDENT. Key: smart.disk.error[{#NAME},{#ID}]

Type: DEPENDENT. Key: smart.disk.attr.raw[{#NAME},{#ID}]

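Triggers

SMART [{#NAME}]: Disk has been replaced (new serial number received)
SMART [{#NAME}]: Average disk temperature is too high (over {$SMART.TEMPERATURE.MAX.WARN}°C for 5m), severity AVERAGE
SMART [{#NAME}]: NVMe disk percentage using is over 90% of estimated endurance

Dependencies and additional info:
Disk has been replaced: Device serial number has changed. Ack to close.
Average disk temperature is too high: depends on SMART [{#NAME}]: Average disk temperature is critical (over {$SMART.TEMPERATURE.MAX.CRIT}°C for 5m).
Attribute failure trigger: The value should be greater than THRESH.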

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help with it on the ZABBIX forums.

Source

