Introduction
Monitoring a Ceph Cluster
- Notes
Preparing Rook
- Enabling monitoring on the Ceph Cluster
- Creating the resources for Prometheus
Preparing Prometheus
Preparing Grafana
Viewing the Dashboards
Conclusion

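Introduction

This article is the Day 20 entry in the "Rook and Friends: Cloud-Native Storage" Advent Calendar 2020.

I had not originally planned to take part, but some bad grown-ups found me, and last year's Rook Advent Calendar helped me a great deal as a reader, so I ended up joining.

It's here again this year! Speaking of Rook, @AokiTenzen, @takutaka1220, and @zaki_hmkc are around too, so this year looks like it will be fun lol https://t.co/Euay7WUvCq
(tweet by こば（右）- Koba as a DB engineer (@tzkb), November 4, 2020)

In this article, I build a Rook Ceph monitoring environment on top of the Prometheus and Grafana environment that I set up on Kubernetes in my earlier article, "Building an NVIDIA GPU Monitoring Environment on Kubernetes with DCGM Exporter". See that article for how to set up Prometheus and Grafana on Kubernetes.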

Monitoring a Ceph Cluster

Ceph ships with its own dashboard, and it can of course be used with Rook Ceph as well.

github.com

However, if you already monitor other components with Prometheus + Grafana, it is very convenient to monitor the Ceph cluster with the same stack. The Rook documentation includes a page on monitoring that describes how to set this up.

github.com

This time I build the monitoring environment by following that document. Note that I do not set up notifications with Alertmanager here.

Notes

The prometheus-monitoring page of the official Rook documentation states that, when you use an existing Prometheus, annotations can be used to make Rook a scrape target.

If your cluster already contains a Prometheus instance, it will automatically discover Rooks scrape endpoint using the standard prometheus.io/scrape and prometheus.io/port annotations.
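For illustration, annotation-based discovery relies on pod metadata along the following lines. This is only a sketch of the convention named in the quote: the port value 9283 is an assumption (the default port of the Ceph mgr prometheus module), and the fragment is not copied from a Rook manifest.

```yaml
# Sketch only: annotations that annotation-based discovery looks for.
# The port value is an assumption, not taken from the article.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9283"
```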

However, the prometheusioscrape section of the Prometheus Operator documentation states that annotation-based scraping is not supported, and directs you to use PodMonitor or ServiceMonitor instead:

The prometheus operator does not support annotation-based discovery of services, using the PodMonitor or ServiceMonitor CRD in its place as they provide far more configuration options.

So this time I follow the Prometheus Operator documentation.
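To make the ServiceMonitor approach concrete, a ServiceMonitor for the Ceph mgr metrics endpoint might look roughly like the following. This is a minimal sketch, not the manifest shipped with Rook: the label values, the port name http-metrics, the path, and the scrape interval are all assumptions, and must match the actual rook-ceph-mgr Service in your cluster.

```yaml
# Minimal sketch of a ServiceMonitor (all label values and the port name are assumptions).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rook-ceph-mgr
  namespace: rook-ceph
spec:
  # Look for the target Service only in the rook-ceph namespace.
  namespaceSelector:
    matchNames:
      - rook-ceph
  # Select the Service by its labels (assumed here).
  selector:
    matchLabels:
      app: rook-ceph-mgr
  endpoints:
    # "port" refers to a named port on the Service, not a port number.
    - port: http-metrics
      path: /metrics
      interval: 30s
```

Unlike the annotation approach, the scrape target is declared in a separate resource, so the path, interval, and selectors can be changed without touching the Ceph pods.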

For the issues around the prometheus.io/scrape annotation, the following blog post by @kameneko1004 is a useful reference for anyone interested.

prometheus.io/scrape などのAnnotations について調べてみた | by kameneko | penguin-lab | Medium

Preparing Rook

Only four things need to be done on the Rook side:

1. Enable monitoring in the CephCluster custom resource
2. Create the RBAC rules for monitoring
3. Deploy the ServiceMonitor custom resource for rook-ceph-mgr
4. Deploy the ServiceMonitor custom resource for csi

Enabling monitoring on the Ceph Cluster

All that is needed is to change the following setting from false to true.

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
...(omitted)
  # enable prometheus alerting for cluster
  monitoring:
-    enabled: false
+    enabled: true
    rulesNamespace: rook-ceph
  network:

Apply it to the K8s cluster.

$ kubectl apply -f cluster.yaml

Creating the resources for Prometheus

Manifests for items 2, 3, and 4 are provided in the ceph/monitoring directory, so I use those.

$ # Create the RBAC rules
$ kubectl apply -f ceph/monitoring/rbac.yaml
$ # Create the ServiceMonitor custom resource for rook-ceph-mgr
$ kubectl apply -f ceph/monitoring/service-monitor.yaml
$ # Create the ServiceMonitor custom resource for csi
$ kubectl apply -f ceph/monitoring/csi-metrics-service-monitor.yaml

Preparing Prometheus

As I also noted in the earlier article on setting up Prometheus, set serviceMonitorSelectorNilUsesHelmValues: false and podMonitorSelectorNilUsesHelmValues: false in values.yaml, as described below, so that ServiceMonitors and PodMonitors in every K8s namespace can be discovered.

By default, Prometheus discovers PodMonitors and ServiceMonitors within its namespace, that are labeled with the same release tag as the prometheus-operator release. Sometimes, you may need to discover custom PodMonitors/ServiceMonitors, for example used to scrape data from third-party applications. An easy way of doing this, without compromising the default PodMonitors/ServiceMonitors discovery, is allowing Prometheus to discover all PodMonitors/ServiceMonitors within its namespace, without applying label filtering. To do so, you can set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues and prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues to false.

github.com

prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
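As a rough picture of what these two values do: with both flags set to false and no explicit selectors configured, the chart is expected to render empty selectors into the Prometheus custom resource, and an empty label selector matches every ServiceMonitor and PodMonitor regardless of labels. The fragment below is a sketch of the relevant fields only, written under that assumption; it is not the full rendered resource.

```yaml
# Sketch of the relevant part of the rendered Prometheus custom resource
# (assumes no explicit serviceMonitorSelector / podMonitorSelector in values.yaml).
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
spec:
  # An empty selector places no label requirement on ServiceMonitors.
  serviceMonitorSelector: {}
  # Likewise for PodMonitors.
  podMonitorSelector: {}
```

This is why the Rook ServiceMonitors are picked up even though they do not carry the Helm release label.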

Preparing Grafana

Several Ceph dashboards for Grafana are available; the following three are supported for Rook Ceph.

github.com

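Dashboards can be imported into Grafana through the GUI, but this time I add these supported dashboards to the manifest so that they are imported automatically when Grafana starts.

As the following document describes, dashboards published on Grafana Labs can be imported simply by specifying their ID and revision. Here I create a rook-ceph directory and import the three dashboards into it.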

github.com

Modify the Grafana values.yaml created in "Building an NVIDIA GPU Monitoring Environment on Kubernetes with DCGM Exporter" as follows.

grafana:
...(omitted)
+  dashboardProviders:
+   dashboardproviders.yaml:
+     apiVersion: 1
+     providers:
+     - name: 'rook-ceph'
+       orgId: 1
+       folder: 'rook-ceph'
+       type: file
+       disableDeletion: false
+       editable: true
+       options:
+         path: /var/lib/grafana/dashboards/rook-ceph
+  dashboards:
+    rook-ceph:
+      ceph-cluster:
+        gnetId: 2842
+        revision: 14
+        datasource: default
+      ceph-osd-single:
+        gnetId: 5336
+        revision: 5
+        datasource: default
+      ceph-pools:
+        gnetId: 5342
+        revision: 5
+        datasource: default
...(omitted)

Viewing the Dashboards

Once this is done, upgrade the kube-prometheus-stack Helm chart.

$ helm upgrade kube-prometheus prometheus-community/kube-prometheus-stack --values values.yaml

The Ceph dashboards have now been imported into /rook-ceph.

Selecting a dashboard shows that the data is being retrieved and displayed correctly.

grafana-rook-cephcluster-dashboard: CephCluster dashboard

grafana-rook-cephosd-dashboard: Ceph OSD dashboard

Conclusion

I hope this serves as a reference when monitoring a Rook Ceph Cluster with Prometheus + Grafana. I am also glad that this article was published in time for my assigned Advent Calendar day.

tenzenの生存日誌

Monitoring a Rook Ceph Cluster with Prometheus + Grafana