Prometheus and Grafana: A Complete Tutorial

This article was created (or last updated) on 09/12/2021, so keep in mind that parts of it may be out of date.

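My company recently started working with these two tools, and I also came across a map showing which alerting frameworks are popular in which regions. It turned out to be quite interesting: Asia mostly favors Zabbix, while Europe and North America lean toward Prometheus. I had tried Prometheus before without really understanding it; now I have the basics down. The harder part is writing the monitoring queries yourself. With Zabbix, knowing shell is enough; here, as far as I understand so far, monitoring is built from query expressions sent to an API. Custom metrics are possible too: you just push the values to an API. Based on my experience so far, for personal alerting on a handful of servers it is still worth trying NETDATA; when I set this up on my own server (2 GB / 1 core), the machine could not keep up.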

Overview

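  • I run everything in Docker. Here is what each component does:

| Component | Purpose | On monitored host | On display host | Notes |
|---|---|---|---|---|
| Node Exporter | Collects host hardware and operating system information | YES | NO | Host information |
| cAdvisor | Collects information about the containers running on the host | YES | NO | Docker metrics collection |
| Prometheus Server | Main Prometheus monitoring server | NO | NO | Collects and stores the data from the two components above and serves it to Grafana; it can be installed on any machine. |
| Grafana | Displays the Prometheus monitoring dashboards | NO | YES | Visualizes the data |
| Alertmanager | Sends alerts | Optional | NO | Alerting can also be configured in Grafana, but Alertmanager is somewhat better |
| Pushgateway | Custom alerts | Only if custom metrics are needed | NO | Custom metrics |

  • Pay attention to how the components relate to each other, which ports they use, and how they are configured (note that localhost inside a container cannot reach services outside the container).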
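
To keep the port mappings straight, here is a minimal Python sketch that lists the host-side ports published by the docker commands in this tutorial and builds the `host:port` strings that Prometheus would scrape. The names `HOST_PORTS` and `scrape_targets` are made up for illustration, and `192.0.2.10` is only a placeholder address.

```python
# Host-side ports as published by the docker commands in this tutorial.
HOST_PORTS = {
    "node-exporter": 90,     # container port 9100
    "cadvisor": 80,          # container port 8080
    "prometheus": 9090,
    "grafana": 3000,
    "alertmanager": 59093,   # container port 9093
    "pushgateway": 59091,    # container port 9091
}


def scrape_targets(host, components=("cadvisor", "node-exporter")):
    """Return 'host:port' strings for the given components (hypothetical helper)."""
    return [f"{host}:{HOST_PORTS[name]}" for name in components]


# Use the machine's real address rather than localhost when Prometheus runs in a container.
print(scrape_targets("192.0.2.10"))
```

The resulting strings are what would go into a `targets:` list in prometheus.yml.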

Node Exporter

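  • Install
# Host port 90 is mapped to the exporter's default port 9100.
# The --path.* flags point the exporter at the mounted host paths; without them the mounts go unused.
docker run -d -p 90:9100 \
-v "/proc:/host/proc:ro" \
-v "/sys:/host/sys:ro" \
-v "/:/rootfs:ro" \
-v "/etc/localtime:/etc/localtime" \
--name=node-exporter \
prom/node-exporter \
--path.procfs=/host/proc \
--path.sysfs=/host/sys \
--path.rootfs=/rootfs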
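
Once the container is running, a quick way to confirm the exporter works is to read its `/metrics` page. The sketch below is a simplified reader for the Prometheus text format, using only the standard library; `parse_samples` and `fetch_metrics` are illustrative names, and the URL assumes the port mapping (90) from the command above.

```python
import urllib.request


def parse_samples(text):
    """Parse a simplified subset of the Prometheus text format into (series, value) pairs.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    Optional trailing timestamps are not handled in this sketch.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is whatever follows the last space on the line.
        series, _, value = line.rpartition(" ")
        samples.append((series, float(value)))
    return samples


def fetch_metrics(url="http://127.0.0.1:90/metrics", timeout=5):
    """Fetch the raw metrics page from a running Node Exporter."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

# Usage against a running exporter:
#   for series, value in parse_samples(fetch_metrics())[:10]:
#       print(series, value)
```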

cAdvisor

  • Install
# Host port 80 is mapped to cAdvisor's default port 8080.
# Note: newer cAdvisor releases are published as gcr.io/cadvisor/cadvisor.
docker run -d \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--publish=80:8080 \
--name=cadvisor \
-v "/etc/localtime:/etc/localtime" \
google/cadvisor:latest

Prometheus Server

  • Prometheus configuration file
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).


# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093


# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  # - /etc/prometheus/alert_rules.yml
  - /etc/prometheus/alert_rules.yml

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'


    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.


    static_configs:
    # Addresses to scrape (inside a container, localhost is the container itself, not the host)
    - targets: ['localhost:80','localhost:90']

  - job_name: 'mail-base'
    static_configs:
    - targets: ['xxx.xxx.xxx.xxx:80','xxx.xxx.xxx.xxx:90']

  - job_name: 'mail-docker'
    static_configs:
    - targets: ['xxx.xxx.xxx.xxx:80','xxx.xxx.xxx.xxx:90']
  • Alert rules file
groups:
- name: ali
  rules:


  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."


  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
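
The `InstanceDown` rule above is driven by the `up` metric. As a rough way to check the same condition by hand, the sketch below runs the instant query `up` through the Prometheus HTTP API (`/api/v1/query`) and lists the targets whose value is 0. The function names are illustrative, the base URL assumes Prometheus is reachable on port 9090, and the `for: 5m` waiting period of the rule is not modelled.

```python
import json
import urllib.parse
import urllib.request


def down_instances(api_response):
    """Return (job, instance) pairs whose `up` sample is 0 in an instant-query response."""
    down = []
    for item in api_response["data"]["result"]:
        _timestamp, value = item["value"]  # the value arrives as a string
        if float(value) == 0:
            labels = item["metric"]
            down.append((labels.get("job"), labels.get("instance")))
    return down


def query_up(base_url="http://127.0.0.1:9090", timeout=5):
    """Run the instant query `up` against the Prometheus HTTP API."""
    url = base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

# Usage against a running server: print(down_instances(query_up()))
```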
  • Install
docker run  -d \
-p 9090:9090 \
-v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml  \
-v /etc/prometheus/alert_rules.yml:/etc/prometheus/alert_rules.yml \
--name prometheus \
prom/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--web.enable-lifecycle
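
Because the server is started with `--web.enable-lifecycle`, the configuration can be reloaded over HTTP by sending a POST to `/-/reload` instead of restarting the container. A minimal sketch, assuming Prometheus is reachable on `127.0.0.1:9090`; the helper names are made up for illustration.

```python
import urllib.request


def build_reload_request(base_url="http://127.0.0.1:9090"):
    """Build the POST request for the /-/reload lifecycle endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/-/reload", data=b"", method="POST"
    )


def reload_prometheus(base_url="http://127.0.0.1:9090", timeout=10):
    """Ask Prometheus to re-read its configuration; returns the HTTP status code."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status

# Usage after editing prometheus.yml or alert_rules.yml: reload_prometheus()
```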

Grafana

  • Create the data directory and grant permissions (without them Grafana will not start)
mkdir /etc/grafana
chmod 777 /etc/grafana
  • Install
docker run -d \
-p 3000:3000 \
--name=grafana \
-v /etc/grafana:/var/lib/grafana \
grafana/grafana

Alertmanager

  • Configuration file
global:
  resolve_timeout: 5m
  smtp_smarthost: 'xxxxxx.emperinter.info:465'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'xxxxxxxxxxx^'
  smtp_require_tls: false

route:
  receiver: team-test-mails
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 2m


receivers:
  - name: 'team-test-mails'
    email_configs:
    - to: '[email protected]'
      send_resolved: true
  • Install
docker run -d -p 59093:9093 --name Alertmanager -v /etc/prometheus/alertmanager.yml:/etc/alertmanager/alertmanager.yml docker.io/prom/alertmanager:latest
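
To check that mail delivery works without waiting for a real outage, a test alert can be posted straight to Alertmanager's v2 API (`/api/v2/alerts`). The sketch below is only an illustration: the alert name, severity and summary are invented, and the base URL assumes the 59093 port mapping from the command above.

```python
import json
import urllib.request


def build_test_alert(alertname="TestAlert", severity="page", summary="Manual test alert"):
    """Return the JSON body (a list with one alert) for POST /api/v2/alerts."""
    return [{
        "labels": {"alertname": alertname, "severity": severity},
        "annotations": {"summary": summary},
    }]


def send_test_alert(base_url="http://127.0.0.1:59093", timeout=10):
    """Post one hand-made alert to Alertmanager; returns the HTTP status code."""
    body = json.dumps(build_test_alert()).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/v2/alerts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status

# Usage against a running Alertmanager: send_test_alert()
```

With the route shown in the configuration file, the alert would be grouped by `alertname` and sent to `team-test-mails` after the 30s `group_wait`.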

Telegram alerts

This requires Alertmanager to be installed. Once everything is set up, start the bot by sending it the /start command.

docker run -d \
    -e 'ALERTMANAGER_URL=http://xxx.xxx.xxx.xxx:59093' \
    -e 'BOLT_PATH=/data/bot.db' \
    -e 'STORE=bolt' \
    -e 'TELEGRAM_ADMIN=1234567' \
    -e 'TELEGRAM_TOKEN=XXX' \
    -v '/srv/monitoring/alertmanager-bot:/data' \
    --name alertmanager-bot \
    metalmatze/alertmanager-bot:0.4.3

Pushgateway: custom alerts

Used for custom monitoring and alerting items.

Install

docker run -d --name pushgateway -p 59091:9091 --restart=always prom/pushgateway
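  • After installing, add the following job to the Prometheus configuration file and restart Prometheus:
  - job_name: 'pushgateway'
    static_configs:
    - targets: ['xxx.xxx.xxx.xxx:59091']
    honor_labels: true        # Keep the job/instance labels of the pushed metrics instead of overwriting them with the Pushgateway's own job and address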
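
The effect of `honor_labels` is easier to see with a small simulation. The sketch below mimics how Prometheus resolves a conflict between labels carried by the scraped metric and the labels it attaches to the scrape target: with `honor_labels: true` the scraped labels win, otherwise they are renamed with an `exported_` prefix. `merge_labels` is a hypothetical helper written for this illustration, not part of any Prometheus library.

```python
def merge_labels(scraped, target, honor_labels):
    """Combine labels from a scraped sample with the labels of the scrape target."""
    merged = dict(scraped)
    for name, value in target.items():
        if name in scraped:
            if honor_labels:
                continue  # keep the value that was pushed to the gateway
            # Otherwise the pushed value is preserved under an exported_ prefix.
            merged["exported_" + name] = scraped[name]
        merged[name] = value
    return merged


pushed = {"job": "docker_runtime", "instance": "xa-lsr-billubuntu", "log": "nginx"}
gateway = {"job": "pushgateway", "instance": "xxx.xxx.xxx.xxx:59091"}

print(merge_labels(pushed, gateway, honor_labels=True))
print(merge_labels(pushed, gateway, honor_labels=False))
```

With `honor_labels: true`, a query on `docker_runtime` returns the job and instance that were used in the push URL, which is usually what you want for a Pushgateway.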

Pushing custom metrics

  • Typical shell usage; afterwards the data can be queried under the name docker_runtime
cat <<EOF | curl --data-binary @- http://127.0.0.1:59091/metrics/job/docker_runtime/instance/xa-lsr-billubuntu
    # TYPE docker_runtime counter
    docker_runtime{log="aa bb cc cadvisor"} 33
    docker_runtime{log="nginx"} 331
    docker_runtime{log="abc"} 332
EOF


  • Python approach
#!/usr/bin/python3
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
# Gauge(metric_name, help_text, label_names, registry=registry)
g = Gauge('ping', 'Maximum response time', ['dst_ip', 'city'], registry=registry)
g.labels('192.168.1.10', 'shenzhen').set(42.2)  # set() assigns a value
g.labels('192.168.1.11', 'shenzhen').dec(2)     # dec() decrements by 2
g.labels('192.168.1.12', 'shenzhen').inc()      # inc() increments, by 1 by default
push_to_gateway('localhost:59091', job='ping_status', registry=registry)

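Copyright

Unless otherwise stated, all works on this blog are licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. When reposting, please credit the source:
https://www.emperinter.info/2021/09/12/prometheus-and-grafana/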