Administrator
Published 2025-10-10

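Storing and Querying Logs with VictoriaLogs

From ELK to Loki to VictoriaLogs: The Evolution of Log Management

In log management, the choice of stack directly affects performance, resource consumption, and operational complexity. The move from ELK (Elasticsearch, Logstash, Kibana) to Grafana Loki, and then to VictoriaLogs from VictoriaMetrics, reflects a search for more efficient and more lightweight solutions.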

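Limitations of ELK

ELK is the classic early log-management stack, but many teams have moved away from it because of its architectural complexity and resource consumption. Specifically:

  • High resource consumption: Elasticsearch needs a lot of memory and disk, and hardware costs climb quickly at large log volumes.

  • Ecosystem issues: the ELK ecosystem is rich, but its components are tightly coupled, configuration and maintenance are complex, and the learning curve is steep.

  • Limited scalability: as log volume grows, ELK's bottlenecks in querying and storage become increasingly apparent.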

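Strengths and Weaknesses of Loki

Grafana Loki is regarded as one alternative to ELK, valued mainly for its lightweight design and efficient log handling:

  • Relatively efficient: Loki does not build a full-text index; it indexes only log metadata (labels), which greatly reduces storage overhead.

  • Deep Grafana integration: Loki integrates seamlessly with Grafana, which makes log visualization convenient.

However, Loki's resource consumption still falls short when aggressive optimization is required, and its performance remains insufficient at very large log volumes.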

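Where VictoriaLogs Stands Out

VictoriaLogs is the log-management solution from VictoriaMetrics. It stands out for its resource efficiency and query performance:

  • Very low resource consumption: according to the VictoriaMetrics project, VictoriaLogs uses up to 30x less RAM and up to 15x less disk space than Elasticsearch and Loki, which makes it a good fit for large log volumes.

  • Strong query performance: according to the ClickBench benchmark, for analytical queries over a billion JSON documents VictoriaLogs is on average 9.3x faster than Elasticsearch, which is attributed to its storage and query engine design.

  • Lightweight architecture: the design is simple, deployment and maintenance costs are low, and it suits modern cloud-native environments.

  • Single-node architecture.

For more details, see the official VictoriaLogs documentation.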


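Deployment in Practice: Collecting Java Logs in Kubernetes

This section shows how to deploy VictoriaLogs in a Kubernetes environment and use Vector to collect logs from Java applications. Log collection for other languages or frameworks can follow the same approach with suitable adjustments.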

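1. Set the log format

To make it easy to extract the log level and to collect multi-line exception logs, standardize the log output format of your Java services. The recommended format is:

date time level message

Sample logs: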

2025-09-12 20:52:53.004 DEBUG 7 --- [   scheduling-1] c.m.p.mapper.AuthLogMapper.selectList    : ==> Parameters: ongoing(String)
2025-09-12 20:58:40.006  INFO 7 --- [   scheduling-1] com.mirayai.platform.task.AuthLoginTask  : Found 0 ongoing auth logs to process
2025-09-12 21:00:04.462  WARN 7 --- [   scheduling-1] o.a.pdfbox.pdmodel.font.PDType1Font      : Using fallback font LiberationSans for base font Symbol
2025-09-12 21:00:08.341 ERROR 7 --- [   scheduling-1] cn.creekmoon.pdf.core.HtmlToPdfUtils     : 字体文件不存在, fontFamily:cjk, 系统尝试使用默认字体

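This format has two advantages:

  • Readable: the timestamp, log level, and message are visible at a glance.

  • Easy to parse: the fixed layout makes later regular-expression matching and log analysis straightforward.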
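As a rough illustration of why this layout is easy to parse, the sketch below extracts the timestamp and level with one regular expression. The pattern is modeled on the parse_regex call used in the Vector configuration later in this post; the function name parse_log_line and the extra "rest" capture group are assumptions added for the example.

```python
import re

# Modeled on the pattern used in the Vector remap transform in this post;
# the "rest" group is an addition for illustration.
LOG_LINE = re.compile(r"^(?P<ts>\S+\s+\S+)\s+(?P<level>[A-Z]+)\s+(?P<rest>.*)")


def parse_log_line(line: str) -> dict:
    """Split a log line into timestamp, level, and the remainder.

    Lines that do not follow the layout (for example stack-trace lines)
    get the level "unknown", mirroring the fallback in the Vector config.
    """
    m = LOG_LINE.match(line)
    if m is None:
        return {"ts": None, "level": "unknown", "rest": line}
    return m.groupdict()


sample = (
    "2025-09-12 21:00:04.462  WARN 7 --- [   scheduling-1] "
    "o.a.pdfbox.pdmodel.font.PDType1Font      : Using fallback font"
)
print(parse_log_line(sample)["level"])
```

A stack-trace continuation line such as a tab followed by "at com.example..." does not match and falls back to "unknown", which is one reason multi-line events are merged before the level is extracted.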

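2. Standardize Pod labels

To tell apart logs from services written in different languages or frameworks, add a consistent label to every Pod. For example:

  • Java services: app/component: java

  • Go services: app/component: go

If services in other languages are added later and their log format differs from the existing ones, define a new label value for them. A Java Pod, for example, carries: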

metadata:
  labels:
    app/component: java

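3. Deploy VictoriaLogs

The basic deployment of VictoriaLogs is shown below. The key parameters are:

  • retentionPeriod: how long logs are kept. The value takes a unit suffix; for example, 5d keeps logs for 5 days.

  • Resources: adjust the CPU and memory allocation to the actual log volume.

Example deployment YAML: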

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '1'
    field.cattle.io/publicEndpoints: >-
      [{"port":30331,"protocol":"TCP","serviceName":"kubedoor:victorialogs","allNodes":true},{"addresses":["10.102.82.251"],"port":80,"protocol":"HTTP","serviceName":"kubedoor:victorialogs","ingressName":"kubedoor:kubedoor","hostname":"kubedoor.mirayai.com","path":"/select/vmui/","allNodes":false},{"addresses":["10.102.82.251"],"port":80,"protocol":"HTTP","serviceName":"kubedoor:victorialogs","ingressName":"kubedoor:kubedoor","hostname":"kubedoor.mirayai.com","path":"/select/","allNodes":false}]
    meta.helm.sh/release-name: kubedoor
    meta.helm.sh/release-namespace: kubedoor
  labels:
    app: victorialogs
    app.kubernetes.io/managed-by: Helm
  name: victorialogs
  namespace: kubedoor
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  selector:
    matchLabels:
      app: victorialogs
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: victorialogs
    spec:
      containers:
        - args:
            - '--storageDataPath=/victoria-logs-data'
            - '--httpListenAddr=:9428'
            - '--retentionPeriod=5d'
            - '--loggerFormat=json'
            - '--loggerOutput=stderr'
          env:
            - name: TZ
              value: Asia/Shanghai
          image: >-
            registry.cn-hangzhou.aliyuncs.com/rhub/victorialogs:v1.23.3-victorialogs-scratch
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /metrics
              port: 9428
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 3
            successThreshold: 1
            timeoutSeconds: 5
          name: victorialogs
          ports:
            - containerPort: 9428
              name: http
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /metrics
              port: 9428
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 3
            successThreshold: 2
            timeoutSeconds: 5
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 256Mi
          volumeMounts:
            - mountPath: /victoria-logs-data
              name: victoria-logs-data
      restartPolicy: Always
      volumes:
        - name: victoria-logs-data
          persistentVolumeClaim:
            claimName: victorialogs-pvc
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    field.cattle.io/publicEndpoints: >-
      [{"port":30331,"protocol":"TCP","serviceName":"kubedoor:victorialogs","allNodes":true}]
    meta.helm.sh/release-name: kubedoor
    meta.helm.sh/release-namespace: kubedoor
  labels:
    app: victorialogs
    app.kubernetes.io/managed-by: Helm
  name: victorialogs
  namespace: kubedoor
spec:
  ports:
    - name: http
      nodePort: 30331
      port: 9428
      protocol: TCP
      targetPort: 9428
  selector:
    app: victorialogs
  type: NodePort
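Once the Service is up, a quick way to check it is to call the HTTP query endpoint /select/logsql/query, which returns one JSON object per line. The sketch below only builds the request URL and parses a response body, so it runs without a cluster; the node address 127.0.0.1 and the helper names are assumptions for illustration, while port 30331 is the NodePort from the manifest above.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, logsql: str, limit: int = 10) -> str:
    """Build a URL for the VictoriaLogs /select/logsql/query endpoint."""
    params = urlencode({"query": logsql, "limit": limit})
    return f"{base_url.rstrip('/')}/select/logsql/query?{params}"


def parse_json_lines(body: str) -> list:
    """Parse a JSON-lines response body into a list of dicts."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


# 127.0.0.1 is a placeholder for a node IP; 30331 is the NodePort above.
url = build_query_url("http://127.0.0.1:30331", "_time:5m error")
print(url)

# To run the query for real (requires network access to the cluster):
#   from urllib.request import urlopen
#   rows = parse_json_lines(urlopen(url).read().decode("utf-8"))
```

The query `_time:5m error` asks for entries from the last five minutes that contain the word "error"; any other LogsQL expression can be substituted.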

4. Deploy Vector

Vector attaches different labels to logs according to each Pod's labels and then pushes them to VictoriaLogs.

Example deployment YAML:

---
# Source: kubedoor/templates/01.monit/1.vector.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vector
  namespace: kubedoor
---
# Source: kubedoor/templates/01.monit/1.vector.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vector
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - nodes  
      - pods
    verbs: ["get", "list", "watch"]
---
# Source: kubedoor/templates/01.monit/1.vector.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vector
subjects:
  - kind: ServiceAccount
    name: vector
    namespace: kubedoor
---
# Source: kubedoor/templates/01.monit/1.vector.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vector
  namespace: kubedoor
  labels:
    app: vector
spec:
  selector:
    matchLabels:
      app: vector
  template:
    metadata:
      labels:
        app: vector
    spec:
      serviceAccountName: vector
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: vector
        image: registry.cn-hangzhou.aliyuncs.com/rhub/timberio.vector:0.49.X
        imagePullPolicy: Always
        env:
        - name: VECTOR_SELF_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: VECTOR_LOG
          value: "warn"
        - name: TZ
          value: "Asia/Shanghai"
        - name: PROCFS_ROOT
          value: "/host/proc"
        - name: SYSFS_ROOT
          value: "/host/sys"
        ports:
        - name: api
          containerPort: 8686
        resources:
          requests:
            memory: "128Mi"
            cpu: "200m"
          limits:
            memory: "512Mi"
            cpu: "1000m"
        volumeMounts:
        - name: config
          mountPath: /etc/vector
        - name: data
          mountPath: /var/lib/vector
        - name: var-log
          mountPath: /var/log
          readOnly: true
        - name: run-containerd
          mountPath: /run/containerd/containerd.sock
          readOnly: true
        - name: var-lib-containerd
          mountPath: /var/lib/containerd
          readOnly: true
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        livenessProbe:
          httpGet:
            path: /health
            port: api
          initialDelaySeconds: 30
          periodSeconds: 30
      initContainers:
      - command:
        - sh
        - -c
        - "until nc -z victorialogs.kubedoor 9428; do echo waiting for basic service `date`; sleep 3; done;"
        image: registry.cn-hangzhou.aliyuncs.com/rhub/busybox:1.36
        imagePullPolicy: IfNotPresent
        name: wait-basic
      volumes:
      - name: config
        configMap:
          name: vector-config
      - name: data
        hostPath:
          path: /var/lib/vector
          type: DirectoryOrCreate
      - name: var-log
        hostPath:
          path: /var/log
      - name: run-containerd
        hostPath:
          path: /run/containerd/containerd.sock
          type: Socket
      - name: var-lib-containerd
        hostPath:
          path: /var/lib/containerd
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys

Vector configuration, example 1

Collect the stdout logs of Pods labeled app/component=java and app.kubernetes.io/instance=ingress-nginx, then push them to the victorialogs service.

apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-config
  namespace: kubedoor
data:
  vector.yaml: |
    # Data directory
    data_dir: "/var/lib/vector"
    
    # API
    api:
      enabled: true
      address: "0.0.0.0:8686"
    
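    # Use the officially recommended kubernetes_logs source
    sources:
      java_k8s_logs:
        type: "kubernetes_logs"
        # Recommended: automatically merge log lines that the container runtime split into partial chunks
        auto_partial_merge: true
        # Collect only Java applications, filtered by label
        extra_label_selector: "app/component=java"
      ng_k8s_access:
        type: "kubernetes_logs"
        # Collect only ingress-nginx, filtered by label
        extra_label_selector: "app.kubernetes.io/instance=ingress-nginx"
    
    # 1. Filter out health checks only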
    transforms:
      java_filter_logs:
        type: filter
        inputs:
          - java_k8s_logs
        condition: |
          # Exclude only obvious health-check requests
          !contains(string!(.message), "/actuator/health") && 
          !contains(string!(.message), "/health")
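      # 2. Merge multi-line Java logs (for example stack traces)
      java_merge_multiline_logs:
        type: reduce
        inputs: [java_filter_logs]
        # kubernetes_logs stores Pod metadata under kubernetes.*, so group on those paths
        group_by:
          - kubernetes.pod_name
          - kubernetes.container_name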
        merge_strategies:
          message: concat_newline
        starts_when:
          type: vrl
          source: |
            match(string!(.message), r'^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[.,]\d{3}')
        expire_after_ms: 5000
        flush_period_ms: 1000
      
      # 3. Add the _msg field that VictoriaLogs expects
      java_add_msg_field:
        type: remap
        inputs: [java_merge_multiline_logs]
        source: |
          # Copy the message field into _msg
          ._msg = .message

          # Extract the log level with a regular expression
          match, err = parse_regex(._msg, r'^(?P<ts>\S+\s+\S+)\s+(?P<level>[A-Z]+)\s+.*')
          if err == null {
            .level = match.level
          } else {
            .level = "unknown"
          }

          # Build a new object that contains only the required fields
          .log_type  = "java"
          . = {
            "_msg": ._msg,
            "level": .level,
            "log_type": .log_type,
            "kubernetes": {
              "pod_namespace": .kubernetes.pod_namespace,
              "pod_name": .kubernetes.pod_name,
              "container_name": .kubernetes.container_name,
              "pod_node_name": .kubernetes.pod_node_name
            }
          }
          .namespace = .kubernetes.pod_namespace
          del(.kubernetes.pod_namespace)
          .pod = .kubernetes.pod_name
          del(.kubernetes.pod_name)
          .service = .kubernetes.container_name
          del(.kubernetes.container_name)
          .node = .kubernetes.pod_node_name
          del(.kubernetes.pod_node_name)

      nginx_format_access:
        drop_on_error: true
        reroute_dropped: true
        type: remap
        inputs:
          - ng_k8s_access
        source: |
          ._msg = .message
          # Parse the JSON into a temporary variable so the whole event is not overwritten
          parsed_json = parse_json!(replace(.message, r'([^\x00-\x7F])', "\\\\$$1") ?? .message)
          if exists(parsed_json.message) {
            parsed_json = parse_json!(replace(parsed_json.message, "\\x", "\\\\x") ?? parsed_json.message)
          }
          # Merge the parsed fields into the current event, keeping _msg
          . = merge!(., parsed_json)
          .createdtime = to_unix_timestamp(now(), unit: "milliseconds")
          .timestamp = to_unix_timestamp(parse_timestamp!(.timestamp , format: "%+"), unit: "milliseconds")
          .url_list = split!(.url, "?", 2)
          .path = .url_list[0]
          .query = .url_list[1]
          .path_list = split!(.path, "/", 3)
          if length(.path_list) > 2 {.top_path = join!(["/", .path_list[1]])} else {.top_path = "/"}
          .upstreamtime = to_float(.upstreamtime) ?? 0
          .duration = round((to_float(.responsetime) ?? 0) - to_float(.upstreamtime),3)
          if .xff == "-" { .xff = .remote_ip }
          .client_ip = split!(.xff, ",", 2)[0]
          .ua = parse_user_agent!(.http_user_agent , mode: "enriched")
          .client_browser_family = .ua.browser.family
          .client_browser_major = .ua.browser.major
          .client_os_family = .ua.os.family
          .client_os_major = .ua.os.major
          .client_device_brand = .ua.device.brand
          .client_device_model = .ua.device.model
          
          # The GeoIP lookup must be given the table name as a string
          .geoip = get_enrichment_table_record("geoip_table", {"ip": .client_ip}) ?? {"city_name":"unknown","region_name":"unknown","country_name":"unknown"}
          .client_city = .geoip.city_name
          .client_region = .geoip.region_name
          .client_country = .geoip.country_name
          .client_latitude = .geoip.latitude
          .client_longitude = .geoip.longitude
          # Keep only the nginx fields
          .log_type  = "nginx"
          . = {
            "_msg": ._msg,
            "log_type": .log_type,
            "timestamp": .timestamp,
            "createdtime": .createdtime,
            "server_ip": .server_ip,
            "remote_ip": .remote_ip,
            "xff": .xff,
            "remote_user": .remote_user,
            "domain": .domain,
            "url": .url,
            "path": .path,
            "query": .query,
            "top_path": .top_path,
            "referer": .referer,
            "upstreamtime": .upstreamtime,
            "responsetime": .responsetime,
            "duration": .duration,
            "request_method": .request_method,
            "status": .status,
            "response_length": .response_length,
            "request_length": .request_length,
            "protocol": .protocol,
            "upstreamhost": .upstreamhost,
            "http_user_agent": .http_user_agent,
            "client_ip": .client_ip,
            "client_browser_family": .client_browser_family,
            "client_browser_major": .client_browser_major,
            "client_os_family": .client_os_family,
            "client_os_major": .client_os_major,
            "client_device_brand": .client_device_brand,
            "client_device_model": .client_device_model,
            "client_city": .client_city,
            "client_region": .client_region,
            "client_country": .client_country,
            "client_latitude": .client_latitude,
            "client_longitude": .client_longitude
          }
    
    # Outputs
    sinks:
      # VictoriaLogs
      victorialogs:
        type: loki
        inputs: 
          - java_add_msg_field
          - nginx_format_access
        endpoint: "http://victorialogs.kubedoor:9428"
        path: /insert/loki/api/v1/push
        labels:
          origin_prometheus: dev-k8s
          env: dev-k8s
        encoding:
          codec: json
        # Batching and buffering settings
        batch:
          max_bytes: 102400
          timeout_secs: 5
        buffer:
          type: memory
          max_events: 1000
          when_full: drop_newest
        # Disable the health check to avoid 400 errors
        healthcheck: false
        request:
          timeout_secs: 30
      dropped_console:
        type: console
        inputs: ["nginx_format_access.dropped"]
        encoding:
          codec: json
    enrichment_tables:
      geoip_table:
        path: "/usr/share/GeoIP/GeoLite2-City.mmdb"
        type: geoip
        locale: "zh-CN" # Return location names in Chinese; remove this line to use the default (English), which resolves slightly more records than Chinese
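To make the multi-line merge step easier to reason about, here is a minimal sketch of the starts_when rule from the reduce transform above: a line that begins with a timestamp opens a new event, and every other line is appended to the previous event with a newline. It deliberately ignores the per-Pod grouping and the expire_after_ms timeout that the real transform applies, and the function name merge_multiline is an assumption.

```python
import re

# Same timestamp pattern as the starts_when condition in the Vector config.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[.,]\d{3}")


def merge_multiline(lines):
    """Group raw lines into events; continuation lines join the previous event."""
    events = []
    for line in lines:
        if EVENT_START.match(line) or not events:
            events.append(line)           # a timestamp starts a new event
        else:
            events[-1] += "\n" + line     # e.g. a stack-trace line
    return events


raw = [
    "2025-09-12 21:00:08.341 ERROR 7 --- [   scheduling-1] c.e.Demo : boom",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.Demo.run(Demo.java:42)",
    "2025-09-12 21:00:09.000  INFO 7 --- [   scheduling-1] c.e.Demo : ok",
]
for event in merge_multiline(raw):
    print(repr(event))
```

With this input the three lines of the exception become one event, so the level extracted from the first line applies to the whole stack trace.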

Vector configuration, example 2

Collect the stdout logs of Pods labeled app/component=java and app.kubernetes.io/instance=ingress-nginx, push the Java logs to the victorialogs service, and push the ingress logs to ClickHouse.

apiVersion: v1
data:
  vector.yaml: |
    # Data directory
    data_dir: "/var/lib/vector"
    # API
    api:
      enabled: true
      address: "0.0.0.0:8686"

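    # Use the officially recommended kubernetes_logs source
    sources:
      java_k8s_logs:
        type: "kubernetes_logs"
        # Recommended: automatically merge log lines that the container runtime split into partial chunks
        auto_partial_merge: true
        # Collect only Java applications, filtered by label
        extra_label_selector: "app/component=java"
      ng_k8s_access:
        type: "kubernetes_logs"
        # Collect only ingress-nginx, filtered by label
        extra_label_selector: "app.kubernetes.io/instance=ingress-nginx"

    # 1. Filter out health checks only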
    transforms:
      java_filter_logs:
        type: filter
        inputs:
          - java_k8s_logs
        condition: |
          # Exclude only obvious health-check requests
          !contains(string!(.message), "/actuator/health") && 
          !contains(string!(.message), "/health")
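      # 2. Merge multi-line Java logs (for example stack traces)
      java_merge_multiline_logs:
        type: reduce
        inputs: [java_filter_logs]
        # kubernetes_logs stores Pod metadata under kubernetes.*, so group on those paths
        group_by:
          - kubernetes.pod_name
          - kubernetes.container_name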
        merge_strategies:
          message: concat_newline
        starts_when:
          type: vrl
          source: |
            match(string!(.message), r'^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[.,]\d{3}')
        expire_after_ms: 5000
        flush_period_ms: 1000
      
      # 3. Add the _msg field that VictoriaLogs expects
      java_add_msg_field:
        type: remap
        inputs: [java_merge_multiline_logs]
        source: |
          # Copy the message field into _msg
          ._msg = .message

          # Extract the log level with a regular expression
          match, err = parse_regex(._msg, r'^(?P<ts>\S+\s+\S+)\s+(?P<level>[A-Z]+)\s+.*')
          if err == null {
            .level = match.level
          } else {
            .level = "unknown"
          }

          # Build a new object that contains only the required fields
          .log_type  = "java"
          . = {
            "_msg": ._msg,
            "level": .level,
            "log_type": .log_type,
            "kubernetes": {
              "pod_namespace": .kubernetes.pod_namespace,
              "pod_name": .kubernetes.pod_name,
              "container_name": .kubernetes.container_name,
              "pod_node_name": .kubernetes.pod_node_name
            }
          }
          .namespace = .kubernetes.pod_namespace
          del(.kubernetes.pod_namespace)
          .pod = .kubernetes.pod_name
          del(.kubernetes.pod_name)
          .service = .kubernetes.container_name
          del(.kubernetes.container_name)
          .node = .kubernetes.pod_node_name
          del(.kubernetes.pod_node_name)

      nginx_format_access:
        drop_on_error: true
        reroute_dropped: true
        type: remap
        inputs:
          - ng_k8s_access
        source: |
          . = parse_json!(replace(.message, r'([^\x00-\x7F])', "\\\\$$1") ?? .message)
          if exists(.message) {
            . = parse_json!(replace(.message, "\\x", "\\\\x") ?? .message)
          }
          .createdtime = to_unix_timestamp(now(), unit: "milliseconds")
          .timestamp = to_unix_timestamp(parse_timestamp!(.timestamp , format: "%+"), unit: "milliseconds")
          .url_list = split!(.url, "?", 2)
          .path = .url_list[0]
          .query = .url_list[1]
          .path_list = split!(.path, "/", 3)
          if length(.path_list) > 2 {.top_path = join!(["/", .path_list[1]])} else {.top_path = "/"}
          .upstreamtime = to_float(.upstreamtime) ?? 0
          .duration = round((to_float(.responsetime) ?? 0) - to_float(.upstreamtime),3)
          if .xff == "-" { .xff = .remote_ip }
          .client_ip = split!(.xff, ",", 2)[0]
          .ua = parse_user_agent!(.http_user_agent , mode: "enriched")
          .client_browser_family = .ua.browser.family
          .client_browser_major = .ua.browser.major
          .client_os_family = .ua.os.family
          .client_os_major = .ua.os.major
          .client_device_brand = .ua.device.brand
          .client_device_model = .ua.device.model
          .geoip = get_enrichment_table_record("geoip_table", {"ip": .client_ip}) ?? {"city_name":"unknown","region_name":"unknown","country_name":"unknown"}
          .client_city = .geoip.city_name
          .client_region = .geoip.region_name
          .client_country = .geoip.country_name
          .client_latitude = .geoip.latitude
          .client_longitude = .geoip.longitude
          del(.path_list)
          del(.url_list)
          del(.ua)
          del(.geoip)
          del(.url)

    # Outputs
    sinks:
      # VictoriaLogs
      java_victorialogs_logs:
        type: loki
        inputs: 
          - java_add_msg_field
        endpoint: "http://victorialogs.kubedoor:9428"
        path: /insert/loki/api/v1/push
        labels:
          origin_prometheus: dev-k8s
          env: dev-k8s
        encoding:
          codec: json
        # Batching and buffering settings
        batch:
          max_bytes: 102400
          timeout_secs: 5
        buffer:
          type: memory
          max_events: 1000
          when_full: drop_newest
        # Disable the health check to avoid 400 errors
        healthcheck: false
        request:
          timeout_secs: 30
      nginx_clickhouse_access:
        type: clickhouse
        inputs:
          - nginx_format_access
        endpoint: http://clickhouse.kubedoor:8123  # ClickHouse HTTP interface
        database: nginxlogs  # ClickHouse database
        table: nginx_access  # ClickHouse table
        auth:
          strategy: basic
          user: default  # ClickHouse user name
          password: di88fg2k  # ClickHouse password (replace with your own)
        compression: gzip
      dropped_console:
        type: console
        inputs: ["nginx_format_access.dropped"]
        encoding:
          codec: json
    enrichment_tables:
      geoip_table:
        path: "/usr/share/GeoIP/GeoLite2-City.mmdb"
        type: geoip
        locale: "zh-CN" # Return location names in Chinese; remove this line to use the default (English), which resolves slightly more records than Chinese
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: kubedoor-agent
    meta.helm.sh/release-namespace: kubedoor
  labels:
    app.kubernetes.io/managed-by: Helm
  name: vector-config
  namespace: kubedoor
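Both examples send Java logs through Vector's loki sink to the /insert/loki/api/v1/push path. As a rough picture of what such a push request carries, the sketch below builds a body in the Loki JSON push format: a list of streams, each with a label set and [timestamp-in-nanoseconds, line] pairs. This shows only the shape of the data; Vector's actual wire encoding may differ (it can send a compressed protobuf body), and the helper name and the timestamp value are assumptions.

```python
import json


def build_loki_push(labels: dict, entries: list) -> str:
    """Build a Loki JSON push body.

    labels  -- stream labels, e.g. the `labels` block of the sink
    entries -- list of (unix_nanoseconds, log_line) tuples
    """
    values = [[str(ts_ns), line] for ts_ns, line in entries]
    body = {"streams": [{"stream": labels, "values": values}]}
    return json.dumps(body, ensure_ascii=False)


# With `codec: json`, each event is serialized to a JSON string first.
event = {"_msg": "2025-09-12 21:00:08.341 ERROR 7 --- demo", "level": "ERROR", "log_type": "java"}
payload = build_loki_push(
    {"origin_prometheus": "dev-k8s", "env": "dev-k8s"},
    [(1757682008341000000, json.dumps(event))],  # hypothetical timestamp
)
print(payload)
```

The stream labels correspond to the labels block of the sink, which is why env can be used as a filter in the dashboard queries.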

For the ingress log format that the Vector configuration expects, see: configuring the ingress-nginx log format

Set http-snippet and log-format-upstream in the ingress-nginx-controller ConfigMap:

apiVersion: v1
data:
  http-snippet: >-
    map "$time_iso8601 # $msec" $time_iso8601_ms {"~(^[^+]+)(\+[0-9:]+) #
    \d+\.(\d+)$" $1.$3$2;}
  log-format-upstream: >-
    {"timestamp":"$time_iso8601_ms","server_ip":"$server_addr","remote_ip":"$remote_addr","xff":"$http_x_forwarded_for","remote_user":"$remote_user","domain":"$host","url":"$request_uri","referer":"$http_referer","upstreamtime":"$upstream_response_time","responsetime":"$request_time","request_method":"$request_method","status":"$status","response_length":"$bytes_sent","request_length":"$request_length","protocol":"$server_protocol","upstreamhost":"$upstream_addr","http_user_agent":"$http_user_agent"}
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: ingress-nginx
    meta.helm.sh/release-namespace: ingress-nginx
  creationTimestamp: '2025-07-21T03:54:32Z'
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.13.0
    helm.sh/chart: ingress-nginx-4.13.0
  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:labels:
            .: {}
            f:app.kubernetes.io/component: {}
            f:app.kubernetes.io/instance: {}
            f:app.kubernetes.io/managed-by: {}
            f:app.kubernetes.io/name: {}
            f:app.kubernetes.io/part-of: {}
            f:app.kubernetes.io/version: {}
            f:helm.sh/chart: {}
      manager: helm
      operation: Update
      time: '2025-07-21T03:54:32Z'
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:data:
          .: {}
          f:http-snippet: {}
          f:log-format-upstream: {}
      manager: agent
      operation: Update
      time: '2025-09-13T10:34:19Z'
  name: ingress-nginx-controller
  namespace: ingress-nginx
  resourceVersion: '11734961'
  uid: 56637872-c797-4a25-9662-a09528eec094
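To see how the fields of this log format feed the nginx_format_access transform, the sketch below re-implements a few of its derivations (path, query, top_path, duration, client_ip) for a single access-log line. It is an approximation of the VRL logic for illustration, not a replacement for it; the sample record and the function names are assumptions.

```python
import json


def to_float(value, default=0.0):
    """Convert to float, falling back to a default (like `to_float(...) ?? 0`)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def derive_fields(raw_line: str) -> dict:
    rec = json.loads(raw_line)
    url_parts = rec["url"].split("?", 1)           # path and optional query
    path = url_parts[0]
    query = url_parts[1] if len(url_parts) > 1 else None
    path_parts = path.split("/", 2)                # at most three pieces
    top_path = "/" + path_parts[1] if len(path_parts) > 2 else "/"
    upstream = to_float(rec.get("upstreamtime"))   # "-" when no upstream was hit
    duration = round(to_float(rec.get("responsetime")) - upstream, 3)
    xff = rec["remote_ip"] if rec.get("xff") == "-" else rec["xff"]
    return {
        "path": path,
        "query": query,
        "top_path": top_path,
        "duration": duration,
        "client_ip": xff.split(",", 1)[0],         # first hop in X-Forwarded-For
    }


# Hypothetical access-log line in the log-format-upstream layout above.
sample = json.dumps({
    "url": "/api/v1/users?id=7", "upstreamtime": "0.100", "responsetime": "0.120",
    "xff": "-", "remote_ip": "10.0.0.8",
})
print(derive_fields(sample))
```

Note that, as in the VRL code, a single-segment path such as /api yields a top_path of "/", because the split produces only two pieces.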

Grafana log dashboard

Grafana dashboard ID: 24078

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "description": "Dashboard to explore Victoria Logs\r\n",
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 4,
  "links": [
    {
      "asDropdown": false,
      "icon": "bolt",
      "includeVars": true,
      "keepTime": true,
      "tags": [],
      "targetBlank": true,
      "title": "View In Explore",
      "tooltip": "",
      "type": "link",
      "url": "/explore?orgId=1&left={\"datasource\":\"$DS_VICTORIALOGS\",\"queries\":[{\"expr\":\"log_type: \\\"java\\\" AND env: \\\"$env\\\" AND service: in($service) AND level: in($level)\"}]}"
    },
    {
      "asDropdown": false,
      "icon": "external link",
      "includeVars": false,
      "keepTime": false,
      "tags": [],
      "targetBlank": true,
      "title": "Learn LogsQL",
      "tooltip": "",
      "type": "link",
      "url": "https://docs.victoriametrics.com/victorialogs/logsql/"
    }
  ],
  "panels": [
    {
      "datasource": {
        "type": "victoriametrics-logs-datasource",
        "uid": "${DS_VICTORIALOGS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisBorderShow": false,
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "barWidthFactor": 0.6,
            "drawStyle": "bars",
            "fillOpacity": 0,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "insertNulls": false,
            "lineInterpolation": "linear",
            "lineWidth": 2,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "short"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "ERROR"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "#FF0000",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "WARN"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "#FFA500",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "INFO"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "#0066CC",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "DEBUG"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "#00AA00",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 6,
        "w": 18,
        "x": 0,
        "y": 0
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": [
            "sum"
          ],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "pluginVersion": "",
      "targets": [
        {
          "datasource": {
            "type": "victoriametrics-logs-datasource",
            "uid": "Kubedoor-victorialogs"
          },
          "editorMode": "code",
          "expr": "log_type: \"java\" AND env: \"$env\" AND service: in($service) AND level: in($level) AND ($query != \"\" or 1==1) | stats by (level) count()",
          "legendFormat": "{{level}}",
          "queryType": "statsRange",
          "refId": "A"
        }
      ],
      "title": "Log statistics - java (${env})",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "victoriametrics-logs-datasource",
        "uid": "${DS_VICTORIALOGS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "fixed"
          },
          "custom": {
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            }
          },
          "mappings": [],
          "unit": "short"
        },
        "overrides": [
          {
            "matcher": {
              "id": "byName",
              "options": "ERROR"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "#FF0000",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "WARN"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "#FFA500",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "INFO"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "#0066CC",
                  "mode": "fixed"
                }
              }
            ]
          },
          {
            "matcher": {
              "id": "byName",
              "options": "DEBUG"
            },
            "properties": [
              {
                "id": "color",
                "value": {
                  "fixedColor": "#00AA00",
                  "mode": "fixed"
                }
              }
            ]
          }
        ]
      },
      "gridPos": {
        "h": 6,
        "w": 6,
        "x": 18,
        "y": 0
      },
      "id": 3,
      "options": {
        "legend": {
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "pieType": "pie",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "pluginVersion": "",
      "targets": [
        {
          "datasource": {
            "type": "victoriametrics-logs-datasource",
            "uid": "${DS_VICTORIALOGS}"
          },
          "editorMode": "code",
          "expr": "log_type: \"java\" AND env: \"$env\" AND service: in($service) AND level: in($level) AND ($query != \"\" or 1==1) | stats by (level) count() c | sort by (c desc) | limit 10",
          "legendFormat": "{{level}}",
          "queryType": "stats",
          "refId": "A"
        }
      ],
      "title": "Log level distribution",
      "type": "piechart"
    },
    {
      "datasource": {
        "type": "victoriametrics-logs-datasource",
        "uid": "${DS_VICTORIALOGS}"
      },
      "fieldConfig": {
        "defaults": {},
        "overrides": []
      },
      "gridPos": {
        "h": 23,
        "w": 24,
        "x": 0,
        "y": 6
      },
      "id": 1,
      "options": {
        "dedupStrategy": "none",
        "enableLogDetails": true,
        "prettifyLogMessage": false,
        "showCommonLabels": false,
        "showLabels": false,
        "showTime": true,
        "sortOrder": "Descending",
        "wrapLogMessage": false
      },
      "pluginVersion": "",
      "targets": [
        {
          "datasource": {
            "type": "victoriametrics-logs-datasource",
            "uid": "${DS_VICTORIALOGS}"
          },
          "editorMode": "code",
          "expr": "log_type: \"java\" AND env: \"$env\" AND service: in($service) AND level: in($level) AND ($query != \"\" or 1==1)",
          "queryType": "range",
          "refId": "A"
        }
      ],
      "title": "Log details - ${env} environment",
      "type": "logs"
    }
  ],
  "preload": false,
  "refresh": "",
  "schemaVersion": 40,
  "tags": [
    "VictoriaLogs",
    "nhub.site",
    "vector"
  ],
  "templating": {
    "list": [
      {
        "current": {
          "text": "",
          "value": ""
        },
        "description": "Datasource for logs",
        "hide": 2,
        "includeAll": false,
        "label": "Logs Datasource",
        "name": "DS_VICTORIALOGS",
        "options": [],
        "query": "victoriametrics-logs-datasource",
        "refresh": 1,
        "regex": "",
        "type": "datasource"
      },
      {
        "current": {
          "text": [
            "All"
          ],
          "value": [
            "$__all"
          ]
        },
        "datasource": {
          "type": "victoriametrics-logs-datasource",
          "uid": "${DS_VICTORIALOGS}"
        },
        "definition": "log_type: \"java\"",
        "includeAll": true,
        "label": "Service",
        "multi": true,
        "name": "service",
        "options": [],
        "query": {
          "field": "service",
          "limit": 100,
          "query": "log_type: \"java\"",
          "refId": "VictoriaLogsVariableQueryEditor-VariableQuery",
          "type": "fieldValue"
        },
        "refresh": 2,
        "regex": "",
        "sort": 1,
        "type": "query"
      },
      {
        "current": {
          "text": [
            "All"
          ],
          "value": [
            "$__all"
          ]
        },
        "datasource": {
          "type": "victoriametrics-logs-datasource",
          "uid": "${DS_VICTORIALOGS}"
        },
        "definition": "log_type: \"java\"",
        "includeAll": true,
        "label": "Log level",
        "multi": true,
        "name": "level",
        "options": [],
        "query": {
          "field": "level",
          "limit": 100,
          "query": "log_type: \"java\"",
          "refId": "VictoriaLogsVariableQueryEditor-VariableQuery",
          "type": "fieldValue"
        },
        "refresh": 2,
        "regex": "",
        "sort": 1,
        "type": "query"
      },
      {
        "current": {
          "text": "dev",
          "value": "dev"
        },
        "includeAll": false,
        "label": "Environment",
        "name": "env",
        "options": [
          {
            "selected": true,
            "text": "dev",
            "value": "dev"
          },
          {
            "selected": false,
            "text": "test",
            "value": "test"
          },
          {
            "selected": false,
            "text": "prod",
            "value": "prod"
          }
        ],
        "query": "dev,test,prod",
        "type": "custom"
      },
      {
        "current": {
          "text": "",
          "value": ""
        },
        "label": "Query filter",
        "name": "query",
        "options": [
          {
            "selected": true,
            "text": "",
            "value": ""
          }
        ],
        "query": "",
        "type": "textbox"
      }
    ]
  },
  "time": {
    "from": "now-30m",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "browser",
  "title": "VictoriaLogs Log Monitoring Dashboard",
  "uid": "bdpzp3w3jkt8gas",
  "version": 1,
  "weekStart": ""
}
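
An exported dashboard can mix datasource references: some point at a dashboard variable such as `${DS_VICTORIALOGS}`, others at a UID that only exists in the Grafana instance it was exported from. Before importing, it may help to list every datasource UID the JSON refers to. The following is a minimal sketch of such a check; the function name, the trimmed sample dashboard, and the UID `some-fixed-uid` are hypothetical and only illustrate the idea.

```python
import json
from collections import Counter


def collect_datasource_uids(node, found=None):
    """Recursively count every datasource uid referenced in a dashboard document."""
    if found is None:
        found = Counter()
    if isinstance(node, dict):
        datasource = node.get("datasource")
        if isinstance(datasource, dict) and "uid" in datasource:
            found[datasource["uid"]] += 1
        for value in node.values():
            collect_datasource_uids(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_datasource_uids(item, found)
    return found


# Hypothetical, heavily trimmed dashboard used only to exercise the function.
SAMPLE_DASHBOARD = json.loads("""
{
  "panels": [
    {
      "datasource": {"type": "victoriametrics-logs-datasource", "uid": "${DS_VICTORIALOGS}"},
      "targets": [
        {"datasource": {"type": "victoriametrics-logs-datasource", "uid": "some-fixed-uid"}}
      ]
    }
  ],
  "templating": {"list": [{"name": "DS_VICTORIALOGS", "type": "datasource"}]}
}
""")

uid_counts = collect_datasource_uids(SAMPLE_DASHBOARD)
for uid, count in sorted(uid_counts.items()):
    # UIDs starting with "$" are dashboard variables; anything else is a fixed UID.
    kind = "variable" if uid.startswith("$") else "fixed"
    print(f"{uid}: {count} reference(s), {kind}")
```

Any UID reported as fixed has to exist in the target Grafana instance, or be replaced with the variable, for the panel to load data.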

Log alerting with vmalert

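vmalert periodically evaluates LogsQL expressions against VictoriaLogs and sends an alert to Alertmanager when the query results satisfy a rule's condition.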
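
To see what such an evaluation looks like, an alert expression can be sent by hand to the VictoriaLogs `/select/logsql/stats_query` endpoint, which returns a Prometheus-style instant-query response; every series in the result corresponds to one candidate alert. The sketch below only builds the request URL and parses a response. The helper names and the sample response are illustrative assumptions, not output captured from a real cluster, and the base URL is the in-cluster address used in the deployment in this article.

```python
import urllib.parse


def build_stats_query_url(base_url, logsql, eval_time=None):
    """Build a /select/logsql/stats_query URL for a LogsQL expression."""
    params = {"query": logsql}
    if eval_time is not None:
        params["time"] = eval_time
    query_string = urllib.parse.urlencode(params)
    return base_url.rstrip("/") + "/select/logsql/stats_query?" + query_string


def firing_series(payload):
    """Return (labels, value) for each series in a Prometheus-style vector response."""
    series_list = []
    for series in payload["data"]["result"]:
        labels = {k: v for k, v in series["metric"].items() if k != "__name__"}
        series_list.append((labels, float(series["value"][1])))
    return series_list


QUERY = (
    '_time:5m AND env: "dev-k8s" AND log_type: "java" AND level: "ERROR" '
    "| stats by (service) count() as error_count | filter error_count :> 100"
)

url = build_stats_query_url("http://victorialogs.kubedoor:9428", QUERY)
print(url)

# Hand-written sample response (assumption): one service exceeded the threshold.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "error_count", "service": "demo-service"},
                "value": [1760000000, "137"],
            }
        ],
    },
}

for labels, value in firing_series(SAMPLE_RESPONSE):
    print(labels, value)
```

Fetching the printed URL with any HTTP client from inside the cluster is a quick way to check an expression before putting it into a rule file.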

Sample deployment YAML:

---
# Source: kubedoor/templates/01.monit/2.vmalert-logs.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: vmalert-logs-config
  namespace: kubedoor
data:
  logs.yaml: |-
    groups:
    - name: VictoriaLogs_Java_Alerts-dev-k8s
      type: vlogs
      labels:
        origin_prometheus: dev-k8s
      rules:
      - alert: JavaTooManyErrorLogs
        expr: |
          _time:5m AND env: "dev-k8s" AND log_type: "java" AND level: "ERROR" | stats by (service) count() as error_count | filter error_count :> 100
        for: 3m
        labels:
          severity: Warning
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- ERROR log entries in the last 5 minutes: {{ $value | printf \"%.0f\" }}"

      - alert: JavaExceptionStackTraces
        expr: |
          _time:5m AND env: "dev-k8s" AND log_type: "java" AND (level: "ERROR" OR level: "WARN") |~ "Exception|ERROR|Throwable" | stats by (service) count() as exception_count | filter exception_count :> 150
        for: 2m
        labels:
          severity: Critical
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- Exception log entries in the last 5 minutes: {{ $value | printf \"%.0f\" }}"

      - alert: JavaOutOfMemoryError
        expr: |
          _time:5m AND env: "dev-k8s" AND log_type: "java" |~ "OutOfMemoryError" | stats by (service) count() as oom_count | filter oom_count :> 0
        for: 1m
        labels:
          severity: Critical
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- OutOfMemoryError detected"

      - alert: JavaDatabaseConnectionErrors
        expr: |
          _time:5m AND env: "dev-k8s" AND log_type: "java" |~ "Connection.*timeout|Connection.*refused|Connection.*reset" | stats by (service) count() as conn_error_count | filter conn_error_count :> 2
        for: 3m
        labels:
          severity: Warning
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- Connection error log entries in the last 5 minutes: {{ $value | printf \"%.0f\" }}"

    - name: VictoriaLogs_Java_Alerts-test-k8s
      type: vlogs
      labels:
        origin_prometheus: test-k8s
      rules:
      - alert: JavaTooManyErrorLogs
        expr: |
          _time:5m AND env: "test-k8s" AND log_type: "java" AND level: "ERROR" | stats by (service) count() as error_count | filter error_count :> 100
        for: 3m
        labels:
          severity: Warning
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- ERROR log entries in the last 5 minutes: {{ $value | printf \"%.0f\" }}"

      - alert: JavaExceptionStackTraces
        expr: |
          _time:5m AND env: "test-k8s" AND log_type: "java" AND (level: "ERROR" OR level: "WARN") |~ "Exception|ERROR|Throwable" | stats by (service) count() as exception_count | filter exception_count :> 150
        for: 2m
        labels:
          severity: Critical
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- Exception log entries in the last 5 minutes: {{ $value | printf \"%.0f\" }}"

      - alert: JavaOutOfMemoryError
        expr: |
          _time:5m AND env: "test-k8s" AND log_type: "java" |~ "OutOfMemoryError" | stats by (service) count() as oom_count | filter oom_count :> 0
        for: 1m
        labels:
          severity: Critical
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- OutOfMemoryError detected"

      - alert: JavaDatabaseConnectionErrors
        expr: |
          _time:5m AND env: "test-k8s" AND log_type: "java" |~ "Connection.*timeout|Connection.*refused|Connection.*reset" | stats by (service) count() as conn_error_count | filter conn_error_count :> 2
        for: 3m
        labels:
          severity: Warning
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- Connection error log entries in the last 5 minutes: {{ $value | printf \"%.0f\" }}"

    - name: VictoriaLogs_Nginx_Alerts-dev-k8s
      type: vlogs
      labels:
        origin_prometheus: dev-k8s
      rules:
      - alert: Nginx_5xxErrorRateTooHigh
        expr: |
          _time:5m AND env: "dev-k8s" AND log_type: "nginx" | extract "domain=<domain> " | extract "status=<status>;" | stats by (domain) count() if (status:~"^5") as failed, count() as total | math failed / total as failed_percentage | filter failed_percentage :> 0.01 | fields domain,failed_percentage
        for: 3m
        labels:
          severity: Critical
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Domain: {{ $labels.domain }}\n- 5xx error rate: {{ $value | humanizePercentage }}"

      - alert: Nginx_4xxErrorRateTooHigh
        expr: |
          _time:5m AND env: "dev-k8s" AND log_type: "nginx" | extract "domain=<domain> " | extract "status=<status>;" | stats by (domain) count() if (status:~"^4") as failed, count() as total | math failed / total as failed_percentage | filter failed_percentage :> 0.01 | fields domain,failed_percentage
        for: 3m
        labels:
          severity: Warning
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Domain: {{ $labels.domain }}\n- 4xx error rate: {{ $value | humanizePercentage }}"

      - alert: NginxResponseTimeTooLong
        expr: |
          _time:5m AND env: "dev-k8s" AND log_type: "nginx" | extract "domain=<domain> " | extract "responsetime=<responsetime>" | stats by (domain) quantile(0.95, responsetime) as p95_response_time | filter p95_response_time :> 2
        for: 5m
        labels:
          severity: Warning
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Domain: {{ $labels.domain }}\n- p95 response time: {{ $value | printf \"%.2f\" }}s"

    - name: VictoriaLogs_Nginx_Alerts-test-k8s
      type: vlogs
      labels:
        origin_prometheus: test-k8s
      rules:
      - alert: Nginx_5xxErrorRateTooHigh
        expr: |
          _time:5m AND env: "test-k8s" AND log_type: "nginx" | extract "domain=<domain> " | extract "status=<status>;" | stats by (domain) count() if (status:~"^5") as failed, count() as total | math failed / total as failed_percentage | filter failed_percentage :> 0.01 | fields domain,failed_percentage
        for: 3m
        labels:
          severity: Critical
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Domain: {{ $labels.domain }}\n- 5xx error rate: {{ $value | humanizePercentage }}"

      - alert: Nginx_4xxErrorRateTooHigh
        expr: |
          _time:5m AND env: "test-k8s" AND log_type: "nginx" | extract "domain=<domain> " | extract "status=<status>;" | stats by (domain) count() if (status:~"^4") as failed, count() as total | math failed / total as failed_percentage | filter failed_percentage :> 0.01 | fields domain,failed_percentage
        for: 3m
        labels:
          severity: Warning
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Domain: {{ $labels.domain }}\n- 4xx error rate: {{ $value | humanizePercentage }}"

      - alert: NginxResponseTimeTooLong
        expr: |
          _time:5m AND env: "test-k8s" AND log_type: "nginx" | extract "domain=<domain> " | extract "responsetime=<responsetime>" | stats by (domain) quantile(0.95, responsetime) as p95_response_time | filter p95_response_time :> 2
        for: 5m
        labels:
          severity: Warning
        annotations:
          description: "K8S: {{ $labels.origin_prometheus }}\n- Domain: {{ $labels.domain }}\n- p95 response time: {{ $value | printf \"%.2f\" }}s"
---
# Source: kubedoor/templates/01.monit/2.vmalert-logs.yaml
apiVersion: v1
kind: Service
metadata:
  name: vmalert-logs
  namespace: kubedoor
  labels:
    app: vmalert-logs
spec:
  ports:
    - name: vmalert
      port: 8080
      targetPort: 8080
  type: NodePort
  selector:
    app: vmalert-logs
---
# Source: kubedoor/templates/01.monit/2.vmalert-logs.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vmalert-logs
  namespace: kubedoor
  labels:
    app: vmalert-logs
spec:
  selector:
    matchLabels:
      app: vmalert-logs
  template:
    metadata:
      labels:
        app: vmalert-logs
    spec:
      containers:
        - name: vmalert
          image: registry.cn-shenzhen.aliyuncs.com/starsl/vmalert:stable
          imagePullPolicy: IfNotPresent
          args:
            # VictoriaLogs datasource configuration
            - -datasource.url=http://victorialogs.kubedoor:9428
            - -notifier.url=http://alertmanager.kubedoor:9093
            - -remoteWrite.url=http://monit:dduF1E3sj@victoria-metrics.kubedoor:8428
            - -remoteRead.url=http://monit:dduF1E3sj@victoria-metrics.kubedoor:8428
            - -rule=/etc/ruler/*.yaml
            - -evaluationInterval=15s
            - -rule.defaultRuleType=vlogs
            - -httpListenAddr=0.0.0.0:8080
          env:
            - name: TZ
              value: Asia/Shanghai
          resources:
            limits:
              cpu: '1'
              memory: 1Gi
            requests:
              cpu: 50m
              memory: 128Mi
          ports:
            - containerPort: 8080
              name: http
          volumeMounts:
            - mountPath: /etc/ruler/
              name: ruler
              readOnly: true
      volumes:
        - configMap:
            name: vmalert-logs-config
          name: ruler
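
The Nginx error-rate rules above compute, per domain, the share of requests whose status falls into a given class, and compare it with `0.01`, which is a ratio (1%), not a percentage value. The sketch below reproduces that arithmetic offline on a few log lines. The sample lines, the helper name, and the exact log layout (`domain=... status=...;`) are assumptions inferred from the `extract` patterns in the rules, not taken from a real access log.

```python
import re
from collections import defaultdict

# Rough equivalents of: extract "domain=<domain> " and extract "status=<status>;"
DOMAIN_RE = re.compile(r"domain=(\S*) ")
STATUS_RE = re.compile(r"status=([^;]*);")


def failed_ratio_by_domain(lines, status_class):
    """Return {domain: failed / total}, where failed counts statuses starting with status_class."""
    total = defaultdict(int)
    failed = defaultdict(int)
    for line in lines:
        domain_match = DOMAIN_RE.search(line)
        status_match = STATUS_RE.search(line)
        # Lines where a field cannot be extracted are grouped under an empty value.
        domain = domain_match.group(1) if domain_match else ""
        status = status_match.group(1) if status_match else ""
        total[domain] += 1
        if status.startswith(status_class):
            failed[domain] += 1
    return {domain: failed[domain] / total[domain] for domain in total}


# Hypothetical access-log lines in the assumed format.
SAMPLE_LINES = [
    "domain=a.example.com status=200; responsetime=0.012",
    "domain=a.example.com status=502; responsetime=1.530",
    "domain=a.example.com status=405; responsetime=0.004",
    "domain=b.example.com status=200; responsetime=0.020",
    "domain=b.example.com status=204; responsetime=0.008",
]

THRESHOLD = 0.01  # same ratio threshold as in the rules, i.e. 1%

ratios = failed_ratio_by_domain(SAMPLE_LINES, "5")
alerting = {domain: ratio for domain, ratio in ratios.items() if ratio > THRESHOLD}
for domain, ratio in alerting.items():
    print(f"{domain}: {ratio * 100:.2f}% of requests returned 5xx")
```

Note that the sketch counts only statuses that begin with the class digit, so a `405` is not treated as a 5xx response; that is the intent of the rule, and it is worth confirming that the LogsQL filter used in production behaves the same way.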

Summary

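The move from ELK to Loki and then to VictoriaLogs reflects a steady push for better performance, lower resource consumption, and easier operation. With its low resource footprint and fast queries, VictoriaLogs is a strong fit for large-scale logging. Combining a consistent log format and unified labels with a VictoriaLogs deployment can noticeably improve the efficiency and reliability of log management.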

