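From ELK to Loki to VictoriaLogs: the evolution of log management
In log management, the choice of technology stack directly affects system performance, resource consumption, and operational complexity. The move from ELK (Elasticsearch, Logstash, Kibana) to Grafana Loki, and then to VictoriaLogs from VictoriaMetrics, reflects the pursuit of more efficient and more lightweight solutions.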
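Limitations of ELK
ELK is the classic early log management stack, but many teams have moved away from it because of its architectural complexity and resource consumption. Specifically:
High resource consumption: Elasticsearch needs a lot of memory and disk, and hardware costs rise sharply with large log volumes.
Ecosystem issues: the ELK ecosystem is rich, but its components are tightly coupled, configuration and maintenance are complex, and the learning curve is steep.
Limited scalability: as log volume grows, ELK's performance bottlenecks become more pronounced, especially in querying and storage.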
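Strengths and weaknesses of Loki
Grafana Loki is seen as one alternative to ELK, favored mainly for its lightweight design and efficient log handling:
Relatively efficient: Loki does not build a full-text index over log content; it indexes only log metadata (labels), which greatly reduces storage overhead.
Deep Grafana integration: Loki integrates seamlessly with Grafana, which makes log visualization convenient.
However, Loki's resource consumption still falls short when maximum efficiency is the goal, and its performance remains insufficient at very large log volumes.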
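Where VictoriaLogs stands out
VictoriaLogs is the log management solution from VictoriaMetrics. It stands out for its resource efficiency and query performance:
Very low resource consumption: compared with Elasticsearch and Loki, VictoriaLogs is reported to use up to 30x less RAM and up to 15x less disk space. This efficiency makes it a strong fit for large-scale logging.
Strong query performance: according to the ClickBench benchmark, VictoriaLogs is on average 9.3x faster than Elasticsearch for analytical queries over billions of JSON documents, thanks to the design of its storage and query engine.
Lightweight architecture: the architecture is simple, deployment and maintenance costs are low, and it suits modern cloud-native environments.
Single-node architecture
For more details on VictoriaLogs, see the official VictoriaLogs documentation.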
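Deployment practice: collecting Java logs in Kubernetes
The following describes how to deploy VictoriaLogs in a Kubernetes environment and use Vector to collect logs from Java applications. Log collection for other languages or frameworks can follow the same approach with appropriate adjustments.
1. Set the log format
To make it easier to extract the log level and to collect multi-line exception logs, standardize the log output format of Java services. The recommended format is:
date time level message
Example logs: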
2025-09-12 20:52:53.004 DEBUG 7 --- [ scheduling-1] c.m.p.mapper.AuthLogMapper.selectList : ==> Parameters: ongoing(String)
2025-09-12 20:58:40.006 INFO 7 --- [ scheduling-1] com.mirayai.platform.task.AuthLoginTask : Found 0 ongoing auth logs to process
2025-09-12 21:00:04.462 WARN 7 --- [ scheduling-1] o.a.pdfbox.pdmodel.font.PDType1Font : Using fallback font LiberationSans for base font Symbol
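2025-09-12 21:00:08.341 ERROR 7 --- [ scheduling-1] cn.creekmoon.pdf.core.HtmlToPdfUtils : 字体文件不存在, fontFamily:cjk, 系统尝试使用默认字体
This format has two advantages:
Clear and readable: the timestamp, log level, and message are visible at a glance.
Easy to parse: the consistent layout makes later regular-expression matching and log analysis straightforward.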
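The two goals named in this step, extracting the log level and keeping multi-line exceptions together, can be sketched in a few lines of Python. This is only an illustration of the idea and not part of the deployment: the regular expressions assume the date, time, level layout shown above, and the class name com.example.Task and the stack trace are made-up sample data.

```python
import re
from typing import Iterable, List

# A new log event starts with "YYYY-MM-DD HH:MM:SS.mmm" ('T' and ',' are also accepted).
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[.,]\d{3}")
# Date token, time token, then an upper-case level token.
LEVEL = re.compile(r"^\S+\s+\S+\s+(?P<level>[A-Z]+)\s")


def merge_multiline(lines: Iterable[str]) -> List[str]:
    """Fold continuation lines (for example stack traces) into the preceding event."""
    events: List[str] = []
    for line in lines:
        if EVENT_START.match(line) or not events:
            events.append(line)          # a timestamp opens a new event
        else:
            events[-1] += "\n" + line    # anything else continues the previous one
    return events


def extract_level(event: str) -> str:
    """Return the level token of an event, or "unknown" if the layout does not match."""
    m = LEVEL.match(event)
    return m.group("level") if m else "unknown"


# Hypothetical sample input: two events, the second with a two-line stack trace.
raw = [
    "2025-09-12 20:58:40.006 INFO 7 --- [ scheduling-1] com.example.Task : started",
    "2025-09-12 21:00:08.341 ERROR 7 --- [ scheduling-1] com.example.Task : failed",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.Task.run(Task.java:42)",
]
events = merge_multiline(raw)
levels = [extract_level(e) for e in events]
```

With a free-form layout neither function would have a reliable anchor, which is why a fixed leading timestamp matters more than the exact wording of the message.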
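2. Standardize container labels
To distinguish the logs of services written in different languages or frameworks, add a uniform label to every Pod. For example:
Java services:
app/component: java
Go services:
app/component: go
If a service in another language is added later and its log format differs from the existing ones, the labeling rules need to be redefined. For example:
metadata:
  labels:
    app/component: java
3. VictoriaLogs deployment
The basic deployment of VictoriaLogs is shown below. Key parameters:
retentionPeriod: how long logs are retained, given as a duration with a unit suffix. For example, 5d keeps logs for 5 days.
Resources: adjust the CPU and memory allocation to the actual log volume.
Example deployment YAML file: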
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: '1'
    field.cattle.io/publicEndpoints: >-
      [{"port":30331,"protocol":"TCP","serviceName":"kubedoor:victorialogs","allNodes":true},{"addresses":["10.102.82.251"],"port":80,"protocol":"HTTP","serviceName":"kubedoor:victorialogs","ingressName":"kubedoor:kubedoor","hostname":"kubedoor.mirayai.com","path":"/select/vmui/","allNodes":false},{"addresses":["10.102.82.251"],"port":80,"protocol":"HTTP","serviceName":"kubedoor:victorialogs","ingressName":"kubedoor:kubedoor","hostname":"kubedoor.mirayai.com","path":"/select/","allNodes":false}]
    meta.helm.sh/release-name: kubedoor
    meta.helm.sh/release-namespace: kubedoor
  labels:
    app: victorialogs
    app.kubernetes.io/managed-by: Helm
  name: victorialogs
  namespace: kubedoor
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  selector:
    matchLabels:
      app: victorialogs
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: victorialogs
    spec:
      containers:
        - args:
            - '--storageDataPath=/victoria-logs-data'
            - '--httpListenAddr=:9428'
            - '--retentionPeriod=5d'
            - '--loggerFormat=json'
            - '--loggerOutput=stderr'
          env:
            - name: TZ
              value: Asia/Shanghai
          image: >-
            registry.cn-hangzhou.aliyuncs.com/rhub/victorialogs:v1.23.3-victorialogs-scratch
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /metrics
              port: 9428
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 3
            successThreshold: 1
            timeoutSeconds: 5
          name: victorialogs
          ports:
            - containerPort: 9428
              name: http
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /metrics
              port: 9428
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 3
            successThreshold: 2
            timeoutSeconds: 5
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 256Mi
          volumeMounts:
            - mountPath: /victoria-logs-data
              name: victoria-logs-data
      restartPolicy: Always
      volumes:
        - name: victoria-logs-data
          persistentVolumeClaim:
            claimName: victorialogs-pvc
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    field.cattle.io/publicEndpoints: >-
      [{"port":30331,"protocol":"TCP","serviceName":"kubedoor:victorialogs","allNodes":true}]
    meta.helm.sh/release-name: kubedoor
    meta.helm.sh/release-namespace: kubedoor
  labels:
    app: victorialogs
    app.kubernetes.io/managed-by: Helm
  name: victorialogs
  namespace: kubedoor
spec:
  ports:
    - name: http
      nodePort: 30331
      port: 9428
      protocol: TCP
      targetPort: 9428
  selector:
    app: victorialogs
  type: NodePort
4. Vector deployment
Vector attaches different labels to the logs according to each Pod's labels and then pushes them to VictoriaLogs.
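The Vector configurations later in this article push to VictoriaLogs through its Loki-compatible endpoint, /insert/loki/api/v1/push. As a rough sketch of what such a request body looks like, the Python below builds one in the Loki push JSON format (stream labels plus timestamp and line pairs). The helper name build_loki_push_body and the sample event are assumptions for illustration; Vector assembles the real payload itself.

```python
import json
import time
from typing import Dict, Optional


def build_loki_push_body(labels: Dict[str, str], event: Dict[str, object],
                         ts_ns: Optional[int] = None) -> str:
    """Build a Loki push-API JSON body that carries one JSON-encoded log event."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    payload = {
        "streams": [
            {
                "stream": labels,  # stream labels, for example env
                # Each value is [timestamp in nanoseconds as a string, log line].
                "values": [[str(ts_ns), json.dumps(event, ensure_ascii=False)]],
            }
        ]
    }
    return json.dumps(payload, ensure_ascii=False)


# Hypothetical event shaped like the output of the Java pipeline.
body = build_loki_push_body(
    {"env": "dev-k8s"},
    {"_msg": "2025-09-12 20:58:40.006 INFO 7 --- demo", "level": "INFO", "log_type": "java"},
    ts_ns=1757681920006000000,
)
```

Encoding the event as JSON inside the log line corresponds to the sink's json codec and is what lets fields such as level and log_type be queried individually afterwards.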
Example deployment YAML file:
---
# Source: kubedoor/templates/01.monit/1.vector.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vector
  namespace: kubedoor
---
# Source: kubedoor/templates/01.monit/1.vector.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vector
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - nodes
      - pods
    verbs: ["get", "list", "watch"]
---
# Source: kubedoor/templates/01.monit/1.vector.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: vector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: vector
subjects:
  - kind: ServiceAccount
    name: vector
    namespace: kubedoor
---
# Source: kubedoor/templates/01.monit/1.vector.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vector
  namespace: kubedoor
  labels:
    app: vector
spec:
  selector:
    matchLabels:
      app: vector
  template:
    metadata:
      labels:
        app: vector
    spec:
      serviceAccountName: vector
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: vector
          image: registry.cn-hangzhou.aliyuncs.com/rhub/timberio.vector:0.49.X
          imagePullPolicy: Always
          env:
            - name: VECTOR_SELF_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: VECTOR_LOG
              value: "warn"
            - name: TZ
              value: "Asia/Shanghai"
            - name: PROCFS_ROOT
              value: "/host/proc"
            - name: SYSFS_ROOT
              value: "/host/sys"
          ports:
            - name: api
              containerPort: 8686
          resources:
            requests:
              memory: "128Mi"
              cpu: "200m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
          volumeMounts:
            - name: config
              mountPath: /etc/vector
            - name: data
              mountPath: /var/lib/vector
            - name: var-log
              mountPath: /var/log
              readOnly: true
            - name: run-containerd
              mountPath: /run/containerd/containerd.sock
              readOnly: true
            - name: var-lib-containerd
              mountPath: /var/lib/containerd
              readOnly: true
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health
              port: api
            initialDelaySeconds: 30
            periodSeconds: 30
      initContainers:
        - command:
            - sh
            - -c
            - "until nc -z victorialogs.kubedoor 9428; do echo waiting for basic service `date`; sleep 3; done;"
          image: registry.cn-hangzhou.aliyuncs.com/rhub/busybox:1.36
          imagePullPolicy: IfNotPresent
          name: wait-basic
      volumes:
        - name: config
          configMap:
            name: vector-config
        - name: data
          hostPath:
            path: /var/lib/vector
            type: DirectoryOrCreate
        - name: var-log
          hostPath:
            path: /var/log
        - name: run-containerd
          hostPath:
            path: /run/containerd/containerd.sock
            type: Socket
        - name: var-lib-containerd
          hostPath:
            path: /var/lib/containerd
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
Vector configuration example 1
Collect the stdout logs of Pods labeled app/component=java and app.kubernetes.io/instance=ingress-nginx, then push them to the victorialogs service.
apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-config
  namespace: kubedoor
data:
  vector.yaml: |
    # Data directory
    data_dir: "/var/lib/vector"
    # API
    api:
      enabled: true
      address: "0.0.0.0:8686"
    # Use the officially recommended kubernetes_logs source
    sources:
      java_k8s_logs:
        type: "kubernetes_logs"
        # Automatically merge partial log lines split by the container runtime
        auto_partial_merge: true
        # Collect only Java applications, filtered by label
        extra_label_selector: "app/component=java"
      ng_k8s_access:
        type: "kubernetes_logs"
        # Collect only the ingress controller, filtered by label
        extra_label_selector: "app.kubernetes.io/instance=ingress-nginx"
    # 1. Filter out health checks only
    transforms:
      java_filter_logs:
        type: filter
        inputs:
          - java_k8s_logs
        condition: |
          # Exclude only obvious health checks
          !contains(string!(.message), "/actuator/health") &&
          !contains(string!(.message), "/health")
      # 2. Merge multi-line Java logs
      java_merge_multiline_logs:
        type: reduce
        inputs: [java_filter_logs]
        # kubernetes_logs nests Pod metadata under the kubernetes object
        group_by:
          - kubernetes.pod_name
          - kubernetes.container_name
        merge_strategies:
          message: concat_newline
        starts_when:
          type: vrl
          source: |
            match(string!(.message), r'^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[.,]\d{3}')
        expire_after_ms: 5000
        flush_period_ms: 1000
      # 3. Add the _msg field that VictoriaLogs expects
      java_add_msg_field:
        type: remap
        inputs: [java_merge_multiline_logs]
        source: |
          # Copy the message field into _msg
          ._msg = .message
          # Extract the log level with a regular expression
          match, err = parse_regex(._msg, r'^(?P<ts>\S+\s+\S+)\s+(?P<level>[A-Z]+)\s+.*')
          if err == null {
            .level = match.level
          } else {
            .level = "unknown"
          }
          # Build a new object that contains only the required fields
          .log_type = "java"
          . = {
            "_msg": ._msg,
            "level": .level,
            "log_type": .log_type,
            "kubernetes": {
              "pod_namespace": .kubernetes.pod_namespace,
              "pod_name": .kubernetes.pod_name,
              "container_name": .kubernetes.container_name,
              "pod_node_name": .kubernetes.pod_node_name
            }
          }
          .namespace = .kubernetes.pod_namespace
          del(.kubernetes.pod_namespace)
          .pod = .kubernetes.pod_name
          del(.kubernetes.pod_name)
          .service = .kubernetes.container_name
          del(.kubernetes.container_name)
          .node = .kubernetes.pod_node_name
          del(.kubernetes.pod_node_name)
      nginx_format_access:
        drop_on_error: true
        reroute_dropped: true
        type: remap
        inputs:
          - ng_k8s_access
        source: |
          ._msg = .message
          # Parse the JSON into a temporary variable so the whole event is not overwritten
          parsed_json = parse_json!(replace(.message, r'([^\x00-\x7F])', "\\\\$$1") ?? .message)
          if exists(parsed_json.message) {
            parsed_json = parse_json!(replace(parsed_json.message, "\\x", "\\\\x") ?? parsed_json.message)
          }
          # Merge the parsed fields into the current event, keeping _msg
          . = merge!(., parsed_json)
          .createdtime = to_unix_timestamp(now(), unit: "milliseconds")
          .timestamp = to_unix_timestamp(parse_timestamp!(.timestamp , format: "%+"), unit: "milliseconds")
          .url_list = split!(.url, "?", 2)
          .path = .url_list[0]
          .query = .url_list[1]
          .path_list = split!(.path, "/", 3)
          if length(.path_list) > 2 {.top_path = join!(["/", .path_list[1]])} else {.top_path = "/"}
          .upstreamtime = to_float(.upstreamtime) ?? 0
          .duration = round((to_float(.responsetime) ?? 0) - to_float(.upstreamtime),3)
          if .xff == "-" { .xff = .remote_ip }
          .client_ip = split!(.xff, ",", 2)[0]
          .ua = parse_user_agent!(.http_user_agent , mode: "enriched")
          .client_browser_family = .ua.browser.family
          .client_browser_major = .ua.browser.major
          .client_os_family = .ua.os.family
          .client_os_major = .ua.os.major
          .client_device_brand = .ua.device.brand
          .client_device_model = .ua.device.model
          # The GeoIP lookup must be given the table name as a string
          .geoip = get_enrichment_table_record("geoip_table", {"ip": .client_ip}) ?? {"city_name":"unknown","region_name":"unknown","country_name":"unknown"}
          .client_city = .geoip.city_name
          .client_region = .geoip.region_name
          .client_country = .geoip.country_name
          .client_latitude = .geoip.latitude
          .client_longitude = .geoip.longitude
          # Keep only the nginx fields
          .log_type = "nginx"
          . = {
            "_msg": ._msg,
            "log_type": .log_type,
            "timestamp": .timestamp,
            "createdtime": .createdtime,
            "server_ip": .server_ip,
            "remote_ip": .remote_ip,
            "xff": .xff,
            "remote_user": .remote_user,
            "domain": .domain,
            "url": .url,
            "path": .path,
            "query": .query,
            "top_path": .top_path,
            "referer": .referer,
            "upstreamtime": .upstreamtime,
            "responsetime": .responsetime,
            "duration": .duration,
            "request_method": .request_method,
            "status": .status,
            "response_length": .response_length,
            "request_length": .request_length,
            "protocol": .protocol,
            "upstreamhost": .upstreamhost,
            "http_user_agent": .http_user_agent,
            "client_ip": .client_ip,
            "client_browser_family": .client_browser_family,
            "client_browser_major": .client_browser_major,
            "client_os_family": .client_os_family,
            "client_os_major": .client_os_major,
            "client_device_brand": .client_device_brand,
            "client_device_model": .client_device_model,
            "client_city": .client_city,
            "client_region": .client_region,
            "client_country": .client_country,
            "client_latitude": .client_latitude,
            "client_longitude": .client_longitude
          }
    # Outputs
    sinks:
      # VictoriaLogs
      victorialogs:
        type: loki
        inputs:
          - java_add_msg_field
          - nginx_format_access
        endpoint: "http://victorialogs.kubedoor:9428"
        path: /insert/loki/api/v1/push
        labels:
          origin_prometheus: dev-k8s
          env: dev-k8s
        encoding:
          codec: json
        # Batching and buffering
        batch:
          max_bytes: 102400
          timeout_secs: 5
        buffer:
          type: memory
          max_events: 1000
          when_full: drop_newest
        # Disable the health check to avoid 400 errors
        healthcheck: false
        request:
          timeout_secs: 30
      dropped_console:
        type: console
        inputs: ["nginx_format_access.dropped"]
        encoding:
          codec: json
    enrichment_tables:
      geoip_table:
        path: "/usr/share/GeoIP/GeoLite2-City.mmdb"
        type: geoip
        locale: "zh-CN" # Return location names in Chinese; remove this line for the default English names, which resolve for slightly more records
Vector configuration example 2
Collect the stdout logs of Pods labeled app/component=java and app.kubernetes.io/instance=ingress-nginx, then push the Java logs to the victorialogs service and the ingress logs to ClickHouse.
apiVersion: v1
data:
  vector.yaml: |
    # Data directory
    data_dir: "/var/lib/vector"
    # API
    api:
      enabled: true
      address: "0.0.0.0:8686"
    # Use the officially recommended kubernetes_logs source
    sources:
      java_k8s_logs:
        type: "kubernetes_logs"
        # Automatically merge partial log lines split by the container runtime
        auto_partial_merge: true
        # Collect only Java applications, filtered by label
        extra_label_selector: "app/component=java"
      ng_k8s_access:
        type: "kubernetes_logs"
        # Collect only the ingress controller, filtered by label
        extra_label_selector: "app.kubernetes.io/instance=ingress-nginx"
    # 1. Filter out health checks only
    transforms:
      java_filter_logs:
        type: filter
        inputs:
          - java_k8s_logs
        condition: |
          # Exclude only obvious health checks
          !contains(string!(.message), "/actuator/health") &&
          !contains(string!(.message), "/health")
      # 2. Merge multi-line Java logs
      java_merge_multiline_logs:
        type: reduce
        inputs: [java_filter_logs]
        # kubernetes_logs nests Pod metadata under the kubernetes object
        group_by:
          - kubernetes.pod_name
          - kubernetes.container_name
        merge_strategies:
          message: concat_newline
        starts_when:
          type: vrl
          source: |
            match(string!(.message), r'^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[.,]\d{3}')
        expire_after_ms: 5000
        flush_period_ms: 1000
      # 3. Add the _msg field that VictoriaLogs expects
      java_add_msg_field:
        type: remap
        inputs: [java_merge_multiline_logs]
        source: |
          # Copy the message field into _msg
          ._msg = .message
          # Extract the log level with a regular expression
          match, err = parse_regex(._msg, r'^(?P<ts>\S+\s+\S+)\s+(?P<level>[A-Z]+)\s+.*')
          if err == null {
            .level = match.level
          } else {
            .level = "unknown"
          }
          # Build a new object that contains only the required fields
          .log_type = "java"
          . = {
            "_msg": ._msg,
            "level": .level,
            "log_type": .log_type,
            "kubernetes": {
              "pod_namespace": .kubernetes.pod_namespace,
              "pod_name": .kubernetes.pod_name,
              "container_name": .kubernetes.container_name,
              "pod_node_name": .kubernetes.pod_node_name
            }
          }
          .namespace = .kubernetes.pod_namespace
          del(.kubernetes.pod_namespace)
          .pod = .kubernetes.pod_name
          del(.kubernetes.pod_name)
          .service = .kubernetes.container_name
          del(.kubernetes.container_name)
          .node = .kubernetes.pod_node_name
          del(.kubernetes.pod_node_name)
      nginx_format_access:
        drop_on_error: true
        reroute_dropped: true
        type: remap
        inputs:
          - ng_k8s_access
        source: |
          . = parse_json!(replace(.message, r'([^\x00-\x7F])', "\\\\$$1") ?? .message)
          if exists(.message) {
            . = parse_json!(replace(.message, "\\x", "\\\\x") ?? .message)
          }
          .createdtime = to_unix_timestamp(now(), unit: "milliseconds")
          .timestamp = to_unix_timestamp(parse_timestamp!(.timestamp , format: "%+"), unit: "milliseconds")
          .url_list = split!(.url, "?", 2)
          .path = .url_list[0]
          .query = .url_list[1]
          .path_list = split!(.path, "/", 3)
          if length(.path_list) > 2 {.top_path = join!(["/", .path_list[1]])} else {.top_path = "/"}
          .upstreamtime = to_float(.upstreamtime) ?? 0
          .duration = round((to_float(.responsetime) ?? 0) - to_float(.upstreamtime),3)
          if .xff == "-" { .xff = .remote_ip }
          .client_ip = split!(.xff, ",", 2)[0]
          .ua = parse_user_agent!(.http_user_agent , mode: "enriched")
          .client_browser_family = .ua.browser.family
          .client_browser_major = .ua.browser.major
          .client_os_family = .ua.os.family
          .client_os_major = .ua.os.major
          .client_device_brand = .ua.device.brand
          .client_device_model = .ua.device.model
          .geoip = get_enrichment_table_record("geoip_table", {"ip": .client_ip}) ?? {"city_name":"unknown","region_name":"unknown","country_name":"unknown"}
          .client_city = .geoip.city_name
          .client_region = .geoip.region_name
          .client_country = .geoip.country_name
          .client_latitude = .geoip.latitude
          .client_longitude = .geoip.longitude
          del(.path_list)
          del(.url_list)
          del(.ua)
          del(.geoip)
          del(.url)
    # Outputs
    sinks:
      # VictoriaLogs
      java_victorialogs_logs:
        type: loki
        inputs:
          - java_add_msg_field
        endpoint: "http://victorialogs.kubedoor:9428"
        path: /insert/loki/api/v1/push
        labels:
          origin_prometheus: dev-k8s
          env: dev-k8s
        encoding:
          codec: json
        # Batching and buffering
        batch:
          max_bytes: 102400
          timeout_secs: 5
        buffer:
          type: memory
          max_events: 1000
          when_full: drop_newest
        # Disable the health check to avoid 400 errors
        healthcheck: false
        request:
          timeout_secs: 30
      nginx_clickhouse_access:
        type: clickhouse
        inputs:
          - nginx_format_access
        endpoint: http://clickhouse.kubedoor:8123 # ClickHouse HTTP interface
        database: nginxlogs # ClickHouse database
        table: nginx_access # ClickHouse table
        auth:
          strategy: basic
          user: default # ClickHouse user name
          password: di88fg2k # ClickHouse password
        compression: gzip
      dropped_console:
        type: console
        inputs: ["nginx_format_access.dropped"]
        encoding:
          codec: json
    enrichment_tables:
      geoip_table:
        path: "/usr/share/GeoIP/GeoLite2-City.mmdb"
        type: geoip
        locale: "zh-CN" # Return location names in Chinese; remove this line for the default English names, which resolve for slightly more records
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: kubedoor-agent
    meta.helm.sh/release-namespace: kubedoor
  labels:
    app.kubernetes.io/managed-by: Helm
  name: vector-config
  namespace: kubedoor
For the ingress log format that the Vector configuration expects, see: configuring the ingress-nginx log format.
Configure http-snippet and log-format-upstream for ingress-nginx-controller.
apiVersion: v1
data:
  http-snippet: >-
    map "$time_iso8601 # $msec" $time_iso8601_ms {"~(^[^+]+)(\+[0-9:]+) #
    \d+\.(\d+)$" $1.$3$2;}
  log-format-upstream: >-
    {"timestamp":"$time_iso8601_ms","server_ip":"$server_addr","remote_ip":"$remote_addr","xff":"$http_x_forwarded_for","remote_user":"$remote_user","domain":"$host","url":"$request_uri","referer":"$http_referer","upstreamtime":"$upstream_response_time","responsetime":"$request_time","request_method":"$request_method","status":"$status","response_length":"$bytes_sent","request_length":"$request_length","protocol":"$server_protocol","upstreamhost":"$upstream_addr","http_user_agent":"$http_user_agent"}
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: ingress-nginx
    meta.helm.sh/release-namespace: ingress-nginx
  creationTimestamp: '2025-07-21T03:54:32Z'
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.13.0
    helm.sh/chart: ingress-nginx-4.13.0
  managedFields:
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:labels:
            .: {}
            f:app.kubernetes.io/component: {}
            f:app.kubernetes.io/instance: {}
            f:app.kubernetes.io/managed-by: {}
            f:app.kubernetes.io/name: {}
            f:app.kubernetes.io/part-of: {}
            f:app.kubernetes.io/version: {}
            f:helm.sh/chart: {}
      manager: helm
      operation: Update
      time: '2025-07-21T03:54:32Z'
    - apiVersion: v1
      fieldsType: FieldsV1
      fieldsV1:
        f:data:
          .: {}
          f:http-snippet: {}
          f:log-format-upstream: {}
      manager: agent
      operation: Update
      time: '2025-09-13T10:34:19Z'
  name: ingress-nginx-controller
  namespace: ingress-nginx
  resourceVersion: '11734961'
  uid: 56637872-c797-4a25-9662-a09528eec094
Grafana log dashboard
Grafana dashboard ID: 24078
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"description": "Dashboard to explore Victoria Logs\r\n",
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": 4,
"links": [
{
"asDropdown": false,
"icon": "bolt",
"includeVars": true,
"keepTime": true,
"tags": [],
"targetBlank": true,
"title": "View In Explore",
"tooltip": "",
"type": "link",
"url": "/explore?orgId=1&left={\"datasource\":\"$DS_VICTORIALOGS\",\"queries\":[{\"expr\":\"log_type: \\\"java\\\" AND env: \\\"$env\\\" AND service: in($service) AND level: in($level)\"}]}"
},
{
"asDropdown": false,
"icon": "external link",
"includeVars": false,
"keepTime": false,
"tags": [],
"targetBlank": true,
"title": "Learn LogsQL",
"tooltip": "",
"type": "link",
"url": "https://docs.victoriametrics.com/victorialogs/logsql/"
}
],
"panels": [
{
"datasource": {
"type": "victoriametrics-logs-datasource",
"uid": "${DS_VICTORIALOGS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"barWidthFactor": 0.6,
"drawStyle": "bars",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "short"
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "ERROR"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#FF0000",
"mode": "fixed"
}
}
]
},
{
"matcher": {
"id": "byName",
"options": "WARN"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#FFA500",
"mode": "fixed"
}
}
]
},
{
"matcher": {
"id": "byName",
"options": "INFO"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#0066CC",
"mode": "fixed"
}
}
]
},
{
"matcher": {
"id": "byName",
"options": "DEBUG"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#00AA00",
"mode": "fixed"
}
}
]
}
]
},
"gridPos": {
"h": 6,
"w": 18,
"x": 0,
"y": 0
},
"id": 2,
"options": {
"legend": {
"calcs": [
"sum"
],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "",
"targets": [
{
"datasource": {
"type": "victoriametrics-logs-datasource",
"uid": "Kubedoor-victorialogs"
},
"editorMode": "code",
"expr": "log_type: \"java\" AND env: \"$env\" AND service: in($service) AND level: in($level) AND ($query != \"\" or 1==1) | stats by (level) count()",
"legendFormat": "{{level}}",
"queryType": "statsRange",
"refId": "A"
}
],
"title": "Log statistics - java (${env})",
"type": "timeseries"
},
{
"datasource": {
"type": "victoriametrics-logs-datasource",
"uid": "${DS_VICTORIALOGS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "fixed"
},
"custom": {
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
}
},
"mappings": [],
"unit": "short"
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "ERROR"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#FF0000",
"mode": "fixed"
}
}
]
},
{
"matcher": {
"id": "byName",
"options": "WARN"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#FFA500",
"mode": "fixed"
}
}
]
},
{
"matcher": {
"id": "byName",
"options": "INFO"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#0066CC",
"mode": "fixed"
}
}
]
},
{
"matcher": {
"id": "byName",
"options": "DEBUG"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#00AA00",
"mode": "fixed"
}
}
]
}
]
},
"gridPos": {
"h": 6,
"w": 6,
"x": 18,
"y": 0
},
"id": 3,
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"pieType": "pie",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "",
"targets": [
{
"datasource": {
"type": "victoriametrics-logs-datasource",
"uid": "Kubedoor-victorialogs"
},
"editorMode": "code",
"expr": "log_type: \"java\" AND env: \"$env\" AND service: in($service) AND level: in($level) AND ($query != \"\" or 1==1) | stats by (level) count() c | sort by (c desc) | limit 10",
"legendFormat": "{{level}}",
"queryType": "stats",
"refId": "A"
}
],
"title": "Log level distribution",
"type": "piechart"
},
{
"datasource": {
"type": "victoriametrics-logs-datasource",
"uid": "${DS_VICTORIALOGS}"
},
"fieldConfig": {
"defaults": {},
"overrides": []
},
"gridPos": {
"h": 23,
"w": 24,
"x": 0,
"y": 6
},
"id": 1,
"options": {
"dedupStrategy": "none",
"enableLogDetails": true,
"prettifyLogMessage": false,
"showCommonLabels": false,
"showLabels": false,
"showTime": true,
"sortOrder": "Descending",
"wrapLogMessage": false
},
"pluginVersion": "",
"targets": [
{
"datasource": {
"type": "victoriametrics-logs-datasource",
"uid": "Kubedoor-victorialogs"
},
"editorMode": "code",
"expr": "log_type: \"java\" AND env: \"$env\" AND service: in($service) AND level: in($level) AND ($query != \"\" or 1==1)",
"queryType": "range",
"refId": "A"
}
],
"title": "Log details - (${env}) environment",
"type": "logs"
}
],
"preload": false,
"refresh": "",
"schemaVersion": 40,
"tags": [
"VictoriaLogs",
"nhub.site",
"vector"
],
"templating": {
"list": [
{
"current": {
"text": "",
"value": ""
},
"description": "Datasource for logs",
"hide": 2,
"includeAll": false,
"label": "Logs Datasource",
"name": "DS_VICTORIALOGS",
"options": [],
"query": "victoriametrics-logs-datasource",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"current": {
"text": [
"All"
],
"value": [
"$__all"
]
},
"datasource": {
"type": "victoriametrics-logs-datasource",
"uid": "${DS_VICTORIALOGS}"
},
"definition": "log_type: \"java\"",
"includeAll": true,
"label": "Service",
"multi": true,
"name": "service",
"options": [],
"query": {
"field": "service",
"limit": 100,
"query": "log_type: \"java\"",
"refId": "VictoriaLogsVariableQueryEditor-VariableQuery",
"type": "fieldValue"
},
"refresh": 2,
"regex": "",
"sort": 1,
"type": "query"
},
{
"current": {
"text": [
"All"
],
"value": [
"$__all"
]
},
"datasource": {
"type": "victoriametrics-logs-datasource",
"uid": "Kubedoor-victorialogs"
},
"definition": "log_type: \"java\"",
"includeAll": true,
"label": "Log level",
"multi": true,
"name": "level",
"options": [],
"query": {
"field": "level",
"limit": 100,
"query": "log_type: \"java\"",
"refId": "VictoriaLogsVariableQueryEditor-VariableQuery",
"type": "fieldValue"
},
"refresh": 2,
"regex": "",
"sort": 1,
"type": "query"
},
{
"current": {
"text": "dev",
"value": "dev"
},
"includeAll": false,
"label": "Environment",
"name": "env",
"options": [
{
"selected": true,
"text": "dev",
"value": "dev"
},
{
"selected": false,
"text": "test",
"value": "test"
},
{
"selected": false,
"text": "prod",
"value": "prod"
}
],
"query": "dev,test,prod",
"type": "custom"
},
{
"current": {
"text": "",
"value": ""
},
"label": "Query filter",
"name": "query",
"options": [
{
"selected": true,
"text": "",
"value": ""
}
],
"query": "",
"type": "textbox"
}
]
},
"time": {
"from": "now-30m",
"to": "now"
},
"timepicker": {},
"timezone": "browser",
"title": "VictoriaLogs log monitoring dashboard",
"uid": "bdpzp3w3jkt8gas",
"version": 1,
"weekStart": ""
}
vmalert log alerting
vmalert evaluates LogsQL queries against the log data and sends an alert to Alertmanager when a rule's condition is met.
Example deployment YAML file:
---
# Source: kubedoor/templates/01.monit/2.vmalert-logs.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: vmalert-logs-config
  namespace: kubedoor
data:
  logs.yaml: |-
    groups:
      - name: VictoriaLogs_Java_Alerts-dev-k8s
        type: vlogs
        labels:
          origin_prometheus: dev-k8s
        rules:
          - alert: Java app ERROR logs too high
            expr: |
              _time:5m AND env: "dev-k8s" AND log_type: "java" AND level: "ERROR" | stats by (service) count() as error_count | filter error_count :> 100
            for: 3m
            labels:
              severity: Warning
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- ERROR logs in the last 5m: {{ $value | printf \"%.2f\" }}"
          - alert: Java app exception stack traces
            expr: |
              _time:5m AND env: "dev-k8s" AND log_type: "java" AND (level: "ERROR" OR level: "WARN") |~ "Exception|ERROR|Throwable" | stats by (service) count() as exception_count | filter exception_count :> 150
            for: 2m
            labels:
              severity: Critical
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- Exception logs in the last 5m: {{ $value | printf \"%.2f\" }}"
          - alert: Java app OutOfMemoryError
            expr: |
              _time:5m AND env: "dev-k8s" AND log_type: "java" |~ "OutOfMemoryError" | stats by (service) count() as oom_count | filter oom_count :> 0
            for: 1m
            labels:
              severity: Critical
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- OutOfMemoryError detected"
          - alert: Java app database connection errors
            expr: |
              _time:5m AND env: "dev-k8s" AND log_type: "java" |~ "Connection.*timeout|Connection.*refused|Connection.*reset" | stats by (service) count() as conn_error_count | filter conn_error_count :> 2
            for: 3m
            labels:
              severity: Warning
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- Database connection errors in the last 5m: {{ $value | printf \"%.2f\" }}"
      - name: VictoriaLogs_Java_Alerts-test-k8s
        type: vlogs
        labels:
          origin_prometheus: test-k8s
        rules:
          - alert: Java app ERROR logs too high
            expr: |
              _time:5m AND env: "test-k8s" AND log_type: "java" AND level: "ERROR" | stats by (service) count() as error_count | filter error_count :> 100
            for: 3m
            labels:
              severity: Warning
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- ERROR logs in the last 5m: {{ $value | printf \"%.2f\" }}"
          - alert: Java app exception stack traces
            expr: |
              _time:5m AND env: "test-k8s" AND log_type: "java" AND (level: "ERROR" OR level: "WARN") |~ "Exception|ERROR|Throwable" | stats by (service) count() as exception_count | filter exception_count :> 150
            for: 2m
            labels:
              severity: Critical
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- Exception logs in the last 5m: {{ $value | printf \"%.2f\" }}"
          - alert: Java app OutOfMemoryError
            expr: |
              _time:5m AND env: "test-k8s" AND log_type: "java" |~ "OutOfMemoryError" | stats by (service) count() as oom_count | filter oom_count :> 0
            for: 1m
            labels:
              severity: Critical
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- OutOfMemoryError detected"
          - alert: Java app database connection errors
            expr: |
              _time:5m AND env: "test-k8s" AND log_type: "java" |~ "Connection.*timeout|Connection.*refused|Connection.*reset" | stats by (service) count() as conn_error_count | filter conn_error_count :> 2
            for: 3m
            labels:
              severity: Warning
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Service: {{ $labels.service }}\n- Database connection errors in the last 5m: {{ $value | printf \"%.2f\" }}"
      - name: VictoriaLogs_Nginx_Alerts-dev-k8s
        type: vlogs
        labels:
          origin_prometheus: dev-k8s
        rules:
          - alert: Nginx 5xx error rate too high
            expr: |
              _time:5m AND env: "dev-k8s" AND log_type: "nginx" | extract "domain=<domain> " | extract "status=<status>;" | stats by (domain) count() if (status:~5.*) as failed, count() as total| math failed / total as failed_percentage| filter failed_percentage :> 0.01 | fields domain,failed_percentage
            for: 3m
            labels:
              severity: Critical
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Domain: {{ $labels.domain }}\n- 5xx error ratio (fraction of requests): {{ $value | printf \"%.2f\" }}"
          - alert: Nginx 4xx error rate too high
            expr: |
              _time:5m AND env: "dev-k8s" AND log_type: "nginx" | extract "domain=<domain> " | extract "status=<status>;" | stats by (domain) count() if (status:~4.*) as failed, count() as total| math failed / total as failed_percentage| filter failed_percentage :> 0.01 | fields domain,failed_percentage
            for: 3m
            labels:
              severity: Warning
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Domain: {{ $labels.domain }}\n- 4xx error ratio (fraction of requests): {{ $value | printf \"%.2f\" }}"
          - alert: Nginx response time too long
            expr: |
              _time:5m AND env: "dev-k8s" AND log_type: "nginx" | extract "domain=<domain> " | extract "responsetime=<responsetime>" | stats by (domain) quantile(0.95, responsetime) as p95_response_time | filter p95_response_time :> 2
            for: 5m
            labels:
              severity: Warning
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Domain: {{ $labels.domain }}\n- p95 response time: {{ $value | printf \"%.2f\" }}s"
      - name: VictoriaLogs_Nginx_Alerts-test-k8s
        type: vlogs
        labels:
          origin_prometheus: test-k8s
        rules:
          - alert: Nginx 5xx error rate too high
            expr: |
              _time:5m AND env: "test-k8s" AND log_type: "nginx" | extract "domain=<domain> " | extract "status=<status>;" | stats by (domain) count() if (status:~5.*) as failed, count() as total| math failed / total as failed_percentage| filter failed_percentage :> 0.01 | fields domain,failed_percentage
            for: 3m
            labels:
              severity: Critical
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Domain: {{ $labels.domain }}\n- 5xx error ratio (fraction of requests): {{ $value | printf \"%.2f\" }}"
          - alert: Nginx 4xx error rate too high
            expr: |
              _time:5m AND env: "test-k8s" AND log_type: "nginx" | extract "domain=<domain> " | extract "status=<status>;" | stats by (domain) count() if (status:~4.*) as failed, count() as total| math failed / total as failed_percentage| filter failed_percentage :> 0.01 | fields domain,failed_percentage
            for: 3m
            labels:
              severity: Warning
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Domain: {{ $labels.domain }}\n- 4xx error ratio (fraction of requests): {{ $value | printf \"%.2f\" }}"
          - alert: Nginx response time too long
            expr: |
              _time:5m AND env: "test-k8s" AND log_type: "nginx" | extract "domain=<domain> " | extract "responsetime=<responsetime>" | stats by (domain) quantile(0.95, responsetime) as p95_response_time | filter p95_response_time :> 2
            for: 5m
            labels:
              severity: Warning
            annotations:
              description: "K8S: {{ $labels.origin_prometheus }}\n- Domain: {{ $labels.domain }}\n- p95 response time: {{ $value | printf \"%.2f\" }}s"
---
# Source: kubedoor/templates/01.monit/2.vmalert-logs.yaml
apiVersion: v1
kind: Service
metadata:
  name: vmalert-logs
  namespace: kubedoor
  labels:
    app: vmalert-logs
spec:
  ports:
    - name: vmalert
      port: 8080
      targetPort: 8080
  type: NodePort
  selector:
    app: vmalert-logs
---
# Source: kubedoor/templates/01.monit/2.vmalert-logs.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vmalert-logs
  namespace: kubedoor
  labels:
    app: vmalert-logs
spec:
  selector:
    matchLabels:
      app: vmalert-logs
  template:
    metadata:
      labels:
        app: vmalert-logs
    spec:
      containers:
        - name: vmalert
          image: registry.cn-shenzhen.aliyuncs.com/starsl/vmalert:stable
          imagePullPolicy: IfNotPresent
          args:
            # VictoriaLogs datasource configuration
            - -datasource.url=http://victorialogs.kubedoor:9428
            - -notifier.url=http://alertmanager.kubedoor:9093
            - -remoteWrite.url=http://monit:dduF1E3sj@victoria-metrics.kubedoor:8428
            - -remoteRead.url=http://monit:dduF1E3sj@victoria-metrics.kubedoor:8428
            - -rule=/etc/ruler/*.yaml
            - -evaluationInterval=15s
            - -rule.defaultRuleType=vlogs
            - -httpListenAddr=0.0.0.0:8080
          env:
            - name: TZ
              value: Asia/Shanghai
          resources:
            limits:
              cpu: '1'
              memory: 1Gi
            requests:
              cpu: 50m
              memory: 128Mi
          ports:
            - containerPort: 8080
              name: http
          volumeMounts:
            - mountPath: /etc/ruler/
              name: ruler
              readOnly: true
      volumes:
        - configMap:
            name: vmalert-logs-config
          name: ruler
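The Nginx error-rate rules above count failed requests per domain, divide by the total, and fire when the ratio exceeds 0.01, that is 1% of requests. The Python sketch below mirrors that arithmetic on in-memory data so the threshold is easier to reason about. It assumes the intent is "status begins with 5"; the function names, domains, and counts are hypothetical and nothing here talks to VictoriaLogs or vmalert.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def failed_ratio_by_domain(records: Iterable[Tuple[str, str]],
                           status_prefix: str = "5") -> Dict[str, float]:
    """Return failed/total per domain for statuses that start with status_prefix."""
    total: Dict[str, int] = defaultdict(int)
    failed: Dict[str, int] = defaultdict(int)
    for domain, status in records:
        total[domain] += 1
        if status.startswith(status_prefix):
            failed[domain] += 1
    return {d: failed[d] / total[d] for d in total}


def firing(ratios: Dict[str, float], threshold: float = 0.01) -> Dict[str, float]:
    """Keep only the domains whose ratio is strictly above the threshold."""
    return {d: r for d, r in ratios.items() if r > threshold}


# Hypothetical (domain, status) pairs from one 5-minute window.
sample = (
    [("a.example.com", "200")] * 98
    + [("a.example.com", "502")] * 2
    + [("b.example.com", "200")] * 100
)
ratios = failed_ratio_by_domain(sample)
alerts = firing(ratios)
```

Because the result is a fraction between 0 and 1, a value of 0.02 means 2% of requests failed, which is worth keeping in mind when wording the alert description.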
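Summary
From ELK to Loki to VictoriaLogs, the evolution of log management reflects a continuing pursuit of performance, low resource consumption, and ease of use. With its resource efficiency and query performance, VictoriaLogs is a strong choice for large-scale logging. A consistent log format, uniform labels, and a VictoriaLogs deployment together can noticeably improve the efficiency and reliability of log management.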