Shipping every container line to OpenSearch is the fastest way to run up a large bill and a noisy dashboard.
This recipe keeps what operators actually triage: warning and error severity logs, plus Kubernetes Warning events. It uses Fluent Bit, explicit RBAC, and an OpenSearch output you can paste into a cluster today.
What you will build
- DaemonSet: Fluent Bit on each node tails /var/log/containers/*.log, enriches with the Kubernetes filter, drops everything that is not warn/error, and ships to OpenSearch.
- Deployment (one replica): the kubernetes_events input watches the API and keeps only type = Warning events (Kubernetes does not use an “Error” event type; those show up as warnings with error-like reasons).
- Cost angle: fewer documents, smaller daily indices, less hot-tier churn, before you even tune retention or ISM policies.
You will apply manifests in order: namespace → RBAC → Secrets → ConfigMaps → workloads.
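To make the cost angle concrete, here is a back-of-the-envelope sketch. The line rate, average line size, and the share of warn/error lines are invented placeholders, not measurements; substitute numbers from your own cluster.

```python
def daily_gib(lines_per_sec, avg_bytes, kept_fraction=1.0):
    """Approximate raw ingest per day in GiB for a given log line rate."""
    seconds_per_day = 24 * 60 * 60
    return lines_per_sec * avg_bytes * kept_fraction * seconds_per_day / 2**30


# Hypothetical cluster: 2,000 lines/s at 512 bytes each, 3% at warn or above.
everything = daily_gib(2000, 512)
filtered = daily_gib(2000, 512, kept_fraction=0.03)

print(f"ship everything: {everything:.1f} GiB/day")
print(f"warn/error only: {filtered:.1f} GiB/day")
```

This counts raw bytes only. Index overhead, replicas, and compression change the absolute numbers, but the ratio between the two lines is what the filter buys you.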
Prerequisites
- A Kubernetes cluster with workload logs under /var/log/containers (standard container runtime paths).
- An OpenSearch endpoint reachable from the cluster (TLS on 443 is typical). For Amazon OpenSearch Service, use port 443, TLS on, and IAM SigV4 (covered below).
- kubectl configured and permission to create ClusterRoles, DaemonSets, and Deployments.
- Fluent Bit image with the OpenSearch output and kubernetes_events input; examples use fluent/fluent-bit:3.1 (adjust if your policy pins another 3.x patch).
Step 0 — Namespace
kubectl create namespace logging
Step 1 — OpenSearch credentials (Secret)
For basic authentication (self-managed OpenSearch or fine-grained security with a user/password), create a Secret. Replace placeholders with your endpoint and credentials.
kubectl -n logging create secret generic opensearch-credentials \
--from-literal=opensearch_host="your-opensearch.example.com" \
--from-literal=opensearch_user="your-user" \
--from-literal=opensearch_password="your-password"
Both Fluent Bit workloads read these keys as environment variables. For Amazon OpenSearch with IAM only, skip the username and password and use the SigV4 variant in Step 7b instead.
Step 2 — RBAC for log collection (DaemonSet)
The Kubernetes filter needs read access to API metadata used for enrichment. This ClusterRole follows the usual Fluent Bit pattern: namespaces, pods, and (optional but common) nodes.
Save as fluent-bit-logs-rbac.yaml and apply: kubectl apply -f fluent-bit-logs-rbac.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit-logs
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-logs
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
      - nodes
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-logs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-logs
subjects:
  - kind: ServiceAccount
    name: fluent-bit-logs
    namespace: logging
Step 3 — RBAC for Kubernetes events (single Deployment)
The kubernetes_events input talks to the API server. Official docs require get, list, and watch on namespaces and pods for the namespaces you watch, plus access appropriate for event streaming. This manifest grants watch on events and the usual discovery reads.
Save as fluent-bit-events-rbac.yaml and apply.
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit-events
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-events
rules:
  - apiGroups: [""]
    resources:
      - events
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-events
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-events
subjects:
  - kind: ServiceAccount
    name: fluent-bit-events
    namespace: logging
If the events pod logs show 403 Forbidden from the API server on some distributions, add a second rule for events.k8s.io events (same verbs)—clients vary between core v1 Event and the newer API group.
Run one replica only for events so every Warning is indexed once, not once per node.
Step 4 — Lua script: keep warn / error only
Structured logs often expose level after the Kubernetes filter merges JSON. Unstructured logs keep a log string. The Lua callback keeps the record when it sees allowed severities in common fields or patterns, and drops everything else (return -1, 0, 0 per Fluent Bit’s Lua filter contract).
Save the script body into a ConfigMap in the next step; file name on disk: /fluent-bit/scripts/severity.lua.
function cb_warn_error(tag, timestamp, record)
    -- True when v names one of the severities worth keeping.
    local function ok_level(v)
        if v == nil then return false end
        local l = string.lower(tostring(v))
        return l == "warn" or l == "warning" or l == "error" or l == "fatal"
            or l == "critical" or l == "err"
    end
    -- Structured logs: check the usual top-level level fields.
    if ok_level(record["level"]) or ok_level(record["Level"])
        or ok_level(record["severity"]) or ok_level(record["severityText"]) then
        return 0, timestamp, record
    end
    local log = record["log"]
    if type(log) == "string" then
        -- JSON that was not merged: look for an embedded "level" value.
        local jl = string.match(log, '"level"%s*:%s*"([^"]+)"')
        if ok_level(jl) then return 0, timestamp, record end
        local s = string.lower(log)
        if string.find(s, '"level":"warning"', 1, true)
            or string.find(s, '"level":"warn"', 1, true)
            or string.find(s, '"level":"error"', 1, true)
            or string.find(s, '"level":"fatal"', 1, true) then
            return 0, timestamp, record
        end
        -- Plain text: whole-word upper-case tags or space-delimited words.
        if string.match(log, "%f[%a]WARN%f[^%a]")
            or string.match(log, "%f[%a]ERROR%f[^%a]")
            or string.match(log, "%f[%a]FATAL%f[^%a]")
            or string.find(s, " warning ", 1, true)
            or string.find(s, " error ", 1, true) then
            return 0, timestamp, record
        end
    end
    -- Anything else is dropped.
    return -1, 0, 0
end
If your apps use a different key (for example severity as an integer), extend ok_level once; that is still cheaper than indexing trace noise.
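Before deploying, it can help to run the allow-list against sample lines offline. The following is a minimal Python port of the Lua callback, offered as a sketch rather than an exact equivalent: the Lua frontier patterns are approximated with regex lookarounds, and the sample records are invented.

```python
import re

ALLOWED = {"warn", "warning", "error", "fatal", "critical", "err"}
LEVEL_KEYS = ("level", "Level", "severity", "severityText")

# Approximates the Lua frontier patterns %f[%a]WORD%f[^%a]:
# WORD must not be preceded or followed by an ASCII letter.
UPPER_WORD = re.compile(r"(?<![A-Za-z])(?:WARN|ERROR|FATAL)(?![A-Za-z])")
JSON_LEVEL = re.compile(r'"level"\s*:\s*"([^"]+)"')


def ok_level(value):
    """True when value names one of the allowed severities."""
    return value is not None and str(value).lower() in ALLOWED


def keep(record):
    """Return True when the callback would keep the record, False when it drops it."""
    if any(ok_level(record.get(key)) for key in LEVEL_KEYS):
        return True
    log = record.get("log")
    if not isinstance(log, str):
        return False
    match = JSON_LEVEL.search(log)
    if match and ok_level(match.group(1)):
        return True
    lowered = log.lower()
    if any(f'"level":"{lvl}"' in lowered for lvl in ("warning", "warn", "error", "fatal")):
        return True
    if UPPER_WORD.search(log):
        return True
    return " warning " in lowered or " error " in lowered


# Invented sample records covering structured and plain-text lines.
samples = [
    {"level": "ERROR", "msg": "db timeout"},
    {"level": "info", "log": "started"},
    {"log": '{"level": "warn", "msg": "slow"}'},
    {"log": "2024-01-01T00:00:00Z ERROR disk full"},
    {"log": "GET /errors 200"},
    {"log": "WARNING: low disk"},
]
for record in samples:
    print("keep" if keep(record) else "drop", record)
```

In this port the last sample is dropped: WARN is followed by a letter, and the lowercase check needs a space on both sides of the word. If your applications log in that style, that is the kind of gap to close in ok_level or the pattern list before rollout.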
⚙️ Step 5 — Fluent Bit config: DaemonSet (logs → OpenSearch)
Key points: Merge_Log On so JSON levels lift to top-level fields; lua filter calls cb_warn_error; OpenSearch output uses Logstash_Format for daily indices and Suppress_Type_Name On for OpenSearch 2.x.
Save as fluent-bit-logs-config.yaml. No edits are needed if you use the Secret keys from Step 1: the DaemonSet in Step 7a injects OPENSEARCH_HOST, OPENSEARCH_USER, and OPENSEARCH_PASSWORD.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-logs-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         5
        Daemon        Off
        Log_Level     info
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020
    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        multiline.parser  docker, cri
        Tag               kube.*
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10
    [FILTER]
        Name                 kubernetes
        Match                kube.*
        Merge_Log            On
        Keep_Log             Off
        K8S-Logging.Parser   On
        K8S-Logging.Exclude  On
    [FILTER]
        Name    lua
        Match   kube.*
        script  /fluent-bit/scripts/severity.lua
        call    cb_warn_error
    [OUTPUT]
        Name                 opensearch
        Match                kube.*
        Host                 ${OPENSEARCH_HOST}
        Port                 443
        HTTP_User            ${OPENSEARCH_USER}
        HTTP_Passwd          ${OPENSEARCH_PASSWORD}
        Logstash_Format      On
        Logstash_Prefix      k8s-warn-error
        Logstash_DateFormat  %Y.%m.%d
        Suppress_Type_Name   On
        tls                  On
        tls.verify           On
        Compress             gzip
  parsers.conf: |
    [PARSER]
        Name         docker
        Format       json
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%L
        Time_Keep    On
    [PARSER]
        Name         cri
        Format       regex
        Regex        ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%L%z
  severity.lua: |
    function cb_warn_error(tag, timestamp, record)
        -- True when v names one of the severities worth keeping.
        local function ok_level(v)
            if v == nil then return false end
            local l = string.lower(tostring(v))
            return l == "warn" or l == "warning" or l == "error" or l == "fatal"
                or l == "critical" or l == "err"
        end
        -- Structured logs: check the usual top-level level fields.
        if ok_level(record["level"]) or ok_level(record["Level"])
            or ok_level(record["severity"]) or ok_level(record["severityText"]) then
            return 0, timestamp, record
        end
        local log = record["log"]
        if type(log) == "string" then
            -- JSON that was not merged: look for an embedded "level" value.
            local jl = string.match(log, '"level"%s*:%s*"([^"]+)"')
            if ok_level(jl) then return 0, timestamp, record end
            local s = string.lower(log)
            if string.find(s, '"level":"warning"', 1, true)
                or string.find(s, '"level":"warn"', 1, true)
                or string.find(s, '"level":"error"', 1, true)
                or string.find(s, '"level":"fatal"', 1, true) then
                return 0, timestamp, record
            end
            -- Plain text: whole-word upper-case tags or space-delimited words.
            if string.match(log, "%f[%a]WARN%f[^%a]")
                or string.match(log, "%f[%a]ERROR%f[^%a]")
                or string.match(log, "%f[%a]FATAL%f[^%a]")
                or string.find(s, " warning ", 1, true)
                or string.find(s, " error ", 1, true) then
                return 0, timestamp, record
            end
        end
        -- Anything else is dropped.
        return -1, 0, 0
    end
Apply: kubectl apply -f fluent-bit-logs-config.yaml
Self-managed OpenSearch on plain HTTP often listens on port 9200. In that case set Port 9200 and tls Off, and point Host at your service.
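With Logstash_Format On, the output writes to one index per day, named from Logstash_Prefix and the record date formatted with Logstash_DateFormat. A small sketch for predicting those names when you write retention policies; it assumes the default hyphen separator and UTC dates, and the helper name logstash_index is made up for illustration.

```python
from datetime import datetime, timezone


def logstash_index(prefix, when, date_format="%Y.%m.%d", separator="-"):
    """Index name for a record timestamp, with the date computed in UTC."""
    return prefix + separator + when.astimezone(timezone.utc).strftime(date_format)


# A record stamped just before midnight UTC lands in that day's index.
stamp = datetime(2024, 5, 17, 23, 59, tzinfo=timezone.utc)
print(logstash_index("k8s-warn-error", stamp))
```

Index patterns such as k8s-warn-error-* then match every daily index produced by this output.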
⚙️ Step 6 — Fluent Bit config: Deployment (Kubernetes events → OpenSearch)
This pipeline tags events as k8s_events, keeps only records whose type field equals Warning, and writes to a separate index prefix so you can attach a shorter retention policy in OpenSearch if you want. Save as fluent-bit-events-config.yaml.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-events-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush        5
        Daemon       Off
        Log_Level    info
        HTTP_Server  On
        HTTP_Listen  0.0.0.0
        HTTP_Port    2021
    [INPUT]
        Name             kubernetes_events
        Tag              k8s_events
        kube_url         https://kubernetes.default.svc
        kube_ca_file     /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        kube_token_file  /var/run/secrets/kubernetes.io/serviceaccount/token
        db               /var/fluent-bit/state/kube-events.db
    [FILTER]
        Name   grep
        Match  k8s_events*
        Regex  type Warning
    [OUTPUT]
        Name                 opensearch
        Match                k8s_events*
        Host                 ${OPENSEARCH_HOST}
        Port                 443
        HTTP_User            ${OPENSEARCH_USER}
        HTTP_Passwd          ${OPENSEARCH_PASSWORD}
        Logstash_Format      On
        Logstash_Prefix      k8s-events-warn
        Logstash_DateFormat  %Y.%m.%d
        Suppress_Type_Name   On
        tls                  On
        tls.verify           On
        Compress             gzip
Apply: kubectl apply -f fluent-bit-events-config.yaml
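The grep rule in this config keeps a record when its type field matches the regular expression Warning. A rough Python sketch of that behaviour on invented event records (only the fields relevant here are shown, and the helper grep_keep is hypothetical):

```python
import re


def grep_keep(record, key, pattern):
    """Mimic a grep 'Regex KEY PATTERN' rule: keep when the field matches."""
    value = record.get(key)
    return isinstance(value, str) and re.search(pattern, value) is not None


# Invented sample events; real records carry many more fields.
events = [
    {"type": "Normal", "reason": "Scheduled"},
    {"type": "Warning", "reason": "BackOff"},
    {"type": "Warning", "reason": "FailedScheduling"},
]
kept = [e["reason"] for e in events if grep_keep(e, "type", "Warning")]
print(kept)
```

Because the match is a regex search and not an equality test, a pattern such as ^Warning$ is the stricter choice if you ever see unexpected values in the type field.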
Step 7a — DaemonSet (logs) + Deployment (events)
The DaemonSet mounts host logs and the ConfigMap. Environment variables come from the Secret in Step 1.
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit-logs
  namespace: logging
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: fluent-bit-logs
  template:
    metadata:
      labels:
        app.kubernetes.io/name: fluent-bit-logs
    spec:
      serviceAccountName: fluent-bit-logs
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:3.1
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 2020
          env:
            - name: OPENSEARCH_HOST
              valueFrom:
                secretKeyRef:
                  name: opensearch-credentials
                  key: opensearch_host
            - name: OPENSEARCH_USER
              valueFrom:
                secretKeyRef:
                  name: opensearch-credentials
                  key: opensearch_user
            - name: OPENSEARCH_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: opensearch-credentials
                  key: opensearch_password
          volumeMounts:
            - name: config
              mountPath: /fluent-bit/etc/
            - name: scripts
              mountPath: /fluent-bit/scripts/
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
      volumes:
        - name: config
          configMap:
            name: fluent-bit-logs-config
            items:
              - key: fluent-bit.conf
                path: fluent-bit.conf
              - key: parsers.conf
                path: parsers.conf
        - name: scripts
          configMap:
            name: fluent-bit-logs-config
            items:
              - key: severity.lua
                path: severity.lua
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fluent-bit-events
  namespace: logging
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: fluent-bit-events
  template:
    metadata:
      labels:
        app.kubernetes.io/name: fluent-bit-events
    spec:
      serviceAccountName: fluent-bit-events
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:3.1
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 2021
          env:
            - name: OPENSEARCH_HOST
              valueFrom:
                secretKeyRef:
                  name: opensearch-credentials
                  key: opensearch_host
            - name: OPENSEARCH_USER
              valueFrom:
                secretKeyRef:
                  name: opensearch-credentials
                  key: opensearch_user
            - name: OPENSEARCH_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: opensearch-credentials
                  key: opensearch_password
          volumeMounts:
            - name: config
              mountPath: /fluent-bit/etc/
            - name: state
              mountPath: /var/fluent-bit/state
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 256Mi
      volumes:
        - name: config
          configMap:
            name: fluent-bit-events-config
            items:
              - key: fluent-bit.conf
                path: fluent-bit.conf
        - name: state
          emptyDir: {}
Save both manifests in one file, fluent-bit-workloads.yaml, and apply it: kubectl apply -f fluent-bit-workloads.yaml
Validate the pods with kubectl -n logging get pods, then tail Fluent Bit with kubectl -n logging logs daemonset/fluent-bit-logs and kubectl -n logging logs deploy/fluent-bit-events.
☁️ Step 7b — Amazon OpenSearch (IAM / SigV4) variant
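If you use Amazon OpenSearch Service with IAM instead of basic auth, adjust the OUTPUT sections in both ConfigMaps: remove HTTP_User and HTTP_Passwd, set AWS_Auth On and AWS_Region to your Region, and keep Port 443, tls On, and Suppress_Type_Name On. If the Secret no longer carries opensearch_user and opensearch_password, also remove the matching env entries from both workloads, because a secretKeyRef to a missing key keeps the container from starting. Grant the Pod identity (IRSA or instance profile) permission to call the domain’s data-plane APIs.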
Official plugin parameters are documented under the OpenSearch output in the Fluent Bit manual (including AWS_Service_Name for OpenSearch Serverless when applicable).
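To see exactly which lines change between the two auth modes, here is a small sketch that renders the OUTPUT section either way. The function name and the us-east-1 Region are placeholders for illustration; the keys are the ones already used in this recipe.

```python
def opensearch_output(match, prefix, aws_region=None):
    """Render a classic-format [OUTPUT] section; IAM mode when aws_region is set."""
    pairs = [
        ("Name", "opensearch"),
        ("Match", match),
        ("Host", "${OPENSEARCH_HOST}"),
        ("Port", "443"),
    ]
    if aws_region:
        # SigV4 variant: no basic-auth keys, sign requests instead.
        pairs += [("AWS_Auth", "On"), ("AWS_Region", aws_region)]
    else:
        pairs += [
            ("HTTP_User", "${OPENSEARCH_USER}"),
            ("HTTP_Passwd", "${OPENSEARCH_PASSWORD}"),
        ]
    pairs += [
        ("Logstash_Format", "On"),
        ("Logstash_Prefix", prefix),
        ("Logstash_DateFormat", "%Y.%m.%d"),
        ("Suppress_Type_Name", "On"),
        ("tls", "On"),
        ("tls.verify", "On"),
        ("Compress", "gzip"),
    ]
    body = "\n".join(f"    {key:<20}{value}" for key, value in pairs)
    return "[OUTPUT]\n" + body


# Placeholder Region; use the Region of your own domain.
print(opensearch_output("kube.*", "k8s-warn-error", aws_region="us-east-1"))
```

Calling it without aws_region reproduces the basic-auth section from Step 5, which makes the difference between the two variants easy to diff.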
Cost optimizations (OpenSearch side)
- Smaller ingest first: This guide already cuts document volume at the agent, which is usually the largest win.
- Index templates: Map k8s-warn-error-* and k8s-events-warn-* with sensible shard counts (avoid hundreds of tiny shards per day).
- ISM / ILM: Move daily indices to warm, then delete after N days; warn/error corpora age quickly.
- Compression: Compress gzip on the output reduces network egress; the CPU cost is usually negligible next to storage cost.
- Watch shard pressure: If index creation fails with vague bulk errors, check shard counts against cluster limits. The bulk request can succeed at the HTTP level while Fluent Bit exhausts its retries because the indices cannot be created.
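The index template and retention bullets above can be sketched as two request bodies. This is a minimal example under stated assumptions: one primary shard, one replica, and a 14-day delete are placeholder values, the policy skips the warm tier and goes straight from hot to delete, and the function names are invented.

```python
import json

PATTERNS = ["k8s-warn-error-*", "k8s-events-warn-*"]


def index_template(shards=1, replicas=1):
    """Body for an index template covering both daily index prefixes."""
    return {
        "index_patterns": list(PATTERNS),
        "template": {
            "settings": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        },
    }


def ism_delete_policy(max_age="14d"):
    """Body for an ISM policy that deletes indices older than max_age."""
    return {
        "policy": {
            "description": "Delete warn/error indices after " + max_age,
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [],
                    "transitions": [
                        {"state_name": "delete", "conditions": {"min_index_age": max_age}}
                    ],
                },
                {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
            ],
            # Attach the policy automatically to new indices matching the prefixes.
            "ism_template": {"index_patterns": list(PATTERNS), "priority": 100},
        }
    }


print(json.dumps(index_template(), indent=2))
print(json.dumps(ism_delete_policy(), indent=2))
```

The first body is meant for PUT _index_template/<name> and the second for PUT _plugins/_ism/policies/<policy_id>. Adding a warm state depends on how your cluster implements warm storage, so it is left out here.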
CloudChef Pro Tip
Filtering before OpenSearch is cheaper than storing everything and “searching less.”
Pair this pipeline with dashboard alerts on the rate of warn/error logs per namespace; your signal-to-noise ratio finally makes sense.
References
- Fluent Bit: Kubernetes events input
- Fluent Bit: OpenSearch output
- Fluent Bit: Lua filter
- Kubernetes API reference: Event (events.k8s.io/v1)
- AWS Documentation: Authentication and access for Amazon OpenSearch Service
Final Thoughts
You now have two focused pipelines: node collectors that only admit warn/error severities, and a single events watcher that indexes Kubernetes Warning events. Both land in OpenSearch with predictable daily index names for retention and cost policies.
Tune the Lua allow-list once for your log schema, then let ISM and alerting do the rest.