Security Best Practices: RBAC + NetworkPolicy + Image Security

A deep dive into the Kubernetes security model: RBAC access control, network policies, Pod security controls and image hardening.

Overview

Security is a central concern for any production Kubernetes cluster. This article examines each layer of the cloud-native security stack.

Learning objectives

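  • Understand the Kubernetes security model (defense in depth)
  • Configure RBAC access control
  • Isolate workloads with NetworkPolicy
  • Apply the Pod Security Standards through Pod Security Admission (PSA); PodSecurityPolicy (PSP) was removed in Kubernetes 1.25
  • Understand image security and vulnerability scanning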

Security model overview

Defense in depth

┌─────────────────────────────────────────────────────────────────┐
│                   Kubernetes security layers                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                     Cluster boundary                      │  │
│  │                                                           │  │
│  │  ┌─────────────────────────────────────────────────────┐  │  │
│  │  │                 Namespace isolation                 │  │  │
│  │  │                                                     │  │  │
│  │  │  ┌───────────────────────────────────────────────┐  │  │  │
│  │  │  │  NetworkPolicy                                │  │  │  │
│  │  │  │  (network isolation between microservices)    │  │  │  │
│  │  │  └───────────────────────────────────────────────┘  │  │  │
│  │  │                        │                            │  │  │
│  │  │  ┌───────────────────────────────────────────────┐  │  │  │
│  │  │  │  Pod Security                                 │  │  │  │
│  │  │  │  (runtime restrictions on containers)         │  │  │  │
│  │  │  └───────────────────────────────────────────────┘  │  │  │
│  │  │                        │                            │  │  │
│  │  │  ┌───────────────────────────────────────────────┐  │  │  │
│  │  │  │  RBAC                                         │  │  │  │
│  │  │  │  (identity and access control)                │  │  │  │
│  │  │  └───────────────────────────────────────────────┘  │  │  │
│  │  │                        │                            │  │  │
│  │  │  ┌───────────────────────────────────────────────┐  │  │  │
│  │  │  │  Secrets                                      │  │  │  │
│  │  │  │  (sensitive data protection)                  │  │  │  │
│  │  │  └───────────────────────────────────────────────┘  │  │  │
│  │  │                                                     │  │  │
│  │  └─────────────────────────────────────────────────────┘  │  │
│  │                                                           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                ▼                                │
│            Physical / cloud infrastructure security             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Security principles

┌─────────────────────────────────────────────────────────────────┐
│                   Security design principles                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Principle of Least Privilege                                   │
│  - Grant only the minimum permissions a task needs              │
│  - Avoid using cluster-admin                                    │
│                                                                 │
│  Defense in Depth                                               │
│  - Multiple layers of security controls                         │
│  - A single failed control does not compromise the whole        │
│                                                                 │
│  Zero Trust                                                     │
│  - Trust no request by default                                  │
│  - Verify every source                                          │
│                                                                 │
│  Secure by Default                                              │
│  - Use secure default values                                    │
│  - Prefer explicit configuration over implicit behavior         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

RBAC access control

The RBAC model

┌─────────────────────────────────────────────────────────────────┐
│                       RBAC core concepts                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌────────┐   ┌────────────┐   ┌─────────────┐   ┌─────────┐   │
│   │  Verb  │   │ Resource   │   │ Role        │   │ Subject │   │
│   ├────────┤   ├────────────┤   ├─────────────┤   ├─────────┤   │
│   │  get   │   │ pods       │   │             │   │ User    │   │
│   │  list  │──▶│ services   │──▶│ Role        │◀──│ Group   │   │
│   │ create │   │ configmaps │   │ ClusterRole │   │ SA      │   │
│   │ update │   │ secrets    │   │             │   │         │   │
│   │ delete │   │ ...        │   │             │   │         │   │
│   └────────┘   └────────────┘   └──────┬──────┘   └─────────┘   │
│                                        │                        │
│                                        ▼                        │
│                               ┌──────────────────┐              │
│                               │ RoleBinding /    │              │
│                               │ClusterRoleBinding│              │
│                               └──────────────────┘              │
│                                                                 │
│   Role: namespace-scoped permissions                            │
│   ClusterRole: cluster-scoped permissions                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
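RBAC authorization is purely additive. A request is allowed if at least one rule in any role bound to the subject matches its API group, resource and verb, and `*` acts as a wildcard. The Python sketch below models only that matching step. It is a simplified illustration: it ignores resourceNames, subresources, nonResourceURLs and namespace scoping, and it is not the API server's actual authorizer. The names `PolicyRule` and `is_allowed` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PolicyRule:
    api_groups: List[str]
    resources: List[str]
    verbs: List[str]


def _matches(allowed: List[str], value: str) -> bool:
    # "*" matches anything; otherwise an exact match is required
    return "*" in allowed or value in allowed


def rule_allows(rule: PolicyRule, api_group: str, resource: str, verb: str) -> bool:
    return (_matches(rule.api_groups, api_group)
            and _matches(rule.resources, resource)
            and _matches(rule.verbs, verb))


def is_allowed(rules: Iterable[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    # Additive model: any matching rule allows the request; there are no deny rules
    return any(rule_allows(r, api_group, resource, verb) for r in rules)


# The namespace-reader Role from this article ("" is the core API group)
namespace_reader = [
    PolicyRule([""], ["pods", "services", "configmaps"], ["get", "list", "watch"]),
]

if __name__ == "__main__":
    print(is_allowed(namespace_reader, "", "pods", "list"))     # True
    print(is_allowed(namespace_reader, "", "pods", "delete"))   # False
    print(is_allowed(namespace_reader, "", "secrets", "get"))   # False
```

Because nothing can subtract a permission once granted, the only way to narrow access is to grant less in the first place.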

Role and RoleBinding

# namespace-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: namespace-reader
  namespace: production
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps"]
  verbs: ["get", "list", "watch"]

---
# namespace-reader-binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: namespace-reader
  namespace: production
subjects:
- kind: User
  name: alice@example.com
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: developers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: namespace-reader
  apiGroup: rbac.authorization.k8s.io   # required field

ClusterRole and ClusterRoleBinding

# node-reader-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
# Read node information (for monitoring)
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]

---
# node-reader-binding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-reader-binding
subjects:
- kind: ServiceAccount
  name: monitoring-agent
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: node-reader
  apiGroup: rbac.authorization.k8s.io   # required field

Common permission patterns

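# Read-only access (auditing, monitoring)
# Caution: the wildcards below also allow reading Secrets. Prefer listing
# resources explicitly, or bind the built-in `view` ClusterRole, which excludes Secrets.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: readonly
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["get", "list", "watch"]

---
# Application developer (deploy, scale)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets", "daemonsets"]
  verbs: ["get", "list", "watch", "update", "patch"]
# `kubectl scale` goes through the scale subresource
- apiGroups: ["apps"]
  resources: ["deployments/scale", "statefulsets/scale"]
  verbs: ["get", "update", "patch"]
- apiGroups: [""]
  resources: ["pods", "services", "configmaps", "secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

---
# Namespace administrator
# Caution: RBAC is allow-only. There are no deny rules, so permissions cannot be
# "excluded": this wildcard rule grants everything in the namespace, including
# Secrets, Roles and RoleBindings. A Role cannot grant cluster-scoped resources
# such as ClusterRoles at all. An alternative is to bind the built-in `admin`
# ClusterRole with a RoleBinding.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: namespace-admin
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]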

Aggregated ClusterRoles

# Aggregate several ClusterRoles into one ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aggregate-viewer
aggregationRule:
  clusterRoleSelectors:
  - matchLabels:
      rbac.example.com/aggregate-to-view: "true"
rules: []   # filled in automatically by the control plane
---
# Mark a ClusterRole for aggregation with a label
# (aggregation only selects ClusterRoles, never namespaced Roles)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    rbac.example.com/aggregate-to-view: "true"
  name: myapp-reader
rules:
- apiGroups: ["myapp.example.com"]
  resources: ["myapps"]
  verbs: ["get", "list", "watch"]
# These rules are merged into aggregate-viewer automatically
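Conceptually, the controller manager keeps the aggregated ClusterRole's `rules` equal to the union of the rules of every ClusterRole whose labels match one of its `clusterRoleSelectors`. The Python sketch below is a simplified model of that behavior. It supports only `matchLabels`, not `matchExpressions`, and the dict-based manifests and function names are hypothetical.

```python
from typing import Dict, List


def matches_labels(labels: Dict[str, str], match_labels: Dict[str, str]) -> bool:
    # matchLabels semantics: every key/value pair in the selector must be present
    return all(labels.get(k) == v for k, v in match_labels.items())


def aggregate_rules(selectors: List[Dict[str, str]], cluster_roles: List[dict]) -> List[dict]:
    """Union of rules from all ClusterRoles matched by any selector (deduplicated, order kept)."""
    result: List[dict] = []
    for role in cluster_roles:
        labels = role.get("metadata", {}).get("labels", {})
        if any(matches_labels(labels, sel) for sel in selectors):
            for rule in role.get("rules", []):
                if rule not in result:
                    result.append(rule)
    return result


selectors = [{"rbac.example.com/aggregate-to-view": "true"}]
roles = [
    {"metadata": {"name": "myapp-reader",
                  "labels": {"rbac.example.com/aggregate-to-view": "true"}},
     "rules": [{"apiGroups": ["myapp.example.com"], "resources": ["myapps"],
                "verbs": ["get", "list", "watch"]}]},
    {"metadata": {"name": "unrelated", "labels": {}},
     "rules": [{"apiGroups": [""], "resources": ["secrets"], "verbs": ["get"]}]},
]

if __name__ == "__main__":
    # Only the rule from myapp-reader is aggregated; "unrelated" has no matching label
    print(aggregate_rules(selectors, roles))
```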

NetworkPolicy

Default-deny policies

# Deny all inbound traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}           # selects every Pod in the policy's namespace
  policyTypes:
  - Ingress                # no ingress rules are listed, so all inbound traffic is denied

---
# Deny all outbound traffic (this also blocks DNS until you explicitly allow it)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
spec:
  podSelector: {}
  policyTypes:
  - Egress

---
# Deny both inbound and outbound traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
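NetworkPolicies are allow lists that combine additively. A Pod becomes isolated for a direction as soon as any policy that selects it lists that direction in `policyTypes`. From then on, traffic is allowed only if at least one of those policies has a matching rule. A default-deny policy selects every Pod and has no rules, so its only effect is to switch isolation on.

The Python sketch below models this for ingress only. Selectors and peers are reduced to simple predicates over Pod labels, and ports, namespaces and ipBlock are ignored. It is an illustration of the semantics, not how a CNI plugin implements them, and the `Policy` type is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Labels = Dict[str, str]
Selector = Callable[[Labels], bool]


@dataclass
class Policy:
    name: str
    selects: Selector                    # podSelector of the policy
    policy_types: List[str]              # e.g. ["Ingress"] or ["Ingress", "Egress"]
    ingress_from: List[Selector] = field(default_factory=list)  # allowed peers


def ingress_allowed(target: Labels, source: Labels, policies: List[Policy]) -> bool:
    applicable = [p for p in policies if p.selects(target) and "Ingress" in p.policy_types]
    if not applicable:
        return True  # not isolated for ingress: all inbound traffic is allowed
    # Isolated: allowed only if some applicable policy has a matching peer
    return any(peer(source) for p in applicable for peer in p.ingress_from)


def select_all(labels: Labels) -> bool:
    return True  # podSelector: {}


default_deny = Policy("default-deny-ingress", select_all, ["Ingress"])
allow_frontend = Policy(
    "backend-policy",
    selects=lambda labels: labels.get("app") == "backend",
    policy_types=["Ingress"],
    ingress_from=[lambda labels: labels.get("app") == "frontend"],
)

if __name__ == "__main__":
    backend, frontend, other = {"app": "backend"}, {"app": "frontend"}, {"app": "other"}
    print(ingress_allowed(backend, frontend, []))                              # True
    print(ingress_allowed(backend, frontend, [default_deny]))                  # False
    print(ingress_allowed(backend, frontend, [default_deny, allow_frontend]))  # True
    print(ingress_allowed(backend, other, [default_deny, allow_frontend]))     # False
```

Because policies only ever add allowed peers, adding a default-deny policy next to existing allow policies never removes traffic that they explicitly permit.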

Microservice network policies

# frontend-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
  - Ingress
  - Egress

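  # Allow traffic from the ingress controller's namespace
  # (kubernetes.io/metadata.name is set automatically on every namespace)
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080

  # A YAML key may appear only once, so all egress rules go in one list
  egress:
  # Allow traffic to backend
  - to:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 8080
  # Allow DNS (UDP and TCP)
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
---
# backend-policy.yaml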
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  - Egress

  # Allow traffic from frontend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080

  # All egress rules in a single list
  egress:
  # Allow traffic to database
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
  # Allow traffic to Redis
  - to:
    - podSelector:
        matchLabels:
          app: redis
    ports:
    - protocol: TCP
      port: 6379
  # Allow DNS (UDP and TCP)
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53

Namespace isolation

# namespace-isolation.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: namespace-isolation
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

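  # Only allow traffic from Pods in the same namespace
  # (podSelector: {} without a namespaceSelector means "this namespace";
  #  namespaceSelector: {} would match Pods in every namespace)
  ingress:
  - from:
    - podSelector: {}

  egress:
  # Allow traffic to Pods in the same namespace
  - to:
    - podSelector: {}
  # Allow DNS (UDP and TCP)
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  # Allow specific address ranges (allowlist). The private ranges below are
  # placeholders: replace them with the addresses of the APIs you actually call.
  - to:
    - ipBlock:
        cidr: 10.0.0.0/8
    - ipBlock:
        cidr: 192.168.0.0/16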
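An `ipBlock` peer matches a destination address that lies inside `cidr` and outside every range in its optional `except` list. Peers inside a single `to` list are ORed together. Kubernetes documents `ipBlock` as intended for cluster-external IPs, and the real enforcement is done by the CNI plugin.

The Python sketch below uses the standard `ipaddress` module to reason about such an allowlist. The `except` range and the test addresses are hypothetical and are not part of the policy above.

```python
import ipaddress
from typing import Dict, List


def ip_in_block(ip: str, block: Dict) -> bool:
    """Mimic ipBlock matching: inside cidr and not inside any 'except' range."""
    addr = ipaddress.ip_address(ip)
    if addr not in ipaddress.ip_network(block["cidr"]):
        return False
    return not any(addr in ipaddress.ip_network(ex) for ex in block.get("except", []))


def egress_allowed(ip: str, blocks: List[Dict]) -> bool:
    # Peers in one `to` list are ORed: matching any block is enough
    return any(ip_in_block(ip, b) for b in blocks)


allowlist = [
    {"cidr": "10.0.0.0/8", "except": ["10.96.0.0/12"]},  # hypothetical excluded range
    {"cidr": "192.168.0.0/16"},
]

if __name__ == "__main__":
    for ip in ["10.1.2.3", "10.96.0.10", "192.168.1.5", "8.8.8.8"]:
        print(ip, egress_allowed(ip, allowlist))
```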

Pod security

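Pod Security Standards (PSS)

Since Kubernetes 1.25, the built-in Pod Security Admission (PSA) controller enforces the three PSS levels (privileged, baseline, restricted) per namespace through namespace labels. Each label sets one of three modes:

  • enforce: reject violating Pods
  • audit: record violations in the audit log
  • warn: return a warning to the client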

# baseline-policy.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/warn-version: latest

---
# restricted-policy.yaml (stricter)
apiVersion: v1
kind: Namespace
metadata:
  name: secure-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.29
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted

Pod Security Context

# secure-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
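    runAsNonRoot: true          # refuse to start containers that would run as root
    runAsUser: 1000             # UID for the container processes
    runAsGroup: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault     # use the container runtime's default seccomp profile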
    supplementalGroups:
    - 1000

  containers:
  - name: app
    image: myapp:1.0
    securityContext:
      allowPrivilegeEscalation: false   # block privilege escalation (e.g. via setuid binaries)
      readOnlyRootFilesystem: true      # read-only root filesystem
      capabilities:
        drop:
        - ALL                     # drop all Linux capabilities
      seccompProfile:
        type: RuntimeDefault

    resources:
      limits:
        memory: "256Mi"
        cpu: "500m"
      requests:
        memory: "128Mi"
        cpu: "100m"
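Among other requirements, the `restricted` Pod Security Standard requires the settings used in secure-app above:

  • allowPrivilegeEscalation is false
  • capabilities drop ALL
  • runAsNonRoot is true
  • seccomp type is RuntimeDefault or Localhost

The Python sketch below checks a Pod manifest, parsed into a dict, for just those four points. It is a partial and illustrative subset: it looks only at `containers`, not init or ephemeral containers, and it is no substitute for Pod Security Admission or a server-side dry run. The function names are hypothetical.

```python
from typing import Any, Dict, List


def _effective(pod_sc: Dict[str, Any], ctr_sc: Dict[str, Any], key: str) -> Any:
    # A container-level value overrides the Pod-level value when it is set
    return ctr_sc[key] if key in ctr_sc else pod_sc.get(key)


def restricted_violations(pod: Dict[str, Any]) -> List[str]:
    """Return violations of a small subset of the 'restricted' profile."""
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext") or {}
    problems: List[str] = []
    for c in spec.get("containers", []):
        name = c.get("name", "?")
        sc = c.get("securityContext") or {}
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if "ALL" not in ((sc.get("capabilities") or {}).get("drop") or []):
            problems.append(f"{name}: capabilities.drop must include ALL")
        if _effective(pod_sc, sc, "runAsNonRoot") is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        seccomp = _effective(pod_sc, sc, "seccompProfile") or {}
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile.type must be RuntimeDefault or Localhost")
    return problems


secure_pod = {"spec": {
    "securityContext": {"runAsNonRoot": True, "seccompProfile": {"type": "RuntimeDefault"}},
    "containers": [{"name": "app", "securityContext": {
        "allowPrivilegeEscalation": False, "capabilities": {"drop": ["ALL"]}}}],
}}
insecure_pod = {"spec": {"containers": [{"name": "app"}]}}

if __name__ == "__main__":
    print(restricted_violations(secure_pod))    # []
    print(restricted_violations(insecure_pod))  # four violations
```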

Pod vs. container securityContext

┌─────────────────────────────────────────────────────────────────┐
│                Pod vs Container securityContext                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Pod securityContext (Pod level)                                │
│  ├── runAsUser / runAsGroup / fsGroup                           │
│  ├── supplementalGroups                                         │
│  ├── seccompProfile                                             │
│  └── sysctls                                                    │
│                                                                 │
│  Container securityContext (container level)                    │
│  ├── runAsUser (overrides the Pod-level value)                  │
│  ├── capabilities                                               │
│  ├── allowPrivilegeEscalation                                   │
│  ├── readOnlyRootFilesystem                                     │
│  └── seccompProfile (overrides the Pod-level value)             │
│                                                                 │
│  Precedence: container-level settings override Pod-level ones   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
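The precedence rule can be read as a merge. For fields that exist at both levels, the container value wins when set. Container-only fields apply as written. Pod-only fields such as fsGroup, supplementalGroups and sysctls apply to the whole Pod rather than to one container's settings.

The Python sketch below computes the effective per-container view for a subset of fields. The field list is an assumption for illustration; the API reference for PodSecurityContext and SecurityContext is the authoritative source.

```python
from typing import Any, Dict

# Fields present at both levels (subset); the container-level value wins when set
SHARED_FIELDS = ("runAsUser", "runAsGroup", "runAsNonRoot", "seccompProfile", "seLinuxOptions")


def effective_security_context(pod_sc: Dict[str, Any], ctr_sc: Dict[str, Any]) -> Dict[str, Any]:
    """Merge Pod- and container-level settings for one container."""
    merged: Dict[str, Any] = {}
    for key in SHARED_FIELDS:
        if key in ctr_sc:
            merged[key] = ctr_sc[key]
        elif key in pod_sc:
            merged[key] = pod_sc[key]
    # Container-only fields (capabilities, readOnlyRootFilesystem, ...) are copied as-is.
    # Pod-only fields such as fsGroup are deliberately left out: they act on the Pod as a whole.
    for key, value in ctr_sc.items():
        merged.setdefault(key, value)
    return merged


pod_level = {"runAsUser": 1000, "runAsNonRoot": True, "fsGroup": 2000}
container_level = {"runAsUser": 2000, "readOnlyRootFilesystem": True}

if __name__ == "__main__":
    print(effective_security_context(pod_level, container_level))
    # {'runAsUser': 2000, 'runAsNonRoot': True, 'readOnlyRootFilesystem': True}
```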

Image security

Image scanning

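# Install Trivy (Homebrew)
brew install aquasecurity/trivy/trivy

# Scan an image for vulnerabilities
trivy image myapp:1.0

# Scan a project directory (e.g. in a CI/CD pipeline) for vulnerabilities and misconfigurations
# (--scanners replaces the deprecated --security-checks flag; "config" is now "misconfig")
trivy fs --scanners vuln,misconfig /path/to/project

# Filter by severity
trivy image --severity HIGH,CRITICAL myapp:1.0

# Write the report as JSON
trivy image --format json --output report.json myapp:1.0

# Only report vulnerabilities that already have a fix available
trivy image --ignore-unfixed myapp:1.0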
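In CI, a common practice is to fail the pipeline when HIGH or CRITICAL findings exist. Trivy can do this on its own with `--exit-code 1 --severity HIGH,CRITICAL`. If you need custom logic, you can post-process the JSON report instead.

The Python sketch below assumes the report layout `Results[].Vulnerabilities[]`, with `VulnerabilityID`, `PkgName` and `Severity` fields, as written by recent Trivy versions. Verify that layout against the Trivy version you run. The script and function names are hypothetical.

```python
import json
import sys
from typing import Dict, List

BLOCKING = {"HIGH", "CRITICAL"}


def blocking_findings(report: Dict) -> List[str]:
    """Collect 'ID (package, severity)' for every HIGH/CRITICAL vulnerability."""
    findings: List[str] = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" may be missing or null for a clean target
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity")
            if severity in BLOCKING:
                findings.append(f'{vuln.get("VulnerabilityID")} ({vuln.get("PkgName")}, {severity})')
    return findings


def main(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        report = json.load(f)
    findings = blocking_findings(report)
    for line in findings:
        print(line)
    return 1 if findings else 0  # a non-zero exit code fails the CI step


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Usage, with a hypothetical script name: `python trivy_gate.py report.json`, run after the `--format json` command above.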

Image security policies

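# Secure Deployment configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
spec:
  selector:                 # required for apps/v1 Deployments
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      containers:
      - name: app
        image: myapp:1.0
        imagePullPolicy: Always   # re-resolve the tag in the registry on every container start

        # Tags are mutable: the same tag can point to different content over time.
        # Recommended: pin the image by digest instead of a tag
        # image: myapp@sha256:abc123...

      # Credentials for pulling from a private registry.
      # This does NOT restrict which registries may be used.
      imagePullSecrets:
      - name: my-registry-secret

---
# Allowed image sources
# A ConfigMap by itself enforces nothing: an admission controller
# (e.g. OPA Gatekeeper or Kyverno) must read and enforce this list.
apiVersion: v1
kind: ConfigMap
metadata:
  name: allowed-images
  namespace: production
data:
  allowed-repositories.yaml: |
    allowed:
    - myregistry.com/*
    - docker.io/bitnami/*
    - gcr.io/distroless/*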
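Whatever admission controller enforces the `allowed-repositories.yaml` list, the check itself is a pattern match on the normalized image reference.

The Python sketch below makes three simplifying assumptions:

  • it uses `fnmatch`-style globs like the ones in the ConfigMap;
  • it applies Docker's short-name normalization (`nginx` becomes `docker.io/library/nginx`);
  • it treats any reference without an `@sha256:` digest as not pinned.

These matching rules are assumptions for illustration. Gatekeeper and Kyverno each have their own syntax.

```python
from fnmatch import fnmatchcase
from typing import Sequence, Tuple

ALLOWED = ("myregistry.com/*", "docker.io/bitnami/*", "gcr.io/distroless/*")


def normalize(image: str) -> str:
    """Expand Docker short names to a fully qualified reference."""
    first, _, rest = image.partition("/")
    if not rest:                                  # "nginx:1.25" -> docker.io/library/nginx:1.25
        return f"docker.io/library/{image}"
    if "." in first or ":" in first or first == "localhost":
        return image                              # already starts with a registry host
    return f"docker.io/{image}"                   # "bitnami/nginx" -> docker.io/bitnami/nginx


def check_image(image: str, allowed: Sequence[str] = ALLOWED) -> Tuple[bool, bool]:
    """Return (registry_allowed, pinned_by_digest) for an image reference."""
    ref = normalize(image)
    registry_ok = any(fnmatchcase(ref, pattern) for pattern in allowed)
    pinned = "@sha256:" in ref
    return registry_ok, pinned


if __name__ == "__main__":
    for img in ["myregistry.com/team/app@sha256:" + "a" * 64, "bitnami/nginx:1.25", "nginx:1.25"]:
        print(img, check_image(img))
```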

Security context example

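# production-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-app
spec:
  selector:                 # required for apps/v1 Deployments
    matchLabels:
      app: production-app
  template:
    metadata:
      labels:
        app: production-app
    spec: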
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        seccompProfile:
          type: RuntimeDefault

      containers:
      - name: app
        image: myapp:1.0
        imagePullPolicy: Always

        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL

        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi

        volumeMounts:
        - name: tmp
          mountPath: /tmp

      volumes:
      - name: tmp
        emptyDir: {}

      # Do not share the host's PID, network or IPC namespaces (false is also the default)
      hostPID: false
      hostNetwork: false
      hostIPC: false

Secret security

Secret management

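# Encrypt Secrets with SOPS
# .sops.yaml
creation_rules:
  - path_regex: secrets/.*
    encrypted_regex: ^(data|stringData)$   # encrypt only the Secret values
    age: <public-key>

---
# A Secret as written by sops after encryption.
# Encrypted files cannot be applied as-is: decrypt first, e.g.
#   sops -d secret.yaml | kubectl apply -f -
# or use a GitOps tool with SOPS support.
apiVersion: v1
kind: Secret
metadata:
  name: encrypted-db-credentials
  namespace: production
data:
  password: ENC[AES256_GCM,...]
sops:
  kms: []
  gcp_kms: []
  azure_kms: []
  age:
  - recipient: <public-key>
    enc: |
      -----BEGIN AGE ENCRYPTED FILE-----
      ...
      -----END AGE ENCRYPTED FILE-----
  # further metadata (lastmodified, mac, version, ...) omitted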

External secret management

# External Secrets Operator: sync a secret from Vault into a Kubernetes Secret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: db-credentials
    creationPolicy: Owner
  data:
  - secretKey: username
    remoteRef:
      key: secret/data/db
      property: username
  - secretKey: password
    remoteRef:
      key: secret/data/db
      property: password

Audit logging

Audit policy

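# audit-policy.yaml
# Rules are evaluated in order; the first matching rule sets the level.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Skip the high-volume watch requests kube-proxy makes on endpoints
- level: None
  users: ["system:kube-proxy"]
  verbs: ["watch"]
  resources:
  - group: ""
    resources: ["endpoints"]

# Secrets and ConfigMaps: metadata only, so secret values never end up in the log
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]

# Full request/response bodies for workload changes in production and staging.
# This must come before the generic Metadata rule below, otherwise it never matches.
- level: RequestResponse
  namespaces: ["production", "staging"]
  verbs: ["create", "update", "patch", "delete"]
  resources:
  - group: "apps"
    resources: ["deployments", "statefulsets"]

# Metadata for all other requests on these resources
- level: Metadata
  resources:
  - group: ""
    resources: ["pods", "services"]
  - group: "apps"
    resources: ["deployments"]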
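The audit policy is evaluated top to bottom, and the first rule that matches a request decides its level. Requests that match no rule are not recorded, which is the same as level None. That is why a specific rule must precede a broader rule covering the same resources.

The Python sketch below is a simplified model of that matching. It considers only users, verbs, namespaces and group/resource, and its rule dicts mirror the YAML fields. It is an illustration, not the API server's implementation.

```python
from typing import Dict, List, Optional


def _field_matches(allowed: Optional[List[str]], value: str) -> bool:
    # An omitted or empty list in an audit rule matches everything
    return not allowed or value in allowed


def _resource_matches(rule_resources: Optional[List[Dict]], group: str, resource: str) -> bool:
    if not rule_resources:
        return True
    return any(r.get("group", "") == group
               and (not r.get("resources") or resource in r["resources"])
               for r in rule_resources)


def audit_level(rules: List[Dict], *, user: str, verb: str, namespace: str,
                group: str, resource: str) -> str:
    for rule in rules:  # first match wins
        if (_field_matches(rule.get("users"), user)
                and _field_matches(rule.get("verbs"), verb)
                and _field_matches(rule.get("namespaces"), namespace)
                and _resource_matches(rule.get("resources"), group, resource)):
            return rule["level"]
    return "None"  # no rule matched: the event is not recorded


broad_first = [
    {"level": "Metadata", "resources": [{"group": "apps", "resources": ["deployments"]}]},
    {"level": "RequestResponse", "namespaces": ["production"], "verbs": ["update"],
     "resources": [{"group": "apps", "resources": ["deployments"]}]},
]
specific_first = list(reversed(broad_first))
req = dict(user="alice", verb="update", namespace="production", group="apps", resource="deployments")

if __name__ == "__main__":
    print(audit_level(broad_first, **req))     # Metadata: the specific rule is never reached
    print(audit_level(specific_first, **req))  # RequestResponse
```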

Audit configuration

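# kube-apiserver flags
# --audit-policy-file=/etc/kubernetes/audit-policy.yaml   # policy file
# --audit-log-path=/var/log/kubernetes/audit.log          # log file
# --audit-log-maxage=30                                   # days to keep old log files
# --audit-log-maxbackup=10                                # number of old log files to keep
# --audit-log-maxsize=100                                 # size in MB before rotation
#
# The API server reads the policy from a file on the control-plane node, not
# from a ConfigMap. With kubeadm, mount the policy file and the log directory
# into the kube-apiserver static Pod with hostPath volumes.

# /etc/kubernetes/audit-policy.yaml (minimal example)
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Changes to Secrets: record metadata only, so secret values never reach the log
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets"]
  verbs: ["create", "update", "patch", "delete"]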

Security tool integration

OPA Gatekeeper

# Disallow privilege escalation in containers
# (requires the matching ConstraintTemplate from the Gatekeeper policy library)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPAllowPrivilegeEscalationContainer
metadata:
  name: psp-allow-no-privilege-escalation
spec:
  enforcementAction: deny
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    exemptImages:
    - docker.io/library/*

---
# Require a read-only root filesystem
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPReadOnlyRootFilesystem
metadata:
  name: psp-readonly-root-filesystem
spec:
  enforcementAction: deny
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    exemptImages:
    - docker.io/library/*

Falco runtime security monitoring

# Falco configuration (excerpt)
apiVersion: v1
kind: ConfigMap
metadata:
  name: falco-config
  namespace: falco
data:
  falco.yaml: |
    log_level: info
    json_output: true            # emit alerts as JSON so jq can parse them
    program_output:
      enabled: true
      keep_alive: false
      program: "jq '{utc: .time, container: .output_fields[\"container.image.repository\"], command: .output_fields[\"proc.cmdline\"]}'"

  # Custom rules belong in falco_rules.local.yaml: the default falco_rules.yaml
  # defines the macros and lists used below (spawned_process, container, shell_binaries, ...)
  falco_rules.local.yaml: |
    - rule: Terminal shell in container
      desc: A shell was spawned in a container
      condition: >
        spawned_process and
        container and
        proc.name in (shell_binaries)
      output: >
        Terminal shell in container
        (user=%user.name container=%container.image.repository
        cmd=%proc.cmdline)
      priority: WARNING

    - rule: Privileged container
      desc: A privileged container was started
      condition: >
        container_started and
        container and
        container.privileged=true
      output: >
        Privileged container started
        (user=%user.name container=%container.image.repository
        pod=%k8s.pod.name)
      priority: WARNING

FAQ and common pitfalls

Q1: Requests denied by RBAC?

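# Troubleshooting steps
# 1. Check which identity you are authenticated as
kubectl auth whoami

# 2. List everything a user may do in a namespace
kubectl auth can-i --list --as=user@example.com -n production

# 3. Inspect the bindings
kubectl get rolebindings -n production
kubectl get clusterrolebindings

# 4. Check a single permission by impersonating a ServiceAccount
kubectl auth can-i get pods --as=system:serviceaccount:default:sa-name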

Q2: NetworkPolicy has no effect?

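# Troubleshooting steps
# 1. Check that the CNI plugin enforces NetworkPolicy
#    (there is no `kubectl get cni`; look at the CNI Pods instead)
kubectl get pods -n kube-system
# NetworkPolicy needs a CNI that implements it (e.g. Calico, Cilium, Weave Net).
# With a CNI that does not (e.g. plain Flannel), policies are accepted but silently ignored.

# 2. Check that the policy exists
kubectl get networkpolicy -n production

# 3. Check the Pod selector and rules
kubectl describe networkpolicy my-policy -n production

# 4. Check which Pods the selector matches
kubectl get pods -n production -l app=myapp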

Q3: How do I handle image vulnerabilities?

# 1. Scan the image
trivy image --severity HIGH,CRITICAL myapp:1.0

# 2. Rebuild on an updated base image
#    (-f - reads the Dockerfile from stdin; "." is the build context)
docker pull alpine:3.19
docker build -t myapp:1.1 -f - . <<EOF
FROM alpine:3.19
COPY app /app
RUN apk add --no-cache ca-certificates
CMD ["/app"]
EOF

# 3. Scan continuously in CI
# Add a Trivy scan step to the CI pipeline

Q4: How do I audit cluster changes?

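# Enable audit logging
# kube-apiserver flags
--audit-policy-file=/etc/kubernetes/audit-policy.yaml
--audit-log-path=/var/log/kubernetes/audit.log
--audit-log-maxage=30
--audit-log-maxbackup=10

# Monitor sensitive operations with Falco.
# Kubernetes audit events need the k8saudit plugin (fed by the API server's
# audit webhook); syscall fields such as `container` cannot be mixed into these rules.
- rule: Modify Kubernetes Secrets
  desc: Attempt to modify Kubernetes Secrets or ConfigMaps
  condition: >
    kevt and
    ka.verb in (create, update, patch, delete) and
    ka.target.resource in (secrets, configmaps)
  output: >
    Secret/ConfigMap modified (user=%ka.user.name verb=%ka.verb
    resource=%ka.target.resource name=%ka.target.name ns=%ka.target.namespace)
  priority: WARNING
  source: k8s_audit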

Summary

┌─────────────────────────────────────────────────────────────────┐
│                          Key takeaways                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Security principles                                            │
│  ├── Least privilege                                            │
│  ├── Defense in depth                                           │
│  └── Zero trust                                                 │
│                                                                 │
│  RBAC                                                           │
│  ├── Role/RoleBinding: namespace scope                          │
│  ├── ClusterRole/ClusterRoleBinding: cluster scope              │
│  └── Aggregation rules                                          │
│                                                                 │
│  NetworkPolicy                                                  │
│  ├── Deny all traffic by default                                │
│  ├── Open only the minimum required access                      │
│  └── Namespace isolation                                        │
│                                                                 │
│  Pod security                                                   │
│  ├── Pod Security Standards (baseline/restricted)               │
│  ├── securityContext settings                                   │
│  └── Run as non-root, no privilege escalation                   │
│                                                                 │
│  Image security                                                 │
│  ├── Scan for vulnerabilities regularly                         │
│  ├── Use minimal base images                                    │
│  └── Pin images by digest, not tag                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Questions to consider

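  1. How would you design a least-privilege RBAC policy?
  2. How do you achieve network isolation in a multi-tenant cluster?
  3. How would you build a thorough image scanning pipeline?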


Coming up next

The next article covers multi-tenant cluster management, including:

  • Namespace isolation
  • ResourceQuota and LimitRange
  • Cluster federation
  • Cost management

Stay tuned!