Pod: The Smallest Schedulable Unit in Kubernetes

Overview

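A Pod is the smallest schedulable unit in Kubernetes. Understanding how Pods run and how they move through their lifecycle is key to mastering K8s.

Learning objectives:

Understand what a Pod is and the design philosophy behind it
Follow a Pod through its lifecycle phases and state transitions
Configure probes for health checking
Understand Init Containers and the Sidecar pattern
Apply resource limits and understand QoS classes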

What a Pod Is: A Group of Containers Sharing Namespaces

Why Pods Rather Than Containers?

┌─────────────────────────────────────────────────────────────────┐
│                    Pod design philosophy                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   "Pod is the smallest deployable unit in K8s,                  │
│    not because containers can't be smaller,                     │
│    but because containers that need to share                    │
│    network and storage should be deployed together."            │
│                                                                 │
│   Design principles:                                            │
│   ✓ Tightly coupled containers share one network namespace      │
│   ✓ Containers reach each other via localhost                   │
│   ✓ Shared volumes are used to exchange data                    │
│   ✓ Scheduled atomically (always onto the same node)            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

How a Pod Relates to Its Containers

┌────────────────────────────────────────────────────────────────┐
│                              Pod                               │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │             Pause container (infrastructure)             │  │
│  │   Holds the Pod's shared namespaces; outlives the app    │  │
│  │   containers, so restarts do not change the Pod IP       │  │
│  └──────────────────────────────────────────────────────────┘  │
│                               ↑                                │
│  Shared by default:   Network NS, IPC NS, UTS NS (hostname)    │
│  Shared on request:   PID NS (shareProcessNamespace: true)     │
│  Not shared:          Mount NS (each container has its own     │
│                       filesystem; share data through volumes)  │
│                               │                                │
│        ┌──────────────────────┼──────────────────────┐         │
│        ▼                      ▼                      ▼         │
│ ┌─────────────┐        ┌─────────────┐        ┌─────────────┐  │
│ │ Container A │        │ Container B │        │ Container C │  │
│ │ (main app)  │        │  (sidecar)  │        │ (monitoring)│  │
│ │ nginx       │        │ log-agent   │        │ exporter    │  │
│ └─────────────┘        └─────────────┘        └─────────────┘  │
│                                                                │
└────────────────────────────────────────────────────────────────┘
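
By default each container in a Pod keeps its own PID namespace, and sharing it is an opt-in setting on the Pod spec. The following is a minimal sketch of that setting; the Pod name, the image tags, and the `debug` container are illustrative assumptions, not part of the original article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-pid-demo          # hypothetical name
spec:
  shareProcessNamespace: true    # all containers join one PID namespace
  containers:
  - name: nginx
    image: nginx:1.25-alpine
  - name: debug                  # hypothetical helper container
    image: busybox:1.36
    # With the shared PID namespace, `ps` run in this container
    # also lists the nginx processes.
    command: ["sh", "-c", "sleep 3600"]
```

Without `shareProcessNamespace: true`, the same `ps` would show only the helper container's own processes.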

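Network Sharing

# Containers in a Pod share a single network stack:
# - Shared IP: every container uses the same Pod IP
# - Port conflicts: two containers in one Pod cannot listen on the same port
# - localhost communication: containers reach each other at localhost:<port>
# - No service discovery needed: containers in the same Pod are connected by default

# Example: communication between containers in the same Pod
# logger -> localhost:80 -> nginx
apiVersion: v1
kind: Pod
metadata:
  name: web-with-logger
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
  - name: logger
    image: busybox:1.36
    # logger reaches nginx directly at localhost:80; no Service or Pod IP is needed.
    # (Tailing /var/log/nginx/access.log from here would fail: container filesystems
    # are separate unless a volume is mounted into both containers.)
    command: ["sh", "-c", "while true; do wget -q -O /dev/null http://localhost:80 && echo 'nginx reachable'; sleep 5; done"]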

Pod Lifecycle

State Transitions

┌─────────────────────────────────────────────────────────────────┐
│                          Pod lifecycle                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   created   ┌───────────────────┐  containers  ┌─────────┐      │
│  ─────────▶ │      Pending      │ ───────────▶ │ Running │      │
│             │ scheduling, image │    started   └────┬────┘      │
│             │ pull, init        │                   │           │
│             │ containers        │        ┌──────────┴────┐      │
│             └───────────────────┘        │               │      │
│                                   all containers    a container │
│                                    exited with 0      failed    │
│                                          ▼               ▼      │
│                                    ┌───────────┐    ┌─────────┐ │
│                                    │ Succeeded │    │ Failed  │ │
│                                    └───────────┘    └─────────┘ │
│                                                                 │
│   A Pod reaches Succeeded or Failed only when its containers    │
│   have terminated and will not be restarted (restartPolicy).    │
│                                                                 │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  Other values seen in kubectl output (not Pod phases)   │   │
│   ├─────────────────────────────────────────────────────────┤   │
│   │  Waiting          - container state: not running yet,   │   │
│   │                     e.g. still pulling its image        │   │
│   │  Terminating      - the Pod is being deleted            │   │
│   │  CrashLoopBackOff - a container keeps crashing and its  │   │
│   │                     restarts are being backed off       │   │
│   │  ImagePullBackOff - image pulls keep failing            │   │
│   └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘

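Phases

Phase	Meaning	Description
`Pending`	Waiting	The Pod has been accepted by the cluster, but at least one container is not yet running; this covers scheduling, image pulls, and Init Containers
`Running`	Running	The Pod is bound to a node and all containers have been created; at least one is running, starting, or restarting
`Succeeded`	Completed successfully	All containers terminated with exit code 0 and will not be restarted
`Failed`	Failed	All containers have terminated, and at least one exited with a non-zero code or was killed by the system
`Unknown`	Unknown	The Pod's state could not be obtained, usually because the node cannot be reached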

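Container States

# Container-level status (ContainerStatus)
# Each container has its own state; exactly one of these fields is set:

type ContainerState struct {
    Waiting    *ContainerStateWaiting    // not yet running (e.g. pulling the image)
    Running    *ContainerStateRunning    // running
    Terminated *ContainerStateTerminated // finished or failed
}

# Common container exit codes:
# 0     - normal exit
# 1     - application error (generic)
# 137   - killed by SIGKILL (128 + 9), e.g. OOMKilled or a forced kill after the grace period
# 143   - terminated by SIGTERM (128 + 15), the graceful-shutdown signal
# 255   - exit status out of range or an unspecified error; the meaning depends on the application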

Health Checks: Configuring Probes

Three Probe Types

┌─────────────────────────────────────────────────────────────────┐
│                      Probe types compared                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  livenessProbe    Liveness probe: is the container alive?       │
│                   On failure the kubelet restarts the           │
│                   container.                                    │
│                                                                 │
│  readinessProbe   Readiness probe: is the container ready       │
│                   for traffic? On failure the Pod is removed    │
│                   from Service endpoints.                       │
│                                                                 │
│  startupProbe     Startup probe: has the application            │
│                   finished starting? Liveness and readiness     │
│                   probes are held off until it succeeds.        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

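Probe Mechanisms

# Three common probe mechanisms (each snippet below is a separate fragment;
# a probe uses exactly one mechanism). Recent Kubernetes versions also offer a grpc mechanism.

# 1. exec: run a command in the container; exit code 0 means success
livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]

# 2. httpGet: send an HTTP GET; a status code from 200 to 399 means success
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
    httpHeaders:
    - name: X-Custom-Header
      value: "Awesome"

# 3. tcpSocket: open a TCP connection; success if the port accepts it
livenessProbe:
  tcpSocket:
    port: 3306
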
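
Besides exec, httpGet, and tcpSocket, recent Kubernetes versions (stable since v1.27) provide a built-in `grpc` probe mechanism. The fragment below is a sketch only: the port number is an assumption, and the application is assumed to implement the standard gRPC Health Checking Protocol.

```yaml
# Fragment of a container spec, not a complete manifest
livenessProbe:
  grpc:
    port: 9090            # assumed port of the gRPC health service
  periodSeconds: 10
  failureThreshold: 3
```

This avoids shipping a separate health-check binary in the image just to run it through `exec`.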
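Probe Parameters

# Complete probe configuration example
livenessProbe:
  httpGet:                     # handler: every probe needs exactly one
    path: /healthz
    port: 8080
  initialDelaySeconds: 30      # wait 30 s after the container starts before the first probe
  periodSeconds: 10            # probe every 10 s
  timeoutSeconds: 5            # each probe times out after 5 s
  successThreshold: 1          # consecutive successes needed to count as healthy (must be 1 for liveness and startup probes)
  failureThreshold: 3          # consecutive failures before the container is restarted

# With these values a hung container is restarted roughly
# failureThreshold * periodSeconds = 3 * 10 s = 30 s after it stops responding.

# Recommended practice:
# - initialDelaySeconds: longer than the application's startup time (or use a startupProbe for slow starters)
# - periodSeconds: not too short (extra load), not too long (slow detection)
# - failureThreshold: tune to how much failure the workload can tolerate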
Hands-on: Probes for a Spring Boot Application

# spring-boot-kubernetes.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spring-boot-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spring-boot
  template:
    metadata:
      labels:
        app: spring-boot
    spec:
      containers:
      - name: app
        image: myregistry/spring-boot:1.0
        ports:
        - containerPort: 8080
        # Startup probe: wait until the application has fully started
        startupProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          failureThreshold: 30      # 30 * 10 s = up to 5 minutes to start
          periodSeconds: 10
        # Liveness probe: detect whether the application is still alive
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10
          failureThreshold: 3
        # Readiness probe: detect whether the application can receive traffic
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 5
          failureThreshold: 3

Probes and Service Traffic

┌─────────────────────────────────────────────────────────────────┐
│            How probe results affect Service traffic             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│        Pod created                                              │
│             │                                                   │
│             ▼                                                   │
│     ┌────────────────┐                                          │
│     │ startupProbe   │ ◀── until it succeeds:                   │
│     │  probing...    │     - liveness probe does not run        │
│     └────────────────┘     - readiness probe does not run       │
│             │                                                   │
│     probe succeeds; the probes below then run in parallel       │
│     for the rest of the container's life                        │
│             │                                                   │
│     ┌────────────────┐                                          │
│     │ livenessProbe  │ ◀── after 3 consecutive failures:        │
│     │  probing...    │     the container is restarted           │
│     └────────────────┘                                          │
│     ┌────────────────┐                                          │
│     │ readinessProbe │ ◀── after 3 consecutive failures:        │
│     │  probing...    │     removed from Service endpoints       │
│     └────────────────┘     (added back once it succeeds)        │
│             │                                                   │
│             ▼                                                   │
│     ┌────────────────┐                                          │
│     │ Ready: receives│                                          │
│     │ Service traffic│                                          │
│     └────────────────┘                                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
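
To make the link between readiness and traffic concrete, here is a sketch of a Service that could front the Spring Boot Deployment shown earlier. The Service name and port 80 are assumptions; the selector reuses the `app: spring-boot` label from that Deployment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spring-boot-svc      # hypothetical name
spec:
  selector:
    app: spring-boot         # matches the Pod template labels of the Deployment
  ports:
  - port: 80                 # assumed Service port
    targetPort: 8080         # the containerPort of the app container
```

Only Pods that match the selector and are currently Ready are listed as endpoints of this Service, so a Pod whose readiness probe is failing stays running but stops receiving requests through it.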

Init Container

What Is an Init Container?

┌─────────────────────────────────────────────────────────────────┐
│                    How Init Containers work                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                            Pod                            │  │
│  │                                                           │  │
│  │   ┌───────────────────────┐                               │  │
│  │   │    Init Container     │  ← runs before the app        │  │
│  │   │ (configure / prepare) │    containers; run one by one,│  │
│  │   │                       │    all must succeed first     │  │
│  │   └───────────────────────┘                               │  │
│  │               │                                           │  │
│  │               ▼ (after all succeed)                       │  │
│  │   ┌───────────────────────┐                               │  │
│  │   │   Main Container(s)   │  ← the actual application     │  │
│  │   │     (application)     │    containers, run in parallel│  │
│  │   └───────────────────────┘                               │  │
│  │                                                           │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
│  Characteristics:                                               │
│  ✓ Sequential: each one must finish before the next starts      │
│  ✓ On failure the kubelet re-runs it, unless restartPolicy      │
│    is Never, in which case the Pod is marked Failed             │
│  ✓ Can use a different image from the app containers            │
│  ✓ Well suited to setup and initialization logic                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Hands-on Scenario: Preparation Before the Application Starts

# init-container-example.yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init
spec:
  initContainers:
  # Scenario 1: wait for the database to become ready
  - name: wait-for-db
    image: busybox:1.36
    command:
    - sh
    - -c
    - |
      echo "Waiting for database to be ready..."
      until nc -z db-service 5432; do
        echo "Database not ready, waiting..."
        sleep 5
      done
      echo "Database is ready!"

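  # Scenario 2: fetch configuration
  - name: fetch-config
    image: curlimages/curl:latest
    command:
    - sh
    - -c
    - |
      # -f makes curl exit non-zero on an HTTP error, so a failed download fails this Init Container
      curl -fsS -o /app/config/application.yaml http://config-server:8080/config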
    volumeMounts:
    - name: config-volume
      mountPath: /app/config

  # Scenario 3: database migration
  - name: db-migration
    image: myregistry/migrate:1.0
    command:
    - /app/migrate.sh
    env:
    - name: DB_HOST
      value: "db-service"
    - name: DB_PORT
      value: "5432"
    volumeMounts:
    - name: config-volume
      mountPath: /app/config

  containers:
  - name: app
    image: myregistry/myapp:1.0
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: config-volume
      mountPath: /app/config
    readinessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5

  volumes:
  - name: config-volume
    emptyDir: {}

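Execution Order with Multiple Init Containers

# Multiple Init Containers run in the order they are listed
spec:
  initContainers:
  - name: init-a  # runs first
  - name: init-b  # runs after init-a completes
  - name: init-c  # runs after init-b completes
  containers:
  - name: main    # starts only after every Init Container has succeeded

# If any Init Container fails:
# - restartPolicy: Always or OnFailure -> the kubelet re-runs the failed Init Container, with back-off, until it succeeds
# - restartPolicy: Never -> the Pod is marked Failed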
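
The `Never` case can be reproduced with a deliberately failing Init Container. This is a minimal sketch for experimentation; the Pod and container names are made up for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-fail-demo         # hypothetical name
spec:
  restartPolicy: Never         # a failed Init Container is not retried
  initContainers:
  - name: always-fails
    image: busybox:1.36
    command: ["sh", "-c", "echo 'simulated init failure'; exit 1"]
  containers:
  - name: main
    image: busybox:1.36
    # Never started, because the Init Container above does not succeed
    command: ["sh", "-c", "echo 'main container started'"]
```

Changing `restartPolicy` to `OnFailure` in the same manifest makes the kubelet keep re-running `always-fails` instead of failing the Pod.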

Sidecar Pattern

What Is a Sidecar?

┌─────────────────────────────────────────────────────────────────┐
│                         Sidecar pattern                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   A sidecar is a helper container deployed in the same Pod as   │
│   the main container to extend what it does. Typical uses:      │
│                                                                 │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐     │
│  │ Log collection │  │    Proxying    │  │    Metrics     │     │
│  │                │  │                │  │                │     │
│  │  filebeat      │  │  envoy         │  │  prometheus    │     │
│  │  fluentd       │  │  nginx         │  │  datadog       │     │
│  │  logstash      │  │                │  │  newrelic      │     │
│  └────────────────┘  └────────────────┘  └────────────────┘     │
│                                                                 │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐     │
│  │  Cache proxy   │  │ Configuration  │  │   Data sync    │     │
│  │                │  │                │  │                │     │
│  │  redis         │  │  envconfig     │  │  xtrabackup    │     │
│  │  varnish       │  │  consul        │  │  rsync         │     │
│  └────────────────┘  └────────────────┘  └────────────────┘     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Hands-on: Log-Collecting Sidecar

# sidecar-logging.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-with-logging
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      # Main container: the application itself
      - name: webapp
        image: myregistry/webapp:2.0
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
        - name: nginx-logs
          mountPath: /var/log/nginx

      # Sidecar: log shipper
      - name: log-shipper
        image: fluent/fluentd:v1.16
        env:
        - name: ELASTICSEARCH_HOST
          value: "elasticsearch.logging.svc"
        - name: ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
        - name: nginx-logs
          mountPath: /var/log/nginx
        - name: fluentd-config
          mountPath: /etc/fluent/conf.d

      volumes:
      # emptyDir volumes let the containers in the Pod share log files
      - name: app-logs
        emptyDir: {}
      - name: nginx-logs
        emptyDir: {}
      - name: fluentd-config
        configMap:
          name: fluentd-config

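Hands-on: Nginx Proxy Sidecar

# sidecar-proxy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-with-proxy
spec:
  selector:                 # required: must match the template labels (label value is an example)
    matchLabels:
      app: app-with-proxy
  template:
    metadata:
      labels:
        app: app-with-proxy
    spec:
      containers:
      # Main application (no containerPort declared; traffic is meant to enter through nginx)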
      - name: app
        image: myregistry/app:1.0
        env:
        - name: APP_PORT
          value: "8080"

      # Nginx reverse-proxy sidecar
      - name: nginx-proxy
        image: nginx:1.25-alpine
        ports:
        - containerPort: 80
          name: http
        - containerPort: 443
          name: https
        volumeMounts:
        - name: nginx-conf
          mountPath: /etc/nginx/conf.d
          readOnly: true
        - name: ssl-certs
          mountPath: /etc/nginx/ssl
          readOnly: true

      volumes:
      - name: nginx-conf
        configMap:
          name: nginx-config
          items:
          - key: default.conf
            path: default.conf
      - name: ssl-certs
        secret:
          secretName: app-tls-secret

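Sidecar vs Init Container

┌─────────────────────────────────────────────────────────────────┐
│                    Sidecar vs Init Container                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────┐   ┌───────────────────────────┐  │
│  │      Init Container       │   │          Sidecar          │  │
│  ├───────────────────────────┤   ├───────────────────────────┤  │
│  │ When: before the app      │   │ When: alongside the main  │  │
│  │ containers start          │   │ container                 │  │
│  ├───────────────────────────┤   ├───────────────────────────┤  │
│  │ Order: sequential         │   │ Order: in parallel        │  │
│  ├───────────────────────────┤   ├───────────────────────────┤  │
│  │ On failure: re-run until  │   │ On failure: it restarts;  │  │
│  │ it succeeds; the Pod is   │   │ main container keeps      │  │
│  │ Failed if restartPolicy   │   │ running, but the Pod is   │  │
│  │ is Never                  │   │ not Ready meanwhile       │  │
│  ├───────────────────────────┤   ├───────────────────────────┤  │
│  │ Typical uses:             │   │ Typical uses:             │  │
│  │ - waiting for dependencies│   │ - log collection          │  │
│  │ - preparing resources     │   │ - proxying                │  │
│  │ - loading configuration   │   │ - monitoring              │  │
│  │ - data migration          │   │ - caching                 │  │
│  └───────────────────────────┘   └───────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘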
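
Newer Kubernetes releases blur this distinction: a sidecar can be declared as an Init Container with `restartPolicy: Always` (introduced as alpha in v1.28 and enabled by default from v1.29). Such a container starts before the main containers, in Init Container order, and then keeps running next to them. The sketch below assumes a cluster with that feature available; the Pod name is hypothetical and the images are reused from the logging example above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: native-sidecar-demo      # hypothetical name
spec:
  initContainers:
  - name: log-shipper
    image: fluent/fluentd:v1.16
    restartPolicy: Always        # marks this Init Container as a long-running sidecar
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  containers:
  - name: webapp
    image: myregistry/webapp:2.0
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  volumes:
  - name: app-logs
    emptyDir: {}
```

On clusters without this feature, the classic approach of listing the sidecar under `containers` still applies.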

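Resource Limits and QoS

Resource Types

# The Kubernetes resource model

# 1. Compute resources
#    - cpu: CPU time, 1 core = 1000m (millicores)
#    - memory: memory in bytes, 1Gi = 1024Mi = 2^30 bytes

# 2. Storage resources
#    - ephemeral-storage: node-local scratch storage (writable layer, logs, emptyDir)
#    - persistent volumes: durable storage, requested through a PersistentVolumeClaim rather than in a container's resources

# 3. Extended resources
#    - nvidia.com/gpu: GPU devices
#    - example.com/fpga: a custom, vendor-defined resource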

Requests and Limits

# How requests and limits relate

┌─────────────────────────────────────────────────────────────────┐
│                    Resource allocation model                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   requests                          limits                      │
│       │                                │                        │
│       ▼                                ▼                        │
│   used by the scheduler to          enforced on the node at     │
│   pick a node; reserved for         runtime as a hard cap       │
│   the container                                                 │
│                                                                 │
│   Exceeding a limit:                                            │
│   - CPU is compressible: the container is throttled             │
│   - Memory is incompressible: the container is OOM-killed       │
│                                                                 │
│   Together, requests and limits determine the Pod's QoS class   │
│   (Guaranteed, Burstable, or BestEffort).                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

# requests: the basis for scheduling; the minimum the container is guaranteed
# limits: a hard cap; exceeding it gets the container throttled (CPU) or killed (memory)

QoS Classes

# Kubernetes assigns each Pod a QoS class based on its requests and limits

# 1. Guaranteed
#    Condition: every container sets CPU and memory requests and limits, with requests == limits
spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "256Mi"
        cpu: "500m"

# 2. Burstable
#    Condition: not Guaranteed, but at least one container sets a CPU or memory request or limit (typically requests < limits)
spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "1000m"

# 3. BestEffort
#    Condition: no container sets any CPU or memory requests or limits
spec:
  containers:
  - name: app
    # no resources set
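
The QoS class is computed for the Pod as a whole, not per container, which matters once sidecars are involved. A small sketch, with hypothetical names, of a Pod that looks Guaranteed at first glance but is classified as Burstable:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-mixed-demo           # hypothetical name
spec:
  containers:
  - name: app
    image: nginx:1.25-alpine
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
  - name: helper                 # hypothetical sidecar
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
    # No resources set here, so the Pod is Burstable rather than
    # Guaranteed, even though "app" has requests == limits.
```

The assigned class is recorded in the Pod's `status.qosClass` field, which is a convenient way to check the result.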

QoS and Eviction Order

┌─────────────────────────────────────────────────────────────────┐
│                       QoS eviction order                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  evicted first ◀──────────────────────────────▶ evicted last    │
│                                                                 │
│  BestEffort      →      Burstable      →      Guaranteed        │
│                                                                 │
│  When a node runs short of resources, the kubelet evicts Pods   │
│  roughly in this order:                                         │
│  1. BestEffort (no requests, so any usage exceeds them)         │
│  2. Burstable (when using more than it requested)               │
│  3. Guaranteed (evicted last)                                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Hands-on: Resource Limits

# production-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-api
spec:
  selector:                  # required: must match the template labels (label value is an example)
    matchLabels:
      app: production-api
  template:
    metadata:
      labels:
        app: production-api
    spec:
      containers:
      - name: api
        image: myregistry/api:2.0
        resources:
          # What the container needs in normal operation
          requests:
            cpu: "500m"
            memory: "512Mi"
            ephemeral-storage: "1Gi"
          # The most the container is allowed to use
          limits:
            cpu: "2000m"       # 2 cores
            memory: "2Gi"
            ephemeral-storage: "5Gi"

Enforcing Resource Constraints with LimitRange

# namespace-defaults.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  # Per-container constraints (default = default limits, defaultRequest = default requests)
  - type: Container
    default:
      cpu: "200m"
      memory: "256Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
    # Maximum allowed ratio of limit to request
    maxLimitRequestRatio:
      cpu: 4
      memory: 4

  # Per-Pod constraints (summed over all containers in the Pod)
  - type: Pod
    max:
      cpu: "8"
      memory: "16Gi"

  # Per-PersistentVolumeClaim constraints
  - type: PersistentVolumeClaim
    min:
      storage: "1Gi"
    max:
      storage: "100Gi"

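Pod Security Context

# security-context.yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-app             # example name
spec:
  securityContext:             # Pod-level defaults for all containers
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 2000
    seLinuxOptions:
      level: "s0:c123,c456"
  containers:
  - name: app
    image: myregistry/app:1.0
    securityContext:           # container-level settings override the Pod-level ones
      runAsUser: 2000          # overrides runAsUser: 1000 for this container only
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE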

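Common Problems and Pitfalls

Q1: Why is my Pod stuck in Pending?

# Troubleshooting steps
kubectl describe pod <name> | grep -A 10 "Events:"

# Common causes:
# 1. Insufficient resources
kubectl describe node
# Check "Allocated resources" to see whether CPU/memory requests are exhausted

# 2. Affinity / anti-affinity rules cannot be satisfied
kubectl get pods -o wide

# 3. Taints that the Pod does not tolerate
kubectl describe node | grep Taints

# 4. Image pull problems
kubectl describe pod <name> | grep "ImagePull"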
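
For cause 3, the usual fix is to add a toleration that matches the node's taint. The sketch below assumes a hypothetical taint `dedicated=batch:NoSchedule`; the real key, value, and effect should be taken from the `Taints` line printed by `kubectl describe node`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod             # hypothetical name
spec:
  tolerations:
  - key: "dedicated"             # hypothetical taint key
    operator: "Equal"
    value: "batch"               # hypothetical taint value
    effect: "NoSchedule"
  containers:
  - name: app
    image: myregistry/app:1.0
```

A toleration only permits scheduling onto the tainted node; it does not force the Pod there.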

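Q2: Why does my container keep restarting (CrashLoopBackOff)?

# View the logs of the previous (crashed) container instance
kubectl logs <pod-name> --previous

# Common causes:
# 1. The application fails to start
# 2. Misconfigured health checks (e.g. a liveness probe that fires before startup finishes)
# 3. Resource limits that are too tight (OOMKilled)
kubectl describe pod <name> | grep -A 5 "Last State"

# 4. Configuration errors
# 5. Unavailable dependencies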

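Q3: How do I set a Pod's termination grace period?

# Allow time for a graceful shutdown
spec:
  terminationGracePeriodSeconds: 30  # the default is 30 seconds

# Handle signals in the application
# SIGTERM -> sent first; begin a graceful shutdown
# SIGKILL -> sent if the container is still running after terminationGracePeriodSeconds; cannot be caught

# On SIGTERM the application should:
# 1. Stop accepting new requests
# 2. Finish in-flight requests
# 3. Close database connections
# 4. Flush buffered data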
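
When the application itself cannot easily be changed, a `preStop` hook is a common complement to signal handling. The sketch below uses hypothetical values: a 10 second sleep, a 60 second grace period, and an image that is assumed to contain `sh`. The hook runs before SIGTERM is delivered, and its run time counts against the grace period.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown-demo       # hypothetical name
spec:
  terminationGracePeriodSeconds: 60  # must cover the preStop hook plus the app's own shutdown
  containers:
  - name: app
    image: myregistry/app:1.0
    lifecycle:
      preStop:
        exec:
          # Pause briefly so endpoint removal can propagate before
          # the application receives SIGTERM
          command: ["sh", "-c", "sleep 10"]
```

The short delay reduces the chance that requests are still routed to a Pod that has already begun shutting down.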

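Q4: What if a failing Init Container keeps the Pod from starting?

# Check Init Container status
kubectl describe pod <name> | grep -A 20 "Init Containers:"

# View its logs
kubectl logs <pod-name> -c <init-container-name>

# Possible fixes:
# 1. Correct the Init Container's logic
# 2. Add retries inside the Init Container's script
# 3. Check that the services it depends on are reachable
# 4. Adjust restartPolicy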

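Summary

┌─────────────────────────────────────────────────────────────────┐
│                          Key takeaways                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Pod design                                                     │
│  ├── Shared network namespace and shared volumes                │
│  ├── Smallest schedulable unit                                  │
│  └── Containers cooperate out of the box                        │
│                                                                 │
│  Lifecycle                                                      │
│  ├── Pending → Running → Succeeded/Failed                       │
│  ├── Container states: Waiting/Running/Terminated               │
│  └── Exit codes: 137 (SIGKILL, e.g. OOM), 143 (SIGTERM)         │
│                                                                 │
│  Probes                                                         │
│  ├── livenessProbe: liveness check, restart on failure          │
│  ├── readinessProbe: readiness check, no traffic on failure     │
│  ├── startupProbe: protects slow-starting containers            │
│  └── Mechanisms: exec / httpGet / tcpSocket                     │
│                                                                 │
│  Init Containers                                                │
│  ├── Run before the app containers                              │
│  ├── Run in order; all must succeed before the app starts       │
│  └── Suited to setup and initialization                         │
│                                                                 │
│  Sidecar pattern                                                │
│  ├── Runs alongside the main container                          │
│  ├── Extends the main container                                 │
│  └── Logging, proxying, monitoring, and similar                 │
│                                                                 │
│  Resources and QoS                                              │
│  ├── requests: basis for scheduling                             │
│  ├── limits: hard cap                                           │
│  └── Eviction protection: Guaranteed > Burstable > BestEffort   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘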

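Review Questions

Why does K8s use a Pause container as the infrastructure container instead of managing the network namespace directly?
If a Pod has several sidecars and one of them fails, does that affect the main application? Why?
How would you design a robust probe configuration that detects problems promptly without raising false alarms?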

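Coming Up Next

The next article covers ReplicaSet and Deployment, including:

What a ReplicaSet does and how its selector works
Deployment rolling-update strategies
Rollbacks and revision history
Configuring deployments across multiple environments

Stay tuned!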