Workload Orchestration Model

Responsibility View

This document defines how applications are deployed and managed across the platform. It answers the question: How are workloads kept running and healthy?

Orchestration Lifecycle

The platform automatically manages the lifecycle of applications, ensuring they are placed on suitable nodes and restarted if they fail.

flowchart TD
  Def[Workload Definition] --> Desired[(Desired State Store)]
  
  subgraph ControlPlane["Control Plane (Decides)"]
    Recon[Reconciler / Controller]
    Sched[Scheduler]
  end

  subgraph DataPlane["Data Plane (Runs)"]
    subgraph Nodes["Nodes"]
      A[Node Agent]
      B[Node Agent]
      C[Node Agent]
    end
    WL[Running Workloads]
  end

  %% Observe
  Nodes --> Obs[Health & Telemetry Signals]
  WL --> Obs

  %% Decide
  Desired --> Recon
  Obs --> Recon
  Recon -->|needs placement| Sched
  Sched -->|bind workload| Nodes

  %% Actuate
  Recon -->|start/stop/restart| Nodes
  Nodes -->|run| WL
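
The observe/decide/actuate loop in the diagram can be sketched as a single reconciliation pass. This is a minimal illustration, not the platform's implementation: `Action`, `Observed`, and `reconcile` are hypothetical names, and the desired state store and health signals are assumed to be reducible to a set of workload names and a per-workload health flag.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SCHEDULE = "needs placement"  # hand the workload to the scheduler
    RESTART = "restart"           # restart on the node it is already bound to
    STOP = "stop"                 # running, but no longer desired


@dataclass(frozen=True)
class Observed:
    node: str      # node the workload is currently bound to
    healthy: bool  # latest health signal for the workload


def reconcile(desired: set[str], observed: dict[str, Observed]) -> dict[str, Action]:
    """Compare desired state with observed state and return the actions needed."""
    actions: dict[str, Action] = {}
    for name in sorted(desired):
        state = observed.get(name)
        if state is None:
            # Desired but not running anywhere: it needs a placement decision.
            actions[name] = Action.SCHEDULE
        elif not state.healthy:
            # Running but unhealthy: restart it.
            actions[name] = Action.RESTART
    for name in sorted(observed):
        if name not in desired:
            # Running but removed from the desired state: stop it.
            actions[name] = Action.STOP
    return actions


# Hypothetical snapshot of the desired state store and the observed signals.
desired = {"api", "worker", "cron"}
observed = {
    "api": Observed(node="node-a", healthy=True),
    "worker": Observed(node="node-b", healthy=False),
    "legacy": Observed(node="node-c", healthy=True),
}
plan = reconcile(desired, observed)
```

In this sketch a healthy workload that is still desired produces no action, so repeated passes over a converged system do nothing.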

Key Capabilities

Automated Scheduling

Workloads are assigned to nodes based on resource availability (CPU/RAM) and affinity rules. Spreading placements this way keeps any single node from being overloaded while others sit idle.
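
A placement decision of this kind is often split into a filter step and a scoring step. The sketch below assumes that split; the `Node` and `Workload` types, the label-matching form of affinity, and the "most free CPU, then most free RAM" scoring rule are all illustrative assumptions rather than the platform's actual policy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu: float  # unallocated CPU cores
    free_ram: float  # unallocated RAM in GiB
    labels: dict[str, str] = field(default_factory=dict)


@dataclass
class Workload:
    name: str
    cpu: float  # requested CPU cores
    ram: float  # requested RAM in GiB
    affinity: dict[str, str] = field(default_factory=dict)  # required node labels


def fits(node: Node, wl: Workload) -> bool:
    """Filter step: enough free resources and every affinity label matches."""
    has_resources = node.free_cpu >= wl.cpu and node.free_ram >= wl.ram
    matches_affinity = all(node.labels.get(k) == v for k, v in wl.affinity.items())
    return has_resources and matches_affinity


def schedule(wl: Workload, nodes: list[Node]) -> Optional[Node]:
    """Score step: pick the feasible node with the most free capacity, then bind."""
    candidates = [n for n in nodes if fits(n, wl)]
    if not candidates:
        return None  # unschedulable for now; the caller may retry later
    best = max(candidates, key=lambda n: (n.free_cpu, n.free_ram))
    # Reserve the requested resources so later decisions see the updated capacity.
    best.free_cpu -= wl.cpu
    best.free_ram -= wl.ram
    return best


nodes = [
    Node("node-a", free_cpu=2.0, free_ram=4.0, labels={"zone": "east"}),
    Node("node-b", free_cpu=8.0, free_ram=16.0, labels={"zone": "west"}),
    Node("node-c", free_cpu=6.0, free_ram=8.0, labels={"zone": "east"}),
]
chosen = schedule(Workload("api", cpu=1.0, ram=2.0, affinity={"zone": "east"}), nodes)
too_big = schedule(Workload("batch", cpu=16.0, ram=4.0), nodes)
```

Here `node-b` has the most free capacity but is filtered out by the affinity rule, so the workload lands on `node-c`. The oversized `batch` workload fits nowhere and is left unscheduled.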

Self-Healing

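If a node or an individual workload fails, the reconciler detects the drift from the desired state and restarts the workload. When the original node is unavailable, the scheduler places the workload on a healthy node. This minimizes downtime.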
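
The node-failure case can be sketched as rebinding every workload from an unhealthy node to the least-loaded healthy node. The function name `heal`, the binding map, and the "fewest workloads first" rule are assumptions made for illustration; a real controller would also weigh resources and affinity.

```python
def heal(bindings: dict[str, str], node_healthy: dict[str, bool]) -> dict[str, str]:
    """Return new workload-to-node bindings with failed nodes evacuated."""
    healthy = sorted(n for n, ok in node_healthy.items() if ok)
    if not healthy:
        # No healthy node to move to: keep bindings unchanged and retry later.
        return dict(bindings)

    # Count how many workloads each healthy node already runs.
    load = {n: 0 for n in healthy}
    for node in bindings.values():
        if node in load:
            load[node] += 1

    new_bindings: dict[str, str] = {}
    for wl in sorted(bindings):
        node = bindings[wl]
        if node_healthy.get(node, False):
            new_bindings[wl] = node  # node is fine, leave the workload in place
        else:
            # Pick the healthy node with the fewest workloads (name breaks ties).
            target = min(healthy, key=lambda n: (load[n], n))
            load[target] += 1
            new_bindings[wl] = target
    return new_bindings


bindings = {"api": "node-a", "worker": "node-b", "cron": "node-b"}
node_healthy = {"node-a": True, "node-b": False, "node-c": True}
healed = heal(bindings, node_healthy)
```

Workloads on healthy nodes are never moved, which keeps the recovery action limited to what actually failed.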

Resource Governance

Every workload must declare resource requests and limits. Enforcing them prevents a single “noisy neighbor” from consuming all cluster resources.
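
One way to enforce this rule is an admission check that rejects workload definitions before they reach the desired state store. The sketch below is an assumption about how such a check might look; `Resources`, `WorkloadSpec`, and `validate` are hypothetical names, and the rule that a request may not exceed its limit is a common convention rather than something this document specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resources:
    cpu: float  # CPU cores
    ram: float  # RAM in GiB


@dataclass
class WorkloadSpec:
    name: str
    requests: Optional[Resources] = None
    limits: Optional[Resources] = None


def validate(spec: WorkloadSpec) -> list[str]:
    """Return a list of governance violations; an empty list means admitted."""
    errors: list[str] = []
    if spec.requests is None:
        errors.append("missing resource requests")
    if spec.limits is None:
        errors.append("missing resource limits")
    if spec.requests is not None and spec.limits is not None:
        # Assumed convention: a request larger than its limit is contradictory.
        if spec.requests.cpu > spec.limits.cpu:
            errors.append("cpu request exceeds cpu limit")
        if spec.requests.ram > spec.limits.ram:
            errors.append("ram request exceeds ram limit")
    return errors


ok = validate(WorkloadSpec("api", Resources(0.5, 1.0), Resources(1.0, 2.0)))
unbounded = validate(WorkloadSpec("batch", requests=Resources(2.0, 4.0)))
inverted = validate(WorkloadSpec("cron", Resources(2.0, 1.0), Resources(1.0, 2.0)))
```

Returning all violations at once, rather than failing on the first, lets the author of a workload definition fix every problem in one pass.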
