A Semi-Big Update: RustFS, OmniTools, and Upgrading Talos + Kubernetes

Three things that happened on the cluster this weekend: migrating Loki from MinIO to RustFS, swapping Stirling PDF for omni-tools, and upgrading Talos Linux from v1.12.7 to v1.13.4 and Kubernetes from 1.35.4 to 1.36.1.

Not every post is a single-focus deep dive. Sometimes a cluster sees several changes in the same stretch and it makes sense to write them all up together. This one covers three things: migrating Loki's object storage backend from MinIO to RustFS, replacing Stirling PDF with omni-tools, and upgrading Talos Linux from v1.12.7 to v1.13.4 along with Kubernetes from 1.35.4 to 1.36.1.

Loki: MinIO → RustFS

Loki has been running in monolithic mode here, backed by object storage for chunks, ruler, and admin data. The original deployment used MinIO — the Loki Helm chart ships with an embedded MinIO option, which is convenient for getting started but ends up as a separate process you own. MinIO also made a license shift to AGPL, which complicates things depending on the environment.

RustFS is a drop-in S3-compatible object storage system built in Rust and licensed Apache 2.0. It's designed to be API-compatible with the S3 interface MinIO implements, so swapping it in means changing the endpoint URL and credentials — Loki doesn't know or care that the backend changed. The Rust implementation brings a smaller memory footprint and a cleaner operational story.

The deployment follows the same pattern as everything else here: an Ansible role with all tunables in defaults/main.yml, driven by a top-level playbook.

The RustFS Playbook

---
# Deploy RustFS (S3-compatible object storage)
- name: Deploy RustFS (S3-compatible object storage)
  hosts: localhost
  connection: local
  gather_facts: false

  vars_files:
    - ../group_vars/talos_cluster.yml

  roles:
    - role: vault
    - role: rustfs

Role Defaults

---
rustfs_namespace: rustfs
rustfs_helm_repo_name: rustfs
rustfs_helm_repo_url: https://charts.rustfs.com
rustfs_chart_ref: "rustfs/rustfs"
rustfs_chart_version: "0.0.98"
rustfs_release_name: rustfs

# Storage — Longhorn RWO
rustfs_storage_class: longhorn
rustfs_data_storage_size: 100Gi
rustfs_logs_storage_size: 1Gi

rustfs_log_level: "error"
rustfs_secret_name: rustfs-credentials

rustfs_resources_requests_cpu: "100m"
rustfs_resources_requests_memory: "256Mi"
rustfs_resources_limits_cpu: "500m"
rustfs_resources_limits_memory: "1Gi"

# Buckets to create post-deploy (via mc Job)
rustfs_buckets:
  - chunks
  - ruler
  - admin

# Console and S3 HTTPRoutes
rustfs_ingress_host: "rustfs.example.com"
rustfs_s3_ingress_host: "s3.example.com"

Credentials

Credentials are loaded from a secrets vault at deploy time and written into a Kubernetes Secret that the RustFS pod references via existingSecret:

- name: Create RustFS credentials Secret
  kubernetes.core.k8s:
    state: present
    apply: true
    definition:
      apiVersion: v1
      kind: Secret
      metadata:
        name: "{{ rustfs_secret_name }}"
        namespace: "{{ rustfs_namespace }}"
      type: Opaque
      stringData:
        RUSTFS_ACCESS_KEY: "{{ rustfs_access_key }}"
        RUSTFS_SECRET_KEY: "{{ rustfs_secret_key }}"

Pointing Loki at RustFS

In the Loki role, the storage config just points to the in-cluster RustFS service. The bucket names stay identical; the RustFS role creates them post-deploy via an mc Job:

loki:
  storage:
    type: s3
    bucketNames:
      chunks: chunks
      ruler: ruler
      admin: admin
    s3:
      endpoint: http://rustfs-svc.rustfs.svc.cluster.local:9000
      region: us-east-1
      accessKeyId: "{{ rustfs_access_key }}"
      secretAccessKey: "{{ rustfs_secret_key }}"
      s3ForcePathStyle: true
      insecure: true

minio:
  enabled: false
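
For a sense of what that post-deploy job amounts to, here is a minimal sketch of the mc calls it could run. This is not the role's actual Job: the alias name rustfs, the PRINT_ONLY switch, and the create_buckets function are assumptions for illustration. Only the endpoint and bucket names come from the config above.

```shell
#!/bin/sh
# Sketch only: create the Loki buckets on RustFS with the MinIO client (mc).
# PRINT_ONLY=1 (the default here) prints the bucket commands instead of running them.
PRINT_ONLY="${PRINT_ONLY:-1}"
RUSTFS_ENDPOINT="${RUSTFS_ENDPOINT:-http://rustfs-svc.rustfs.svc.cluster.local:9000}"

run() {
  if [ "$PRINT_ONLY" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

create_buckets() {
  if [ "$PRINT_ONLY" != "1" ]; then
    # Register the endpoint under the (assumed) alias "rustfs".
    # Credentials are expected in the environment, e.g. injected from the Secret.
    mc alias set rustfs "$RUSTFS_ENDPOINT" \
      "$RUSTFS_ACCESS_KEY" "$RUSTFS_SECRET_KEY" >/dev/null || return 1
  fi
  for bucket in "$@"; do
    # --ignore-existing makes re-runs harmless if the bucket is already there.
    run mc mb --ignore-existing "rustfs/$bucket" || return 1
  done
}

# Usage: PRINT_ONLY=0 create_buckets chunks ruler admin
```

Keeping the alias registration out of the printed output means a dry run never echoes the credentials.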

The Migration Gotcha: WAL Cleanup

One thing to know if you're doing this migration: Loki's WAL (write-ahead log) on the PVC holds references to object keys in the old MinIO store. If you redeploy Loki pointing at the new RustFS instance, the objects those references point to don't exist there, and Loki logs errors while replaying the WAL.

The fix is to delete the WAL PVC on the first run after migrating. The Loki role has a loki_fresh_start flag for exactly this:

# Set to true ONLY for the initial MinIO→RustFS migration.
# The Loki WAL references MinIO objects that no longer exist;
# replaying it against RustFS causes errors. Delete it once with
# loki_fresh_start: true, then set back to false to preserve log data.
loki_fresh_start: false

Run the playbook once with loki_fresh_start: true, let it come up clean against RustFS, then set it back to false and redeploy. You lose any buffered logs that hadn't been flushed to object storage yet, but in a homelab context that's an acceptable trade to get a clean state.
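
If you'd rather do the one-time cleanup by hand, it could look roughly like the sketch below. The role's actual tasks aren't shown in this post, so treat everything here as an assumption: the loki_fresh_start shell function, the PRINT_ONLY switch, and the example names (namespace loki, StatefulSet loki, PVC storage-loki-0) are placeholders to check against your own release.

```shell
#!/bin/sh
# Sketch only: a manual "fresh start" for Loki's local volume.
# PRINT_ONLY=1 (the default here) prints the commands instead of running them.
PRINT_ONLY="${PRINT_ONLY:-1}"

run() {
  if [ "$PRINT_ONLY" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

loki_fresh_start() {
  namespace="$1"    # assumed example: loki
  statefulset="$2"  # assumed example: loki
  pvc="$3"          # assumed example: storage-loki-0

  # Stop Loki first so the volume is no longer mounted.
  run kubectl -n "$namespace" scale statefulset "$statefulset" --replicas=0 || return 1
  # Best effort: wait for the pod to disappear before touching the PVC.
  run kubectl -n "$namespace" wait --for=delete "pod/${statefulset}-0" --timeout=120s
  # Drop the volume holding the stale WAL; the next deploy recreates it empty.
  run kubectl -n "$namespace" delete pvc "$pvc"
}

# Usage: PRINT_ONLY=0 loki_fresh_start loki loki storage-loki-0
```

Redeploying Loki afterwards brings the StatefulSet back up with a new, empty volume.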

OmniTools: Replacing Stirling PDF

Stirling PDF was running fine. It does a lot: 50+ PDF operations, OCR via Tesseract, LibreOffice-powered conversions, pipelines, a full REST API. The problem is that it does a lot. Java plus LibreOffice plus Tesseract means a 2–3 GB memory ceiling, an 80 Gi PVC for scratch space and config, and a slow startup on ARM hardware. The free tier is genuinely powerful, but for the actual use case here (the occasional PDF merge, compress, or convert) it was significantly more than needed.

omni-tools covers the other direction: a lightweight, privacy-focused browser-based utility suite. Images, video, audio, PDF basics, text, JSON/CSV/XML, math, dates — all processed client-side in the browser. No data ever leaves the machine. The Docker image is 28 MB. It runs as a static nginx container with no database, no secrets, no persistent volume, no Java startup lag. MIT licensed.

The deployment is about as simple as it gets:

---
# Deploy Omni-Tools
# Usage: ansible-playbook playbooks/24-deploy_omni-tools.yml
#
# What this deploys:
#   - Omni-Tools (iib0011/omni-tools:latest)
#   - Pure static frontend — no database, no secrets required
#   - Gateway API HTTPRoute → https://omnitools.example.com

- name: Deploy Omni-Tools
  hosts: localhost
  connection: local
  gather_facts: false

  vars_files:
    - ../group_vars/talos_cluster.yml

  roles:
    - role: omni-tools

No vault role. No secrets. The role creates a namespace, a single-replica Deployment, a LoadBalancer Service pinned to a kube-vip IP from the pool, and a Gateway API HTTPRoute. The role defaults are just as short:

---
omnitools_namespace: omni-tools
omnitools_image: iib0011/omni-tools
omnitools_image_tag: latest
omnitools_replicas: 1
omnitools_port: 80
omnitools_ingress_host: omnitools.example.com
omnitools_loadbalancer_ip: "192.168.100.x"

omnitools_resources_requests_cpu: "50m"
omnitools_resources_requests_memory: "64Mi"
omnitools_resources_limits_cpu: "200m"
omnitools_resources_limits_memory: "128Mi"

From ansible-playbook to a working URL is under 30 seconds. Stirling PDF took minutes to start while the JVM and LibreOffice warmed up. For a tool you reach for occasionally, the difference in weight matters more than the difference in feature count.
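
A quick way to confirm the deploy landed is to wait for the rollout and then probe the URL. This is a minimal sketch, not part of the role: the smoke_test function and PRINT_ONLY switch are made up for illustration, and the Deployment name omni-tools is an assumption (the post only gives the namespace and hostname).

```shell
#!/bin/sh
# Sketch only: post-deploy smoke test for a static web app.
# PRINT_ONLY=1 (the default here) prints the commands instead of running them.
PRINT_ONLY="${PRINT_ONLY:-1}"

run() {
  if [ "$PRINT_ONLY" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

smoke_test() {
  namespace="$1"
  deployment="$2"   # assumed name; check with: kubectl -n <ns> get deploy
  url="$3"

  # Block until the Deployment reports all replicas available.
  run kubectl -n "$namespace" rollout status "deployment/$deployment" --timeout=60s || return 1
  # -f turns HTTP errors into a non-zero exit; the body itself is discarded.
  run curl -fsS -o /dev/null "$url"
}

# Usage: PRINT_ONLY=0 smoke_test omni-tools omni-tools https://omnitools.example.com
```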

The Stirling PDF role is still in the repository in case it's needed again, but it's no longer deployed.

Upgrading Talos Linux and Kubernetes

This is the part of the post where I get to say: upgrading Talos Linux and Kubernetes is genuinely one of the best things about running this stack. A few talosctl commands and the cluster rolls forward — no drama, no special procedures, no in-place OS package management to worry about.

The upgrade here was Talos Linux v1.12.7 → v1.13.4 and Kubernetes 1.35.4 → 1.36.1.

Talos Linux Upgrade: Workers First

The approach is to upgrade worker nodes before control plane nodes. Workers don't hold etcd state, so they can be cycled without any risk to cluster leadership or quorum. Upgrade one at a time, let it come back healthy, then move to the next.

# Upgrade workers first, one at a time
talosctl upgrade -n talostrk5 \
  --image ghcr.io/siderolabs/installer:v1.13.4

talosctl upgrade -n talostrk6 \
  --image ghcr.io/siderolabs/installer:v1.13.4

Each node drains, reboots into the new Talos version, and rejoins. Watch it come back:

kubectl get nodes -w
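
The same one-at-a-time pattern can be wrapped in a small loop that refuses to move on until the node reports Ready again. A minimal sketch, assuming the Talos node names match the Kubernetes node names (as they do in this cluster); the upgrade_nodes function, the PRINT_ONLY switch, and the 10-minute timeout are illustrative choices, not something the post prescribes.

```shell
#!/bin/sh
# Sketch only: upgrade Talos nodes strictly one at a time.
# PRINT_ONLY=1 (the default here) prints the commands instead of running them.
PRINT_ONLY="${PRINT_ONLY:-1}"

run() {
  if [ "$PRINT_ONLY" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

upgrade_nodes() {
  version="$1"
  shift
  for node in "$@"; do
    # Stop at the first failure so a bad node never takes the next one down with it.
    run talosctl upgrade -n "$node" \
      --image "ghcr.io/siderolabs/installer:$version" || return 1
    # Extra gate: only continue once Kubernetes sees the node as Ready.
    run kubectl wait --for=condition=Ready "node/$node" --timeout=10m || return 1
  done
}

# Usage: PRINT_ONLY=0 upgrade_nodes v1.13.4 talostrk5 talostrk6
```

Passing the control plane nodes to the same function later keeps the procedure identical for both groups.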

Once all workers are on v1.13.4, move to the control plane nodes — again one at a time:

# Control plane nodes, one at a time
talosctl upgrade -n talostrk1 \
  --image ghcr.io/siderolabs/installer:v1.13.4

talosctl upgrade -n talostrk2 \
  --image ghcr.io/siderolabs/installer:v1.13.4

talosctl upgrade -n talostrk3 \
  --image ghcr.io/siderolabs/installer:v1.13.4

While a control plane node is rebooting, kubectl requests go to the other two. etcd maintains quorum throughout as long as you're only cycling one node at a time. The VIP floats automatically via kube-vip, so there's no address disruption during the switchover.
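
The one-at-a-time rule follows from etcd's majority quorum: a cluster of n members needs floor(n / 2) + 1 of them up. The arithmetic is small enough to sketch; the function names below are made up for illustration.

```shell
#!/bin/sh
# etcd needs a majority of members to agree: quorum = floor(n / 2) + 1.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can be offline while the cluster keeps quorum.
etcd_fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Succeeds if taking one more member down still leaves a quorum.
# $1 = total members, $2 = members already offline.
safe_to_cycle() {
  [ $(( $1 - $2 - 1 )) -ge "$(etcd_quorum "$1")" ]
}

# etcd_quorum 3           -> 2
# etcd_fault_tolerance 3  -> 1
```

With three control plane nodes, safe_to_cycle 3 0 succeeds and safe_to_cycle 3 1 fails, which is exactly why a second node should wait until the first is back.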

Kubernetes Upgrade

Once all nodes are on the new Talos version, upgrading Kubernetes is a single command pointed at any control plane node:

talosctl upgrade-k8s -n talostrk1 --to 1.36.1

upgrade-k8s handles the whole sequence: kube-apiserver, kube-controller-manager, kube-scheduler, kube-proxy, CoreDNS, and the kubelet. It updates each control plane component in turn, then updates the kubelet on every node. The cluster stays available throughout. Total time for a cluster this size is around 10–15 minutes.
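
If you want to see what would change before committing, talosctl upgrade-k8s accepts a --dry-run flag. A minimal sketch of running the preview first and the real upgrade second; the upgrade_k8s wrapper and the PRINT_ONLY switch are illustrative, not part of talosctl.

```shell
#!/bin/sh
# Sketch only: preview a Kubernetes upgrade, then apply it.
# PRINT_ONLY=1 (the default here) prints the commands instead of running them.
PRINT_ONLY="${PRINT_ONLY:-1}"

run() {
  if [ "$PRINT_ONLY" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

upgrade_k8s() {
  node="$1"      # any control plane node
  version="$2"   # target Kubernetes version

  # First pass: talosctl's own --dry-run reports the plan without applying it.
  run talosctl upgrade-k8s -n "$node" --to "$version" --dry-run || return 1
  # Second pass: the real upgrade.
  run talosctl upgrade-k8s -n "$node" --to "$version"
}

# Usage: PRINT_ONLY=0 upgrade_k8s talostrk1 1.36.1
```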

Verify when it's done:

kubectl get nodes
# NAME         STATUS   ROLES           AGE    VERSION
# talostrk1    Ready    control-plane   104d   v1.36.1
# talostrk2    Ready    control-plane   104d   v1.36.1
# talostrk3    Ready    control-plane   104d   v1.36.1
# talostrk5    Ready    worker          104d   v1.36.1
# talostrk6    Ready    worker          104d   v1.36.1
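
Eyeballing five nodes is easy, but the same check can be scripted. A minimal sketch, assuming input lines of the form "name version"; the check_versions function is made up for illustration and reads from stdin so it can be fed by kubectl.

```shell
#!/bin/sh
# Sketch only: fail if any node reports a kubelet version other than the expected one.
# Reads "name version" pairs on stdin, one node per line.
check_versions() {
  awk -v want="$1" '
    $2 != want { printf "%s is on %s, expected %s\n", $1, $2, want; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage:
# kubectl get nodes \
#   -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
#   | check_versions v1.36.1
```

The function prints one line per lagging node and exits non-zero, so it drops straight into a script or CI step.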

That's it. The whole OS and Kubernetes version bump, done cleanly without any node going unresponsive for more than a minute or two. It's a compelling reason to stay on Talos — this kind of upgrade would be significantly more painful on a traditional Linux distribution where you're managing package repos, kernel versions, and service restarts manually.

Wrap Up

Three separate changes, each straightforward on its own:

  • RustFS is a solid drop-in for MinIO: Apache 2.0, leaner footprint, and the Loki migration is essentially just an endpoint swap (with a one-time WAL wipe).
  • omni-tools fills the "occasional file utility" role with a fraction of the resource overhead. The client-side processing model means there's nothing sensitive transiting through the cluster at all.
  • Talos + Kubernetes upgrades continue to be the easiest cluster maintenance task in the stack. Workers first, control plane after, upgrade-k8s to finish — the cluster is on new versions before anything has time to break.

All three are deployed and running. On to the next thing.
