Practical AI Orchestration Without Cloud Dependency

Most AI project tutorials end the same way: deploy to AWS, add an OpenAI API key, pay monthly. That works, but it sidesteps the more interesting question — what does it look like to build something real without handing control to a cloud provider?

This article walks through how I built a fully autonomous multi-agent AI platform on a homelab machine. It generates realistic data, runs scheduled AI agents, exposes a live public dashboard, and costs nothing to operate beyond electricity.

The project is a simulated animal shelter network — five shelters, adoptable animals with AI-written bios, live adoption metrics, and a daily narrative summary generated by a local LLM. It’s a portfolio project, but the architecture patterns apply to anything that needs autonomous agents writing to a persistent data layer.

The constraint that shapes everything

No cloud inference. No managed databases. No hosted backends.

The only external services are GitHub Pages (free static hosting) and Cloudflare (free tunnel and DNS). Everything else runs on a single Ubuntu server with an NVIDIA RTX 3060 — the same machine already running OpenClaw, Ollama, and a Slack-connected security pipeline.

This constraint isn’t just about cost. It forces cleaner architecture decisions: agents can’t make expensive API calls, so prompts need to be focused. Latency matters, so you think carefully about which work actually needs an LLM and which is just scripted logic.

Architecture overview

GitHub Pages (dashboard)
        |
        | HTTPS
        v
Cloudflare Tunnel
        |
        v
FastAPI (shelter-api) ←── Agents (shelter-agents / scheduler)
        |                        |
        v                        v
PostgreSQL (shelter-db)     Ollama (shared ai-stack network)

A few things worth noting about this layout:

No ports are exposed on the home network. The Cloudflare Tunnel connects outbound from the homelab to Cloudflare’s edge — no port forwarding, no public IP exposure. Requests to shelter-api.cybergrind.org arrive at Cloudflare, travel through the tunnel, and hit the FastAPI container on the Docker internal network.

Ollama runs on a shared Docker network. The shelter agents live in an isolated docker-compose.yml separate from the main AI stack. Rather than duplicating the Ollama container, both stacks connect to a named external network (ai-shared). Agents reach Ollama at http://ollama:11434 via Docker’s internal DNS.

The dashboard is fully static. A single index.html on GitHub Pages makes fetch calls to the public API. No build step, no Node.js, no deployment pipeline beyond git push.

The agent design

Three agents, each with a distinct responsibility:

Generator agent

Runs once to seed the database. Creates five shelter profiles and 40 animals — realistic breeds, ages, health statuses, and intake dates. For each animal it calls Ollama (Mistral 7B) to write a warm adoption bio:

def generate_description(name, species, breed, age, sex, health_status):
    prompt = (
        f"Write a warm, 2-3 sentence adoption profile for a shelter {species} named {name}. "
        f"They are a {age:.1f} year old {sex} {breed} in {health_status} condition. "
        f"Make it friendly and encouraging to potential adopters. No bullet points, just prose."
    )

The key design decision here: descriptions are generated at seed time and stored in the database. The dashboard displays them as static text. You don’t want an LLM call on every page load — that would be slow, expensive in inference time, and unnecessary. Generate once, store, serve.
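
The snippet above stops once the prompt is built. Here is a minimal sketch of the remaining step, assuming Ollama's /api/generate endpoint with streaming turned off and a model tag of mistral (the article names Mistral 7B but not the exact tag). The helper names build_generate_request and generate_text are hypothetical and not taken from the repository:

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama:11434"  # Docker-internal hostname from the architecture section
MODEL = "mistral"                   # assumed model tag; the article only says "Mistral 7B"


def build_generate_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_text(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to Ollama and return the generated text without surrounding whitespace."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # With stream=False, Ollama replies with one JSON object whose "response" field holds the text.
    return body["response"].strip()
```

Keeping request construction separate from the network call makes the payload easy to inspect without a running Ollama instance. The generous timeout is a guess; local inference on a consumer GPU can take several seconds per bio.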

Updater agent

Runs every six hours via a scheduler container. Simulates realistic shelter activity:

Processes adoptions: roughly 10% of available animals get adopted per run
Adds new intakes: 1-3 new animals arrive, each getting an Ollama-generated bio
Logs all activity to an events table

This is where the platform feels alive. Between runs, animals change status, capacity utilization shifts, and the event feed fills up. The dashboard auto-refreshes every 60 seconds, so visitors see a system in motion rather than static data.
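
One way the per-run decisions could be sketched, assuming each available animal is adopted independently with a 10% probability. The real agent may pick adoptions differently, and plan_update is a hypothetical name:

```python
import random

ADOPTION_PROBABILITY = 0.10      # "roughly 10%" from the description above
MIN_INTAKES, MAX_INTAKES = 1, 3  # "1-3 new animals arrive"


def plan_update(available_ids: list[int], rng: random.Random) -> tuple[list[int], int]:
    """Decide which animals get adopted and how many new animals arrive in one run."""
    # Independent draws mean the adopted share averages 10% over many runs
    # but varies from one run to the next.
    adopted = [aid for aid in available_ids if rng.random() < ADOPTION_PROBABILITY]
    new_intakes = rng.randint(MIN_INTAKES, MAX_INTAKES)  # inclusive on both ends
    return adopted, new_intakes
```

Passing the random.Random instance in, rather than using the module-level functions, lets a test seed it for reproducible runs while production code uses an unseeded generator.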

Narrator agent

Runs once daily at 08:00 UTC. Pulls current metrics — total animals, adoption rate, species breakdown, long-stay cases, best-performing shelter — and sends a structured prompt to Mistral:

context = f"""
You are a shelter network analyst writing a brief daily update for a public dashboard.
Keep it to 2-3 sentences. Be warm, factual, and encouraging. No bullet points.

Current data:
- Total animals: {metrics.get('total_animals', 0)}
- Available for adoption: {metrics.get('available', 0)} ({dogs_available} dogs, {cats_available} cats)
- Adopted total: {metrics.get('adopted', 0)}
- Network adoption rate: {metrics.get('adoption_rate', 0)}%
- Animals in shelter 30+ days: {long_stay_count}
- Most urgent case: {f"{urgent_name} has been waiting {urgent_days} days" if urgent_name else "none"}
- Best performing shelter: {top_shelter}
"""

The result is POSTed to /metrics/summary and cached as JSON. The dashboard displays it as a highlighted insight card, labeled “Generated by CyberClaw”, above the charts.

This is a good example of where an LLM actually earns its place. The metrics are already computed. The LLM isn’t doing analysis — it’s doing communication, turning a structured data object into a sentence a human would actually want to read.
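
The long-stay inputs to that prompt are plain computation, which is the point. Here is a rough sketch of how long_stay_count, urgent_name, and urgent_days might be derived. It assumes animals arrive as dicts with name, status, and intake_date keys, and that the "most urgent case" is the longest-waiting available animal past the 30-day mark. The function name is hypothetical:

```python
from datetime import date

LONG_STAY_DAYS = 30  # threshold from the "30+ days" line in the prompt


def long_stay_inputs(animals, today):
    """Return (long_stay_count, urgent_name, urgent_days) for available animals."""
    long_stays = []
    for animal in animals:
        if animal["status"] != "available":
            continue  # adopted or pending animals are not waiting for a home
        days = (today - animal["intake_date"]).days
        if days >= LONG_STAY_DAYS:
            long_stays.append((animal["name"], days))
    if not long_stays:
        return 0, None, 0  # the prompt then renders the urgent case as "none"
    urgent_name, urgent_days = max(long_stays, key=lambda item: item[1])
    return len(long_stays), urgent_name, urgent_days
```

Nothing here touches the model. By the time Mistral sees the numbers they are already final, so a wrong figure in the narrative points to the query rather than to the LLM.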

Scheduling without system cron

Rather than adding cron jobs to the host system, the scheduler runs as its own Docker container:

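from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger
from apscheduler.triggers.interval import IntervalTrigger

scheduler = BlockingScheduler()
scheduler.add_job(job_update, IntervalTrigger(hours=6), id='update')
scheduler.add_job(job_narrate, CronTrigger(hour=8, minute=0, timezone='UTC'), id='narrate')  # pinned to UTC
scheduler.start()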

The container runs python scheduler.py as its command and has restart: unless-stopped in the compose file. If the machine reboots, Docker brings it back. Logs are visible via docker logs shelter-scheduler. No system-level configuration needed.

The data model

Three tables: shelters, animals, events. Metrics are derived at query time rather than stored — adoption rate, capacity utilization, and average stay duration are all SQL aggregations against the live data.

Shelter { id, name, city, state, capacity, current_count }
Animal  { id, shelter_id, name, species, breed, age_years, sex,
          health_status, status, intake_date, adopted_date, description }
Event   { id, animal_id, shelter_id, event_type, timestamp, notes }
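
To make "derived at query time" concrete, here is a small sketch of two of those aggregations. It uses an in-memory SQLite database as a stand-in for PostgreSQL, a trimmed-down version of the schema, and made-up shelter names and rows. The actual queries in the repository may look different:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; the project itself uses PostgreSQL
conn.executescript("""
    CREATE TABLE shelters (id INTEGER PRIMARY KEY, name TEXT, capacity INTEGER);
    CREATE TABLE animals (id INTEGER PRIMARY KEY, shelter_id INTEGER, status TEXT);
    INSERT INTO shelters VALUES (1, 'Northside', 10), (2, 'Riverbend', 5);
    INSERT INTO animals (shelter_id, status) VALUES
        (1, 'available'), (1, 'available'), (1, 'adopted'), (1, 'pending'),
        (2, 'available'), (2, 'adopted'), (2, 'adopted'), (2, 'adopted');
""")

# Network-wide adoption rate: adopted animals as a percentage of all animals.
ADOPTION_RATE_SQL = """
    SELECT ROUND(100.0 * SUM(CASE WHEN status = 'adopted' THEN 1 ELSE 0 END) / COUNT(*), 1)
    FROM animals
"""

# Capacity utilization per shelter: animals still in care divided by capacity.
UTILIZATION_SQL = """
    SELECT s.name, ROUND(100.0 * COUNT(a.id) / s.capacity, 1) AS utilization_pct
    FROM shelters s
    LEFT JOIN animals a ON a.shelter_id = s.id AND a.status != 'adopted'
    GROUP BY s.id, s.name, s.capacity
    ORDER BY s.id
"""

adoption_rate = conn.execute(ADOPTION_RATE_SQL).fetchone()[0]
utilization = conn.execute(UTILIZATION_SQL).fetchall()
print(adoption_rate, utilization)
```

Because nothing is precomputed, the numbers cannot drift out of sync with the animals table. The cost is a handful of aggregate queries per dashboard load, which is negligible at this data size.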

The status field on Animal (available, pending, adopted) is the core state machine. When the updater agent marks an animal as adopted, the API handler automatically decrements current_count on the parent shelter and sets adopted_date. No agent logic needed — the API enforces consistency.
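
A stripped-down illustration of that transition, using in-memory dataclasses instead of the real ORM models and FastAPI handler. The mark_adopted function and the idempotency guard are my assumptions about how such a handler could behave, not a copy of the project's code:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Shelter:
    id: int
    capacity: int
    current_count: int


@dataclass
class Animal:
    id: int
    shelter_id: int
    status: str = "available"
    adopted_date: Optional[date] = None


def mark_adopted(animal: Animal, shelter: Shelter, today: date) -> None:
    """Apply the adoption transition and its side effects in one place."""
    if animal.shelter_id != shelter.id:
        raise ValueError("animal does not belong to this shelter")
    if animal.status == "adopted":
        return  # idempotent: a repeated request must not decrement the count twice
    animal.status = "adopted"
    animal.adopted_date = today
    shelter.current_count = max(0, shelter.current_count - 1)
```

With the side effects living in one function, an agent only has to say "this animal is adopted" and cannot forget half of the bookkeeping.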

What the dashboard shows

The public dashboard at shelter.cybergrind.org pulls from eight API endpoints on every load:

AI insight card — Mistral’s daily narrative, timestamped
Alert bar — fires when a shelter hits 80%+ capacity or any animal has been waiting 30+ days
Three charts — intake vs adoptions over 30 days (bar), species breakdown (donut), adoption rate by shelter (horizontal bar)
Newest arrivals — the three most recent intakes with full bios
Shelter list — clickable, filters the animal grid below
Animal grid — filterable by status, shows days in shelter with color-coded urgency
Activity timeline — recent events feed plus per-shelter intake/adoption breakdown
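
The alert bar rule is simple enough to state in a few lines. The dashboard itself is vanilla JavaScript, so the sketch below only illustrates the logic in Python to match the agent code. The function name, the input shape, and the message wording are assumptions:

```python
CAPACITY_ALERT_RATIO = 0.80  # "80%+ capacity"
LONG_WAIT_DAYS = 30          # "waiting 30+ days"


def build_alerts(shelters, longest_wait_days):
    """Return alert messages; an empty list means the alert bar stays hidden."""
    alerts = []
    for shelter in shelters:
        # Guard against a zero capacity so a bad row cannot crash the check.
        ratio = shelter["current_count"] / shelter["capacity"] if shelter["capacity"] else 0.0
        if ratio >= CAPACITY_ALERT_RATIO:
            alerts.append(f"{shelter['name']} is at {ratio:.0%} capacity")
    if longest_wait_days >= LONG_WAIT_DAYS:
        alerts.append(f"An animal has been waiting {longest_wait_days} days")
    return alerts
```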

The charts use Chart.js loaded from CDN. Everything else is vanilla JavaScript. The entire dashboard is one HTML file with no build dependencies.

Lessons from building this

Define your data model before writing any agent code. The schema took 15 minutes to design and saved hours of refactoring. Agents that write to a well-defined schema are easy to reason about. Agents that make ad-hoc decisions about data structure create debt immediately.

Not every task needs an LLM. Status updates, capacity calculations, and event logging are all scripted logic. Ollama only runs when the task genuinely requires language generation — writing adoption bios and the daily narrative. This keeps inference load low and latency predictable.

Separate the agent stack from the infrastructure stack. Running the shelter agents in their own docker-compose.yml, isolated from the main AI stack, meant I could rebuild and restart agents without touching Ollama, OpenClaw, or the Slack bot. The shared ai-shared Docker network gives them access to Ollama without coupling the stacks.

The Cloudflare Tunnel changes what’s possible on a homelab. No port forwarding, no dynamic DNS headaches, automatic SSL, IP hidden behind Cloudflare’s edge. For homelab projects that need a public URL, it’s the right default choice.

Running it yourself

The full source is at github.com/thestrad031487/shelter-platform. Prerequisites:

Docker + Docker Compose
Ollama running and accessible
A shared Docker network: docker network create ai-shared
A Cloudflare account with a domain (for the tunnel)

git clone https://github.com/thestrad031487/shelter-platform
cd shelter-platform
cp .env.example .env  # fill in your values
docker compose up -d
docker compose run --rm shelter-agents python main.py generate

The scheduler starts automatically and takes it from there.

What’s next

A few things on the roadmap:

Wire in real threat intel data — the same Ollama pipeline powering the narrator could pull from MISP or OpenCTI to generate shelter-style summaries of threat intelligence reports
Add a @CyberClaw run shelter update Slack command — the pattern is already established in the security pipeline, it’s just a matter of adding the trigger
Debate mode — two narrator agents with different analytical perspectives producing competing summaries, surfaced side by side on the dashboard

The core insight this project reinforced: the hard part of multi-agent AI isn’t the LLM. It’s the data model, the API design, and the scheduling. Get those right and the agents are almost trivial to add.

The live dashboard is at shelter.cybergrind.org. Source at github.com/thestrad031487/shelter-platform.

Slack integration

The shelter data is also accessible directly from Slack via CyberClaw, the same bot already handling security pipeline commands. A shelter query handler was added to the bot that detects animal-related keywords and routes them to the shelter API instead of Ollama:

@CyberClaw dogs available
@CyberClaw cats available Austin
@CyberClaw daily insight
@CyberClaw shelter summary

The bot returns formatted results directly in the channel: animal names, breeds, ages, days in shelter, and truncated bios. Open-ended queries fall through to the LLM as usual.
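
The keyword routing can be sketched in a few lines. The keyword sets and the return shape below are illustrative guesses rather than the bot's actual handler, and city filtering (the "Austin" example) is left out for brevity:

```python
import re

# Hypothetical keyword tables; the real bot's vocabulary is not shown in this article.
SPECIES_KEYWORDS = {"dog": "dog", "dogs": "dog", "cat": "cat", "cats": "cat"}
SUMMARY_PHRASES = ("daily insight", "shelter summary")


def route_query(text: str):
    """Return (target, details): 'shelter_api' for shelter queries, 'llm' otherwise."""
    words = re.findall(r"[a-z]+", text.lower())  # also strips the @ from the mention
    normalized = " ".join(words)
    if any(phrase in normalized for phrase in SUMMARY_PHRASES):
        return "shelter_api", {"kind": "summary"}
    for word in words:
        # Whole-word match, so "category" does not trigger the "cat" route.
        if word in SPECIES_KEYWORDS:
            return "shelter_api", {"kind": "animals", "species": SPECIES_KEYWORDS[word]}
    return "llm", {}  # open-ended questions fall through to the model
```

For example, route_query("@CyberClaw dogs available") selects the shelter API with species "dog", while a question with no shelter keywords goes to the LLM route.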

The key architectural decision here was using the public Cloudflare URL (https://shelter-api.cybergrind.org) rather than the internal container name. The Slack bot lives in a different Docker Compose stack than the shelter API, and each stack gets its own default network, so the API’s container name doesn’t resolve from the bot unless both are attached to a shared network. The public URL routes through Cloudflare back to the homelab: a small round trip, but it keeps the stacks cleanly isolated.

This means CyberClaw now serves two domains from a single bot: security analysis and shelter intelligence. Adding a third would follow the same pattern — detect keywords, route to the right data source, format and return.
