Memory tiers · reboot myth · Pi / VPS / M4 matrix · 24-month TCO · six-step runbook
Hermes Agent gets smarter through three memory tiers that compound on disk: USER.md at roughly 1,375 characters for who you are, MEMORY.md capped near 2,200 characters per Skill entry, and SQLite FTS5 for full-text retrieval—Skills write only after 5+ tool calls in a finished task. This guide explains why that architecture demands 24/7 uptime, busts the reboot myth, benchmarks Pi vs VPS vs Mac Mini M4, and closes with a 24-month TCO frame plus a six-step runbook. Restarting does not erase persisted memory; a sleeping Gateway still breaks channels and the Skill polish loop.
Many people treat Nous Research Hermes Agent as a chat shell with tools. The persistence layer runs deeper. Tier one is in-session context—tool state and the current reasoning trace live in RAM and vanish on restart. Tier two is Skill Documents: markdown playbooks auto-generated after complex tasks, deduplicated and scanned before landing in the data directory. Community docs put each Skill near a 2,200-character ceiling—enough for a checklist, not a novella. Tier three is the persistent user model: USER.md with about a 1,375-character budget for tone, preferences, and long-range goals that deepen over weeks.
For retrieval, Hermes builds a local SQLite FTS5 index over Skills and memory entries. Before injecting context, the agent queries that index—cheaper on tokens than dumping the whole library into every prompt, and a reason disk IO and index health matter as much as raw compute. Skill synthesis has a hard gate: a task must hit at least five tool calls before extraction runs. Casual chat will not pollute the library, but long tool chains need a host that stays awake through the runway—not a laptop that suspends mid-task.
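The retrieval path and the extraction gate described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not Hermes's actual code: the table name `skills`, its columns, and every function name are hypothetical, and the real schema may differ. The two constants come from the figures quoted in this article.

```python
import sqlite3

MIN_TOOL_CALLS = 5      # extraction gate quoted in the text
SKILL_CHAR_CAP = 2200   # approximate per-Skill ceiling quoted in the text


def should_extract_skill(tool_calls: int, task_finished: bool) -> bool:
    """Only finished tasks with enough tool calls qualify for a Skill."""
    return task_finished and tool_calls >= MIN_TOOL_CALLS


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical FTS5 index; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS skills USING fts5(title, body)"
    )
    return conn


def add_skill(conn: sqlite3.Connection, title: str, body: str) -> None:
    """Truncate the body to the character cap, then index it."""
    conn.execute(
        "INSERT INTO skills (title, body) VALUES (?, ?)",
        (title, body[:SKILL_CHAR_CAP]),
    )
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 3) -> list:
    """Return the best-matching titles, ordered by FTS5's built-in rank."""
    rows = conn.execute(
        "SELECT title FROM skills WHERE skills MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

Only the few rows returned by `search` would be injected into a prompt, which is where the token saving over dumping the whole library comes from.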
| Memory component | Typical size / mechanism | Survives reboot? | Why 24/7 matters |
|---|---|---|---|
| Session context | Current turn and tool state | No—must reconnect | Gateway must stay live; IM webhook timeouts break chains |
| USER.md | ~1,375-character user profile | Yes—on disk | Host migration needs data-dir copy; sleep slows profile iteration |
| MEMORY.md / Skills | ~2,200 characters per entry | Yes—on disk | FTS5 index grows with writes; backups are non-optional |
| SQLite FTS5 | Local full-text search index | Yes—database file | Disk jitter or VPS IO caps add retrieval latency |
So reboot ≠ memory wipe—but only for layers already flushed to disk. Channel UX, cron schedules, and in-flight 5+ tool chains still break. For a 30-day subjective narrative and first-week pitfalls, see the companion piece on 30 days running Hermes; this article stays on architecture and resource math.
Assuming restart clears everything: Skills and USER.md live in the data directory—you lose session rhythm, not all assets. Without backups, a host swap still feels like amnesia.
Ignoring the 5+ tool-call gate: Short chats never become Skills; a host that sleeps mid-task never finishes the extraction loop.
Treating FTS5 as a black box: Corrupt indexes or full disks yield “I wrote it but cannot find it”—monitor data-dir size and SQLite health.
Letting USER.md bloat: The 1,375-character budget is finite; unpruned profiles dilute preference weights.
Splitting Gateway from the model host: A dead Gateway with a live cloud backend still drops IM callbacks—24/7 means the whole chain stays up.
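Several of the pitfalls above come down to routine checks on the data directory. The sketch below shows one way to run them, assuming you know where your data directory, database file, and USER.md live. All paths and function names are hypothetical, and the 1,375-character budget is the approximate figure quoted in this article. `PRAGMA integrity_check` is standard SQLite and returns the single row `ok` when it finds no corruption.

```python
import os
import shutil
import sqlite3

USER_MD_BUDGET = 1375  # approximate character budget quoted in the text


def sqlite_ok(db_path: str) -> bool:
    """Run SQLite's integrity check; False on corruption or a non-database file."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("PRAGMA integrity_check").fetchone()
        return row is not None and row[0] == "ok"
    except sqlite3.DatabaseError:
        return False
    finally:
        conn.close()


def dir_size_bytes(root: str) -> int:
    """Total size of all regular files under root."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def free_disk_fraction(root: str) -> float:
    """Share of the volume holding root that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(root)
    return usage.free / usage.total


def user_md_over_budget(path: str) -> int:
    """Characters by which USER.md exceeds the budget (0 if within it)."""
    with open(path, encoding="utf-8") as fh:
        return max(0, len(fh.read()) - USER_MD_BUDGET)
```

Running these from cron and alerting on a failed integrity check or a low free-disk fraction turns "I wrote it but cannot find it" into a visible alarm.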
Hermes is built as an always-on agent: Telegram, Discord, Slack, and 20+ other channels reach the Gateway via webhooks; cron wakes subtasks on schedule; mechanisms like Honcho slowly refresh the user model in the background. When any link drops, the failure mode is not “a bit slower”—it is missed callbacks, queue backlog, and delayed Skill writes. Subjectively you get a new assistant every week even while Skill files keep growing on disk.
24/7 is not ops theater: it matches the time axis of the three memory tiers. The session layer wants millisecond responses; the Skill layer needs long tasks to finish five or more tool calls; the user-model layer compounds across weeks. A closed laptop, an intermittently offline NAS, or a VPS throttled by neighbor IO each cuts a different slice, and the compounding curve flattens. A dedicated host turns process survival, stable networking, and predictable disks into an SLA instead of a hope that someone remembers to plug the machine back in.
Memory compounds on disk, but the feeling of getting smarter comes from a Gateway that never misses a shift—that is the gap between 24/7 and “I run it when I remember.”
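One way to put a number on "never misses a shift" is to log a timestamp whenever a probe reaches the Gateway and then look for gaps. The sketch below assumes such a heartbeat log exists. Hermes is not known to emit one, so in practice you would produce it yourself, for example from a cron job. The five-minute threshold and both function names are illustrative.

```python
from datetime import datetime, timedelta


def find_gaps(heartbeats, max_gap=timedelta(minutes=5)):
    """Return (start, end) pairs where consecutive heartbeats are too far apart."""
    ordered = sorted(heartbeats)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


def uptime_ratio(heartbeats, max_gap=timedelta(minutes=5)):
    """Fraction of the observed window that is not inside a gap."""
    ordered = sorted(heartbeats)
    if len(ordered) < 2:
        return 0.0
    window = ordered[-1] - ordered[0]
    if window == timedelta():
        return 0.0
    down = sum((b - a for a, b in find_gaps(ordered, max_gap)), timedelta())
    return 1 - down / window
```

A laptop that sleeps overnight shows up as one long gap per day, which is exactly the pattern that delays Skill writes.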
The same installer (curl -fsSL https://get.hermes-agent.org | bash) behaves differently by host: memory bandwidth, disk IO, and the macOS-native path dominate. The table below gives qualitative ranges for one workload (Gateway + Telegram + local Ollama Hermes-3 8B with intermittent inference). Exact numbers shift with quantization and channel count; use them for review-meeting decisions, not lab certification.
| Host option | Idle RAM | Peak RAM | CPU / power | Hermes fit |
|---|---|---|---|---|
| Raspberry Pi 5 · 8GB | ≈1.5GB system headroom | Gateway alone ≈4GB; local 8B model not viable | Low-power ARM; SD-card IO bottleneck | API-only gateway; weak Skill compounding |
| Linux VPS 4C8G | ≈5GB usable | API mode ≈6GB; Docker backend +2GB | Shared vCPU jitter; capped disk IOPS | Remote SSH works; no macOS—some Skills feel awkward |
| Mac Mini M4 16GB | ≈9GB usable | Local 8B + channels ≈14–15GB at ceiling | Idle ≈12W; inference burst 25–35W | Native macOS; single channel + local model at the limit |
| Mac Mini M4 32GB | ≈22GB usable | 8B + dual channels + cron ≈18–20GB | Same silicon, less memory pressure | Production pick—room for Skill + FTS5 growth |
Unified memory (UMA) on M4 cuts CPU↔GPU copies during local inference; macOS keeps the official installer and Ollama path short. Pi saves watts but cannot hold an 8B model; VPS rent is low until cross-region RTT and IO throttling tax every tool loop—once Skills and FTS5 indexes reach gigabytes, you care more about stable disk latency than saving a few dollars on month one.
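The fit column in the matrix is essentially headroom arithmetic: total RAM minus the peak estimate. A small sketch, using the two Mac Mini rows from the table as inputs, makes that explicit. The 4GB "comfortable" threshold is an assumption chosen for illustration, not a Hermes requirement.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    ram_gb: float
    peak_low_gb: float
    peak_high_gb: float

    def headroom_gb(self):
        """(worst case, best case) free RAM at peak load."""
        return (self.ram_gb - self.peak_high_gb, self.ram_gb - self.peak_low_gb)

    def verdict(self, min_headroom_gb: float = 4.0) -> str:
        """Classify the host by its worst-case headroom."""
        worst, _best = self.headroom_gb()
        if worst < 0:
            return "does not fit"
        return "comfortable" if worst >= min_headroom_gb else "tight"


# Peak figures taken from the table above.
m4_16 = Host("Mac Mini M4 16GB", 16, 14, 15)
m4_32 = Host("Mac Mini M4 32GB", 32, 18, 20)
```

With these inputs the 16GB machine leaves 1 to 2GB free at peak and the 32GB machine 12 to 14GB, which is the room the article reserves for Skill and FTS5 growth.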
The decision is not “Apple or not”—it is total cost to run memory compounding for 24 months, including hardware, power, ops hours, upgrade anxiety, and data migration. Rental converts CapEx to OpEx; for teams already treating Skills and channels as production load, that often beats buy-plus-self-support on decision cost alone.
| TCO dimension (24 months) | Buy M4 16GB | Rent M4 32GB |
|---|---|---|
| Hardware cash flow | Upfront device + tax; you model depreciation | Fixed monthly fee × 24; upgrade RAM without replacing the box |
| Power (24/7) | ≈12–35W × 24h × 730 days (you pay) | Included in service fee; provider absorbs datacenter PUE |
| Ops hours | Warranty, OS upgrades, fan and outage on you | Hardware swap on failure; remote KVM ready |
| Hermes data assets | USER.md / Skills / FTS5 bound to one machine; migrate on swap | Backup → restore on new lease; wipe on return |
| Upgrade risk | M-series cadence tempts a second purchase | Pick a new spec at contract end—no resale math |
| Opportunity cost | Hardware research steals Skill polish time | Focus on agent workflows and channel expansion |
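
The power row can be worked through directly: 24 hours × 730 days is 17,520 hours, so a 12–35W draw comes to roughly 210–613 kWh over the term. The sketch below wraps that arithmetic into a buy-versus-rent comparison. Every price input (device price, electricity rate, ops hourly rate, monthly fee) is a placeholder you must supply yourself; none is a quote from any provider.

```python
HOURS_24_MONTHS = 24 * 730  # 24/7 over the 730 days used in the table


def energy_kwh(avg_watts: float) -> float:
    """Energy consumed over 24 months at a constant average draw."""
    return avg_watts * HOURS_24_MONTHS / 1000


def buy_total(device_price: float, avg_watts: float, price_per_kwh: float,
              ops_hours: float, hourly_rate: float) -> float:
    """Upfront hardware plus electricity plus the value of your own ops time."""
    energy_cost = energy_kwh(avg_watts) * price_per_kwh
    return device_price + energy_cost + ops_hours * hourly_rate


def rent_total(monthly_fee: float, months: int = 24) -> float:
    """Fixed monthly fee over the term; power and hardware swaps are included."""
    return monthly_fee * months
```

Resale value and migration effort are left out on purpose; add them as extra terms if they matter to your decision.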
1. Pick RAM: API-only with one channel can live on 16GB; local Hermes-3 plus multiple channels and cron wants 32GB so FTS5 rebuilds do not OOM.
2. Order and access: Record lease ID and remote path; confirm MDM and team profile delivery for org use.
3. Acceptance: Verify Apple Silicon, ≥256GB disk, macOS version on the official Hermes path; disable automatic sleep.
4. Install Hermes: Run the official one-liner, then hermes init; confirm data-directory location and backup policy.
5. 24/7 smoke test: Bind an IM channel, dispatch a long task with 5+ tool calls; after 24h confirm Skill write and FTS5 retrieval.
6. Backup and off-board plan: Export the data directory on schedule; before lease end migrate USER.md / Skills and wipe disk per policy.
    curl -fsSL https://get.hermes-agent.org | bash
    hermes init
    hermes model
Tip: Pin Hermes version in production and log changes; after hermes model switches backends, watch the 24h memory curve before opening a second IM channel.
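The backup step in the runbook can be sketched in two parts: a consistent snapshot of the SQLite file, then a timestamped archive of the whole data directory. The paths, the archive name, and both function names are hypothetical, and the location of the Hermes data directory on your host must be confirmed first. `Connection.backup` is Python's wrapper around SQLite's online backup API, which copies a database safely even while another process has it open.

```python
import os
import sqlite3
import tarfile
import time


def snapshot_sqlite(src_db: str, dest_db: str) -> None:
    """Copy a possibly live SQLite database via the online backup API."""
    src = sqlite3.connect(src_db)
    dest = sqlite3.connect(dest_db)
    try:
        src.backup(dest)
    finally:
        dest.close()
        src.close()


def archive_data_dir(data_dir: str, out_dir: str) -> str:
    """Write a timestamped .tar.gz of data_dir into out_dir; return its path."""
    os.makedirs(out_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    out_path = os.path.join(out_dir, f"hermes-data-{stamp}.tar.gz")
    with tarfile.open(out_path, "w:gz") as tar:
        # Store paths relative to the directory name, not the absolute path.
        tar.add(data_dir, arcname=os.path.basename(os.path.normpath(data_dir)))
    return out_path
```

Keep `out_dir` outside the data directory so an archive never includes earlier archives, and copy the result off the machine before a lease ends.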
Hermes Agent’s moat is three-tier memory compounding on disk—but realizing that curve needs a 24/7 Gateway, healthy FTS5 indexes, and enough unified memory to finish 5+ tool chains. Pi and VPS can pass the installer yet trim the Skill curve on local inference or IO stability; Mac Mini M4 rental turns hardware into a predictable service so you spend cycles on USER.md polish and channels—not fans and resale timing.
If you are ready to run Hermes on dedicated Apple Silicon, the next step is matching plan to delivery: VpsMesh Mac Mini M4 monthly rental offers 16/32GB unified memory, remote access, and wipe-on-return. See Mac Mini M4 rental pricing, deployment and FAQ at the help center, and configure online at the order page.
Warning: Do not migrate hosts, rebuild FTS5, and wipe the Skill directory in the same weekend—three simultaneous changes make root-cause impossible. Move the machine, prove 24h Gateway stability, then touch model routing or bulk memory imports.
No: restarting Hermes does not erase its memory. Skill Documents, USER.md, MEMORY.md, and the SQLite FTS5 index files on disk survive a reboot; only in-session context is lost. What matters is a stable 24/7 host with regular backups, because a sleeping laptop still drops channels and long tool chains.
Gateway idle is roughly 200–400MB; local Ollama with Hermes-3 8B often peaks at 8–12GB. With channels, cron, and local inference in parallel, 16GB gets tight, so 32GB of unified memory is safer. Compare tiers on the pricing page.
If Skill compounding and channel uptime matter more than owning silicon, 24-month rental turns depreciation and upgrade risk into fixed OpEx—often lower than buy-plus-ops for individuals and small teams. Order and delivery: order page; setup questions: help center.