
Artux: A Design for Ambient Cognitive Infrastructure

v2.0 · March 2026 · AI, cognitive architecture, ambient, infrastructure, whitepaper, memory, perception

Artux is a self-hosted ambient AI system built for personal environments. It is not a chatbot with plugins. It is not a local copy of a cloud AI product. It is a cognitive architecture — a small set of components with clear contracts, each doing one thing well, composable enough to handle the full complexity of an ambient personal environment without any single component becoming a monolith.

This document describes the design principles, architecture, and key decisions behind Artux. It is written after implementation rather than before, which means it describes what was actually built and why, not what was planned and hoped for.

Origin

Artux began as a conceptual document called ANIMA — Autonomous Neurocognitive Intelligence Modelling Architecture — written out of frustration with a specific and mundane problem: wake words. Every voice assistant in common use requires you to summon it before it listens. This reveals something about the system's self-conception: it assumes it needs permission to perceive. The wake word is not a privacy feature. It is a philosophical concession — the machine admitting that it does not know when it is relevant.

The central proposition of ANIMA was a restatement of Descartes: "Percipio, ita cogito agere" — I perceive, therefore I think to act. Not waiting to be addressed. Continuously aware, and deciding for itself when action is warranted.

ANIMA proposed two implementation spirits: Sagax, the stable backbone, and Mirai, the adaptive imagination. Without Sagax, ANIMA would drift; without Mirai, it would ossify.

Artux is what happened when that conceptual framework met the constraints of actual implementation. Several things transferred almost intact: the Sagax name and role, the Orchestrator as a routing kernel with no cognitive logic, the notion of internal thought as a first-class output rather than a side effect, and the continuous perception loop as the primary driver of action.

Mirai did not survive as a named agent. The adaptive spirit it represented was distributed across the architecture instead: Logos handles evolution and consolidation, the tool ecosystem handles extensibility, the provider model handles backend flexibility, the instruction system handles knowledge evolution.

The Problem with Ambient AI

Most AI assistants are session-local. They know what you told them in the current conversation. They may have tool access. They may have a document context. But when you come back tomorrow, they start again. The system has no memory of what you said, what it did, what worked, what didn't, or who you are beyond your account identifier.

This is fine for a productivity tool. It is insufficient for an ambient system that lives in your home, understands your routines, knows your preferences, and should get better at serving you over time — not by fine-tuning a model, but by accumulating structured knowledge about you specifically.

A second problem: cloud dependency. An ambient system in your home sees everything. It hears conversations. It sees what your camera sees. It knows when you are home and when you are not. That data should not leave your possession. The system should be self-hosted by design, not as an afterthought opt-in.

A third problem: rigidity. Fixed tool sets with hard-coded integrations mean every new capability requires a software update. A system that can reason about what tools exist, discover new ones at runtime, and learn from execution experience is qualitatively different from one where capabilities are enumerated in a config file.

Artux is built as a response to these three problems simultaneously.

Architecture

Artux is two repositories. Muninn is a passive SQLite-backed memory store — it stores events, consolidates knowledge, tracks entities, and answers recall queries. It has no opinions about what should be stored or why. Huginn is the cognitive layer — five components that together turn a stream of perceptual events into purposeful, improving action. Muninn can run alone as a library. Huginn always requires Muninn. Neither knows what the other's internals look like; they communicate through Muninn's public API.

The Memory Model

Three Tiers

Artux memory has three tiers with distinct roles that do not overlap.

Short-term memory (STM) is an append-only event log. Every perception — speech, video frame, sensor reading, tool result, agent output — becomes a typed event with a source, a timestamp, a confidence, and a structured payload. Events are never modified once written. They are never deleted by compression. They persist until a verified consolidation pass, at which point Logos advances the flush watermark and removes them from the live event store. The raw event is always ground truth.

Hot state (HTM) is the operational present. It has three surfaces. Tasks are durable work records — goals that span multiple steps, may be interrupted, must be resumable. ActiveSessionCache is ephemeral session context — the working set of entities, recent recalls, tool invocations, and a complete workbook of the current session's token stream and tool payloads. States is a flat key-value store for live operational parameters: which LLM provider is active, what speed the TTS daemon should run at, whether the microphone is enabled. States persist to LTM at session end. They are operational ground truth — the difference between a configuration file that requires a restart and a parameter you can change mid-sentence.

Long-term memory (LTM) is durable knowledge. It is written only by Logos, the background consolidation daemon. Every piece of LTM — episodic observations, semantic assertions, entity ledgers, tool descriptors, skill artifacts, instruction manuals, system configuration — is a first-class recall-able artifact. The same query interface retrieves a kettle control tool, a remembered fact about a person, and the operational manual for how to park an interrupted task. There is no distinction at the retrieval layer.

Why Append-Only STM

The decision to never delete raw events during compression reflects a design invariant: compression is a lossy summary for reasoning convenience, not a faithful record. When Sagax updates consN — its rolling narrative of recent events — it is making an editorial choice about what to keep in its active context. That choice should not destroy the record those decisions were made from.

Logos reads raw events for consolidation, not consN. Building LTM from a summary of a summary would compound lossiness with each pass. Logos reads the ground truth; Sagax reasons from the editorial summary; both are correct for their purpose.
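The append-only log and its flush watermark can be sketched in a few lines. This is a minimal illustration, assuming a single SQLite table; the class name `ShortTermMemory`, the column layout, and the `flush_through` method are hypothetical and are not taken from Muninn's actual schema or API.

```python
import json
import sqlite3
import time


class ShortTermMemory:
    """Append-only event log with a flush watermark (illustrative only)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        # AUTOINCREMENT guarantees ids are never reused after a flush.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT, source TEXT, "
            "ts REAL, confidence REAL, payload TEXT)"
        )
        self.watermark = 0  # highest event id already consolidated

    def append(self, type_, source, confidence, payload):
        # Events are only ever inserted; there is no update path.
        cur = self.db.execute(
            "INSERT INTO events (type, source, ts, confidence, payload) "
            "VALUES (?, ?, ?, ?, ?)",
            (type_, source, time.time(), confidence, json.dumps(payload)),
        )
        self.db.commit()
        return cur.lastrowid

    def unconsolidated(self):
        # What a consolidation pass would read: raw events past the watermark.
        rows = self.db.execute(
            "SELECT id, type, payload FROM events WHERE id > ? ORDER BY id",
            (self.watermark,),
        )
        return [(i, t, json.loads(p)) for i, t, p in rows]

    def flush_through(self, event_id, verified):
        # Only a verified consolidation pass may advance the watermark.
        if not verified or event_id <= self.watermark:
            return 0
        cur = self.db.execute("DELETE FROM events WHERE id <= ?", (event_id,))
        self.db.commit()
        self.watermark = event_id
        return cur.rowcount
```

Note that nothing in the sketch rewrites a stored row: the only way an event leaves the live store is a verified flush.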

The Entity Model

Entities — people, objects, concepts — are historical ledgers, not records. When something new is learned about an entity, it is appended. When something is corrected, the correction is appended with its authority tier. Contradictions coexist in the ledger, each weighted by source authority and recency. The entity's identity is derived from this history, not imposed as a single current state. This models how identity actually works — we do not overwrite what we knew about someone when we learn something new. We accumulate.
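A ledger of this kind might look like the sketch below. The paper says contradictions are weighted by source authority and recency but does not give the weighting, so the rule used here (higher authority wins, recency breaks ties) is an assumption, as are the names `LedgerEntry` and `EntityLedger`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LedgerEntry:
    attribute: str
    value: str
    authority: int  # higher means a more trusted source (assumed scale)
    ts: float       # when the claim was recorded


@dataclass
class EntityLedger:
    entries: list = field(default_factory=list)

    def append(self, entry):
        # Corrections are appended too; nothing is overwritten.
        self.entries.append(entry)

    def current(self, attribute):
        claims = [e for e in self.entries if e.attribute == attribute]
        if not claims:
            return None
        # Assumed rule: authority wins, recency breaks ties.
        return max(claims, key=lambda e: (e.authority, e.ts)).value

    def history(self, attribute):
        # Contradictory claims coexist and remain readable.
        return [e.value for e in self.entries if e.attribute == attribute]
```

The "current" value is derived on read, so a later, lower-authority claim never erases an earlier, better-sourced one.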

The Cognitive Model

Five Components

Perception Manager has no language model. It runs active perception pipeline tasks from HTM, chains tool calls in sequence (audio capture → ASR → signature resolution), and writes canonical typed events to STM. It does not decide what to do with what it perceives. It does not triage. It records.

Exilis is woken by each new STM event. It reads consN plus the new-event window — the same context Sagax would read — and makes one LLM call with a structured output: ignore, act, or urgent. That is its entire job. It never actuates. It never writes. It never calls recall(). It decides whether Sagax needs to wake up.

The key design decision in Exilis is that it shares Sagax's model and Sagax's consN. Triage coherence requires shared priors — if Exilis classifies a backchannel as urgent using a different world model than Sagax uses for reasoning, the system will interrupt itself on its own speech. By sharing the same model and the same rolling narrative, Exilis effectively asks: "Given what Sagax knows right now, does Sagax need to wake up?" That is precisely the right question.
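The shape of that single triage call can be sketched as follows. The prompt wording, the JSON field name `verdict`, and the fallback to "act" on malformed output are all assumptions made for illustration; `complete_json` stands in for whatever callable the active provider exposes.

```python
import json
from typing import Callable

VALID = ("ignore", "act", "urgent")


def triage(cons_n: str, new_events: list, complete_json: Callable[[str], str]) -> str:
    """One LLM call, one structured verdict. No writes, no recall, no actuation."""
    prompt = (
        "Rolling narrative:\n" + cons_n
        + "\n\nNew events:\n" + json.dumps(new_events)
        + '\n\nReply as JSON: {"verdict": "ignore" | "act" | "urgent"}'
    )
    raw = complete_json(prompt)
    try:
        verdict = json.loads(raw).get("verdict")
    except (json.JSONDecodeError, AttributeError, TypeError):
        verdict = None
    # Assumed fallback: malformed output wakes Sagax rather than being dropped.
    return verdict if verdict in VALID else "act"
```

Because the function receives the same consN and event window Sagax would read, swapping in Sagax's own model is a matter of passing a different callable.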

Sagax is the reasoning and planning agent. It is event-driven — it sleeps between Exilis signals and between tool result arrivals. When it wakes, it reads consN plus the new-event window, checks HTM for active or due tasks, and produces a structured Narrator token stream. It discovers capabilities by calling recall(), not by consulting a registry. It tracks multi-step work in HTM Tasks. It updates consN when its working context has grown stale. It reads and writes HTM States for operational parameters.

Logos is the background consolidation daemon. It reads raw STM events, synthesises LTM narratives, promotes execution traces to skill artifacts, manages memory hygiene, and handles tool installation. Logos is the sole author of durable LTM: Sagax can request LTM writes in narrow circumstances, but authoritative consolidation is always performed by Logos.

Orchestrator is the routing bridge. It has no cognitive logic. It manages the token stream state machine, the permission gate, the two-stage nudge protocol, the speech chunker, the ActuationBus publisher, the HTM scheduler, and the session lifecycle. Everything flows through it; it decides nothing.

consN — Lossy by Design

The rolling narrative summary (consN) is one of the more counterintuitive decisions in the architecture. Most systems try to preserve information. Artux deliberately compresses it into something lossy.

The reason: Sagax has a bounded context window. The full raw event log for a long session would exceed it. Something has to summarise. The question is whether that summary should be treated as a faithful record or an editorial shorthand.

Artux treats it as editorial shorthand. consN is Sagax's private working context — it is updated by Sagax when needed, it serves Sagax's reasoning, and it is invisible to Logos. Logos always reads the raw events. This means the full-fidelity record is always available for consolidation quality, even as Sagax's working context shrinks to fit its window.

Skills as Guidance

When Sagax executes a skill, it does not run a script. It reads a guidance sequence — an ordered list of steps with natural-language descriptions, tool hints, and interaction flags — and reasons through each one. It fills in arguments. It handles interactive beats by asking the user or recalling preferences. It tracks progress in an HTM Task notebook.

This distinction matters for two reasons. First, it makes the system robust to capability changes — if a tool is replaced, the guidance step still makes sense even if the tool_hint is stale. Second, it produces rich execution traces. Logos synthesises skills from execution traces, and a trace produced by reasoned interpretation is far richer than one produced by mechanical execution.

Skill synthesis thresholds are conservative by design: three successful executions, structural similarity above 0.85, no failures in the last five runs, spread across at least two days. New skills require human confirmation for the first two autonomous runs. The system earns trust; it does not assume it.
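The four thresholds translate directly into a predicate. The sketch below makes two interpretive assumptions: "structural similarity above 0.85" is checked per successful run, and "spread across at least two days" means the successes fall on at least two distinct calendar dates. The `Run` record and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Run:
    day: date
    success: bool
    similarity: float  # structural similarity to the candidate skill, 0..1


def eligible_for_synthesis(runs):
    """Apply the four thresholds to a chronological list of runs."""
    successes = [r for r in runs if r.success]
    enough_successes = len(successes) >= 3
    similar_enough = bool(successes) and all(r.similarity > 0.85 for r in successes)
    no_recent_failures = all(r.success for r in runs[-5:])
    spread_over_days = len({r.day for r in successes}) >= 2
    return (
        enough_successes
        and similar_enough
        and no_recent_failures
        and spread_over_days
    )
```

Human confirmation for the first two autonomous runs would sit outside this check, as a separate gate on execution rather than on synthesis.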

The Provider Model

LLM Inference as a Tool

The most significant architectural decision in v2.0 is that LLM inference is no longer special. A provider tool is a .py file with a HUGINN_MANIFEST block declaring mode: provider and four callables: complete, stream, complete_json, complete_tools. It is installed through the same staging workflow as a kettle controller or a TTS daemon.

This has practical consequences that extend beyond flexibility. An operator wanting to run on a new local model or a new API does not need to modify any Huginn source code. They write a provider tool with the appropriate API calls, drop it in the staging directory, confirm it through Sagax, and it is active. The cognitive stack is agnostic about what runs inference.
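As a rough picture of what such a file could contain, here is a stub provider that calls no real backend. The paper names the `mode: provider` declaration and the four callables but not their signatures or the manifest schema, so every parameter list and every manifest key other than `mode` is an assumption.

```python
# echo_provider.py: a hypothetical provider tool that calls no real backend.
import json

# The manifest's real schema is not given in this paper; every key except
# "mode" is an assumption made for illustration.
HUGINN_MANIFEST = {
    "name": "echo_provider",
    "mode": "provider",
    "description": "Echoes prompts back; useful only for wiring tests.",
}


def complete(prompt, model, temperature=0.0):
    """Return a full completion as one string."""
    return f"[{model}] {prompt}"


def stream(prompt, model, temperature=0.0):
    """Yield the completion piece by piece."""
    for word in complete(prompt, model, temperature).split():
        yield word


def complete_json(prompt, model, temperature=0.0):
    """Return a completion that parses as JSON."""
    return json.dumps({"model": model, "echo": prompt})


def complete_tools(prompt, model, tools, temperature=0.0):
    """Return text plus requested tool calls; this stub requests none."""
    return {"text": complete(prompt, model, temperature), "tool_calls": []}
```

A real provider would replace the bodies with calls to a local runtime or a remote API while keeping the same four entry points.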

The Router

LLMClient is a thin router. On every call, it reads three values from HTM.states: which provider tool is active for this agent role, the model name to pass to the provider, and the generation temperature. If a registered provider exists under that tool ID, it dispatches through the adapter. If no provider is registered (first boot, or before any provider tool is installed), a built-in provider handles the call, supporting Ollama, Anthropic, OpenAI, LM Studio, and llama.cpp as bootstrap backends.
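The routing logic is small enough to sketch in full. In this illustration a plain dict stands in for HTM.states, and the state key names (`<role>.provider`, `<role>.model`, `<role>.temperature`) and default values are hypothetical; `EchoProvider` exists only to make the example runnable.

```python
class EchoProvider:
    """Stand-in backend that reports how it was called."""

    def __init__(self, label):
        self.label = label

    def complete(self, prompt, model, temperature):
        return f"{self.label}:{model}:{temperature}:{prompt}"


class LLMClient:
    """Thin router: reads state on every call, then dispatches."""

    def __init__(self, role, states, providers, builtin):
        self.role = role            # e.g. "sagax"
        self.states = states        # stand-in for HTM.states (a plain dict here)
        self.providers = providers  # tool id -> object exposing complete(...)
        self.builtin = builtin      # bootstrap fallback

    def complete(self, prompt):
        # State key names are hypothetical; the paper does not list them.
        tool_id = self.states.get(f"{self.role}.provider")
        model = self.states.get(f"{self.role}.model", "default")
        temperature = self.states.get(f"{self.role}.temperature", 0.7)
        # Fall back to the built-in provider when nothing is registered.
        provider = self.providers.get(tool_id, self.builtin)
        return provider.complete(prompt, model=model, temperature=temperature)
```

Because nothing is cached between calls, a state write made mid-session changes the backend for the very next request.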

Per-Role Configuration

Exilis, Sagax, and Logos can each run different providers and models. At boot, the Orchestrator recalls config entries from Muninn LTM and populates each role's provider, model, and temperature from previously persisted configuration. Changes Sagax makes at runtime are dirty-tracked in states and written back to LTM by Logos at session end, so they survive across reboots.

The model configuration for a running Artux instance is not in a config file anywhere. It is in Muninn LTM, and it evolves as the operator or Sagax modifies it.
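Dirty tracking of this kind needs very little machinery. The sketch below assumes a flat store that is seeded from persisted values at boot and hands only the changed keys to a writer at session end; the class name `States` mirrors the paper's term, but the methods and the `write_to_ltm` callback are hypothetical.

```python
class States:
    """Flat key-value store that remembers which keys changed this session."""

    def __init__(self, persisted):
        self._values = dict(persisted)  # loaded from LTM at boot
        self._dirty = set()

    def get(self, key, default=None):
        return self._values.get(key, default)

    def set(self, key, value):
        # Writing an unchanged value does not mark the key dirty.
        if key not in self._values or self._values[key] != value:
            self._values[key] = value
            self._dirty.add(key)

    def flush(self, write_to_ltm):
        """Called at session end; hands only changed keys to the writer."""
        for key in sorted(self._dirty):
            write_to_ltm(key, self._values[key])
        written = len(self._dirty)
        self._dirty.clear()
        return written
```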

The Tool Ecosystem

Capabilities as Memory

Tool descriptors live in Muninn LTM. Sagax finds them by calling recall(), the same way it finds facts about people, remembered preferences, and skill guidance sequences. The search is semantic: a query like "heat water for a hot drink" surfaces the kettle tool because the tool's capability summary is embedded in the same semantic space as the query, so the two land close together.

There is no tool registry that Huginn maintains separately. The tool ontology is self-organising: Logos writes descriptors at install time and updates them when tools are deprecated, so the next recall automatically reflects the current state of capabilities.

The Staging Lifecycle

A new tool's lifecycle:

1. Operator drops a .py file with a HUGINN_MANIFEST block into tools/staging/
2. Logos' next pass scans staging, parses the manifest, and creates a waiting HTM task
3. Sagax sees the staging task, finds a natural pause in conversation, and presents the tool to the user: what it does, what permissions it requests
4. User confirms or declines
5. Logos' next pass reads the affirmation, installs dependencies, loads the module, writes the LTM descriptor, and registers the handler

The key property: Sagax is the user-facing conversation layer and Logos is the installation engine. They communicate through HTM. Neither needs to know how the other works.
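The division of labour in the lifecycle above can be expressed as a small state machine in which each transition belongs to exactly one actor. The stage names, the task dict, and the `advance` helper are hypothetical; only the ordering of steps and the Sagax/Logos split come from the text.

```python
from enum import Enum


class Stage(Enum):
    STAGED = "staged"        # file dropped in tools/staging/
    WAITING = "waiting"      # Logos parsed the manifest, HTM task created
    CONFIRMED = "confirmed"  # user said yes via Sagax
    DECLINED = "declined"    # user said no via Sagax
    INSTALLED = "installed"  # Logos installed and wrote the LTM descriptor


# Which actor may perform each transition.
TRANSITIONS = {
    (Stage.STAGED, Stage.WAITING): "logos",
    (Stage.WAITING, Stage.CONFIRMED): "sagax",
    (Stage.WAITING, Stage.DECLINED): "sagax",
    (Stage.CONFIRMED, Stage.INSTALLED): "logos",
}


def advance(task, to, actor):
    """Move a staging task forward, rejecting out-of-role transitions."""
    allowed = TRANSITIONS.get((task["stage"], to))
    if allowed != actor:
        raise PermissionError(
            f"{actor} cannot move {task['stage'].value} -> {to.value}"
        )
    task["stage"] = to
    return task
```

In this picture the task record is the only shared surface: Sagax never installs, and Logos never talks to the user.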

Tool Modes

Callable is the default. A single handle(**args) function called synchronously, returning a result. Appropriate for tools that do one thing and return.

Service tools are daemon threads. They expose start(config), stop(), and handle(event). They subscribe to the ActuationBus and process output events as they arrive. They read their configuration from HTM.states on every event cycle, so parameter changes take effect immediately without a restart.

Provider tools are LLM inference backends. They expose four callables matching the LLMClient router interface. They are discovered and installed like any other tool; they just happen to provide the inference substrate.

The Communication Model

The ActuationBus

Output events flow through an in-process publish/subscribe bus. The Orchestrator publishes; live tool daemon threads subscribe with filter dicts. The bus is non-blocking — full subscriber queues drop events silently rather than blocking the Orchestrator's token stream processing.

Events carry one of three completeness levels: partial (an individual token, for avatar lip sync), chunk (a phrase-boundary flush, for TTS synthesis), and full (a complete speech block, written to STM for Logos consolidation).
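A non-blocking bus with filter dicts can be sketched with bounded queues from the standard library. The event field names (`kind`, `level`) and the `dropped` counter are assumptions; the counter is added here purely so the drop behaviour is observable in the example.

```python
import queue


class ActuationBus:
    """In-process pub/sub where a slow subscriber never stalls the publisher."""

    def __init__(self):
        self._subs = []   # list of (filter dict, bounded queue)
        self.dropped = 0  # illustrative counter, not described in the paper

    def subscribe(self, filters, maxsize=64):
        q = queue.Queue(maxsize=maxsize)
        self._subs.append((filters, q))
        return q

    def publish(self, event):
        for filters, q in self._subs:
            # A subscriber receives an event only if every filter key matches.
            if all(event.get(k) == v for k, v in filters.items()):
                try:
                    q.put_nowait(event)
                except queue.Full:
                    self.dropped += 1  # drop instead of blocking the publisher
```

For example, a TTS daemon might subscribe with `{"kind": "speech", "level": "chunk"}` and read from the returned queue in its own thread, while an avatar subscribes to `partial` events separately.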

The Narrator Grammar

Sagax's output is a structured token stream. Each XML block type has a defined routing contract:

<contemplation> — written to STM as an output event; Logos reads it as part of the episodic record
<speech> — streamed live to TTS; close writes full event to STM
<speech_step> — suspends the planning cycle mid-execution for interactive skills
<tool_call> — dispatched through the permission gate; creates or updates an HTM task record
<aug_call> — dispatched in parallel with per-tool timeout budgets for read-only tool queries
<task_update> — written directly to HTM; supports task lifecycle and HTM States operations
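The routing contract above amounts to a lookup from block type to destination. The sketch below parses a completed string with a regular expression, which is a simplification: the Orchestrator is described as a token stream state machine that acts on blocks as they arrive. The destination labels in `ROUTES` are hypothetical shorthand.

```python
import re

# Block type -> destination label (labels are illustrative shorthand).
ROUTES = {
    "contemplation": "stm",
    "speech": "tts+stm",
    "speech_step": "suspend",
    "tool_call": "permission_gate",
    "aug_call": "parallel_readonly",
    "task_update": "htm",
}

# Match <tag>body</tag> for known tags only; \1 ties the closing tag to the opener.
BLOCK = re.compile(r"<(" + "|".join(ROUTES) + r")>(.*?)</\1>", re.DOTALL)


def route_blocks(stream_text):
    """Return (block type, destination, body) for each block, in order."""
    return [
        (m.group(1), ROUTES[m.group(1)], m.group(2).strip())
        for m in BLOCK.finditer(stream_text)
    ]
```

Text outside any recognised block is simply ignored by this sketch; how the real grammar treats it is not specified here.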

Speech Step Suspension

<speech_step> changes the conversational model. Most agent architectures treat user input as something that arrives at the start of a turn and triggers a new planning cycle. That model makes it difficult to write skills with interactive beats.

<speech_step> suspends the planning cycle mid-execution. Sagax emits the question, the Orchestrator sets a pending flag, and the next chat() call routes to the response handler instead of waking Sagax. The response is bound to the declared variable name, and Sagax's generation resumes with an injected <speech_step_result> block. The skill continues exactly where it stopped.
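The pending-flag mechanism can be illustrated with a toy gate. `SpeechStepGate` is a hypothetical stand-in for the relevant slice of the Orchestrator, and the `var="..."` attribute on the result block is an assumed syntax; the paper says only that the response is bound to the declared variable name.

```python
class SpeechStepGate:
    """Routes the next user input to a suspended skill instead of a new cycle."""

    def __init__(self, wake_sagax, resume_sagax):
        self.wake_sagax = wake_sagax      # called for ordinary user input
        self.resume_sagax = resume_sagax  # called with an injected result block
        self.pending_var = None           # set while a speech_step awaits an answer

    def on_speech_step(self, question, var):
        # Sagax asked something mid-skill: remember which variable to bind.
        self.pending_var = var
        return question  # in a real system this would be streamed to TTS

    def chat(self, user_text):
        if self.pending_var is None:
            return self.wake_sagax(user_text)
        # Clear the flag first so the following input starts a normal cycle.
        var, self.pending_var = self.pending_var, None
        block = f'<speech_step_result var="{var}">{user_text}</speech_step_result>'
        return self.resume_sagax(block)
```

Only the single input following a speech step is captured; everything after it flows through the ordinary wake path again.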

The Instruction System

Separating Contract from Manual

The operating contract is in the system prompt: grammar, micro-examples, a topic directory, hard rules. The reference material is in Muninn LTM, retrievable inline during generation.

Eight instruction artifacts are written to Muninn LTM by Logos on first boot, each covering one operational domain: task management, skill execution, memory operations, state management, live tools, staging, entity handling, and speech step mechanics.

Each artifact is action-oriented and self-contained — written so it can be read in isolation and acted upon immediately. get_instructions(topic) is registered as a read-only tool, making it fully eligible for inline retrieval during generation without interrupting the reasoning arc.

Identity and Privacy

Biometric Identity

Artux does not use login screens. Identity is established through signature matching — voiceprints, faceprints, or device identifiers emitted by perception tools. The Perception Manager resolves these against the entity registry before writing events to STM.

An implied entity accumulates evidence across events — voice patterns, name claims, contextual associations, visual confirmation. When evidence from two or more independent sources converges, Sagax creates a permanent entity. Until an entity is confirmed, it has guest access only: no personal calendar entries, no actuation of personal devices, no private information retrieval.
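One way to read "two or more independent sources" is as distinct modalities agreeing on the same identity. The sketch below takes that reading as an assumption; the action names in `GUEST_DENIED` and both function names are hypothetical.

```python
from collections import defaultdict

# Actions a guest may not perform (names are illustrative).
GUEST_DENIED = {"calendar.read", "device.actuate", "private.recall"}


def confirmed_identity(evidence, min_sources=2):
    """evidence: iterable of (modality, claimed_identity) pairs."""
    modalities = defaultdict(set)
    for modality, identity in evidence:
        modalities[identity].add(modality)
    # Repeated evidence from one modality counts once.
    winners = [i for i, m in modalities.items() if len(m) >= min_sources]
    # Ambiguity (zero or several candidates) leaves the entity unconfirmed.
    return winners[0] if len(winners) == 1 else None


def allowed(action, identity):
    """Guests (identity is None) are refused anything in GUEST_DENIED."""
    return identity is not None or action not in GUEST_DENIED
```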

Self-Hosted by Design

No component of Artux requires external network access during normal operation. Muninn uses SQLite. The default inference backend is Ollama running locally. Perception tools run local models. The only outbound connections are optional: provider tools that call external APIs if the operator chooses to install them.

The privacy boundary is the machine. What Artux sees, hears, and knows stays on the machine unless the operator explicitly chooses to send it elsewhere through a tool they install and confirm.

From ANIMA to Artux

ANIMA concept	Artux implementation
Core motto: Percipio, ita cogito agere	Exilis — continuous perception, decides when to wake Sagax
Sagax: stable backbone	Sagax — reasoning and planning agent, event-driven
Mirai: adaptive imagination	Distributed: Logos (evolution), tool ecosystem (extensibility), provider model (backend), instruction artifacts (knowledge)
Orchestrator: cognitive kernel	Orchestrator — routing bridge, no cognitive logic
ToolFactory: dynamic capability generator	Staging workflow — manifest → confirm → install → LTM
Diary of thoughts	<contemplation> blocks → STM → Logos reads for LTM
Continuity of state	STM (event log) + consN (rolling summary) + LTM
Transparency imperative	Structural: contemplation → STM, append-only events, full workbook
Identity	Biometric signatures → entity grants → permission gate
Inference backend	Provider tools — swappable via two state_set writes

The most significant departure from ANIMA is the memory architecture. ANIMA describes memory as "short-term, long-term, and event RAM" without specifying how these tiers interact. Artux's answer — Muninn as a passive store with Logos as the sole LTM author, STM as an append-only log flushed only after verified consolidation, and HTM States as live operational ground truth — took the bulk of the implementation effort.

The second significant departure is that Artux treats inference as a service, not a substrate. ANIMA assumes a fixed LLM backbone. Artux makes the LLM itself a tool — installed, configured, and swappable through the same mechanisms as a kettle controller.

What Artux Is Not

Not a chatbot with plugins. A chatbot responds to turns. Artux runs continuously, perceives its environment, maintains structured knowledge, and acts proactively.

Not a local copy of a cloud product. Cloud AI products are optimised for breadth and session-local interaction. Artux is optimised for depth of knowledge about one environment and one set of people.

Not a framework. A framework gives you abstractions and lets you fill in the logic. Artux has opinions — about memory architecture, cognitive component separation, tool lifecycle, identity management. Those opinions are the product.

Not finished. Async STM notification, multi-agent write conflict resolution, avatar integration, and the full skill synthesis evaluation pipeline are pending. The architecture is stable; the implementation continues.

Design Principles

Memory is metabolic. It breathes — growing through consolidation, shrinking through forgetting, updating through reinforcement. The forgetting is not a failure mode; it is how the system stays usable as it accumulates years of experience.

Raw events are ground truth. Compression, summarisation, and narrative construction are editorial processes that aid reasoning. They are not the record, and no compression step removes the record.

All cognitive decisions are LLM calls. No hardcoded classifiers, no rule engines, no keyword matching. The models are interchangeable components; the architecture is not.

Capabilities are discovered, not enumerated. Sagax finds out what it can do the same way it finds out what it knows — by asking Muninn. New tools become available through a structured lifecycle that includes user confirmation.

Providers are tools. LLM inference is not special infrastructure. It is a tool with a known interface, installed through the same lifecycle as every other capability.

The system prompt is a contract, not a manual. The operating contract is 84 lines. Everything else lives in Muninn LTM and arrives on demand inline during generation.

Identity is authority. Permissions are not session grants from a login. They are properties of an identified entity, established through biometric confirmation and maintained as part of that entity's ledger.

Artux earns trust by evidence. Nothing becomes a capability until it has proven itself. Nothing becomes a known entity until cross-modal evidence converges. Nothing is stored in LTM until Logos has read it, evaluated it, and judged it worth keeping.

Huginn thinks. Muninn remembers. Together they give Odin sight.

Repositories: artux-muninn (memory module) · artux-huginn (cognitive module)

The ANIMA concept paper is available as separate research.