What We've Discovered
LocoLab claims five principles. Below are the artifacts — built, measured, or in progress — that those claims rest on. Each entry has a status marker borrowed from research (where papers carry similar markers):
claim— position taken, not yet builtarchitecture— designed and specced, implementation pendingbuilt— implementation shipped, not yet measured at scalemeasured— empirical results, not yet publishedpublished— paper or report public
This page evolves. Items move down the status ladder as work matures. Failed claims stay on the page with their evidence — that’s the point.
Engineer before hardware
Section titled “Engineer before hardware”Qwen3-30B-A3B at reading speed on a GTX 1060 6GB
Section titled “Qwen3-30B-A3B at reading speed on a GTX 1060 6GB”- Status:
architecture - Claim: five
llama.cppflags (--n-cpu-moe,--no-mmap,--mlock,--cache-type-k q4_0,--cache-type-v q3_0) plus 24-64GB of system RAM let a $80 used GTX 1060 run Qwen3-30B-A3B at ~17 tokens/sec - Artifacts: LocoBench MoE-on-a-budget design spec; implementation plan; Python harness library (33+ passing tests covering config loading, server boot, llama-bench wrapper, hardware fingerprinting); 17 cell configs across 4 VRAM tiers ready to run
- What would invalidate: if measured
tg128< 10 tok/s on the 1060 with the optimised preset, or if the article’s claim doesn’t replicate
Specialize and harness
Section titled “Specialize and harness”LocoAgente conversational harness — Phase 1
Section titled “LocoAgente conversational harness — Phase 1”- Status:
built - Claim: a routed harness around small models (four-subsystem architecture: Orchestration / Context / Tools / Inference) makes a Qwen3-4B genuinely useful as a thinking partner for tasks where frontier models converge to the average
- Artifacts: LocoAgente design spec; Phase 1 library (27 commits, 63 tests at 98% coverage on
harness.core); E primitivegenerate_variantsenforcingn >= 2; four orchestration patterns (SinglePass,DebatePattern,SynthesisPattern,IterativeRefinement); three Context profile bundles (business, academic, writing) - What would invalidate: if a blind-test reader cannot reliably distinguish the three frames in a Perspective Debate output (frame collapse on small models)
LocoLLM routing thesis
Section titled “LocoLLM routing thesis”- Status:
architecture - Claim: a routed swarm of task-specific Qwen3-4B adapters outperforms a single undifferentiated Qwen3-4B on focused tasks (math, code, summarisation, etc.)
- Artifacts: LocoLLM repository; QLoRA training pipeline; routing layer; partial benchmark suite
- What would invalidate: if the router’s task-classification accuracy drops below the gain from specialisation, the system is net-negative versus the unrouted baseline
Vary, don’t average
Section titled “Vary, don’t average”FrameStrategy deliberate variance engineering
Section titled “FrameStrategy deliberate variance engineering”- Status:
built - Claim: N variants engineered to differ (identity frames, discipline frames, constraint inversions) produce more useful divergent output than N samples hoping to differ
- Artifacts: four
FrameStrategyimplementations inharness/frames.py(IdentityFrames,DisciplineFrames,TemperatureLadder,ConstraintInversion); test suite verifying each strategy produces structurally distinct prompts; documented variance-collapse warning onTemperatureLadderfor narrow temperature spreads - What would invalidate: if the four strategies all collapse to the same output distribution on small models — i.e., engineered framing doesn’t actually channel variance through the model — the principle is just an aesthetic claim
Conversation, not delegation
Section titled “Conversation, not delegation”Variant + Uncertainty verification contract
Section titled “Variant + Uncertainty verification contract”- Status:
built - Claim: every harness output carries a rationale and surfaced uncertainty (load-bearing
flags+verification_hooks; auxiliaryconfidence); singular outputs are forbidden at the primitive level; the human’s verification loop is cheap because the harness does the prep work - Artifacts:
VariantandUncertaintydataclasses inharness/core.py; XML tag parser enforcing required<text>and<rationale>;CalibrationLogrecording user picks/rejects/edits as JSONL for downstream analysis - What would invalidate: if users routinely ignore the
verification_hooksfield — i.e., the harness produces hooks but they’re not the right hooks — the principle is performative not functional
Keep Asking — Study 1: Does the Nudge Work?
Section titled “Keep Asking — Study 1: Does the Nudge Work?”- Status:
architecture - Claim: a conversational nudge shifts students from passive delegation to active conversation and improves task outcomes (using frontier models, to isolate the nudge effect from model quality)
- Artifacts: research design under active development; see research for the full thread (Cognitive Strategy Transfer, DSR AI Education Simulation, Keep Asking Studies 1 and 2)
- What would invalidate: if nudged students don’t differ from un-nudged controls on either conversation patterns or outcomes
Bridge experience, locally
Section titled “Bridge experience, locally”LocoPuente PoC
Section titled “LocoPuente PoC”- Status:
running(=measuredin operational terms) - Claim: a single mid-range machine (Ryzen 5 2600 + RTX 3090 24GB) running an integrated stack (primary LLM, cited search, image generation, voice, research tooling) delivers frontier-equivalent UX for everyday user tasks at zero cloud cost
- Artifacts: LocoPuente service running on the Puente machine; browser-accessible LAN deployment; usable today by lab members and visitors
- What would invalidate: if users with full LocoPuente access still prefer the cloud equivalent for non-privacy-sensitive work, the bridge isn’t bridging