Research Report · Independent Analysis

Local LLM vs. Cloud LLM
for AI-Driven Options Trading

A comprehensive research-backed analysis of architecture decisions for a custom options trading bot — covering latency, cost, privacy, multi-model design, and hardware optimization for a dual-node AMD + RTX 3090 setup.

Verdict

Local LLM

Clear winner for this setup

Architecture

3-Layer Agent

Gateway → Specialist → Strategist

Strategist Model

70B Q4_K_M

Dual RTX 3090 NVLinked

Marginal Cost

~$0 / query

vs $0.675+/day cloud

Executive Summary

This report provides an independent, research-backed analysis of the architectural decisions facing the development of a custom AI-driven options trading bot. The central question — whether to rely on a locally hosted large language model (LLM) or a cloud-based API service — is evaluated across five critical dimensions: latency, cost, data privacy and security, performance and capability, and operational resilience.

Conclusion: Given the hardware already in possession and the nature of the trading workload, a local multi-model LLM architecture is the superior choice for this project. The dual-3090 NVLink configuration (48 GB combined VRAM) is sufficient to run a 70B parameter model in Q4 quantization — making the local path genuinely competitive, not a compromise.

Beyond the binary local vs. cloud decision, this report evaluates the multi-model agent pattern — deploying a hierarchy of specialized models rather than a single monolithic LLM — and finds strong empirical and theoretical support for this approach. Specific model candidates, inference engine recommendations, hardware budgeting, and safety control mechanisms are discussed in detail.

Important: LLMs are not appropriate for high-frequency trading (HFT) regardless of deployment model. The LLM's role in this architecture is as a strategic reasoning and approval layer, not the primary signal generator. Deterministic algorithms on Node 1 generate signals; the LLM approves or rejects them.

Hardware Overview

Dual-node AMD system with RTX 3090 GPU allocation

NODE 1 — SIGNAL ENGINE

AMD 64-Core Processor

192 GB System RAM

1× RTX 3090 (24 GB VRAM)

Services

Polygon WS · Greeks Engine · Signal Gen · Ollama · IBKR TWS · QuestDB · gRPC Client

NODE 2 — INTELLIGENCE HUB

AMD 64-Core Processor

192 GB System RAM

2× RTX 3090 NVLinked (48 GB VRAM)

Services

vLLM (GPU) · llama.cpp (CPU) · Qdrant · News Ingestion · gRPC Server

Node	Role	CPU	RAM	GPU
Node 1 (Signal Engine)	Data ingestion, Greeks calc, trade execution, development	AMD 64-core	192 GB	1× RTX 3090 (24 GB)
Node 2 (Intelligence Hub)	LLM reasoning, sentiment analysis, trade approval	AMD 64-core	192 GB	2× RTX 3090 NVLinked (48 GB)

PCIe Bifurcation Check Required: The motherboard on Node 2 must support PCIe 4.0 x8/x8 bifurcation to provide adequate bandwidth to both RTX 3090s simultaneously. Consumer boards often only support x16/x4, which will starve the second GPU. Verify this against your specific motherboard specifications before final hardware configuration.

Local LLM vs. Cloud LLM

Comprehensive comparison across five critical dimensions

3.1 Latency

Latency is the most operationally critical dimension for a trading system. Cloud LLM APIs introduce network round-trip latency that is fundamentally unavoidable. Benchmarks show that cloud API calls to GPT-4 class models introduce a Time to First Token (TTFT) of approximately 500 ms to 2,000 ms under typical conditions. A local LLM on the same private network with a 10 GbE direct-attach connection contributes only ~0.1–0.5 ms of network overhead.

Cloud TTFT

500ms – 2,000ms

Plus network round-trip variance

Local Network Overhead

~0.1 – 0.5ms

10 GbE SFP+ DAC between nodes

3.2 Cost Analysis

Cloud LLM pricing is token-based and scales linearly with query volume. A trading bot making 50 LLM calls per day (2,000-token prompts, 500-token responses) costs approximately $0.675/day with GPT-4.1, or ~$170/year over roughly 252 trading days. Expanded monitoring or overnight analysis can push this into the thousands of dollars annually. The local LLM has a near-zero marginal cost per query, electricity aside.

Model	Input (per 1M tokens)	Output (per 1M tokens)	Est. Daily Cost
GPT-4.1 (OpenAI)	$3.00	$12.00	~$0.675
GPT-4o Mini	$0.15	$0.60	~$0.034
Claude Opus 4.5	$5.00	$25.00	~$1.25
Gemini 1.5 Flash	$0.08	$0.30	~$0.018
Local Llama 4 70B (Q4)	$0.00	$0.00	$0.00
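The per-day figures can be reproduced from the per-million-token prices with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are assumptions, and the prices are copied from the table above. A plain input-plus-output calculation gives about $0.60/day for GPT-4.1, which agrees with the calculator's breakdown rather than the ~$0.675 estimate; the source does not say what accounts for the difference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per 1M input tokens (from the table above)
    output_per_m: float  # USD per 1M output tokens (from the table above)


def daily_cost(p: Pricing, calls_per_day: int = 50,
               input_tokens: int = 2_000, output_tokens: int = 500) -> float:
    """Daily API cost in USD for a fixed per-call token budget."""
    input_cost = calls_per_day * input_tokens / 1_000_000 * p.input_per_m
    output_cost = calls_per_day * output_tokens / 1_000_000 * p.output_per_m
    return input_cost + output_cost


PRICES = {
    "GPT-4.1": Pricing(3.00, 12.00),
    "GPT-4o Mini": Pricing(0.15, 0.60),
    "Claude Opus 4.5": Pricing(5.00, 25.00),
    "Gemini 1.5 Flash": Pricing(0.08, 0.30),
}

for name, pricing in PRICES.items():
    print(f"{name}: ${daily_cost(pricing):.4f}/day")
```

Changing `calls_per_day` or the token counts shows how quickly the cloud bill scales with workload.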

3.3 Data Privacy & Regulatory Security

When a trading bot sends market data, position information, or strategy logic to a cloud LLM API, that data traverses third-party infrastructure. This creates three categories of risk:

Proprietary Strategy Exposure

Prompts necessarily contain strategy logic, signals, and position data — even with enterprise DPAs, this leaves your control.

Regulatory Compliance

SEC and FINRA have increasingly scrutinized AI in trading. Data residency requirements are becoming more stringent as AI-specific financial regulations evolve.

Operational Security

Cloud APIs can be subject to outages, rate limiting, and service degradation — creating a critical dependency outside the operator's control.

Local deployment means all data — market feeds, position data, strategy logic, and model outputs — remains within the operator's private network at all times.

3.4 Summary Comparison

Dimension	Local LLM	Cloud LLM
Inference Latency	1–5s (no network overhead)	0.5–2s + network round-trip
Cost at Scale	Near-zero marginal cost	$170–$2,000+/year
Data Privacy	Complete — stays on-network	Sent to 3rd-party infra
Regulatory Risk	Minimal	Moderate to high
Uptime Dependency	Self-controlled	Dependent on provider SLA
Model Capability	Competitive for trading tasks	Marginally superior (frontier)
Setup Complexity	High (hardware, software)	Low (API key, HTTP calls)
Customization	Full (fine-tuning, RAG)	Limited (prompt only)

03b

Cost Calculator

Compares cloud API costs against local electricity costs for a given trading workload.

Result at the default workload: cloud cheaper at this volume (for most providers; see the detailed breakdown).

QUERY PARAMETERS

Daily LLM Queries (number of LLM calls per trading day): 50 queries/day (range: 1–500)

Input Tokens per Query (prompt size: context + signal data): 2,000 tokens (range: 100–16,000)

Output Tokens per Query (response size: decision + reasoning): 500 tokens (range: 50–4,000)

LOCAL HARDWARE

Electricity Rate (cost per kWh; US avg: $0.12): $0.12/kWh (range: $0.05–$0.50)

SUMMARY

Daily queries: 50

Tokens per query: 2,500 (in+out)

Annual local electricity: $207.31

Hardware investment: $3.0K

Cheapest cloud (annual): $5.47

Most expensive cloud (annual): $410.63

DETAILED COST BREAKDOWN

Provider	Daily	Monthly	Annual	vs. Local (annual)
GPT-4.1	$0.60	$18.25	$219.00	+$11.69
GPT-4o	$0.50	$15.21	$182.50	-$24.81
GPT-4o Mini	$0.03	$0.91	$10.95	-$196.36
Claude Opus 4.5	$1.13	$34.22	$410.63	+$203.31
Claude Sonnet 4.5	$0.23	$6.84	$82.13	-$125.19
Gemini 1.5 Pro	$0.25	$7.60	$91.25	-$116.06
Gemini 1.5 Flash	$0.01	$0.46	$5.47	-$201.84
DeepSeek API	$0.02	$0.64	$7.67	-$199.65
Local (Electricity only)	$0.57	$17.28	$207.31	n/a
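The calculator does not state the power draw behind its local electricity figure. As a rough sketch, the annual cost can be modelled as average power times hours times rate, and the relationship can be inverted to see what draw the $207.31 figure implies. The 24 hours/day, 365 days/year duty cycle and the function names are assumptions, not values from the calculator.

```python
def annual_electricity_cost(avg_watts: float, rate_per_kwh: float,
                            hours_per_day: float = 24.0, days: int = 365) -> float:
    """Annual electricity cost in USD for a constant average power draw."""
    kwh = avg_watts / 1000 * hours_per_day * days
    return kwh * rate_per_kwh


def implied_avg_watts(annual_cost: float, rate_per_kwh: float,
                      hours_per_day: float = 24.0, days: int = 365) -> float:
    """Invert the cost formula: average watts implied by an annual bill."""
    kwh = annual_cost / rate_per_kwh
    return kwh / (hours_per_day * days) * 1000


watts = implied_avg_watts(207.31, 0.12)
print(f"Implied average draw: {watts:.0f} W")

# Annual difference versus a cloud provider, as in the "vs. Local" column.
print(f"GPT-4.1 vs. local: {219.00 - 207.31:+.2f} USD/year")
```

Under these assumptions the figure corresponds to an average draw of just under 200 W, so the comparison is sensitive to how many hours per day the GPUs are actually loaded.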

Multi-Model Architecture

The case for specialized agents over a single monolithic LLM

A single monolithic LLM is suboptimal for a trading system because the tasks required span a wide range of cognitive demands. Real-time sentiment classification requires speed and domain vocabulary; strategic trade approval requires deep contextual reasoning. These demands are in tension. Running a 70B model for every task — including simple ones — wastes compute and increases latency unnecessarily.

The multi-agent architecture mirrors how sophisticated trading operations actually function. Real trading desks employ specialists: a news analyst, a quantitative risk manager, a senior portfolio manager who makes final decisions. Replicating this structure in software produces more robust, interpretable, and auditable outcomes.

LAYER 1

Gateway / Router

Receives incoming signals from Node 1, classifies query type, assesses urgency, routes to the appropriate specialist. Performs basic sanity checks: is the signal within expected parameters? Is the market open? Has a similar trade been executed recently?

Speed-optimized — acts as traffic controller and pre-filter, preventing larger models from being invoked unnecessarily.
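The three sanity checks named above can also be expressed as deterministic code that runs before any model is invoked. The following is a minimal sketch, assuming naive exchange-local timestamps, regular US session hours, and a hypothetical five-minute duplicate window; the `Signal` fields are illustrative and exchange holidays are ignored.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class Signal:
    symbol: str
    delta: float
    timestamp: datetime  # assumed to be naive exchange-local time


MARKET_OPEN, MARKET_CLOSE = time(9, 30), time(16, 0)  # regular session, holidays ignored
DEDUP_WINDOW = timedelta(minutes=5)                   # hypothetical value


def prefilter(signal: Signal, recent_trades: dict, max_abs_delta: float = 1.0):
    """Return (passed, reason) for the Gateway's basic sanity checks."""
    # 1. Is the signal within expected parameters?
    if not (0.0 < abs(signal.delta) <= max_abs_delta):
        return False, "delta out of range"
    # 2. Is the market open?
    is_weekend = signal.timestamp.weekday() >= 5
    in_session = MARKET_OPEN <= signal.timestamp.time() < MARKET_CLOSE
    if is_weekend or not in_session:
        return False, "market closed"
    # 3. Has a similar trade been executed recently?
    last = recent_trades.get(signal.symbol)
    if last is not None and signal.timestamp - last < DEDUP_WINDOW:
        return False, "similar trade executed recently"
    return True, "ok"


sig = Signal("SPY", 0.25, datetime(2025, 1, 15, 10, 0))
print(prefilter(sig, recent_trades={}))
```

Signals that fail any check never reach the Specialist or Strategist, which is the pre-filter role described for this layer.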

LAYER 2

Finance Specialist

Handles domain-specific tasks: interpreting options chain data, analyzing sentiment from news and SEC filings, calculating implied volatility context, and generating structured summaries of market conditions relevant to the trade signal.

Runs on CPU using 192 GB system RAM — preserves all GPU VRAM exclusively for the Strategist.

LAYER 3

Strategist (Primary Reasoning Engine)

Receives pre-processed context from the Finance Specialist and routing metadata from the Gateway. Makes the final trade approval or rejection decision with a confidence score (0.0–1.0). Has access to full portfolio state, risk parameters, and recent trade history via RAG.

Invoked only when the Gateway determines a trade signal has passed initial screening — minimizing unnecessary inference cycles.
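The control flow between the three layers can be sketched with plain callables standing in for the models. This is an illustration of the routing logic only: the `Decision` type, the function signatures, and the stub models are all hypothetical, and real layers would call Ollama, llama.cpp, and vLLM respectively.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    approved: bool
    confidence: float  # expected to lie in 0.0–1.0
    reasoning: str


def run_pipeline(signal: dict,
                 gateway: Callable[[dict], bool],
                 specialist: Callable[[dict], str],
                 strategist: Callable[[dict, str], Decision]) -> Optional[Decision]:
    """Gateway screens, Specialist summarises, Strategist decides."""
    if not gateway(signal):
        return None  # screened out: the Strategist is never invoked
    summary = specialist(signal)
    decision = strategist(signal, summary)
    if not 0.0 <= decision.confidence <= 1.0:
        raise ValueError("confidence must be within 0.0-1.0")
    return decision


# Stub layers for demonstration only.
def stub_gateway(sig: dict) -> bool:
    return sig.get("delta", 0.0) != 0.0


def stub_specialist(sig: dict) -> str:
    return f"IV context for {sig['symbol']}: neutral"


def stub_strategist(sig: dict, summary: str) -> Decision:
    return Decision(True, 0.72, f"approved given: {summary}")


print(run_pipeline({"symbol": "SPY", "delta": 0.25},
                   stub_gateway, stub_specialist, stub_strategist))
```

Returning `None` for screened-out signals keeps the "no decision" case distinct from an explicit rejection.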

4.1 RAG Pipeline

Retrieval-Augmented Generation (RAG) allows the LLM to access current, structured information without requiring model retraining. The RAG system uses a local vector database (Qdrant) to store and retrieve embeddings. When the Strategist is invoked, the RAG pipeline retrieves the most relevant context and injects it into the prompt. The retrieved context includes:

Current portfolio positions and Greeks exposure

Recent trade history (last 50–100 trades with outcomes)

Relevant news articles from the past 24–48 hours

Historical volatility patterns for the instrument

Earnings calendar and upcoming macro events

SEC filing summaries and analyst sentiment
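The retrieve-then-inject step can be illustrated without a running vector database. The sketch below uses an in-memory list and cosine similarity as a stand-in for Qdrant; the toy three-dimensional vectors, document fields, and prompt wording are assumptions made for the example, and a real deployment would use an embedding model and the Qdrant client instead.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, store, k=2):
    """Return the k stored documents most similar to the query vector."""
    ranked = sorted(store, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(signal_text, docs):
    """Inject retrieved context ahead of the trade signal."""
    context = "\n".join(f"- [{d['kind']}] {d['text']}" for d in docs)
    return (f"Context:\n{context}\n\nSignal:\n{signal_text}\n\n"
            "Respond with APPROVE or REJECT and a confidence between 0.0 and 1.0.")


# Toy store; the vectors are made up for illustration.
STORE = [
    {"kind": "portfolio", "text": "Net delta +120, vega -300", "vector": [0.9, 0.1, 0.0]},
    {"kind": "news", "text": "Fed minutes released today", "vector": [0.1, 0.9, 0.1]},
    {"kind": "trade_history", "text": "Last SPY put spread closed at +18%", "vector": [0.7, 0.2, 0.3]},
]

top = retrieve([1.0, 0.0, 0.1], STORE, k=2)
print(build_prompt("Sell SPY 30-delta put, 21 DTE", top))
```

Keeping `k` small helps the injected context stay inside the Strategist's context budget.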

Model Selection

Recommended candidates for each layer of the agent hierarchy

LAYER 1 — GATEWAY

Llama 3.2 3B or Qwen 2.5 7B (Instruct)

3B – 7B · 2 – 4.5 GB VRAM · Ollama

Hardware: RTX 3090 on Node 1 (co-located with signal engine) | Role: Signal classification, urgency scoring, pre-filter routing

LAYER 2 — FINANCE SPECIALIST

FinGPT (Llama 3 8B, LoRA fine-tuned) or DeepSeek-R1 Distill 8B

8B · ~5 GB (CPU only) · llama.cpp

Hardware: CPU inference on Node 2 — 192 GB RAM, zero GPU VRAM used | Role: Sentiment analysis, IV context, SEC filing interpretation, structured market summary

LAYER 3 — STRATEGIST

Llama 4 Maverick 70B or DeepSeek-V3 70B

70B · 38 – 42 GB VRAM · vLLM (tensor-parallel=2)

Hardware: 2× RTX 3090 NVLinked on Node 2 (48 GB combined) | Role: Final trade APPROVE / REJECT, confidence score 0.0–1.0, reasoning trace

Model	Parameters	VRAM (Q4)	Recommended Role
Llama 3.2 3B	3B	~2 GB	Gateway (Node 1 RTX 3090)
Qwen 2.5 7B	7B	~4.5 GB	Gateway (Node 1 RTX 3090)
FinGPT (Llama 3 8B)	8B	~5 GB (CPU)	Finance Specialist (CPU)
DeepSeek-R1 Distill 8B	8B	~5 GB (CPU)	Finance Specialist (CPU)
Llama 4 Maverick 70B	70B	~38–42 GB	Strategist (Dual 3090 NVLink)
DeepSeek-V3 70B	70B	~38–42 GB	Strategist (Dual 3090 NVLink)

05b

Interactive Model Comparison

Explore benchmarks, trade-offs, and specs for every recommended model — select a layer and model to compare

BENCHMARK COMPARISON (ALL MODELS)

[Chart comparing Llama 3.2 3B, Qwen 2.5 7B, and Mistral 7B; benchmark values not captured]

QUICK SPECS COMPARISON

Model	Params	VRAM	Speed (t/s)	License
Llama 3.2 3B ★	3B	~2 GB	~800–1,200	Llama
Qwen 2.5 7B	7B	~4.5 GB	~400–600	Qwen
Mistral 7B	7B	~4.5 GB	~380–550	Apache

SELECTED MODEL DETAIL

Llama 3.2 3B Instruct

Meta AI · Llama 3.2 Community

★ RECOMMENDED

Ultra-lightweight model optimized for fast classification and routing. Ideal as a gateway that must respond in under 500ms. Strong instruction-following for structured JSON output.

Best choice when raw speed is the priority. Leaves full RTX 3090 VRAM headroom for other Node 1 tasks.

Parameters

3B

VRAM Required

~2 GB

Inference Speed

~800–1,200 t/s

Context Window

128K tokens

Engine

Ollama

Hardware

RTX 3090 (Node 1)

VRAM Usage vs. Node 2 Total (48 GB)

~2 GB

STRENGTHS

Extremely fast inference
Minimal VRAM footprint
Strong JSON/structured output
Low power draw

WEAKNESSES

Limited complex reasoning
Weaker financial domain knowledge
Small context window for complex prompts

TRADE-OFF CHECKLIST

Criteria: sub-500ms response, financial domain knowledge, complex reasoning, structured output (JSON), fits Node 1 RTX 3090, multi-turn context. [Pass/fail marks not captured]

BENCHMARK SCORES (0–100)

[Chart axes: Reasoning, Financial NLP, Speed, VRAM Efficiency, Instruction Follow, Context Handling; scores not captured]

05c

Build Your Stack

One model is selected per layer; the VRAM budget and combined latency estimates below are for the selected stack.

Status: VRAM OK · P95 ~8.4s

Runs on Node 1 (single RTX 3090, 24 GB). Classifies incoming signals, filters noise, routes to specialist. Must respond in < 500ms.

Llama 3.2 3B Instruct (REC)

3B · Q4_K_M · 2.2 GB VRAM · ~80ms P50

Ultra-low latency · Minimal VRAM

Qwen 2.5 7B Instruct

7B · Q4_K_M · 4.8 GB VRAM · ~160ms P50

Better reasoning than 3B · Strong instruction following

Phi-4 Mini Instruct

3.8B · Q4_K_M · 2.8 GB VRAM · ~95ms P50

Strong reasoning for size · Microsoft research quality

Best choice for high-frequency signal classification. Leaves 21+ GB VRAM free on Node 1 for other workloads.

LATENCY BREAKDOWN (P50 ms)

Gateway and Specialist run in parallel — the longer of the two determines the combined phase latency. Strategist waits for both before generating a decision.

CURRENT STACK

Model	VRAM	Latency (P50)
Llama 3.2 3B	2.2 GB	80ms
FinGPT 8B	CPU	1200ms
Llama 4 70B	40 GB	3500ms

VRAM BUDGET

Node 1 (Gateway): OK · 2.2 GB model + 2 GB overhead / 24 GB (90% threshold: 22 GB)

Node 2 (Strategist): OK · 40.0 GB model + 2 GB overhead / 48 GB (90% threshold: 43 GB)

Specialist runs on CPU, so no VRAM is consumed. 2 GB overhead per node is reserved for the CUDA/vLLM runtime.

ROUND-TRIP LATENCY

Parallel phase (max of L1, L2) (P50): 1200ms

Strategist decision (P50): 3500ms

Network + orchestration (est.): ~200ms

Total round-trip (P50): ~4900ms

Total round-trip (P95): ~8.4s

Within 10s target — suitable for 1-min bar strategies
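The P50 total follows directly from the rule stated above: the parallel phase costs the slower of Gateway and Specialist, and the Strategist and orchestration overhead are added on top. A small sketch with the P50 figures from this stack (the function name is illustrative):

```python
def stack_round_trip_ms(gateway_ms: float, specialist_ms: float,
                        strategist_ms: float, overhead_ms: float) -> float:
    """P50 round trip when Gateway and Specialist run in parallel."""
    parallel_phase = max(gateway_ms, specialist_ms)  # slower of the two dominates
    return parallel_phase + strategist_ms + overhead_ms


total = stack_round_trip_ms(gateway_ms=80, specialist_ms=1200,
                            strategist_ms=3500, overhead_ms=200)
print(f"P50 round trip: {total} ms")
```

Because the Specialist dominates the parallel phase, swapping the Gateway for a slower 7B model would not change the P50 total unless its latency exceeded 1200 ms. The P95 figure cannot be derived this way, since it depends on the latency distributions rather than the medians.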

THROUGHPUT

Gateway: 120 tok/s

Specialist: 12 tok/s

Strategist (bottleneck): 18 tok/s


Inference Engine Selection

vLLM vs. llama.cpp — when to use each

vLLM

GPU-Optimized Inference Server

PagedAttention for efficient KV cache management

Native tensor parallelism across multiple GPUs

35× higher throughput than llama.cpp at peak load

OpenAI-compatible REST API out of the box

Use for: Strategist 70B on Node 2 dual-3090

llama.cpp

Portable C/C++ Inference

Runs on virtually any hardware including CPU-only

Wide quantization format support (GGUF)

Minimal dependencies, easy deployment

Comparable latency at low concurrency (1–4 requests)

Use for: Finance Specialist 8B on CPU (Node 2)

Recommendation: Use vLLM on Node 2 for the Strategist (70B model) with --tensor-parallel-size 2 to distribute across both NVLinked RTX 3090s. Use llama.cpp on Node 2 CPU for the Finance Specialist (8B model). Use Ollama (wrapping llama.cpp) on Node 1 for the Gateway (3B–7B model).
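As a rough illustration of the recommendation, the launch commands for the three engines could be assembled as below. The model paths, ports, thread count, and context sizes are placeholders chosen for the example, not values from this report; the Ollama tag is likewise an assumption and should be checked against the Ollama library before use.

```python
import shlex

# Hypothetical locations; substitute the actual model files.
STRATEGIST_MODEL = "path/to/strategist-70b-awq"
SPECIALIST_GGUF = "path/to/specialist-8b-q4_k_m.gguf"


def vllm_command(model: str, tensor_parallel: int = 2,
                 max_len: int = 32768, port: int = 8000) -> list:
    """vLLM OpenAI-compatible server, sharded across both GPUs."""
    return ["vllm", "serve", model,
            "--tensor-parallel-size", str(tensor_parallel),
            "--max-model-len", str(max_len),
            "--port", str(port)]


def llama_server_command(gguf_path: str, threads: int = 32,
                         ctx: int = 8192, port: int = 8081) -> list:
    """llama.cpp HTTP server for CPU inference."""
    return ["llama-server", "-m", gguf_path,
            "--threads", str(threads),
            "--ctx-size", str(ctx),
            "--port", str(port)]


def ollama_pull_command(tag: str = "llama3.2:3b") -> list:
    """Fetch the Gateway model on Node 1 (tag is an assumption)."""
    return ["ollama", "pull", tag]


for cmd in (vllm_command(STRATEGIST_MODEL),
            llama_server_command(SPECIALIST_GGUF),
            ollama_pull_command()):
    print(shlex.join(cmd))
```

Building the argument lists in code rather than as shell strings makes it easier to pass them to `subprocess.run` later without quoting problems.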

VRAM Budget & Hardware Considerations

Node 2 VRAM Budget (48 GB total)

Component	VRAM Allocation	Notes
Llama 4 70B (Q4_K_M) model weights	~38–40 GB	Primary model weights across both GPUs
KV Cache (32K context window)	~4–6 GB	Sufficient for full trade analysis prompts
vLLM overhead & CUDA buffers	~1–2 GB	Runtime overhead
Total	~43–48 GB	Tight but workable — limit context to 32K tokens
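The budget rows can be sanity-checked with a short calculation. The sketch below sums the table's ranges and estimates the KV cache from first principles. The architecture values (80 layers, 8 KV heads, head dimension 128) are assumptions typical of 70B-class models with grouped-query attention, not figures from this report, and the estimate covers a single sequence at full context.

```python
def kv_cache_gib(tokens: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_value: int) -> float:
    """KV cache size in GiB for one sequence (factor 2 = keys + values)."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * tokens
    return total_bytes / 2**30


def budget(weights_gb: float, kv_gb: float, overhead_gb: float,
           total_gb: float = 48.0):
    """Return (used, headroom) in GB for a given allocation."""
    used = weights_gb + kv_gb + overhead_gb
    return used, total_gb - used


# Low and high ends of the table's ranges.
print("low :", budget(38, 4, 1))
print("high:", budget(40, 6, 2))

# Assumed 70B-class architecture: 80 layers, 8 KV heads, head_dim 128.
print("32K ctx, 16-bit cache:", kv_cache_gib(32_768, 80, 8, 128, 2), "GiB")
print("32K ctx,  8-bit cache:", kv_cache_gib(32_768, 80, 8, 128, 1), "GiB")
```

Under these assumptions a 16-bit cache at the full 32K context would need more than the 4–6 GB row allows, so that row appears to presuppose either a compressed cache or prompts well short of 32K. It is worth measuring actual usage before relying on the headroom.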

RTX 3090 FP8 Limitation: The RTX 3090 (Ampere GA102) does not include hardware support for FP8 tensor operations. Use INT4 (Q4_K_M) quantization via GGUF format or AWQ — the most efficient strategy for this hardware. This retains ~95–97% of full-precision model performance.

NVLink Performance: Benchmarks indicate that NVLink improves dual-3090 inference throughput by approximately 50% compared to PCIe-only communication. The RTX 3090 NVLink bridge provides about 112.5 GB/s of bidirectional bandwidth, versus roughly 32 GB/s per direction for PCIe 4.0 x16. This is the key reason the dual-3090 NVLink setup is viable for 70B tensor-parallel inference.

Inter-Node Communication & Latency

The two nodes should be connected via a 10 GbE SFP+ Direct Attach Cable (DAC). The communication protocol should use gRPC rather than REST — gRPC uses HTTP/2 and Protocol Buffers, reducing inter-service latency by up to 60% for structured data payloads.

Estimated Round-Trip Latency

Step	Estimated Latency
Node 1 generates trade signal	< 1 ms
Signal transmitted to Node 2 (10 GbE)	0.1 – 0.3 ms
Gateway model routes request	200 – 500 ms
Finance Specialist processes context	500 – 1,500 ms
Strategist generates decision	1,000 – 5,000 ms
Decision transmitted back to Node 1	0.1 – 0.3 ms
Node 1 executes trade via IBKR API	50 – 200 ms
Total round-trip	~2 – 7 seconds

This latency profile is appropriate for options trading strategies operating on 1-minute bars or longer. It is not suitable for scalping or HFT. The deterministic signal engine on Node 1 operates independently at sub-second speeds; the LLM layer provides a strategic overlay and approval mechanism.
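The total in the table can be checked by summing the per-step bounds. The sketch below copies the ranges from the table, treating "< 1 ms" as 0–1 ms; the step labels are shortened and the variable names are illustrative.

```python
# (step, low_ms, high_ms) taken from the table above.
STEPS_MS = [
    ("Node 1 generates trade signal", 0.0, 1.0),
    ("Signal transmitted to Node 2", 0.1, 0.3),
    ("Gateway routes request", 200.0, 500.0),
    ("Finance Specialist processes context", 500.0, 1500.0),
    ("Strategist generates decision", 1000.0, 5000.0),
    ("Decision transmitted back to Node 1", 0.1, 0.3),
    ("Node 1 executes trade via IBKR API", 50.0, 200.0),
]

low_ms = sum(step[1] for step in STEPS_MS)
high_ms = sum(step[2] for step in STEPS_MS)
llm_share = sum(s[2] for s in STEPS_MS[2:5]) / high_ms

print(f"Round trip: {low_ms / 1000:.2f}-{high_ms / 1000:.2f} s")
print(f"Share of worst case spent in the three model layers: {llm_share:.0%}")
```

The sum lands at roughly 1.75 to 7.2 seconds, consistent with the rounded "~2 – 7 seconds" row, and nearly all of it is model inference rather than networking or order routing.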

Safety, Risk Controls & Dead Man's Switch

DEAD MAN'S SWITCH

Node 1 maintains a timeout counter for every LLM request sent to Node 2. If Node 2 fails to respond within 2.5–3 seconds, Node 1 automatically falls back to one of two pre-programmed behaviors:

MODE A

Safe-Exit Mode: Close any open positions pending LLM approval and halt new trade entry until the LLM connection is restored.

MODE B

Conservative-Default Mode: Apply pre-defined rule-based decisions (e.g., "only enter trades with delta < 0.30 and no earnings within 5 days") without LLM involvement.
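A timeout with fallback of this kind can be sketched with the standard library. The example below is illustrative only: `request_fn` stands in for the gRPC call to Node 2, the signal fields are hypothetical, and the rule encodes the example condition quoted for Mode B. Note that a 2.5–3 second timeout is shorter than much of the round-trip range estimated elsewhere in this report, so the timeout is left as a parameter to be tuned against measured latency.

```python
import concurrent.futures as cf
import time


def conservative_rule(signal: dict) -> bool:
    """Mode B example rule: delta < 0.30 and no earnings within 5 days."""
    return abs(signal["delta"]) < 0.30 and signal["days_to_earnings"] > 5


def approve_with_timeout(request_fn, signal: dict,
                         timeout_s: float = 3.0, fallback_rule=None) -> dict:
    """Ask the LLM for approval; fall back if no answer arrives in time."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(request_fn, signal)
    try:
        return {"source": "llm", "approved": future.result(timeout=timeout_s)}
    except cf.TimeoutError:
        if fallback_rule is None:
            # Mode A: no new entries until the LLM link is restored.
            return {"source": "safe_exit", "approved": False}
        # Mode B: deterministic rule without LLM involvement.
        return {"source": "rule", "approved": fallback_rule(signal)}
    finally:
        pool.shutdown(wait=False)  # do not block on the late response


def slow_llm(signal: dict) -> bool:
    time.sleep(0.5)  # simulates an unresponsive Node 2
    return True


sig = {"delta": 0.20, "days_to_earnings": 10}
print(approve_with_timeout(slow_llm, sig, timeout_s=0.05,
                           fallback_rule=conservative_rule))
```

The sketch covers only the approval decision; closing positions in Mode A would be handled by the execution logic on Node 1.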

Position Sizing Rule

The LLM should never have direct authority to size positions. Position sizing remains under deterministic risk management logic on Node 1. The LLM provides only a binary approval/rejection signal plus a confidence score (0.0–1.0).
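One way to keep sizing deterministic is to let the LLM output act purely as a gate. In the sketch below the confidence score can block a trade but never scales it; the 1% risk fraction and the 0.6 confidence floor are hypothetical parameters, not recommendations from this report.

```python
def contracts_to_trade(approved: bool, confidence: float,
                       account_equity: float, max_loss_per_contract: float,
                       risk_fraction: float = 0.01,
                       min_confidence: float = 0.6) -> int:
    """Deterministic sizing: the LLM gates the trade, risk rules size it."""
    if not approved or confidence < min_confidence:
        return 0
    risk_budget = account_equity * risk_fraction
    # Size depends only on the risk budget, never on the confidence value.
    return int(risk_budget // max_loss_per_contract)


print(contracts_to_trade(True, 0.80, 100_000, 300))
print(contracts_to_trade(True, 0.50, 100_000, 300))
print(contracts_to_trade(False, 0.95, 100_000, 300))
```

Because confidence is compared against a threshold and then discarded, a miscalibrated or manipulated score cannot inflate position size.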

Audit Logging

Every LLM call (full prompt, model response, routing decision, and final trade outcome) should be logged to QuestDB, a time-series database. The log supports backtesting validation, regulatory compliance, and model performance evaluation.
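A log record covering those four elements might look like the following. This is a sketch of the record shape only: the field names are assumptions, and the write to QuestDB is not shown.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(prompt: str, response: str, route: str,
                 outcome: str, model: str) -> dict:
    """Build one audit entry for a single LLM call."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "route": route,          # routing decision made by the Gateway
        "prompt": prompt,        # full prompt, stored verbatim
        # Hash allows later integrity checks and grouping of identical prompts.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "response": response,
        "outcome": outcome,      # final trade outcome
    }


record = audit_record(prompt="Signal: sell SPY 30-delta put",
                      response="APPROVE 0.72",
                      route="strategist",
                      outcome="filled",
                      model="strategist-70b")
line = json.dumps(record, sort_keys=True)
print(line)
```

Storing the timestamp in UTC avoids ambiguity when records are later joined against exchange data.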

API Integration

Polygon.io and Interactive Brokers TWS

POLYGON.IO API

WebSocket streams for real-time tick data and options chain updates (sub-second latency)

REST API for historical data retrieval during backtesting and model training

RAM-disk / shm layer for live order book — sub-millisecond access for Greeks calculator

192 GB RAM sufficient for full in-memory options chain snapshot (SPY, QQQ + 20–30 equities)

INTERACTIVE BROKERS TWS

ib_insync Python library — most widely used and well-documented TWS API wrapper

TWS application co-located on Node 1 to minimize execution latency

Order submission latency: 50–200 ms for options orders (acceptable for target timeframes)

Paper trading mode — use extensively during development before live capital deployment

System Architecture Diagram

High-level dual-node design with data flows and integration points

Figure: AI-Driven Options Trading Bot System Architecture Diagram (system_architecture_diagram.png)

11b

Hardware Upgrade Roadmap

When to upgrade, what to buy, and how to plan your path from dual RTX 3090 to future hardware

Hardware Upgrade Roadmap

Current: 2× RTX 3090 NVLinked (48 GB) — Node 2

Triggers: 2 Act · 2 Plan · 3 Watch

Critical NVLink Note: NVIDIA removed NVLink from consumer GPUs starting with the RTX 40 series; neither the RTX 4090 nor the RTX 5090 supports it. The RTX 3090 generation is the last consumer hardware to support NVLink. Any upgrade path that replaces the dual-3090 NVLink setup must either use a single high-VRAM GPU (RTX PRO 6000) or accept PCIe-only inter-GPU communication, which gives up the approximately 50% tensor-parallel throughput gain that NVLink provides.

These are the specific, measurable conditions that should trigger an upgrade evaluation. ACT triggers require immediate action (upgrade now). PLAN triggers mean begin budgeting and evaluating options. WATCH triggers mean monitor the metric monthly.

MONTHLY MONITORING CHECKLIST

P95 LLM round-trip latency (target: < 7s)

VRAM utilization peak (target: < 90% of 48 GB)

Context window truncation events (target: 0/day)

OOM errors in vLLM logs (target: 0/month)

Electricity cost vs. equivalent cloud cost ratio

Concurrent request queue depth (target: < 2)

NVLink bandwidth utilization (via nvidia-smi nvlink)

New frontier model releases requiring > 48 GB VRAM

Section 12

LLM Training & Configuration Guide

Layer-by-layer training data recommendations, LoRA/QLoRA hyperparameters calibrated to your RTX 3090 hardware, data sourcing strategies, quality filtering pipelines, and evaluation frameworks for each agent layer.

Core Principle: Precision Over Volume

A carefully curated dataset of 5,000–20,000 high-quality, domain-specific examples will typically outperform a noisy dataset of 500,000 scraped documents. FinLoRA (2025) reported that LoRA fine-tuning on financial datasets achieves performance comparable to full fine-tuning while training only about 1% of the parameters.

Llama 3.2 3B Instruct

Signal classifier and router. Receives raw signals from Node 1, classifies by type/urgency, filters noise, routes to Specialist.

VRAM / Location

24 GB (Node 1 RTX 3090)

Method

LoRA BF16 (rank 16)

Training Time

~2 hours

Dataset Size

5,000–7,000 examples

Dataset Composition

Recommended % split by category

Signal Classification: 60%

Noise Rejection: 20%

Edge Cases: 20%

Training Data Categories

Signal Classification (primary) · 60% of dataset · 3,000–5,000 examples

Noise Rejection (important) · 20% of dataset · 1,000–2,000 examples

Edge Cases (important) · 20% of dataset · 1,000–1,500 examples
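The per-category targets implied by the percentage split can be computed directly from the overall dataset size. A small sketch (names are illustrative):

```python
SPLIT = {
    "Signal Classification": 0.60,
    "Noise Rejection": 0.20,
    "Edge Cases": 0.20,
}


def category_counts(total_examples: int) -> dict:
    """Number of examples per category for a given dataset size."""
    return {name: round(total_examples * share) for name, share in SPLIT.items()}


for total in (5_000, 7_000):
    print(total, category_counts(total))
```

For the 5,000–7,000 target this gives 3,000–4,200, 1,000–1,400, and 1,000–1,400 examples. The per-category ranges listed above are somewhat wider than a strict 60/20/20 split, so they are best read as collection targets before filtering rather than final counts.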

Section 13

Dataset Builder

Tracks data collection progress for each LLM layer (Gateway, Specialist, Strategist).

Status: Not Ready

Weighted readiness score (critical items count 3×)

Items Done: 0/33

Critical Done: 0/11

Hours Left: ~92h
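The report states only that critical items count 3× in the readiness score. One plausible reading, sketched below as an assumption rather than the tracker's actual formula, is a weighted completion ratio over the 33 items, 11 of which are critical.

```python
def readiness(critical_done: int, other_done: int,
              critical_total: int = 11, other_total: int = 22,
              critical_weight: int = 3) -> float:
    """Weighted share of completed items, with critical items weighted 3x."""
    done = critical_weight * critical_done + other_done
    total = critical_weight * critical_total + other_total
    return done / total


print(readiness(0, 0))    # nothing collected
print(readiness(11, 0))   # all critical items, nothing else
print(readiness(11, 22))  # everything collected
```

Under this reading, finishing only the 11 critical items already yields a score of 0.6, which matches the intent of prioritising them.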

11 Critical Gaps Detected

The following critical data sources are not yet collected. Training without them will significantly degrade model performance.

[Gateway] Node 1 historical signal logs (~3h)

[Specialist] McMillan 'Options as a Strategic Investment' Q&A (~6h)

[Specialist] tastytrade options education content (~4h)

[Specialist] Your own strategy documentation (~4h)

[Specialist] FinGPT sentiment dataset (HuggingFace) (~2h)

[Specialist] QuantLib-verified Greeks examples (~3h)

+ 5 more critical items…

Layer 1 — Gateway

Model: Llama 3.2 3B · Target: 5,000–7,000 examples · 0/9 sources collected · ~12h remaining

Signal Classification (3,000–5,000 examples; 0/4 collected)

Node 1 historical signal logs (critical) · 2,000–4,000 records · JSON → JSONL · ~3h

Synthetic signal variations (important) · 1,000–2,000 examples · Python script → JSONL · ~2h

CBOE IV rank / percentile data (important) · 1–2 years daily · CSV → processed JSONL · ~1h

Polygon.io historical OHLCV (important) · 2+ years daily bars · Polygon API → JSONL · ~1h

Noise Rejection (1,000–2,000 examples; 0/2 collected)

Historical duplicate/stale signals (important) · 500–1,000 records · JSON → JSONL · ~1h

Synthetic malformed signal examples (important) · 500–1,000 examples · Python script → JSONL · ~1h

Edge Cases (1,000–1,500 examples; 0/3 collected)

FOMC / CPI / NFP calendar dates (important) · 200–500 event dates · CSV → JSONL · ~1h

Earnings calendar data (important) · 1–3 years of dates · API → JSONL · ~1h

Conflicting signal examples (supporting) · 200–500 examples · Python script → JSONL · ~1h
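Several of the sources above call for a JSON → JSONL conversion. A minimal sketch of that step is shown below; the record fields (`signal`, `label`) and the prompt/completion layout are hypothetical, since the report does not specify the log schema or the training format.

```python
import io
import json


def to_jsonl(records, out) -> int:
    """Write one training example per line; return the number written."""
    written = 0
    for rec in records:
        # Skip records missing either field rather than guessing values.
        if not {"signal", "label"} <= rec.keys():
            continue
        example = {
            "prompt": json.dumps(rec["signal"], sort_keys=True),
            "completion": rec["label"],
        }
        out.write(json.dumps(example) + "\n")
        written += 1
    return written


logs = [
    {"signal": {"symbol": "SPY", "delta": 0.25}, "label": "ROUTE_STRATEGIST"},
    {"signal": {"symbol": "QQQ", "delta": 0.0}},  # unlabeled, skipped
    {"signal": {"symbol": "AAPL", "delta": 0.4}, "label": "REJECT_NOISE"},
]

buffer = io.StringIO()
count = to_jsonl(logs, buffer)
print(count)
print(buffer.getvalue())
```

Passing an open file handle instead of the `StringIO` buffer writes the same lines to disk.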
