
Edge AI Inference Optimizer

The intersection: AI cost/performance optimization × Hosung's hardware/GPU/power background. Not a cloud API proxy. Not LabCast. The combination of both directions.


The Problem

Companies building hardware products with AI (robotics, medical devices, industrial IoT, autonomous systems) need to run models on constrained hardware. They're stuck:

  • Model too slow → latency issues in production
  • Model too big → runs out of memory on device
  • Model too power-hungry → drains battery, overheats
  • No good tooling to diagnose or fix any of it

They can't "just use a cheaper API" — it's their own hardware, their own inference stack. They need someone who understands both AI models AND power/thermal systems.

That's Hosung: three years optimizing GPU power systems at Nvidia, applied to a new domain.


The Product

Software daemon that runs alongside their inference stack (TensorRT, llama.cpp, ONNX Runtime, vLLM):

AI model running on device (Jetson, Pi, custom board)
        ↓
[Daemon — C++]  monitors every inference call
        ↓
Tracks: latency / power draw / memory / thermal state
Recommends: quantization level, batch size, model swap
        ↓
Dashboard
"Your model is using 4W average. Switch to INT4 quant → 1.8W, 3% accuracy loss."
"Peak memory 2.1GB. Reduce batch size to 4 → fits in 1.2GB, 8% slower."

No custom hardware. Runs as software on their existing devices.
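A minimal sketch of the monitoring side of such a daemon, in C++: polling power and thermal telemetry that Linux boards typically expose through sysfs. The specific node paths below (hwmon, thermal_zone) are illustrative assumptions — actual names vary by board, driver, and kernel version.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Read a single integer value from a sysfs-style node.
// Returns nullopt if the node is missing or unreadable.
std::optional<long> read_sysfs_long(const std::string& path) {
    std::ifstream f(path);
    long v;
    if (f >> v) return v;
    return std::nullopt;
}

struct DeviceSample {
    double power_w;  // instantaneous power draw, watts
    double temp_c;   // SoC temperature, degrees Celsius
};

// Hypothetical sampling routine for a Jetson/Pi-class board.
// Node paths are placeholders; a real daemon would discover them at startup.
std::optional<DeviceSample> sample_device() {
    // hwmon power nodes are commonly reported in microwatts,
    // thermal zones in millidegrees C.
    auto uw = read_sysfs_long("/sys/class/hwmon/hwmon0/power1_input");
    auto mc = read_sysfs_long("/sys/class/thermal/thermal_zone0/temp");
    if (!uw || !mc) return std::nullopt;
    return DeviceSample{*uw / 1e6, *mc / 1e3};
}
```

On NVIDIA hardware the same numbers could come from NVML instead of sysfs; the point is that the daemon needs no custom drivers, only read access to telemetry the platform already exposes.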


Why This Beats the Proxy Direction

Cloud API optimization vs. edge inference optimization:

  • Portkey, Martian, RouteLLM already exist → nobody owns the edge space yet
  • Hosung's skills aren't the moat → Hosung's GPU/power background IS the moat
  • Commoditizing fast → growing with the edge AI explosion
  • Buyer can switch cloud providers → buyer is locked to their own hardware
  • A proxy is a feature, not a company → deep systems tooling is a durable company

Who Buys This

  • Robotics companies — need AI on robot, constrained compute, battery life matters
  • Medical device companies — adding AI inference, strict power/thermal requirements
  • Industrial IoT — deploying vision models on edge nodes
  • Autonomous systems — any company shipping hardware with a model inside
  • AI hardware startups — building inference chips, need optimization tooling

The buyer is an engineering manager or CTO at a hardware company that shipped (or is shipping) a product with AI inside. They have budget. They have a specific, measurable problem.


The LabCast Connection

LabCast becomes the remote monitoring layer — not a separate product, just one feature.

Companies deploying edge AI need to monitor devices remotely. The Raspberry Pi debug box is how you get visibility into what's running on their hardware in the field. Same hardware approach from LabCast, different product frame:

  • LabCast framing: "remote hardware debugging tool"
  • This framing: "remote AI performance monitoring for edge devices"

Same box. Broader use case. Easier to sell.


Hosung's Unfair Advantage

His Nvidia experience, and how each piece applies here:

  • GPU power systems → knows exactly how power draw maps to model configuration
  • Display/thermal systems → understands thermal throttling, sustained vs. burst performance
  • Systems programming (C++) → builds the low-overhead daemon; can't be replicated by a web dev
  • Hardware-software interface → speaks the language of embedded engineers and hardware CTOs

Angie's Role

  • Dashboard product design — the "before/after" power/latency visualization is the sales asset
  • Customer discovery — talking to hardware CTOs, framing the problem
  • Go-to-market — landing page, HN launch, positioning
  • Not warehouse/logistics — clean break from day job

MVP Build

What Hosung builds (months 1-2):

  • C++ daemon that hooks into llama.cpp / ONNX Runtime
  • Monitors: inference latency, power draw (via sysfs/NVML), memory usage, thermal state
  • Outputs structured JSON logs
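The "structured JSON logs" output could be as simple as one JSON line per inference call (JSONL): trivial for the daemon to append and for the dashboard to tail. The field names below are assumptions about a v1 schema, not a finalized format.

```cpp
#include <sstream>
#include <string>

// One record per inference call. Fields are a hypothetical v1 schema:
// latency, power, peak memory, and thermal state per the daemon's mandate.
struct InferenceRecord {
    double latency_ms;
    double power_w;
    long   peak_mem_bytes;
    double temp_c;
};

// Serialize a record as a single JSON line.
// No JSON library needed for a flat schema like this.
std::string to_json_line(const InferenceRecord& r) {
    std::ostringstream os;
    os << "{\"latency_ms\":" << r.latency_ms
       << ",\"power_w\":" << r.power_w
       << ",\"peak_mem_bytes\":" << r.peak_mem_bytes
       << ",\"temp_c\":" << r.temp_c << "}";
    return os.str();
}
```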

What Angie builds (months 2-3):

  • Dashboard that reads the logs
  • Before/after comparison when config changes
  • Recommendation engine UI ("try these 3 changes")

V1 doesn't need ML. Rule-based recommendations from known quantization/batching tradeoffs are enough to show value.
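A sketch of what rule-based V1 recommendations could look like: hard-coded thresholds standing in for measured quantization/batching tradeoffs. The struct fields, threshold values, and recommendation text are all illustrative assumptions.

```cpp
#include <string>
#include <vector>

// Aggregated stats for one deployed model configuration (hypothetical schema).
struct ConfigStats {
    double avg_power_w;
    double peak_mem_gb;
    double device_mem_gb;  // total memory the board actually has
    std::string quant;     // e.g. "fp16", "int8", "int4"
};

// V1: no ML, just rules encoding known tradeoffs.
// Thresholds here are placeholders, not tuned values.
std::vector<std::string> recommend(const ConfigStats& s) {
    std::vector<std::string> recs;
    if (s.quant == "fp16" && s.avg_power_w > 3.0)
        recs.push_back("High power draw at fp16: try INT8 quantization "
                       "(large power savings for small accuracy loss)");
    if (s.peak_mem_gb > 0.9 * s.device_mem_gb)
        recs.push_back("Peak memory near device limit: reduce batch size "
                       "or use a smaller quant level");
    return recs;
}
```

This is enough to drive the "try these 3 changes" UI; the paid auto-tuning layer would later replace the hard-coded thresholds with per-hardware learned profiles.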


Open Source Strategy

Same open core model:

OSS:

  • The daemon (gets stars from embedded AI engineers)
  • Local dashboard

Paid cloud ($500-1000/mo):

  • Fleet monitoring (multiple devices)
  • Historical trends
  • Auto-tuning (ML recommendations that learn from your hardware profile)
  • Remote access layer (the LabCast piece)

YC Story

"We built optimization tooling for AI running on edge hardware. Companies shipping robots and medical devices are running models at 40% efficiency because nobody has built the tooling to optimize on-device inference. Hosung spent 3 years optimizing GPU power systems at Nvidia — this is the same problem applied to edge AI. We reduce power consumption by X% and latency by Y% with zero model changes."

Specific. Credentialed. Timely. Hardware moat. Nobody else can say this.


Open Questions

  • How many companies are actively deploying AI on edge hardware today vs. 12 months from now? (timing risk)
  • Is the pain acute enough to pay for tooling, or do they just hire a systems engineer?
  • Does Hosung get excited about this or does it feel too close to his day job?
  • What's the fastest way to find 5 companies with this exact problem?

Written 2026-03-17