Edge AI Inference Optimizer
The intersection: AI cost/performance optimization × Hosung's hardware/GPU/power background. Not a cloud API proxy. Not LabCast. The combination of both directions.
The Problem
Companies building hardware products with AI (robotics, medical devices, industrial IoT, autonomous systems) need to run models on constrained hardware. They're stuck:
- Model too slow → latency issues in production
- Model too big → runs out of memory on device
- Model too power-hungry → drains battery, overheats
- No good tooling to diagnose or fix any of it
They can't "just use a cheaper API" — it's their own hardware, their own inference stack. They need someone who understands both AI models AND power/thermal systems.
That's Hosung. 3 years optimizing GPU power systems at Nvidia, applied to a new domain.
The Product
Software daemon that runs alongside their inference stack (TensorRT, llama.cpp, ONNX Runtime, vLLM):
AI model running on device (Jetson, Pi, custom board)
↓
[Daemon — C++] monitors every inference call
↓
Tracks: latency / power draw / memory / thermal state
Recommends: quantization level, batch size, model swap
↓
Dashboard
"Your model is using 4W average. Switch to INT4 quant → 1.8W, 3% accuracy loss."
"Peak memory 2.1GB. Reduce batch size to 4 → fits in 1.2GB, 8% slower."
No custom hardware. Runs as software on their existing devices.
Why This Beats the Proxy Direction
| Cloud API optimization | Edge inference optimization |
|---|---|
| Portkey, Martian, RouteLLM already exist | Nobody owns this space yet |
| Hosung's skills aren't the moat | Hosung's GPU/power background IS the moat |
| Commoditizing fast | Growing with edge AI explosion |
| Buyer can switch cloud providers | Buyer is locked to their own hardware |
| Proxy is a feature, not a company | Deep systems tooling = durable company |
Who Buys This
- Robotics companies — need AI on robot, constrained compute, battery life matters
- Medical device companies — adding AI inference, strict power/thermal requirements
- Industrial IoT — deploying vision models on edge nodes
- Autonomous systems — any company shipping hardware with a model inside
- AI hardware startups — building inference chips, need optimization tooling
The buyer is an engineering manager or CTO at a hardware company that shipped (or is shipping) a product with AI inside. They have budget. They have a specific, measurable problem.
The LabCast Connection
LabCast becomes the remote monitoring layer — not a separate product, just one feature.
Companies deploying edge AI need to monitor devices remotely. The Raspberry Pi debug box is how you get visibility into what's running on their hardware in the field. Same hardware approach from LabCast, different product frame:
- LabCast framing: "remote hardware debugging tool"
- This framing: "remote AI performance monitoring for edge devices"
Same box. Broader use case. Easier to sell.
Hosung's Unfair Advantage
| His Nvidia experience | How it applies here |
|---|---|
| GPU power systems | Knows exactly how power draw maps to model configuration |
| Display/thermal systems | Understands thermal throttling, sustained vs burst performance |
| Systems programming (C++) | Builds the low-overhead daemon — can't be replicated by a web dev |
| Hardware-software interface | Speaks the language of embedded engineers / hardware CTOs |
Angie's Role
- Dashboard product design — the "before/after" power/latency visualization is the sales asset
- Customer discovery — talking to hardware CTOs, framing the problem
- Go-to-market — landing page, HN launch, positioning
- Not warehouse/logistics — clean break from day job
MVP Build
What Hosung builds (months 1-2):
- C++ daemon that hooks into llama.cpp / ONNX Runtime
- Monitors: inference latency, power draw (via sysfs/NVML), memory usage, thermal state
- Outputs structured JSON logs
What Angie builds (months 2-3):
- Dashboard that reads the logs
- Before/after comparison when config changes
- Recommendation engine UI ("try these 3 changes")
V1 doesn't need ML. Rule-based recommendations from known quantization/batching tradeoffs are enough to show value.
Open Source Strategy
Same open core model:
OSS:
- The daemon (gets stars from embedded AI engineers)
- Local dashboard
Paid cloud ($500-1000/mo):
- Fleet monitoring (multiple devices)
- Historical trends
- Auto-tuning (ML recommendations that learn from your hardware profile)
- Remote access layer (the LabCast piece)
YC Story
"We built optimization tooling for AI running on edge hardware. Companies shipping robots and medical devices are running models at 40% efficiency because nobody has built the tooling to optimize on-device inference. Hosung spent 3 years optimizing GPU power systems at Nvidia — this is the same problem applied to edge AI. We reduce power consumption by X% and latency by Y% with zero model changes."
Specific. Credentialed. Timely. Hardware moat. Nobody else can say this.
Open Questions
- How many companies are actively deploying AI on edge hardware today vs. 12 months from now? (timing risk)
- Is the pain acute enough to pay for tooling, or do they just hire a systems engineer?
- Does Hosung get excited about this or does it feel too close to his day job?
- What's the fastest way to find 5 companies with this exact problem?
Written 2026-03-17