
Edge AI Inference Optimizer

The intersection: AI cost/performance optimization × Hosung's hardware/GPU/power background. Not a cloud API proxy. Not LabCast. The combination of both directions.


The Problem

Companies building hardware products with AI (robotics, medical devices, industrial IoT, autonomous systems) need to run models on constrained hardware. They're stuck:

  • Model too slow → latency issues in production
  • Model too big → runs out of memory on device
  • Model too power-hungry → drains battery, overheats
  • No good tooling to diagnose or fix any of it

They can't "just use a cheaper API" — it's their own hardware, their own inference stack. They need someone who understands both AI models AND power/thermal systems.

That's Hosung: three years optimizing GPU power systems at Nvidia, applied to a new domain.


The Product

Software daemon that runs alongside their inference stack (TensorRT, llama.cpp, ONNX Runtime, vLLM):

AI model running on device (Jetson, Pi, custom board)
        ↓
[Daemon — C++]  monitors every inference call
        ↓
Tracks: latency / power draw / memory / thermal state
Recommends: quantization level, batch size, model swap
        ↓
Dashboard
"Your model is using 4W average. Switch to INT4 quant → 1.8W, 3% accuracy loss."
"Peak memory 2.1GB. Reduce batch size to 4 → fits in 1.2GB, 8% slower."

No custom hardware. Runs as software on their existing devices.
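A minimal sketch of the monitoring side of such a daemon, in C++: polling power and thermal telemetry that Linux boards typically expose through sysfs. The specific node paths below (hwmon, thermal_zone) are illustrative assumptions — actual names vary by board, driver, and kernel version.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Read a single integer value from a sysfs-style node.
// Returns nullopt if the node is missing or unreadable.
std::optional<long> read_sysfs_long(const std::string& path) {
    std::ifstream f(path);
    long v;
    if (f >> v) return v;
    return std::nullopt;
}

struct DeviceSample {
    double power_w;  // instantaneous power draw, watts
    double temp_c;   // SoC temperature, degrees Celsius
};

// Hypothetical sampling routine for a Jetson/Pi-class board.
// Node paths are placeholders; a real daemon would discover them at startup.
std::optional<DeviceSample> sample_device() {
    // hwmon power nodes are commonly reported in microwatts,
    // thermal zones in millidegrees C.
    auto uw = read_sysfs_long("/sys/class/hwmon/hwmon0/power1_input");
    auto mc = read_sysfs_long("/sys/class/thermal/thermal_zone0/temp");
    if (!uw || !mc) return std::nullopt;
    return DeviceSample{*uw / 1e6, *mc / 1e3};
}
```

On NVIDIA hardware the same numbers could come from NVML instead of sysfs; the point is that the daemon needs no custom drivers, only read access to telemetry the platform already exposes.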


Why This Beats the Proxy Direction

Cloud API optimization vs. edge inference optimization:

  • Portkey, Martian, RouteLLM already exist → nobody owns the edge space yet
  • Hosung's skills aren't the moat → Hosung's GPU/power background IS the moat
  • Commoditizing fast → growing with the edge AI explosion
  • Buyer can switch cloud providers → buyer is locked to their own hardware
  • A proxy is a feature, not a company → deep systems tooling is a durable company

Who Buys This

  • Robotics companies — need AI on robot, constrained compute, battery life matters
  • Medical device companies — adding AI inference, strict power/thermal requirements
  • Industrial IoT — deploying vision models on edge nodes
  • Autonomous systems — any company shipping hardware with a model inside
  • AI hardware startups — building inference chips, need optimization tooling

The buyer is an engineering manager or CTO at a hardware company that shipped (or is shipping) a product with AI inside. They have budget. They have a specific, measurable problem.


The LabCast Connection

LabCast becomes the remote monitoring layer — not a separate product, just one feature.

Companies deploying edge AI need to monitor devices remotely. The Raspberry Pi debug box is how you get visibility into what's running on their hardware in the field. Same hardware approach from LabCast, different product frame:

  • LabCast framing: "remote hardware debugging tool"
  • This framing: "remote AI performance monitoring for edge devices"

Same box. Broader use case. Easier to sell.


Hosung's Unfair Advantage

His Nvidia experience, and how each piece applies here:

  • GPU power systems → knows exactly how power draw maps to model configuration
  • Display/thermal systems → understands thermal throttling, sustained vs. burst performance
  • Systems programming (C++) → builds the low-overhead daemon; can't be replicated by a web dev
  • Hardware-software interface → speaks the language of embedded engineers and hardware CTOs

Angie's Role

  • Dashboard product design — the "before/after" power/latency visualization is the sales asset
  • Customer discovery — talking to hardware CTOs, framing the problem
  • Go-to-market — landing page, HN launch, positioning
  • Not warehouse/logistics — clean break from day job

MVP Build

What Hosung builds (months 1-2):

  • C++ daemon that hooks into llama.cpp / ONNX Runtime
  • Monitors: inference latency, power draw (via sysfs/NVML), memory usage, thermal state
  • Outputs structured JSON logs
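The "structured JSON logs" output could be as simple as one JSON line per inference call (JSONL): trivial for the daemon to append and for the dashboard to tail. The field names below are assumptions about a v1 schema, not a finalized format.

```cpp
#include <sstream>
#include <string>

// One record per inference call. Fields are a hypothetical v1 schema:
// latency, power, peak memory, and thermal state per the daemon's mandate.
struct InferenceRecord {
    double latency_ms;
    double power_w;
    long   peak_mem_bytes;
    double temp_c;
};

// Serialize a record as a single JSON line.
// No JSON library needed for a flat schema like this.
std::string to_json_line(const InferenceRecord& r) {
    std::ostringstream os;
    os << "{\"latency_ms\":" << r.latency_ms
       << ",\"power_w\":" << r.power_w
       << ",\"peak_mem_bytes\":" << r.peak_mem_bytes
       << ",\"temp_c\":" << r.temp_c << "}";
    return os.str();
}
```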

What Angie builds (months 2-3):

  • Dashboard that reads the logs
  • Before/after comparison when config changes
  • Recommendation engine UI ("try these 3 changes")

V1 doesn't need ML. Rule-based recommendations from known quantization/batching tradeoffs are enough to show value.
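A sketch of what rule-based V1 recommendations could look like: hard-coded thresholds standing in for measured quantization/batching tradeoffs. The struct fields, threshold values, and recommendation text are all illustrative assumptions.

```cpp
#include <string>
#include <vector>

// Aggregated stats for one deployed model configuration (hypothetical schema).
struct ConfigStats {
    double avg_power_w;
    double peak_mem_gb;
    double device_mem_gb;  // total memory the board actually has
    std::string quant;     // e.g. "fp16", "int8", "int4"
};

// V1: no ML, just rules encoding known tradeoffs.
// Thresholds here are placeholders, not tuned values.
std::vector<std::string> recommend(const ConfigStats& s) {
    std::vector<std::string> recs;
    if (s.quant == "fp16" && s.avg_power_w > 3.0)
        recs.push_back("High power draw at fp16: try INT8 quantization "
                       "(large power savings for small accuracy loss)");
    if (s.peak_mem_gb > 0.9 * s.device_mem_gb)
        recs.push_back("Peak memory near device limit: reduce batch size "
                       "or use a smaller quant level");
    return recs;
}
```

This is enough to drive the "try these 3 changes" UI; the paid auto-tuning layer would later replace the hard-coded thresholds with per-hardware learned profiles.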


Open Source Strategy

Same open core model:

OSS:

  • The daemon (gets stars from embedded AI engineers)
  • Local dashboard

Paid cloud ($500-1000/mo):

  • Fleet monitoring (multiple devices)
  • Historical trends
  • Auto-tuning (ML recommendations that learn from your hardware profile)
  • Remote access layer (the LabCast piece)

YC Story

"We built optimization tooling for AI running on edge hardware. Companies shipping robots and medical devices are running models at 40% efficiency because nobody has built the tooling to optimize on-device inference. Hosung spent 3 years optimizing GPU power systems at Nvidia — this is the same problem applied to edge AI. We reduce power consumption by X% and latency by Y% with zero model changes."

Specific. Credentialed. Timely. Hardware moat. Nobody else can say this.


Open Questions

  • How many companies are actively deploying AI on edge hardware today vs. 12 months from now? (timing risk)
  • Is the pain acute enough to pay for tooling, or do they just hire a systems engineer?
  • Does Hosung get excited about this or does it feel too close to his day job?
  • What's the fastest way to find 5 companies with this exact problem?

Written 2026-03-17