Note
Amazing summary from Claude
Okay, now I see the full picture:
- Hosung is at Nvidia but his team (display/power) is losing relevance as
Nvidia goes all-in on AI/data center
- He can't go remote because his work is tied to hardware
- You're long distance and want to be together
- Angie's company won't let her go
The startup isn't just a side project — it's the exit plan. It solves three
problems at once:
1. Hosung pivots from a shrinking team into AI/robotics (where the industry is
going)
2. He gets to work remotely (or from the same city as you)
3. You build something together
That changes the question from "what's the best market" to "what can Hosung
start building now, part-time, that gets good enough to leave Nvidia for?"
Which means it needs to:
- Let him work from anywhere
- Be software (not tied to a lab)
- Use his systems engineering skills but pivot toward AI
- Show traction fast enough to justify leaving
Want me to think about it from that angle instead of generating more idea
lists?

Okay. Let me think about this differently.
Hosung's real asset: He's a systems engineer who understands low-level
performance — C++, OS internals, power, hardware-software interface. That's
rare and valuable. The AI world is desperate for people who can make models
run fast and cheap.
The AI industry's biggest problem right now: AI agents and models are
expensive to run. MCP tool definitions alone can eat 40-50% of an agent's
context window. Inference cost is the #1 barrier to deploying agents at
scale. Every company building AI agents is bleeding money on compute.
The idea:
An AI inference optimization engine. Make AI models and agents run 2-5x faster
and cheaper. Hosung's systems engineering — the same skills he uses to
optimize GPU power and OS performance at Nvidia — applied to making AI
inference efficient.
Why this works for your situation:
- 100% software, 100% remote — no lab, no hardware
- Hosung's exact skills — performance optimization, C++, systems-level
thinking. He's not learning a new domain, he's applying what he already knows
to the hottest market
- Massive market — every company running AI models wants to spend less.
Inference cost is the #1 concern.
- Fast traction — measurable result (2x faster, 50% cheaper). Easy to prove
value.
- Career pivot — moves Hosung from "display/power engineer" to "AI
infrastructure" on his resume, even if the startup doesn't become a unicorn
- Angie's role — product, dashboard, GTM, pricing. She's sold B2B SaaS before.
This is a B2B dev tools sale.
Think of it as: Hosung does for AI inference what he did for GPU power
management at Nvidia — but as a product, not an internal tool.
Does this direction resonate more than the idea lists?

--
So the problem was differentiation, not demand. The product worked but wasn't
defensible — anyone can fork the same open source and build a proxy/cache
tracker.
That's probably why YC passed. Not "bad idea" but "no moat."
The question is: what would give you a moat in AI cost optimization?
Your current product: generic proxy + usage tracking + caching. Easy to
replicate.
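To make "easy to replicate" concrete: the core of an exact-match caching proxy for LLM API calls fits in a few lines. This is a toy sketch; the upstream is a stub, and a real proxy would forward to a provider and track token counts.

```python
# Minimal caching proxy sketch: dedupe identical requests, count hits.
import hashlib
import json

class CachingProxy:
    def __init__(self, upstream):
        self.upstream = upstream  # callable: request dict -> response
        self.cache = {}
        self.stats = {"hits": 0, "misses": 0}

    def call(self, request):
        # Canonical JSON so logically identical requests share a key.
        key = hashlib.sha256(
            json.dumps(request, sort_keys=True).encode()
        ).hexdigest()
        if key in self.cache:
            self.stats["hits"] += 1
            return self.cache[key]
        self.stats["misses"] += 1
        resp = self.upstream(request)
        self.cache[key] = resp
        return resp

proxy = CachingProxy(lambda req: {"text": "reply to " + req["prompt"]})
proxy.call({"prompt": "hi"})
proxy.call({"prompt": "hi"})  # exact repeat -> served from cache
print(proxy.stats)            # {'hits': 1, 'misses': 1}
```

That's the "software plumbing" level: useful, but nothing a competitor couldn't rebuild in a weekend.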
What would NOT be easy to replicate:
Hosung's systems engineering. If the optimization happens at a deeper level
than "proxy that caches responses" — like actual inference-level optimization,
model routing, token compression, or hardware-aware scheduling — that's hard
to copy. That requires the kind of C++/systems expertise Hosung has.
The difference:
- What you built: Track and cache AI API calls (software plumbing, replicable)
- What Hosung could build: Optimize HOW the model runs — which model to route
to, how to compress context, how to batch requests, how to run inference on
cheaper hardware. Systems-level optimization that requires deep engineering.
That's closer to what Hosung did at Nvidia — not just monitoring power usage,
but actually optimizing it at the OS/hardware level.
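To make "which model to route to" concrete, here is a minimal sketch of cost-aware routing: pick the cheapest model whose estimated capability covers the request. The model names, prices, and keyword heuristic are all illustrative assumptions; a production router (RouteLLM, Martian) would use a trained classifier instead of keywords.

```python
# Cost-aware model routing sketch. Prices and tiers are made up.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    usd_per_mtok: float  # illustrative price per million tokens
    capability: int      # rough tier: 1 = small, 3 = frontier

MODELS = [
    Model("small-local", 0.0, 1),
    Model("mid-cloud", 0.5, 2),
    Model("frontier-cloud", 5.0, 3),
]

def required_tier(prompt):
    """Crude complexity heuristic: long prompts and hard-task keywords
    demand a stronger model."""
    score = 1
    if len(prompt) > 2000:
        score += 1
    if any(k in prompt.lower() for k in ("prove", "refactor", "debug")):
        score += 1
    return min(score, 3)

def route(prompt):
    tier = required_tier(prompt)
    # Cheapest model that meets the required tier.
    return min(
        (m for m in MODELS if m.capability >= tier),
        key=lambda m: m.usd_per_mtok,
    )

print(route("What is 2 + 2?").name)                 # small-local
print(route("Refactor this C++ module, please").name)  # mid-cloud
```

The moat question is exactly about how good `required_tier` is: a keyword check is replicable; a quality estimator that rarely mis-routes is not.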
Would that reframe excite Hosung more? Going from "tracking dashboard" to "AI
inference engine that makes agents 5x cheaper through systems-level
optimization"?
Or has this whole direction lost its energy for both of you?
❯ yeah that's the direction but how can we do that?
⏺ Web Search("AI inference optimization startup techniques 2026 model routing
semantic caching")
⎿ Did 1 search in 9s
⏺ Web Search("LLM cost reduction techniques smart routing cheaper models 2026")
⎿ Did 1 search in 8s
⏺ Web Search("AI gateway inference optimization open source 2026 portkey
martian")
⎿ Did 2 searches in 11s
⏺ Here's the honest landscape:
Your Competitors
Portkey — Open-source AI gateway. 200+ models, <1ms latency, 10B tokens/day
processed. Well-funded.
Martian — $9M raised. Smart routing to the best LLM per request. Accenture
invested.
RouteLLM — Open-source routing framework from LMSYS. Claims up to 85% cost reduction.
Swfte, Burnwise, others — Multiple startups doing routing + caching.
The problem: At the proxy/gateway level, it's crowded. That's why YC passed —
you were building where everyone else is building.
Where Hosung's Systems Skills Create a REAL Moat
The existing players all work at the API layer — routing requests between
cloud providers. That's relatively easy to build (and copy).
What's HARD and nobody is doing well:
1. On-device / edge inference optimization
- Companies want to run models LOCALLY (privacy, latency, marginal cost ≈ $0)
- But local models are slow and resource-hungry
- Making a 7B model run 5x faster on a MacBook or edge device = systems
engineering (memory management, quantization, kernel optimization)
- This is what Hosung did at Nvidia — optimize software for hardware
- Competitors: llama.cpp (open source, started as a one-man project), Ollama (YC-backed, but largely a wrapper)
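The quantization half of that bullet can be sketched in a few lines. This is a toy symmetric int8 scheme for illustration only; real engines like llama.cpp quantize per-block with optimized kernels, which is where the systems engineering lives.

```python
# Toy symmetric int8 quantization: float32 weights -> int8 + one scale.
# 4 bytes per weight become 1 byte, i.e. ~4x less memory per layer.

def quantize_int8(weights):
    """Map float weights into [-127, 127] with a single float scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.81, -0.34, 0.05, -1.27, 0.66]
q, s = quantize_int8(w)
restored = dequantize(q, s)

max_err = max(abs(a - b) for a, b in zip(w, restored))
print("quantized:", q)
print("max reconstruction error:", max_err)
```

The trade that makes local 7B models viable is exactly this: a bounded reconstruction error (at most half a quantization step per weight) in exchange for 4x less memory traffic.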
2. Agentic workflow optimization
- Not just single LLM calls, but entire agent workflows (10-50 LLM calls per
task)
- Optimize the whole pipeline: which calls can be parallelized, which can use
cheaper models, which results can be cached, which tool calls are redundant
- This requires understanding the SYSTEM, not just individual requests
- Nobody does this well because agent architectures are new
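One concrete instance of that pipeline-level thinking, as a sketch: run the independent calls in a workflow concurrently and serve exact repeats from a cache. `llm_call` here is a stub standing in for a real provider API, with `asyncio.sleep` faking network latency.

```python
# Agent-workflow optimization sketch: parallelize independent LLM
# calls, cache redundant ones. llm_call is a stub, not a real API.
import asyncio
import hashlib

_cache = {}

async def llm_call(prompt):
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:             # redundant call -> free
        return _cache[key]
    await asyncio.sleep(0.05)     # stand-in for network latency
    result = f"answer({prompt})"
    _cache[key] = result
    return result

async def run_agent_task():
    # Step 1: two independent sub-questions run concurrently instead of
    # back-to-back, halving this step's latency.
    subs = await asyncio.gather(
        llm_call("summarize doc A"),
        llm_call("summarize doc B"),
    )
    # A later repeat of an earlier call is served from the cache.
    again = await llm_call("summarize doc A")
    # Step 2 depends on step 1's outputs, so it stays sequential.
    final = await llm_call("combine: " + " | ".join(subs))
    return subs + [again, final]

results = asyncio.run(run_agent_task())
print(results)
```

The hard part (and the moat) is doing this automatically for an arbitrary agent: inferring the call graph, proving which calls are independent, and deciding which steps tolerate a cheaper model.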
3. Hybrid local + cloud routing
- Simple tasks → run locally (free)
- Complex tasks → route to cloud (paid)
- Smart enough to know which is which, in real-time
- Requires systems-level integration (local inference engine + cloud routing +
quality estimation)
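A minimal sketch of that hybrid loop, assuming a local model that can report a confidence score: try the free local model first and escalate to the cloud only when the confidence check fails. Both model functions are stubs; a real version would wrap a local inference engine and a cloud API, and the confidence heuristic is a placeholder for real quality estimation.

```python
# Hybrid local/cloud routing sketch with escalation. All stubs.

def local_model(prompt):
    """Pretend local 7B model: returns (answer, self-reported confidence).
    The heuristic here is a placeholder, not a real quality estimator."""
    looks_simple = len(prompt) < 80 and "?" in prompt
    return ("local answer", 0.9 if looks_simple else 0.3)

def cloud_model(prompt):
    return "cloud answer"  # costs money; assumed always good enough

def answer(prompt, min_conf=0.7):
    text, conf = local_model(prompt)
    if conf >= min_conf:
        return text, "local"   # $0, low latency, data stays on device
    return cloud_model(prompt), "cloud"

print(answer("What time zone is Seoul in?"))  # ('local answer', 'local')
print(answer("Write a formal verification of this scheduler " + "x" * 100))
```

The defensible piece is the escalation decision: knowing in real time, without calling the cloud, whether the local answer is good enough. That's a systems + ML problem, not API plumbing.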
My honest recommendation: #2 or #3.
#2 (Agent workflow optimization) — because it's where the market is going
(agentic AI), it's unsolved, and it requires systems thinking not just API
routing. Your existing proxy codebase is a starting point, but you go deeper —
from "track costs" to "make agents 5x cheaper by optimizing the entire
workflow."
#3 (Hybrid local + cloud) — because it combines Hosung's systems skills (local
inference optimization) with your existing proxy (cloud routing). Nobody has
nailed this yet.
Both are software-only, fully remote, and build on what you've already
started.
Would either of these get Hosung excited?
Sources:
- https://www.swfte.com/blog/intelligent-llm-routing-multi-model-ai
- https://analyticsweek.com/inference-economics-finops-ai-roi-2026/
- https://portkey.ai/features/ai-gateway