Note

Amazing summary from Claude

Okay, now I see the full picture:

  - Hosung is at Nvidia, but his team (display/power) is losing relevance as Nvidia goes all-in on AI/data center
  - He can't go remote because of hardware work
  - You're long distance and want to be together
  - Angie's company won't let her go

  The startup isn't just a side project — it's the exit plan. It solves three problems at once:

  1. Hosung pivots from a shrinking team into AI/robotics (where the industry is going)
  2. He gets to work remotely (or from the same city as you)
  3. You build something together

  That changes the question from "what's the best market" to "what can Hosung start building now, part-time, that gets good enough to leave Nvidia for?"

  Which means it needs to:

  - Let him work from anywhere
  - Be software (not tied to a lab)
  - Use his systems engineering skills but pivot toward AI
  - Show traction fast enough to justify leaving

  Want me to think about it from that angle instead of generating more idea lists?

  Okay. Let me think about this differently.

  Hosung's real asset: he's a systems engineer who understands low-level performance — C++, OS internals, power, the hardware-software interface. That's rare and valuable. The AI world is desperate for people who can make models run fast and cheap.

  The AI industry's biggest problem right now: AI agents and models are expensive to run. MCP tool definitions alone can consume 40-50% of a context window. Inference cost is the #1 barrier to deploying agents at scale. Every company building AI agents is bleeding money on compute.

  The idea:

  An AI inference optimization engine. Make AI models and agents run 2-5x faster and cheaper. Hosung's systems engineering — the same skills he uses to optimize GPU power and OS performance at Nvidia — applied to making AI inference efficient.

  Why this works for your situation:

  - 100% software, 100% remote — no lab, no hardware
  - Hosung's exact skills — performance optimization, C++, systems-level thinking. He's not learning a new domain; he's applying what he already knows to the hottest market
  - Massive market — every company running AI models wants to spend less. Inference cost is the #1 concern.
  - Fast traction — measurable results (2x faster, 50% cheaper). Easy to prove value.
  - Career pivot — moves Hosung from "display/power engineer" to "AI infrastructure" on his resume, even if the startup doesn't become a unicorn
  - Angie's role — product, dashboard, GTM, pricing. She's sold B2B SaaS before. This is a B2B dev tools sale.

  Think of it as: Hosung does for AI inference what he did for GPU power management at Nvidia — but as a product, not an internal tool.

  Does this direction resonate more than the idea lists?

  --

  So the problem was differentiation, not demand. The product worked but wasn't defensible — anyone can fork the same open source and build a proxy/cache tracker.

  That's probably why YC passed. Not "bad idea" but "no moat."

  The question is: what would give you a moat in AI cost optimization?

  Your current product: generic proxy + usage tracking + caching. Easy to replicate.

  What would NOT be easy to replicate:

  Hosung's systems engineering. If the optimization happens at a deeper level than "proxy that caches responses" — like actual inference-level optimization, model routing, token compression, or hardware-aware scheduling — that's hard to copy. That requires the kind of C++/systems expertise Hosung has.

  The difference:

  - What you built: track and cache AI API calls (software plumbing, replicable)
  - What Hosung could build: optimize HOW the model runs — which model to route to, how to compress context, how to batch requests, how to run inference on cheaper hardware. Systems-level optimization that requires deep engineering.

  That's closer to what Hosung did at Nvidia — not just monitoring power usage, but actually optimizing it at the OS/hardware level.
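  To make "which model to route to" concrete, here is a minimal sketch of a cost-aware router. Everything in it is a placeholder: the model names, the prices, and the complexity heuristic are made up for illustration. In a real product the heuristic is where a learned difficulty classifier would go, and that classifier is most of the hard engineering.

```python
# Illustrative sketch: route each request to the cheapest model tier that is
# likely to handle it. All model names and prices are placeholders.

from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # USD (placeholder numbers)
    max_complexity: int        # highest complexity score this tier handles

# Ordered cheapest-first; the router walks up until a tier qualifies.
TIERS = [
    ModelTier("small-local", 0.0, 3),
    ModelTier("mid-cloud", 0.0005, 7),
    ModelTier("frontier-cloud", 0.01, 10),
]

def complexity_score(prompt: str) -> int:
    """Crude heuristic stand-in for a learned difficulty classifier."""
    score = 1
    if len(prompt) > 500:
        score += 3
    if any(kw in prompt.lower() for kw in ("prove", "refactor", "multi-step")):
        score += 4
    return min(score, 10)

def route(prompt: str) -> ModelTier:
    score = complexity_score(prompt)
    for tier in TIERS:
        if score <= tier.max_complexity:
            return tier
    return TIERS[-1]
```

  The point of the sketch is the shape, not the heuristic: the defensible part is making the scoring accurate enough that cheap tiers handle most traffic without quality regressions.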

  Would that reframe excite Hosung more? Going from "tracking dashboard" to "AI inference engine that makes agents 5x cheaper through systems-level optimization"?

  Or has this whole direction lost its energy for both of you?

❯ yeah that's the direction but how can we do that?                             

Web Search("AI inference optimization startup techniques 2026 model routing semantic caching")
  ⎿  Did 1 search in 9s

Web Search("LLM cost reduction techniques smart routing cheaper models 2026")
  ⎿  Did 1 search in 8s

Web Search("AI gateway inference optimization open source 2026 portkey martian")
  ⎿  Did 2 searches in 11s

⏺ Here's the honest landscape:

  Your Competitors

  Portkey — Open-source AI gateway. 200+ models, <1ms latency, 10B tokens/day processed. Well-funded.

  Martian — $9M raised. Smart routing to the best LLM per request. Accenture invested.

  RouteLLM — Open-source routing framework from LMSYS. Claims up to 85% cost reduction in their benchmarks.

  Swfte, Burnwise, others — Multiple startups doing routing + caching.

  The problem: At the proxy/gateway level, it's crowded. That's why YC passed — you were building where everyone else is building.

  Where Hosung's Systems Skills Create a REAL Moat

  The existing players all work at the API layer — routing requests between cloud providers. That's relatively easy to build (and copy).

  What's HARD and nobody is doing well:

  1. On-device / edge inference optimization

  - Companies want to run models LOCALLY (privacy, latency, near-zero marginal cost)
  - But local models are slow and resource-hungry
  - Making a 7B model run 5x faster on a MacBook or edge device = systems engineering (memory management, quantization, kernel optimization)
  - This is what Hosung did at Nvidia — optimize software for hardware
  - Competitors: llama.cpp (open source, built around one core maintainer), Ollama (YC-backed, largely a wrapper around llama.cpp)
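  Quantization is the biggest single lever here, and the arithmetic is simple enough to sketch. This is weight memory only; a real footprint also includes KV cache and activations, and the 4.5 bits/weight figure is an approximation for a 4-bit format with scaling overhead:

```python
# Back-of-envelope memory footprint of a 7B-parameter model's weights at
# different precisions. Weights only; KV cache and activations add more.

def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory in GB = params * bits / 8 bits-per-byte / 1e9 bytes-per-GB."""
    return n_params * bits_per_weight / 8 / 1e9

n = 7e9
for label, bits in [("fp16", 16), ("int8", 8), ("~4-bit", 4.5)]:
    print(f"{label}: ~{weights_gb(n, bits):.1f} GB")
# fp16 needs ~14 GB, which swamps a 16 GB laptop once the OS and KV cache
# are counted; a 4-bit quant at ~4 GB is what makes local inference viable.
```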

  2. Agentic workflow optimization

  - Not just single LLM calls, but entire agent workflows (10-50 LLM calls per task)
  - Optimize the whole pipeline: which calls can be parallelized, which can use cheaper models, which results can be cached, which tool calls are redundant
  - This requires understanding the SYSTEM, not just individual requests
  - Nobody does this well because agent architectures are new
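  Two of those optimizations (running independent calls in parallel, deduplicating redundant ones) fit in a short sketch. `fake_llm` is a stand-in for a real model API and the latency is simulated; the interesting trick is memoizing the in-flight Task rather than the result, so concurrent duplicates share one call:

```python
# Sketch: parallelize independent agent steps and deduplicate identical
# calls. Memoizing the asyncio.Task (not the finished result) means two
# concurrent requests for the same prompt share a single model call.

import asyncio

calls = 0
_inflight: dict[str, asyncio.Task] = {}

async def fake_llm(prompt: str) -> str:
    global calls
    calls += 1
    await asyncio.sleep(0.05)  # stand-in for network + inference latency
    return f"answer({prompt})"

async def cached_call(prompt: str) -> str:
    if prompt not in _inflight:
        _inflight[prompt] = asyncio.create_task(fake_llm(prompt))
    return await _inflight[prompt]

async def run_workflow() -> list[str]:
    # These three steps don't depend on each other, so issue them
    # concurrently instead of sequentially; the duplicate "summarize doc A"
    # rides along on the first call's Task.
    return list(await asyncio.gather(
        cached_call("summarize doc A"),
        cached_call("summarize doc B"),
        cached_call("summarize doc A"),
    ))

results = asyncio.run(run_workflow())
print(results, calls)  # three results, only two actual model calls
```

  A real optimizer would do this over a dependency graph it infers from the agent's trace, which is exactly the systems-level work the proxy-layer players skip.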

  3. Hybrid local + cloud routing

  - Simple tasks → run locally (free)
  - Complex tasks → route to cloud (paid)
  - Smart enough to know which is which, in real time
  - Requires systems-level integration (local inference engine + cloud routing + quality estimation)
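  The core loop of #3 can be sketched in a few lines. `local_model` and `cloud_model` are stand-ins and the confidence heuristic is made up; in a real system the quality estimate might come from the local model's token log-probabilities or a small verifier model:

```python
# Sketch of local-first routing with escalation: answer locally for free,
# and pay for the cloud model only when local confidence is too low.

def local_model(prompt: str) -> tuple[str, float]:
    """Returns (answer, confidence in [0, 1]); toy heuristic here."""
    confidence = 0.9 if len(prompt) < 80 else 0.3
    return f"local:{prompt}", confidence

def cloud_model(prompt: str) -> str:
    return f"cloud:{prompt}"

def answer(prompt: str, min_confidence: float = 0.7) -> str:
    result, confidence = local_model(prompt)
    if confidence >= min_confidence:
        return result          # handled for free on-device
    return cloud_model(prompt)  # escalate only the hard cases
```

  The business case lives in the threshold: every percentage point of traffic you can keep local is cloud spend the customer stops paying.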

  My honest recommendation: #2 or #3.

  #2 (Agent workflow optimization) — because it's where the market is going (agentic AI), it's unsolved, and it requires systems thinking, not just API routing. Your existing proxy codebase is a starting point, but you go deeper — from "track costs" to "make agents 5x cheaper by optimizing the entire workflow."

  #3 (Hybrid local + cloud) — because it combines Hosung's systems skills (local inference optimization) with your existing proxy (cloud routing). Nobody has nailed this yet.

  Both are software-only, fully remote, and build on what you've already started.

  Would either of these get Hosung excited?

  Sources:

  - https://www.swfte.com/blog/intelligent-llm-routing-multi-model-ai
  - https://analyticsweek.com/inference-economics-finops-ai-roi-2026/
  - https://portkey.ai/features/ai-gateway
  - https://withmartian.com/
  - https://lmsys.org/blog/2024-07-01-routellm/
  - https://blog.vllm.ai/2026/01/05/vllm-sr-iris.html