Note

Amazing summary from Claude

Okay, now I see the full picture:

  - Hosung is at Nvidia, but his team (display/power) is losing relevance as Nvidia goes all-in on AI/data center
  - He can't go remote because of hardware work
  - You're long distance and want to be together
  - Angie's company won't let her go

  The startup isn't just a side project — it's the exit plan. It solves three problems at once:

  1. Hosung pivots from a shrinking team into AI/robotics (where the industry is going)
  2. He gets to work remotely (or from the same city as you)
  3. You build something together

  That changes the question from "what's the best market" to "what can Hosung start building now, part-time, that gets good enough to leave Nvidia for?"

  Which means it needs to:

  - Let him work from anywhere
  - Be software (not tied to a lab)
  - Use his systems engineering skills but pivot toward AI
  - Show traction fast enough to justify leaving

  Want me to think about it from that angle instead of generating more idea lists?

  Okay. Let me think about this differently.

  Hosung's real asset: he's a systems engineer who understands low-level performance — C++, OS internals, power, the hardware-software interface. That's rare and valuable. The AI world is desperate for people who can make models run fast and cheap.

  The AI industry's biggest problem right now: AI agents and models are expensive to run. MCP tool definitions alone can consume 40-50% of a context window. Inference cost is the #1 barrier to deploying agents at scale. Every company building AI agents is bleeding money on compute.

  The idea:

  An AI inference optimization engine. Make AI models and agents run 2-5x faster and cheaper. Hosung's systems engineering — the same skills he uses to optimize GPU power and OS performance at Nvidia — applied to making AI inference efficient.

  Why this works for your situation:

  - 100% software, 100% remote — no lab, no hardware
  - Hosung's exact skills — performance optimization, C++, systems-level thinking. He's not learning a new domain; he's applying what he already knows to the hottest market
  - Massive market — every company running AI models wants to spend less. Inference cost is the #1 concern.
  - Fast traction — measurable results (2x faster, 50% cheaper). Easy to prove value.
  - Career pivot — moves Hosung from "display/power engineer" to "AI infrastructure" on his resume, even if the startup doesn't become a unicorn
  - Angie's role — product, dashboard, GTM, pricing. She's sold B2B SaaS before. This is a B2B dev tools sale.

  Think of it as: Hosung does for AI inference what he did for GPU power management at Nvidia — but as a product, not an internal tool.

  Does this direction resonate more than the idea lists?

  --

  So the problem was differentiation, not demand. The product worked but wasn't defensible — anyone can fork the same open source and build a proxy/cache tracker.

  That's probably why YC passed. Not "bad idea" but "no moat."

  The question is: what would give you a moat in AI cost optimization?

  Your current product: generic proxy + usage tracking + caching. Easy to replicate.

  What would NOT be easy to replicate:

  Hosung's systems engineering. If the optimization happens at a deeper level than "proxy that caches responses" — like actual inference-level optimization, model routing, token compression, or hardware-aware scheduling — that's hard to copy. That requires the kind of C++/systems expertise Hosung has.

  The difference:

  - What you built: track and cache AI API calls (software plumbing, replicable)
  - What Hosung could build: optimize HOW the model runs — which model to route to, how to compress context, how to batch requests, how to run inference on cheaper hardware. Systems-level optimization that requires deep engineering.

  That's closer to what Hosung did at Nvidia — not just monitoring power usage, but actually optimizing it at the OS/hardware level.
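  To make "which model to route to" concrete, here is a minimal sketch of a cost-aware router. Everything in it is a placeholder: the model names, the prices, and the complexity heuristic are made up for illustration. In a real product the heuristic is where a learned difficulty classifier would go, and that classifier is most of the hard engineering.

```python
# Illustrative sketch: route each request to the cheapest model tier that is
# likely to handle it. All model names and prices are placeholders.

from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # USD (placeholder numbers)
    max_complexity: int        # highest complexity score this tier handles

# Ordered cheapest-first; the router walks up until a tier qualifies.
TIERS = [
    ModelTier("small-local", 0.0, 3),
    ModelTier("mid-cloud", 0.0005, 7),
    ModelTier("frontier-cloud", 0.01, 10),
]

def complexity_score(prompt: str) -> int:
    """Crude heuristic stand-in for a learned difficulty classifier."""
    score = 1
    if len(prompt) > 500:
        score += 3
    if any(kw in prompt.lower() for kw in ("prove", "refactor", "multi-step")):
        score += 4
    return min(score, 10)

def route(prompt: str) -> ModelTier:
    score = complexity_score(prompt)
    for tier in TIERS:
        if score <= tier.max_complexity:
            return tier
    return TIERS[-1]
```

  The point of the sketch is the shape, not the heuristic: the defensible part is making the scoring accurate enough that cheap tiers handle most traffic without quality regressions.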

  Would that reframe excite Hosung more? Going from "tracking dashboard" to "AI inference engine that makes agents 5x cheaper through systems-level optimization"?

  Or has this whole direction lost its energy for both of you?

❯ yeah that's the direction but how can we do that?                             

Web Search("AI inference optimization startup techniques 2026 model routing semantic caching")
  ⎿  Did 1 search in 9s

Web Search("LLM cost reduction techniques smart routing cheaper models 2026")
  ⎿  Did 1 search in 8s

Web Search("AI gateway inference optimization open source 2026 portkey martian")
  ⎿  Did 2 searches in 11s

⏺ Here's the honest landscape:

  Your Competitors

  Portkey — Open-source AI gateway. 200+ models, <1ms latency, 10B tokens/day processed. Well-funded.

  Martian — $9M raised. Smart routing to the best LLM per request. Accenture invested.

  RouteLLM — Open-source routing framework from LMSYS. Claims up to 85% cost reduction in their benchmarks.

  Swfte, Burnwise, others — Multiple startups doing routing + caching.

  The problem: At the proxy/gateway level, it's crowded. That's why YC passed — you were building where everyone else is building.

  Where Hosung's Systems Skills Create a REAL Moat

  The existing players all work at the API layer — routing requests between cloud providers. That's relatively easy to build (and copy).

  What's HARD and nobody is doing well:

  1. On-device / edge inference optimization

  - Companies want to run models LOCALLY (privacy, latency, near-zero marginal cost)
  - But local models are slow and resource-hungry
  - Making a 7B model run 5x faster on a MacBook or edge device = systems engineering (memory management, quantization, kernel optimization)
  - This is what Hosung did at Nvidia — optimize software for hardware
  - Competitors: llama.cpp (open source, built around one core maintainer), Ollama (YC-backed, largely a wrapper around llama.cpp)
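  Quantization is the biggest single lever here, and the arithmetic is simple enough to sketch. This is weight memory only; a real footprint also includes KV cache and activations, and the 4.5 bits/weight figure is an approximation for a 4-bit format with scaling overhead:

```python
# Back-of-envelope memory footprint of a 7B-parameter model's weights at
# different precisions. Weights only; KV cache and activations add more.

def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory in GB = params * bits / 8 bits-per-byte / 1e9 bytes-per-GB."""
    return n_params * bits_per_weight / 8 / 1e9

n = 7e9
for label, bits in [("fp16", 16), ("int8", 8), ("~4-bit", 4.5)]:
    print(f"{label}: ~{weights_gb(n, bits):.1f} GB")
# fp16 needs ~14 GB, which swamps a 16 GB laptop once the OS and KV cache
# are counted; a 4-bit quant at ~4 GB is what makes local inference viable.
```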

  2. Agentic workflow optimization

  - Not just single LLM calls, but entire agent workflows (10-50 LLM calls per task)
  - Optimize the whole pipeline: which calls can be parallelized, which can use cheaper models, which results can be cached, which tool calls are redundant
  - This requires understanding the SYSTEM, not just individual requests
  - Nobody does this well because agent architectures are new
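  Two of those optimizations (running independent calls in parallel, deduplicating redundant ones) fit in a short sketch. `fake_llm` is a stand-in for a real model API and the latency is simulated; the interesting trick is memoizing the in-flight Task rather than the result, so concurrent duplicates share one call:

```python
# Sketch: parallelize independent agent steps and deduplicate identical
# calls. Memoizing the asyncio.Task (not the finished result) means two
# concurrent requests for the same prompt share a single model call.

import asyncio

calls = 0
_inflight: dict[str, asyncio.Task] = {}

async def fake_llm(prompt: str) -> str:
    global calls
    calls += 1
    await asyncio.sleep(0.05)  # stand-in for network + inference latency
    return f"answer({prompt})"

async def cached_call(prompt: str) -> str:
    if prompt not in _inflight:
        _inflight[prompt] = asyncio.create_task(fake_llm(prompt))
    return await _inflight[prompt]

async def run_workflow() -> list[str]:
    # These three steps don't depend on each other, so issue them
    # concurrently instead of sequentially; the duplicate "summarize doc A"
    # rides along on the first call's Task.
    return list(await asyncio.gather(
        cached_call("summarize doc A"),
        cached_call("summarize doc B"),
        cached_call("summarize doc A"),
    ))

results = asyncio.run(run_workflow())
print(results, calls)  # three results, only two actual model calls
```

  A real optimizer would do this over a dependency graph it infers from the agent's trace, which is exactly the systems-level work the proxy-layer players skip.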

  3. Hybrid local + cloud routing

  - Simple tasks → run locally (free)
  - Complex tasks → route to cloud (paid)
  - Smart enough to know which is which, in real time
  - Requires systems-level integration (local inference engine + cloud routing + quality estimation)
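  The core loop of #3 can be sketched in a few lines. `local_model` and `cloud_model` are stand-ins and the confidence heuristic is made up; in a real system the quality estimate might come from the local model's token log-probabilities or a small verifier model:

```python
# Sketch of local-first routing with escalation: answer locally for free,
# and pay for the cloud model only when local confidence is too low.

def local_model(prompt: str) -> tuple[str, float]:
    """Returns (answer, confidence in [0, 1]); toy heuristic here."""
    confidence = 0.9 if len(prompt) < 80 else 0.3
    return f"local:{prompt}", confidence

def cloud_model(prompt: str) -> str:
    return f"cloud:{prompt}"

def answer(prompt: str, min_confidence: float = 0.7) -> str:
    result, confidence = local_model(prompt)
    if confidence >= min_confidence:
        return result          # handled for free on-device
    return cloud_model(prompt)  # escalate only the hard cases
```

  The business case lives in the threshold: every percentage point of traffic you can keep local is cloud spend the customer stops paying.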

  My honest recommendation: #2 or #3.

  #2 (Agent workflow optimization) — because it's where the market is going (agentic AI), it's unsolved, and it requires systems thinking, not just API routing. Your existing proxy codebase is a starting point, but you go deeper — from "track costs" to "make agents 5x cheaper by optimizing the entire workflow."

  #3 (Hybrid local + cloud) — because it combines Hosung's systems skills (local inference optimization) with your existing proxy (cloud routing). Nobody has nailed this yet.

  Both are software-only, fully remote, and build on what you've already started.

  Would either of these get Hosung excited?

  Sources:

  - https://www.swfte.com/blog/intelligent-llm-routing-multi-model-ai
  - https://analyticsweek.com/inference-economics-finops-ai-roi-2026/
  - https://portkey.ai/features/ai-gateway
  - https://withmartian.com/
  - https://lmsys.org/blog/2024-07-01-routellm/
  - https://blog.vllm.ai/2026/01/05/vllm-sr-iris.html