Name: Synorb Manifests
Brand: Synorb

01 / Manifest

One source event. Three machine-native views.

A Manifest is the complete unit Synorb writes. Agents can inspect the canonical JSON, read a concise Brief, or act on individual Signals without losing lineage back to the original source.

STATIC SAMPLE

Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters

NVIDIA argues AI infrastructure should be measured by cost per token, not raw FLOPS per dollar, and shows how full-stack optimization drives lower token cost and higher throughput.

home domain: engineering-technology
source: nvidia-blog
format: text
significance: high

65 claims

1 stream

Manifest

id 1776319402051417156

{
  "manifest_id": "1776319402051417156",
  "record_id": "1776319400318871222",
  "signal_id": "1776327224706191226",
  "brief_id": "1776327224478196811",
  "status": "routed",
  "record_ready": true,
  "signal_ready": true,
  "brief_ready": true,
  "claims_extracted": true,
  "priority": 0,
  "stream_ids": [
    "17723038993561924"
  ],
  "stream_names": [
    "nvidia"
  ],
  "source": {
    "name": "nvidia-blog",
    "source_channel": "nvidia-newsroom-rss",
    "source_type": "organization",
    "media_format": "text",
    "claim_type": "analysis",
    "published_date": "2026-04-15",
    "title": "Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters",
    "url": "https://blogs.nvidia.com/blog/lowest-token-cost-ai-factories"
  },
  "domain_classification": {
    "home_domain": "engineering-technology",
    "cross_domains": [
      "economics-business-work",
      "everyday-life-practical-knowledge",
      "physical-sciences-mathematics"
    ]
  },
  "outputs": {
    "signals_count": 65,
    "record_version": 5
  },
  "lineage": {
    "captured_at": "2026-04-16 06:03:19.732319+00:00",
    "manifest_created_on": "2026-04-16 06:03:19.732319+00:00",
    "manifest_updated_on": "2026-05-05 17:44:01.357487+00:00"
  }
}
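
The ID fields in the Manifest are what let an agent move between views without losing lineage. The following is a minimal sketch, assuming the Manifest and Signal objects have been loaded as Python dicts with the field names shown in this sample. The helper names `views_ready` and `lineage_mismatches` are hypothetical illustrations, not part of any Synorb API, and the embedded JSON is a trimmed copy of the sample that keeps only the linkage fields.

```python
import json

# Trimmed copies of the sample objects on this page; only linkage fields are kept.
MANIFEST_JSON = """
{
  "manifest_id": "1776319402051417156",
  "record_id": "1776319400318871222",
  "signal_id": "1776327224706191226",
  "brief_id": "1776327224478196811",
  "status": "routed",
  "record_ready": true,
  "signal_ready": true,
  "brief_ready": true,
  "stream_ids": ["17723038993561924"]
}
"""

SIGNAL_JSON = """
{
  "signal_id": "1776327224706191226",
  "manifest_id": "1776319402051417156",
  "record_id": "1776319400318871222",
  "stream_ids": ["17723038993561924"]
}
"""


def views_ready(manifest: dict) -> bool:
    """Return True when the record, signal, and brief flags are all set."""
    flags = ("record_ready", "signal_ready", "brief_ready")
    return all(manifest.get(flag) is True for flag in flags)


def lineage_mismatches(manifest: dict, signal: dict) -> list[str]:
    """List the linkage fields on which the signal disagrees with its manifest."""
    keys = ("manifest_id", "record_id", "signal_id", "stream_ids")
    return [key for key in keys if manifest.get(key) != signal.get(key)]


manifest = json.loads(MANIFEST_JSON)
signal = json.loads(SIGNAL_JSON)

print(views_ready(manifest))
print(lineage_mismatches(manifest, signal))
```

An empty list from `lineage_mismatches` means the Signal points back to the same manifest, record, and streams as the Manifest it was derived from. Any field name returned marks a broken link that an agent would want to resolve before acting on the Signal.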

Signal

{
  "signal_id": "1776327224706191226",
  "manifest_id": "1776319402051417156",
  "record_id": "1776319400318871222",
  "stream_ids": [
    "17723038993561924"
  ],
  "stream_names": [
    "nvidia"
  ],
  "significance": "high",
  "source_published_date": "2026-04-15",
  "headline": "Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters",
  "summary": "NVIDIA asserts that the economic evaluation of AI infrastructure must shift from traditional input metrics like compute cost and FLOPS per dollar to \"cost per token.\" This change is necessitated by the evolution of data centers into AI token factories, where intelligence is manufactured as tokens. NVIDIA positions its full-stack codesign, particularly with the Blackwell platform, as delivering the industry's lowest cost per token and highest token throughput, enabling profitable AI scaling for enterprises.",
  "body": {
    "claims": [
      {
        "quote": null,
        "signal": "This claim establishes the historical context for data centers, highlighting the shift in their function with the advent of AI.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Traditional data centers only stored, retrieved and processed data.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This signals a fundamental transformation in data center purpose, indicating a new operational paradigm for AI infrastructure.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "In the generative and agentic AI era, data facilities have evolved into AI token factories.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This identifies the dominant activity in modern AI infrastructure, emphasizing the need for metrics tailored to inference performance.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "AI inference is becoming the primary workload for these AI token factories.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This defines the core output of AI infrastructure, establishing the basis for output-centric economic evaluation.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "The primary output of AI token factories is intelligence manufactured in the form of tokens.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This claim highlights the necessity for a new evaluation framework for AI infrastructure, directly impacting investment and procurement strategies.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": true,
        "claim_text": "This transformation demands a corresponding shift in how the economics of AI infrastructure, including total cost of ownership (TCO), is assessed.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": 0
      },
      {
        "quote": null,
        "signal": "This identifies common, but potentially outdated, evaluation practices that enterprises should reconsider for AI infrastructure investments.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Enterprises evaluating AI infrastructure still too often focus on peak chip specifications, compute cost or floating point operations per second for every dollar spent (FLOPS per dollar).",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This defines a traditional input metric, clarifying its scope but also implying its limitations for AI-specific evaluation.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Compute cost is what enterprises pay for AI infrastructure, whether rented from cloud providers or owned on premises.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This defines another traditional input metric, setting the stage for why it's insufficient for real-world AI output.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "FLOPS per dollar is how much raw computing power an enterprise gets for every dollar spent.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This claim establishes the core distinction that necessitates a new metric, indicating that raw power doesn't equate to business value in AI.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Raw compute and real-world token output are not the same thing.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides the definition of the proposed key metric, guiding how enterprises should measure AI infrastructure efficiency.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Cost per token is an enterprise's all-in cost to produce each delivered token, usually represented as cost per million tokens.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This categorizes traditional metrics as insufficient, reinforcing the need for an output-focused approach.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Compute cost and FLOPS per dollar are merely input metrics.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This highlights a critical strategic misalignment for businesses investing in AI, urging a shift in optimization focus.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Optimizing for inputs while the business runs on output is a fundamental mismatch.",
        "claim_type": "analysis",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This claim identifies cost per token as the direct determinant of AI scalability and profitability, making it a crucial metric for strategic planning.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": true,
        "claim_text": "Cost per token determines whether enterprises can profitably scale AI.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": 1
      },
      {
        "quote": null,
        "signal": "This details the comprehensive nature of the cost per token metric, indicating it provides a holistic view of AI infrastructure efficiency.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Cost per token directly accounts for hardware performance, software optimization, ecosystem support and real-world utilization.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This is a direct competitive claim by NVIDIA, signaling its market leadership in AI infrastructure efficiency.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": true,
        "claim_text": "NVIDIA delivers the lowest cost per token in the industry.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": 2
      },
      {
        "quote": null,
        "signal": "This provides a key operational directive for enterprises seeking to reduce AI costs, focusing on output maximization.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Optimizing token cost requires maximizing the delivered token output (the denominator in the cost per million tokens equation).",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This highlights a common misdirection in AI infrastructure evaluation, indicating a need for re-education on effective metrics.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Many enterprises evaluating AI infrastructure focus on the numerator (the cost per GPU per hour).",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This explains the direct financial benefit of optimizing token output, linking technical efficiency to business profitability.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Minimizing token cost by increasing token output drives down cost per token, which grows the profit margin on every interaction served.",
        "claim_type": "analysis",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This outlines how maximizing token output directly leads to increased revenue and better utilization of existing AI infrastructure.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "More tokens delivered per second translates to more tokens per megawatt, which means more intelligence to use in AI-powered products and services, generating more revenue from the same infrastructure investment.",
        "claim_type": "analysis",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This emphasizes the strategic error of incomplete analysis in AI infrastructure procurement, guiding decision-makers to look deeper.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Focusing only on the numerator (cost per GPU hour) means missing what drives the denominator (delivered token output).",
        "claim_type": "analysis",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This analogy highlights the hidden complexities and critical factors that influence actual AI performance and cost efficiency.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "The denominator (delivered token output) represents key factors that determine real-world token output, like an \"inference iceberg\" beneath the surface.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This lists common but insufficient metrics, guiding enterprises away from superficial evaluations.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Surface-level inquiry for AI infrastructure includes cost per GPU hour, peak petaflops, high-bandwidth memory capacity, and FLOPS per dollar.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides a specific, advanced metric for evaluating AI infrastructure, particularly relevant for complex AI models.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "In-depth cost analysis should ask about cost per million tokens, specifically for large-scale mixture-of-experts (MoE) reasoning models.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This identifies a prevalent AI model type, indicating that infrastructure choices should be optimized for its specific requirements.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Large-scale mixture-of-experts (MoE) reasoning models represent the most widely deployed type of AI models.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This highlights a crucial efficiency metric for on-premises AI infrastructure, impacting energy consumption and capital expenditure.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "In-depth cost analysis should ask about delivered token output per megawatt, which is critical for on-premises deployments.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This explains the financial imperative behind energy efficiency for on-premises AI, guiding infrastructure design and investment.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Maximizing intelligence produced per megawatt is critical for on-premises deployments due to substantial capital commitment to land, power, and infrastructure.",
        "claim_type": "analysis",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This points to a specific technical requirement for efficient MoE model deployment, crucial for network architects and infrastructure planners.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "In-depth cost analysis should ask if the scale-up interconnect can handle the \"all-to-all\" traffic of MoE models.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This highlights a technical optimization (FP4 precision) that can significantly impact performance and cost, relevant for hardware and software selection.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "In-depth cost analysis should ask if FP4 precision is supported and if the inference stack can make use of FP4 while maintaining high accuracy.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This identifies software features that enhance user experience and efficiency, impacting the perceived value and adoption of AI services.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "In-depth cost analysis should ask if the inference runtime supports speculative decoding or multi-token prediction to increase user interactivity.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This specifies advanced serving layer optimizations that are critical for maximizing throughput and minimizing latency in AI deployments.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "In-depth cost analysis should ask if the serving layer supports disaggregated serving, KV-aware routing, KV-cache offloading and other optimizations.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This points to specific performance requirements for agentic AI, guiding infrastructure choices for this emerging workload.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "In-depth cost analysis should ask if the platform supports the unique workload requirements of agentic AI, including ultralow latency, high throughput and large input sequence lengths.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This emphasizes the importance of a versatile platform that supports the entire AI lifecycle, crucial for long-term investment value and operational flexibility.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "In-depth cost analysis should ask if the platform supports the full lifecycle, from training and post-training to high-scale inference, across all model architectures, to ensure infrastructure fungibility and high utilization.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This highlights the critical need for a holistic, integrated approach to AI infrastructure optimization, warning against piecemeal solutions.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Every algorithmic, hardware and software optimization must be active and integrated, or the denominator (delivered token output) collapses.",
        "claim_type": "analysis",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides a direct warning against superficial cost-saving measures, emphasizing that true cost efficiency comes from output, not just input price.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "A \"cheaper\" GPU that delivers significantly fewer tokens per second results in a much higher cost per token.",
        "claim_type": "analysis",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This reinforces the value of a fully integrated and optimized AI stack, suggesting a synergistic effect on performance and cost.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "AI infrastructure that gets it right across the full stack ensures that every optimization enhances the others.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This introduces a specific case study to illustrate the practical implications of different AI infrastructure evaluation metrics.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "The DeepSeek-R1 AI model demonstrates the difference between theoretical and actual business outcomes.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides a direct cost comparison between two NVIDIA platforms based on a traditional metric, setting up the contrast with output-based metrics.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Looking at compute cost alone, the NVIDIA Blackwell platform appears to cost roughly 2x more than NVIDIA Hopper.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This reinforces the inadequacy of compute cost as a standalone metric for AI infrastructure, guiding decision-makers to look beyond initial price.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Compute cost says nothing about the output that investment buys.",
        "claim_type": "analysis",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This presents a theoretical performance advantage for Blackwell based on a traditional metric, which will be contrasted with real-world outcomes.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "An analysis of mere FLOPS per dollar suggests a 2x NVIDIA Blackwell advantage compared with the NVIDIA Hopper architecture.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This highlights a massive efficiency gain for Blackwell in terms of energy-to-output, crucial for large-scale and sustainable AI deployments.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": true,
        "claim_text": "NVIDIA Blackwell delivers more than 50x greater token output per watt than NVIDIA Hopper.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": 3
      },
      {
        "quote": null,
        "signal": "This is a key performance metric demonstrating Blackwell's superior economic efficiency, directly supporting the article's central argument.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": true,
        "claim_text": "NVIDIA Blackwell results in nearly 35x lower cost per million tokens compared to NVIDIA Hopper.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": 3
      },
      {
        "quote": null,
        "signal": "This provides a specific data point for Hopper's hourly GPU cost, used for comparative analysis.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Hopper (HGX H200) has a cost per GPU per hour of $1.41.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides a specific data point for Blackwell's hourly GPU cost, used for comparative analysis.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Blackwell (GB300 NVL72) has a cost per GPU per hour of $2.65.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This highlights the higher initial compute cost of Blackwell, which is then offset by its efficiency gains.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Blackwell's cost per GPU per hour is 2x that of NVIDIA Hopper.",
        "claim_type": "data",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides a specific data point for Hopper's FLOPS per dollar, used for comparative analysis.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Hopper has 2.8 PFLOPS per dollar.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides a specific data point for Blackwell's FLOPS per dollar, used for comparative analysis.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Blackwell has 5.6 PFLOPS per dollar.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This shows Blackwell's theoretical performance advantage, which is then dwarfed by its real-world token output efficiency.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Blackwell's PFLOPS per dollar is 2x that of NVIDIA Hopper.",
        "claim_type": "data",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides a specific data point for Hopper's token output, used for comparative analysis.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Hopper has 90,000 tokens per second per GPU.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides a specific data point for Blackwell's token output, demonstrating its higher raw throughput.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Blackwell has 650,000 tokens per second per GPU.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This quantifies Blackwell's significant improvement in raw token generation per GPU.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Blackwell's tokens per second per GPU is 6.5x that of NVIDIA Hopper.",
        "claim_type": "data",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides a specific data point for Hopper's energy efficiency in token generation.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Hopper has 54,000 tokens per second per MW.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides a specific data point for Blackwell's energy efficiency, highlighting its substantial improvement.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Blackwell has 2.8 million tokens per second per MW.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This quantifies Blackwell's dramatic improvement in energy efficiency for AI workloads, a critical factor for large-scale deployments.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Blackwell's tokens per second per MW is 50x that of NVIDIA Hopper.",
        "claim_type": "data",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides a specific data point for Hopper's cost per token, serving as a baseline for comparison.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Hopper has a cost per million tokens of $4.20.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides a specific data point for Blackwell's cost per token, showcasing its superior economic efficiency.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA Blackwell has a cost per million tokens of $0.12.",
        "claim_type": "data",
        "confidence": "measured",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides the provenance of the performance data, allowing for assessment of its credibility and potential biases.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "mentioned",
            "type": "organization",
            "tag_id": 17723038993599085
          },
          {
            "name": "SemiAnalysis",
            "role": "mentioned",
            "type": "organization",
            "tag_id": 17723038994126390
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "The data is sourced from NVIDIA analysis and the SemiAnalysis InferenceX v2 benchmark.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This concludes that Blackwell offers significantly greater business value, justifying its adoption despite potentially higher initial costs.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "The massive divergence proves NVIDIA Blackwell delivers a massive leap in business value over the earlier Hopper generation.",
        "claim_type": "analysis",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This reinforces the economic justification for investing in Blackwell, indicating a strong return on investment despite higher upfront expenses.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "This leap in business value for NVIDIA Blackwell far outpaces any increase in system cost.",
        "claim_type": "analysis",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This strongly advises against using traditional metrics for AI inference, guiding enterprises towards more relevant economic evaluations.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Comparing AI infrastructure based on compute cost or theoretical FLOPS per dollar isn't just insufficient; it doesn't provide an accurate representation of inference economics.",
        "claim_type": "analysis",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides a clear directive for how enterprises should conduct AI infrastructure evaluations to maximize financial outcomes.",
        "entities": [],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Accurate evaluation of AI infrastructure's revenue potential and profitability requires a shift from input metrics to cost per token and delivered token output.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This reiterates NVIDIA's competitive advantage, highlighting its comprehensive approach to AI infrastructure optimization.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "NVIDIA delivers the industry's lowest token cost and highest token throughput through extreme codesign across compute, networking, memory, storage, software and partner technologies.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This highlights the long-term value proposition of NVIDIA's ecosystem, indicating continuous improvement and cost reduction for existing customers.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Constant optimization of open source inference software such as vLLM, SGLang, NVIDIA TensorRT-LLM and NVIDIA Dynamo built on the NVIDIA platform means that on existing NVIDIA infrastructure, token output continues to increase and the cost per token continues to decline long after it's acquired.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This indicates that NVIDIA's claimed advantages are already available and proven in large-scale deployments through its partner network.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Leading cloud providers and NVIDIA cloud partners are already delivering this advantage at scale.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This provides concrete examples of partners leveraging NVIDIA Blackwell to offer competitive AI services, validating NVIDIA's claims.",
        "entities": [
          {
            "name": "CoreWeave",
            "role": "mentioned",
            "type": "organization",
            "tag_id": 17733540864854276
          },
          {
            "name": "Nebius",
            "role": "mentioned",
            "type": "organization",
            "tag_id": 17724054197223435
          },
          {
            "name": "Nscale",
            "role": "mentioned",
            "type": "organization",
            "tag_id": 17733518877659987
          },
          {
            "name": "Together AI",
            "role": "mentioned",
            "type": "organization",
            "tag_id": 17723038994308374
          },
          {
            "name": "NVIDIA",
            "role": "mentioned",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "Partners such as CoreWeave, Nebius, Nscale and Together AI have deployed NVIDIA Blackwell infrastructure and optimized their stacks to bring enterprises the lowest token cost available today.",
        "claim_type": "event",
        "confidence": "stated",
        "key_point_index": null
      },
      {
        "quote": null,
        "signal": "This emphasizes the integrated value proposition of NVIDIA's ecosystem, suggesting a comprehensive solution for AI deployment.",
        "entities": [
          {
            "name": "NVIDIA",
            "role": "subject",
            "type": "organization",
            "tag_id": 17723038993599085
          }
        ],
        "evidence": "paraphrase",
        "featured": false,
        "claim_text": "These partners benefit from the full benefit of NVIDIA's hardware, software and ecosystem codesign behind every interaction served.",
        "claim_type": "statement",
        "confidence": "stated",
        "key_point_index": null
      }
    ],
    "headline": "Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters",
    "sentiment": "positive",
    "extraction": {
      "topics": [
        "artificial_intelligence",
        "semiconductors",
        "cloud_computing"
      ],
      "entities": [
        {
          "name": "NVIDIA",
          "role": "source_org",
          "type": "organization",
          "subtype": "public_company"
        },
        {
          "name": "Shruti Koparkar",
          "role": "speaker",
          "type": "person",
          "subtype": "executive"
        },
        {
          "name": "CoreWeave",
          "role": "mentioned",
          "type": "organization",
          "subtype": "private_company"
        },
        {
          "name": "Nebius",
          "role": "mentioned",
          "type": "organization",
          "subtype": "private_company"
        },
        {
          "name": "Nscale",
          "role": "mentioned",
          "type": "organization",
          "subtype": "private_company"
        },
        {
          "name": "Together AI",
          "role": "mentioned",
          "type": "organization",
          "subtype": "private_company"
        },
        {
          "name": "SemiAnalysis",
          "role": "mentioned",
          "type": "organization",
          "subtype": "media"
        }
      ],
      "headline": "Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters",
      "sentiment": "positive",
      "significance": "high",
      "entity_details": [
        {
          "role": "source_org",
          "tag_id": 17723038993599085,
          "tag_type": "organization",
          "tag_value": "NVIDIA",
          "tag_subtype": "public_company",
          "canonical_name": "Nvidia"
        },
        {
          "role": "speaker",
          "tag_id": 17726487328814545,
          "tag_type": "person",
          "tag_value": "Shruti Koparkar",
          "tag_subtype": "executive",
          "canonical_name": "Shruti Koparkar"
        },
        {
          "role": "mentioned",
          "tag_id": 17733540864854276,
          "tag_type": "organization",
          "tag_value": "CoreWeave",
          "tag_subtype": "private_company",
          "canonical_name": "CoreWeave"
        },
        {
          "role": "mentioned",
          "tag_id": 17724054197223435,
          "tag_type": "organization",
          "tag_value": "Nebius",
          "tag_subtype": "private_company",
          "canonical_name": "Nebius"
        },
        {
          "role": "mentioned",
          "tag_id": 17733518877659987,
          "tag_type": "organization",
          "tag_value": "Nscale",
          "tag_subtype": "private_company",
          "canonical_name": "Scale"
        },
        {
          "role": "mentioned",
          "tag_id": 17723038994308374,
          "tag_type": "organization",
          "tag_value": "Together AI",
          "tag_subtype": "private_company",
          "canonical_name": "Together AI"
        },
        {
          "role": "mentioned",
          "tag_id": 17723038994126390,
          "tag_type": "organization",
          "tag_value": "SemiAnalysis",
          "tag_subtype": "media",
          "canonical_name": "SemiAnalysis"
        },
        {
          "role": "mentioned",
          "tag_id": 17723038993834764,
          "tag_type": "topic",
          "tag_value": "artificial_intelligence",
          "tag_subtype": "tech",
          "canonical_name": "Artificial Intelligence"
        },
        {
          "role": "mentioned",
          "tag_id": 17723038993839926,
          "tag_type": "topic",
          "tag_value": "semiconductors",
          "tag_subtype": "tech",
          "canonical_name": "Semiconductors"
        },
        {
          "role": "mentioned",
          "tag_id": 17723038993835295,
          "tag_type": "topic",
          "tag_value": "cloud_computing",
          "tag_subtype": "tech",
          "canonical_name": "Cloud Computing"
        }
      ],
      "content_summary": "NVIDIA argues that the traditional metrics for evaluating AI infrastructure, such as compute cost and FLOPS per dollar, are insufficient in the generative AI era. The company advocates for \"cost per token\" as the sole critical metric, emphasizing that optimizing for token output drives profitability and revenue. NVIDIA claims its Blackwell platform delivers significantly lower cost per token and higher token output compared to its Hopper architecture, positioning itself as the industry leader.",
      "domain_classification": {
        "home_domain": "engineering-technology",
        "cross_domains": [
          "economics-business-work",
          "everyday-life-practical-knowledge",
          "physical-sciences-mathematics"
        ]
      },
      "published_date_extraction": {
        "date": null,
        "source": "none",
        "reasoning": "Date provided in source metadata, not extracted from content text."
      }
    },
    "source_url": "https://blogs.nvidia.com/blog/lowest-token-cost-ai-factories",
    "claim_count": 65,
    "home_domain": "engineering-technology",
    "source_name": "nvidia-blog",
    "evidence_ref": {
      "source_urls": [
        "https://blogs.nvidia.com/blog/lowest-token-cost-ai-factories"
      ]
    },
    "significance": "high",
    "claims_rollup": {
      "total_claims": 65,
      "one_line_summary": "Traditional data center  only  tored, retrieved and proce ed data. (paraphrase).",
      "sentiment_distribution": {
        "unknown": {
          "pct": 100.0,
          "count": 65
        }
      },
      "confidence_distribution": {
        "stated": {
          "pct": 78.5,
          "count": 51
        },
        "measured": {
          "pct": 21.5,
          "count": 14
        }
      },
      "claim_length_distribution": {
        "short (<12w)": {
          "pct": 86.2,
          "count": 56
        },
        "medium (12-30w)": {
          "pct": 13.8,
          "count": 9
        }
      },
      "evidence_type_distribution": {
        "paraphrase": {
          "pct": 100.0,
          "count": 65
        }
      }
    },
    "cross_domains": [
      "economics-business-work",
      "everyday-life-practical-knowledge",
      "physical-sciences-mathematics"
    ],
    "claims_summary": "Traditional data center  only  tored, retrieved and proce ed data. (paraphrase).",
    "entity_details": [
      {
        "tag_type": "Organization",
        "tag_value": "NVIDIA"
      },
      {
        "tag_type": "Person",
        "tag_value": "Shruti Koparkar"
      },
      {
        "tag_type": "Organization",
        "tag_value": "CoreWeave"
      },
      {
        "tag_type": "Organization",
        "tag_value": "Nebius"
      },
      {
        "tag_type": "Organization",
        "tag_value": "Nscale"
      },
      {
        "tag_type": "Organization",
        "tag_value": "Together AI"
      },
      {
        "tag_type": "Organization",
        "tag_value": "SemiAnalysis"
      },
      {
        "tag_type": "Tag",
        "tag_value": "artificial_intelligence"
      },
      {
        "tag_type": "Tag",
        "tag_value": "semiconductors"
      },
      {
        "tag_type": "Tag",
        "tag_value": "cloud_computing"
      }
    ],
    "featured_count": 5,
    "published_date": "2026-04-15",
    "source_channel": "nvidia-newsroom-rss",
    "_generation_metadata": {},
    "domain_classification": {
      "home_domain": "engineering-technology",
      "cross_domains": [
        "economics-business-work",
        "everyday-life-practical-knowledge",
        "physical-sciences-mathematics"
      ]
    },
    "extraction_prompt_version": "4.1.2"
  },
  "domain_classification": {
    "home_domain": "engineering-technology",
    "cross_domains": [
      "economics-business-work",
      "everyday-life-practical-knowledge",
      "physical-sciences-mathematics"
    ]
  },
  "provenance": {
    "source_name": "nvidia-blog",
    "source_channel": "nvidia-newsroom-rss",
    "source_url": "https://blogs.nvidia.com/blog/lowest-token-cost-ai-factories"
  }
}
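
A consumer may want to sanity-check a Signal's claims_rollup before relying on its percentages. The following is a minimal Python sketch, assuming the field names shown in the sample above; the helper check_distribution is hypothetical and not part of any Synorb client.

```python
# Hypothetical helper: verify that a rollup distribution is internally consistent.
# Field names mirror the sample Signal above; nothing here is a Synorb API.

def check_distribution(distribution: dict, total: int, tolerance: float = 0.1) -> bool:
    """Return True if counts sum to `total` and each pct matches count / total."""
    if sum(bucket["count"] for bucket in distribution.values()) != total:
        return False
    return all(
        abs(bucket["pct"] - 100.0 * bucket["count"] / total) <= tolerance
        for bucket in distribution.values()
    )

# Values copied from the sample's claims_rollup.
claims_rollup = {
    "total_claims": 65,
    "confidence_distribution": {
        "stated": {"pct": 78.5, "count": 51},
        "measured": {"pct": 21.5, "count": 14},
    },
}

rollup_ok = check_distribution(
    claims_rollup["confidence_distribution"], claims_rollup["total_claims"]
)
print(rollup_ok)
```

The tolerance allows for percentages rounded to one decimal place, as they appear in the sample.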

Brief

{
  "brief_id": "1776327224478196811",
  "manifest_id": "1776319402051417156",
  "record_id": "1776319400318871222",
  "stream_ids": [
    "17723038993561924"
  ],
  "stream_names": [
    "nvidia"
  ],
  "significance": "high",
  "source_published_date": "2026-04-15",
  "headline": "Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters",
  "summary": "NVIDIA asserts that the economic evaluation of AI infrastructure must shift from traditional input metrics like compute cost and FLOPS per dollar to \"cost per token.\" This change is necessitated by the evolution of data centers into AI token factories, where intelligence is manufactured as tokens. NVIDIA positions its full-stack codesign, particularly with the Blackwell platform, as delivering the industry's lowest cost per token and highest token throughput, enabling profitable AI scaling for enterprises.",
  "body": {
    "summary": "NVIDIA asserts that the economic evaluation of AI infrastructure must shift from traditional input metrics like compute cost and FLOPS per dollar to \"cost per token.\" This change is necessitated by the evolution of data centers into AI token factories, where intelligence is manufactured as tokens. NVIDIA positions its full-stack codesign, particularly with the Blackwell platform, as delivering the industry's lowest cost per token and highest token throughput, enabling profitable AI scaling for enterprises.",
    "tldr": "NVIDIA argues that cost per token is the practical TCO metric for scaling AI inference profitably.",
    "why_it_matters": "Teams buying AI infrastructure need an output-based metric that connects spend to delivered inference, not just raw compute capacity.",
    "facts": [
      {
        "label": "Blackwell token output/watt",
        "value": "50x",
        "entity": "NVIDIA Blackwell"
      },
      {
        "label": "Cost per million tokens",
        "value": "35x lower",
        "entity": "NVIDIA Blackwell"
      }
    ],
    "unresolved": [
      "Article does not publish the full benchmark methodology behind the comparative cost-per-token claims."
    ],
    "signal_digest": {
      "claim_count": 65,
      "featured": [
        {
          "claim_text": "Cost per token is the critical TCO metric for profitably scaling AI.",
          "claim_type": "analysis",
          "confidence": "stated"
        }
      ]
    },
    "headline": "Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters",
    "sentiment": "positive",
    "source_url": "https://blogs.nvidia.com/blog/lowest-token-cost-ai-factories",
    "home_domain": "engineering-technology",
    "source_name": "nvidia-blog",
    "evidence_ref": {
      "source_urls": [
        "https://blogs.nvidia.com/blog/lowest-token-cost-ai-factories"
      ]
    },
    "key_insights": [
      "AI infrastructure evaluation must shift from traditional metrics (peak chip specifications, compute cost, FLOPS per dollar) to \"cost per token\" due to the evolution of data centers into AI token factories.",
      "Cost per token is identified as the critical Total Cost of Ownership (TCO) metric for profitably scaling AI, as it directly accounts for hardware performance, software optimization, ecosystem support, and real-world utilization.",
      "NVIDIA claims to deliver the industry's lowest cost per token and highest token throughput through extreme codesign across compute, networking, memory, storage, software, and partner technologies.",
      "The NVIDIA Blackwell platform demonstrates a significant advantage over NVIDIA Hopper, delivering more than 50x greater token output per watt and nearly 35x lower cost per million tokens, despite a higher compute cost."
    ],
    "significance": "high",
    "cross_domains": [
      "economics-business-work",
      "everyday-life-practical-knowledge",
      "physical-sciences-mathematics"
    ],
    "guest_details": [
      {
        "name": "Shruti Koparkar",
        "title": "Author",
        "affiliation": "NVIDIA"
      }
    ],
    "entity_details": [
      {
        "tag_type": "Organization",
        "tag_value": "NVIDIA"
      },
      {
        "tag_type": "Person",
        "tag_value": "Shruti Koparkar"
      },
      {
        "tag_type": "Organization",
        "tag_value": "CoreWeave"
      },
      {
        "tag_type": "Organization",
        "tag_value": "Nebius"
      },
      {
        "tag_type": "Organization",
        "tag_value": "Nscale"
      },
      {
        "tag_type": "Organization",
        "tag_value": "Together AI"
      },
      {
        "tag_type": "Organization",
        "tag_value": "SemiAnalysis"
      },
      {
        "tag_type": "Tag",
        "tag_value": "artificial_intelligence"
      },
      {
        "tag_type": "Tag",
        "tag_value": "semiconductors"
      },
      {
        "tag_type": "Tag",
        "tag_value": "cloud_computing"
      }
    ],
    "notable_quotes": [],
    "published_date": "2026-04-15",
    "source_channel": "nvidia-newsroom-rss",
    "cross_promotion": [
      "vLLM",
      "SGLang",
      "NVIDIA TensorRT-LLM",
      "NVIDIA Dynamo",
      "CoreWeave",
      "Nebius",
      "Nscale",
      "Together AI"
    ],
    "cultural_relevance": "This content highlights the evolving economic considerations for AI infrastructure, reflecting the increasing integration of generative AI into business operations and the critical need for efficient, scalable, and cost-effective AI deployment strategies across industries.",
    "_generation_metadata": {},
    "actionable_takeaways": [
      "Enterprises should prioritize \"cost per token\" over traditional input metrics like FLOPS per dollar when evaluating AI infrastructure investments to accurately assess profitability and revenue potential.",
      "Conduct an in-depth cost analysis for AI infrastructure, focusing on factors that maximize delivered token output (the denominator), such as interconnect capabilities, precision support, runtime optimizations, and full lifecycle support.",
      "Consider full-stack optimized solutions that integrate hardware, software, and ecosystem support to ensure all optimizations enhance each other and effectively reduce the cost per token."
    ],
    "domain_classification": {
      "home_domain": "engineering-technology",
      "cross_domains": [
        "economics-business-work",
        "everyday-life-practical-knowledge",
        "physical-sciences-mathematics"
      ]
    }
  },
  "provenance": {
    "source_name": "nvidia-blog",
    "source_channel": "nvidia-newsroom-rss",
    "source_url": "https://blogs.nvidia.com/blog/lowest-token-cost-ai-factories"
  }
}

Record

{
  "record_id": "1776319400318871222",
  "manifest_id": "1776319402051417156",
  "stream_ids": [
    "17723038993561924"
  ],
  "stream_names": [
    "nvidia"
  ],
  "source_name": "nvidia-blog",
  "source_channel": "nvidia-newsroom-rss",
  "source_type": "organization",
  "title": "Rethinking AI TCO: Why Cost per Token Is the Only Metric That Matters",
  "url": "https://blogs.nvidia.com/blog/lowest-token-cost-ai-factories",
  "author": "Shruti Koparkar",
  "summary": "Traditional data centers only stored, retrieved and processed data. In the generative and agentic AI era, these facilities have evolved into AI token factories. With AI inference becoming their primary workload, their primary output is intelligence manufactured in the form of tokens.  This transformation demands a corresponding shift in how the economics of AI infrastructure, […]",
  "content": "Traditional data centers only stored, retrieved and processed data. In the generative and agentic AI era, these facilities have evolved into AI token factories. With AI inference becoming their primary workload, their primary output is intelligence manufactured in the form of tokens. \nThis transformation demands a corresponding shift in how the economics of AI infrastructure, including total cost of ownership (TCO), is assessed. Enterprises evaluating AI infrastructure still too often focus on peak chip specifications, compute cost or floating point operations per second for every dollar spent, aka FLOPS per dollar. \nThe distinction that matters is this:\n\nCompute cost is what enterprises pay for AI infrastructure, whether rented from cloud providers or owned on premises.\nFLOPS per dollar is how much raw computing power an enterprise gets for every dollar spent, but raw compute and real-world token output are not the same thing. \nCost per token is an enterprise's all-in cost to produce each delivered token, usually represented as cost per million tokens.\n\nThe first two are merely input metrics. Optimizing for inputs while the business runs on output is a fundamental mismatch. \nCost per token determines whether enterprises can profitably scale AI. It's the one TCO metric that directly accounts for hardware performance, software optimization, ecosystem support and real-world utilization - and NVIDIA delivers the lowest cost per token in the industry. \nWhat Are the Factors That Lower Token Cost?\nUnderstanding how to optimize token cost requires looking at the equation for calculating cost per million tokens.\n\nIn this equation, many enterprises evaluating AI infrastructure focus on the numerator: the cost per GPU per hour. For cloud deployments, this is the hourly rate paid to a cloud provider; for on-premises deployments, it's the effective hourly cost derived from amortizing owned infrastructure. 
The real key to reducing token cost, however, lies in the denominator: maximizing the delivered token output.\nThat denominator carries two business implications.\n\nMinimize token cost: When this increase in token output is reflected through the cost equation, it drives down cost per token, which is what grows the profit margin on every interaction served.\nMaximize revenue: More tokens delivered per second also translates to more tokens per megawatt, which means more intelligence to use in AI-powered products and services, generating more revenue from the same infrastructure investment.\n\nSo focusing only on the numerator means missing what drives the denominator. Think of it as an \"inference iceberg\": The numerator sits above the surface, visible and easy to compare. The denominator is everything beneath the surface, which represents key factors that determine real-world token output. Accurately evaluating AI infrastructure starts with asking what lies beneath. \n\nSurface-level inquiry:\n\nWhat is the cost per GPU hour?\nWhat are the peak petaflops and high-bandwidth memory capacity?\nWhat are the FLOPS per dollar?\n\nIn-depth cost analysis:\n\nWhat is the cost per million tokens? Specifically, what is the cost per million tokens for large-scale mixture-of-experts (MoE) reasoning models, which represent the most widely deployed type of AI models?\nWhat is the delivered token output per megawatt? For on-premises deployments especially, where capital commitment to land, power and infrastructure is substantial, maximizing intelligence produced per megawatt is critical.\nCan the scale-up interconnect handle the \"all-to-all\" traffic of MoE models?\nIs FP4 precision supported? 
Can the inference stack make use of FP4 while maintaining high accuracy?\nDoes the inference runtime support speculative decoding or multi-token prediction to increase user interactivity?\nDoes the serving layer support disaggregated serving, KV-aware routing, KV-cache offloading and other optimizations?\nDoes the platform support the unique workload requirements of agentic AI - including ultralow latency, high throughput and large input sequence lengths?\nDoes the platform support the full lifecycle, from training and post-training to high-scale inference, across all model architectures, to ensure infrastructure fungibility and high utilization?\n\nEvery one of these algorithmic, hardware and software optimizations must be active and integrated, or the denominator collapses. A \"cheaper\" GPU that delivers significantly fewer tokens per second results in a much higher cost per token. AI infrastructure that gets it right across the full stack ensures that every optimization enhances the others.\nWhy Does Cost per Token Matter Much More Than FLOPS per Dollar?\nThe following data for the DeepSeek-R1 AI model demonstrates the difference between theoretical and actual business outcomes.\nLooking at compute cost alone, the NVIDIA Blackwell platform appears to cost roughly 2x more than NVIDIA Hopper - but compute cost says nothing about the output that investment buys. An analysis of mere FLOPS per dollar suggests a 2x NVIDIA Blackwell advantage compared with the NVIDIA Hopper architecture. However, the actual outcome is orders of magnitude different: Blackwell delivers more than 50x greater token output per watt than Hopper, resulting in nearly 35x lower cost per million tokens. 
\n\nMetric | NVIDIA Hopper (HGX H200) | NVIDIA Blackwell (GB300 NVL72) | NVIDIA Blackwell Relative to Hopper\nCost per GPU per Hour ($) | $1.41 | $2.65 | 2x\nFLOP per Dollar (PFLOPS) | 2.8 | 5.6 | 2x\nTokens per Second per GPU | 90 | 6,000 | 65x\nTokens per Second per MW | 54K | 2.8M | 50x\nCost per Million Tokens ($) | $4.20 | $0.12 | 35x lower\n\nNote: Data is sourced from NVIDIA analysis and the SemiAnalysis InferenceX v2 benchmark. \nThis massive divergence proves NVIDIA Blackwell delivers a massive leap in business value over the earlier Hopper generation that far outpaces any increase in system cost.\nHow to Choose the Right AI Infrastructure\nComparing AI infrastructure based on compute cost or theoretical FLOPS per dollar isn't just insufficient; it doesn't provide an accurate representation of inference economics. As the data demonstrates, an accurate evaluation of AI infrastructure's revenue potential and profitability requires a shift from input metrics to cost per token and delivered token output.\nNVIDIA delivers the industry's lowest token cost and highest token throughput through extreme codesign across compute, networking, memory, storage, software and partner technologies. Moreover, the constant optimization of open source inference software such as vLLM, SGLang, NVIDIA TensorRT-LLM and NVIDIA Dynamo built on the NVIDIA platform means that on existing NVIDIA infrastructure, token output continues to increase and the cost per token continues to decline long after it's acquired.\nLeading cloud providers and NVIDIA cloud partners are already delivering this advantage at scale. Partners such as CoreWeave, Nebius, Nscale and Together AI have deployed NVIDIA Blackwell infrastructure and optimized their stacks to bring enterprises the lowest token cost available today, with the full benefit of NVIDIA's hardware, software and ecosystem codesign behind every interaction served.",
  "media_format": "text",
  "claim_type": "analysis",
  "source_published_date": "2026-04-15",
  "synorb_ingested_at": "2026-04-16 06:03:19.732319+00:00",
  "record_version": 5,
  "domain_classification": {
    "home_domain": "engineering-technology",
    "cross_domains": [
      "economics-business-work",
      "everyday-life-practical-knowledge",
      "physical-sciences-mathematics"
    ]
  },
  "entities": [
    {
      "tag_type": "Organization",
      "tag_value": "NVIDIA"
    },
    {
      "tag_type": "Person",
      "tag_value": "Shruti Koparkar"
    },
    {
      "tag_type": "Organization",
      "tag_value": "CoreWeave"
    },
    {
      "tag_type": "Organization",
      "tag_value": "Nebius"
    },
    {
      "tag_type": "Organization",
      "tag_value": "Nscale"
    },
    {
      "tag_type": "Organization",
      "tag_value": "Together AI"
    },
    {
      "tag_type": "Organization",
      "tag_value": "SemiAnalysis"
    },
    {
      "tag_type": "Tag",
      "tag_value": "artificial_intelligence"
    },
    {
      "tag_type": "Tag",
      "tag_value": "semiconductors"
    },
    {
      "tag_type": "Tag",
      "tag_value": "cloud_computing"
    }
  ]
}
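As a minimal sketch of working with the Record's typed tags, the snippet below groups `tag_value` strings under their `tag_type`. The `entities`, `tag_type`, and `tag_value` field names come from the sample Record; the helper name `entities_by_type` and the abbreviated input are illustrative assumptions, not part of any Synorb client.

```python
from collections import defaultdict

# Abbreviated copy of the "entities" array from the sample Record.
record = {
    "entities": [
        {"tag_type": "Organization", "tag_value": "NVIDIA"},
        {"tag_type": "Person", "tag_value": "Shruti Koparkar"},
        {"tag_type": "Organization", "tag_value": "CoreWeave"},
        {"tag_type": "Tag", "tag_value": "semiconductors"},
    ]
}


def entities_by_type(record: dict) -> dict:
    """Group tag_value strings under their tag_type, preserving input order."""
    grouped = defaultdict(list)
    for entity in record.get("entities", []):
        grouped[entity["tag_type"]].append(entity["tag_value"])
    return dict(grouped)


grouped = entities_by_type(record)
print(grouped["Organization"])
```

Grouping by `tag_type` keeps organizations, people, and topical tags separate, which is usually what a join against another catalog needs.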

02 / Manifest Views

Same event. Different jobs.

The format choice is not cosmetic. Signals are for reasoning and alerts. Briefs are for dashboards and RAG context. Records are for joins, lineage, replay, and warehouse-grade use.

Signal

Atomic assertions

Claims with evidence, confidence, sentiment, source, tags, and date. Built so agents can reason over the smallest useful unit.

Brief

Compact narrative

A source-aware summary that keeps the important claims together. Built for dashboards, digests, and retrieval-augmented systems.

Record

Canonical JSON

The structured content object with stable identifiers, provenance, versioning, source metadata, and machine-joinable fields.
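The pairing of jobs to views can be sketched as a small lookup guarded by the readiness flags (`signal_ready`, `brief_ready`, `record_ready`) that appear on the sample Manifest. The job labels and the `pick_view` helper are hypothetical names chosen for illustration; only the flag names and the job-to-view pairing come from this page.

```python
from typing import Optional

# Hypothetical job labels; the pairing follows the text:
# Signals for reasoning and alerts, Briefs for dashboards and RAG context,
# Records for joins, lineage, replay, and warehouse use.
VIEW_FOR_JOB = {
    "reasoning": "signal",
    "alerts": "signal",
    "dashboards": "brief",
    "rag_context": "brief",
    "joins": "record",
    "lineage": "record",
    "replay": "record",
    "warehouse": "record",
}


def pick_view(manifest: dict, job: str) -> Optional[str]:
    """Return the view name for a job, or None if that view is not ready yet."""
    view = VIEW_FOR_JOB[job]
    return view if manifest.get(f"{view}_ready", False) else None


# Readiness flags as they appear on the sample Manifest.
manifest = {"record_ready": True, "signal_ready": True, "brief_ready": True}
print(pick_view(manifest, "alerts"))
```

Returning `None` for a view that is not ready lets a caller retry later instead of reading a partial object; whether Synorb expects that pattern is an assumption.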

03 / Streams + Source Channels

What produces a Manifest?

Source Channels are the exact surfaces Synorb watches. Streams are the canonical rollups agents subscribe to. The Manifest is what gets written when those watched surfaces publish.

Source Channel

The publishing surface

Blogs, filings, podcasts, feeds, reports, social posts, data releases, research pages, and other surfaces.

google-cloud odd-lots-podcast bls-consumer-price-index

Stream

The canonical subscription

A saved query or entity rollup that bundles related channels under a durable object agents can follow.

Alphabet Odd Lots BLS CPI

Manifest

The machine object

Signal, Brief, and Record views with stable IDs, provenance, lineage, domain classification, and typed tags.

manifest_id record_id stream_ids
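One way to picture the relationship is a Stream that holds a set of Source Channel slugs, with each published item routed to every Stream that includes its channel. The sketch below is a guess at that shape, not Synorb's implementation: the `Stream` class, the `route` function, and the `example-alphabet-stream` ID are hypothetical, while the `nvidia` stream ID and the channel slugs are taken from the samples on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """A durable subscription that bundles related Source Channels."""
    stream_id: str
    name: str
    channel_slugs: frozenset


def route(source_channel: str, streams: list) -> list:
    """Return the stream_ids of every Stream that bundles this channel."""
    return [s.stream_id for s in streams if source_channel in s.channel_slugs]


streams = [
    # ID and channel slug taken from the sample Manifest.
    Stream("17723038993561924", "nvidia", frozenset({"nvidia-newsroom-rss"})),
    # Hypothetical ID; channel slugs taken from the Alphabet example.
    Stream("example-alphabet-stream", "alphabet",
           frozenset({"google-cloud", "google-deepmind"})),
]

print(route("nvidia-newsroom-rss", streams))
```

The returned list corresponds to the `stream_ids` array on a Manifest: one publication can land in more than one Stream if several Streams bundle the same channel.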

04 / Primitives

Streams start from three primitive shapes.

Synorb does not treat every source as a loose feed. Streams resolve to primitives that agents can reason about: organizations, people, and datasets.

Primitive / Organization

Organizations

Companies, labs, banks, governments, publishers, funds, universities, agencies, and public institutions.

Alphabet · 6,028 /168h

Stripe · 553 /168h

Federal Reserve · 620 /168h

Sequoia Capital · 258 /168h

Primitive / Person

People

Founders, operators, researchers, investors, executives, policymakers, creators, and domain specialists.

Andrej Karpathy · 812 /168h

Paul Graham · 210 /168h

Dwarkesh Patel · 505 /168h

Sarah Guo · 243 /168h

Primitive / Dataset

Datasets

Filings, research, economic indicators, podcasts, corporate blogs, statistical releases, and more.

SEC EDGAR · 2,410 /168h

BLS CPI · 144 /168h

Weather Alerts · 390 /168h

FOMC Calendar · 116 /168h

05 / Catalog

A slice of the primitive catalog.

The matrix is intentionally broad: organizations, people, and datasets live in one catalog because agents need to join across them. A person can cite a company; a company can publish a data point; a dataset can move a market narrative.

Organization · Alphabet · 6,028 /168h
Organization · Stripe · 553 /168h
Organization · Nvidia · 317 /168h
Organization · Goldman Sachs · 229 /168h
Organization · KPMG · 852 /168h
Organization · Databricks · 366 /168h
Organization · Federal Reserve · 620 /168h
Organization · Cleveland Fed · 311 /168h
Organization · Sequoia Capital · 258 /168h
Organization · Oaktree Capital · 499 /168h
Organization · Salesforce · 546 /168h
Organization · Cloudflare · 397 /168h

Person · Andrej Karpathy · 812 /168h
Person · Paul Graham · 210 /168h
Person · Dwarkesh Patel · 505 /168h
Person · Sarah Guo · 243 /168h
Person · Wilfred Frost · 585 /168h
Person · David Rubenstein · 269 /168h
Person · Jensen Huang · 178 /168h
Person · Jerome Powell · 441 /168h
Person · Patrick O'Shaughnessy · 329 /168h
Person · Marc Andreessen · 198 /168h
Person · Dan Luu · 117 /168h
Person · Julia Evans · 94 /168h

Dataset · SEC EDGAR · 2,410 /168h
Dataset · BLS CPI · 144 /168h
Dataset · Employment Situation · 304 /168h
Dataset · JOLTS · 188 /168h
Dataset · Producer Price Index · 221 /168h
Dataset · Weather Alerts · 390 /168h
Dataset · Weather Forecast · 264 /168h
Dataset · Sports Odds · 506 /168h
Dataset · FOMC Calendar · 116 /168h
Dataset · Earnings Calls · 712 /168h
Dataset · Vehicle Data · 86 /168h
Dataset · Auction Results · 139 /168h

06 / Example Stream

A Stream bundles every channel an entity publishes.

Alphabet is one Stream. It includes developer, research, cloud, AI, product, and corporate channels. An agent subscribes to the Stream and receives Manifests from every linked Source Channel.

Example Stream

Alphabet

home_domain: engineering-technology · source_type: organization · 14 source channels

Claims / 168h

6,028

Google Cloud · google-cloud · 428

Google DeepMind · google-deepmind · 900

AI · google-developers-ai · 389

Cloud · google-developers-cloud · 77

Mobile · google-developers-mobile · 808

Web · google-developers-web · 226

Channels

14

All Alphabet-published surfaces.

Claims / 168h

6,028

Atomic assertions extracted in the last 7 days.

Avg / Channel

430

Per-channel claim rate.

Resolution

<1 min

Capture-to-Manifest latency.
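The tile figures can be checked with a few lines of arithmetic. The counts below are copied from the Alphabet example; the variable names are illustrative. The displayed average of 430 is consistent with truncating 6,028 / 14, though whether the page truncates or rounds is an assumption.

```python
# Figures copied from the Alphabet example Stream.
total_claims = 6028      # Claims / 168h
channel_count = 14       # Channels

# The six channels listed by name, with their 168h claim counts.
listed_channels = {
    "google-cloud": 428,
    "google-deepmind": 900,
    "google-developers-ai": 389,
    "google-developers-cloud": 77,
    "google-developers-mobile": 808,
    "google-developers-web": 226,
}

avg_per_channel = total_claims / channel_count        # about 430.57
listed_total = sum(listed_channels.values())          # claims from the six listed channels
unlisted_total = total_claims - listed_total          # claims from the other eight channels
unlisted_count = channel_count - len(listed_channels)

print(f"avg per channel: {avg_per_channel:.2f}")
print(f"listed: {listed_total}, unlisted: {unlisted_total} across {unlisted_count} channels")
```

The six named channels account for 2,828 of the 6,028 claims, so the remaining eight channels contribute 3,200.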

07 / Home Domains

The top-level ontology for every Manifest.

Every Manifest has one home_domain: the primary place an agent should file it. Cross-domains preserve the secondary context, but the home domain keeps the graph stable.

economics-business-work

Capital, companies, labor, and strategy

Markets, management, financial institutions, employment, company updates, allocation, and business operations.

Examples: earnings, bank research, VC, podcasts, corporate news

engineering-technology

Software, systems, AI, and infrastructure

Engineering work, platforms, developer tools, AI research, cloud, semiconductors, cybersecurity, and technical strategy.

Examples: engineering blogs, AI labs, cloud channels

society-law-government

Policy, institutions, law, and civic systems

Public institutions, legislation, regulation, elections, courts, agencies, civic infrastructure, and government action.

Examples: Fed policy, regulators, government releases

health-medicine

Medicine, care, biology, and public health

Healthcare delivery, biotech, clinical research, public health, pharmaceuticals, medical devices, and life sciences.

Examples: hospital systems, FDA updates, biotech reports

life-environment

Climate, energy, ecology, and living systems

Environment, agriculture, energy transition, weather impacts, climate systems, conservation, and resource use.

Examples: weather alerts, energy reports, climate research

physical-sciences-mathematics

Formal and physical sciences

Physics, chemistry, mathematics, materials, measurement, scientific discovery, and quantitative research.

Examples: research papers, lab updates, science datasets

arts-culture-entertainment

Culture, media, sports, and creative work

Film, television, music, games, sports, cultural industries, media production, and entertainment markets.

Examples: sports odds, media companies, creator channels

people-biography-history

People and institutions over time

Careers, biographies, leadership changes, historical narratives, institutional memory, and individual influence.

Examples: person streams, executive moves, interviews

language-literature

Writing, language, and interpretation

Books, essays, rhetoric, translation, language systems, literary work, and communication as a domain.

Examples: essays, publishing, linguistic research

everyday-life-practical-knowledge

Practical life and applied know-how

Consumer behavior, work practices, education, how-to knowledge, local decisions, and usable everyday context.

Examples: guides, education, consumer updates

places-geography

Countries, cities, regions, and place

Geographic context, regional economies, urban systems, geopolitics, demographics, and physical place.

Examples: regional data, city reports, country analysis

universe-earth

Earth systems and planetary scale

Space, astronomy, planetary science, earth systems, geophysics, and phenomena beyond local human systems.

Examples: astronomy, geology, planetary datasets
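A consumer could sanity-check a `domain_classification` object against the twelve slugs above. This is a sketch under two assumptions that the page does not state outright: that `cross_domains` draws from the same twelve slugs (the sample Manifest is consistent with this), and that the home domain is not repeated in `cross_domains`. The function name `classification_problems` is hypothetical.

```python
# The twelve home_domain slugs listed in this section.
HOME_DOMAINS = frozenset({
    "economics-business-work",
    "engineering-technology",
    "society-law-government",
    "health-medicine",
    "life-environment",
    "physical-sciences-mathematics",
    "arts-culture-entertainment",
    "people-biography-history",
    "language-literature",
    "everyday-life-practical-knowledge",
    "places-geography",
    "universe-earth",
})


def classification_problems(dc: dict) -> list:
    """Return a list of human-readable problems; empty means the object looks valid."""
    problems = []
    home = dc.get("home_domain")
    cross = dc.get("cross_domains", [])
    if home not in HOME_DOMAINS:
        problems.append(f"unknown home_domain: {home!r}")
    for slug in cross:
        # Assumption: cross_domains uses the same slug vocabulary.
        if slug not in HOME_DOMAINS:
            problems.append(f"unknown cross_domain: {slug!r}")
    # Assumption: the home domain is not repeated as a cross-domain.
    if home in cross:
        problems.append("home_domain repeated in cross_domains")
    return problems


# domain_classification copied from the sample Manifest.
sample = {
    "home_domain": "engineering-technology",
    "cross_domains": [
        "economics-business-work",
        "everyday-life-practical-knowledge",
        "physical-sciences-mathematics",
    ],
}
print(classification_problems(sample))
```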

Free Credentials Included

Get the complete manifest schema reference and 1,000 free manifests/month.

Schema PDF + free credentials delivered instantly with 1,000 manifests/mo on Daily Batch delivery.

Streams dispatch Manifests with structured claims.
