Nvidia noted that cost per token went from 20 cents on the older Hopper platform to 10 cents on Blackwell. Moving to Blackwell's native low-precision NVFP4 format further reduced the cost to just 5 cents, so a basic upgrade gave a 4x improvement in cost per token while maintaining the accuracy that customers expect.
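The 4x figure follows directly from the quoted costs. As a minimal sketch of that arithmetic (the dictionary and the `improvement_factor` helper are illustrative names, not anything published by Nvidia):

```python
# Costs in cents, as quoted above for each platform configuration.
costs_cents = {
    "Hopper": 20,
    "Blackwell": 10,
    "Blackwell + NVFP4": 5,
}


def improvement_factor(baseline: float, new: float) -> float:
    """Return how many times cheaper `new` is than `baseline`."""
    return baseline / new


baseline = costs_cents["Hopper"]
factors = {
    name: improvement_factor(baseline, cost)
    for name, cost in costs_cents.items()
}

for name, factor in factors.items():
    print(f"{name}: {costs_cents[name]} cents, {factor:.0f}x vs. Hopper")
```

Half of the gain comes from the hardware move (20 to 10 cents) and the other half from switching to NVFP4 (10 to 5 cents); the two 2x steps multiply to 4x.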
Nvidia outlined four industry deployments in a blog post showing how this combination of Blackwell infrastructure, NVFP4, optimized software stacks and open-source models delivers significant cost reductions. They break down like this:
- Healthcare: Tedious, time-consuming tasks like medical coding, documentation and managing insurance forms cut into the time doctors can spend with patients. Sully.ai helps address this problem by using AI agents to handle those routine tasks.
The problem is that Sully.ai's proprietary, closed-source models didn't scale well. So Sully.ai moved to Baseten's open-source Model API on Blackwell GPUs, using the NVFP4 data format, the TensorRT-LLM library and the Dynamo inference framework. The result was a 90% drop in inference costs, a 10x reduction compared with the prior closed-source implementation, while response times improved by 65% for critical workflows like generating medical notes.
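A 90% drop and a 10x reduction are the same claim stated two ways: if costs fall by 90%, 10% of the original cost remains, and 100 / 10 = 10. A small sketch of that conversion, using a hypothetical helper name:

```python
def reduction_factor(percent_drop: float) -> float:
    """Convert a percentage cost drop into an 'N times cheaper' factor."""
    if not 0 <= percent_drop < 100:
        raise ValueError("percent_drop must be in the range [0, 100)")
    # The share of the original cost that remains, in percent.
    remaining_percent = 100 - percent_drop
    return 100 / remaining_percent


factor = reduction_factor(90)
print(f"A 90% drop means costs are {factor:.0f}x lower")
```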
