데브허브 | DEVHUB | How Nvidia Reduced JSON Processing Costs by Over 80%
In this video, we're going to learn how Nvidia cut JSON processing time by 4x and reduced costs by over 80%.
Clarification on the parallel tokenization:
Let's start with the problem: character-by-character processing is slow because it is inherently sequential. The parser must track depth and string state while scanning every character in order, so no character can be interpreted until everything before it has already been parsed.
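To make the sequential bottleneck concrete, here is a minimal sketch (illustrative only, not Nvidia's code): a scanner that records open/close events must carry depth and in-string state forward through every single character.

```python
# Illustrative sketch: sequential JSON structural scan.
# Each step depends on state (depth, in_string) accumulated from ALL
# previous characters, so this loop cannot be parallelized naively.
doc = '{"inventory": {"quantity": 42}}'

depth = 0
in_string = False
events = []  # (position, kind, depth) for structural braces/brackets
for i, ch in enumerate(doc):
    if ch == '"' and (i == 0 or doc[i - 1] != '\\'):
        in_string = not in_string  # quotes flip string state
    elif not in_string:
        if ch in '{[':
            depth += 1
            events.append((i, 'open', depth))
        elif ch in '}]':
            events.append((i, 'close', depth))
            depth -= 1

print(events)
# [(0, 'open', 1), (14, 'open', 2), (29, 'close', 2), (30, 'close', 1)]
```

The key point: the `in_string` and `depth` variables form a chain of dependencies across the whole input, which is exactly what a GPU-friendly design has to break.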
In parallel tokenization, the parsing of the JSON is split into independent tasks, with each thread handling a specific portion of the structure. For $.inventory.quantity, one thread first identifies the depth and position of the "inventory" key and its associated object, while another thread begins looking for the "quantity" key inside the "inventory" object, based on pre-computed positions. These threads work simultaneously without waiting for each other to finish, allowing the GPU to efficiently extract the desired value by directly jumping to "quantity" at the correct nesting level.
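The two-stage idea can be sketched in a few lines of Python (a CPU simulation, not Nvidia's actual GPU kernels; the document names no code): stage 1 has workers tokenize independent slices of the input in parallel, recording absolute positions of structural characters, and stage 2 uses those precomputed positions to jump to the desired value instead of re-parsing every character.

```python
import json
from concurrent.futures import ThreadPoolExecutor

doc = '{"inventory": {"quantity": 42, "sku": "A1"}, "store": "north"}'

# Stage 1 (parallel): each worker scans one chunk and records the absolute
# positions of structural characters. Workers do not wait on each other,
# mimicking GPU threads tokenizing independent slices of the input.
def find_structural(args):
    start, chunk = args
    return [start + i for i, ch in enumerate(chunk) if ch in '{}[]:,"']

n_workers = 4
size = (len(doc) + n_workers - 1) // n_workers
chunks = [(i, doc[i:i + size]) for i in range(0, len(doc), size)]

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    positions = sorted(p for part in pool.map(find_structural, chunks) for p in part)

# Stage 2: with the token-position index precomputed, extraction for a path
# like $.inventory.quantity can jump straight to the right nesting level.
# (Here we lean on json.loads for brevity; a real engine walks the index.)
value = json.loads(doc)["inventory"]["quantity"]
print(value)  # 42
```

A real implementation also has to resolve chunk-boundary ambiguity (e.g. a chunk starting mid-string), which the production tokenizers handle with an extra reconciliation pass; that detail is omitted here for clarity.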
----
Also, apologies to earlier viewers of this video. I previously had a section called "Testing" which said: "Their test dataset consisted of 200k rows of JSON data across five columns, totaling 9.5GB uncompressed. The results were compelling. The processing time for their production workload decreased from 16.7 hours to just 3.8 hours." To clarify: the 200k rows were their test dataset, while the timing figures came from the retailer's production environment.
Timeline:
00:00:00 - Intro
00:00:21 - Background
00:01:38 - Issue with CPU processing
00:02:18 - Rapid Accelerator
00:03:06 - Issues with GPU processing
00:05:04 - Fixes / optimization
00:06:55 - Future improvements
00:07:11 - Ending
Nvidia resources:
https://developer.nvidia.com/blog/acc...
https://docs.nvidia.com/cuda/cuda-c-p...
https://docs.nvidia.com/spark-rapids/...
Apache Spark:
https://aws.amazon.com/what-is/apache...
Videos:
• Hanging up flip phone (VINE)
Music credit:
♪ [non copyright music] Lofi Type Beat - Saturday Morning | aesthetic lofi music / Lofiru