The explosive growth of data and the exponential growth of machine learning (ML) and artificial intelligence (AI) have converged to create a new economy with its own development and implementation challenges. Massive quantities of data are usually stored in a low-cost data lake, where high-performance AI compute resources such as GPUs cannot access the data efficiently. In this report, we present a novel solution in which data science practitioners implement a data hub and, with one click, create a cache of datasets close to their compute resources, wherever those resources are located. As a result, AI practitioners can run high-performance model training more easily, and a new dataset version hub improves collaboration.