NetApp Solutions

= Jupyter notebooks for reference
:hardbreaks:
:nofooter:
:icons: font
:linkattrs:
:imagesdir: ./../media/

link:aks-anf_dataset_and_model_versioning_using_netapp_dataops_toolkit.html[Previous: Dataset and Model Versioning using NetApp DataOps Toolkit.]

There are two Jupyter notebooks associated with this technical report:

* link:https://nbviewer.jupyter.org/github/NetAppDocs/netapp-solutions/blob/main/media/CTR-PandasRF-collated.ipynb[*CTR-PandasRF-collated.ipynb.*] This notebook loads Day 15 of the Criteo Terabyte Click Logs dataset, processes and formats the data into a Pandas DataFrame, trains a Scikit-learn random forest model, performs prediction, and calculates accuracy.

* link:https://nbviewer.jupyter.org/github/NetAppDocs/netapp-solutions/blob/main/media/criteo_dask_RF.ipynb[*criteo_dask_RF.ipynb.*] This notebook loads Day 15 of the Criteo Terabyte Click Logs dataset, processes and formats the data into a Dask cuDF, trains a Dask cuML random forest model, performs prediction, and calculates accuracy.

By leveraging multiple worker nodes with GPUs, this distributed approach to data processing and model training is highly efficient: the more data you process, the greater the time savings compared with a conventional ML approach. You can deploy this notebook in the cloud, on-premises, or in a hybrid environment where your Kubernetes cluster spans compute and storage in different locations, as long as your networking setup allows the free movement of data and model distribution.

link:aks-anf_conclusion.html[Next: Conclusion.]
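The first notebook's workflow (load into a DataFrame, train a random forest, predict, score) can be sketched as follows. This is a minimal illustration, not the notebook itself: a small synthetic DataFrame stands in for Day 15 of the Criteo Terabyte Click Logs, and the column names are assumptions.

```python
# Sketch of the CTR-PandasRF-collated.ipynb workflow:
# DataFrame -> scikit-learn random forest -> prediction -> accuracy.
# A synthetic frame stands in for the Criteo Day 15 data (assumption).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_rows = 1000

# Synthetic stand-in: a click label plus a few numeric features,
# loosely mirroring the click-log layout (hypothetical column names).
df = pd.DataFrame({
    "label": rng.integers(0, 2, n_rows),
    **{f"num_{i}": rng.normal(size=n_rows) for i in range(5)},
})

X = df.drop(columns="label")
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train, predict, and calculate accuracy, as the notebook does.
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {acc:.3f}")
```

The second notebook follows the same steps but swaps Pandas for Dask cuDF and scikit-learn for Dask cuML, so the fit and predict calls are distributed across the GPU worker nodes.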