
= Jupyter notebooks for reference
:hardbreaks:
:nofooter:
:icons: font
:linkattrs:
:imagesdir: ./../media/

link:aks-anf_dataset_and_model_versioning_using_netapp_dataops_toolkit.html[Previous: Dataset and Model Versioning using NetApp DataOps Toolkit.]

There are two Jupyter notebooks associated with this technical report:

* link:https://nbviewer.jupyter.org/github/NetAppDocs/netapp-solutions/blob/main/media/CTR-PandasRF-collated.ipynb[*CTR-PandasRF-collated.ipynb.*] This notebook loads Day 15 of the Criteo Terabyte Click Logs dataset, processes and formats the data into a Pandas DataFrame, trains a Scikit-learn random forest model, performs prediction, and calculates accuracy.
* link:https://nbviewer.jupyter.org/github/NetAppDocs/netapp-solutions/blob/main/media/criteo_dask_RF.ipynb[*criteo_dask_RF.ipynb.*] This notebook loads Day 15 of the Criteo Terabyte Click Logs dataset, processes and formats the data into a Dask cuDF, trains a Dask cuML random forest model, performs prediction, and calculates accuracy. By leveraging multiple worker nodes with GPUs, this distributed approach to data processing and model training is highly efficient: the more data you process, the greater the time savings compared with a conventional ML approach. You can deploy this notebook in the cloud, on-premises, or in a hybrid environment where your Kubernetes cluster spans compute and storage in different locations, as long as your networking setup allows the free movement of data and model distribution.

link:aks-anf_conclusion.html[Next: Conclusion.]
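To illustrate the load-train-predict-score workflow that the first notebook follows, here is a minimal sketch in the style of CTR-PandasRF-collated.ipynb. It substitutes a small synthetic DataFrame for the Criteo Day 15 file; the column names, row count, and model parameters are illustrative assumptions, not the notebook's actual values.

```python
# Sketch of the Pandas + Scikit-learn random forest workflow (synthetic data).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_rows = 1000  # illustrative; Day 15 of the Criteo logs has billions of rows

# Criteo rows carry a click label, 13 numeric features, and 26 categorical
# features; we fabricate a tiny stand-in DataFrame with the same layout.
df = pd.DataFrame(rng.integers(0, 100, size=(n_rows, 13)),
                  columns=[f"num_{i}" for i in range(13)])
for i in range(26):
    df[f"cat_{i}"] = rng.integers(0, 10, size=n_rows)  # integer-encoded categories
df["label"] = rng.integers(0, 2, size=n_rows)          # click / no-click

# Split, train a random forest, predict, and calculate accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="label"), df["label"], test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(f"accuracy: {accuracy_score(y_test, pred):.3f}")
```

The criteo_dask_RF.ipynb notebook follows the same pattern, but swaps `pandas` for `dask_cudf` and the Scikit-learn estimator for `cuml.dask.ensemble.RandomForestClassifier`, so the DataFrame and the training work are partitioned across the GPU worker nodes.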