Collect Inference Metrics from Triton Inference Server
The Triton Inference Server provides Prometheus metrics indicating GPU and request statistics.
By default, these metrics are available at http://[triton_inference_server_IP]:8002/metrics.
The Triton Inference Server IP is the LoadBalancer IP that was recorded earlier.
The metrics are available only by querying this endpoint; they are not pushed or published to any remote server.
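Because the metrics must be pulled from the endpoint, a client has to issue an HTTP GET and read the Prometheus text format itself. The following is a minimal sketch of one way to do that using only the Python standard library. The function names `fetch_metrics` and `parse_metrics` are hypothetical helpers introduced for illustration. The parser assumes each sample line has the form `name{labels} value` with no trailing timestamp.

```python
import urllib.request


def fetch_metrics(host, port=8002, timeout=5.0):
    """Return the raw text served at http://<host>:<port>/metrics.

    `host` is assumed to be the LoadBalancer IP recorded earlier.
    """
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Parse Prometheus text-format output into (series, value) pairs.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    Assumption: no sample line carries a trailing timestamp.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        series, _, value = line.rpartition(" ")
        try:
            samples.append((series, float(value)))
        except ValueError:
            # Skip lines that do not end in a numeric value.
            continue
    return samples


# Example usage (replace the placeholder with the LoadBalancer IP):
#   for series, value in parse_metrics(fetch_metrics("<LoadBalancer IP>")):
#       print(series, value)
```

In practice, a Prometheus server is usually configured to scrape this endpoint on an interval; the sketch above is only meant for quick manual inspection.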