RTB Model Serving Platform

Factored built an ML serving platform for real-time bidding (RTB) that scaled throughput 20X, cut latency to as low as 10ms, and improved reliability while reducing cost.

Key Takeaways:

Handling the Volume of Demand.

The existing ML ad serving platform was limited to 50,000 requests per second and struggled with high latency. It also needed multi-framework support, robust monitoring, and data quality assurance in both the serving and feature store pipelines.

Using ML for Operational Efficiency & Insights.

  • We used NVIDIA Triton to serve models built with multiple frameworks, such as PyTorch and TensorFlow, for flexible deployment.
  • To enhance reliability, we integrated Prometheus and Grafana for real-time monitoring and improved error handling, automated recovery, and rate limiting to prevent overload.
  • We optimized performance through Grafana load testing, Golang MLServing enhancements, and Aerospike cache integration.

Scaling Throughput 20X to 1M Requests Per Second.

  • Latency as low as 10ms.
  • Reduced cost and better performance.
  • More reliable and transparent ML operations.
