Join Marcus Chen (Principal Platform Engineer · Databricks) for a free live session.

Marcus Chen
Principal Platform Engineer · Databricks
⭐ 4.9 / 5
Built for engineers who already own Kubernetes: six weeks across GPU Operator, vLLM, KServe, Triton, Kubeflow, and MLflow — the full AI infrastructure stack taught from the infra side. No ML prerequisites, no 15-course sprawl — just what your job description actually requires.
Design a GPU cluster architecture with Terraform node pools — covering inference economics, resource management, and cost trade-offs across cloud environments.
Deploy the NVIDIA GPU Operator on Kubernetes with MIG partitioning and multi-tenant scheduling — verified running in a live lab cluster.
Deploy Mistral-7B behind vLLM and KServe, optimized with Triton — benchmarked for P50/P99 latency and tokens/sec against production targets.
Build a Kubeflow pipeline wired to MLflow, with Evidently drift detection and automated retraining triggers — production-ready from day one.
This sprint is designed for:
Cloud engineers and architects who need to incorporate GPU orchestration and MLOps into their cloud environments to meet the demands of new AI projects.
DevOps and platform engineers who are tasked with deploying and managing AI models in production but lack the specific skills for GPU and LLM operations.
ML engineers who can train and fine-tune models but need the infrastructure layer — GPU scheduling, inference serving, and MLOps pipelines — to ship to production without depending on a separate platform team.
6 weeks · 3 sessions per week
Leave with real work to show, not just a certificate.
A detailed architecture plan for AI infrastructure that includes GPU cluster configurations and inference economics. This plan will serve as a foundational document for deploying AI workloads at scale.
A comprehensive manual for deploying and optimizing LLM inference systems with KServe and Triton, focused on real-world performance metrics: a reusable guide for taking LLMs to production.
A detailed pipeline configuration document using Kubeflow and Argo, covering model versioning and drift detection. A blueprint for implementing robust MLOps practices in any organization.

Marcus Chen
Principal Platform Engineer · Databricks
⭐ 4.9 / 5
Marcus Chen is the Principal Platform Engineer for AI Infrastructure at Databricks, where he runs GPU cluster operations across 2,000+ nodes on AWS and Azure and owns the LLM inference platform serving production workloads. Before Databricks, he spent five years as a Senior SRE at Google Cloud. He teaches from the infra side — not the ML side.
⭐⭐⭐⭐⭐
"The GPU orchestration knowledge Marcus shared was instrumental in optimizing our AI cluster management. Implementing it saved us substantial costs."
Samantha Lee
Cloud Engineer · Cloudflare
⭐⭐⭐⭐⭐
"Deploying LLMs with KServe and Triton was a game-changer for our team. The real-world exercises made it easy to apply right away."
Daniel Hughes
DevOps Engineer · Rippling
⭐⭐⭐⭐⭐
"The MLOps pipeline we built during Week 4 is now the backbone of our AI operations. It's streamlined our deployment process significantly."
Jennifer Tran
SRE · Brex
All sessions are instructor-led and live. Recordings available within 24 hours.
SUNDAY
9:00 AM PDT
Live Class: In-depth exploration of AI infrastructure architecture and GPU orchestration techniques.
WEDNESDAY
6:00 PM PDT
Lab Session: Hands-on lab to apply the week's frameworks and tools, addressing any blockers.
THURSDAY
6:00 PM PDT
Build & Ship: Execute and peer-review deliverables, focusing on real-world applicability.
with Marcus Chen · Principal Platform Engineer, Databricks
What you'll walk away with:
🎁 Bonus for attendees:
Get "The AI Infrastructure Starter Guide"
Includes a GPU orchestration checklist, LLM deployment templates, and MLOps pipeline frameworks — ready to integrate with your systems.
Claim your free seat
Skills you can deploy on Monday morning.