2025 - Present
Meta / Monetization GenAI
Building large-scale GenAI models and serving infrastructure for creative-generation systems. Work spans CUDA/Triton sparse-attention kernels for Diffusion Transformer inference, adaptive sparse attention for video diffusion models, vLLM-based benchmarking of dense and MoE LLM serving, analysis of disaggregated prefill, and RL/post-training infrastructure using VERL, GRPO, multi-objective RL, tool calling, and LLM-as-judge evaluation for agentic workflows.