Build robust AI frameworks to lower neural networks (PyTorch) to edge devices
Build robust AI infrastructure to train and fine-tune networks for Autopilot on large GPU clusters
Deploy state-of-the-art neural networks on heterogeneous compute to maximize network performance while minimizing latency
Collaborate closely with AI scientists and hardware teams to quantize and prune networks effectively and run inference at low precision
Design and implement custom GPU kernels (OpenCL/CUDA) for efficient training and post-processing of network output
Proficiency in Python and C++, including modern C++ (C++14/17/20)
Experience with PyTorch, TensorFlow, or other machine learning frameworks
Experience with Machine Learning, Deep Learning, and Computer Vision
Experience with Model Fine-Tuning: Quantization-Aware Training, Compression, Pruning
Experience with training and deploying neural networks for real-world AI
Experience with Computer Systems/Architecture
Experience with CUDA and/or OpenCL