Hybrid role in Shanghai, China for a senior-level Principal Software Engineer at Microsoft. This position focuses on optimizing GPU performance for large-scale AI models, enhancing inference engines, and pushing hardware limits for deep learning.
Skills / Requirements
- Ampere Tensor Cores
- C++
- CMake
- CUDA
- CUDA Programming
- FlashAttention
- GEMM
- Kernel Fusion
- LayerNorm
- Linux Kernel
- NCCL
- Nsight Systems
- NVIDIA GPU Architectures
- NVIDIA Hopper
- Pipeline Parallelism
- Python
- PyTorch
- Quantization
- Tensor Parallelism
- Triton
- Vibe Coding
Why Apply
This role is ideal for developers who leverage AI tools to optimize GPU kernels using CUDA/Triton and enhance inference engines. You'll work with advanced acceleration techniques like Quantization and Kernel Fusion, making it a perfect fit for AI-native developers.
What You'll Be Doing
You will design and implement optimized GPU kernels and contribute to high-performance inference engines. Your work involves analyzing model performance, optimizing communication for multi-GPU setups, and ensuring software fully utilizes modern GPU architectures.
Working in Shanghai, China
- Shanghai is a leading tech hub in China, offering a vibrant job market for tech professionals.
- The city boasts a robust public transportation system, making commutes efficient and affordable.
- Shanghai's international community provides ample networking opportunities for career growth.
- The city's dynamic lifestyle and cultural diversity enhance work-life balance for professionals.
Pay and Career Growth
Microsoft offers a robust career trajectory with opportunities to work on cutting-edge AI infrastructure. The role promises exposure to advanced GPU technologies and systems programming, fostering both technical and professional growth.
Benefits and Perks
- 401k
- Competitive salary
- Equity
- Inclusive culture
Is This Role Right for You?
Good fit if you...
- Are experienced in CUDA/C++ kernel development and GPU software.
- Are proficient in optimizing deep learning operations and inference engines.
- Are comfortable working in a hybrid setup in Shanghai, China.
May not be for you if...
- Lack experience in GPU kernel development or CUDA programming.
- Are unfamiliar with AI model optimization and performance profiling.
- Prefer fully remote work arrangements.
Original Job Description
Overview
We are seeking an expert Senior GPU Engineer to join our AI Infrastructure team. In this role, you will architect and optimize the core inference engine that powers our large-scale AI models. You will be responsible for pushing the boundaries of hardware performance, reducing latency, and maximizing throughput for Generative AI and Deep Learning workloads.
You will work at the intersection of Deep Learning algorithms and low-level hardware, designing custom operators and building a highly efficient training/inference execution engine from the ground up.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Starting January 26, 2026, Microsoft AI (MAI) employees who live within a 50-mile commute of a designated Microsoft office in the U.S. or a 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week. This expectation is subject to local law and may vary by jurisdiction.
Responsibilities
Custom Operator Development: Design and implement highly optimized GPU kernels (CUDA/Triton) for critical deep learning operations (e.g., FlashAttention, GEMM, LayerNorm) to outperform standard libraries.
Inference Engine Architecture: Contribute to the development of our high-performance inference engine, focusing on graph optimizations, operator fusion, and dynamic memory management (e.g., KV Cache optimization).
Performance Optimization: Deeply analyze and profile model performance using tools like Nsight Systems/Compute. Identify bottlenecks in memory bandwidth, instruction throughput, and kernel launch overheads.
Model Acceleration: Implement advanced acceleration techniques such as Quantization (INT8, FP8, AWQ), Kernel Fusion, and continuous batching.
Distributed Computing: Optimize communication primitives (NCCL) to enable efficient multi-GPU and multi-node inference (Tensor Parallelism, Pipeline Parallelism).
Hardware Adaptation: Ensure the software stack fully utilizes modern GPU architecture features (e.g., NVIDIA Hopper/Ampere Tensor Cores, Asynchronous Copy).
Qualifications
Required Qualifications:
Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.
Professional Depth: 5+ years of experience in systems programming, HPC, or GPU software development, featuring at least 5 years of hands-on CUDA/C++ kernel development.
Architectural Mastery: Expertise in the CUDA programming model and NVIDIA GPU architectures (specifically Ampere/Hopper).
Deep understanding of the memory hierarchy (Shared Memory, L2 cache, Registers), warp-level primitives, occupancy optimization, and bank conflict resolution.
Familiarity with advanced hardware features: Tensor Cores, TMA (Tensor Memory Accelerator), and asynchronous copy.
Proven ability to navigate and modify complex, large-scale codebases (e.g., PyTorch internals, Linux kernel).
Experience with build and binding ecosystems: CMake, pybind11, and CI/CD for GPU workloads.
Performance Engineering: Mastery of NVIDIA Nsight Systems/Compute.
Ability to mathematically reason about performance using the Roofline Model, memory bandwidth utilization, and compute throughput.
Other Requirements:
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.
Engine & Framework Expertise: Working knowledge of state-of-the-art inference/training stacks: sglang, vLLM, TensorRT-LLM, DeepSpeed, or Megatron-LM.
Deep understanding of optimization patterns: PagedAttention, RadixAttention (Prefix Caching), continuous batching, and speculative decoding.
Operator & GEMM Optimization: Practical experience with CUTLASS, CuTe, or OpenAI Triton.
Expertise in high-performance linear algebra (GEMM) optimization, including tiling strategies, data layouts, and mixed-precision accumulation.
Distributed Systems: Proficiency in multi-GPU/multi-node scaling using NCCL and parallelism strategies (Tensor, Pipeline, and Sequence parallelism).
Vibe Coding & AI-Native Velocity: An AI-native mindset: Expert at using vibe coding tools to bypass boilerplate and accelerate the development lifecycle.
The technical intuition to architect systems rapidly, moving from “vibe” to “highly-optimized production code” with extreme velocity.
#MicrosoftAI
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations (https://careers.microsoft.com/v2/global/en/accessibility.html).