Serbia - AI Workloads Engineer
NextSilicon
- Software
- Serbia
Description
NextSilicon is reimagining high-performance computing (HPC & AI). Our compute solutions use intelligent, adaptive algorithms to vastly accelerate supercomputers, driving them into a new generation. We have developed a novel software-defined hardware architecture that delivers significant advances in both the HPC and AI domains.
At NextSilicon, everything we do is guided by three core values:
- Professionalism: We strive for exceptional results through unwavering dedication to quality and performance.
- Unity: Collaboration is key to success, so we foster a work environment where every employee feels valued and heard.
- Impact: We're passionate about developing technologies that make a meaningful impact on industries, communities, and individuals worldwide.
The AI Workloads team is responsible for modeling and enabling end-to-end AI workflows on NextSilicon’s next-generation hardware platforms. As an AI Workloads Engineer in Belgrade, you’ll build workflow modeling infrastructure, run and adapt open-source AI systems, and use real workloads to drive performance improvements from chip design through production.
Requirements
- 4+ years of experience in software engineering.
- Strong Python and PyTorch development experience.
- Solid understanding of LLMs and modern inference workflows (e.g., KV cache, paged attention, speculative/assisted decoding, batching/scheduling).
- Experience running, profiling, and instrumenting open-source AI inference systems (e.g., vLLM or similar).
- Proficiency in C++ for developing software that models or interacts with hardware execution behavior (latency, dataflow, memory access patterns).
- Experience with distributed inference and collectives (e.g., NCCL) and with parallelism strategies such as tensor, pipeline, and expert parallelism (TP/PP/EP) is an advantage.
- Experience with dynamic batching systems (e.g., vLLM, TensorRT-LLM) is an advantage.
- Familiarity with MLPerf Inference benchmarks and methodology (Server/Offline scenarios, latency constraints, request arrival patterns) is an advantage.
- Experience writing custom kernels (e.g., in CUDA, Triton, or similar) is an advantage.
- Background in performance analysis, simulation, compiler/runtime profiling, or workload modeling is an advantage.
Responsibilities
- Model and analyze end-to-end AI workflows (e.g., assisted decoding, dynamic batching, dynamic KV cache, MLPerf-like scenarios) on NextSilicon platforms, from simulation through production.
- Run and adapt open-source AI workloads, collecting and analyzing metrics such as latency, throughput, and traversal or arrival statistics.
- Use SDK and framework-integration tools to profile full-stack behavior, identify performance bottlenecks, and drive improvements with compiler, runtime, and hardware design teams.
- Prototype custom kernels or runtime components when needed to enable or optimize new AI workflows on NextSilicon hardware.