Serbia - AI Workloads Engineer

Next Silicon

Software Engineering, Data Science
Serbia
Posted on Jan 6, 2026

Description

NextSilicon is reimagining high-performance computing (HPC & AI). Our compute solutions use intelligent adaptive algorithms to vastly accelerate supercomputers, driving them into a new generation. We have developed a novel software-defined hardware architecture that delivers significant advances in both the HPC and AI domains.

At NextSilicon, everything we do is guided by three core values:

  • Professionalism: We strive for exceptional results through unwavering dedication to quality and performance.
  • Unity: Collaboration is key to success. That's why we foster a work environment where every employee feels valued and heard.
  • Impact: We're passionate about developing technologies that make a meaningful impact on industries, communities, and individuals worldwide.

The AI Workloads team is responsible for modeling and enabling end-to-end AI workflows on NextSilicon’s next-generation hardware platforms. As an AI Workloads Engineer in Belgrade, you’ll build workflow modeling infrastructure, run and adapt open-source AI systems, and use real workloads to drive performance improvements from chip design through production.

Requirements

  • 4+ years of experience in software engineering.
  • Strong Python and PyTorch development experience.
  • Solid understanding of LLMs and modern inference workflows (e.g., KV cache, paged attention, speculative/assisted decoding, batching/scheduling).
  • Experience running, profiling, and instrumenting open-source AI inference systems (e.g., vLLM or similar).
  • Proficiency in C++ for developing software that models or interacts with hardware execution behavior (latency, dataflow, memory access patterns).
  • Experience with distributed inference and collectives (e.g., NCCL) and parallelism strategies (TP/PP/EP) is an advantage.
  • Experience with dynamic batching systems (e.g., vLLM, TensorRT-LLM) is an advantage.
  • Familiarity with MLPerf Inference benchmarks and methodology (Server/Offline scenarios, latency constraints, request arrival patterns) is an advantage.
  • Experience programming custom kernels (e.g., CUDA, Triton, or similar) is an advantage.
  • Background in performance analysis, simulation, compiler/runtime profiling, or workload modeling is an advantage.
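By way of illustration (not part of the posting itself), the paged-KV-cache idea named in the requirements, as popularized by vLLM, can be sketched in a few lines: each sequence's logical token positions map onto fixed-size physical blocks, so cache memory is allocated at block granularity instead of being reserved up front for the maximum sequence length. All class and method names below are hypothetical.

```python
# Minimal paged KV-cache block-allocator sketch (hypothetical names).
# Real systems store the actual key/value tensors in the blocks; this
# sketch only tracks the logical-to-physical block mapping.

BLOCK_SIZE = 16  # tokens per physical block (illustrative choice)

class PagedKVAllocator:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))
        # seq_id -> ordered list of physical block ids (the "block table")
        self.block_tables: dict[int, list[int]] = {}

    def append_token(self, seq_id: int, position: int) -> tuple[int, int]:
        """Return (physical_block, offset) where this token's KV entry lives."""
        table = self.block_tables.setdefault(seq_id, [])
        block_idx, offset = divmod(position, BLOCK_SIZE)
        if block_idx == len(table):  # sequence crossed into a new block
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            table.append(self.free_blocks.pop())
        return table[block_idx], offset

    def free_sequence(self, seq_id: int) -> None:
        """Release all blocks of a finished sequence back to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
```

A 20-token sequence thus occupies two 16-token blocks, and freeing it returns both blocks to the pool, which is what makes dynamic batching of many variable-length sequences memory-efficient.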

Responsibilities

  • Model and analyze end-to-end AI workflows (e.g., assisted decoding, dynamic batching, dynamic KV cache, MLPerf-like scenarios) on NextSilicon platforms, from simulation through production.
  • Run and adapt open-source AI workloads, collecting and analyzing metrics such as latency, throughput, and traversal or arrival statistics.
  • Use SDK and framework-integration tools to profile full-stack behavior, identify performance bottlenecks, and drive improvements with compiler, runtime, and hardware design teams.
  • Prototype custom kernels or runtime components when needed to enable or optimize new AI workflows on NextSilicon hardware.
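For a flavor of the metric collection described above, a minimal sketch of reducing raw per-request latencies into the figures that matter for inference serving (tail-latency percentiles and request throughput) might look like this. The function name is hypothetical, and nearest-rank percentiles are one of several reasonable conventions.

```python
# Hypothetical sketch: summarize per-request latencies collected over a
# measurement window into serving-oriented metrics.
import statistics

def summarize_latencies(latencies_ms: list[float], window_s: float) -> dict:
    ordered = sorted(latencies_ms)

    def pct(p: float) -> float:
        # nearest-rank percentile on the sorted samples
        idx = min(len(ordered) - 1, int(p / 100 * len(ordered)))
        return ordered[idx]

    return {
        "p50_ms": pct(50),
        "p99_ms": pct(99),
        "mean_ms": statistics.fmean(ordered),
        "throughput_rps": len(ordered) / window_s,
    }
```

Tracking p99 alongside the mean matters because a single slow request (e.g., a long prefill or a cache miss) can be invisible in the average yet dominate the tail that latency-constrained scenarios such as MLPerf Server are judged on.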