Build a Career Without Limits

Join our mission to scale AI infrastructure globally. Explore opportunities to innovate with cutting-edge middleware technology.


Senior Backend Engineer (Agentic AI/LLM)

Develop high-performance backend systems for AI and LLM applications.

Bengaluru, India
On-site
Full-Time
2025-11-07

About CAI Stack

CAI Stack delivers modular AI infrastructure designed for enterprises to efficiently build, scale, and deploy ML and DL applications. Our platform empowers organizations to launch industry-specific AI products using our robust, scalable technology, trusted by global leaders for mission-critical deployments.

Role Overview

We are looking for a hands-on LLM Engineer passionate about building intelligent, high-performance applications. You will design, implement, and maintain agentic and RAG-based solutions, ensuring speed, accuracy, and scalability. Your work will directly impact how CAI Stack leverages large language models for innovative business solutions.

Key Responsibilities

Develop end-to-end RAG applications, from data ingestion to real-time delivery (a minimal sketch of this loop follows the list below).

Optimize retrieval pipelines to handle millions of rows with minimal latency.

Design and implement intelligent agents capable of complex decision-making using LLMs.

Enhance product data by extracting descriptive tags from images.

Prepare and clean catalog data for integration with LLM applications.

Collect and process product and review data from social platforms.

Set up hyperparameter optimization (HPO) workflows for machine learning pipelines.

Fine-tune and adapt models to improve performance on domain-specific tasks.

Collaborate with data scientists and engineers to deploy LLM applications into production.

Implement MLOps best practices for continuous integration, deployment, and monitoring.

Stay updated on advancements in LLM architectures, agentic frameworks, and RAG methodologies.
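For illustration only, here is a minimal sketch of the retrieval-augmented generation loop referenced in the first responsibility, assuming ChromaDB as the vector store and the OpenAI API as the generator (both named in the qualifications below); the collection name, sample rows, and model ID are placeholders, not a description of CAI Stack's actual stack.

```python
# Minimal RAG sketch: ingest catalog text, retrieve relevant chunks for a
# query, and ground the LLM answer in the retrieved context.
# Assumes `chromadb` and `openai` are installed and OPENAI_API_KEY is set;
# collection name, sample rows, and model ID are illustrative only.
import chromadb
from openai import OpenAI

client = OpenAI()
chroma = chromadb.Client()
collection = chroma.get_or_create_collection("product_catalog")

# Ingest a few catalog rows (IDs and text are placeholders).
collection.add(
    ids=["sku-1", "sku-2"],
    documents=[
        "Red cotton t-shirt, sizes S-XL",
        "Stainless steel water bottle, 1L",
    ],
)

def answer(question: str, k: int = 5) -> str:
    # Retrieve the k most similar catalog chunks for the question.
    hits = collection.query(query_texts=[question], n_results=k)
    context = "\n".join(hits["documents"][0])

    # Ask the model to answer strictly from the retrieved context.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```

In a production setting the same pattern would be extended with batch ingestion, reranking, and the latency targets described in the responsibilities above.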

Required Qualifications

Proven experience building and deploying agentic and RAG-based applications in production.

Ability to optimize retrieval pipelines for large datasets (3M+ rows).

Experience integrating agent tooling such as Trino, web browsers, and Python/Scala REPLs.

Hands-on experience building LLM agents across platforms such as Gemini, OpenAI, and Claude.

Experience with multimodal tasks such as extracting tags from product images.

Familiarity with frameworks like LangChain, LlamaIndex, Hugging Face Transformers, and AutoGen.

Practical knowledge of vector databases including ChromaDB, Pinecone, or Weaviate.

Experience fine-tuning open-source LLMs using LoRA or similar parameter-efficient techniques (see the sketch after this list).

Deployment experience with cloud platforms (AWS, GCP, Azure), containerization (Docker, Kubernetes), and inference servers (vLLM).

Expert-level proficiency in Python and ML frameworks like PyTorch or TensorFlow.
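As a rough illustration of the LoRA requirement above, the sketch below attaches low-rank adapters to an open-source causal LM using Hugging Face Transformers and PEFT (both named in the qualifications); the base checkpoint, target modules, and hyperparameters are placeholder assumptions, not a prescribed recipe.

```python
# Sketch of attaching LoRA adapters to an open-source causal LM with PEFT.
# Base model name, target modules, and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.1-8B"  # placeholder base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)  # used to prepare the training data
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    r=16,                 # low-rank dimension
    lora_alpha=32,        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Actual domain adaptation would pair this setup with a tokenized task-specific dataset and a training loop such as transformers.Trainer.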

Preferred Qualifications

Experience with distributed computing frameworks such as Spark or Ray for processing large-scale datasets.

Advanced degree (M.S. or Ph.D.) in Computer Science, Engineering, or a related quantitative field.

Apply for This Job
