Join our mission to deliver AI infrastructure at scale. Explore opportunities to work with advanced middleware tech stacks.
Build scalable backend systems supporting AI/LLM applications.
CAI Stack provides cutting-edge pluggable building blocks for the entire AI stack. These components enable enterprises to build and scale machine learning (ML) and deep learning (DL) solutions, addressing diverse use cases. Our platform is utilized by some of the world's largest organizations to deploy vertical-specific products built on our proprietary AI infrastructure.
We are seeking a hands-on LLM Engineer with a passion for building high-performance, intelligent applications. The ideal candidate will have extensive experience designing, developing, and deploying agentic and RAG-based applications in a production environment. You will be a key contributor, responsible for delivering lightning-fast, accurate, and scalable solutions that unlock the power of large language models for our business.
Design, develop, and deploy end-to-end RAG-based applications, from data ingestion to real-time serving.
Optimize RAG pipelines to minimize retrieval latency on large datasets of around 3 million rows.
Build and implement intelligent agents capable of complex reasoning and autonomous decision-making using LLMs.
Extract descriptive tags from product images to enrich product data.
Clean and prepare catalog data for use in LLM applications.
Scrape product and review data from social media platforms.
Build configurations for hyperparameter optimization (HPO) of machine learning pipelines.
Conduct model fine-tuning and domain adaptation to enhance model performance for specific business tasks.
Collaborate with data scientists and machine learning engineers to integrate LLM applications into existing production systems.
Implement and manage MLOps best practices for continuous integration, deployment, and monitoring of LLM applications.
Stay up-to-date with the latest advancements in LLM architectures, agentic frameworks, and RAG techniques.
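To illustrate the retrieve-then-prompt flow at the core of the RAG responsibilities above, here is a minimal, self-contained sketch. It uses a toy bag-of-words similarity in place of a real embedding model and vector database, and the product corpus, queries, and function names are all hypothetical, for illustration only:

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a production pipeline would use a
    # sentence encoder and a vector database instead.
    return Counter(w.strip(".,?!") for w in text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, contexts: list[str]) -> str:
    # Retrieved passages are injected as grounding context for the LLM call.
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )

# Hypothetical catalog snippets standing in for a real document store.
corpus = [
    "The Aurora X1 blender has a 1200W motor.",
    "Free shipping applies to orders over $50.",
    "The Aurora X1 comes with a two-year warranty.",
]
query = "What warranty does the Aurora X1 have?"
contexts = retrieve(query, corpus)
prompt = build_prompt(query, contexts)
```

In a real deployment, `prompt` would be sent to the LLM, and retrieval latency at the 3M-row scale mentioned above is what approximate-nearest-neighbor indexes in vector databases are built to address.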
Hands-On Experience is a Must: Proven experience building and deploying agentic and RAG-based applications in a production setting.
Performance Optimization: Demonstrated ability to optimize RAG pipelines for low-latency queries on large-scale datasets (3M+ rows).
Tool-Based Setups: Experience with tool-based setups, including TrinoDB, browsers, web scrapers, and Python/Scala REPLs.
Agent Development: Experience building agents for multiple LLM setups like Gemini, OpenAI, and Claude.
Multimodal Experience: Experience in extracting descriptive tags from product images, indicating proficiency with multimodal models.
LLM & Agent Frameworks: Hands-on experience with industry-standard frameworks and libraries such as LangChain, LlamaIndex, Hugging Face Transformers, and AutoGen.
Vector Databases: Practical experience with vector databases like ChromaDB, Pinecone, or Weaviate.
Fine-Tuning: Hands-on experience with fine-tuning open-source LLMs using techniques like LoRA.
Deployment: Familiarity with modern deployment stacks and cloud platforms (e.g., AWS, GCP, Azure) and containerization technologies (Docker, Kubernetes). Experience with inference servers like vLLM and MLOps tools like MLflow or Weights & Biases.
Programming Proficiency: Expert-level proficiency in Python and its data science ecosystem (PyTorch, TensorFlow).
Experience with distributed computing frameworks (e.g., Spark, Ray) for handling large datasets.
Advanced degree (M.S. or Ph.D.) in a quantitative field such as Computer Science, Engineering, or a related discipline.
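The LoRA requirement above refers to training a small low-rank update on top of frozen pretrained weights. The sketch below shows only the underlying arithmetic with hypothetical toy dimensions, not the API of any fine-tuning library: the effective weight is W + (alpha / r) * B @ A, where only A and B are trained.

```python
import random

def matmul(X, Y):
    # Plain-Python matrix multiply for the toy example.
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

# Hypothetical toy sizes: a 4x6 frozen weight, rank-2 adapter.
d_out, d_in, r, alpha = 4, 6, 2, 16
random.seed(0)

W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
B = [[0.0] * r for _ in range(d_out)]  # trained; initialized to zero
A = [[random.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(r)]  # trained

scale = alpha / r
delta = matmul(B, A)
W_eff = [[W[i][j] + scale * delta[i][j] for j in range(d_in)]
         for i in range(d_out)]

# Because B starts at zero, W_eff initially equals W, so fine-tuning
# begins exactly from the pretrained model's behavior.
```

The payoff is the parameter count: the adapter trains d_out*r + r*d_in values (20 here) instead of the full d_out*d_in (24), and the gap widens dramatically at real model dimensions.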