[Remote] Lead AI Engineer
Note: This is a remote position open to candidates in the USA. EPAM Systems is seeking a Lead AI Engineer to design, build, and scale cutting-edge AI applications powered by large language models. In this role, you will partner with clients to deliver tailored LLM-driven solutions, architect agentic systems, and drive the adoption of emerging AI technologies across enterprise environments.
Responsibilities
- Design, implement and maintain end-to-end AI applications, including chatbots, Q&A platforms, agent workflows and other LLM-driven solutions
- Collaborate directly with clients to understand their needs, identify opportunities and recommend tailored AI/LLM solutions that drive business value
- Architect and optimize robust data pipelines, prompt strategies and datasets to ensure effective, accurate and scalable AI models
- Evaluate, monitor and refine AI system performance, ensuring outputs are accurate, secure, scalable and compliant with industry regulations and best practices
- Conduct research, design experiments and perform rapid prototyping to validate technical feasibility and demonstrate the business value of AI solutions
- Stay current with evolving LLM technologies, frameworks, protocols (such as MCP, A2A, ACP) and methodologies, continuously improving solution quality and client outcomes
- Design and implement agentic systems with frameworks such as LangChain, LangGraph and Semantic Kernel, integrating them with vector databases and advanced memory architectures
- Develop and maintain APIs and system integrations for production-grade AI applications, including enterprise system integration (CRM, ERP, databases)
- Deploy AI solutions at scale, considering performance, cost-efficiency, maintainability, observability and security (including guardrails and prompt injection prevention)
- Implement and monitor retrieval systems (keyword search, vector search, embeddings), ranking algorithms and agent evaluation frameworks
- Use MLOps/AIOps practices for agentic systems and ensure robust observability and monitoring of deployed solutions
- Clearly communicate complex technical concepts and AI strategies to both technical and non-technical stakeholders, and iterate on models based on user feedback
Skills
- Strong proficiency in at least one modern programming language (e.g., Python, Java, C#, Go); experience with web frameworks such as FastAPI is a plus
- Deep understanding of the AI application development lifecycle, including production deployment, system integration and rapid UI prototyping (Streamlit, Gradio or similar)
- Familiarity with major LLM platforms and APIs (OpenAI, Anthropic, Amazon Bedrock, Gemini) and related frameworks (LangChain, LangGraph, LlamaIndex, Strands Agents, etc.)
- Knowledge of advanced AI integration patterns (e.g., RAG, agent orchestration, tool calling), retrieval systems (keyword/vector search, embeddings) and ranking algorithms
- Experience deploying AI solutions at scale, with a focus on performance, cost-efficiency, maintainability, observability and security (including guardrails and prompt injection prevention)
- Proven ability to evaluate generative AI quality with retrieval/classification scores, LLM-based evaluation, agent evaluation metrics and A/B testing
- Experience with vector databases (Pinecone, Weaviate, ChromaDB, FAISS) and semantic/hybrid search
- Experience designing experiments, conducting A/B tests and iterating on models based on user feedback
- Experience with enterprise system integration (CRM, ERP, databases) and deployment to cloud AI platforms or on-premise solutions
- Experience with observability and monitoring tools/frameworks, and application of MLOps/AIOps practices for agentic systems
- Familiarity with emerging protocols (MCP, A2A, ACP) and advanced memory architectures
- Proven experience in AI engineering and delivery of ML-based solutions in production environments
- Strong problem-solving skills, attention to detail and ability to work independently and collaboratively
- Excellent communication, collaboration and interpersonal skills, with the ability to explain complex technical concepts to non-technical stakeholders
Benefits
- Career plan and real growth opportunities
- Unlimited access to LinkedIn learning solutions
- Constant training, mentoring, online corporate courses, eLearning and more
- English classes with a certified teacher
- Support for employee initiatives (Algorithms club, Toastmasters, Agile club and more)
- Enjoyable working environment (Gaming room, napping area, amenities, events, sport teams and more)
- Flexible work schedule and dress code
- Collaborate in a multicultural environment and share best practices from around the globe
- Hired directly by EPAM & 100% under payroll
- Statutory benefits (IMSS, INFONAVIT, 25% vacation bonus)
- Insurance: life and major medical expenses, with dental and vision coverage (for the employee and direct family members)
- 13% employee savings fund, capped at the legal limit
- Grocery coupons
- 30-day December bonus
- Employee Stock Purchase Plan
- 12 vacation days
- Official Mexican holidays, plus 5 extra holidays (Maundy Thursday, Good Friday, November 2nd, December 24th and December 31st)
- Monthly non-taxable amount for the electricity and internet bills
Company Overview