Job Description
We are seeking an exceptional AI Integration Engineer who operates at the intersection of development, operations, data, and systems engineering to build solutions for large-scale continuous data transformation and delivery. This role focuses specifically on building and maintaining data pipelines for both structured and unstructured data, enabling the development and deployment of AI/ML models that power our RAG-based document processing and insight generation systems.
Responsibilities
Data Infrastructure & Integration
Design and implement data integrations and ingestion processes for internal and external data sources
Build and maintain scalable data pipelines for ingesting, processing, and transforming unstructured data sources (customer feedback, documents, multimedia content)
Develop data models and mapping rules to transform raw data into actionable insights and structured outputs
Architect and implement semantic layers that integrate analytics data from multiple sources efficiently
AI/ML System Integration
Develop and maintain robust backend APIs and services supporting the entire prompt-to-answer workflow
Implement and optimize retrieval logic including vector search, hybrid search, and advanced information retrieval techniques
Manage document ingestion pipelines including parsing, OCR, chunking, and embedding generation
Support integration of various LLM providers (OpenAI, Azure AI, Anthropic) with internal business data sources
Infrastructure & Operations
Ensure reliability, scalability, and low latency of AI response generation systems
Implement data governance policies and procedures for responsible and ethical use of data in AI applications
Develop data quality monitoring and validation processes specifically for AI/ML datasets, including bias identification and mitigation
Build and maintain monitoring, alerting, and observability systems for AI infrastructure
Collaboration & Documentation
Collaborate with analytics and data science teams to understand requirements and deliver solutions
Work with data scientists to ensure data is available in the appropriate format and at sufficient quality for model training and deployment
Maintain comprehensive documentation including data models, mapping rules, and data dictionaries
Partner with internal business stakeholders, technology resources, and external vendors
Requirements
Education & Experience
Bachelor's degree in Computer Science, Engineering, or equivalent work experience
5+ years of experience in designing, building, and maintaining scalable data solutions for large-scale analytics
Proven ability to lead development projects from start to finish with demonstrated results
Technical Skills
Proficiency in Python, Java, or R, and experience with open-source frameworks for distributed processing (Hadoop, Spark)
Expert-level SQL and development experience with cloud database environments (Snowflake, Redshift, Databricks)
Hands-on experience with modern cloud data stack tools for code management, versioning (Git), CI/CD, and automation
Experience with orchestration tools (Apache Airflow) and monitoring & alerting systems
Data & AI Expertise
Strong understanding of data modeling, data warehousing, and ETL concepts
Experience with vector databases (Pinecone, Milvus, Weaviate, Chroma)
Proficiency in handling semi-structured and unstructured data formats (JSON, Parquet, text, images, audio, video)
Familiarity with AI/ML model development lifecycle and data requirements for training and deployment
Cloud & Infrastructure
Experience with cloud platforms (AWS, Azure, Google Cloud) and their AI/ML services
Knowledge of containerization and orchestration technologies (Docker, Kubernetes)
Understanding of API development and web standards (REST, GraphQL, gRPC, HTTP, JSON)
Preferred Qualifications
Master's degree in Computer Science, Engineering, or equivalent work experience
Experience with cloud-based AI/ML platforms and services
Knowledge of data augmentation techniques for improving AI/ML model performance
Experience with data labeling platforms (Amazon SageMaker Ground Truth, Labelbox)
Understanding of responsible AI principles and data privacy regulations (GDPR, CCPA)
Experience with data governance and observability tools (DataHub, Collibra)
Basic frontend development experience (HTML, CSS, JavaScript)
Tools & Technologies
Programming & Frameworks
Python, Java, R
Apache Spark, Apache Hadoop
FastAPI, Django, Flask
Data & AI Platforms
Snowflake, Redshift, Databricks
Pinecone, Milvus, Weaviate, Chroma
LangChain, LlamaIndex
OpenAI, Azure AI, Anthropic, Cohere
Cloud & Infrastructure
AWS, Azure, Google Cloud Platform
Docker, Kubernetes
Apache Airflow, Apache Kafka
Development Tools
Git, GitHub, GitLab
Jenkins, GitHub Actions
Jupyter Notebooks, Dataiku
Location
Remote