AI Integration Engineer

Engineering

Job Description

We are seeking an exceptional AI Integration Engineer who operates at the intersection of development, operations, data, and systems engineering to build solutions for large-scale continuous data transformation and delivery. This role focuses specifically on building and maintaining data pipelines for both structured and unstructured data, enabling the development and deployment of AI/ML models that power our RAG-based document processing and insight generation systems.

Responsibilities

Data Infrastructure & Integration

Design and implement data integrations and ingestion processes for internal and external data sources
Build and maintain scalable data pipelines for ingesting, processing, and transforming unstructured data sources (customer feedback, documents, multimedia content)
Develop data models and mapping rules to transform raw data into actionable insights and structured outputs
Architect and implement semantic layers that integrate analytics data from multiple sources efficiently

AI/ML System Integration

Develop and maintain robust backend APIs and services supporting the entire prompt-to-answer workflow
Implement and optimize retrieval logic including vector search, hybrid search, and advanced information retrieval techniques
Manage document ingestion pipelines including parsing, OCR, chunking, and embedding generation
Support integration of various LLM providers (OpenAI, Azure AI, Anthropic) with internal business data sources

Infrastructure & Operations

Ensure reliability, scalability, and low latency of AI response generation systems
Implement data governance policies and procedures for responsible and ethical use of data in AI applications
Develop data quality monitoring and validation processes specifically for AI/ML datasets, including bias identification and mitigation
Build and maintain monitoring, alerting, and observability systems for AI infrastructure

Collaboration & Documentation

Collaborate with analytics and data science teams to understand requirements and deliver solutions
Work with data scientists to ensure data is available in the appropriate format and at sufficient quality for model training and deployment
Maintain comprehensive documentation including data models, mapping rules, and data dictionaries
Partner with internal business stakeholders, technology resources, and external vendors

Requirements

Education & Experience

Bachelor's degree in Computer Science, Engineering, or equivalent work experience
5+ years of experience in designing, building, and maintaining scalable data solutions for large-scale analytics
Proven ability to lead development projects from start to finish with demonstrated results

Technical Skills

Proficiency in Python, Java, or R, and experience with open-source frameworks for distributed processing (Hadoop, Spark)
Expert-level SQL and development experience with cloud data platforms (Snowflake, Redshift, Databricks)
Hands-on experience with modern cloud data stack tools for version control (Git), CI/CD, and automation
Experience with orchestration tools (Apache Airflow) and monitoring & alerting systems

Data & AI Expertise

Strong understanding of data modeling, data warehousing, and ETL concepts
Experience with vector databases (Pinecone, Milvus, Weaviate, Chroma)
Proficiency in handling structured, semi-structured, and unstructured data formats (Parquet, JSON, text, images, audio, video)
Familiarity with AI/ML model development lifecycle and data requirements for training and deployment

Cloud & Infrastructure

Experience with cloud platforms (AWS, Azure, Google Cloud) and their AI/ML services
Knowledge of containerization and orchestration technologies (Docker, Kubernetes)
Understanding of API development and web standards (REST, GraphQL, gRPC, HTTP, JSON)

Preferred Qualifications

Master's degree in Computer Science, Engineering, or equivalent work experience
Experience with cloud-based AI/ML platforms and services
Knowledge of data augmentation techniques for improving AI/ML model performance
Experience with data labeling platforms (Amazon SageMaker Ground Truth, Labelbox)
Understanding of responsible AI principles and data privacy regulations (GDPR, CCPA)
Experience with data governance and observability tools (DataHub, Collibra)
Basic frontend development experience (HTML, CSS, JavaScript)