ProfitSolv is a SaaS business services provider for the legal and accounting industry. We are looking for an AI Data Engineer to join our growing team!  

We are seeking a seasoned AI Data Engineer to help build a centralized data platform on AWS that unifies data across the entire portfolio and powers the next generation of AI-driven experiences for our customers.

This is a greenfield build. We have a clearly defined three-phase architecture (Lakehouse standup → database migration → zero-ETL endgame) and executive sponsorship. We’re looking for an engineer who is equal parts data engineer and AI builder - someone who can write dbt models in the morning and design a RAG pipeline in the afternoon.

What we provide:  

  • Opportunity to Invest in Your Future. We offer a 401(k) match.  
  • Paid Time Off. Enjoy paid time off and paid holidays.  
  • Great Coverage. Take advantage of health, dental, and vision coverage, plus HSA and FSA options.  
  • A Great Team. Collaborate with smart, curious, hardworking individuals.  
  • Performance Compensation. Be rewarded for your hard work with performance-based merit increases.  
  • Remote Work. Want to work from home? No problem! 

 
As an AI Data Engineer, you will: 

  • Build and maintain a medallion Lakehouse (Bronze/Silver/Gold) on S3 using Apache Iceberg, Glue Data Catalog, and dbt Cloud with the Athena adapter.
  • Configure and manage AWS DMS for ongoing CDC from ~1,000 SQL Server instances. Build ECS Fargate tasks for SaaS API ingestion. Orchestrate it all with Amazon MWAA (Airflow).
  • Write dbt Cloud models for Bronze → Silver → Gold transforms. Define business metrics in the dbt Semantic Layer so BI tools and AI agents can consume them.
  • Manage Redshift Serverless + Spectrum as the read engine for analysts and BI tools. Tune Iceberg table layouts, partitioning, and compaction for performance.
  • Implement Lake Formation tag-based governance for multi-product data isolation. Onboard new acquisitions to the platform in weeks, not months.
  • Build batch embedding pipelines for legal documents and client records. Manage vector storage in OpenSearch Serverless or pgvector on Aurora.
  • Design and ship RAG pipelines for legal-domain use cases: chunking strategies, retrieval ranking, and context window management.
  • Build MCP servers that expose the dbt Semantic Layer and data platform APIs to AI agents (Claude, internal copilots, customer-facing features).
  • Ensure compliance, security, and governance through IAM roles, encryption policies, and metadata cataloging.
  • Other duties as assigned. 
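To give a flavor of the day-to-day work, the chunking step of a RAG pipeline (mentioned above) might look like this minimal sketch. The function name, character-based sizing, and default parameters are illustrative choices, not part of our actual stack:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping chunks for embedding.

    chunk_size and overlap are measured in characters here for
    simplicity; a production pipeline would typically count tokens.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

In practice, you would own design decisions like this end to end: chunk granularity, overlap, embedding model choice, and how retrieved chunks are ranked and fit into the context window.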

This position follows established policies and procedures to keep confidential information secure. 

A great fit for this position has: 

  • 5+ years of hands-on data engineering experience, with a strong focus on AWS services such as S3, Glue, Athena, and Redshift (or equivalent platforms).
  • Proven experience building and maintaining production-grade data models using dbt (Core or Cloud), including testing, macros, and documentation best practices.
  • Experience implementing Change Data Capture (CDC) from relational databases using tools such as AWS DMS, Debezium, or similar technologies.
  • Demonstrated ability to design, build, and operate production Airflow DAGs (MWAA or self-hosted environments).
  • Hands-on experience developing at least one production-ready Retrieval-Augmented Generation (RAG) pipeline, including data chunking, embedding generation, vector storage, and retrieval mechanisms.
  • Strong proficiency in SQL (primary language) and Python (for data pipelines and AI workflows), with working knowledge of TypeScript for MCP server development.
  • Experience with infrastructure-as-code tools such as Terraform or equivalent solutions.
  • Comfortable working across the full technology stack in a high-autonomy environment, with the ability to make architectural decisions and drive solutions independently.

Additional desirable qualifications: 

  • Experience building MCP servers or similar AI tool-use integrations.
  • dbt Cloud Semantic Layer / MetricFlow experience.
  • Worked with Apache Iceberg, Delta Lake, or Hudi in production.
  • SQL Server → Aurora PostgreSQL migration experience.
  • Worked in PE-backed, multi-product, or M&A-heavy environments.
  • Legal tech, payments, or professional services domain knowledge.

Physical requirements: 

  • Ability to sit for prolonged periods at a desk and work on a computer.  
  • Must be able to lift up to 15 pounds at times.  
  • Ability to handle stress.  
  • Ability to meet work deadlines.  

Our commitment to you: At ProfitSolv, we are committed to being a diverse and inclusive workplace as an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, age, national origin, protected veteran status, disability status, sexual orientation, gender identity or expression, marital status, genetic information, or any other characteristic protected by law. We embrace a diverse group of backgrounds and experiences to connect with clients, solve problems, and innovate. 
 
Work location: Remote – U.S. only 

This is a full-time position.
