Data Engineering Jun 2023

Rebates Snowflake SSOT

dbt models
20+
pipelines migrated
4
cycle time
days → minutes
Rebates Snowflake SSOT

The setup

In 2022 I was promoted to Lead, Rebates Data & Reporting at Abarca Health and asked to consolidate rebate data ownership scattered across B2B, BI, and Analytics. The first initiative: a Snowflake single source of truth for rebates.

What was hard

Numbers disagreed depending on who you asked. Three different teams owned overlapping pieces of the legacy rebate stack — the kind of setup every PBM analytics team recognizes. SQL Server with stored procedures calling stored procedures, SSIS packages doing transforms that should have lived in models, and report logic that drifted between environments because there were no environments, just shared servers.

The constraint was that nothing could break in flight. Rebate calculations feed quarterly financial reporting and federal MDRP submissions. The audit window doesn’t move, and the C-suite was watching closely — this was the department’s largest, highest-visibility initiative.

The approach

Three architectural decisions shaped the rest of the work.

Stage-per-layout ingest. Each upstream file format gets its own raw stage in Snowflake with metadata preserved verbatim — no premature normalization. dbt picks up from staging, so transformations are auditable and reversible.

dbt across stage and prod. Environment-aware connection profiles, incremental models for the high-volume rebate tables, and dbt tests that fail the build instead of producing a broken dashboard. Manual reconciliation queries — the kind that lived in someone’s \Documents folder — became dbt test assertions.

Python where it earns its keep. Python helpers that were really doing SELECT … GROUP BY got rewritten as Snowflake queries. The Python that actually orchestrated work — file landing, retry handling, lineage tagging — stayed Python.

What changed

The platform now runs 20+ dbt models, 4 migrated ingest pipelines, and consolidates roughly 20 legacy reports behind one warehouse. The practical headline for the business: reporting deliverables that used to take days to produce now take minutes.

Beyond raw speed, the validation layer is what changes the day-to-day. dbt tests catch upstream data-quality issues before they hit the dashboard, so the conversation in standups stopped being “why is this number wrong” and started being “what should this new metric look like.”

Working the program

Two threads ran in parallel with the platform build:

  • MDRP federal program implementation. ASES Medicaid rebates required new data and reporting flows under CMS rules. The Snowflake foundation made this tractable — the same models feed both internal analytics and federal submissions, so we’re not maintaining parallel pipelines.
  • Team scaling. I directed two contractors through requirements gathering, design, development, and deployment. The point of the contractor model was to absorb predictable work without growing a permanent team faster than the platform could support — hiring analytics engineers correctly is harder than building the platform itself.

What I’d do differently

The honest reflection: I’d invest earlier in observability for the dbt runs. We had test coverage and we had logs, but the link between “this run failed” and “this is the upstream file that caused it” was reconstructable rather than immediate. The platform shipped without it; the next iteration earned it the hard way.

If you’re evaluating a similar migration: the sequencing question (ingest first vs. transformations first) is real but solvable. The harder question is which team owns reconciliation during the cutover. Decide that before you write a line of dbt.