Rebates Snowflake SSOT
- dbt models
- 20+
- pipelines migrated
- 4
- cycle time
- days → minutes
The setup
In 2022 I was promoted to Lead, Rebates Data & Reporting at Abarca Health and asked to consolidate rebate data ownership scattered across B2B, BI, and Analytics. The first initiative: a Snowflake single source of truth for rebates.
What was hard
Numbers disagreed depending on who you asked. Three different teams owned overlapping pieces of the legacy rebate stack — the kind of setup every PBM analytics team recognizes. SQL Server with stored procedures calling stored procedures, SSIS packages doing transforms that should have lived in models, and report logic that drifted between environments because there were no environments, just shared servers.
The constraint was that nothing could break in flight. Rebate calculations feed quarterly financial reporting and federal MDRP submissions. The audit window doesn’t move, and the C-suite was watching closely — this was the department’s largest, highest-visibility initiative.
The approach
Three architectural decisions shaped the rest of the work.
Stage-per-layout ingest. Each upstream file format gets its own raw stage in Snowflake with metadata preserved verbatim — no premature normalization. dbt picks up from staging, so transformations are auditable and reversible.
dbt across stage and prod. Environment-aware connection profiles,
incremental models for the high-volume rebate tables, and dbt tests that
fail the build instead of producing a broken dashboard. Manual reconciliation
queries — the kind that lived in someone’s \Documents folder — became
dbt test assertions.
Python where it earns its keep. Python helpers that were really doing
SELECT … GROUP BY got rewritten as Snowflake queries. The Python that
actually orchestrated work — file landing, retry handling, lineage tagging —
stayed Python.
What changed
The platform now runs 20+ dbt models, 4 migrated ingest pipelines, and consolidates roughly 20 legacy reports behind one warehouse. The practical headline for the business: reporting deliverables that used to take days to produce now take minutes.
Beyond raw speed, the validation layer is what changes the day-to-day. dbt tests catch upstream data-quality issues before they hit the dashboard, so the conversation in standups stopped being “why is this number wrong” and started being “what should this new metric look like.”
Working the program
Two threads ran in parallel with the platform build:
- MDRP federal program implementation. ASES Medicaid rebates required new data and reporting flows under CMS rules. The Snowflake foundation made this tractable — the same models feed both internal analytics and federal submissions, so we’re not maintaining parallel pipelines.
- Team scaling. I directed two contractors through requirements gathering, design, development, and deployment. The point of the contractor model was to absorb predictable work without growing a permanent team faster than the platform could support — hiring analytics engineers correctly is harder than building the platform itself.
What I’d do differently
The honest reflection: I’d invest earlier in observability for the dbt runs. We had test coverage and we had logs, but the link between “this run failed” and “this is the upstream file that caused it” was reconstructable rather than immediate. The platform shipped without it; the next iteration earned it the hard way.
If you’re evaluating a similar migration: the sequencing question (ingest first vs. transformations first) is real but solvable. The harder question is which team owns reconciliation during the cutover. Decide that before you write a line of dbt.