Data Engineering · Jun 2023

Rebates Snowflake SSOT

20+ dbt models
4 pipelines migrated
cycle time: days → minutes

The setup

I joined Abarca Health in 2018 as a Reporting Analyst and grew through Analytics Specialist and Team Lead roles before being promoted to Lead, Rebates Data & Reporting in 2022. The mandate for the new role was to stand up a specialized department and consolidate rebate data ownership that had been scattered across B2B, BI, and Analytics teams. The first major initiative was the work this case study describes: a Snowflake single source of truth for rebates.

What was hard

The legacy rebate stack was the kind of setup every PBM analytics team recognizes — SQL Server with stored procedures calling stored procedures, SSIS packages doing transforms that should have lived in models, and report logic that drifted between environments because there were no environments, just shared servers. Three different teams owned overlapping pieces. Numbers disagreed depending on who you asked.

The constraint was that nothing could break in flight. Rebate calculations feed quarterly financial reporting and federal MDRP submissions. The audit window doesn’t move, and the C-suite was watching closely — this was the department’s largest, highest-visibility initiative.

The approach

Three architectural decisions shaped the rest of the work.

Stage-per-layout ingest. Each upstream file format gets its own raw stage in Snowflake with metadata preserved verbatim — no premature normalization. dbt picks up from staging, so transformations are auditable and reversible.
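
A minimal sketch of what one stage-per-layout path can look like in Snowflake, assuming a CSV layout; every object and column name below is illustrative rather than the production naming.

```sql
-- Illustrative stage-per-layout ingest: one file format, one stage, one raw table
-- per upstream layout, with Snowflake's file metadata preserved verbatim.
create file format if not exists raw.ff_rebate_claims_csv
  type = csv
  field_optionally_enclosed_by = '"'
  skip_header = 1;

create stage if not exists raw.rebate_claims_v1_stage
  file_format = raw.ff_rebate_claims_csv;

create table if not exists raw.rebate_claims_v1 (
  src_filename   varchar,
  src_row_number number,
  loaded_at      timestamp_ntz default current_timestamp(),
  client_id      varchar,
  claim_id       varchar,
  ndc            varchar,
  fill_date      date,
  rebate_amount  number(18, 2)
);

-- Land the file as-is; dbt normalizes downstream.
copy into raw.rebate_claims_v1
  (src_filename, src_row_number, client_id, claim_id, ndc, fill_date, rebate_amount)
from (
  select metadata$filename,
         metadata$file_row_number,
         $1, $2, $3, $4, $5
  from @raw.rebate_claims_v1_stage
);
```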

dbt across stage and prod. Environment-aware connection profiles, incremental models for the high-volume rebate tables, and dbt tests that fail the build instead of producing a broken dashboard. Manual reconciliation queries, the kind that lived in someone’s Documents folder, became dbt test assertions.
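
As a rough illustration of the dbt side, an incremental model over that raw table might look like this; the model, source, and column names are placeholders, and the stage/prod split itself lives in the connection profiles rather than in the model.

```sql
-- models/staging/stg_rebate_claims.sql (illustrative; the source is declared in a sources.yml)
{{ config(
    materialized = 'incremental',
    unique_key   = 'claim_id'
) }}

select
    client_id,
    claim_id,
    ndc,
    fill_date,
    rebate_amount,
    src_filename,   -- carried forward from the raw layer for lineage
    loaded_at
from {{ source('raw', 'rebate_claims_v1') }}

{% if is_incremental() %}
  -- on incremental runs, only pick up rows landed since the last successful build
  where loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}
```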

Python where it earns its keep. Python helpers that were really doing SELECT … GROUP BY got rewritten as Snowflake queries. The Python that actually orchestrated work — file landing, retry handling, lineage tagging — stayed Python.
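
For a concrete sense of that boundary, the kind of helper that gets retired is an in-memory groupby; the equivalent Snowflake query (illustrative names) is shorter and runs where the data already lives.

```sql
-- Replaces a hypothetical Python helper that pulled the table into a DataFrame
-- and aggregated it in memory with df.groupby().
select
    client_id,
    date_trunc('quarter', fill_date) as rebate_quarter,
    count(distinct claim_id)         as claim_count,
    sum(rebate_amount)               as total_rebate_amount
from analytics.stg_rebate_claims
group by 1, 2
order by 1, 2;
```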

What changed

The platform now runs 20+ dbt models, 4 migrated ingest pipelines, and consolidates roughly 20 legacy reports behind one warehouse. The practical headline for the business: reporting deliverables that used to take days to produce now take minutes.

Beyond raw speed, the validation layer is what changes the day-to-day. dbt tests catch upstream data-quality issues before they hit the dashboard, so the conversation in standups stopped being “why is this number wrong” and started being “what should this new metric look like.”
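
One way those assertions can be expressed is as a singular dbt test: a query that selects the violating rows and fails the build if any come back. The file and column names here are illustrative.

```sql
-- tests/assert_no_invalid_rebate_amounts.sql (illustrative)
-- A singular dbt test: any rows returned cause the run to fail.
select
    claim_id,
    src_filename,
    rebate_amount
from {{ ref('stg_rebate_claims') }}
where rebate_amount is null
   or rebate_amount < 0
```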

Working the program

Two threads ran in parallel with the platform build:

  • MDRP federal program implementation. ASES Medicaid rebates required new data and reporting flows under CMS rules. The Snowflake foundation made this tractable — the same models feed both internal analytics and federal submissions, so we’re not maintaining parallel pipelines.
  • Team scaling. I directed two contractors through requirements gathering, design, development, and deployment. The point of the contractor model was to absorb predictable work without growing a permanent team faster than the platform could support: hiring analytics engineers well is harder than building the analytics engineering platform.

What I’d do differently

The honest reflection: I’d invest earlier in observability for the dbt runs. We had test coverage and we had logs, but the link between “this run failed” and “this is the upstream file that caused it” was reconstructable rather than immediate. The platform shipped without it; the next iteration added it, the hard way.
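
Because the raw layer preserves the source filename, most of that missing link is a query away; a sketch like the following (illustrative names) turns “this run failed” into “these files caused it”.

```sql
-- Which upstream files produced the rows that tripped a data-quality test?
select
    src_filename,
    min(loaded_at) as first_loaded_at,
    count(*)       as bad_row_count
from analytics.stg_rebate_claims
where rebate_amount is null
   or rebate_amount < 0
group by src_filename
order by bad_row_count desc;
```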

If you’re evaluating a similar migration: the sequencing question (ingest first vs. transformations first) is real but solvable. The harder question is which team owns reconciliation during the cutover. Decide that before you write a line of dbt.