Overview
INTEGRIS Health is one of Oklahoma's largest integrated healthcare systems. As a Data Engineer Intern, my work sits at the intersection of data infrastructure and business reporting — the kinds of pipelines and dashboards that make operational decision-making possible at health-system scale.
Healthcare data engineering presents a distinct set of challenges: ingestion from multiple source systems with varying data models, strict data quality requirements, and reporting that must be reliable enough to support operational decisions. My role involves building and maintaining the Azure-based data infrastructure that supports enterprise-wide reporting.
The Problem
In enterprise environments, early pipeline implementations often prioritize getting data flowing over efficiency. As data volumes grow and reporting demands increase, those initial pipelines accumulate technical debt — slow runtimes, redundant queries, and fragile dependencies that make each run a liability rather than an asset.
My work at INTEGRIS Health has involved confronting this kind of inherited complexity head-on: taking pipelines built for simpler initial requirements and rebuilding them to handle production-scale data with the reliability and performance that operational reporting demands.
Approach
Data engineering in a healthcare context requires careful attention to data lineage, transformation logic, and downstream report accuracy. My approach to pipeline work at INTEGRIS focuses on:
- Auditing existing pipeline logic to identify bottlenecks and redundant operations
- Restructuring Azure Data Factory pipelines with appropriate parallelism, batching, and resource allocation
- Building reusable integration patterns that can be maintained and extended without deep domain knowledge of each source system
- Validating output against Power BI dashboards to ensure reporting accuracy is preserved after optimization
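To make the parallelism and batching point concrete, here is a minimal sketch of fanning batches out across a bounded worker pool, mirroring the batch-count setting on an ADF ForEach activity. The function names, batch shape, and worker count are illustrative assumptions, not INTEGRIS code:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_batch(batch_id):
    # Placeholder for a source-system call (REST, SQL, etc.);
    # here it just returns synthetic row IDs for the batch.
    return [batch_id * 100 + i for i in range(3)]

def ingest(batch_ids, max_workers=4):
    # Run batches concurrently on a bounded pool instead of one
    # at a time; pool.map preserves the input batch order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(fetch_batch, batch_ids)
    # Flatten per-batch results into one load-ready list.
    return [row for batch in results for row in batch]

rows = ingest([1, 2, 3])
```

The key design choice is the bounded pool: unbounded fan-out can overwhelm a source system, so the worker count plays the same throttling role as ADF's batch count.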
One significant piece of work involved rewriting a pipeline that had grown unwieldy over time — the result cut runtime from 11 minutes to 1 minute 4 seconds, with a corresponding reduction in per-run compute costs. The rewrite is described in more depth in my article on optimizing Azure Data Factory pipelines.
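For context, those runtimes work out to roughly a tenfold per-run speedup:

```python
before_s = 11 * 60       # original runtime: 11 minutes, in seconds
after_s = 1 * 60 + 4     # optimized runtime: 1 minute 4 seconds
speedup = before_s / after_s
print(round(speedup, 1))  # ≈ 10.3x faster per run
```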
Tools
- Azure Data Factory — pipeline orchestration and data movement
- SQL Server / T-SQL — transformation logic, stored procedures, schema design
- Power BI — reporting layer connected to the data warehouse
- REST APIs — data ingestion from source systems
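REST ingestion from source systems usually means walking paginated responses. A minimal, source-agnostic sketch follows; the cursor-based page shape is a generic assumption, not any specific INTEGRIS source system:

```python
def fetch_all_pages(get_page):
    # get_page(cursor) -> (records, next_cursor or None).
    # Walks a cursor-paginated API until no next page remains.
    records, cursor = [], None
    while True:
        page, cursor = get_page(cursor)
        records.extend(page)
        if cursor is None:
            return records

# Fake paginated source for demonstration: three pages of two records.
pages = {None: ([1, 2], "p2"), "p2": ([3, 4], "p3"), "p3": ([5, 6], None)}
data = fetch_all_pages(lambda cursor: pages[cursor])
```

Injecting `get_page` as a callable keeps the pagination loop testable without a live endpoint; a production version would add retries and backoff around the actual HTTP call.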
Outcome
My internship work at INTEGRIS Health has given me hands-on experience with the realities of production healthcare data engineering — where reliability, data quality, and performance all matter simultaneously. The role is ongoing as of this writing.
For a fuller picture of my background and other roles, see my About page or connect on LinkedIn.