Open to interesting problems
I build pipelines that move clinical and life-science data from messy reality into systems people can trust.
Experience
Ten years across clinical research, biostatistics, and data engineering.
Data Engineer
Sprinter Health · Menlo Park, CA
Building data infrastructure for in-home preventive care.
Data Engineer II
Verana Health · San Francisco, CA
Authored post-processing components and Airflow orchestration for LLM inference pipelines, integrating Databricks preprocessing with AWS Bedrock batch inference and delivering the org's first production-ready LLM inference capability. Replaced a GPU-intensive SageMaker pipeline with a spaCy-based Spark UDF, cutting cost by more than 90%. Migrated EKS pipelines to Databricks, reducing processing time by 50%.
Data Scientist / Data Engineer
Thermo Fisher Scientific · South San Francisco, CA
Designed event-driven ETL pipelines that moved data from multiple sources into AWS. Developed Python and Java services deployed to serverless infrastructure through CI/CD. Performed root-cause analysis with statistical tools and presented the results to cross-functional teams.
Biostatistician / Data Scientist
Stanford University — QSU, School of Medicine · Palo Alto, CA
Biostatistics and informatics for the Apple Heart Study (n > 400,000). Built an end-to-end demographics and retention ETL pipeline on GCP that fed an R Shiny dashboard. Trained BERT and LSTM models in PyTorch for clinical-note classification.
Clinical Research Data Analyst II
Department of Veterans Affairs — PAVIR · Palo Alto, CA
Statistical models and hypothesis testing on the VA Corporate Data Warehouse for manuscript submissions.
Quantitative Research Analyst I
Sutter Health — PAMF · Palo Alto, CA
Healthcare-disparity research across 17 racial/ethnic groups and 50 metrics on the full PAMF adult EMR population, using GLMs in SAS.
Selected Projects
Work I've led or contributed to — at companies, in research, on the side.
Publications
JAMA Cardiology · Apple Heart Study Investigators incl. S. Gummidipundi
Architecture Diagrams
System designs from production work.
Verana Health — LLM inference pipeline
Production LLM inference: Databricks preprocessing → AWS Bedrock batch inference, orchestrated by Airflow.
Skills
Programming
Data / Frameworks
Cloud
ML / NLP
Now
A few things on my plate.
- Ramping up at Sprinter Health.
- Building symbol-screen on the side.
- Advising a marketing law firm on data infrastructure.
- Re-reading Designing Data-Intensive Applications.
Contact
Open to interesting problems and conversations. Reach out via email or find me on GitHub and LinkedIn.
© 2026 Santosh Gummidipundi · santosh@santoshg.io