Sathyapal Reddy Peddakkagari
Open to AI Data Engineer Roles
Fairfax, VA (Open to Relocate)
GPA 3.96 / 4.0

Sathyapal Reddy.

AI Data Engineer building production data pipelines that ship intelligence — from FDA regulatory AI to large-scale clinical ML on Databricks. MS Data Analytics Engineering at George Mason University. Open to AI Data Engineer roles.

3.96 GPA
3+ Projects
1 Publication
2026 Graduating
Sathyapal Reddy

AI Data Engineer

George Mason University

Fairfax, Virginia 22030

Building the intersection of
Data & AI Engineering

I'm an AI Data Engineer who builds production-grade data systems that ship intelligence — not just dashboards. I design end-to-end pipelines on AWS and Databricks, then layer LLMs, RAG, and ML models on top so data actually drives decisions.

Proven impact at scale: reduced API latency by 40%, supported a 200% increase in query throughput at Virtusa, and built an FDA regulatory AI platform combining OCR, RAG, and python-docx generation for Precise Software Solutions.

Cloud-Scale Data Engineering

AWS, Databricks, Snowflake, PySpark, Kafka

GenAI & ML Integration

LangChain, RAG, FAISS, BERT, LightGBM

End-to-End AI Pipelines

FastAPI, OCR, python-docx, CI/CD

Technical Expertise

A production-tested toolkit spanning cloud data engineering, GenAI, and end-to-end ML.

☁️

Cloud & Data Engineering

AWS S3, AWS RDS, AWS Redshift, AWS Lambda, AWS EMR, AWS Glue, AWS DataBrew, AWS CloudWatch, Step Functions, Terraform, Databricks, Snowflake, PySpark, Apache Spark, Hadoop, Hive, Kafka
🧠

AI / ML & GenAI

LLMs, LangChain, Hugging Face, RAG, Vector Databases, FAISS, BERT, ELECTRA, LLaMA 3.3, Google Gemini, Scikit-learn, LightGBM, SHAP, NLTK, spaCy
🗄️

Databases & Modeling

PostgreSQL, MSSQL, SQL Server, MongoDB, Vector DBs, Data Modeling, Database Administration, Query Optimization, Indexing Strategies
💻

Languages & Backend

Python, Scala, Java, R, TypeScript, SQL, FastAPI, Next.js, ReactJS, HTML, CSS, JavaScript
🛠️

DevOps & Tooling

Docker, GitHub Actions, CI/CD, PyMuPDF, Tesseract OCR, python-docx, Cursor, Claude Code, Jupyter, VS Code, Power BI, Streamlit

Professional Journey

AI Operations Analyst

Precise Software Solutions, Inc.

Jan 2026 – Present (Active)
Smart Inspections — FDA Form 483 AI Drafting Platform
  • Built an end-to-end AI-assisted FDA Form 483 drafting platform on AWS — uploads stored in S3, infrastructure provisioned via Terraform for secure, compliant deployments.
  • Implemented an OCR pipeline (PyMuPDF + Tesseract) for handwritten and typed notes; integrated LangChain + Google Gemini to generate draft observations with 21 CFR citations and supporting evidence.
  • Designed a RAG (Retrieval-Augmented Generation) system over FDA guidance PDFs using FAISS vector search, plus a Title 21 CFR citation service for matching and validating regulatory references.
  • Shipped a document-generation pipeline (python-docx) producing FDA Form 483 and EIR .docx files in official format, integrated into a GitHub Actions CI/CD workflow.
Python, FastAPI, TypeScript, LangChain, PostgreSQL, Next.js, Google Gemini, FAISS, PyMuPDF, Tesseract, python-docx, AWS S3, Terraform
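
As a rough illustration of the retrieval step in a RAG system like the one described above, the sketch below ranks text chunks by cosine similarity to a query vector. It is not the platform's code: the toy 3-dimensional vectors, the chunk ids, and the `cosine` and `top_k` helpers are hypothetical, and a brute-force pure-Python search stands in for a FAISS index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the ids of the k chunks most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


# Hypothetical toy embeddings; real embeddings would come from an embedding model.
chunks = {
    "guidance_a": [0.9, 0.1, 0.0],
    "guidance_b": [0.0, 1.0, 0.0],
    "guidance_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]
results = top_k(query, chunks, k=2)
print(results)
```

In a real pipeline the retrieved chunks would then be passed to the LLM as context for drafting; a vector index replaces the linear scan once the corpus grows.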

Data Engineer Intern

Virtusa

Jun 2023 – Oct 2023 (Internship)
  • Scaled real-time data processing pipelines with Scala, AWS Lambda, Apache Spark, and Kafka — supporting a 200% increase in SQL Server query throughput for user-acquisition tracking.
  • Migrated on-prem databases to AWS RDS Multi-AZ with high-availability data modeling; used statistical analysis of system metrics to drive performance and uptime for critical growth operations.
  • Optimized SQL performance via advanced indexing and query restructuring, monitored with AWS CloudWatch, cutting API latency by 40% and accelerating product development cycles.
  • Automated infrastructure with AWS Step Functions, EMR, and Terraform; configured MongoDB and A/B-experiment analysis frameworks within data pipelines to validate model accuracy and data integrity.
Scala, AWS Lambda, Apache Spark, Kafka, AWS RDS, MongoDB, Step Functions, AWS EMR, Terraform, AWS CloudWatch, SQL Server

Production AI Data Systems

End-to-end AI data platforms — regulatory NLP, clinical ML on Spark, and document intelligence.

Capstone, RAG Pipeline, FDA Regulatory AI, OCR + LLM
Jan 2026

Smart Inspections

Precise Software Solutions, Inc.

End-to-end AI-assisted FDA Form 483 drafting platform on AWS — OCR (PyMuPDF, Tesseract) for handwritten + typed inspection notes, LangChain + Google Gemini for draft generation with 21 CFR citations, RAG over FDA guidance PDFs (FAISS), and python-docx generation matching official format. Wired into GitHub Actions CI/CD; Terraform-provisioned infrastructure.

Python, FastAPI, LangChain, Next.js, Google Gemini, FAISS, PostgreSQL, Tesseract, python-docx, AWS S3, Terraform
RAG: FAISS Vector Search
21 CFR: Citation Validation
OCR: Handwritten + Typed
ML on Spark, Clinical AI, CI/CD
Nov 2025

ReadmitAI

Diabetes hospital-readmission prediction at scale — processed 101,766 inpatient records on Databricks + AWS EMR with PySpark (ICD-9 grouping, patient-level splitting, imputation, scaling, SMOTE). Trained Logistic Regression, Random Forest, XGBoost, and LightGBM; deployed the winning LightGBM model via CI/CD to AWS Lambda for real-time risk stratification. SHAP explainability surfaces top clinical predictors.
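
Patient-level splitting, mentioned above, keeps every encounter for a given patient on the same side of the train/test boundary so that repeat admissions cannot leak into evaluation. The following is a minimal sketch in plain Python, not the project's PySpark code: the `patient_id` field name, the hash-based assignment, and the sample records are all assumptions made for illustration.

```python
import hashlib


def assign_split(patient_id, test_fraction=0.2):
    """Deterministically map a patient to 'train' or 'test' by hashing the id."""
    digest = hashlib.sha256(str(patient_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return "test" if bucket < test_fraction * 100 else "train"


def split_records(records, test_fraction=0.2):
    """Split encounter records so each patient lands in exactly one partition."""
    train, test = [], []
    for rec in records:
        side = assign_split(rec["patient_id"], test_fraction)
        (test if side == "test" else train).append(rec)
    return train, test


# Hypothetical encounters: 50 patients with two encounters each.
records = [
    {"patient_id": pid, "encounter": enc}
    for pid in range(50)
    for enc in ("first", "second")
]
train, test = split_records(records)
train_ids = {r["patient_id"] for r in train}
test_ids = {r["patient_id"] for r in test}
print(len(train), len(test), train_ids.isdisjoint(test_ids))
```

Hashing the id rather than sampling rows makes the assignment reproducible across runs, and the same idea carries over to a Spark DataFrame by computing the bucket as a column.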

Python, PySpark, Databricks, AWS EMR, LightGBM, SHAP, AWS Lambda, Power BI, Streamlit
0.852 ROC-AUC Score
101K+ Records (Spark)
SHAP: Explainability
NLP, Document AI, LLM
Aug 2025

DocIE

Modular document-intelligence pipeline combining fine-tuned spaCy, ELECTRA (NER), and BERT (Relation Extraction) for structured information extraction from long documents. Integrated LLaMA-3.3 via Groq API with few-shot prompting for cross-section entity linking, plus RoBERTa-SQuAD2.0 QA for semantic search. Interactive Streamlit UI for real-time ingestion.

Python, spaCy, ELECTRA, BERT, LLaMA-3.3, Groq API, Streamlit
NER: Entity Recognition
LLM: Groq API (LLaMA 3.3)
QA: RoBERTa SQuAD 2.0

Academic Publications

International Research Journal of Engineering and Technology (IRJET), May 2023

Research Publication in Engineering & Technology

Sathyapal Reddy Peddakkagari et al.

Published in IRJET — a peer-reviewed international journal covering engineering, technology, and applied sciences. Represents undergraduate research completed during B.Tech at Institute of Aeronautical Engineering.

Academic Background

Current

George Mason University

M.S. Data Analytics Engineering

Aug 2024 — May 2026

GPA
3.96 / 4.0
Fairfax, Virginia, USA

Institute of Aeronautical Engineering

B.Tech Computer Engineering

Aug 2019 — May 2023

GPA
3.78 / 4.0
Hyderabad, India

Certifications

Oracle PL/SQL Developer Certified Professional

Oracle, Jun 2023

Oracle Database SQL Certified Associate

Oracle, Apr 2023

Let's Build
the Next AI Data Platform

Open to AI Data Engineer roles — building production data pipelines on AWS and Databricks, then layering LLMs, RAG, and ML to turn data into shipped intelligence. Let's connect.

Available for AI Data Engineer Roles — May 2026
Location
Fairfax, VA — Open to Relocate