const dataScientist = {
  name: "Ishan Ojha",
  role: "Data Scientist & Engineer",
  university: "Arizona State University",
  gpa: 3.9,
  focus: ["ML", "Data Pipelines", "Analytics"]
};
Get To Know More
About Me
Education
M.S. Data Science, Analytics & Engineering
Arizona State University
What I Do
Machine Learning, Data Engineering & Advanced Analytics

I'm a Data Scientist and Data Engineer currently pursuing my Master's at Arizona State University with a 3.9 GPA. I build models and data pipelines that turn raw information into measurable business impact — from machine learning systems to production-grade ETL.
- Strong in ML modeling, causal inference, and statistical experimentation.
- Experienced with AWS, Spark, Kafka, Airflow, and production data pipelines.
- Passionate about data-driven products and solving real-world business problems.
What I've Done
Experience
Data Engineer Intern
Rogers Communications
Jan 2023 - Sep 2023
Toronto, Ontario
- Engineered 20+ behavioral features including session depth, feature adoption velocity, and inactivity decay from 5M+ subscriber records using CTEs, window functions, and stored procedures, producing analysis-ready datasets for downstream fraud detection scoring workflows.
- Conducted structured EDA in Python on flagged transaction outputs to identify anomalous patterns including sudden spike activity and dormant account reactivations, diagnosing upstream data quality issues and reducing week-over-week reporting inconsistencies by 20%.
- Ensured reliable execution of daily fraud risk scoring workflows by building Python batch feature pipelines with idempotent logic, automated failure recovery, and run-level monitoring integrated with CloudWatch alerting.
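One way to read "idempotent logic" in the last bullet is that re-running a batch for a given date replaces that date's output instead of appending to it. The sketch below illustrates that idea under stated assumptions: the function name `write_partition`, the `run_date` key, and the JSON file layout are hypothetical and are not taken from the actual pipeline.

```python
import json
import os
import tempfile
from pathlib import Path


def write_partition(out_dir: Path, run_date: str, rows: list[dict]) -> Path:
    """Write one day's features; re-running the same date overwrites, never duplicates."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"features_{run_date}.json"  # hypothetical naming scheme
    # Write to a temp file in the same directory, then swap it into place,
    # so a crashed run does not leave a half-written partition behind.
    fd, tmp_name = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(rows, tmp)
    os.replace(tmp_name, target)
    return target
```

Because the output path depends only on `run_date`, an automated retry after a failure can call the same function again without any cleanup step.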
Data Scientist Intern
Loblaw Companies Limited
May 2022 - Jan 2023
Brampton, Ontario
- Segmented 2.8M+ customer transactions using K-Means and hierarchical clustering, identifying 5 behaviorally distinct groups by region and promotion responsiveness; campaign targeting on these segments drove an 11% lift in response rate and 14% improvement in conversion.
- Built regression and tree-based models to estimate promotional demand lift, achieving R² of 0.62 on holdout data via 5-fold cross-validation; tracked prediction distributions and model performance metrics in AWS SageMaker, monitoring for degradation before production handoff.
- Applied hypothesis testing, confidence intervals, and regression to quantify promotional impact across segments; communicated findings to cross-functional data, product, and business stakeholders via Tableau dashboards translating complex analytical results into actionable recommendations.
Data Analyst Intern
HomeStars
Jun 2021 - Jan 2022
Toronto, Ontario
- Designed and analyzed an A/B test on business onboarding flow variants across 13K businesses using proportion tests and bootstrapped confidence intervals, identifying a statistically significant 9% lift in signup conversion and presenting findings to drive the winning variant into production.
- Developed and maintained weekly Tableau dashboards tracking geographic and business type distributions across provinces, incorporating stakeholder feedback to iteratively expand metrics; reports escalated to senior leadership to inform regional marketing spend allocation.
- Standardized and reconciled tens of thousands of inconsistent business signup records by building a Python-based cleaning pipeline resolving duplicates, missing fields, and formatting inconsistencies, producing reliable datasets for downstream reporting pipelines.
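The proportion test mentioned in the A/B testing bullet can be illustrated with a pooled two-proportion z-test. This is a minimal sketch using only the standard library; the function name and the counts in the usage line are hypothetical and are not the experiment's data.

```python
import math


def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both variants convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail area of the standard normal, written with erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts only: 100/1000 conversions for A, 150/1000 for B.
z, p = two_proportion_ztest(100, 1000, 150, 1000)
```

A bootstrapped interval on the difference in rates is a useful cross-check, since it does not rely on the normal approximation.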
Where I Studied
Education
Arizona State University
Master of Science, Data Science, Analytics & Engineering
Sep 2024 - May 2026
Tempe, Arizona
GPA: 3.9
York University
Bachelor of Arts (Hons), Information Technology
Jan 2020 - Apr 2024
Toronto, Ontario
Explore My
Skills
Languages & Databases
Browse My Recent
Projects
FinSentEval: Financial Sentiment LLM Benchmark
Jan 2026 – Apr 2026
Built a financial sentiment evaluation framework benchmarking FinBERT against zero-shot, few-shot, and RAG-augmented LLMs across 4,840 labeled news sentences, where FAISS retrieval outperformed static few-shot by 6 F1 points on edge cases. A cascading classifier routed low-confidence predictions to a RAG-LLM pipeline and high-confidence ones to FinBERT, escalating only 19% of predictions while cutting inference cost 3.2x.
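The cascading idea described above can be sketched as a confidence threshold on the fast model's class probabilities. The 0.9 threshold, the handler names, and the function names are assumptions made for illustration; the project's actual routing rule is not stated here.

```python
def route(probabilities: dict[str, float], threshold: float = 0.9) -> tuple[str, str]:
    """Return (handler, label): keep confident predictions, escalate the rest."""
    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    handler = "fast_model" if confidence >= threshold else "escalate"
    return handler, label


def escalation_rate(batch: list[dict[str, float]], threshold: float = 0.9) -> float:
    """Fraction of a batch that would be sent to the slower pipeline."""
    escalated = sum(route(p, threshold)[0] == "escalate" for p in batch)
    return escalated / len(batch)
```

Raising the threshold sends more traffic to the slower pipeline, so in a setup like this the threshold is the knob that trades accuracy on edge cases against inference cost.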
Credit Risk Modeling Under Macroeconomic Conditions
Jan 2026 – May 2026
ASU FSE 570 capstone training two LightGBM classifiers on LendingClub loan data (2007–2018) — one borrower-level and one augmented with FRED macro series — using Platt scaling and temporal splits to prevent leakage. Bootstrap testing on the AUC difference found macro features yielded no meaningful lift (95% CI: [−0.0033, −0.0023]); recession stress testing shifted mean predicted default probability from 23.3% to 20.4%.
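A bootstrap test on an AUC difference can be run as a paired resample: each draw picks the same rows for both models, so the difference is computed on identical data. The sketch below assumes a percentile interval and a simple pairwise AUC; it is an illustration in plain Python, not the capstone's code, and all names are hypothetical.

```python
import random


def auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_diff(y, scores_a, scores_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for AUC(b) - AUC(a), resampling rows jointly for both models."""
    rng = random.Random(seed)
    n = len(y)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:  # AUC is undefined without both classes, so redraw
            continue
        auc_a = auc(yb, [scores_a[i] for i in idx])
        auc_b = auc(yb, [scores_b[i] for i in idx])
        diffs.append(auc_b - auc_a)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

The pairwise AUC here is quadratic in the number of rows, which is fine for a demonstration; on a dataset the size of LendingClub a rank-based AUC would be the practical choice.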
Banking Risk & AML Detection Models
Jan 2025 – May 2025
Built a probability-of-default model using Weight of Evidence binning and Basel II-aligned scorecard methodology, deployed as a Flask REST API on AWS Lambda + API Gateway with S3 logging and EventBridge-triggered CloudWatch alerts. The AML pipeline combined rule-based baselines with an Isolation Forest, LOF, and autoencoder ensemble, improving Average Precision from 0.08 to 0.16 (Gradient Boosting: AUC 0.78, KS 0.42).
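Weight of Evidence turns each bin of a feature into a log-odds value, and Information Value sums the bins into a single measure of predictive strength. The sketch below assumes the convention WoE = ln(share of goods / share of bads); some scorecard references flip the ratio. The function name and input layout are hypothetical.

```python
import math


def woe_iv(bins: dict[str, tuple[int, int]]) -> tuple[dict[str, float], float]:
    """bins maps bin label -> (good_count, bad_count). Returns per-bin WoE and total IV."""
    total_good = sum(g for g, _ in bins.values())
    total_bad = sum(b for _, b in bins.values())
    woe: dict[str, float] = {}
    iv = 0.0
    for label, (good, bad) in bins.items():
        if good == 0 or bad == 0:
            # Empty cells make the log undefined; merge bins before calling.
            raise ValueError(f"bin {label!r} has a zero count")
        share_good = good / total_good
        share_bad = bad / total_bad
        woe[label] = math.log(share_good / share_bad)
        iv += (share_good - share_bad) * woe[label]
    return woe, iv
```

With this sign convention, a positive WoE marks a bin where good accounts are over-represented.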
Support Ticket Classification Using BERT Fine-Tuning
Sep 2025 – Dec 2025
Built a multi-class pipeline categorizing 27K support tickets across billing, technical, account, and cancellation categories using the Bitext dataset, establishing a TF-IDF + Logistic Regression baseline at F1 0.81. Fine-tuned bert-base-uncased with HuggingFace Transformers and weighted cross-entropy, reaching F1 0.93 on the held-out set — a 15% relative improvement.
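Weighted cross-entropy counters class imbalance by making errors on rare classes cost more. The sketch below shows one common weighting, inverse class frequency, in plain Python. The weighting formula and the normalization by total weight are assumptions for illustration; the project's actual weights are not stated here.

```python
import math
from collections import Counter


def balanced_class_weights(labels: list[str]) -> dict[str, float]:
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_cross_entropy(
    probs: list[dict[str, float]], labels: list[str], weights: dict[str, float]
) -> float:
    """Weighted mean of -log p(true class), normalized by the batch's total weight."""
    num = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den
```

With three "billing" tickets and one "cancellation" ticket, this scheme weights the rare class three times as heavily as the common one.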
Marketing Uplift Modeling & Causal Inference
Sep 2024 – Dec 2024
Engineered recency, frequency, and spend features from the Hillstrom Email dataset (64K customers), estimating Average Treatment Effect via logistic regression with treatment interaction terms, validated through bootstrapped confidence intervals. T-Learner and uplift decision tree models surfaced heterogeneous effects — top-decile segments showed 18% higher incremental conversion versus random targeting, evaluated with Qini curves and uplift-at-k.
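Uplift-at-k compares treated and control conversion rates among the customers a model ranks highest. The sketch below is a minimal version under the assumption that `scores` are predicted uplift values and that `treated` and `converted` are 0/1 flags; the function name and argument layout are hypothetical.

```python
def uplift_at_k(scores: list[float], treated: list[int], converted: list[int], k: float = 0.1) -> float:
    """Conversion-rate difference (treated minus control) in the top-k fraction by score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = order[: max(1, int(len(order) * k))]
    t = [converted[i] for i in top if treated[i]]
    c = [converted[i] for i in top if not treated[i]]
    if not t or not c:
        raise ValueError("top-k slice needs both treated and control customers")
    return sum(t) / len(t) - sum(c) / len(c)
```

Sweeping `k` from small to large and accumulating the incremental conversions is the idea behind a Qini curve; a random ranking gives the baseline it is compared against.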
NHL Analytics Data Warehouse (Databricks)
Sep 2025 – Dec 2025
Architected a Medallion Lakehouse (Bronze/Silver/Gold) in Databricks processing 20M+ records with distributed PySpark ETL, Hive-style partitioning, and tuned shuffle sizing, reducing season-level aggregation query times by 35%. Orchestrated reproducible seasonal refreshes via Airflow DAGs with backfill and failure alerting, enforcing end-to-end lineage and schema contracts with Unity Catalog.
E-Commerce Analytics Pipeline (dbt + Airflow)
Jan 2025 – May 2025
Architected a layered dbt Core pipeline (staging, intermediate, marts) over 30M+ order line items, producing a star schema with 2 fact and 3 dimension tables in PostgreSQL and enforcing referential integrity via dbt relationship tests. Engineered idempotent ingestion via Azure Data Factory into Blob Storage, orchestrated through Airflow on Astronomer Astro with staged execution, test gates, and custom macros across 200K+ users.
Real-Time Stock Price Forecasting Pipeline
Feb 2024 – Aug 2024
Built a real-time forecasting system applying ARIMA to live Alpaca Markets WebSocket tick data across 2.5M daily tick events, evaluating accuracy with MAE/RMSE and monitoring prediction drift for model degradation. Engineered streaming infrastructure with Kafka and Spark Structured Streaming to cut data latency from end-of-day batch to sub-5-minute windows, with checkpointing, offset management, and automated replay for recoverability.
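MAE and RMSE, the two accuracy metrics named above, differ in how they treat large misses: RMSE squares each error before averaging, so a single bad forecast moves it more. A minimal sketch with hypothetical function names:

```python
import math


def mae(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Tracking both over a rolling window is one simple way to watch for drift: an RMSE that rises faster than MAE suggests occasional large misses, not a uniform loss of accuracy.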
Get in Touch
Contact Me
I'm always open to discussing data science roles, engineering projects, or collaboration. Reach out through any of the channels below.



