const dataScientist = {
  name: "Ishan Ojha",
  role: "Data Scientist & Engineer",
  university: "Arizona State University",
  gpa: 3.9,
  focus: ["ML", "Data Pipelines", "Analytics"]
};

Hello, I'm

Ishan Ojha

Data Scientist & Engineer

Get To Know More

About Me

Education

M.S. Data Science, Analytics & Engineering

Arizona State University

GPA: 3.9 / 4.0 · Tempe, AZ · Current

What I Do

Machine Learning, Data Engineering & Advanced Analytics

Machine Learning · Data Pipelines · Cloud · Analytics

I'm a Data Scientist and Data Engineer currently pursuing my Master's at Arizona State University with a 3.9 GPA. I build models and data pipelines that turn raw information into measurable business impact — from machine learning systems to production-grade ETL.

  • Strong in ML modeling, causal inference, and statistical experimentation.
  • Experienced with AWS, Spark, Kafka, Airflow, and production data pipelines.
  • Passionate about data-driven products and solving real-world business problems.

What I've Done

Experience

Data Engineer Intern

Rogers Communications

Jan 2023 - Sep 2023

Toronto, Ontario

  • Engineered 20+ behavioral features including session depth, feature adoption velocity, and inactivity decay from 5M+ subscriber records using CTEs, window functions, and stored procedures, producing analysis-ready datasets for downstream fraud detection scoring workflows.
  • Conducted structured EDA in Python on flagged transaction outputs to identify anomalous patterns including sudden spike activity and dormant account reactivations, diagnosing upstream data quality issues and reducing week-over-week reporting inconsistencies by 20%.
  • Ensured reliable execution of daily fraud risk scoring workflows by building Python batch feature pipelines with idempotent logic, automated failure recovery, and run-level monitoring integrated with CloudWatch alerting.
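As an illustration of the idempotent batch logic mentioned above, here is a minimal sketch rather than the production code. The table name `features`, the column `session_depth`, and the function `write_daily_features` are hypothetical, and SQLite stands in for whatever warehouse the real pipeline used. Each run deletes and reloads its own date partition inside one transaction, so a retry after a failure leaves the same rows as a single clean run.

```python
import sqlite3


def write_daily_features(conn, run_date, rows):
    """Idempotent load: re-running the same run_date replaces that day's rows.

    rows is a list of (subscriber_id, session_depth) tuples (hypothetical schema).
    """
    # The connection context manager commits on success and rolls back on error,
    # so a failed run never leaves a half-written partition behind.
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS features ("
            "run_date TEXT, subscriber_id INTEGER, session_depth REAL, "
            "PRIMARY KEY (run_date, subscriber_id))"
        )
        conn.execute("DELETE FROM features WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO features VALUES (?, ?, ?)",
            [(run_date, sid, depth) for sid, depth in rows],
        )


def count_rows(conn, run_date):
    """Number of feature rows stored for one run date."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM features WHERE run_date = ?", (run_date,)
    )
    return cur.fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    batch = [(1, 3.0), (2, 7.5)]
    write_daily_features(conn, "2023-06-01", batch)
    write_daily_features(conn, "2023-06-01", batch)  # simulated retry
    print(count_rows(conn, "2023-06-01"))
```

Delete-then-insert per partition is only one way to get idempotence; an upsert keyed on the same columns would serve equally well.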

Data Scientist Intern

Loblaw Companies Limited

May 2022 - Jan 2023

Brampton, Ontario

  • Segmented 2.8M+ customer transactions using K-Means and hierarchical clustering, identifying 5 behaviorally distinct groups by region and promotion responsiveness; campaign targeting on these segments drove an 11% lift in response rate and 14% improvement in conversion.
  • Built regression and tree-based models to estimate promotional demand lift, achieving R² of 0.62 on holdout data via 5-fold cross-validation; tracked prediction distributions and model performance metrics in AWS SageMaker, monitoring for degradation before production handoff.
  • Applied hypothesis testing, confidence intervals, and regression to quantify promotional impact across segments; communicated findings to cross-functional data, product, and business stakeholders via Tableau dashboards translating complex analytical results into actionable recommendations.

Data Analyst Intern

HomeStars

Jun 2021 - Jan 2022

Toronto, Ontario

  • Designed and analyzed an A/B test on business onboarding flow variants across 13K businesses using proportion tests and bootstrapped confidence intervals, identifying a statistically significant 9% lift in signup conversion and presenting findings to drive the winning variant into production.
  • Developed and maintained weekly Tableau dashboards tracking geographic and business type distributions across provinces, incorporating stakeholder feedback to iteratively expand metrics; reports escalated to senior leadership to inform regional marketing spend allocation.
  • Standardized and reconciled tens of thousands of inconsistent business signup records by building a Python-based cleaning pipeline resolving duplicates, missing fields, and formatting inconsistencies, producing reliable datasets for downstream reporting pipelines.
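The proportion test behind an A/B result like the one above can be sketched in a few lines. This is a generic two-sided, pooled two-proportion z-test, not the original analysis. The counts in the example are invented to produce a 9% relative lift and are not the real HomeStars figures.

```python
import math


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function, then the two-sided tail.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


def relative_lift(conv_a, n_a, conv_b, n_b):
    """Relative change in conversion rate of variant B over control A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    return (p_b - p_a) / p_a


if __name__ == "__main__":
    # Hypothetical counts: 6,500 businesses per arm.
    z, p = two_proportion_ztest(1300, 6500, 1417, 6500)
    lift = relative_lift(1300, 6500, 1417, 6500)
    print(f"z={z:.2f} p={p:.4f} lift={lift:.1%}")
```

The relative lift is reported against the control rate, which is why a change of about two percentage points on a 20% baseline reads as 9%.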

Where I Studied

Education

Arizona State University

Master of Science, Data Science, Analytics & Engineering

Sep 2024 - May 2026

Tempe, Arizona

GPA: 3.9

York University

Bachelor of Arts (Hons), Information Technology

Jan 2020 - Apr 2024

Toronto, Ontario

Explore My

Skills

Languages & Databases

Python · SQL · Bash · PostgreSQL · MySQL · Oracle PL/SQL · Amazon Redshift

Browse My Recent

Projects

FinSentEval: Financial Sentiment LLM Benchmark

Jan 2026 – Apr 2026

Built a financial sentiment evaluation framework benchmarking FinBERT against zero-shot, few-shot, and RAG-augmented LLMs across 4,840 labeled news sentences, where FAISS retrieval outperformed static few-shot by 6 F1 points on edge cases. A cascading classifier routed low-confidence predictions to a RAG-LLM pipeline and high-confidence ones to FinBERT, escalating only 19% of predictions while cutting inference cost 3.2x.

FinBERT · RAG · FAISS · LLM Eval
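The cascading idea in FinSentEval, where the cheap model answers when it is confident and the rest is escalated, can be sketched as follows. The 0.8 threshold, the `Prediction` type, and both stub models are assumptions made for illustration. The real project used FinBERT and a RAG-LLM pipeline, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str
    confidence: float
    source: str  # which stage produced the answer


def cascade_predict(text, cheap_model, expensive_model, threshold=0.8):
    """Use the cheap model when confident, otherwise escalate."""
    label, confidence = cheap_model(text)
    if confidence >= threshold:
        return Prediction(label, confidence, "cheap")
    label, confidence = expensive_model(text)
    return Prediction(label, confidence, "expensive")


def escalation_rate(predictions):
    """Share of predictions that were routed to the expensive stage."""
    return sum(p.source == "expensive" for p in predictions) / len(predictions)


# Toy stand-ins for the two stages (hypothetical behaviour).
def stub_cheap(text):
    return ("positive", 0.95) if "beats" in text else ("neutral", 0.55)


def stub_expensive(text):
    return ("negative", 0.90)


if __name__ == "__main__":
    texts = ["Company beats estimates", "Guidance unchanged amid probe"]
    preds = [cascade_predict(t, stub_cheap, stub_expensive) for t in texts]
    print(escalation_rate(preds))
```

The threshold trades cost against accuracy: raising it escalates more predictions to the expensive stage.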

Credit Risk Modeling Under Macroeconomic Conditions

Jan 2026 – May 2026

ASU FSE 570 capstone training two LightGBM classifiers on LendingClub loan data (2007–2018) — one borrower-level and one augmented with FRED macro series — using Platt scaling and temporal splits to prevent leakage. Bootstrap testing on the AUC difference found macro features yielded no meaningful lift (95% CI: [−0.0033, −0.0023]); recession stress testing shifted mean predicted default probability from 23.3% to 20.4%.

LightGBM · Calibration · Stress Testing
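A bootstrap test on an AUC difference, as described for the credit risk project, can be sketched with the standard library alone. This is a generic paired percentile bootstrap, not the capstone code; the function names and defaults are illustrative. Both models are scored on the same resampled rows so the difference reflects the models and not the sampling noise.

```python
import random


def auc(y_true, scores):
    """AUC as P(random positive outranks random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_auc_diff_ci(y_true, scores_a, scores_b,
                          n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for AUC(b) - AUC(a) using paired resampling of rows."""
    if len(set(y_true)) < 2:
        raise ValueError("y_true must contain both classes")
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [y_true[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip degenerate resamples that contain one class only
        a = auc(y, [scores_a[i] for i in idx])
        b = auc(y, [scores_b[i] for i in idx])
        diffs.append(b - a)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

An interval that excludes zero indicates a statistically detectable difference. Whether a difference of that size matters in practice is a separate judgment.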

Banking Risk & AML Detection Models

Jan 2025 – May 2025

Built a probability-of-default model using Weight of Evidence binning and Basel II-aligned scorecard methodology, deployed as a Flask REST API on AWS Lambda + API Gateway with S3 logging and EventBridge-triggered CloudWatch alerts. The AML pipeline combined rule-based baselines with an Isolation Forest, LOF, and autoencoder ensemble, improving Average Precision from 0.08 to 0.16 (Gradient Boosting: AUC 0.78, KS 0.42).

WoE Scorecard · Isolation Forest · AWS Lambda
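Weight of Evidence binning, which the scorecard above relies on, reduces to a small calculation per bin. The sketch below uses the common convention WoE = ln(share of goods / share of bads); some texts flip the sign. The additive smoothing term and the example bins are assumptions for illustration, not values from the project.

```python
import math


def woe_iv(bins, smoothing=0.5):
    """Weight of Evidence per bin and total Information Value.

    bins maps a bin label to (n_good, n_bad) counts. Smoothing avoids
    log(0) or division by zero when a bin has no goods or no bads.
    """
    total_good = sum(g for g, _ in bins.values())
    total_bad = sum(b for _, b in bins.values())
    k = len(bins)
    woe = {}
    iv = 0.0
    for label, (good, bad) in bins.items():
        dist_good = (good + smoothing) / (total_good + smoothing * k)
        dist_bad = (bad + smoothing) / (total_bad + smoothing * k)
        woe[label] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[label]
    return woe, iv


if __name__ == "__main__":
    # Hypothetical bins of a utilisation-style feature.
    example = {"low": (80, 30), "high": (20, 70)}
    woe, iv = woe_iv(example)
    print(woe, iv)
```

Under this convention a positive WoE marks a bin with proportionally more good accounts than bad ones, and each bin contributes a non-negative amount to the Information Value.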

Support Ticket Classification Using BERT Fine-Tuning

Sep 2025 – Dec 2025

Built a multi-class pipeline categorizing 27K support tickets across billing, technical, account, and cancellation categories using the Bitext dataset, establishing a TF-IDF + Logistic Regression baseline at F1 0.81. Fine-tuned bert-base-uncased with HuggingFace Transformers and weighted cross-entropy, reaching F1 0.93 on the held-out set — a 15% relative improvement.

BERT · HuggingFace · NLP

Marketing Uplift Modeling & Causal Inference

Sep 2024 – Dec 2024

Engineered recency, frequency, and spend features from the Hillstrom Email dataset (64K customers), estimating Average Treatment Effect via logistic regression with treatment interaction terms, validated through bootstrapped confidence intervals. T-Learner and uplift decision tree models surfaced heterogeneous effects — top-decile segments showed 18% higher incremental conversion versus random targeting, evaluated with Qini curves and uplift-at-k.

Causal Inference · T-Learner · Qini Curves
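The T-Learner mentioned above fits one outcome model on treated customers and another on controls, then scores uplift as the difference between their predictions. The sketch below keeps that structure but swaps in a deliberately trivial base learner, a per-segment mean, so it runs without any ML library. The segment names and data are invented and are not drawn from the Hillstrom dataset.

```python
from collections import defaultdict


def fit_segment_means(rows):
    """Toy base learner: mean outcome per segment, global mean as fallback.

    rows is a list of (segment, outcome) pairs.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for segment, outcome in rows:
        sums[segment] += outcome
        counts[segment] += 1
    overall = sum(sums.values()) / sum(counts.values())
    means = {s: sums[s] / counts[s] for s in counts}
    return lambda segment: means.get(segment, overall)


def t_learner(data):
    """data is a list of (segment, treated, converted); returns uplift(segment)."""
    treated = [(s, y) for s, t, y in data if t == 1]
    control = [(s, y) for s, t, y in data if t == 0]
    mu1 = fit_segment_means(treated)   # outcome model for the treated arm
    mu0 = fit_segment_means(control)   # outcome model for the control arm
    return lambda segment: mu1(segment) - mu0(segment)


if __name__ == "__main__":
    data = (
        [("recent", 1, 1)] * 3 + [("recent", 1, 0)]
        + [("recent", 0, 1)] + [("recent", 0, 0)] * 3
        + [("lapsed", 1, 1)] + [("lapsed", 1, 0)] * 3
        + [("lapsed", 0, 1)] + [("lapsed", 0, 0)] * 3
    )
    uplift = t_learner(data)
    print(uplift("recent"), uplift("lapsed"))
```

In practice the two base learners would be real models such as gradient-boosted trees; the two-model structure is what makes it a T-Learner.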

NHL Analytics Data Warehouse (Databricks)

Sep 2025 – Dec 2025

Architected a Medallion Lakehouse (Bronze/Silver/Gold) in Databricks processing 20M+ records with distributed PySpark ETL, Hive-style partitioning, and tuned shuffle sizing, reducing season-level aggregation query times by 35%. Orchestrated reproducible seasonal refreshes via Airflow DAGs with backfill and failure alerting, enforcing end-to-end lineage and schema contracts with Unity Catalog.

Medallion · PySpark · Unity Catalog · Airflow

E-Commerce Analytics Pipeline (dbt + Airflow)

Jan 2025 – May 2025

Architected a layered dbt Core pipeline (staging, intermediate, marts) over 30M+ order line items, producing a star schema with 2 fact and 3 dimension tables in PostgreSQL and enforcing referential integrity via dbt relationship tests. Engineered idempotent ingestion via Azure Data Factory into Blob Storage, orchestrated through Airflow on Astronomer Astro with staged execution, test gates, and custom macros across 200K+ users.

dbt Core · Airflow · Azure · Star Schema

Real-Time Stock Price Forecasting Pipeline

Feb 2024 – Aug 2024

Built a real-time forecasting system applying ARIMA to live Alpaca Markets WebSocket tick data across 2.5M daily tick events, evaluating accuracy with MAE/RMSE and monitoring prediction drift for model degradation. Engineered streaming infrastructure with Kafka and Spark Structured Streaming to cut data latency from end-of-day batch to sub-5-minute windows, with checkpointing, offset management, and automated replay for recoverability.

Kafka · Spark Streaming · ARIMA

Get in Touch

Contact Me

I'm always open to discussing data science roles, engineering projects, or collaboration. Reach out through any of the channels below.