Deep Dives

Case Studies

Honest stories from inside every project — the challenges faced, decisions made, and what was learned. Written in plain English so anyone can follow along.

Computer Vision · AUC 0.8994

Deepfake Detection System

The first version of the model had an AUC of only 0.52 — essentially random guessing. After extensive debugging, the root cause was found: the class labels were reversed during training (fake=0 when it should be fake=1). This one mistake made the model predict the opposite of what was correct.

01 Extracted faces from 400 videos (200 real, 200 fake) using an OpenCV Haar cascade, creating 3,200 face crops

02 Split dataset 70% training, 15% validation, 15% testing with explicit class ordering fixed

03 Phase 1: Froze MobileNetV2 base, trained classification head only; AUC jumped to 0.8634

+3 more steps...
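The explicit label mapping and the 70/15/15 split from the steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: `CLASS_TO_LABEL`, `stratified_split`, and the file names are hypothetical, and it assumes each class is split separately so every split keeps the real/fake balance.

```python
import random

# Explicit mapping so the positive class never depends on folder ordering.
CLASS_TO_LABEL = {"real": 0, "fake": 1}  # assumption: fake is the positive class

def stratified_split(items_by_class, train=0.70, val=0.15, seed=42):
    """Split each class separately into train/val/test lists of (item, label)."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for class_name, items in items_by_class.items():
        label = CLASS_TO_LABEL[class_name]
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train)
        n_val = round(len(shuffled) * val)
        splits["train"] += [(x, label) for x in shuffled[:n_train]]
        splits["val"] += [(x, label) for x in shuffled[n_train:n_train + n_val]]
        splits["test"] += [(x, label) for x in shuffled[n_train + n_val:]]
    return splits

# Hypothetical file names standing in for the 3,200 face crops.
crops = {
    "real": [f"real_{i}.jpg" for i in range(1600)],
    "fake": [f"fake_{i}.jpg" for i in range(1600)],
}
splits = stratified_split(crops)
```

Writing the mapping down once and reusing it everywhere is what guards against the reversed-label bug described above.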

Key Outcome

Final validation AUC of 0.8994, strong enough for production use. The system handles single images, video files, and batch processing. The Grad-CAM visualization shows clients which image regions drove each prediction, making the AI explainable and trustworthy.

Python · TensorFlow · MobileNetV2

Medical AI · AUC 0.72

Multi-Disease Medical Diagnosis AI

The NIH Chest X-Ray dataset has about 112,000 images, but only 4,999 were available locally. It covers 14 diseases with extreme class imbalance: some, like Hernia, had very few positive samples compared to negative ones. The first training attempt achieved a mean AUC of only 0.52.

01 Matched 4,999 X-ray images with their multi-label disease annotations from Data_Entry_2017.csv

02 Analyzed class distribution: some diseases had a positive rate below 1%

03 Implemented weighted BCE loss: weight = total_samples / (2 × disease_positive_count)

+3 more steps...
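The weighting formula from step 03 can be illustrated with a small sketch. The function names and the example counts are hypothetical, and applying the weight only to the positive term of the loss is an assumption; the source gives the formula but not how it was wired into training.

```python
import math

def positive_weight(total_samples, positive_count):
    """Weight from step 03: total_samples / (2 * disease_positive_count)."""
    return total_samples / (2 * positive_count)

def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    """Mean binary cross-entropy for one disease, positives scaled by pos_weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical count: a disease with 50 positives among the 4,999 images.
w = positive_weight(4999, 50)
labels = [1, 0, 0, 0]
probs = [0.2, 0.1, 0.3, 0.05]
loss_weighted = weighted_bce(labels, probs, w)
loss_plain = weighted_bce(labels, probs, 1.0)
```

The rarer the disease, the larger the weight, so a missed positive costs far more than a missed negative.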

Key Outcome

Mean AUC 0.72 across 14 diseases on only 4,999 images. This approaches results from published research papers that used 50,000+ images. Best individual disease: Mass detection AUC 0.5774+.

Python · TensorFlow · DenseNet121

NLP / LLM

RAG AI Chatbot

Standard chatbots hallucinate answers when asked about specific documents. Businesses need AI that answers questions accurately from their private documents — not from general training data. The challenge was building a system that retrieves the right context before generating an answer.

01 Implemented PDF and URL ingestion, so any document or website becomes a knowledge base instantly

02 Used LangChain text splitter to chunk documents into 500-character segments with 50-character overlap

03 Generated embeddings using HuggingFace sentence-transformers (all-MiniLM-L6-v2: free, fast)

+3 more steps...
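To show what 500-character chunks with a 50-character overlap look like, here is a plain-Python stand-in. It is not the LangChain splitter itself, which also tries to break on separators such as paragraphs and sentences; `chunk_text` is a hypothetical helper that cuts purely by character count.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

# Hypothetical 1,200-character document.
document = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(document)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps retrieval.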

Key Outcome

Accurately answers questions from any PDF or website within seconds. Conversation memory allows multi-turn question sessions. The dashboard shows which document chunks were retrieved, making the system fully transparent.

Python · LangChain · FAISS

Computer Vision

YOLOv8 Person Detection System

Most object detection systems require expensive GPUs to run in real time. The goal was a person detection system that works on standard laptop CPU hardware — no GPU needed — while still being fast enough for practical use.

01 Selected YOLOv8n pre-trained on the COCO dataset, where the person class is already included with strong base accuracy

02 Ran confidence threshold analysis from 0.1 to 0.9 and found 0.5 to be the optimal balance

03 Built image detection pipeline with bounding boxes, confidence scores, and person count overlay

+3 more steps...
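A confidence threshold analysis like the one in step 02 can be sketched as a sweep that scores each threshold by precision, recall, and F1. Everything here is illustrative: the detections are made-up toy data, `sweep_thresholds` is a hypothetical helper, and using F1 as the "balance" metric is an assumption. Recall is also simplified, since it only counts people that produced at least one detection.

```python
def sweep_thresholds(scores, is_person, thresholds):
    """Return (threshold, precision, recall, f1) for each confidence threshold."""
    rows = []
    total_positives = sum(is_person)
    for t in thresholds:
        kept = [label for s, label in zip(scores, is_person) if s >= t]
        tp = sum(kept)           # kept detections that are real people
        fp = len(kept) - tp      # kept detections that are not
        precision = tp / (tp + fp) if kept else 0.0
        recall = tp / total_positives if total_positives else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((t, precision, recall, f1))
    return rows

# Toy detections: confidence score and whether it matched a real person.
scores = [0.95, 0.85, 0.72, 0.61, 0.55, 0.42, 0.33, 0.21, 0.15]
is_person = [1, 0, 1, 1, 1, 0, 0, 1, 0]
thresholds = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1 to 0.9
rows = sweep_thresholds(scores, is_person, thresholds)
best = max(rows, key=lambda r: r[3])
```

Low thresholds keep more false alarms, high thresholds miss more people; the sweep makes that trade-off visible before a value is fixed.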

Key Outcome

Achieves approximately 6 ms per image on CPU and a mAP@0.5 of 0.525 on the COCO person class. Works on any standard laptop without a GPU, making it accessible for real-world deployment.

Python · YOLOv8 · OpenCV

Deep Learning · R² 0.913

Stock Price Prediction LSTM

After training the LSTM model, results were wildly inconsistent between stocks. Tesla R² was 0.9131 but Apple R² was only 0.014, barely better than always predicting the average price. Same architecture, same hyperparameters, completely different outcomes.

01 Downloaded 6 years of daily OHLCV data for 5 major stocks via the yfinance API

02 Built 3-layer stacked LSTM architecture: 128 → 64 → 32 units with dropout regularization

03 Used 60-day sliding window sequences: the model sees 60 days to predict the next day

+3 more steps...
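The 60-day sliding window from step 03 can be sketched as below. The price series is made up and `make_windows` is a hypothetical helper; a real pipeline would typically feed it the scaled closing prices and convert the result to arrays for the LSTM.

```python
def make_windows(prices, window=60):
    """Build (inputs, targets): each input is `window` consecutive values,
    and its target is the value on the following day."""
    inputs, targets = [], []
    for end in range(window, len(prices)):
        inputs.append(prices[end - window:end])
        targets.append(prices[end])
    return inputs, targets

# Hypothetical closing prices for 250 trading days.
closes = [100.0 + 0.5 * day for day in range(250)]
X, y = make_windows(closes)
```

Each training example therefore pairs days 1 to 60 with day 61, days 2 to 61 with day 62, and so on.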

Key Outcome

TSLA R² 0.9131, GOOGL R² 0.8324 — strong performance on momentum-driven stocks. Interactive dashboard allows live prediction for any stock symbol worldwide using real-time Yahoo Finance data.

Python · TensorFlow · Keras

Computer Vision · 95%+

Face Mask Detection System

Manufacturing facilities and public spaces needed automated mask compliance monitoring that could run on standard hardware without GPU infrastructure. The system needed to handle 3 classes — mask worn correctly, worn incorrectly, and not worn — with high accuracy.

01 Collected dataset with 3 classes: with_mask, without_mask, mask_weared_incorrect

02 Applied data augmentation: horizontal flip, rotation (±15°), zoom (0.1), brightness shifts

03 Phase 1: Trained MobileNetV2 classification head with base frozen, reaching ~92% accuracy

+3 more steps...

Key Outcome

95%+ accuracy across all 3 classes. Real-time detection at 15+ FPS on CPU. Color-coded alerts make it immediately clear to security personnel when violations occur. Compliance percentage gives management actionable metrics.

Python · TensorFlow · MobileNetV2

LLM / Agents

Multi-Agent AI Content System

Marketing agencies spend enormous time manually researching topics, writing content, reviewing quality, and optimizing for SEO — a sequential process that could be fully automated. The challenge was making multiple AI agents work together smoothly without losing context between steps.

01 Designed 5 agent roles: Research Agent, Writer Agent, Reviewer Agent, SEO Agent, Orchestrator Agent

02 Each agent receives the previous agent's output as context, creating a content pipeline

03 Used Groq Llama 3.3 70B for Research and Writing (quality priority)

+3 more steps...
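The hand-off described in step 02, where each agent receives the previous agent's output as context, can be sketched with stub functions in place of real LLM calls. All names and behaviors here are hypothetical, and treating the orchestrator as a simple loop is an assumption about the design.

```python
from typing import Callable, List, Tuple

# An agent takes (topic, context from the previous agent) and returns its output.
Agent = Callable[[str, str], str]

def run_pipeline(topic: str, agents: List[Tuple[str, Agent]]) -> Tuple[str, List[str]]:
    """Run agents in order, feeding each one the previous agent's output."""
    context = ""
    log = []
    for name, agent in agents:
        context = agent(topic, context)
        log.append(f"{name}: produced {len(context)} characters")
    return context, log

# Stub agents standing in for LLM calls.
def research(topic, context):
    return f"Notes on {topic}."

def write(topic, context):
    return f"Draft based on: {context}"

def review(topic, context):
    return context.replace("Draft", "Reviewed draft")

def seo(topic, context):
    return f"{context} [keywords: {topic.lower()}]"

article, log = run_pipeline(
    "Edge AI",
    [("Research", research), ("Writer", write), ("Reviewer", review), ("SEO", seo)],
)
```

Because every step sees only what the previous one produced, the log doubles as a record of where context was added or lost.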

Key Outcome

Generates fully researched, written, reviewed, and SEO-optimized articles from any topic in minutes. Live agent logs show the pipeline running in real time — clients see exactly what each agent is doing.

Python · LangChain · Groq

Machine Learning · AUC 0.8438

Customer Churn Prediction

A telecom company was losing customers without understanding why. The challenge was identifying customers likely to leave before they actually did — allowing the retention team to intervene with targeted offers.

01 Loaded 7,043 telecom customer records with 20+ features including usage patterns and contract type

02 Cleaned data, handled missing TotalCharges values, encoded categorical variables

03 Applied StandardScaler normalization for models sensitive to feature scale

+3 more steps...
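Steps 02 and 03 can be illustrated on a single column. This is a sketch under stated assumptions: the values are invented, filling blanks with the median is one possible choice (the source does not say which strategy was used), and `standardize` mirrors what StandardScaler computes, using the population standard deviation.

```python
import statistics

def fill_missing(values, fill):
    """Replace blank or None entries with `fill` and convert the rest to float."""
    return [fill if v is None or str(v).strip() == "" else float(v) for v in values]

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical TotalCharges column with one blank entry.
raw = ["29.85", "1889.5", " ", "108.15", "1840.75"]
present = [float(v) for v in raw if v.strip()]
charges = fill_missing(raw, statistics.median(present))
scaled = standardize(charges)
```

In a real workflow the mean and standard deviation would be computed on the training split only and then reused for validation and test data.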

Key Outcome

Gradient Boosting achieved the best AUC of 0.8438. Feature importance revealed contract type, tenure, and monthly charges as the strongest churn predictors. A single-customer probability predictor allows real-time churn risk scoring.

Python · Scikit-learn · XGBoost

Machine Learning · R² 0.83

House Price Prediction

Raw housing data contains features like total rooms and total bedrooms that are meaningless without context. A house with 20 rooms could be a mansion or a tiny apartment with many residents — the raw number misleads models.

01 Loaded California Housing Dataset with 20,640 samples and 8 raw features

02 Engineered 4 new features: rooms_per_household, bedrooms_per_room, population_per_household, income_per_room

03 Applied log transformation to skewed features and StandardScaler normalization

+3 more steps...
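The four engineered features from step 02 are simple ratios, sketched here for one record. The input column names and values are assumptions, and so is the definition of `income_per_room` (median income divided by rooms per household); the source lists only the feature names.

```python
def engineer_features(row):
    """Add the four ratio features to one record of district-level totals."""
    rooms_per_household = row["total_rooms"] / row["households"]
    return {
        **row,
        "rooms_per_household": rooms_per_household,
        "bedrooms_per_room": row["total_bedrooms"] / row["total_rooms"],
        "population_per_household": row["population"] / row["households"],
        # Assumption: income relative to rooms per household.
        "income_per_room": row["median_income"] / rooms_per_household,
    }

# Hypothetical record; the column names are assumed.
district = {
    "total_rooms": 880.0,
    "total_bedrooms": 129.0,
    "population": 322.0,
    "households": 126.0,
    "median_income": 8.3252,
}
features = engineer_features(district)
```

Ratios like these turn raw totals into per-household quantities, which is the context the raw room counts were missing.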

Key Outcome

XGBoost reached R² 0.83, explaining 83% of price variance. An interactive geographic heatmap shows price patterns across California counties. A custom price predictor generates a price estimate for any property specification.

Python · Scikit-learn · XGBoost

NLP

NLP Sentiment Analysis

Social media text is messy — abbreviations, emojis, slang, misspellings, and sarcasm all challenge standard NLP models. The challenge was building a system robust enough to handle real Twitter/social media text with high accuracy.

01 Applied full NLTK text preprocessing: lowercase, remove URLs/mentions/punctuation, stop words, stemming

02 Used TF-IDF vectorization with the 10,000 most common words as features for ML models

03 Trained 4 ML models: Logistic Regression, Naive Bayes, SVM, Random Forest

+3 more steps...
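The cleaning stage from step 01 can be approximated with the standard library alone. This sketch is a simplified stand-in, not the NLTK pipeline: the stop-word set is a tiny illustrative list, stemming is left out, and `clean_tweet` is a hypothetical helper.

```python
import re
import string

# Tiny illustrative stop-word list; NLTK ships a much fuller one.
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "and", "of", "this", "so"}

def clean_tweet(text):
    """Lowercase, drop URLs, mentions and punctuation, then remove stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)                    # @mentions
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]

tokens = clean_tweet("@acme This update is SO good!!! Details: https://example.com/x #happy")
```

URLs and mentions are removed before punctuation so that their slashes and @ signs do not leave stray fragments behind.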

Key Outcome

TF-IDF + Logistic Regression competes with LSTM on short social media text while being 100x faster to train. Word cloud visualization helps clients understand the language patterns in their audience feedback.

Python · TensorFlow · LSTM

Deep Learning · 96-98%

Image Classifier (Cat vs Dog)

The goal was to demonstrate the power difference between building a CNN from scratch versus using transfer learning — and to show exactly why modern AI practitioners almost always choose transfer learning for image classification.

01 Loaded 10,000 images (5,000 cats, 5,000 dogs) from Kaggle dataset

02 Applied data augmentation: flip, rotation, zoom, brightness shifts to prevent overfitting

03 Custom CNN: 3 conv blocks (32-64-128 filters), max pooling, dropout regularization

+3 more steps...

Key Outcome

MobileNetV2 achieves 96–98% test accuracy. Custom CNN achieves ~85%. The transfer learning model uses features learned from 1.4 million ImageNet images, giving it a massive head start over training from scratch.

Python · TensorFlow · Keras

Have a Complex AI Challenge?

I solve the hard problems. Let's talk about yours.

Start a Project