Shaheen Nabi

Research Engineer · India

Hi! 👋 I'm Shaheen, a Research Engineer based in India, working on reasoning and thinking models — understanding how large language models perform multi-step reasoning and how training and post-training methods can improve their reliability, efficiency, and scalability.

I study reasoning across the full model development pipeline: from pre-training data composition and architectural choices to the post-training systems that shape model behavior. My primary focus is the post-training stack: supervised fine-tuning (SFT), preference optimization, RL-based training (including reinforcement learning with verifiable rewards, RLVR, and related approaches), and test-time compute strategies, such as agentic scaffolding, that let a model reason more effectively without scaling up its size.

Alongside training methods, I'm interested in the interpretability of reasoning models — studying the internal mechanisms and representations that support multi-step reasoning, and using mechanistic analysis to diagnose failures such as shortcut reasoning, reward hacking, or unfaithful chain-of-thought.

My goal is to build and study systems that make reasoning models more reliable, more efficient, and easier to scale — combining post-training, inference-time reasoning strategies, rigorous evaluation and benchmarking, and mechanistic analysis to better understand how reasoning emerges and how it can be improved in practice.

I contribute to open source through code and writing, and am actively working toward contributions to large-scale LLM infrastructure and post-training frameworks, with a focus on end-to-end implementations of reasoning-focused training pipelines.

Research Interests

Scaling Foundation Models

Data composition, quality, and curation at scale. How pre-training decisions (compute allocation, data mixtures, and training dynamics) shape the capabilities available to post-training.

Continual & Mid-Training

How continued pre-training and mid-training interventions affect the reasoning qualities that emerge in later stages. The relationship between training phase, data distribution, and downstream reasoning behavior.

Architectures for Reasoning

How architectural choices (hybrid MoE, state space models, and dense-sparse combinations) affect reasoning capabilities. Studying new open model families to understand which architectural decisions make models more or less amenable to post-training reasoning improvements.

Post-Training Pipelines

RLVR, SFT, and preference optimization. How reward signals, data quality, and training dynamics in post-training interact with the capabilities laid down during pre-training.

Test-Time Compute & Efficient Reasoning

Making models think better at lower cost. Process reward models, adaptive compute budgeting, and search-guided generation. The core question: how do we build models that know when to stop thinking, not just how to think longer?

Interpretability of Reasoning

Studying the internal mechanisms and representations that support multi-step reasoning. Using mechanistic analysis to diagnose failure modes (shortcut reasoning, reward hacking, and unfaithful chain-of-thought) and to understand how reasoning emerges in large models.

Emerging Reasoning Architectures

Actively following new directions: small reasoning models, hierarchical reasoning architectures, and novel thinking paradigms. Interested in how these approaches can inform the next generation of frontier reasoning models and open new possibilities on hard benchmarks.

Outputs

Open-source repositories, writing, and models — in lieu of publications.

2025 · Writing · Substack

Reinforcement Learning Foundations

A technical introduction to the mathematical foundations of RL — MDPs, Bellman equations, policy gradients, and value functions — written for researchers entering the field from a supervised learning background.

Read ↗

2025 · Open Source · GitHub

Reinforcement Learning: Zero to Hero

A maintained RL repository — MDPs through PPO and DDPG — with clean implementations and math annotations. Written for practitioners building the intuition needed for modern post-training research.

GitHub ↗

2025 · Implementation · Architecture

Attention Variants from Scratch — GQA & MLA

PyTorch implementations of Grouped-Query Attention and Multi-Head Latent Attention. GQA shares KV heads across query groups to reduce cache memory. MLA (as in DeepSeek) compresses KV into a low-rank latent space and up-projects at inference, decoupling cache footprint from model width.

GitHub ↗
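To make the cache-footprint claim in the GQA and MLA entry above concrete, here is a minimal sketch of the per-token KV cache arithmetic. It is not taken from the repository: the function names and the configuration (32 layers, 32 query heads, head dimension 128, 8 KV heads, a 512-dimensional latent plus a 64-dimensional positional key) are hypothetical values chosen only for illustration, and the MLA accounting is simplified to "one shared latent vector per token per layer".

```python
def kv_cache_elems_per_token(n_layers: int, n_kv_heads: int, head_dim: int) -> int:
    """Elements cached per token when each KV head stores one key and one value vector.

    Covers standard multi-head attention (n_kv_heads == number of query heads)
    and GQA (n_kv_heads smaller than the number of query heads).
    """
    return 2 * n_layers * n_kv_heads * head_dim


def mla_cache_elems_per_token(n_layers: int, latent_dim: int, rope_dim: int = 0) -> int:
    """Elements cached per token when only a compressed latent is stored per layer.

    rope_dim is an optional extra positional key cached alongside the latent
    (an assumption of this sketch, set to 0 to ignore it).
    """
    return n_layers * (latent_dim + rope_dim)


# Hypothetical configuration, for illustration only.
N_LAYERS, N_QUERY_HEADS, HEAD_DIM = 32, 32, 128

mha = kv_cache_elems_per_token(N_LAYERS, N_QUERY_HEADS, HEAD_DIM)  # one KV head per query head
gqa = kv_cache_elems_per_token(N_LAYERS, 8, HEAD_DIM)              # 8 KV heads shared by 32 query heads
mla = mla_cache_elems_per_token(N_LAYERS, latent_dim=512, rope_dim=64)

print(f"MHA: {mha} elements/token")
print(f"GQA: {gqa} elements/token ({mha // gqa}x smaller than MHA)")
print(f"MLA: {mla} elements/token")
```

In this sketch the GQA footprint scales with the number of KV heads, while the MLA footprint depends only on the latent size, so widening the model (more heads or a larger head dimension) leaves the MLA figure unchanged.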

2024 · Fine-Tuning · Open Source

Instruction Fine-Tuning of LLaMA 3.2 3B — Kannada

LoRA and 4-bit QLoRA fine-tune of Meta LLaMA 3.2 3B Instruct on 390K Kannada instruction pairs, with the adapters merged into FP16 weights for deployment. Released on Hugging Face Hub; used by hundreds of developers monthly for regional NLP applications.

GitHub ↗ Model ↗

2025 · Applied ML · Dataset

LeafLogic — Agricultural Object Detection & Multi-Agent Pipeline

YOLOv5 detection pipeline (NVIDIA A100) for 100+ crop species. Multi-agent framework for autonomous post-detection research and reporting. Deployed on AWS ECR/EC2. Open-sourced a 25K annotated image dataset on Hugging Face, used by researchers globally.

GitHub ↗ Dataset ↗

Open source. I contribute through code and writing, am working toward contributions to large LLM infrastructure frameworks and post-training pipelines at scale, and document my journey so it can help others along the way.

Experience

2025 – Present

Research Engineer, Independent

India

Post-training pipelines, reinforcement learning for LLMs, reasoning model research, and inference-time compute. Implementing training objectives and architectures from frontier research. Building toward contributions to large-scale LLM infrastructure.

Jan – Mar 2025

Data Science Intern

iNeuron.ai · Bengaluru, India

Object detection pipelines on NVIDIA A100. Open-sourced a 25K annotated image dataset. Designed cloud inference infrastructure on AWS (ECR and EC2), with Jenkins for CI/CD.

2022

Founder

Lasso Pacific Pvt Ltd · Anantnag, J&K, India

AI and robotics edtech platform for rural learners. Reached 2M+ annual visitors organically. Closed after one year; proceeds reinvested into free tech literacy programs.

Education

2025 – 2028

Bachelor of Arts

Indira Gandhi National Open University

2021 – 2022

Full Stack Data Science

iNeuron Intelligence

2021 – 2023

High School Diploma — Mathematics & Computer Science

J&K Board of School Education