Shaheen Nabi

Research Engineer · India

Hi! 👋 I'm Shaheen, a Research Engineer based in India, working on reasoning and thinking models — understanding how large language models perform multi-step reasoning and how training and post-training methods can improve their reliability, efficiency, and scalability.

I study reasoning across the full model development pipeline: from pre-training data composition and architectural choices to the post-training systems that shape model behavior. My primary focus is the post-training stack — supervised fine-tuning (SFT), preference optimization, RL-based training (including RLVR and related approaches), and test-time compute strategies such as agentic scaffolding that enable models to reason more effectively without requiring larger models.

Alongside training methods, I'm interested in the interpretability of reasoning models — studying the internal mechanisms and representations that support multi-step reasoning, and using mechanistic analysis to diagnose failures such as shortcut reasoning, reward hacking, or unfaithful chain-of-thought.

My goal is to build and study systems that make reasoning models more reliable, more efficient, and easier to scale — combining post-training, inference-time reasoning strategies, rigorous evaluation and benchmarking, and mechanistic analysis to better understand how reasoning emerges and how it can be improved in practice.

I contribute to open source through code and writing, and am actively working toward contributions to large-scale LLM infrastructure and post-training frameworks, with a focus on end-to-end implementations of reasoning-focused training pipelines.

Research Interests

Outputs

Open-source repositories, writing, and models — in lieu of publications.

2025 · Writing · Substack
Reinforcement Learning Foundations
A technical introduction to the mathematical foundations of RL — MDPs, Bellman equations, policy gradients, and value functions — written for researchers entering the field from a supervised learning background.
2025 · Open-Source · GitHub
Reinforcement Learning: Zero to Hero
A maintained RL repository — MDPs through PPO and DDPG — with clean implementations and math annotations. Written for practitioners building the intuition needed for modern post-training research.
2025 · Implementation · Architecture
Attention Variants from Scratch — GQA & MLA
PyTorch implementations of Grouped-Query Attention and Multi-Head Latent Attention. GQA shares each KV head across a group of query heads to reduce cache memory. MLA (as in DeepSeek) compresses keys and values into a shared low-rank latent that is cached and up-projected when attention is computed, so the cache footprint depends on the latent dimension rather than on the number of heads.
2024 · Fine-Tuning · Open Source
Instruction Fine-Tuning of LLaMA 3.2 3B — Kannada
LoRA and 4-bit QLoRA fine-tune of Meta LLaMA 3.2 3B Instruct on 390K Kannada instruction pairs. Adapter weights merged into the base model and saved in FP16 for deployment. Released on Hugging Face Hub; used by hundreds of developers monthly for regional NLP applications.
2025 · Applied ML · Dataset
LeafLogic — Agricultural Object Detection & Multi-Agent Pipeline
YOLOv5 detection pipeline (trained on NVIDIA A100) for 100+ crop species. Multi-agent framework for autonomous post-detection research and reporting. Deployed on AWS EC2 using container images stored in ECR. Open-sourced a 25K annotated image dataset on Hugging Face, used by researchers globally.
Open source. I contribute through code and writing, am working toward contributions to large-scale LLM infrastructure and post-training frameworks, and document the process so it can help others along the way.
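
To make the GQA and MLA entry above more concrete, here is a minimal sketch of the per-token KV cache arithmetic behind the two designs. It is not taken from the repository: the `AttnConfig` class, the function names, and every dimension below are illustrative assumptions. The MLA formula assumes a DeepSeek-V2-style layout in which each layer caches one compressed latent plus a small decoupled positional key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnConfig:
    """Hypothetical model shape, chosen only for illustration."""
    n_layers: int
    n_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # FP16 / BF16


def mha_cache_bytes_per_token(cfg: AttnConfig) -> int:
    # Standard multi-head attention caches one key and one value
    # vector per head, per layer.
    return 2 * cfg.n_layers * cfg.n_heads * cfg.head_dim * cfg.bytes_per_elem


def gqa_cache_bytes_per_token(cfg: AttnConfig, n_kv_heads: int) -> int:
    # GQA caches only n_kv_heads key/value pairs; each KV head is
    # shared by n_heads // n_kv_heads query heads.
    if cfg.n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    return 2 * cfg.n_layers * n_kv_heads * cfg.head_dim * cfg.bytes_per_elem


def mla_cache_bytes_per_token(cfg: AttnConfig, latent_dim: int, rope_dim: int) -> int:
    # Assumed MLA layout: one shared latent of size latent_dim plus a
    # decoupled positional key of size rope_dim per layer. The size is
    # independent of the number of heads.
    return cfg.n_layers * (latent_dim + rope_dim) * cfg.bytes_per_elem


cfg = AttnConfig(n_layers=32, n_heads=32, head_dim=128)
mha = mha_cache_bytes_per_token(cfg)
gqa = gqa_cache_bytes_per_token(cfg, n_kv_heads=8)
mla = mla_cache_bytes_per_token(cfg, latent_dim=512, rope_dim=64)

for name, size in [("MHA", mha), ("GQA", gqa), ("MLA", mla)]:
    print(f"{name}: {size / 1024:.0f} KiB per token")
```

With these assumed dimensions the cache shrinks from 512 KiB per token for full multi-head attention to 128 KiB for GQA with 8 KV heads and 36 KiB for MLA. Real models will differ in every one of these numbers.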

Experience

2025 — Present
Research Engineer, Independent
India
Post-training pipelines, reinforcement learning for LLMs, reasoning model research, and inference-time compute. Implementing training objectives and architectures from frontier research. Building toward contributions to large-scale LLM infrastructure.
Jan – Mar 2025
Data Science Intern
iNeuron.ai · Bengaluru, India
Object detection pipelines on NVIDIA A100. Open-sourced a 25K annotated image dataset. Designed cloud inference infrastructure using AWS ECR and EC2, with Jenkins for build automation.
2022
Founder
Lasso Pacific Pvt Ltd · Anantnag, J&K, India
AI and robotics edtech platform for rural learners. Reached 2M+ annual visitors organically. Closed after one year; proceeds reinvested into free tech literacy programs.

Education

2025 – 2028
Bachelor of Arts
Indira Gandhi National Open University
2021 – 2022
Full Stack Data Science
iNeuron Intelligence
2021 – 2023
High School Diploma — Mathematics & Computer Science
J&K Board of School Education