Sanjan Baitalik

Hello, I'm Sanjan.

I am a final-year B.Tech student in Computer Science & Engineering at the Institute of Engineering & Management, Kolkata (CGPA 9.33/10, top 15%). My research sits at the intersection of representation and transfer learning, trustworthy ML (XAI + robustness), online learning under distribution shift, and computer vision — with applications in precision agriculture and remote sensing.

I currently work as a Research Intern at the University of Nebraska–Lincoln (supervised by Dr. Sruti Das Choudhury) and previously worked as a Research Scholar at the University of Calcutta. I have been featured in UNL's news coverage for my research contributions.

Representation / Transfer Learning · Trustworthy ML (XAI + Robustness) · Online Learning under Distribution Shift · Computer Vision (Hyperspectral / Multimodal)
9.33 CGPA · 10 Publications · 3+ Years Research · 4 Research Roles

Academic Background

Institute of Engineering & Management (IEM), Kolkata
B.Tech in Computer Science & Engineering · CGPA: 9.33/10
Aug 2022 – Jul 2026 (Expected)

Ranked in the top 15% of class by CGPA.

Where I've Worked

Research Intern — University of Nebraska, Lincoln, USA
Jun 2025 – Present
  • Co-authored a human-centered XAI study combining clustering, SHAP-driven interpretability, and narrative visualization on agronomic and healthcare datasets.
  • Built an interactive visual-analytics pipeline on UNL greenhouse phenotyping data (42 plants, 9 genotypes, 25 days) coupling temporal embeddings, DTW-based clustering, and SHAP/LIME-linked causal views.
  • Engineered HyperProbe, a lightweight Streamlit-based human-in-the-loop hyperspectral analysis tool (517–1700 nm, 243 bands).
  • Featured in the university's news story (August 2025).
Research Scholar — University of Calcutta
Jan 2025 – Dec 2025
  • Developed FH-FAM, a fuzzy-hypergraph feature-selection algorithm achieving 81.43% accuracy with 89.28% feature reduction across 15 datasets.
  • Proposed SIF-HFAM, a strong intuitionistic fuzzy hypergraph framework with greedy (1−1/e)-approximation guarantee, achieving ~78% accuracy while removing ~98% of features.
Student Research Lead — Generative AI CoE, IEM
Nov 2024 – Present
  • Established and led the Generative AI Center of Excellence (CoE) at IEM, with 15+ active student researchers.
  • Led end-to-end research execution and operations, mentored 6 active teams, and spearheaded ReelBook (Pearson collaboration).
Project Intern — bair.ai (IEM Research Foundation)
Aug 2024 – Mar 2025
  • Built MemeMetric, an end-to-end cluster-based cryptocurrency forecasting system with automated reporting and real-time social-media sentiment signals via NLP.
Undergraduate Research Assistant — IEDC (CSE)
Mar 2024 – Aug 2024
  • Co-authored an IEM-HEALS 2024 paper on pharma stock analysis and built TraderBot, a Flask + MongoDB real-time trading simulator.
Study Abroad — National University of Singapore (NUS)
Jul 2023
  • Studied AI, IoT, Machine Learning & Data Analytics, with lectures by Dr. Peter Leong, Dr. Eric Cambria, and others.

Selected Papers

Garden Path Recovery in Causal and Masked Language Models
Sanjan Baitalik, Rajashik Datta
Confidence as a Tie-Breaker: Reassessing Multilingual Hedging Bias in LLM-as-a-Judge Evaluation
Rajashik Datta, Sanjan Baitalik
ReproPheno and ReproPhenoNet: A Large-Scale Multimodal Benchmark Dataset and Deep Learning Framework for Reproductive-Stage Plant Phenotyping
Sanjan Baitalik, Rajashik Datta, Utsho Banerjee, Rajarshi Karmakar, Vincent Stoerger, Himadri Nath Saha, Sruti Das Choudhury
PlantPhenoLM: Phenotype-Genotype Mapping Inference with Multi-Turn LLM Reasoning and Selective Prediction
Rajashik Datta, Sanjan Baitalik, Amit Kumar Das, Sruti Das Choudhury
Conversation as Belief Revision: GreedySAT Revision for Global Logical Consistency in Multi-Turn LLM Dialogues
Sanjan Baitalik, Rajashik Datta, Amit Kumar Das, Sruti Das Choudhury
Fuzzy Hypergraph Feature Association Map for High-Dimensional Feature Selection in Agriculture and Remote Sensing
Rajashik Datta, Sanjan Baitalik, Sruti Das Choudhury, Arup Kumar Chattopadhyay, Amit Kumar Das
MiQ-MCP: Valid and Conditionally Robust Uncertainty Quantification for High-Frequency Financial Time Series via Mondrian Conformalized Quantile Regression
Sanjan Baitalik, Rajashik Datta, Darothi Sarkar, Ayan Chaudhuri, Alongbar Wary
Enhancing Interpretability Through Clustering, Explainable AI, and Narrative Visualization
Sruti Das Choudhury, Rajashik Datta, Sanjan Baitalik
Explanation-First Agentic Forecaster for Stock Market
Sanket Ghosh, Sanjan Baitalik, Rajashik Datta, Romit Mukherjee, Darothi Sarkar, Ayan Chaudhuri
IEMENTech, 2026
Machine Learning-Driven Insights For Stock Market Analysis And Trading
Sanjan Baitalik, Rajashik Datta, Sanket Ghosh, Darothi Sarkar, Ayan Chaudhuri
IRTM, 2024
The COVID-19 Shock: An Analysis Of Impacts And Responses Of Indian Stock Market
Sanket Ghosh, Sanjan Baitalik, Rajashik Datta, Darothi Sarkar
IRTM, 2024
Is Indian Financial Market Ready for Pandemics?
Rajashik Datta, Sanjan Baitalik, Sanket Ghosh, Saugata Ghosh, Swarnendu Ghosh
IEM-HEALS, 2024
From Graphs to Hypergraphs: Submodular Coverage-Based Feature Selection on Intuitionistic Fuzzy Hypergraphs (SIFHFAM)
Sanjan Baitalik, Rajashik Datta, Arup Kumar Chattopadhyay, Amit Kumar Das, Amlan Chakraborty
Pattern Recognition, 2026 (under review)

Research Projects

Geodesic Optimal Transport (Transfer Geometry)
  • Implemented sliced-Wasserstein OT diagnostics on frozen ResNet-18 features; benchmarked across 48 transfer settings (CIFAR-10/STL-10/SVHN).
  • Observed strong correlations between the OT diagnostics and zero-shot transfer (Pearson r ≈ −0.71) and low-data adaptation (Spearman ρ ≈ 0.60 at 200-shot).
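
For readers unfamiliar with the diagnostic, here is a minimal sketch of a Monte Carlo sliced-Wasserstein estimate between two feature sets. It is an illustration under stated assumptions, not the project code: the function name, the number of projections, and the restriction to equal-size samples are all hypothetical choices, and the inputs stand in for frozen backbone features.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=128, seed=0):
    """Estimate the sliced 2-Wasserstein distance between two point clouds.

    x, y: arrays of shape (n, d) with the same n and d (an assumption of
    this sketch; unequal sample counts would need quantile interpolation).
    """
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample counts and dimensions")
    rng = np.random.default_rng(seed)
    # Draw random directions and normalize them onto the unit sphere.
    directions = rng.normal(size=(n_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Project both clouds onto every direction, then sort each 1-D projection.
    px = np.sort(x @ directions.T, axis=0)
    py = np.sort(y @ directions.T, axis=0)
    # In 1-D, optimal transport between equal-size samples pairs sorted values.
    return float(np.sqrt(np.mean((px - py) ** 2)))
```

A call such as `sliced_wasserstein(source_features, target_features)` returns a single non-negative score. Identical inputs give exactly zero, and larger values indicate a larger gap between the two feature distributions.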
Grokking + LoRA (Low-Rank Tax)
  • Controlled modular-addition experiments (p=97) comparing full-parameter vs. LoRA-on-frozen-base training for 15k epochs.
  • Reproduced classic grokking (99% train → 99% val) and quantified rank/LR thresholds.
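
As a rough illustration of the task setup, the sketch below builds the full modular-addition table for p = 97 and splits it into train and validation sets. The function name, the 50% train fraction, and the random seed are assumptions made for this example; the split used in the experiments is not stated here.

```python
import numpy as np

def modular_addition_dataset(p=97, train_fraction=0.5, seed=0):
    """Return ((x_train, y_train), (x_val, y_val)) for the task (a + b) mod p."""
    # Enumerate every ordered pair (a, b) with 0 <= a, b < p.
    a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    inputs = np.stack([a.ravel(), b.ravel()], axis=1)
    labels = (inputs[:, 0] + inputs[:, 1]) % p
    # Shuffle once, then split; train_fraction is a hypothetical choice.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(inputs))
    n_train = int(train_fraction * len(inputs))
    train_idx, val_idx = order[:n_train], order[n_train:]
    return (inputs[train_idx], labels[train_idx]), (inputs[val_idx], labels[val_idx])
```

The table has p² = 9409 examples in total. Holding out a fixed share of them is what makes a late jump in validation accuracy observable.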
Volatility-Scaled AdaGrad (Online Learning)
  • Implemented VS-AdaGrad, a CPU-efficient drift-aware online optimizer scaling AdaGrad using discounted residual volatility.
  • Reduced regret proxy vs. AdaGrad by 18.4% (small drift) and 19.8% (medium drift); outperformed tuned OGD by 23.7–63.8%.

Technical Toolkit

Programming

Python, Java, C, MATLAB

ML / AI

PyTorch, TensorFlow, Scikit-learn, Transformers

XAI

SHAP, LIME

Data

Pandas, NumPy

Tools

LaTeX, Git, Docker, Jupyter, TensorBoard

Cloud

Google Cloud, AWS (S3/EC2)

Get in Touch

I'm always open to research collaborations and discussions. Feel free to reach out!

rayan.baitalik@gmail.com · LinkedIn · GitHub · Download CV