Sanjan Baitalik — Research Portfolio

About

Hello, I'm Sanjan.

I am a final-year B.Tech student in Computer Science & Engineering at the Institute of Engineering & Management, Kolkata (CGPA 9.33/10, top 15%). My research sits at the intersection of representation and transfer learning, trustworthy ML (XAI + robustness), online learning under distribution shift, and computer vision — with applications in precision agriculture and remote sensing.

I currently work as a Research Intern at the University of Nebraska–Lincoln (supervised by Dr. Sruti Das Choudhury) and previously worked as a Research Scholar at the University of Calcutta. I have been featured in UNL's news coverage for my research contributions.

Representation / Transfer Learning · Trustworthy ML (XAI + Robustness) · Online Learning under Distribution Shift · Computer Vision (Hyperspectral / Multimodal)

Education

Academic Background

Institute of Engineering & Management (IEM), Kolkata

B.Tech in Computer Science & Engineering · CGPA: 9.33/10

Aug 2022 – Jul 2026 (Expected)

Ranked in the top 15% of class by CGPA.

Research Experience

Where I've Worked

Research Intern — University of Nebraska, Lincoln, USA

Supervisor: Dr. Sruti Das Choudhury

Jun 2025 – Present

Co-authored a human-centered XAI study combining clustering, SHAP-driven interpretability, and narrative visualization on agronomic and healthcare datasets.
Built an interactive visual-analytics pipeline on UNL greenhouse phenotyping data (42 plants, 9 genotypes, 25 days) coupling temporal embeddings, DTW-based clustering, and SHAP/LIME-linked causal views.
Engineered HyperProbe, a lightweight Streamlit-based human-in-the-loop hyperspectral analysis tool (517–1700 nm, 243 bands).
Featured in the university's news story (August 2025).

Research Scholar — University of Calcutta

Supervisors: Dr. Arup Kumar Chattopadhyay, Prof. Amit Kumar Das, Prof. Amlan Chakrabarti

Jan 2025 – Dec 2025

Developed FH-FAM, a fuzzy-hypergraph feature-selection algorithm achieving 81.43% accuracy with 89.28% feature reduction across 15 datasets.
Proposed SIF-HFAM, a strong intuitionistic fuzzy hypergraph framework with greedy (1−1/e)-approximation guarantee, achieving ~78% accuracy while removing ~98% of features.
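
The (1−1/e) guarantee mentioned above is the classic bound for greedy maximization of a monotone submodular function under a cardinality budget. As a rough illustration only, the sketch below runs generic greedy maximum coverage on toy data. It is not the SIF-HFAM implementation: the function name `greedy_max_coverage`, the feature labels, and the feature-to-sample sets are all hypothetical.

```python
def greedy_max_coverage(candidates, budget):
    """Pick up to `budget` candidates, each time taking the one that
    covers the most not-yet-covered items (largest marginal gain).

    candidates: dict mapping a (hypothetical) feature name to the set of
    items it covers. Returns (chosen names in pick order, covered items).
    """
    covered = set()
    chosen = []
    remaining = dict(candidates)  # shallow copy so the input is not mutated
    for _ in range(budget):
        best, best_gain = None, 0
        for name, items in remaining.items():
            gain = len(items - covered)  # marginal coverage gain
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # nothing left adds coverage
            break
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Toy example with made-up features and sample ids.
toy = {"f1": {1, 2, 3}, "f2": {3, 4}, "f3": {4, 5, 6, 7}, "f4": {1}}
chosen, covered = greedy_max_coverage(toy, budget=2)
print(chosen)  # ['f3', 'f1']
```

Here two of the four toy features already cover all seven items, which is the kind of aggressive reduction a coverage objective rewards.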

Student Research Lead — Generative AI CoE, IEM

Nov 2024 – Present

Established and led the Generative AI Center of Excellence (CoE) at IEM, with 15+ active student researchers.
Led end-to-end research execution and operations, mentored 6 active teams, and spearheaded ReelBook (Pearson collaboration).

Project Intern — bair.ai (IEM Research Foundation)

Aug 2024 – Mar 2025

Built MemeMetric, an end-to-end cluster-based cryptocurrency forecasting system with automated reporting and real-time social-media sentiment signals via NLP.

Undergraduate Research Assistant — IEDC (CSE)

Mar 2024 – Aug 2024

Co-authored an IEM-HEALS 2024 paper on pharma stock analysis and built TraderBot, a Flask + MongoDB real-time trading simulator.

Study Abroad — National University of Singapore (NUS)

Jul 2023

Studied AI, IoT, Machine Learning & Data Analytics, with lectures by Dr. Peter Leong, Dr. Eric Cambria, and others.

Publications

Selected Papers

Published / Accepted

Garden Path Recovery in Causal and Masked Language Models

Sanjan Baitalik, Rajashik Datta

ACL Student Research Workshop, 2026

Confidence as a Tie-Breaker: Reassessing Multilingual Hedging Bias in LLM-as-a-Judge Evaluation

Rajashik Datta, Sanjan Baitalik

ACL Student Research Workshop, 2026

ReproPheno and ReproPhenoNet: A Large-Scale Multimodal Benchmark Dataset and Deep Learning Framework for Reproductive-Stage Plant Phenotyping

Sanjan Baitalik, Rajashik Datta, Utsho Banerjee, Rajarshi Karmakar, Vincent Stoerger, Himadri Nath Saha, Sruti Das Choudhury

AAAI AgriAI Workshop, 2026

PlantPhenoLM: Phenotype-Genotype Mapping Inference with Multi-Turn LLM Reasoning and Selective Prediction

Rajashik Datta, Sanjan Baitalik, Amit Kumar Das, Sruti Das Choudhury

AAAI Bridge on Logic & AI, 2026

Conversation as Belief Revision: GreedySAT Revision for Global Logical Consistency in Multi-Turn LLM Dialogues

Sanjan Baitalik, Rajashik Datta, Amit Kumar Das, Sruti Das Choudhury

AAAI Bridge on Logic & AI, 2026

Fuzzy Hypergraph Feature Association Map for High-Dimensional Feature Selection in Agriculture and Remote Sensing

Rajashik Datta, Sanjan Baitalik, Sruti Das Choudhury, Arup Kumar Chattopadhyay, Amit Kumar Das

International Journal of Fuzzy Systems, 2026

MiQ-MCP: Valid and Conditionally Robust Uncertainty Quantification for High-Frequency Financial Time Series via Mondrian Conformalized Quantile Regression

Sanjan Baitalik, Rajashik Datta, Darothi Sarkar, Ayan Chaudhuri, Alongbar Wary

Computational Economics, 2026

Enhancing Interpretability Through Clustering, Explainable AI, and Narrative Visualization

Sruti Das Choudhury, Rajashik Datta, Sanjan Baitalik

Information, 2025

Explanation-First Agentic Forecaster for Stock Market

Sanket Ghosh, Sanjan Baitalik, Rajashik Datta, Romit Mukherjee, Darothi Sarkar, Ayan Chaudhuri

IEMENTech, 2026

Machine Learning-Driven Insights For Stock Market Analysis And Trading

Sanjan Baitalik, Rajashik Datta, Sanket Ghosh, Darothi Sarkar, Ayan Chaudhuri

IRTM, 2024

The COVID-19 Shock: An Analysis Of Impacts And Responses Of Indian Stock Market

Sanket Ghosh, Sanjan Baitalik, Rajashik Datta, Darothi Sarkar

IRTM, 2024

Is Indian Financial Market Ready for Pandemics?

Rajashik Datta, Sanjan Baitalik, Sanket Ghosh, Saugata Ghosh, Swarnendu Ghosh

IEM-HEALS, 2024

Submitted / Under Review

From Graphs to Hypergraphs: Submodular Coverage-Based Feature Selection on Intuitionistic Fuzzy Hypergraphs (SIFHFAM)

Sanjan Baitalik, Rajashik Datta, Arup Kumar Chattopadhyay, Amit Kumar Das, Amlan Chakrabarti

Pattern Recognition, 2026 (under review)

Projects

Research Projects

Geodesic Optimal Transport (Transfer Geometry)

Implemented sliced-Wasserstein OT diagnostics on frozen ResNet-18 features; benchmarked across 48 transfer settings (CIFAR-10/STL-10/SVHN).
The diagnostics correlated strongly with zero-shot transfer (Pearson r ≈ −0.71) and low-data adaptation (Spearman ρ ≈ 0.60 at 200-shot).
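
For readers unfamiliar with the sliced-Wasserstein distance, here is a minimal standard-library sketch of the usual Monte Carlo estimator: project both point sets onto random unit directions, solve each 1-D transport problem by sorting, and average. It is an illustration under simplifying assumptions (equal sample counts, plain Python lists standing in for feature vectors), not the project's code. The function name and parameters are hypothetical.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=64, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between
    two equal-size point sets given as lists of equal-length vectors."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    dim = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        # Random direction on the unit sphere (normalized Gaussian draw).
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
        # Project each set to 1-D; for equal-size samples the optimal
        # 1-D coupling simply matches points in sorted order.
        px = sorted(sum(a * d for a, d in zip(x, direction)) for x in xs)
        py = sorted(sum(b * d for b, d in zip(y, direction)) for y in ys)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_projections) ** (1.0 / p)


# Toy 2-D "features": a point set and a shifted copy of it.
source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
target = [[x + 3.0, y + 3.0] for x, y in source]
print(sliced_wasserstein(source, source))  # 0.0
print(sliced_wasserstein(source, target) > 0.0)  # True
```

In practice one would apply an estimator like this to batches of frozen-backbone features from the source and target datasets, typically with a vectorized NumPy or PyTorch version.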

Grokking + LoRA (Low-Rank Tax)

Controlled modular-addition experiments (p=97) comparing full-parameter vs. LoRA-on-frozen-base training for 15k epochs.
Reproduced classic grokking (99% train → 99% val) and quantified rank/LR thresholds.

Volatility-Scaled AdaGrad (Online Learning)

Implemented VS-AdaGrad, a CPU-efficient drift-aware online optimizer scaling AdaGrad using discounted residual volatility.
Reduced regret proxy vs. AdaGrad by 18.4% (small drift) and 19.8% (medium drift); outperformed tuned OGD by 23.7–63.8%.

Skills

Technical Toolkit

Programming

Python · Java · C · MATLAB

ML / AI

PyTorch · TensorFlow · Scikit-learn · Transformers

XAI

SHAP · LIME

Data

Pandas · NumPy

Tools

LaTeX · Git · Docker · Jupyter · TensorBoard

Cloud

Google Cloud · AWS (S3/EC2)

Contact

Get in Touch

I'm always open to research collaborations and discussions. Feel free to reach out!

rayan.baitalik@gmail.com