Type 2 diabetes has a higher prevalence and a higher diabetes-related mortality rate in rural communities than in urban areas. Agentic Artificial Intelligence (AI), which refers to systems capable of autonomous reasoning, task planning, and adaptive behavior within a defined context, may help address this clinical issue. Such a system uses a Large Language Model (LLM), a model designed to interpret human text and generate an understandable response. Retrieval-Augmented Generation (RAG) further enhances this framework by retrieving relevant data from a knowledge base and using it to ground the generated response. However, with current AI pipelines, it is challenging to evaluate every step that leads to an outcome. The objective of this project is to develop a preliminary agentic AI system that makes its predictions transparent, thereby increasing user trust and reducing knowledge drift. In our research, we trained eight different models integrating machine learning (ML), SHapley Additive exPlanations (SHAP) for feature attribution, LLM variants, and a RAG pipeline under varying conditions. While the ML models provide good accuracy, we are exploring rule-based methods that adapt to the dynamic nature of the underlying documents and to varying treatment guidelines, and thereby respond to patient needs.
Large language models (LLMs) show promise in clinical decision support but are limited by hallucinations and poor explainability. This project investigates how different retrieval-augmented generation (RAG) architectures can improve the accuracy, transparency, and clinical reliability of diabetes-related responses, with the ultimate goal of developing a deployable clinical model for diabetic care. We implement and compare standard RAG and graph-based RAG systems that integrate the Medical Information Mart for Intensive Care III (MIMIC-III) database with a locally hosted Ollama LLM. Retrieved clinical records and structured relationships are used to ground model outputs in real patient data. The system is evaluated using Arize Phoenix to trace retrieval pathways, visualize evidence chains, quantify hallucination rates, and monitor response accuracy. By grounding responses in verifiable clinical data and enabling transparent reasoning traces, this work contributes to the development of safer and more explainable artificial intelligence systems for healthcare applications. This project will lay the foundation for an agentic AI-based “Diabetes Coach,” a conversational system aimed at supporting adult patients (aged 18 and older) diagnosed with type 2 diabetes.
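The retrieval-and-grounding step described above can be illustrated with a minimal sketch. It is not the project's pipeline: the records are made-up placeholders rather than MIMIC-III data, the bag-of-words cosine score stands in for whatever retriever the authors use, and the function names (`retrieve`, `build_prompt`) are hypothetical. The call to the LLM itself is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, docs, k=2):
    # Rank records by similarity to the query; keep the top k with a nonzero score.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id)
              for doc_id, text in docs.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(doc_id, score) for score, doc_id in scored[:k] if score > 0]


def build_prompt(query, docs, hits):
    # Put the retrieved records, tagged with their ids, in front of the question
    # so that an answer can be traced back to its evidence.
    context = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id, _ in hits)
    return ("Answer using only the context below and cite record ids.\n"
            f"Context:\n{context}\n"
            f"Question: {query}")


# Placeholder records (assumption: synthetic text, not real patient data).
docs = {
    "note-1": "Patient with type 2 diabetes admitted with elevated glucose",
    "note-2": "Patient admitted after hip fracture following a fall",
    "note-3": "Insulin dose adjusted for diabetes patient with elevated glucose",
}

query = "elevated glucose in diabetes patient"
hits = retrieve(query, docs, k=2)
prompt = build_prompt(query, docs, hits)
print(prompt)
```

Because each retrieved record keeps its id in the prompt, a tracing tool can log which records were supplied for a given answer, which is the kind of evidence chain the abstract refers to.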
Artificial Intelligence (AI) agents are transforming healthcare by automating tasks and improving diagnostic precision. Our project focuses on developing an AI-based system specifically to detect extravascular extension of inferior vena cava (IVC) filter struts on CT scans. Although IVC filters are intended to be temporary, prolonged dwell time increases the likelihood of strut penetration beyond the IVC wall. Extravascular extension, defined as filter struts penetrating beyond the IVC wall into surrounding structures, increases the risk of organ injury, pain, bleeding, and complex retrieval. Interventional radiology (IR) practices often rely on manual tracking systems, which are insufficient when patients transfer care or are lost to follow-up. Many patients are unaware a filter remains in place, and new providers may not recognize associated complications. Building on prior research with Mayo Clinic Health System, we aim to enhance an existing deep learning framework to localize filter struts and quantitatively assess their extension relative to the IVC boundary. After segmentation of the IVC, our model will localize filter struts relative to the vessel wall to improve complication detection. The system will also incorporate large language models (LLMs) to process electronic health records (EHRs) and support automated follow-up flagging for safer long-term patient management.
Clinical imaging datasets for pancreatic cancer analysis increasingly aggregate scans collected under heterogeneous workflows and annotation strategies. Deep learning models for medical image segmentation are typically evaluated using overlap metrics such as Dice scores, a practice that implicitly assumes the training data are drawn from a homogeneous distribution. While state-of-the-art segmentation frameworks such as nnU-Net achieve strong benchmark performance, little is known about how data provenance influences the anatomical representations learned by these models. Understanding these effects is critical for interpretability, robustness, and safe deployment in clinical settings. This project investigates whether pancreas CT segmentation models trained on different data sources learn systematically different anatomical priors, even when standard accuracy metrics are similar. To evaluate these effects, we train multiple source-specific nnU-Net models on curated subsets of the PANORAMA pancreas dataset that reflect distinct data collection strategies. We will compare outputs using Dice scores and anatomical descriptors such as predicted volume, connected components, centroid location, and spatial extent, together with voxel-wise inter-model disagreement maps. Ongoing analysis aims to quantify these differences and to demonstrate disagreement mapping as a computationally efficient proxy for anatomical uncertainty.
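As a rough illustration of the comparison described above, the sketch below computes a Dice score, two simple anatomical descriptors (volume and centroid), and a voxel-wise disagreement set for two binary masks. It assumes each mask is represented as a set of (z, y, x) foreground coordinates; the toy box-shaped masks stand in for model predictions, and all function names are hypothetical rather than part of nnU-Net or the authors' code.

```python
from collections import Counter


def box(z0, z1, y0, y1, x0, x1):
    # Toy mask: all voxels in a half-open box, as a set of (z, y, x) tuples.
    return {(z, y, x)
            for z in range(z0, z1)
            for y in range(y0, y1)
            for x in range(x0, x1)}


def dice(a, b):
    # Dice = 2|A ∩ B| / (|A| + |B|); two empty masks count as perfect agreement.
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def centroid(mask):
    # Mean foreground coordinate along each axis.
    n = len(mask)
    return tuple(sum(v[i] for v in mask) / n for i in range(3))


def disagreement_voxels(masks):
    # Voxels labeled foreground by some, but not all, of the models.
    votes = Counter(v for m in masks for v in m)
    return {v for v, c in votes.items() if c < len(masks)}


# Two hypothetical predictions: same shape, shifted by one voxel along x.
pred_a = box(0, 2, 0, 4, 0, 4)
pred_b = box(0, 2, 0, 4, 1, 5)

score = dice(pred_a, pred_b)
disagree = disagreement_voxels([pred_a, pred_b])

print("Dice:", score)
print("Volumes:", len(pred_a), len(pred_b))
print("Centroids:", centroid(pred_a), centroid(pred_b))
print("Disagreement voxels:", len(disagree))
```

In this toy case the two masks have identical volumes but different centroids, which is the kind of difference a single overlap score does not separate from other error types. Real predictions would normally be stored as arrays; sets are used here only to keep the sketch dependency-free.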
E-textiles are textiles that integrate components such as sensors and power sources directly into the fabric to enable the detection and transmission of data. In education, e-textiles can be used to teach electronics and coding concepts through hands-on demonstrations. This study tested the effectiveness of e-textile workshops for teaching these skills at the undergraduate level. We conducted an interactive workshop with 12 undergraduate students, designed to teach basic circuit design and sewing skills and to increase interest in these topics. Participants were led through an activity in which they used conductive thread and mechanical components to turn a regular fabric glove into an e-textile. After the activity, students reported being interested in e-textiles and received high scores on a circuit design and sewing knowledge quiz. We found a statistically significant increase in several measures, including participants’ self-reported knowledge and enjoyment of circuit design and their enjoyment of sewing. Using the data collected from our study, we plan to design a teaching module that could be deployed and further evaluated in a classroom or extracurricular setting to teach introductory electronics skills at the undergraduate level.
Optimization algorithms such as gradient descent serve as the engine of machine learning, iteratively adjusting model weights to minimize prediction error. While mathematical theory provides rigorous upper bounds on how quickly these algorithms should converge, implementations on real-world datasets often encounter numerical hurdles that the theory ignores. We investigate this divergence by comparing the empirical performance of a logistic regression model, trained on a Patient Survival dataset, against its theoretical convergence guarantees. We focus on the learning rate as the primary variable influencing stability and efficiency. Two quantities are monitored: convergence of the loss function and the geometric movement of the weights through the search space. Overlaying theoretical convergence curves on the observed data reveals where algorithmic behavior drifts from predicted outcomes. Our empirical results indicate that as the learning rate approaches a critical threshold, the model experiences oscillations that violate the smooth convergence guaranteed by most convex optimization proofs. We present a rigorous comparison of how mathematical ideals hold up under varying hyperparameters, offering a framework for selecting settings that balance computational efficiency with mathematical reliability, a critical factor in domains like healthcare, cybersecurity, and fraud detection. Future work will increase research depth by incorporating additional predictors (categorical and non-categorical) for training and assessing a high-dimensional model.
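The role of the learning rate can be made concrete with a small sketch, given here under stated assumptions rather than as the study's actual setup. It runs plain gradient descent on the mean logistic loss for synthetic data (not the Patient Survival dataset) and uses the standard smoothness bound L ≤ (1/4n) Σ‖xᵢ‖² to pick a "safe" step size of 1/L. For that step size the descent lemma guarantees the loss never increases; the second run uses an arbitrarily chosen step 50 times larger, outside the range the guarantee covers. All function names and constants are illustrative.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    # Stable evaluation of log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def loss_and_grad(w, X, y):
    # Mean logistic loss for labels in {0, 1}, and its gradient.
    n, d = len(X), len(w)
    loss, grad = 0.0, [0.0] * d
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        loss += softplus(z) - yi * z
        r = sigmoid(z) - yi
        for j in range(d):
            grad[j] += r * xi[j]
    return loss / n, [g / n for g in grad]


def gradient_descent(X, y, lr, steps):
    # Fixed-step gradient descent from w = 0; records the loss at every iterate.
    w = [0.0] * len(X[0])
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(w, X, y)
        losses.append(loss)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w, losses


def smoothness_upper_bound(X):
    # Upper bound on the gradient's Lipschitz constant: (1 / 4n) * sum ||x_i||^2.
    return sum(sum(v * v for v in xi) for xi in X) / (4.0 * len(X))


def make_data(n=200, seed=0):
    # Synthetic, noisy two-feature data with an intercept column (assumption).
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        p = sigmoid(1.5 * x1 - 1.0 * x2)
        X.append([1.0, x1, x2])
        y.append(1.0 if rng.random() < p else 0.0)
    return X, y


def count_increases(losses, tol=1e-12):
    # Number of iterations at which the loss went up.
    return sum(1 for a, b in zip(losses, losses[1:]) if b > a + tol)


X, y = make_data()
L_bound = smoothness_upper_bound(X)
_, safe_losses = gradient_descent(X, y, lr=1.0 / L_bound, steps=200)
_, big_losses = gradient_descent(X, y, lr=50.0 / L_bound, steps=200)

print("loss increases at lr = 1/L: ", count_increases(safe_losses))
print("loss increases at lr = 50/L:", count_increases(big_losses))
```

The first count should be zero, since that is what the theory guarantees for this step size. Whether and how strongly the second run oscillates depends on the data and the chosen multiplier, so it is printed rather than assumed; plotting both loss curves against a theoretical bound is one way to visualize the gap the abstract describes.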