Research Projects

FAIROS: Disciplinary Improvements: Dark Matter Data Commons - A FAIR and Open Science Infrastructure for Astrophysical Discovery
Source of Support:
National Science Foundation (NSF)
Project Period:
October 1, 2025 - September 30, 2028
Award Number:
2531754
Location of Project:
University of Tennessee, Knoxville
Description:
NEXUS-DM is a next-generation Dark Matter Data Commons designed to make experimental data more transparent, reproducible, and accessible to researchers everywhere. Built on FAIR principles, it provides an open platform for storing, curating, and analyzing dark matter experiment data. The commons offers easy-to-use command-line and Python tools, along with AI and machine learning workflows that help reduce bias, remove noise, and improve calibration for reliable, explainable results. Through open tutorials, Jupyter notebooks, and an ACCESS Affinity Group, NEXUS-DM builds a vibrant community that shares knowledge, fosters collaboration, and drives open science and discovery in one of physics’ most intriguing frontiers.
Web Page:
NEXUS-DM
SAFARI: Scientific Analytics, Forensics, and Reproducibility for Workflows in Cyberinfrastructure (CI)
Source of Support:
National Science Foundation (NSF)
Project Period:
October 1, 2025 - September 30, 2028
Award Number:
2530461
Location of Project:
University of Tennessee, Knoxville
Description:
SAFARI brings forensic insight to scientific computing by embedding data analytics directly into workflow systems. This integration ensures that the data and software scientists use are reliable, reusable, and reproducible. By combining provenance tracking, automated verification, and modular artifact management within the Pegasus Workflow Management System, SAFARI makes complex, AI-driven analyses transparent and trustworthy. Through applications such as soil moisture modeling, irrigation forecasting, and wildfire prevention, SAFARI advances secure, scalable cyberinfrastructure that supports national priorities in AI and data-driven science.
Web Page:
SAFARI
POSE: Phase I: Towards an Open-Source Ecosystem for Accelerating High-Resolution Terrain Parameter Computation in Earth Science Applications
Source of Support:
National Science Foundation (NSF)
Project Period:
June 15, 2025 - May 31, 2026
Award Number:
2449103
Location of Project:
University of Tennessee, Knoxville
Description:
OS-EARTH builds a sustainable, open-source software ecosystem for analyzing the shape and structure of Earth’s terrain. Centered on the GEOtiled platform, it enables fast, accurate, and scalable computation of terrain parameters—critical data for understanding wildfire behavior, soil moisture, and agricultural planning. By making high-performance geospatial tools openly available, OS-EARTH lowers technical barriers and empowers researchers, educators, and decision-makers to turn elevation data into actionable insights. The project cultivates an inclusive community of developers and scientists through transparent governance, training workshops, and open documentation, fostering collaboration and long-term sustainability. Through this open-science ecosystem, OS-EARTH strengthens U.S. leadership in geospatial analytics, environmental modeling, and data-driven decision-making.
Web Page:
OS-EARTH
CSSI: Frameworks: Applying Artificial Intelligence Advances to the Next Generation of Workflow Management on Modern Cyberinfrastructure
Source of Support:
National Science Foundation (NSF)
Project Period:
June 15, 2025 - May 30, 2030
Award Number:
2513101
Location of Project:
University of Tennessee, Knoxville
Description:
PegasusAI is a next-generation, open-source AI-driven workflow management framework that empowers researchers and engineers to harness the full computing continuum—from edge devices to clouds and supercomputers. By embedding artificial intelligence throughout the workflow lifecycle—covering composition, smart scheduling, and real-time adaptation—PegasusAI enables flexible, scalable, and efficient scientific discovery. Built to be extensible, community-driven, and deployable on national cyberinfrastructure platforms, this framework accelerates innovation across disciplines while lowering the barrier to using advanced computational resources.
Web Page:
Pegasus AI
Methodology for Explaining Performance Variations Across Compilers and Compiler Options in HPC Applications
Source of Support:
Lawrence Livermore National Laboratory
Project Period:
January 2, 2025 - December 31, 2025
Location of Project:
University of Tennessee, Knoxville
Description:
Performance-portable programming models like RAJA allow scientists to write applications that run efficiently on a wide range of computer architectures. Yet, achieving consistent performance often depends on compiler decisions that are difficult to observe and explain. This project introduces an explainable performance analysis framework that links compiler behavior to runtime results, helping developers make informed optimization choices. Using Caliper for detailed data collection and Thicket for multi-dimensional analysis, the study applies an iterative methodology across RAJAPerf kernels, open-source benchmarks, and LLNL’s MARBL simulation to advance understanding and efficiency in high-performance computing.
Study Performance Portability of the Vector Particle-In-Cell Project (VPIC) across Architectures (Stage 3)
Source of Support:
Los Alamos National Laboratory
Project Period:
October 17, 2024 - September 1, 2026
Location of Project:
University of Tennessee, Knoxville
Description:
VPIC (Vector Particle-In-Cell) is a high-performance plasma simulation code optimized for the world’s largest supercomputers. Using the Kokkos performance-portability framework, this project enhances VPIC’s efficiency across diverse architectures while preserving portability. Through vectorization, algorithmic refinement, and mixed-precision strategies, it advances scalable plasma simulations for next-generation computing systems.
LLNL-LDRD-Software Stack Development for Next Generation Exascale Platforms
Source of Support:
Lawrence Livermore National Laboratory
Project Period:
March 26, 2024 - September 30, 2026
Location of Project:
University of Tennessee, Knoxville
Description:
Developing a robust, high-performance software stack is key to powering the next generation of exascale supercomputers. The LLNL-LDRD Software Stack Development for Next-Generation Exascale Platforms project builds portable libraries, runtimes, and tools that enable scientific applications to scale efficiently across diverse architectures—from CPUs and GPUs to emerging accelerators. Emphasizing performance portability, modularity, and sustainability, the work advances national computational capabilities and empowers researchers to achieve faster, more reliable scientific discovery on future exascale systems.
Improvement of Checkpointing Performance for Reproducibility Studies
Source of Support:
Argonne National Laboratory (ANL)
Project Period:
February 26, 2024 - December 31, 2025
Location of Project:
University of Tennessee, Knoxville
Description:
Enhancing the reproducibility of high-performance computing (HPC) applications is essential as computational resources become increasingly diverse and heterogeneous. This project leverages intermediate checkpoints and hash-based validation with user-defined error bounds to detect divergences early in execution. By organizing checkpoint data through Merkle trees, it efficiently captures significant differences while reducing I/O overhead, enabling reliable and transparent large-scale scientific computing.
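As a rough illustration of the idea described above, the following Python sketch compares two checkpoints through Merkle trees built over error-bounded hashes. It is not the project's implementation: the function names, the floor-based quantization, the chunk size, and the use of SHA-256 are all assumptions chosen for brevity.

```python
import hashlib
import math


def quantize(value, error_bound):
    # Hypothetical rounding scheme: values in the same bucket of width
    # error_bound hash identically. Differences larger than error_bound
    # always land in different buckets; smaller differences may still be
    # flagged when they straddle a bucket boundary.
    return math.floor(value / error_bound)


def leaf_hashes(data, chunk_size, error_bound):
    # Hash each fixed-size chunk of the checkpoint after quantization.
    hashes = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        payload = ",".join(str(quantize(v, error_bound)) for v in chunk)
        hashes.append(hashlib.sha256(payload.encode()).hexdigest())
    return hashes


def build_tree(leaves):
    # Return the tree as a list of levels: leaves first, root last.
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        parents = []
        for i in range(0, len(prev), 2):
            left = prev[i]
            right = prev[i + 1] if i + 1 < len(prev) else left
            parents.append(hashlib.sha256((left + right).encode()).hexdigest())
        levels.append(parents)
    return levels


def differing_chunks(tree_a, tree_b):
    # Walk from the root down, descending only into subtrees whose hashes
    # differ. Assumes both checkpoints have the same number of chunks.
    top = len(tree_a) - 1
    if tree_a[top][0] == tree_b[top][0]:
        return []
    frontier = [0]
    for level in range(top, 0, -1):
        below_a, below_b = tree_a[level - 1], tree_b[level - 1]
        next_frontier = []
        for idx in frontier:
            for child in (2 * idx, 2 * idx + 1):
                if child < len(below_a) and below_a[child] != below_b[child]:
                    next_frontier.append(child)
        frontier = next_frontier
    return frontier


# Two runs of the same (made-up) application state.
run_a = [1.25, 2.25, 3.25, 4.25, 5.25, 6.25, 7.25, 8.25]
run_b = [1.30, 2.25, 3.25, 4.25, 5.25, 9.25, 7.25, 8.25]

tree_a = build_tree(leaf_hashes(run_a, chunk_size=2, error_bound=0.5))
tree_b = build_tree(leaf_hashes(run_b, chunk_size=2, error_bound=0.5))
print(differing_chunks(tree_a, tree_b))  # indices of chunks that diverged
```

In this sketch, only the root hashes need to be compared when two runs agree, and only the differing subtrees are visited when they do not, which is the property that keeps comparison cost low for large checkpoints.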
SENSORY: Software Ecosystem for Knowledge Discovery - a Data-Driven Framework for Soil Moisture Applications
EAGER: A Comprehensive Approach for Generating, Sharing, Searching, and Using High-Resolution Terrain Parameters
CIF21 DIBBs: PD: Cyberinfrastructure Tools for Precision Agriculture in the 21st Century
Source of Support:
National Science Foundation (NSF)
Project Period:
Jun 01, 2021 - May 30, 2026
Oct 01, 2023 - Sep 30, 2026
Jul 01, 2017 - Oct 2018
Award Number:
2103845, 2334945, 1724843
Location of Project:
University of Tennessee, Knoxville
Description:
SENSORY brings together advances in large-scale data generation and cloud cyberinfrastructure to build a unified, data-driven software ecosystem for the environmental sciences. From fine-grained soil sensor networks to global satellite measurements, the platform enables seamless analysis, visualization, and knowledge extraction from diverse data collections. By bridging multidisciplinary communities and delivering actionable insights to real-world applications, SENSORY empowers researchers, practitioners, and decision-makers to tackle environmental challenges with scalable, transparent tools.
Web Page:
SENSORY
Collaborative Research: SHF: Small: Model-driven Design and Optimization of Dataflows for Scientific Applications
Source of Support:
National Science Foundation (NSF)
Project Period:
October 1, 2023 - September 30, 2026
Award Number:
2331152
Location of Project:
University of Tennessee, Knoxville
Description:
A domain-spanning taxonomy maps common data-flow motifs—from simple producer–consumer patterns to complex multi-producer/multi-consumer pipelines—onto real scientific applications, enabling a deeper understanding of data movement across disciplines. This project builds a middleware layer that orchestrates these pipelines across HPC, cloud, and edge platforms using a two-step approach to reduce data loss and minimize inefficiencies in data production and consumption.
Web Page:
RobustScience
SHF: Small: Methods, Workflows, and Data Commons for Reducing Training Costs in Neural Architecture Search on High-Performance Computing Platforms
Source of Support:
National Science Foundation (NSF)
Project Period:
Oct 01, 2022 - Sep 30, 2026
Award Number:
2223704
Location of Project:
University of Tennessee, Knoxville
Description:
Analytics for Neural Networks (A4NN) reduces the computational cost of training neural networks while ensuring their explainability, reproducibility, and near-optimal performance. The project introduces a flexible fitness-prediction method that uses parametric modeling to forecast a network’s future performance, allowing unpromising training runs to be stopped early. By combining analytics and machine learning, A4NN accelerates model discovery, optimizes high-performance computing resources, and promotes sustainable, data-driven AI research.
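The early-stopping idea can be sketched in a few lines of Python. This is a simplified illustration, not A4NN's actual predictor: the curve form acc(e) = a - b / e, the function names, and the target threshold are assumptions made so that the fit reduces to ordinary least squares.

```python
def fit_inverse_curve(epochs, accuracies):
    # Least-squares fit of the assumed form acc(e) = a - b / e.
    # Substituting x = 1 / e makes the model linear in x.
    # Requires at least two distinct epochs.
    xs = [1.0 / e for e in epochs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    slope = sxy / sxx  # slope equals -b
    a = mean_y - slope * mean_x
    return a, -slope


def predict_accuracy(a, b, epoch):
    # Forecast accuracy at a future epoch from the fitted parameters.
    return a - b / epoch


def should_stop_early(epochs, accuracies, final_epoch, target):
    # Stop training when the forecast for the final epoch misses the target.
    a, b = fit_inverse_curve(epochs, accuracies)
    return predict_accuracy(a, b, final_epoch) < target


# Made-up accuracy history for the first five epochs of one candidate network.
history_epochs = [1, 2, 3, 4, 5]
history_acc = [0.9 - 0.5 / e for e in history_epochs]

print(should_stop_early(history_epochs, history_acc, final_epoch=50, target=0.95))
```

A run whose forecast falls short of the target can be terminated after a handful of epochs, freeing compute for more promising candidates in the search.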
Web Page:
A4NN
OAC: Piloting the National Science Data Fabric: A Platform Agnostic Testbed for Democratizing Data Delivery
Source of Support:
National Science Foundation (NSF)
Project Period:
Oct 01, 2021 - Mar 31, 2026
Award Number:
2138811
Location of Project:
University of Tennessee, Knoxville
Description:
The National Science Data Fabric (NSDF) is a nationwide testbed advancing the democratization of data-driven science through an equitable, federated cyberinfrastructure platform. By seamlessly connecting storage, compute, and networking resources with an integrated software stack, NSDF provides researchers with scalable, easy-to-use tools for data access, analysis, and sharing. A strong focus on education, outreach, and community engagement ensures inclusive participation across institutions, including a broader pool of universities across the United States, building a sustainable and open ecosystem for collaborative scientific discovery.
Web Page:
NSDF
AI TechX
Source of Support:
University of Tennessee’s AI TechX Initiative
Project Period:
Jul 1, 2025 - Jun 30, 2026
Location of Project:
University of Tennessee, Knoxville
Description:
Advancing AI inference performance through hardware–software co-design, this collaboration unites IBM’s expertise in accelerator technologies and AI infrastructure with UTK’s leadership in domain-specific AI applications. The project develops a suite of mini-applications that serve as realistic testbeds for benchmarking, optimization, and evaluation across diverse accelerator architectures. By bridging hardware innovation with application-driven design, it drives breakthroughs in efficiency, scalability, and AI system performance for the broader research community.
ANACIN-X: Analysis and Modeling of Non-determinism and Associated Costs in eXtreme Scale Applications
Source of Support:
National Science Foundation (NSF): CCF
Project Period:
Aug 1, 2019 - Jul 31, 2025
Award Number:
1900888
Location of Project:
University of Tennessee, Knoxville
Description:
ANACIN-X investigates nondeterminism in MPI-based high-performance computing (HPC) applications, where even runs with identical inputs on the same machine can follow different execution paths, trigger intermittent bugs, or produce divergent results. By analyzing and modeling these sources of variability, the project quantifies the recording overheads of Record-and-Replay (R&R) tools and develops new strategies to scale them for the exascale era. Through advanced event-graph analysis and trace mining, ANACIN-X advances the reliability, reproducibility, and debuggability of next-generation scientific applications.
Web Page:
ANACIN-X
Data-Aware Scheduling with the Convergence of HPC and Cloud
Source of Support:
Lawrence Livermore National Laboratory
Project Period:
Aug 1, 2023 - Dec 31, 2024
Location of Project:
University of Tennessee, Knoxville
Description:
Data-Aware Scheduling with the Convergence of HPC and Cloud develops intelligent scheduling strategies that optimize data locality and performance across complex scientific workflows. Building on LLNL’s DYAD framework, which facilitates data sharing between producer and consumer tasks within the Flux workload management system, the project extends DYAD with runtime performance tracking and data-aware scheduling policies. By integrating data-movement strategies for converged HPC–Cloud environments, this effort enhances workflow efficiency and enables scalable, transparent data sharing across hybrid computing platforms.
Leveraging Kokkos Abstractions to Automate Checkpointing
Source of Support:
Argonne National Laboratory (ANL)
Project Period:
May 03, 2021 - Dec 31, 2023
Location of Project:
University of Tennessee, Knoxville
Description:
Leveraging Kokkos Abstractions to Automate Checkpointing explores how memory and execution patterns in performance-portable applications can be automatically captured and preserved. By combining Kokkos abstractions with the VELOC checkpointing framework, the project develops efficient methods to ensure data persistence and recovery across diverse hardware platforms. This integration advances fault tolerance and accelerates reproducibility for next-generation high-performance computing applications.
Leverage Containerized Environments for Reproducibility and Traceability of Scientific Workflows - the case study of Analytics for Neural Network Workflows
Source of Support:
Sandia National Laboratories
Project Period:
Jul 15, 2020 – Jul 14, 2022
Location of Project:
University of Tennessee, Knoxville
Description:
Leverage Containerized Environments for Reproducibility and Traceability of Scientific Workflows develops a prototype framework that uses container technologies to improve the transparency and reliability of scientific workflows. By encapsulating each workflow component (data, software, and execution context) within an individual container environment, the project enables automatic metadata collection, clear record trails, and strong links between data and metadata. This approach simplifies the reproduction and verification of results across computing platforms, strengthening trust and traceability in computational science.
Flux Scheduler Specializations: Improving Workflow Performance with Scheduler Structure and Policy Tuning
Source of Support:
Lawrence Livermore National Laboratory
Project Period:
Apr 1, 2020 – Mar 31, 2022
Location of Project:
University of Tennessee, Knoxville
Description:
Flux Scheduler Specializations: Improving Workflow Performance with Scheduler Structure and Policy Tuning investigates how the Flux workload manager can be optimized to enhance workflow performance on large-scale systems. By modeling and tuning scheduler configurations and policies, the project identifies strategies that maintain efficiency even under system stress, such as fragmentation or resource contention. This work strengthens the adaptability and scalability of scientific workflows running on next-generation high-performance computing platforms.
Augmenting Hatchet to support scalability and replicability solutions for HPC applications
Source of Support:
Lawrence Livermore National Laboratory
Project Period:
Aug 1, 2020 – Jul 31, 2023
Location of Project:
University of Tennessee, Knoxville
Description:
Augmenting Hatchet to Support Scalability and Replicability Solutions for HPC Applications enhances the Hatchet performance analysis framework to diagnose and address performance bottlenecks in large-scale scientific applications. Leveraging Hatchet’s powerful query language and analysis tools, the project investigates scalability and replicability challenges in workloads of interest to Lawrence Livermore National Laboratory (LLNL). The resulting tools enable researchers to identify root causes of inefficiencies and improve the reliability and performance of next-generation high-performance computing systems.
Collaborative Research: PPoSS: Planning: Performance Scalability, Trust, and Reproducibility: A Community Roadmap to Robust Science in High-throughput Applications
Source of Support:
National Science Foundation (NSF)
Project Period:
Oct 1, 2020 – Sep 30, 2022
Award Number:
2028923
Location of Project:
University of Tennessee, Knoxville
Description:
Collaborative Research: PPoSS—Performance, Scalability, Trust, and Reproducibility: A Community Roadmap to Robust Science in High-Throughput Applications brings together a cross-disciplinary community to chart a path toward more reliable and reproducible computational science. Through a series of interactive virtual world cafés, participants collaborate to identify challenges and define actionable strategies for achieving robust, high-throughput, and trustworthy scientific workflows. This effort lays the groundwork for a national roadmap to strengthen the performance, scalability, and integrity of data-driven research.
Web Page:
RobustScience
Collaborative Research: EAGER: Advancing Reproducibility in Multi-Messenger Astrophysics
Source of Support:
National Science Foundation (NSF)
Project Period:
Aug 1, 2020 – Aug 31, 2022
Award Number:
2041977
Location of Project:
University of Tennessee, Knoxville
Description:
Collaborative Research: EAGER—Advancing Reproducibility in Multi-Messenger Astrophysics strengthens the foundation of open and reproducible science in one of the most data-intensive frontiers of modern research. By analyzing the reproducibility processes behind landmark discoveries such as LIGO’s gravitational-wave detections and the Event Horizon Telescope’s first black hole image, the project develops best practices, documentation standards, and data-sharing methods for the astrophysics community. These efforts lay the groundwork for a sustainable roadmap toward transparent, verifiable, and collaborative discovery across multi-messenger astronomy.
Study Performance Portability of the Vector Particle-In-Cell Project (VPIC) across architectures
Source of Support:
Los Alamos National Laboratory
Project Period:
May 18, 2020 - July 31, 2024
Location of Project:
University of Tennessee, Knoxville
Description:
Study of Performance Portability of the Vector Particle-In-Cell (VPIC) Code across Architectures examines how plasma simulation performance can be maintained and optimized as computing architectures evolve. The project evaluates how VPIC—a large-scale, high-performance plasma simulation code—adapts to new hardware by analyzing the trade-offs introduced by performance-portability frameworks. Through systematic benchmarking and analysis, the study advances understanding of how to achieve efficient, scalable, and portable performance on next-generation supercomputers.
JDRD: Empowering Training and Validation Stages in AI-Orchestrated Workflows
Source of Support:
Science Alliance - University of Tennessee, Knoxville
Project Period:
Oct 1, 2019 – Sep 30, 2021
Location of Project:
University of Tennessee, Knoxville
Description:
JDRD: Empowering Training and Validation Stages in AI-Orchestrated Workflows advances the design of AI-driven scientific workflows that integrate experimental, computational, and data processing steps across domains. Focusing on the training and deployment of neural networks, the project explores how models trained on clean, simulated data can be adapted to perform reliably on real-world, noisy, and adversarial datasets. By developing and integrating mitigation strategies, the effort strengthens the robustness, trustworthiness, and scientific utility of AI-orchestrated workflows.
EAGER: Reproducibility in Computational and Data-Enabled Science-Paradigms, Practices, and Infrastructure
Source of Support:
National Science Foundation (NSF)
Project Period:
Aug 16, 2019 – Aug 15, 2022
Award Number:
1941443
Location of Project:
University of Tennessee, Knoxville
Description:
EAGER: Reproducibility in Computational and Data-Enabled Science—Paradigms, Practices, and Infrastructure strengthens understanding of how the scientific community can ensure trustworthy and repeatable results in an era defined by large-scale computing and data. Building on the 2019 National Academies report on Reproducibility and Replication in Science, the project examines how its recommendations translate into practical frameworks, tools, and cultural practices for computational and data-driven research. By connecting reproducibility principles to real-world scientific workflows, the effort advances transparent, verifiable, and sustainable discovery across disciplines.
Analytics for Molecular Dynamics (A4MD)
Source of Support:
National Science Foundation (NSF): IIS and Advanced Cyberinfrastructure (OAC)
Project Period:
Jun 1, 2018 – Sep 30, 2023
Award Number:
1841758
Location of Project:
University of Tennessee, Knoxville
Description:
Analytics for Molecular Dynamics (A4MD) addresses the growing data analysis challenges of large-scale molecular dynamics (MD) simulations running on next-generation supercomputers. By integrating machine learning, data analytics, and workflow management with high-performance computing (HPC), A4MD enables the real-time analysis of MD data as it is produced. This interdisciplinary effort accelerates scientific insight, making molecular simulation more scalable, automated, and data-driven for diverse research communities.
Web Page:
Analytics4MD
CIF21 DIBBs: PD: Cyberinfrastructure Tools for Precision Agriculture in the 21st Century
Source of Support:
National Science Foundation (NSF): Advanced Cyberinfrastructure (OAC)
Project Period:
Jul 1, 2017 – Oct 2018
Award Number:
1724843
Location of Project:
University of Tennessee, Knoxville
Description:
CIF21 DIBBs: Cyberinfrastructure Tools for Precision Agriculture in the 21st Century advances data-driven agriculture through the development of SOMOSPIE—a cyberinfrastructure platform that integrates computer science and environmental data to support precision farming. By combining large-scale datasets on soils, landscapes, climate, and ecosystems with advanced computational and ecoinformatics tools, the project enables real-time analysis and informed decision-making for sustainable agricultural practices. This interdisciplinary effort bridges cyberinfrastructure and environmental science, empowering researchers and farmers to better understand and manage complex agroecosystems.
Web Page:
SOMOSPIE
Study of Data-intensive Workflows on Next-generation Systems with Emphasis on Memory Access
Source of Support:
Sandia National Laboratories
Project Period:
Aug 1, 2019 – Jul 31, 2020
Location of Project:
University of Tennessee, Knoxville
Description:
Study of Data-Intensive Workflows on Next-Generation Systems with Emphasis on Memory Access investigates how memory behavior affects the performance, efficiency, and reproducibility of data-intensive applications on emerging high-performance computing systems. The project develops a C++ suite of mini-applications to measure and analyze memory access patterns, data management costs, and power consumption. These insights inform the design of more efficient, replicable workflows optimized for next-generation architectures.
Moving towards self-adjusting scheduling policies for high performance workflows with Flux’s fully hierarchical scheduling
Source of Support:
Lawrence Livermore National Laboratory
Project Period:
Feb 8, 2019 – Jan 31, 2020
Location of Project:
University of Tennessee, Knoxville
Description:
Moving Towards Self-Adjusting Scheduling Policies for High-Performance Workflows with Flux’s Fully Hierarchical Scheduling explores how adaptive scheduling strategies can improve workflow efficiency on large-scale computing systems. Using the Flux workload manager, the project systematically studies hierarchical scheduling models and develops methods that allow workflows to dynamically select optimal scheduling policies at runtime. This project enhances system utilization, scalability, and performance for diverse scientific applications.
Driving Next-Generation Schedulers with Machine Learning-Based Application Patterns
Source of Support:
Lawrence Livermore National Laboratory
Project Period:
Aug 1, 2018 – Jul 31, 2020
Location of Project:
University of Tennessee, Knoxville
Description:
Driving Next-Generation Schedulers with Machine Learning-Based Application Patterns applies machine learning to improve how HPC schedulers handle irregular and dynamic job behaviors. By identifying and modeling application performance patterns, the project integrates this knowledge into multi-objective scheduling frameworks to enhance system utilization and workflow throughput. Building on prior collaborations with Lawrence Livermore National Laboratory, this work advances adaptive, intelligent scheduling strategies for next-generation supercomputing environments.
Collaborative: EAGER: Exploring and Advancing the State of the Art in Robust Science in Gravitational Wave Physics
Source of Support:
National Science Foundation (NSF): Advanced Cyberinfrastructure (OAC) #1823372
Project Period:
May 31, 2018 – Apr 30, 2020
Award Number:
1841399
Location of Project:
University of Tennessee, Knoxville
Description:
Collaborative: EAGER—Exploring and Advancing the State of the Art in Robust Science in Gravitational Wave Physics strengthens the reliability and reproducibility of discoveries in gravitational wave research. By surveying and analyzing the LIGO scientific workflows—which combine experimental data, large-scale computation, and data processing—the project identifies opportunities to improve workflow transparency, validation, and automation. These insights help shape best practices for robust, data-driven science in one of physics’ most groundbreaking fields.
Building a “Miniature” Version of the ORNL’s Summit supercomputer for Computational Science Research at UTK
Source of Support:
2019 IBM Global University Program Shared University Research Award
Project Period:
Jun 21, 2019 - Jun 20, 2024
Location of Project:
University of Tennessee, Knoxville
Description:
Building a “Miniature” Version of ORNL’s Summit Supercomputer for Computational Science Research at UTK expands the university’s capacity for high-performance computing (HPC) research and education. Supported by the IBM Global University Program, the award enabled the acquisition of a supercomputer system that mirrors the architecture of Oak Ridge National Laboratory’s Summit, once the world’s fastest supercomputer. This resource empowers UTK researchers and students to conduct advanced computational experiments, foster innovation, and train the next generation of HPC scientists.