SAFARI: Scientific Analytics, Forensics, and Reproducibility for Workflows in Cyberinfrastructures

Funded by the National Science Foundation (NSF) under grant number #2530461

About SAFARI

SAFARI (Scientific Analytics, Forensics, and Reproducibility for Workflows in Cyberinfrastructures) embeds forensic data analytics into cyberinfrastructure (CI) services to ensure the reliability, integrity, and reproducibility of scientific workflows. Its primary focus is Earth science, but the approach applies broadly across scientific domains. The project pursues four goals:
Establish Trustworthiness of Scientific Workflows: Integrate forensic data analytics to validate provenance, detect vulnerabilities, and confirm the authenticity and integrity of data and computational steps.
Ensure Reproducibility of Results: Embed provenance tracking, annotation, and verification frameworks to guarantee that scientific findings are consistent, transparent, and replicable even under non-determinism and platform variability.
Enhance Reusability of Workflow Components: Break down complex, monolithic workflows into modular, containerized building blocks that can be reused across high-throughput computing platforms and diverse science domains.
Advance Robust and Resilient Open Science: Deliver a suite of CI services—including artifact repositories, scalable automation, and open documentation—while engaging Earth scientists, computational experts, and ACCESS affinity groups to foster community adoption and workforce training.

SAFARI and FAIR

SAFARI extends the FAIR principles by embedding forensic data analytics into workflows, ensuring that scientific artifacts are:
Findable – Provides a searchable artifact commons of workflows, tools, and data containers enriched with provenance metadata, making scientific artifacts easy to discover and trace.
Accessible – Embeds forensic data analytics directly into Pegasus workflows and makes transparent documentation, open practices, and reproducibility services available to the community.
Interoperable – Uses modular, containerized components and metadata-rich abstractions so that workflows run seamlessly across high-throughput, cloud, and HPC platforms.
Reusable – Delivers validated, provenance-tracked artifacts and open documentation that can be reliably reapplied across domains, enhancing long-term scientific trust and collaboration.

Publications

Raul Sirvent, Rocio Carratala-Saez, Amal Gueroudji, Tanzima Islam, Line Pouchard, and Michela Taufer. Reproducibility for HPC and Distributed Environments: Committees, Nondeterminism, Performance and Workflows. In Proceedings of the 3rd ACM Conference on Reproducibility and Replicability (ACM REP), Vancouver, Canada, July 2025. ACM.

@inproceedings{sirvent2025reproducibility,
author = {Raul Sirvent and Rocio Carratala-Saez and Amal Gueroudji and Tanzima Islam and Line Pouchard and Michela Taufer},
title = {Reproducibility for {HPC} and Distributed Environments: Committees, Nondeterminism, Performance and Workflows},
booktitle = {Proceedings of the 3rd ACM Conference on Reproducibility and Replicability (ACM REP)},
year = {2025},
address = {Vancouver, Canada},
month = {July},
publisher = {ACM}
}

Nigel Tan, Kevin Assogba, Jay Asworth, Befikir Bogale, M. Mustafa Rafique, Franck Cappello, Michela Taufer, and Bogdan Nicolae. Towards Affordable Reproducibility Using Scalable Capture and Comparison of Intermediate Multi-Run Results. In Proceedings of the 25th ACM/IFIP International Middleware Conference (Middleware), Hong Kong, China, December 2024. ACM.

@inproceedings{tan2024affordable,
author = {Nigel Tan and Kevin Assogba and Jay Asworth and Befikir Bogale and M. Mustafa Rafique and Franck Cappello and Michela Taufer and Bogdan Nicolae},
title = {Towards Affordable Reproducibility Using Scalable Capture and Comparison of Intermediate Multi-Run Results},
booktitle = {Proceedings of the 25th ACM/IFIP International Middleware Conference (Middleware)},
year = {2024},
address = {Hong Kong, China},
month = {December},
publisher = {ACM}
}

Paula Olaya, Dominic Kennedy, Ricardo Llamas, Leobardo Valera, Rodrigo Vargas, Jay Lofstead, and Michela Taufer. Building Trust in Earth Science Findings through Data Traceability and Results Explainability. IEEE Trans. Parallel Distributed Syst. (TPDS), 34(2):704–717, 2023. doi: 10.1109/TPDS.2022.3220539.

@article{olaya2023trust,
author = {Paula Olaya and Dominic Kennedy and Ricardo Llamas and Leobardo Valera and Rodrigo Vargas and Jay Lofstead and Michela Taufer},
title = {Building Trust in Earth Science Findings through Data Traceability and Results Explainability},
journal = {IEEE Transactions on Parallel and Distributed Systems (TPDS)},
volume = {34},
number = {2},
pages = {704--717},
year = {2023},
doi = {10.1109/TPDS.2022.3220539}
}

Dominic Kennedy, Paula Olaya, Jay Lofstead, Rodrigo Vargas, and Michela Taufer. Augmenting Singularity to Generate Fine-grained Workflows, Record Trails, and Data Provenance. In Proceedings of the 18th IEEE International Conference on e-Science (eScience), pages 1–2, Salt Lake City, Utah, USA, October 2022. IEEE Computer Society. (Short paper).

@inproceedings{kennedy2022augmenting,
author = {Dominic Kennedy and Paula Olaya and Jay Lofstead and Rodrigo Vargas and Michela Taufer},
title = {Augmenting Singularity to Generate Fine-grained Workflows, Record Trails, and Data Provenance},
booktitle = {Proceedings of the 18th IEEE International Conference on e-Science (eScience)},
pages = {1--2},
year = {2022},
address = {Salt Lake City, Utah, USA},
month = {October},
publisher = {IEEE Computer Society}
}

Michela Taufer, Ewa Deelman, Rafael Ferreira da Silva, Trilce Estrada, Mary Hall, and Miron Livny. A Roadmap to Robust Science for High-throughput Applications: The Developers’ Perspective. In Proceedings of the IEEE Cluster Conference (CLUSTER), pages 1–2, Portland, Oregon, September 2021. IEEE Computer Society. (Short paper).

@inproceedings{taufer2021robust_developers,
author = {Michela Taufer and Ewa Deelman and Rafael Ferreira da Silva and Trilce Estrada and Mary Hall and Miron Livny},
title = {A Roadmap to Robust Science for High-throughput Applications: The Developers’ Perspective},
booktitle = {Proceedings of the IEEE Cluster Conference (CLUSTER)},
pages = {1--2},
year = {2021},
address = {Portland, Oregon},
month = {September},
publisher = {IEEE Computer Society}
}

Michela Taufer, Ewa Deelman, Rafael Ferreira da Silva, Trilce Estrada, and Mary Hall. A Roadmap to Robust Science for High-throughput Applications: The Scientists’ Perspective. In Proceedings of the 17th IEEE International Conference on eScience (eScience), pages 1–2, Innsbruck, Austria, September 2021. IEEE Computer Society. (Short paper).

@inproceedings{taufer2021robust_scientists,
author = {Michela Taufer and Ewa Deelman and Rafael Ferreira da Silva and Trilce Estrada and Mary Hall},
title = {A Roadmap to Robust Science for High-throughput Applications: The Scientists’ Perspective},
booktitle = {Proceedings of the 17th IEEE International Conference on eScience (eScience)},
pages = {1--2},
year = {2021},
address = {Innsbruck, Austria},
month = {September},
publisher = {IEEE Computer Society}
}

Yufeng Xin, Shih-Wen Fu, Anirban Mandal, Ryan Tanaka, Mats Rynge, Karan Vahi, and Ewa Deelman. Data Integrity Error Localization in Networked Systems with Missing Data. In Proceedings of the IEEE International Conference on Communications (ICC), pages 341–346, Seoul, Korea, 2022. doi: 10.1109/ICC45855.2022.9838996.

@inproceedings{xin-icc-2022,
author = {Xin, Yufeng and Fu, Shih-Wen and Mandal, Anirban and Tanaka, Ryan and Rynge, Mats and Vahi, Karan and Deelman, Ewa},
booktitle = {ICC 2022 - IEEE International Conference on Communications},
title = {Data Integrity Error Localization in Networked Systems with Missing Data},
year = {2022},
pages = {341--346},
doi = {10.1109/ICC45855.2022.9838996},
note = {Funding Acknowledgments: NSF 1839900}
}

Krzysztof Burkat, Maciej Pawlik, Bartosz Balis, Maciej Malawski, Karan Vahi, Mats Rynge, Rafael Ferreira da Silva, and Ewa Deelman. Serverless Containers – Rising Viable Approach to Scientific Workflows. In Proceedings of the 17th IEEE International Conference on eScience (eScience), pages 40–49, Innsbruck, Austria, 2021. doi: 10.1109/eScience51609.2021.00014.

@inproceedings{burkat-escience-2021,
author = {Burkat, Krzysztof and Pawlik, Maciej and Balis, Bartosz and Malawski, Maciej and Vahi, Karan and Rynge, Mats and da Silva, Rafael Ferreira and Deelman, Ewa},
booktitle = {2021 IEEE 17th International Conference on eScience (eScience)},
title = {Serverless Containers -- Rising Viable Approach to Scientific Workflows},
year = {2021},
pages = {40--49},
doi = {10.1109/eScience51609.2021.00014}
}

Yufeng Xin, Shih-Wen Fu, Anirban Mandal, Ilya Baldin, Ryan Tanaka, Mats Rynge, Karan Vahi, Ewa Deelman, Ishan Abhinit, and Von Welch. Root Cause Analysis of Data Integrity Errors in Networked Systems with Incomplete Information. In Proceedings of the International Conference on Information and Communication Technology Convergence (ICTC), pages 735–740, Jeju Island, Korea, 2021. doi: 10.1109/ICTC52510.2021.9621124.

@inproceedings{xin-ictc-2021,
author = {Xin, Yufeng and Fu, Shih-Wen and Mandal, Anirban and Baldin, Ilya and Tanaka, Ryan and Rynge, Mats and Vahi, Karan and Deelman, Ewa and Abhinit, Ishan and Welch, Von},
booktitle = {2021 International Conference on Information and Communication Technology Convergence (ICTC)},
title = {Root Cause Analysis of Data Integrity Errors in Networked Systems with Incomplete Information},
year = {2021},
pages = {735--740},
doi = {10.1109/ICTC52510.2021.9621124}
}