PODPAC: open-source Python software for enabling harmonized, plug-and-play processing of disparate earth observation data sets and seamless transition onto the serverless cloud by earth scientists

Ueckermann, Mattheus P; Bieszczad, Jerry; Entekhabi, Dara; Shapiro, Marc L; Callendar, David R; Sullivan, David; Milloy, Jeffrey

dc.contributor.author	Ueckermann, Mattheus P
dc.contributor.author	Bieszczad, Jerry
dc.contributor.author	Entekhabi, Dara
dc.contributor.author	Shapiro, Marc L
dc.contributor.author	Callendar, David R
dc.contributor.author	Sullivan, David
dc.contributor.author	Milloy, Jeffrey
dc.date.accessioned	2021-09-20T17:31:00Z
dc.date.available	2021-09-20T17:31:00Z
dc.date.issued	2020-08-28
dc.identifier.uri	https://hdl.handle.net/1721.1/131933
dc.description.abstract	In this paper, we present the Pipeline for Observational Data Processing, Analysis, and Collaboration (PODPAC) software. PODPAC is an open-source Python library designed to enable widespread exploitation of NASA earth science data by enabling multi-scale and multi-windowed access, exploration, and integration of available earth science datasets to support analysis and analytics; automatic accounting for geospatial data formats, projections, and resolutions; simplified implementation and parallelization of geospatial data processing routines; standardized sharing of data and algorithms; and seamless transition of algorithms and data products from local development to distributed, serverless processing on commercial cloud computing environments. We describe the key elements of PODPAC’s architecture, including Nodes for unified encapsulation of disparate scientific data sources; Algorithms for plug-and-play processing and harmonization of multiple data source Nodes; and Lambda functions for serverless execution and sharing of new data products via the cloud. We provide an overview of our open-source code implementation and testing process for development and deployment of PODPAC. We describe our interactive, JupyterLab-based end-user documentation, including quick-start examples and detailed use case studies. We conclude with examples of PODPAC’s application to: encapsulate data sources available in the Amazon Web Services (AWS) Open Data repository; harmonize processing of multiple earth science data sets for downscaling of NASA Soil Moisture Active Passive (SMAP) soil moisture data; and deploy a serverless SMAP-based drought monitoring application for access from mobile devices. We postulate that PODPAC will also be an effective tool for wrangling and standardizing massive earth science data sets for use in model training for machine learning applications.	en_US
dc.publisher	Springer Berlin Heidelberg	en_US
dc.relation.isversionof	https://doi.org/10.1007/s12145-020-00506-0	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/4.0/	en_US
dc.source	Springer Berlin Heidelberg	en_US
dc.title	PODPAC: open-source Python software for enabling harmonized, plug-and-play processing of disparate earth observation data sets and seamless transition onto the serverless cloud by earth scientists	en_US
dc.type	Article	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Civil and Environmental Engineering
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2020-11-18T04:25:01Z
dc.language.rfc3066	en
dc.rights.holder	Springer-Verlag GmbH Germany, part of Springer Nature
dspace.embargo.terms	Y
dspace.date.submission	2020-11-18T04:25:01Z
mit.metadata.status	Authority Work and Publication Information Needed

Files in this item

Name: 12145_2020_506_ReferencePDF.pdf
Size: 1.744Mb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles