SatCat: Cloud-Native Archives of Convenience for Remote Sensing Workloads
Andrew Pawloski
May 6th, 2019
Monolithic data access, where users download whole files to read a small part of them, is inefficient.
How can users more effectively use remotely-stored datasets?
Cloud Optimized GeoTIFFs (COGs)
- Regular GeoTIFFs
- Tiled
- Support HTTP GET Range Requests
- End users download only the byte ranges of the GeoTIFF they need
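
To make the range-request idea concrete, here is a minimal sketch of how a client might ask for the bytes of a single tile. It assumes the tile's byte offset and length have already been read from the GeoTIFF header (the TIFF TileOffsets and TileByteCounts tags). The function names and the example URL are hypothetical. The request is only built here, not sent.

```python
import urllib.request


def tile_range_header(offset: int, byte_count: int) -> str:
    """Return the HTTP Range header value covering one tile's bytes.

    offset and byte_count are assumed to come from the file's
    TileOffsets / TileByteCounts tags (hypothetical inputs here).
    """
    if offset < 0 or byte_count <= 0:
        raise ValueError("offset must be >= 0 and byte_count must be > 0")
    # HTTP byte ranges are inclusive at both ends, hence the -1.
    return f"bytes={offset}-{offset + byte_count - 1}"


def build_tile_request(url: str, offset: int, byte_count: int) -> urllib.request.Request:
    """Build (but do not send) a GET request for a single tile."""
    headers = {"Range": tile_range_header(offset, byte_count)}
    return urllib.request.Request(url, headers=headers, method="GET")
```

A server that supports range requests would normally answer such a request with `206 Partial Content` and only the requested bytes, so the client never transfers the rest of the file.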
Image credit: James Norton (Element 84)
Zarr
- Multi-dimensional arrays saved in discrete chunks
- Each chunk is a file
- Clients can pull only the chunks they need
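
The chunk-per-file layout means a client can work out which files it needs before fetching anything. The sketch below shows that arithmetic for a rectangular region. It assumes chunk keys are formed by joining chunk indices with `.` (the default in Zarr v2 stores); the function name and the region format are hypothetical, and edge handling is simplified.

```python
from itertools import product


def chunks_for_region(region, chunk_shape):
    """List the chunk keys that overlap a rectangular region.

    region      -- one (start, stop) pair per dimension, stop exclusive
    chunk_shape -- chunk size along each dimension
    Keys are joined with "." (assumed Zarr v2 default separator).
    """
    if len(region) != len(chunk_shape):
        raise ValueError("region and chunk_shape must have the same rank")
    per_dim = []
    for (start, stop), size in zip(region, chunk_shape):
        if not 0 <= start < stop:
            raise ValueError("each (start, stop) must satisfy 0 <= start < stop")
        first = start // size        # chunk holding the first element
        last = (stop - 1) // size    # chunk holding the last element
        per_dim.append(range(first, last + 1))
    # Cartesian product of per-dimension chunk indices gives every overlapping chunk.
    return [".".join(str(i) for i in idx) for idx in product(*per_dim)]
```

For example, with 512 x 512 chunks, a region covering rows 0 to 99 and columns 500 to 1099 touches only three chunk files, regardless of how large the full array is.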
There are context-specific advantages to interacting with data in particular formats.
How can we utilize existing Earth observation datasets in these cloud-native formats?
SatCat
- Corpus of GOES-16 Full Disk product imagery (Level 2)
- 20 TB of raw data
- On-demand Archive of Convenience (AoC) creation from the original netCDF files
- Possible with S3 (thanks NOAA!) and spot EC2 instances