Overview
geefetch ships with 19 built-in datasets, but Google Earth Engine
hosts thousands of collections. Use gee_register_dataset() to register any
GEE collection so that it works with read_gee() and collect_gee_data().
Browsing built-in datasets
library(geefetch)
# Full catalogue
gee_datasets()
# Filter by domain
gee_datasets(domain = "Vegetation")
gee_datasets(domain = "Soil (Global)")
Registering a custom dataset
To add a new GEE collection, you need:
- The GEE collection ID (find it in the GEE Data Catalog)
- The band name(s) to extract
- The spatial resolution (scale in metres)
- The temporal resolution ("daily", "8day", "16day", "monthly", "5day", or "static")
Example: Global Surface Water
gee_register_dataset(
  name = "gsw_occurrence",
  collection = "JRC/GSW1_4/GlobalSurfaceWater",
  bands = "occurrence",
  scale = 30L,
  temporal = "static",
  description = "JRC Global Surface Water Occurrence 30m",
  domain = "Hydrology",
  citation = "Pekel et al. (2016). doi:10.1038/nature20584"
)
## v Registered custom dataset "gsw_occurrence".
## i Collection: "JRC/GSW1_4/GlobalSurfaceWater"
## i Available via read_gee("gsw_occurrence") and collect_gee_data().
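A typo in temporal or a non-numeric scale is easy to make when filling in the four inputs listed above. The sketch below is a small pre-flight check in base R. check_dataset_spec() is a hypothetical helper, not part of geefetch, and geefetch may already perform its own validation; the allowed temporal values are simply the ones listed in this article.

```r
# Hypothetical helper (not part of geefetch): sanity-check a dataset
# specification before passing it to gee_register_dataset().
check_dataset_spec <- function(collection, bands, scale, temporal) {
  # Temporal values as listed in this article.
  allowed <- c("daily", "8day", "16day", "monthly", "5day", "static")
  stopifnot(
    is.character(collection), length(collection) == 1L, nzchar(collection),
    is.character(bands), length(bands) >= 1L, all(nzchar(bands)),
    is.numeric(scale), length(scale) == 1L, scale > 0,
    is.character(temporal), length(temporal) == 1L
  )
  if (!temporal %in% allowed) {
    stop("`temporal` must be one of: ", paste(allowed, collapse = ", "))
  }
  invisible(TRUE)
}

# Passes silently for the Global Surface Water example.
check_dataset_spec(
  collection = "JRC/GSW1_4/GlobalSurfaceWater",
  bands = "occurrence",
  scale = 30L,
  temporal = "static"
)
```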
Example: MODIS Land Cover
gee_register_dataset(
  name = "modis_lc",
  collection = "MODIS/061/MCD12Q1",
  bands = "LC_Type1",
  scale = 500L,
  temporal = "static",
  description = "MODIS Land Cover Type 1 (IGBP) 500m",
  domain = "Land cover",
  scale_factor = 1,
  offset = 0
)
Using registered datasets
Once registered, the dataset works with the dispatcher and batch extraction exactly like built-in datasets, for example via read_gee("gsw_occurrence") and collect_gee_data().
What the generic handler does (and doesn’t do)
Custom datasets use the generic handler, which:
- Loads the collection
- Filters by date (for time-series datasets)
- Selects the specified bands
- Applies scale_factor and offset
- Extracts raster or point values
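To make the scale_factor and offset step concrete, here is a minimal base R sketch of a linear rescaling. It assumes the common convention value = raw * scale_factor + offset; the order of operations inside geefetch is not confirmed here, and rescale_band() is a hypothetical name used only for illustration.

```r
# Hypothetical helper: convert raw stored values to physical units.
# Assumption: value = raw * scale_factor + offset. This is a common
# convention, not a confirmed description of geefetch internals.
rescale_band <- function(raw, scale_factor = 1, offset = 0) {
  raw * scale_factor + offset
}

# MODIS land surface temperature is stored as scaled integers with a
# scale factor of 0.02 (Kelvin), so 14500 and 15000 become roughly
# 290 K and 300 K. NA values stay NA.
raw_lst <- c(14500, 15000, NA)
rescale_band(raw_lst, scale_factor = 0.02)
```

With the defaults (scale_factor = 1, offset = 0), as in the MODIS Land Cover example, values pass through unchanged, which is what a categorical band needs.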
The generic handler does not:
- Apply QA masking (no qa_band processing)
- Compute derived indices (e.g., NDVI from two bands)
- Handle complex multi-band logic
If you need QA masking or computed indices, consider opening a GitHub issue to request a specialised handler.
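In the meantime, a derived index can be computed locally after extracting the individual bands. The sketch below assumes the extracted values end up in a data frame with one column per band; the data frame, its column names, and compute_ndvi() are all hypothetical and stand in for whatever your extraction returns.

```r
# Hypothetical extracted values: one row per point, one column per band.
# Column names are placeholders for the bands you registered.
extracted <- data.frame(
  id  = 1:3,
  red = c(0.10, 0.08, 0.30),
  nir = c(0.50, 0.40, 0.30)
)

# NDVI = (NIR - red) / (NIR + red), returning NA where the
# denominator is zero instead of dividing by zero.
compute_ndvi <- function(nir, red) {
  denom <- nir + red
  ifelse(denom == 0, NA_real_, (nir - red) / denom)
}

extracted$ndvi <- compute_ndvi(extracted$nir, extracted$red)
extracted
```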
Session scope
Registered datasets persist for the current R session only. To use
them across sessions, add the gee_register_dataset() call
to your script or .Rprofile.
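One way to do this is to wrap the registration in a small function and call it at startup. The sketch below reuses the Global Surface Water example; register_custom_gee_datasets() is a hypothetical wrapper name, and the sketch assumes that a registration made through geefetch:: before library(geefetch) is attached remains visible afterwards, which has not been verified here.

```r
# Sketch for ~/.Rprofile or the top of an analysis script.
# register_custom_gee_datasets() is a hypothetical wrapper name.
register_custom_gee_datasets <- function() {
  # Skip quietly on machines where geefetch is not installed.
  if (!requireNamespace("geefetch", quietly = TRUE)) {
    return(invisible(FALSE))
  }
  # Same arguments as the Global Surface Water example in this article.
  geefetch::gee_register_dataset(
    name = "gsw_occurrence",
    collection = "JRC/GSW1_4/GlobalSurfaceWater",
    bands = "occurrence",
    scale = 30L,
    temporal = "static",
    description = "JRC Global Surface Water Occurrence 30m",
    domain = "Hydrology"
  )
  invisible(TRUE)
}

register_custom_gee_datasets()
```

Keeping the call in a project script rather than .Rprofile makes the analysis easier to reproduce on another machine, at the cost of repeating the registration in each script.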
Contributing handlers upstream
If your dataset is widely useful, consider contributing a built-in handler to geefetch:
- Fork the repository
- Add metadata to .GEE_META in R/handler_registry.R
- Add an alias to .GEE_ALIASES
- Write a handler in R/handlers.R (or rely on the generic handler)
- Write a convenience alias in R/read_*.R
- Add tests in tests/testthat/
- Open a pull request