Accessing EOPF Sentinel Zarr using Rarr

Introduction

This notebook demonstrates how to retrieve remotely stored Zarr data using the Rarr package in R. We will explore how to read and visualize Zarr data (zarrays) and their metadata. For subsequent analysis and visualization we turn the zarray into stars objects which are more suitable for spatial data.

Tip

This notebook has a sibling, which demonstrates how to access the same data using the GDAL Zarr driver. You can find it here.

What we will learn

  • ✏️ How to edit URLs of Zarr stores to make them readable for GDAL
  • 🔎 Which read-functions and arguments to use in stars and terra
  • 🚧 Current limitations of these packages

Prerequisites

To follow this notebook, you will need to load a STAC item in Zarr format from the EOPF Zarr STAC Catalog. If you are new to STAC inside the R environment, we suggest to follow the Access the EOPF Zarr STAC API with R tutorial. The item we are using for this example mirrors the one used in the Sentinel-2 L1C MSI Zarr Product Exploration notebook which is hosted on EODC’s object storage and is accessible under the URL below.

zarr_url = "https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:sample-data/tutorial_data/cpm_v253/S2B_MSIL1C_20250113T103309_N0511_R108_T32TLQ_20250113T122458.zarr"

Import packages

library(Rarr)      # reading; BiocManager::install("Rarr")
library(stars)     # spatial objects
library(dplyr)     # table operations
library(jsonlite)  # parsing metadata
library(mapview)   # interactive maps

Metadata exploration with Rarr

Important

In this notebook we show the current capabilities and limitations of Rarr in terms of reading Zarr array(s) and their metadata. Successful attempts are marked with a ✅, failed attempts with a ⛔.

Using the Rarr::zarr_overview function we can get meta data for all assets in the remote Zarr store as a table. Here list all arrays and their data type, compression, dimensions, etc.

zarr_content = zarr_overview(zarr_url, as_data_frame = T) |> 
  mutate(var = sub(zarr_url, "", path)) |> 
  select(var, everything(), -path) 

filter(zarr_content, grepl("reflectance", var))
                                  var nchunks data_type compressor          dim
1  /measurements/reflectance/r10m/b02      36    uint16      blosc 10980, 10980
2  /measurements/reflectance/r10m/b03      36    uint16      blosc 10980, 10980
3  /measurements/reflectance/r10m/b04      36    uint16      blosc 10980, 10980
4  /measurements/reflectance/r10m/b08      36    uint16      blosc 10980, 10980
5    /measurements/reflectance/r10m/x       1     int64      blosc        10980
6    /measurements/reflectance/r10m/y       1     int64      blosc        10980
7  /measurements/reflectance/r20m/b05      36    uint16      blosc   5490, 5490
8  /measurements/reflectance/r20m/b06      36    uint16      blosc   5490, 5490
9  /measurements/reflectance/r20m/b07      36    uint16      blosc   5490, 5490
10 /measurements/reflectance/r20m/b11      36    uint16      blosc   5490, 5490
11 /measurements/reflectance/r20m/b12      36    uint16      blosc   5490, 5490
12 /measurements/reflectance/r20m/b8a      36    uint16      blosc   5490, 5490
13   /measurements/reflectance/r20m/x       1     int64      blosc         5490
14   /measurements/reflectance/r20m/y       1     int64      blosc         5490
15 /measurements/reflectance/r60m/b01      36    uint16      blosc   1830, 1830
16 /measurements/reflectance/r60m/b09      36    uint16      blosc   1830, 1830
17 /measurements/reflectance/r60m/b10      36    uint16      blosc   1830, 1830
18   /measurements/reflectance/r60m/x       1     int64      blosc         1830
19   /measurements/reflectance/r60m/y       1     int64      blosc         1830
    chunk_dim
1  1830, 1830
2  1830, 1830
3  1830, 1830
4  1830, 1830
5       10980
6       10980
7    915, 915
8    915, 915
9    915, 915
10   915, 915
11   915, 915
12   915, 915
13       5490
14       5490
15   305, 305
16   305, 305
17   305, 305
18       1830
19       1830

We can see that all Sentinel-2 bands are present, but split into directories (or groups) based on spatial resolution (10, 20, 60 meters), which have different dimensions (number and size of pixels).

                                                   var nchunks  data_type
1                           /conditions/geometry/angle       1 unicode224
2                            /conditions/geometry/band       1  unicode96
3                        /conditions/geometry/detector       1      int64
4                 /conditions/geometry/mean_sun_angles       1    float64
5   /conditions/geometry/mean_viewing_incidence_angles       1    float64
6                      /conditions/geometry/sun_angles       1    float64
7        /conditions/geometry/viewing_incidence_angles       4    float64
8                               /conditions/geometry/x       1      int64
9                               /conditions/geometry/y       1      int64
10        /conditions/mask/detector_footprint/r10m/b02      36      uint8
11        /conditions/mask/detector_footprint/r10m/b03      36      uint8
12        /conditions/mask/detector_footprint/r10m/b04      36      uint8
13        /conditions/mask/detector_footprint/r10m/b08      36      uint8
14          /conditions/mask/detector_footprint/r10m/x       1      int64
15          /conditions/mask/detector_footprint/r10m/y       1      int64
16        /conditions/mask/detector_footprint/r20m/b05      36      uint8
17        /conditions/mask/detector_footprint/r20m/b06      36      uint8
18        /conditions/mask/detector_footprint/r20m/b07      36      uint8
19        /conditions/mask/detector_footprint/r20m/b11      36      uint8
20        /conditions/mask/detector_footprint/r20m/b12      36      uint8
21        /conditions/mask/detector_footprint/r20m/b8a      36      uint8
22          /conditions/mask/detector_footprint/r20m/x       1      int64
23          /conditions/mask/detector_footprint/r20m/y       1      int64
24        /conditions/mask/detector_footprint/r60m/b01      36      uint8
25        /conditions/mask/detector_footprint/r60m/b09      36      uint8
26        /conditions/mask/detector_footprint/r60m/b10      36      uint8
27          /conditions/mask/detector_footprint/r60m/x       1      int64
28          /conditions/mask/detector_footprint/r60m/y       1      int64
29        /conditions/mask/l1c_classification/r60m/b00      36      uint8
30          /conditions/mask/l1c_classification/r60m/x       1      int64
31          /conditions/mask/l1c_classification/r60m/y       1      int64
32                /conditions/meteorology/cams/aod1240       1    float32
33                 /conditions/meteorology/cams/aod469       1    float32
34                 /conditions/meteorology/cams/aod550       1    float32
35                 /conditions/meteorology/cams/aod670       1    float32
36                 /conditions/meteorology/cams/aod865       1    float32
37               /conditions/meteorology/cams/bcaod550       1    float32
38               /conditions/meteorology/cams/duaod550       1    float32
39          /conditions/meteorology/cams/isobaricInhPa       1    float64
40               /conditions/meteorology/cams/latitude       1    float64
41              /conditions/meteorology/cams/longitude       1    float64
42                 /conditions/meteorology/cams/number       1      int64
43               /conditions/meteorology/cams/omaod550       1    float32
44               /conditions/meteorology/cams/ssaod550       1    float32
45                   /conditions/meteorology/cams/step       1      int64
46               /conditions/meteorology/cams/suaod550       1    float32
47                /conditions/meteorology/cams/surface       1    float64
48                   /conditions/meteorology/cams/time       1      int64
49             /conditions/meteorology/cams/valid_time       1      int64
50                      /conditions/meteorology/cams/z       1    float32
51         /conditions/meteorology/ecmwf/isobaricInhPa       1    float64
52              /conditions/meteorology/ecmwf/latitude       1    float64
53             /conditions/meteorology/ecmwf/longitude       1    float64
54                   /conditions/meteorology/ecmwf/msl       1    float32
55                /conditions/meteorology/ecmwf/number       1      int64
56                     /conditions/meteorology/ecmwf/r       1    float32
57                  /conditions/meteorology/ecmwf/step       1      int64
58               /conditions/meteorology/ecmwf/surface       1    float64
59                  /conditions/meteorology/ecmwf/tco3       1    float32
60                  /conditions/meteorology/ecmwf/tcwv       1    float32
61                  /conditions/meteorology/ecmwf/time       1      int64
62                   /conditions/meteorology/ecmwf/u10       1    float32
63                   /conditions/meteorology/ecmwf/v10       1    float32
64            /conditions/meteorology/ecmwf/valid_time       1      int64
65                  /measurements/reflectance/r10m/b02      36     uint16
66                  /measurements/reflectance/r10m/b03      36     uint16
67                  /measurements/reflectance/r10m/b04      36     uint16
68                  /measurements/reflectance/r10m/b08      36     uint16
69                    /measurements/reflectance/r10m/x       1      int64
70                    /measurements/reflectance/r10m/y       1      int64
71                  /measurements/reflectance/r20m/b05      36     uint16
72                  /measurements/reflectance/r20m/b06      36     uint16
73                  /measurements/reflectance/r20m/b07      36     uint16
74                  /measurements/reflectance/r20m/b11      36     uint16
75                  /measurements/reflectance/r20m/b12      36     uint16
76                  /measurements/reflectance/r20m/b8a      36     uint16
77                    /measurements/reflectance/r20m/x       1      int64
78                    /measurements/reflectance/r20m/y       1      int64
79                  /measurements/reflectance/r60m/b01      36     uint16
80                  /measurements/reflectance/r60m/b09      36     uint16
81                  /measurements/reflectance/r60m/b10      36     uint16
82                    /measurements/reflectance/r60m/x       1      int64
83                    /measurements/reflectance/r60m/y       1      int64
84                    /quality/l1c_quicklook/r10m/band       1      int64
85                     /quality/l1c_quicklook/r10m/tci     108      uint8
86                       /quality/l1c_quicklook/r10m/x       1      int64
87                       /quality/l1c_quicklook/r10m/y       1      int64
88                              /quality/mask/r10m/b02      36      uint8
89                              /quality/mask/r10m/b03      36      uint8
90                              /quality/mask/r10m/b04      36      uint8
91                              /quality/mask/r10m/b08      36      uint8
92                                /quality/mask/r10m/x       1      int64
93                                /quality/mask/r10m/y       1      int64
94                              /quality/mask/r20m/b05      36      uint8
95                              /quality/mask/r20m/b06      36      uint8
96                              /quality/mask/r20m/b07      36      uint8
97                              /quality/mask/r20m/b11      36      uint8
98                              /quality/mask/r20m/b12      36      uint8
99                              /quality/mask/r20m/b8a      36      uint8
100                               /quality/mask/r20m/x       1      int64
101                               /quality/mask/r20m/y       1      int64
102                             /quality/mask/r60m/b01      36      uint8
103                             /quality/mask/r60m/b09      36      uint8
104                             /quality/mask/r60m/b10      36      uint8
105                               /quality/mask/r60m/x       1      int64
106                               /quality/mask/r60m/y       1      int64
    compressor          dim    chunk_dim
1        blosc            2            2
2        blosc           13           13
3        blosc            7            7
4        blosc            2            2
5        blosc        13, 2        13, 2
6        blosc    2, 23, 23    2, 23, 23
7        blosc 13, 7, 2.... 7, 4, 2,....
8        blosc           23           23
9        blosc           23           23
10       blosc 10980, 10980   1830, 1830
11       blosc 10980, 10980   1830, 1830
12       blosc 10980, 10980   1830, 1830
13       blosc 10980, 10980   1830, 1830
14       blosc        10980        10980
15       blosc        10980        10980
16       blosc   5490, 5490     915, 915
17       blosc   5490, 5490     915, 915
18       blosc   5490, 5490     915, 915
19       blosc   5490, 5490     915, 915
20       blosc   5490, 5490     915, 915
21       blosc   5490, 5490     915, 915
22       blosc         5490         5490
23       blosc         5490         5490
24       blosc   1830, 1830     305, 305
25       blosc   1830, 1830     305, 305
26       blosc   1830, 1830     305, 305
27       blosc         1830         1830
28       blosc         1830         1830
29       blosc   1830, 1830     305, 305
30       blosc         1830         1830
31       blosc         1830         1830
32       blosc         9, 9         9, 9
33       blosc         9, 9         9, 9
34       blosc         9, 9         9, 9
35       blosc         9, 9         9, 9
36       blosc         9, 9         9, 9
37       blosc         9, 9         9, 9
38       blosc         9, 9         9, 9
39        <NA>                          
40       blosc            9            9
41       blosc            9            9
42        <NA>                          
43       blosc         9, 9         9, 9
44       blosc         9, 9         9, 9
45        <NA>                          
46       blosc         9, 9         9, 9
47        <NA>                          
48        <NA>                          
49        <NA>                          
50       blosc         9, 9         9, 9
51        <NA>                          
52       blosc            9            9
53       blosc            9            9
54       blosc         9, 9         9, 9
55        <NA>                          
56       blosc         9, 9         9, 9
57        <NA>                          
58        <NA>                          
59       blosc         9, 9         9, 9
60       blosc         9, 9         9, 9
61        <NA>                          
62       blosc         9, 9         9, 9
63       blosc         9, 9         9, 9
64        <NA>                          
65       blosc 10980, 10980   1830, 1830
66       blosc 10980, 10980   1830, 1830
67       blosc 10980, 10980   1830, 1830
68       blosc 10980, 10980   1830, 1830
69       blosc        10980        10980
70       blosc        10980        10980
71       blosc   5490, 5490     915, 915
72       blosc   5490, 5490     915, 915
73       blosc   5490, 5490     915, 915
74       blosc   5490, 5490     915, 915
75       blosc   5490, 5490     915, 915
76       blosc   5490, 5490     915, 915
77       blosc         5490         5490
78       blosc         5490         5490
79       blosc   1830, 1830     305, 305
80       blosc   1830, 1830     305, 305
81       blosc   1830, 1830     305, 305
82       blosc         1830         1830
83       blosc         1830         1830
84       blosc            3            3
85       blosc 3, 10980.... 1, 1830,....
86       blosc        10980        10980
87       blosc        10980        10980
88       blosc 10980, 10980   1830, 1830
89       blosc 10980, 10980   1830, 1830
90       blosc 10980, 10980   1830, 1830
91       blosc 10980, 10980   1830, 1830
92       blosc        10980        10980
93       blosc        10980        10980
94       blosc   5490, 5490     915, 915
95       blosc   5490, 5490     915, 915
96       blosc   5490, 5490     915, 915
97       blosc   5490, 5490     915, 915
98       blosc   5490, 5490     915, 915
99       blosc   5490, 5490     915, 915
100      blosc         5490         5490
101      blosc         5490         5490
102      blosc   1830, 1830     305, 305
103      blosc   1830, 1830     305, 305
104      blosc   1830, 1830     305, 305
105      blosc         1830         1830
106      blosc         1830         1830

Now that we have explored the content, let’s get some related metadata. Rarr provides the read_zattrs function for this purpose, which looks for the .zattrs file inside the Zarr store.

⛔ Unfortunately we were not successful when applying it to our remote store.

read_zattrs(zarr_url)
Error in read_zattrs(zarr_url): The group or array does not contain attributes (.zattrs)
read_zattrs(file.path(zarr_url, ".zattrs"))
Error in read_zattrs(file.path(zarr_url, ".zattrs")): The group or array does not contain attributes (.zattrs)

✅ As a simple workaround we can read the JSON directly:

zattrs = read_json(file.path(zarr_url, ".zattrs"))
names(zattrs)
[1] "other_metadata"  "other_metadata4" "stac_discovery" 
                                                   var nchunks  data_type
1                           /conditions/geometry/angle       1 unicode224
2                            /conditions/geometry/band       1  unicode96
3                        /conditions/geometry/detector       1      int64
4                 /conditions/geometry/mean_sun_angles       1    float64
5   /conditions/geometry/mean_viewing_incidence_angles       1    float64
6                      /conditions/geometry/sun_angles       1    float64
7        /conditions/geometry/viewing_incidence_angles       4    float64
8                               /conditions/geometry/x       1      int64
9                               /conditions/geometry/y       1      int64
10        /conditions/mask/detector_footprint/r10m/b02      36      uint8
11        /conditions/mask/detector_footprint/r10m/b03      36      uint8
12        /conditions/mask/detector_footprint/r10m/b04      36      uint8
13        /conditions/mask/detector_footprint/r10m/b08      36      uint8
14          /conditions/mask/detector_footprint/r10m/x       1      int64
15          /conditions/mask/detector_footprint/r10m/y       1      int64
16        /conditions/mask/detector_footprint/r20m/b05      36      uint8
17        /conditions/mask/detector_footprint/r20m/b06      36      uint8
18        /conditions/mask/detector_footprint/r20m/b07      36      uint8
19        /conditions/mask/detector_footprint/r20m/b11      36      uint8
20        /conditions/mask/detector_footprint/r20m/b12      36      uint8
21        /conditions/mask/detector_footprint/r20m/b8a      36      uint8
22          /conditions/mask/detector_footprint/r20m/x       1      int64
23          /conditions/mask/detector_footprint/r20m/y       1      int64
24        /conditions/mask/detector_footprint/r60m/b01      36      uint8
25        /conditions/mask/detector_footprint/r60m/b09      36      uint8
26        /conditions/mask/detector_footprint/r60m/b10      36      uint8
27          /conditions/mask/detector_footprint/r60m/x       1      int64
28          /conditions/mask/detector_footprint/r60m/y       1      int64
29        /conditions/mask/l1c_classification/r60m/b00      36      uint8
30          /conditions/mask/l1c_classification/r60m/x       1      int64
31          /conditions/mask/l1c_classification/r60m/y       1      int64
32                /conditions/meteorology/cams/aod1240       1    float32
33                 /conditions/meteorology/cams/aod469       1    float32
34                 /conditions/meteorology/cams/aod550       1    float32
35                 /conditions/meteorology/cams/aod670       1    float32
36                 /conditions/meteorology/cams/aod865       1    float32
37               /conditions/meteorology/cams/bcaod550       1    float32
38               /conditions/meteorology/cams/duaod550       1    float32
39          /conditions/meteorology/cams/isobaricInhPa       1    float64
40               /conditions/meteorology/cams/latitude       1    float64
41              /conditions/meteorology/cams/longitude       1    float64
42                 /conditions/meteorology/cams/number       1      int64
43               /conditions/meteorology/cams/omaod550       1    float32
44               /conditions/meteorology/cams/ssaod550       1    float32
45                   /conditions/meteorology/cams/step       1      int64
46               /conditions/meteorology/cams/suaod550       1    float32
47                /conditions/meteorology/cams/surface       1    float64
48                   /conditions/meteorology/cams/time       1      int64
49             /conditions/meteorology/cams/valid_time       1      int64
50                      /conditions/meteorology/cams/z       1    float32
51         /conditions/meteorology/ecmwf/isobaricInhPa       1    float64
52              /conditions/meteorology/ecmwf/latitude       1    float64
53             /conditions/meteorology/ecmwf/longitude       1    float64
54                   /conditions/meteorology/ecmwf/msl       1    float32
55                /conditions/meteorology/ecmwf/number       1      int64
56                     /conditions/meteorology/ecmwf/r       1    float32
57                  /conditions/meteorology/ecmwf/step       1      int64
58               /conditions/meteorology/ecmwf/surface       1    float64
59                  /conditions/meteorology/ecmwf/tco3       1    float32
60                  /conditions/meteorology/ecmwf/tcwv       1    float32
61                  /conditions/meteorology/ecmwf/time       1      int64
62                   /conditions/meteorology/ecmwf/u10       1    float32
63                   /conditions/meteorology/ecmwf/v10       1    float32
64            /conditions/meteorology/ecmwf/valid_time       1      int64
65                  /measurements/reflectance/r10m/b02      36     uint16
66                  /measurements/reflectance/r10m/b03      36     uint16
67                  /measurements/reflectance/r10m/b04      36     uint16
68                  /measurements/reflectance/r10m/b08      36     uint16
69                    /measurements/reflectance/r10m/x       1      int64
70                    /measurements/reflectance/r10m/y       1      int64
71                  /measurements/reflectance/r20m/b05      36     uint16
72                  /measurements/reflectance/r20m/b06      36     uint16
73                  /measurements/reflectance/r20m/b07      36     uint16
74                  /measurements/reflectance/r20m/b11      36     uint16
75                  /measurements/reflectance/r20m/b12      36     uint16
76                  /measurements/reflectance/r20m/b8a      36     uint16
77                    /measurements/reflectance/r20m/x       1      int64
78                    /measurements/reflectance/r20m/y       1      int64
79                  /measurements/reflectance/r60m/b01      36     uint16
80                  /measurements/reflectance/r60m/b09      36     uint16
81                  /measurements/reflectance/r60m/b10      36     uint16
82                    /measurements/reflectance/r60m/x       1      int64
83                    /measurements/reflectance/r60m/y       1      int64
84                    /quality/l1c_quicklook/r10m/band       1      int64
85                     /quality/l1c_quicklook/r10m/tci     108      uint8
86                       /quality/l1c_quicklook/r10m/x       1      int64
87                       /quality/l1c_quicklook/r10m/y       1      int64
88                              /quality/mask/r10m/b02      36      uint8
89                              /quality/mask/r10m/b03      36      uint8
90                              /quality/mask/r10m/b04      36      uint8
91                              /quality/mask/r10m/b08      36      uint8
92                                /quality/mask/r10m/x       1      int64
93                                /quality/mask/r10m/y       1      int64
94                              /quality/mask/r20m/b05      36      uint8
95                              /quality/mask/r20m/b06      36      uint8
96                              /quality/mask/r20m/b07      36      uint8
97                              /quality/mask/r20m/b11      36      uint8
98                              /quality/mask/r20m/b12      36      uint8
99                              /quality/mask/r20m/b8a      36      uint8
100                               /quality/mask/r20m/x       1      int64
101                               /quality/mask/r20m/y       1      int64
102                             /quality/mask/r60m/b01      36      uint8
103                             /quality/mask/r60m/b09      36      uint8
104                             /quality/mask/r60m/b10      36      uint8
105                               /quality/mask/r60m/x       1      int64
106                               /quality/mask/r60m/y       1      int64
    compressor          dim    chunk_dim
1        blosc            2            2
2        blosc           13           13
3        blosc            7            7
4        blosc            2            2
5        blosc        13, 2        13, 2
6        blosc    2, 23, 23    2, 23, 23
7        blosc 13, 7, 2.... 7, 4, 2,....
8        blosc           23           23
9        blosc           23           23
10       blosc 10980, 10980   1830, 1830
11       blosc 10980, 10980   1830, 1830
12       blosc 10980, 10980   1830, 1830
13       blosc 10980, 10980   1830, 1830
14       blosc        10980        10980
15       blosc        10980        10980
16       blosc   5490, 5490     915, 915
17       blosc   5490, 5490     915, 915
18       blosc   5490, 5490     915, 915
19       blosc   5490, 5490     915, 915
20       blosc   5490, 5490     915, 915
21       blosc   5490, 5490     915, 915
22       blosc         5490         5490
23       blosc         5490         5490
24       blosc   1830, 1830     305, 305
25       blosc   1830, 1830     305, 305
26       blosc   1830, 1830     305, 305
27       blosc         1830         1830
28       blosc         1830         1830
29       blosc   1830, 1830     305, 305
30       blosc         1830         1830
31       blosc         1830         1830
32       blosc         9, 9         9, 9
33       blosc         9, 9         9, 9
34       blosc         9, 9         9, 9
35       blosc         9, 9         9, 9
36       blosc         9, 9         9, 9
37       blosc         9, 9         9, 9
38       blosc         9, 9         9, 9
39        <NA>                          
40       blosc            9            9
41       blosc            9            9
42        <NA>                          
43       blosc         9, 9         9, 9
44       blosc         9, 9         9, 9
45        <NA>                          
46       blosc         9, 9         9, 9
47        <NA>                          
48        <NA>                          
49        <NA>                          
50       blosc         9, 9         9, 9
51        <NA>                          
52       blosc            9            9
53       blosc            9            9
54       blosc         9, 9         9, 9
55        <NA>                          
56       blosc         9, 9         9, 9
57        <NA>                          
58        <NA>                          
59       blosc         9, 9         9, 9
60       blosc         9, 9         9, 9
61        <NA>                          
62       blosc         9, 9         9, 9
63       blosc         9, 9         9, 9
64        <NA>                          
65       blosc 10980, 10980   1830, 1830
66       blosc 10980, 10980   1830, 1830
67       blosc 10980, 10980   1830, 1830
68       blosc 10980, 10980   1830, 1830
69       blosc        10980        10980
70       blosc        10980        10980
71       blosc   5490, 5490     915, 915
72       blosc   5490, 5490     915, 915
73       blosc   5490, 5490     915, 915
74       blosc   5490, 5490     915, 915
75       blosc   5490, 5490     915, 915
76       blosc   5490, 5490     915, 915
77       blosc         5490         5490
78       blosc         5490         5490
79       blosc   1830, 1830     305, 305
80       blosc   1830, 1830     305, 305
81       blosc   1830, 1830     305, 305
82       blosc         1830         1830
83       blosc         1830         1830
84       blosc            3            3
85       blosc 3, 10980.... 1, 1830,....
86       blosc        10980        10980
87       blosc        10980        10980
88       blosc 10980, 10980   1830, 1830
89       blosc 10980, 10980   1830, 1830
90       blosc 10980, 10980   1830, 1830
91       blosc 10980, 10980   1830, 1830
92       blosc        10980        10980
93       blosc        10980        10980
94       blosc   5490, 5490     915, 915
95       blosc   5490, 5490     915, 915
96       blosc   5490, 5490     915, 915
97       blosc   5490, 5490     915, 915
98       blosc   5490, 5490     915, 915
99       blosc   5490, 5490     915, 915
100      blosc         5490         5490
101      blosc         5490         5490
102      blosc   1830, 1830     305, 305
103      blosc   1830, 1830     305, 305
104      blosc   1830, 1830     305, 305
105      blosc         1830         1830
106      blosc         1830         1830

The same information can be retrieved with sf::gdal_utils('mdiminfo', ...).

Reading Zarr data with Rarr

Reading zarrays from a remote Zarr store with Rarr and turning them into useful spatial R objects is not as straightforward as with GDAL-based stars and terra functions. We will demonstrate the current capabilities and limitations of Rarr by using a reference object retrieved through the GDAL Zarr driver which is loaded in the background.

# GDAL-based reference object
zz_gdal
stars object with 2 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
       Min. 1st Qu. Median      Mean 3rd Qu.   Max.
b01  0.1299  0.1542 0.1628 0.3541124  0.4304 1.6556
dimension(s):
  from   to offset delta                refsys x/y
x    1 1830  3e+05    60 WGS 84 / UTM zone 32N [x]
y    1 1830  5e+06   -60 WGS 84 / UTM zone 32N [y]

We start with an naive attempt to read the entire Zarr store using Rarr::read_zarr_array.

⛔ Rarr, like stars and terra, is not capable of reading multiple arrays from the remote Zarr store simultaneously.

read_zarr_array(zarr_url)
Error: 

✅ But if a single band array is selected we are able to access the desired information.

band_variable = "/measurements/reflectance/r60m/b01"

zarray = read_zarr_array(paste0(zarr_url, band_variable))
str(zarray)
 int [1:1830, 1:1830] 9548 8117 7158 5906 7007 7523 5999 6570 7165 6666 ...

Raster geometry

The function returns a matrix (2D array) lacking any spatial metadata (e.g. Coordinate Reference System) or map coordinates. Let’s plot it together with the reference object for comparison.

image(zarray)
image(zz_gdal$b01)

Rarr

GDAL
Caution

The raw zarray has flipped X- and Y-dimensions! But we can access X and Y coordinates for each pixel separately and handle the issue by transposing them later (using t()).

# retrieve X and Y coordinates
zarray_60m_x = paste0(zarr_url, "/measurements/reflectance/r60m/x") |> 
  read_zarr_array()
zarray_60m_y = paste0(zarr_url, "/measurements/reflectance/r60m/y") |> 
  read_zarr_array()

print(range(zarray_60m_x))
[1] 300030 409770
print(range(zarray_60m_y))
[1] 4890270 5000010

Regardless of their ordering, are the coordinate values consistent with those retrieved via GDAL?

print(range(st_get_dimension_values(zz_gdal, "x")))
[1] 300030 409770
print(range(st_get_dimension_values(zz_gdal, "y")))
[1] 4890270 5000010

✅ Yes!

Reflectance values

Before building the spatial object we should also check if the actual data values are consistent between both methods.

head(t(zarray)[1,])     # flipped zarray
[1] 9548 8117 7158 5906 7007 7523
head(zz_gdal$b01[1,])   # reference arrray
[1] 0.8548 0.7117 0.6158 0.4906 0.6007 0.6523
Caution

By comapring the first few values of each array we can observe 2 important differences:

  • zarray contains integers values while zz_gdal has scaled values (float) with factor 10000.
  • after rescaling, min and max for example still differ (0.1299 vs. 0.2299 / 1.6556 vs. 1.7556), indicating an offset of 0.1.

💡 Explanation: The data provider applies scale and offset to store more memory efficient integer values. A scale factor of 10000 preserves 4 decimals, while an offset of 0.1 avoids division by 0. When reading raw data like that we need to apply scale and offset manually.

This infrormation is important metadata, but currently not accessible via Rarr. At this point we can only find it by digging deep into the metadata retrieved by GDAL:

# GDAL-comaptible VSI URL
vsi_url = paste0("ZARR:/vsicurl/", dplyr::as_label(zarr_url)) 

# get entire metadata
info = sf::gdal_utils("mdiminfo", source = vsi_url, quiet = T) |> fromJSON()

# access scale and offset for 60 meter reflectance bands
(scale = info$groups$measurements$groups$reflectance$groups$r60m$arrays$b01$scale)
[1] 1e-04
(offset = info$groups$measurements$groups$reflectance$groups$r60m$arrays$b01$offset)
[1] -0.1

Constructing a stars object from zarray components

Let’s combine all pieces of the puzzle to form a spatial object: - data array - scale and offset - X and Y coordinates - CRS

# read CRS from Zarr attributes
zarr_crs = zattrs$stac_discovery$properties$`proj:epsg`

# band name
var = basename(band_variable)

# read and transpose data matrix 
# apply scale and offset
# convert to stars
zz = st_as_stars(t(zarray) * scale + offset) |>
  st_set_dimensions(1, names = "x", values = zarray_60m_x, is_raster = TRUE) |> 
  st_set_dimensions(2, names = "y", values = zarray_60m_y, is_raster = TRUE) |> 
  setNames(var)

# assign CRS
st_crs(zz) = st_crs(zarr_crs)

# print the object
zz
stars object with 2 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
       Min. 1st Qu. Median      Mean 3rd Qu.   Max.
b01  0.1299  0.1542 0.1628 0.3541124  0.4304 1.6556
dimension(s):
  from   to offset delta                refsys point x/y
x    1 1830  3e+05    60 WGS 84 / UTM zone 32N FALSE [x]
y    1 1830  5e+06   -60 WGS 84 / UTM zone 32N FALSE [y]

We can compare the newly constructed object with reference read via GDAL. They should have identical geometry and values.

identical(zz[[1]] |> str(),
          zz_gdal[[1]] |> str())
 num [1:1830, 1:1830] 0.855 0.753 0.7 0.756 0.718 ...
 num [1:1830, 1:1830] 0.855 0.753 0.7 0.756 0.718 ...
[1] TRUE

Looks good! Let’s visualize both objects.

# plot as maps
plot(zz, axes = TRUE)

Rarr
plot(zz_gdal, axes = TRUE)

GDAL

The identical maps show Sentinel-2 band 1 (coastal aerosol band) reflectance at 60m spatial resolution for north-western Italy with the Alps in the West and the city of Torino in the North-East.

Wrapping it all up in a function

We can enclose these steps in a single function with some flexibility for variable names and resolutions. This allows us to retrieve any band at any available resolution from the remote Zarr store. Since scale and offset are not accessible via Rarr, we need to provide them as function arguments.

st_read_zarray = function(path, var, res, scale = 1, offset = 0, ...){
  
  # room for checks and input tests:
  # stopifnot valid zarr url / variable / resolution ...

  # get metadata including CRS
  zattrs = jsonlite::read_json(file.path(path, ".zattrs"))
  zarr_crs = zattrs$stac_discovery$properties$`proj:epsg`
  
  # zattrs$stac_discovery$properties$`eopf:resolutions`
  res_char = switch(as.character(res),
    "10" = "r10m",
    "20" = "r20m", 
    "60" = "r60m",
    stop("Resolution must be one of 10, 20, or 60.")
  )

  # ...: make use of index arguments of read_zarr_array if desired
  zarray = file.path(path, "measurements/reflectance", res_char, var) |> 
    read_zarr_array(...)
  zarray_x = file.path(path, "measurements/reflectance", res_char, "x") |> 
    read_zarr_array(...)
  zarray_y = file.path(path, "measurements/reflectance", res_char, "y") |> 
    read_zarr_array(...)
    
  # transpose matrix before converting to stars
  z = st_as_stars(t(zarray) * scale + offset) |>
  st_set_dimensions(1, names = "x", values = zarray_60m_x, is_raster = TRUE) |> 
  st_set_dimensions(2, names = "y", values = zarray_60m_y, is_raster = TRUE) |> 
    setNames(var)
  st_crs(z) = st_crs(zarr_crs)
  return(z)
}

Read and combine multiple zarrays

✅ Now we can read multiple bands, combine them to a single multi-band object, and visualize them in a static map…

system.time({
  b01 = st_read_zarray(zarr_url, "b01", 60, scale=0.0001, offset = 0.1)
  b09 = st_read_zarray(zarr_url, "b09", 60, scale=0.0001, offset = 0.1)
  multi_band = c(b01, b09) |> merge(name="band") |> setNames("reflectance")
})
   user  system elapsed 
  4.042   0.066  20.762 
plot(multi_band, axes = TRUE)
downsample set to 1

… or by generating an interactive visualization:

mapview(multi_band)

Benchmark

Using Rarr to read remote Zarr stores is just one of several options. R packages stars and terra provide functionality, that uses GDAl’s Zarr driver in the background. To see if there is a benefit to each of the methods, we can compare their execution time and memory allocation with a little benchmark.

# GDAL-compatible VSI URL
vsi_prefix = "ZARR:/vsicurl/"
vsi_url = paste0(vsi_prefix, dplyr::as_label(zarr_url)) 

# functions for reading a single Sentinel-2 band to memory
f_stars = function(){rs = read_stars(paste(vsi_url, band_variable, sep = ":"))}
f_mdim = function(){sm = read_mdim(vsi_url, band_variable, proxy = FALSE) |> 
  setNames(basename(band_variable))}
f_rast = function(){tr = terra::rast(vsi_url, band_variable) |> terra::toMemory()}
f_rarr = function(){rr = st_read_zarray(zarr_url, "b01", 60)}
bench::mark(f_stars(), f_mdim(), f_rast(), f_rarr(), 
  check = F, iterations = 5)
# A tibble: 4 × 6
  expression      min   median `itr/sec` mem_alloc `gc/sec`
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
1 f_stars()      8.2s    8.38s    0.120     25.6MB   0     
2 f_mdim()      7.64s    7.79s    0.124     26.3MB   0.0247
3 f_rast()      8.26s    8.47s    0.119     83.2KB   0     
4 f_rarr()      11.1s   11.54s    0.0869   148.5MB   0.0869

The benchmark does not reveal a clear winner in terms of speed. However, Rarr seems not to be faster than GDAL but allocates more memory compared to stars and terra.

💪 Now it is your turn

  • 🔭 Task: Supply more arguments to st_read_zarray and pass a subset to read only parts on an array. Visit ?Rarr::read_zarr_array for more details.
  • 💾 Task: Try writing zarrays to disk using Rarr::write_zarr_array().
  • 📖 Read the Rarr documentation.

Conclusion

In this notebook we have demonstrated how to access remote Zarr stores in R using the Rarr package. We have seen that it is possible to read specific data arrays, but read_zarr_array lacks the ability of accessing anything other than the raw array. To combine it with a CRS or coordinates we need to leverage a spatial class like stars and manually set the relevant metadata. Further, the benchmark did not reveal a significant benefit regarding processing time or memory allocation over “traditional” GDAL-based data retrieval.

What’s next?

Further tests should evaluate wether data retrieved via Rarr versus GDAL are identical in terms of spatial extent and offset.

Explore this related notebook which demonstrates how to access the same Zarr data using the GDAL Zarr driver via the stars and terra packages.