An Exception was encountered at ‘In [11]’.
ROF global monthly, annual, seasonal flows analysis #
Use the following datasets
reach-D19 gauge link ascii
D19 flow site geopackage
D19 discharge netCDF
history netCDF including river discharge
Read monthly history files from archive.
Reference data: monthly discharge estimates at 922 big river mouths from Dai et al. 2019 data (D19)
Plot time series (annual, seasonal cycle).
If D19 reference data is available, scatter plots at 24 selected large rivers against the D19 reference data.
Annual flow summary table at 50 selected large rivers.
If D19 reference data is available, scatter plot of annual flow against the D19 reference data.
Run only if reference flow is available:
Error statistics (%bias, RMSE, correlation) at all 922 river sites.
Plot error statistics on the global map.
Plot boxplots including case(s) for each error metric.
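The three error statistics named above can be sketched as follows. This is a minimal illustration, assuming the common definitions (%bias as the summed error relative to summed observed flow, RMSE as root-mean-square error, correlation as Pearson's r). The function name `error_metrics` and the sample flows are hypothetical, and the notebook's own implementation may differ.

```python
import numpy as np


def error_metrics(sim, obs):
    """Return (%bias, RMSE, Pearson r) for paired simulated/observed flows.

    Hypothetical helper: assumes the common textbook definitions.
    """
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Keep only time steps where both series have valid data.
    valid = np.isfinite(sim) & np.isfinite(obs)
    sim, obs = sim[valid], obs[valid]

    pbias = 100.0 * (sim - obs).sum() / obs.sum()  # percent bias
    rmse = np.sqrt(np.mean((sim - obs) ** 2))  # root-mean-square error
    corr = np.corrcoef(sim, obs)[0, 1]  # Pearson correlation coefficient
    return pbias, rmse, corr


# Made-up monthly flows (m3/s) at a single site, for illustration only.
obs = [100.0, 200.0, 300.0, 400.0]
sim = [110.0, 190.0, 330.0, 420.0]
pbias, rmse, corr = error_metrics(sim, obs)
print(f"%bias={pbias:.1f}, RMSE={rmse:.2f}, r={corr:.3f}")
```

Masking non-finite values first matters here because gauge records often have gaps; without it a single missing month would turn all three statistics into NaN.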
The Python version: 3.11.4
xarray 2025.4.0
pandas 2.2.3
geopandas 1.0.1
ERROR 1: PROJ: proj_create_from_database: Open of /glade/work/hannay/miniconda3/envs/cupid-analysis/share/proj failed
1. Setup #
# Parameter Defaults
# parameters are set in CUPiD's config.yml file
# when running interactively, manually set the parameters below
# global parameters
CESM_output_dir = "" # e.g., "/glade/campaign/cesm/development/cross-wg/diagnostic_framework/CESM_output_for_testing"
case_name = None # case name: e.g., "b.e30_beta02.BLT1850.ne30_t232.104"
base_case_name = None # base case name: e.g., "b.e23_alpha17f.BLT1850.ne30_t232.092"
start_date = "" # simulation starting date: e.g., "0001-01-01"
end_date = "" # simulation ending date: "0100-01-01"
base_start_date = "" # base simulation starting date: "0001-01-01"
base_end_date = "" # base simulation ending date: "0100-01-01"
serial = True  # if False, use dask LocalCluster
lc_kwargs = {}
# rof parameters
analysis_name = "" # Used for Figure png names
climo_nyears = 10 # number of years to compute the climatology
grid_name = "f09_f09_mosart" # ROF grid name used in case
base_grid_name = (
    grid_name  # specify ROF grid name for base_case in config.yml if different than case
)
figureSave = False
# Parameters
case_name = "b.e30_beta06.B1850C_LTso.ne30_t232_wgx3.192"
base_case_name = "b.e30_beta06.B1850C_LTso.ne30_t232_wgx3.188"
CESM_output_dir = "/glade/derecho/scratch/hannay/archive"
base_case_output_dir = "/glade/derecho/scratch/gmarques/archive"
start_date = "0002-01-01"
end_date = "0021-12-01"
base_start_date = "0002-01-01"
base_end_date = "0021-12-01"
obs_data_dir = (
    "/glade/campaign/cesm/development/cross-wg/diagnostic_framework/CUPiD_obs_data"
)
ts_dir = None
lc_kwargs = {"threads_per_worker": 1}
serial = True
analysis_name = ""
grid_name = "f09_f09_mosart"
climo_nyears = 10
figureSave = False
subset_kwargs = {}
product = "/glade/work/hannay/CUPiD/examples/key_metrics/computed_notebooks//rof/global_discharge_gauge_compare_obs.ipynb"
# ROF additional setup
setup = load_yaml("./setup/setup.yaml")
ancillary_dir = setup[
    "ancillary_dir"
]  # ancillary directory including ROF domain, river network data etc.
ref_flow_dir = setup["ref_flow_dir"] # including observed or reference flow data
case_meta = setup["case_meta"] # Case metadata
reach_gpkg = setup["reach_gpkg"] # river reach geopackage meta
if not analysis_name:  # fall back to the case name when no analysis name is given
    analysis_name = case_name
if not base_grid_name:  # fall back to the case grid when no base grid is given
    base_grid_name = grid_name
case_dic = {
    case_name: {
        "grid": grid_name,
        "sim_period": slice(f"{start_date}", f"{end_date}"),
        "climo_nyrs": min(climo_nyears, int(end_date[:4]) - int(start_date[:4]) + 1),
    },
    base_case_name: {
        "grid": base_grid_name,  # the base case may use a different ROF grid
        "sim_period": slice(f"{base_start_date}", f"{base_end_date}"),
        "climo_nyrs": min(
            climo_nyears, int(base_end_date[:4]) - int(base_start_date[:4]) + 1
        ),
    },
}
Dask#
2. Loading data #
2.1. Monthly/annual flow netCDFs#
month_data (xr dataset) - read from archive
year_data (xr dataset) - computed from monthly
seas_data (xr dataset) - computed from monthly
Finished loading b.e30_beta06.B1850C_LTso.ne30_t232_wgx3.192
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
File <timed exec>:28
File /glade/work/hannay/miniconda3/envs/cupid-analysis/lib/python3.11/site-packages/xarray/backends/api.py:1597, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
1594 paths = _find_absolute_paths(paths, engine=engine, **kwargs)
1596 if not paths:
-> 1597 raise OSError("no files to open")
1599 paths1d: list[str | ReadBuffer]
1600 if combine == "nested":
OSError: no files to open
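The traceback above shows `open_mfdataset` raising `OSError: no files to open` after the first case finished loading, which means the file search for the next case matched nothing. One possible cause, not confirmed by the output shown, is that the base case is archived under `base_case_output_dir` rather than `CESM_output_dir`. A minimal sketch of checking for files before opening them follows. The helper name `find_history_files`, the `<output_dir>/<case>/rof/hist` layout, and the `*.h0*.nc` pattern are assumptions for illustration, not the notebook's actual code.

```python
import glob
import os


def find_history_files(output_dir, case, component="rof", pattern="*.h0*.nc"):
    """Return sorted history files for a case, or [] if none are found.

    Hypothetical helper: directory layout and file pattern are assumptions.
    """
    search = os.path.join(output_dir, case, component, "hist", pattern)
    return sorted(glob.glob(search))


# Hypothetical mapping of case name -> archive directory.
cases = {"case_a": "/path/that/does/not/exist"}

loaded, skipped = [], []
for case, out_dir in cases.items():
    files = find_history_files(out_dir, case)
    if not files:
        # Report the problem and continue instead of failing on an empty list.
        print(f"no history files found for {case} under {out_dir}")
        skipped.append(case)
        continue
    # The non-empty list could be passed to xr.open_mfdataset(files, ...) here.
    loaded.append(case)
```

Checking the list first gives a message that names the case and directory, which is easier to act on than the bare `OSError`.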
2.2 Large river ID and name ascii#
big_river_50: dictionary {site_id:river name}
big_river_24: dictionary {site_id:river name}
2.3. reach-D19 gauge link csv#
gauge_reach_lnk (dataframe)
2.4 D19 flow site shapefile#
gauge_shp (dataframe)
CPU times: user 12.6 ms, sys: 10.7 ms, total: 23.3 ms
Wall time: 575 ms
2.5 D19 discharge data#
ds_q_obs_mon (xr datasets)
ds_q_obs_yr (xr datasets)
dr_q_obs_seasonal (xr datasets)
CPU times: user 141 ms, sys: 11.5 ms, total: 153 ms
Wall time: 262 ms
<timed exec>:6: DeprecationWarning: cftime_range() is deprecated, please use xarray.date_range(..., use_cftime=True) instead.
2.6 Get indices in observation and simulation for gauge name (processing)#
gauge_plot (dictionary)
Execution using papermill encountered an exception here and stopped:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[11], line 15
8 gauge_id = ds_q.id.values[gauge_ix][0] ## guage ID
9 seg_id = (
10 gauge_reach_lnk[case]
11 .loc[gauge_reach_lnk[case]["gauge_id"] == gauge_id]["route_id"]
12 .values
13 ) # matching reach ID in river network
14 seg_ix = np.argwhere(
---> 15 reachID[case] == seg_id
16 ) # matching reach index in river network
17 if len(seg_ix) == 0:
18 seg_ix = -999
KeyError: 'b.e30_beta06.B1850C_LTso.ne30_t232_wgx3.188'
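The `KeyError` above is raised because `reachID` has no entry for the base case, which is consistent with the earlier load failure for that case. A minimal sketch of a gauge-to-reach lookup that skips cases whose river network was never loaded is given below. The dictionaries are hypothetical stand-ins for `reachID` and `gauge_reach_lnk`, and the `-999` sentinel follows the cell shown in the traceback.

```python
import numpy as np

# Hypothetical stand-in: reach IDs exist only for the case that loaded.
reachID = {"case_new": np.array([101, 102, 103])}
# Hypothetical stand-in for the gauge -> reach link table, per case.
gauge_to_reach = {
    "case_new": {9001: 102, 9002: 999},
    "case_base": {9001: 102},
}

seg_index = {}
for case, links in gauge_to_reach.items():
    if case not in reachID:
        # The river network for this case was not loaded; skip it.
        print(f"skipping {case}: no river network data was loaded")
        continue
    seg_index[case] = {}
    for gauge_id, seg_id in links.items():
        seg_ix = np.argwhere(reachID[case] == seg_id)  # shape (n_matches, 1)
        # Use -999 as the "no matching reach" sentinel.
        seg_index[case][gauge_id] = int(seg_ix[0, 0]) if len(seg_ix) else -999

print(seg_index)
```

Skipping a missing case lets the remaining case be analysed, at the cost of silently dropping the comparison, so the printed message should not be suppressed.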
3. Analysis for 24 large rivers #
3.1 Annual flow series#
3.2. Annual cycle at monthly step#
3.3. Scatter plots of monthly flow - obs vs sim#
3.4. Scatter plots of annual flow#
4. Analysis for 50 large rivers #
4.1 Summary tables#
4.2. Scatter plot of annual mean flow#
5. Analysis for all 922 sites #
5.1 Compute metrics at all the sites (no plots nor tables)#
5.2. Spatial metric map#
5.4 Boxplots of Error metrics (RMSE, %bias, and correlation coefficient)#
Boxplot distribution is based on metrics sampled at 922 sites.
The box extends from the first quartile (Q1) to the third quartile (Q3) of the data, with a line at the median (Q2). The whiskers extend from the edges of the box to show the range of the data. By default, they extend no more than 1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that interval. Outliers are plotted as separate dots.
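The box and whisker rule described above can be written out numerically. The sketch below assumes quartiles computed with linear interpolation (`np.percentile` defaults) and the 1.5 * IQR whisker rule. The function name `box_stats` and the sample values are hypothetical.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Return quartiles, whisker ends and outliers for a 1-D sample."""
    x = np.asarray(values, dtype=float)
    x = x[np.isfinite(x)]  # drop missing metric values
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= low_fence) & (x <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the farthest data points inside the fences.
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        # Points beyond the fences are drawn as separate dots.
        "outliers": x[(x < low_fence) | (x > high_fence)],
    }


# Made-up metric values with one extreme site.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(stats)
```

For this sample the box spans 3.25 to 7.75, the whiskers end at 1 and 9, and the value 50 is the only outlier.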