-
Notifications
You must be signed in to change notification settings - Fork 4
198 add val data to training revised #208
198 add val data to training revised #208
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @jsadler2! I have a couple pretty minor questions just to confirm some of the assignments. I feel comfortable merging this PR whenever you are ready.
@@ -1,2 +1,2 @@ | |||
n_unique_siteids,n_unique_latlon,n_all_WQP_sites,n_all_nwis_daily_sites,n_all_nwis_inst_sites,n_unique_nwis_sites,n_sites_intersect_WQPNWIS,n_obsdays_total,n_obsdays_nwis,n_obsdays_discrete | |||
1987,1936,1980,24,10,27,20,92403,56868,35535 | |||
1987,1936,1980,24,10,27,20,91961,56426,35535 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1,2 +1 @@ | |||
tar_name,hash | |||
p2_wqp_data_subset,56f5edeb7d42b18b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -212,6 +211,7 @@ p2a_targets_list <- list( | |||
# we need these to make the prepped data file, so force a dependency of this | |||
# target on p2a_well_obs_data. | |||
p2a_well_obs_data | |||
p2a_well_obs_data_zarr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to include, although I'm curious why you added p2a_well_obs_data_zarr
in addition to the dependency on p2a_well_obs_data
. Was there a change that didn't get picked up by p2a_well_obs_data
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why, but the Zarr file was, for some reason, not being rebuilt as expected before I added it here.
I'm 94% sure this is reflecting a lack of understanding on my part, but I'm thinking it's not worth the time at this point to try to really figure out this behavior.
@@ -124,8 +120,7 @@ rule make_predictions: | |||
def filter_predictions(all_preds_file, partition, out_file): | |||
df_preds = pd.read_feather(all_preds_file) | |||
all_sites = df_preds.site_id.unique() | |||
trn_sites = all_sites[(~np.isin(all_sites, config["validation_sites"])) & | |||
(~np.isin(all_sites, config["test_sites"]))] | |||
trn_sites = all_sites[~np.isin(all_sites, config["validation_sites"])] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind confirming what this line in the .smk
file is doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsadler2 confirmed that the ~
refers to the inverse so what this line is doing is subsetting all_sites
for the sites that are not within the validation_sites
vector from the config file.
- "01475530" | ||
- "01475548" | ||
train_start_date: '1980-10-01' | ||
train_end_date: '2015-10-01' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like from p2a_site_splits
that records on 2015-10-01
are treated as validation data, is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
** This PR supercedes #202. Updated version of that after our discussion on our 3/1 meeting. **
I moved the old validation data into the training set and I moved the old "test" set into the validation set.
This also starts the training on the water year (#204) and excludes any provisional data (#135).
closes #198
closes #204
closes #159
closes #166