This repository was archived by the owner on May 28, 2024. It is now read-only.

Add static river/catchment attribute data #165

Merged: lekoenig merged 9 commits on Oct 18, 2022


lekoenig
Collaborator

This PR adds new river/catchment attributes to the file 1_fetch/in/target_sciencebase_attributes.csv so that they get downloaded, processed, and added to a formatted table of the static input features we're interested in (p2_seg_attr_data).
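To illustrate how a row-per-attribute spec file can drive downloads, here is a minimal Python sketch that groups requested attribute names by ScienceBase item URL so each item is fetched once. It assumes the five-column layout of the two rows shown later in this PR's diff (group, attribute name, description, units, URL); SAMPLE_ROWS and group_attributes_by_item() are hypothetical and are not the pipeline's actual code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the five-column layout of the rows shown in this PR's
# diff: group, attribute name, description, units, ScienceBase item URL.
# Descriptions are shortened here; the real file may also include a header row.
SAMPLE_ROWS = """\
HYDROLOGY_DAMS,NID_STORAGEYYYY,"Maximum dam storage, acre-feet",acre-feet,https://www.sciencebase.gov/catalog/item/58c301f2e4b0f37a93ed915a
HYDROLOGY_DAMS,NORM_STORAGEYYYY,"Normal dam storage, acre-feet",acre-feet,https://www.sciencebase.gov/catalog/item/58c301f2e4b0f37a93ed915a
"""


def group_attributes_by_item(csv_text):
    """Map each ScienceBase item URL to the attribute names requested from it,
    so that each item only has to be downloaded once."""
    items = defaultdict(list)
    for _group, name, _description, _units, url in csv.reader(io.StringIO(csv_text)):
        items[url].append(name)
    return dict(items)


print(group_attributes_by_item(SAMPLE_ROWS))
```

Because both sample rows point at the same ScienceBase item, they collapse into a single download entry listing two attribute names.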

The changes in this PR include the following steps:

  • Changes to 1_fetch/in/target_sciencebase_attributes.csv to specify the new attributes to collect. I chose these based on my own hypotheses about which features might be important, but I expect we'll ultimately use only some of them. Please also feel free to suggest any other features that you think should be included ⭐
  • Processing the downloaded attributes. Most of the datasets are minimally processed since they are already referenced to the NHDPlusv2 network. However, we do further process the land cover data by aggregating NLCD class codes (e.g., 90 and 91) into reclassified groups (e.g., "wetland") based on the lookup table in 1_fetch/in/nlcd_landcover_reclassification.csv. This file is new and is committed here so that others will have it when cloning the repo.
  • Combining all of the individual attribute data frames into a single table in p2_seg_attr_data using the function combine_nhdv2_attr().
  • Summarizing the attribute values. As a quick QA/QC check, the pipeline now includes a target in 3_visualize that outputs a csv file containing summary statistics (min/mean/max) for all of the processed attributes. I created a new function, summarize_static_attributes(), to create and save that table.
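The reclassify, combine, and summarize steps above can be sketched roughly as follows. This is a hedged Python illustration, not the implementation of combine_nhdv2_attr() or summarize_static_attributes(): the lookup pairs, segment IDs, and values are made up, and reclassify_landcover(), combine_attr(), and summarize_attr() are hypothetical names.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical subset of an NLCD-code -> group lookup. The real mapping is in
# 1_fetch/in/nlcd_landcover_reclassification.csv; these pairs are illustrative.
NLCD_RECLASS = {11: "water", 21: "developed", 22: "developed",
                41: "forest", 90: "wetland", 91: "wetland"}


def reclassify_landcover(pct_by_code, lookup):
    """Sum percent cover across NLCD codes that map to the same group.
    Every group starts at 0.0 so segments lacking a class still get a value.
    Codes missing from the lookup raise KeyError."""
    out = {group: 0.0 for group in lookup.values()}
    for code, pct in pct_by_code.items():
        out[lookup[code]] += pct
    return out


def combine_attr(*tables):
    """Join per-segment tables (seg_id -> {attribute: value}) into one wide table."""
    combined = defaultdict(dict)
    for table in tables:
        for seg_id, attrs in table.items():
            combined[seg_id].update(attrs)
    return dict(combined)


def summarize_attr(combined):
    """Return (min, mean, max) for each attribute across all segments."""
    values = defaultdict(list)
    for attrs in combined.values():
        for name, value in attrs.items():
            values[name].append(value)
    return {name: (min(v), mean(v), max(v)) for name, v in values.items()}


# Made-up percent-cover and dam-storage values for two segments.
landcover = {
    "seg1": reclassify_landcover({21: 10.0, 41: 60.0, 90: 20.0, 91: 10.0}, NLCD_RECLASS),
    "seg2": reclassify_landcover({11: 5.0, 22: 45.0, 41: 50.0}, NLCD_RECLASS),
}
dams = {"seg1": {"NID_STORAGE2013": 1200.0}, "seg2": {"NID_STORAGE2013": 0.0}}
summary = summarize_attr(combine_attr(landcover, dams))
print(summary["wetland"])
```

Initializing every reclassified group to 0.0 matters for the summary step: otherwise a segment with no wetland cover would simply be skipped, and the minimum and mean would be biased upward.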

This PR is just focused on downloading and formatting the attribute data. We'll use some method of feature selection to finalize the input variables to the LSTM models in #164.

Closes #51

@lekoenig lekoenig requested a review from galengorski October 14, 2022 21:27
@galengorski
Collaborator

Hey @lekoenig, this looks great to me. The only other attribute I had thought of is the ratio of reservoir storage to mean flow at each site. I have seen this used as a way to capture the degree of hydrologic alteration. I'm not sure it makes sense for our sites, since so many are on the same rivers and will have very similar values. Otherwise, I think it looks great.
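As an illustration of the suggested metric, here is a minimal Python sketch of a storage-to-mean-flow ratio expressed as the fraction of one year's mean-flow volume that upstream reservoirs could hold. It assumes storage in acre-feet (as in NID_STORAGEYYYY / NORM_STORAGEYYYY) and mean flow in cubic feet per second from some separate source; the function name, the annual-volume formulation, and the 365.25-day year are assumptions, not something this PR computes.

```python
# 1 acre-foot = 43,560 cubic feet (exact); the 365.25-day year is an assumption.
CUBIC_FEET_PER_ACRE_FOOT = 43_560.0
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def storage_to_flow_ratio(storage_acre_ft, mean_flow_cfs):
    """Hypothetical helper: reservoir storage divided by one year of mean-flow volume.

    A value of 0.5 would mean upstream reservoirs can hold about half a year
    of average flow. Inputs: storage in acre-feet and mean flow in ft^3/s."""
    if mean_flow_cfs <= 0:
        raise ValueError("mean flow must be positive")
    storage_ft3 = storage_acre_ft * CUBIC_FEET_PER_ACRE_FOOT
    annual_flow_ft3 = mean_flow_cfs * SECONDS_PER_YEAR
    return storage_ft3 / annual_flow_ft3


print(round(storage_to_flow_ratio(1000.0, 10.0), 4))
```

Normalizing by flow makes the metric dimensionless, so a small reservoir on a small stream and a large reservoir on a large river can register similar levels of alteration.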

Comment on lines +14 to +15
HYDROLOGY_DAMS,NID_STORAGEYYYY,"The maximum dam storage (in acre-feet) defined as the total storage space in all reservoirs in a flowline catchment below the maximum attainable water surface elevation, including any surcharge storage, of dams built on or before YYYY. Value is based on dams built on or before YYYY, where YYYY is the last year for the decade of record (for example 1960 spans 1951 - 1960). The exceptions are 1930 and 2013 (1930 and before and 2010 to 2013 respectively).",acre-feet,https://www.sciencebase.gov/catalog/item/58c301f2e4b0f37a93ed915a
HYDROLOGY_DAMS,NORM_STORAGEYYYY,"The normal dam storage (in acre-feet) defined as the total storage space in all reservoirs in a flowline catchment that is below the normal retention level, including dead and inactive storage and excluding any flood control or surcharge storage. Value is based on dams built on or before YYYY, where YYYY is the last year for the decade of record (for example 1960 spans 1951 - 1960). The exceptions are 1930 and 2013 (1930 and before and 2010 to 2013 respectively).",acre-feet,https://www.sciencebase.gov/catalog/item/58c301f2e4b0f37a93ed915a
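Since these rows use a YYYY placeholder, one reasonable approach is to expand each templated name into concrete per-year columns and pick the snapshot that applies to a given year. The sketch below is a hedged Python example: the YEARS list is a hypothetical subset (the real years should come from the dataset metadata), and expand_years() and column_as_of() are illustrative names, not pipeline functions.

```python
def expand_years(attr_name, years, placeholder="YYYY"):
    """Expand a templated attribute name into one concrete column name per year."""
    if placeholder not in attr_name:
        return [attr_name]
    return [attr_name.replace(placeholder, str(year)) for year in years]


def column_as_of(attr_name, years, as_of_year, placeholder="YYYY"):
    """Pick the column for the latest year <= as_of_year. Because each column
    counts dams built on or before its year, this is the closest snapshot that
    does not include dams built after as_of_year."""
    eligible = [year for year in years if year <= as_of_year]
    if not eligible:
        raise ValueError(f"no {attr_name} column on or before {as_of_year}")
    return attr_name.replace(placeholder, str(max(eligible)))


# Hypothetical subset of years; the real list should come from the dataset metadata.
YEARS = [1930, 1960, 1990, 2013]
print(expand_years("NORM_STORAGEYYYY", YEARS))
print(column_as_of("NID_STORAGEYYYY", YEARS, 1995))
```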
Collaborator Author

lekoenig, Oct 18, 2022


> The only other attribute that I had thought of is a calculation of the amount of reservoir storage/mean flow at each site. I have seen this as a way to capture the degree of hydrologic alteration.

Thanks, @galengorski - great suggestion! After our conversation last week, I added these two variables from the larger Wieczorek dataset (NID_STORAGE and NORM_STORAGE). Are these the variables you were referring to? If not, do you have a link to alternative datasets that address reservoir storage?

Collaborator


Yeah, I think that sounds great!

@lekoenig lekoenig merged commit 6f336ff into USGS-R:main Oct 18, 2022
@lekoenig lekoenig deleted the 51-add-static-features branch October 18, 2022 14:33
Linked issue: Gather additional static input variables for baseline LSTM v2