Use provisional NWIS data? #135
There are 304 unique days in our dataset with DO records marked as "provisional":
01473500: 55 days (10/7/2021 - 11/30/2021)
As of 11/2/2022, the time series underlying 64 of those records have been approved, but 240 are still listed as "provisional." Since nearly 12 months have passed since most of those records were collected, I've contacted the PA WSC to request more information.
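For reference, here is a minimal base-R sketch of how a tally like the one above (provisional days per site) could be computed. It assumes a daily data frame shaped like `p1_daily_data`, with `site_no`, `Date`, and a character qualifier column `Value_cd`. It also treats any code beginning with "P" as provisional. The helper name `count_provisional_days` and the toy values are hypothetical.

```r
# Count unique days per site that carry a provisional qualifier.
# Assumes columns site_no, Date, and a character qualifier column.
count_provisional_days <- function(df, cd_col = "Value_cd") {
  cd <- df[[cd_col]]
  # Assumption: "P", "P ***", "P Ssn", etc. all start with "P"
  prov <- !is.na(cd) & startsWith(cd, "P")
  prov_days <- unique(df[prov, c("site_no", "Date")])
  if (nrow(prov_days) == 0) {
    return(data.frame(site_no = character(0), n_days = integer(0)))
  }
  counts <- table(prov_days$site_no)
  data.frame(site_no = names(counts), n_days = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy data (made up): one site has a duplicated provisional day
toy <- data.frame(
  site_no  = c("01473500", "01473500", "01473500", "01475530"),
  Date     = as.Date(c("2021-10-07", "2021-10-07", "2021-10-08", "2021-10-07")),
  Value_cd = c("P", "P", "A", "A"),
  stringsAsFactors = FALSE
)
count_provisional_days(toy)
```

Summing `n_days` gives the total number of provisional site-days.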
I've been checking in on these data periodically; here's an update as of 3/1/2023:
01473500: 55 days (10/7/2021 - 11/30/2021)
library(dplyr)
# 1) 01473500
dat_01473500 <- dataRetrieval::readNWISuv(siteNumber = "01473500", parameterCd = "00300", startDate = "2021-10-07", endDate = "2021-11-30")
unique(dat_01473500$X_00300_00000_cd)
#> [1] "A" "P"
dat_01473500 %>% group_by(X_00300_00000_cd) %>% summarize(n = n())
#> # A tibble: 2 x 2
#> X_00300_00000_cd n
#> <chr> <int>
#> 1 A 44
#> 2 P 5232
# 2) 01475530
dat_01475530 <- dataRetrieval::readNWISuv(siteNumber = "01475530", parameterCd = "00300", startDate = "2021-09-05", endDate = "2021-12-05")
unique(dat_01475530$X_00300_00000_cd)
#> [1] "A"
# 3) 01475548
dat_01475548 <- dataRetrieval::readNWISuv(siteNumber = "01475548", parameterCd = "00300", startDate = "2021-08-25", endDate = "2021-12-31")
unique(dat_01475548$X_00300_00000_cd)
#> [1] "A"
# 4) 01480617
dat_01480617 <- dataRetrieval::readNWISuv(siteNumber = "01480617", parameterCd = "00300", startDate = "2021-10-05", endDate = "2021-12-05")
unique(dat_01480617$X_00300_00000_cd)
#> [1] "A"
# 5) 01481500
dat_01481500 <- dataRetrieval::readNWISuv(siteNumber = "01481500", parameterCd = "00300", startDate = "2021-12-05", endDate = "2021-12-31")
unique(dat_01481500$X_00300_00000_cd)
#> [1] "A"
So it looks like 01473500 is the only site whose data still haven't been approved.
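The qualifier strings seen in these pulls combine an approval flag with optional extra codes (e.g. "A e", "P ***", "P Ssn"). Matching on an exact list such as `c("P", "P ***", "P Ssn")` would silently miss any other combination. Below is a small sketch that classifies codes by their first token instead. It assumes the first space-separated token is "A" (approved) or "P" (provisional). The helper name `approval_status` is hypothetical.

```r
# Classify NWIS qualifier strings by their approval token.
# Assumption: the first space-separated token is "A" (approved) or
# "P" (provisional); remaining tokens are extra qualifiers.
approval_status <- function(cd) {
  first <- sub(" .*$", "", cd)  # keep text before the first space
  out <- ifelse(first == "A", "approved",
                ifelse(first == "P", "provisional", "other"))
  out[is.na(cd)] <- NA_character_
  out
}

codes <- c("A", "A e", "P", "P ***", "P Ssn", NA)
approval_status(codes)
```

Any code that falls into "other" is probably worth a closer look rather than being lumped in with either group.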
Inventory of provisional data after re-pulling the data on 3/1/2023 (using …):
library(tidyverse)
targets::tar_load(p1_daily_data)
p1_daily_data %>% group_by(Value_cd) %>% summarize(n = n())
#> # A tibble: 4 x 2
#> Value_cd n
#> <chr> <int>
#> 1 A 55552
#> 2 A e 1
#> 3 P 55
#> 4 NA 146
# Value_Max_cd and Value_Min_cd give the same results, so only Value_cd (daily mean) is shown
filter(p1_daily_data, Value_cd == "P") %>% group_by(site_no) %>% summarize(n = n())
#> # A tibble: 1 x 2
#> site_no n
#> <chr> <int>
#> 1 01473500 55
# now check instantaneous data
targets::tar_load(p1_inst_data)
p1_inst_data %>% group_by(Value_Inst_cd) %>% summarize(n = n())
#> # A tibble: 6 x 2
#> Value_Inst_cd n
#> <chr> <int>
#> 1 A 1863536
#> 2 A e 1
#> 3 P 5271
#> 4 P *** 9
#> 5 P Ssn 2
#> 6 NA 2
# 01473500 is the biggest source of provisional codes
filter(p1_inst_data, Value_Inst_cd %in% c("P", "P ***", "P Ssn")) %>%
  group_by(site_no, Value_Inst_cd) %>%
  summarize(n = n(), .groups = 'drop')
#> # A tibble: 5 x 3
#> site_no Value_Inst_cd n
#> <chr> <chr> <int>
#> 1 01473500 P 5270
#> 2 01473500 P Ssn 1
#> 3 01475548 P *** 9
#> 4 01481000 P 1
#> 5 01481000 P Ssn 1
filter(p1_inst_data, site_no == "01473500", Value_Inst_cd %in% c("P", "P ***", "P Ssn")) %>%
  mutate(date = as.Date(dateTime)) %>%
  pull(date) %>%
  range()
#> [1] "2021-10-07" "2021-12-01"
# How many observation-days do we have now across mean/min/max (in other words, how
# many days have at least one non-NA value for mean-DO, max-DO, or min-DO)?
targets::tar_load(p2_daily_combined)
dim(p2_daily_combined)
#> [1] 56830 10
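`dim()` counts observation-days only if `p2_daily_combined` already excludes rows where mean, min, and max DO are all missing. Here is a sketch that applies the "at least one non-NA value" rule explicitly. The column names `do_mean`, `do_min`, and `do_max` are hypothetical placeholders, not the actual columns in `p2_daily_combined`.

```r
# Count observation-days: rows (site-days) where at least one of the
# mean, min, or max DO columns is non-NA.
# Column names are hypothetical; adjust to match p2_daily_combined.
count_observation_days <- function(df, value_cols = c("do_mean", "do_min", "do_max")) {
  has_value <- rowSums(!is.na(df[, value_cols, drop = FALSE])) > 0
  sum(has_value)
}

# Toy data (made up): the middle day has no DO values at all
toy <- data.frame(
  site_no = c("01473500", "01473500", "01473500"),
  Date    = as.Date(c("2021-10-01", "2021-10-02", "2021-10-03")),
  do_mean = c(8.1, NA, NA),
  do_min  = c(7.5, NA, 6.9),
  do_max  = c(8.8, NA, NA)
)
count_observation_days(toy)
```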
Inventory of provisional data after re-pulling the data on 3/1/2023 (updating the NWIS pull date to match the proposed validation time period):
library(tidyverse)
targets::tar_load(p1_daily_data)
p1_daily_data %>% group_by(Value_cd) %>% summarize(n = n())
#> # A tibble: 5 x 2
#> Value_cd n
#> <chr> <int>
#> 1 A 55943
#> 2 A e 1
#> 3 P 1005
#> 4 P *** 2
#> 5 NA 146
# Value_Max_cd and Value_Min_cd give the same results, so only Value_cd (daily mean) is shown
filter(p1_daily_data, Value_cd == "P") %>% group_by(site_no) %>% summarize(n = n())
#> # A tibble: 6 x 2
#> site_no n
#> <chr> <int>
#> 1 01473500 270
#> 2 01475530 204
#> 3 01480617 58
#> 4 01480870 214
#> 5 01481000 214
#> 6 01481500 45
# Here are the date ranges for those provisional daily data
filter(p1_daily_data, Value_Min_cd == "P") %>%
  group_by(site_no) %>%
  summarize(min_date = min(Date), max_date = max(Date))
#> # A tibble: 6 x 3
#> site_no min_date max_date
#> <chr> <date> <date>
#> 1 01473500 2021-10-07 2022-10-01
#> 2 01475530 2022-03-09 2022-10-01
#> 3 01480617 2022-08-05 2022-10-01
#> 4 01480870 2022-02-26 2022-10-01
#> 5 01481000 2022-03-02 2022-10-01
#> 6 01481500 2022-08-18 2022-10-01
# now check instantaneous data
targets::tar_load(p1_inst_data)
p1_inst_data %>% group_by(Value_Inst_cd) %>% summarize(n = n())
#> # A tibble: 6 x 2
#> Value_Inst_cd n
#> <chr> <int>
#> 1 A 1901164
#> 2 A e 1
#> 3 P 116403
#> 4 P *** 94
#> 5 P Ssn 5
#> 6 NA 2
# some sites have a decent amount of provisional data
filter(p1_inst_data, Value_Inst_cd %in% c("P", "P ***", "P Ssn")) %>%
  group_by(site_no, Value_Inst_cd) %>%
  summarize(n = n(), .groups = 'drop')
#> # A tibble: 13 x 3
#> site_no Value_Inst_cd n
#> <chr> <chr> <int>
#> 1 01473500 P 25947
#> 2 01473500 P Ssn 2
#> 3 01475530 P 19742
#> 4 01475530 P Ssn 1
#> 5 01475548 P 19521
#> 6 01475548 P *** 9
#> 7 01475548 P Ssn 1
#> 8 01480617 P 5536
#> 9 01480870 P 20819
#> 10 01480870 P *** 85
#> 11 01481000 P 20561
#> 12 01481000 P Ssn 1
#> 13 01481500 P 4277
# when do the provisional data start and stop? (this query uses the daily table, p1_daily_data)
filter(p1_daily_data, Value_Min_cd %in% c("P", "P ***", "P Ssn")) %>%
  group_by(site_no) %>%
  summarize(min_date = min(Date), max_date = max(Date))
#> # A tibble: 6 x 3
#> site_no min_date max_date
#> <chr> <date> <date>
#> 1 01473500 2021-10-07 2022-10-01
#> 2 01475530 2022-03-09 2022-10-01
#> 3 01480617 2022-08-05 2022-10-01
#> 4 01480870 2022-02-26 2022-10-01
#> 5 01481000 2022-03-02 2022-10-01
#> 6 01481500 2022-08-18 2022-10-01
# How many observation-days do we have now across mean/min/max (in other words, how
# many days have at least one non-NA value for mean-DO, max-DO, or min-DO)?
targets::tar_load(p2_daily_combined)
dim(p2_daily_combined)
#> [1] 58379 10
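Raw counts are harder to compare across sites with different record lengths. The sketch below expresses provisional records as a share of each site's instantaneous record. It assumes columns `site_no` and `Value_Inst_cd` as in `p1_inst_data`, and it treats any "P"-prefixed code as provisional. The helper `provisional_share` and the toy rows are hypothetical.

```r
# Fraction of instantaneous records per site flagged provisional.
# Assumes columns site_no and Value_Inst_cd; any code starting with "P"
# is treated as provisional (so "P ***" and "P Ssn" are included).
provisional_share <- function(df, cd_col = "Value_Inst_cd") {
  cd <- df[[cd_col]]
  prov <- !is.na(cd) & startsWith(cd, "P")
  n_total <- tapply(prov, df$site_no, length)
  n_prov  <- tapply(prov, df$site_no, sum)
  data.frame(site_no   = names(n_total),
             n_total   = as.integer(n_total),
             n_prov    = as.integer(n_prov),
             frac_prov = as.numeric(n_prov / n_total),
             stringsAsFactors = FALSE)
}

# Toy data (made up)
toy <- data.frame(
  site_no = c("01473500", "01473500", "01473500", "01481000"),
  Value_Inst_cd = c("P", "P Ssn", "A", "A"),
  stringsAsFactors = FALSE
)
provisional_share(toy)
```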
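The start/stop query in the second inventory runs against the daily table. Here is a base-R sketch of the same per-site summary computed directly from the instantaneous records. It assumes `p1_inst_data` has `site_no`, `dateTime` (POSIXct), and `Value_Inst_cd`, as used earlier in this thread. Timestamps are converted to dates in UTC, which is an assumption about how days should be assigned. The helper name and toy rows are hypothetical.

```r
# Per-site first and last date of provisional instantaneous records.
# Assumes columns site_no, dateTime (POSIXct), and Value_Inst_cd.
provisional_inst_range <- function(df) {
  cd <- df$Value_Inst_cd
  prov_rows <- df[!is.na(cd) & startsWith(cd, "P"), ]
  # Assign each timestamp to a calendar day in UTC
  dates <- as.Date(prov_rows$dateTime, tz = "UTC")
  by_site <- split(dates, prov_rows$site_no)
  data.frame(
    site_no  = names(by_site),
    min_date = as.Date(vapply(by_site, function(d) as.numeric(min(d)), numeric(1)),
                       origin = "1970-01-01"),
    max_date = as.Date(vapply(by_site, function(d) as.numeric(max(d)), numeric(1)),
                       origin = "1970-01-01"),
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

# Toy data (made up)
toy <- data.frame(
  site_no  = c("01473500", "01473500", "01473500", "01475548"),
  dateTime = as.POSIXct(c("2021-10-07 04:00:00", "2021-12-01 03:45:00",
                          "2021-12-02 00:00:00", "2022-01-05 12:00:00"), tz = "UTC"),
  Value_Inst_cd = c("P", "P Ssn", "A", "P ***"),
  stringsAsFactors = FALSE
)
provisional_inst_range(toy)
```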
Thanks for checking on all of that, @lekoenig. It's good to know the extent of the provisional data. That said, I'm not sure it will affect us much. In the latest runs, I set the end of the test set to …
Sorry for not looking at those dates earlier! I guess knowing that we were stopping at …
Oh, OK, thanks for confirming those dates! I hadn't realized until yesterday that the NWIS …
You're right: if we stick with …
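Whether provisional data matter depends on how many of them fall inside the modeling period. Below is a sketch for counting provisional daily records on or before a test-set end date. The cutoff isn't restated in this thread, so `test_end` is a placeholder argument. The helper name and toy rows are hypothetical.

```r
# Count provisional daily records that fall on or before a cutoff date,
# i.e. records that would still enter the model if the test set ends there.
# `test_end` is a placeholder; the actual cutoff is not given here.
provisional_before <- function(df, test_end, cd_col = "Value_cd") {
  cd <- df[[cd_col]]
  prov <- !is.na(cd) & startsWith(cd, "P")
  sum(prov & df$Date <= as.Date(test_end))
}

# Toy data (made up)
toy <- data.frame(
  Date = as.Date(c("2021-10-07", "2022-03-09", "2022-09-30")),
  Value_cd = c("P", "P", "A"),
  stringsAsFactors = FALSE
)
provisional_before(toy, test_end = "2022-01-01")
```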
From @lekoenig in #134:
Should we omit provisional data from the model input files?
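If we do decide to omit them, there are at least two options: drop provisional rows, or keep the rows and blank out the values. Here is a sketch of both. It assumes a data frame with a value column `Value` and a qualifier column `Value_cd`; those names and the helpers are hypothetical.

```r
# Two hypothetical ways to exclude provisional values before writing
# model input files. Column names (Value, Value_cd) are assumptions.
is_prov <- function(cd) !is.na(cd) & startsWith(cd, "P")

# Option 1: drop provisional rows entirely
drop_provisional <- function(df, cd_col = "Value_cd") {
  df[!is_prov(df[[cd_col]]), , drop = FALSE]
}

# Option 2: keep the rows but set provisional values to NA
mask_provisional <- function(df, value_col = "Value", cd_col = "Value_cd") {
  df[[value_col]][is_prov(df[[cd_col]])] <- NA
  df
}

# Toy data (made up)
toy <- data.frame(
  Date = as.Date(c("2021-10-07", "2021-10-08", "2021-10-09")),
  Value = c(8.2, 7.9, 8.4),
  Value_cd = c("A", "P", "P ***"),
  stringsAsFactors = FALSE
)
drop_provisional(toy)
mask_provisional(toy)
```

Masking keeps the daily time index intact, which may be easier if the model inputs expect a continuous date sequence. Dropping rows gives a smaller table but leaves gaps in the dates.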