Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Access YAML annotations #881

Closed
wants to merge 9 commits into from
Closed

Conversation

ischoegl
Copy link
Member

@ischoegl ischoegl commented Jun 26, 2020

Checklist

  • There is a clear use-case for this code change
  • The commit message has a short title & references relevant issues
  • Build passes (scons build & scons test) and unit tests address code coverage
  • The pull request is ready for review

If applicable, fill in the issue number this pull request is fixing

Fixes #1013

Addresses annotations discussed in Cantera/enhancements#52

Changes proposed in this pull request

  • preserve top level description field
  • preserve note fields for Species, SpeciesThermo, Reaction, TransportData, and ThermoPhase
  • fix an issue where yaml-cpp does not recognize strings containing integers contained in quotes, e.g. '1234' … already merged

Thoughts: description and note fields are part of the canonical YAML implementation (see ck2yaml); the explicit inclusion ensures that Sphinx docstrings are generated.

Some C++ classes already have a AnyMap input attribute, which may be used instead of a string m_note as envisioned in the initial draft.

@ischoegl ischoegl marked this pull request as draft June 26, 2020 23:20
@codecov
Copy link

codecov bot commented Jun 27, 2020

Codecov Report

Merging #881 (922760d) into main (81cffde) will increase coverage by 0.03%.
The diff coverage is 96.55%.

❗ Current head 922760d differs from pull request most recent head 30ea112. Consider uploading reports for the commit 30ea112 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main     #881      +/-   ##
==========================================
+ Coverage   72.39%   72.43%   +0.03%     
==========================================
  Files         364      364              
  Lines       46433    46489      +56     
==========================================
+ Hits        33616    33672      +56     
  Misses      12817    12817              
Impacted Files Coverage Δ
include/cantera/base/Solution.h 100.00% <ø> (ø)
include/cantera/kinetics/Reaction.h 100.00% <ø> (ø)
include/cantera/thermo/SpeciesThermoInterpType.h 90.00% <ø> (ø)
include/cantera/thermo/ThermoPhase.h 31.16% <ø> (ø)
include/cantera/transport/TransportData.h 66.66% <ø> (ø)
src/base/Solution.cpp 93.65% <90.47%> (-1.70%) ⬇️
src/kinetics/Reaction.cpp 90.76% <100.00%> (+0.08%) ⬆️
src/thermo/Species.cpp 98.96% <100.00%> (+0.06%) ⬆️
src/thermo/SpeciesThermoInterpType.cpp 64.86% <100.00%> (+13.25%) ⬆️
src/thermo/ThermoPhase.cpp 77.33% <100.00%> (+0.15%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81cffde...30ea112. Read the comment docs.

@ischoegl ischoegl force-pushed the support-yaml-notes branch from a33e030 to 80df2ba Compare June 27, 2020 04:48
@ischoegl ischoegl force-pushed the support-yaml-notes branch 3 times, most recently from 75a3181 to a9f2065 Compare June 27, 2020 17:05
@ischoegl ischoegl marked this pull request as ready for review June 27, 2020 17:30
@ischoegl ischoegl force-pushed the support-yaml-notes branch from a9f2065 to d15ae54 Compare June 29, 2020 19:42
@speth speth changed the base branch from master to main June 30, 2020 23:15
@ischoegl
Copy link
Member Author

ischoegl commented Jul 2, 2020

@speth and @tsikes ... any comments would be welcome.

@speth
Copy link
Member

speth commented Jul 3, 2020

As you have anticipated, this overlaps with Cantera/enhancements#11, and my preference would be to provide metadata not used directly by Cantera as a more generic auxiliary data structure. In the long term, I'd like to think that we could better organize this metadata rather than just calling it the "note" associated with a species / reaction / etc, which is about all we can do when translating from Chemkin input.

I'll see if I can extract the implementation I have of making the "extra" input data available in Python -- right now, it's a bit tangled up in my work in progress on YAML serialization, which I don't think will be ready for the 2.5.0 release. If that doesn't work out, I would consider as an interim option a version of this which reads the "note" directly from the AnyMap associated with each of these objects, and with the new methods marked as experimental methods which may be changed or removed without notice from future Cantera versions, as I have done on a few other methods related to handling of input data, for example:

*
* @warning This function is an experimental part of the %Cantera API and
* may be changed or removed without notice.
*/
void applyUnits(const UnitSystem& units);

@ischoegl
Copy link
Member Author

ischoegl commented Jul 3, 2020

@speth ... thanks for your feedback.

[...] my preference would be to provide metadata not used directly by Cantera as a more generic auxiliary data structure.

I am aware of this goal; however, description and note are somewhat special, as they are directly used by ck2yaml (and presumably will be used by yaml2ck as promoted by @tsikes). So from that perspective, those fields are not completely custom, which is what I meant by some of my comments elsewhere. As such, associated properties deserve having a Python docstring (which likely won't be available for a generic custom auxiliary field).

If that doesn't work out, I would consider as an interim option a version of this which reads the "note" directly from the AnyMap associated with each of these objects, and with the new methods marked as experimental methods which may be changed or removed without notice from future Cantera versions

I have no problem with that.

Edit: The implementation / underlying data structure is secondary (whether AnyMap or string), as long as the Python interface remains intact.

@tsikes
Copy link
Contributor

tsikes commented Jul 3, 2020

@speth, with how you envision the implementation of notes, how would they be accessible? I think keeping auxiliary information separate from the data required for Cantera to compute makes sense. I did find this implementation convenient because when I am iterating through species/reactions it is nice to be able to call the note directly. I started to write a version of soln2ck (soon to be yaml2ck) with this implementation, so I don't want to waste my time if the underlying data structure is going to change.

Edit: By underlying data structure, I meant the structure of the Python interface.

@speth
Copy link
Member

speth commented Jul 5, 2020

I am aware of this goal; however, description and note are somewhat special, as they are directly used by ck2yaml (and presumably will be used by yaml2ck as promoted by @tsikes)

Just because ck2yaml currently creates a field named note doesn't mean should formalize that as the way of storing metadata. I know there are multiple pseudo-standards for organizing metadata in Chemkin mechanism comments which could be used to populate more semantically useful fields. For example, I know RMG puts the SMILES for each species as the comment following each species name in the SPECIES section, and information on the source of the reaction rate before each reaction. If ck2yaml gets smart enough to do this, I'd rather the note field be empty except when it doesn't know what else to do.

In this case, I think I would want yaml2ck or soln2ck to be able to have some level of customization for how it structures the comments surrounding any given species / thermo / reaction entry, so that the metadata can be structured in a way that is useful to whatever tool / user is going to consume the mechanism.

@speth, with how you envision the implementation of notes, how would they be accessible? I think keeping auxiliary information separate from the data required for Cantera to compute makes sense.

My current draft implementation provides a Python data structure made of simple types (e.g. nested maps and lists of strings and numbers) which correspond to the YAML entry that would recreate that same object. This includes both the fields used by Cantera as well as those that are not. But maybe Cantera/enhancements#52 is a better place to discuss how to do this without generating extra work as the YAML interface matures.

@ischoegl
Copy link
Member Author

ischoegl commented Jul 6, 2020

@speth ... thank you for your thoughts.

Just because ck2yaml currently creates a field named note doesn't mean should formalize that as the way of storing metadata.

I believe that's not necessarily the case. By including the fields you chose, you have created a precedent that will have to remain backward-compatible, right? I.e. if someone uses a YAML file generated by ck2yaml in 2.5, and then revisits a problem in, say, 3.1, the YAML note fields should still be supported/readable? I'm not saying that there won't be better ways of storing metadata in the future, but I'm not sure that note is easily replaced now that it is de facto formalized.

Regarding the overlap with ongoing work on Cantera/enhancements#11: it appears that there's a need to access note, etc. as soon as 2.5 is released, which ultimately motivated my PR. I.e. I'd like to discuss it some more before kicking things back to Cantera/enhancements#52.

My current draft implementation provides a Python data structure made of simple types (e.g. nested maps and lists of strings and numbers) which correspond to the YAML entry that would recreate that same object. This includes both the fields used by Cantera as well as those that are not.

In terms of YAML, I see the appeal of being fully flexible, but am afraid that this will result in clutter, especially if core-Cantera fields are difficult to differentiate from custom fields. Also, doesn't this approach mean that there will be a lot of redundant information (i.e. information stored by both object and AnyMap/AnyValue)? Would it make sense to populate values using a AnyMap.pop (Python dictionary inspired) rather than AnyMap.at, and store what's left over as an AnyMap custom? For the reverse, you could re-populate an AnyMap with class variables before emitting to YAML? I admit that a lot of this depends on personal coding style, but as mentioned above, I believe there is a need to differentiate essential fields from auxiliary ones.

[..] so that the metadata can be structured in a way that is useful to whatever tool / user is going to consume the mechanism.

Case in point. I'm not sure that it can be anticipated what users expect from YAML (and this goes beyond metadata), and I'm also not sure that ck2yaml can be programmed to anticipate all of them (it would certainly make sense to detect SMILES, though). I'd like to see Cantera as the quarry in the original sense, where users can use easily configurable building blocks. I'm not sure that it can be anticipated what those use cases will be ...

Other thoughts: ... it may make sense to rename note and description to meta effective immediately (I.e. prior to the official release of YAML) and define it as a generic AnyValue field. It thus can be a string (as in the current implementation) for 2.5 or any other structure down the road.

The advantage of this would be that meta is a field name that likely won’t have to change, and keeps all options open for the future. I’d still argue that it should be an official field, and not part of the auxiliary custom fields.

@speth
Copy link
Member

speth commented Jul 9, 2020

I believe that's not necessarily the case. By including the fields you chose, you have created a precedent that will have to remain backward-compatible, right? I.e. if someone uses a YAML file generated by ck2yaml in 2.5, and then revisits a problem in, say, 3.1, the YAML note fields should still be supported/readable? I'm not saying that there won't be better ways of storing metadata in the future, but I'm not sure that note is easily replaced now that it is de facto formalized.

My point was more that ck2yaml isn't the only way to generate a YAML file. I would hope that eventually there will be input files that are generated directly as Cantera-compatible YAML files, for example from RMG, which may or may not want to use a field called note. If a user has an input file with note fields, then yes, I would expect Cantera to continue making that available, once the feature of making all information from the input file is finished.

Other thoughts: ... it may make sense to rename note and description to meta effective immediately (I.e. prior to the official release of YAML) and define it as a generic AnyValue field. It thus can be a string (as in the current implementation) for 2.5 or any other structure down the road.

I don't think we should restrict the organization of data not used directly by Cantera. I would like others who generate YAML files to be able to add fields without imposing this kind of structure.

As to your other comments, I would suggest that this discussion wait until I share a draft implementation of Cantera/enhancements#11. I think I agree with most of your goals, but it's apparently hard to show that what I'm doing actually achieves those ends without concrete examples. With the understanding that this addition will provide access to all YAML data, my suggestion is to close this PR.

@ischoegl
Copy link
Member Author

ischoegl commented Jul 9, 2020

I don't think we should restrict the organization of data not used directly by Cantera. I would like others who generate YAML files to be able to add fields without imposing this kind of structure.

The only constraint would be to use the field meta for metadata. This would ensure that metadata can be located in a predictable attribute; there would be no restrictions regarding the content of this field. See Cantera/enhancements#56.

With the understanding that this addition will provide access to all YAML data, my suggestion is to close this PR.

I have no objection to closing once the discussions on Cantera/enhancements#52 and Cantera/enhancements#56 have converged. Specifically, I don't see the point why it doesn't make sense to reserve a field name where users can expect to find metadata (if any are specified)?

If there's agreement that meta makes sense, then Cantera 2.5 could introduce attributes that access metadata where the underlying storage structure is replaced once Cantera/enhancements#11 is in place, with next to no change of any of the external interfaces.

@bryanwweber
Copy link
Member

@ischoegl As I said on Cantera/enhancements#56, I have reservations about including access to YAML fields by attributes on a class instance. Since (my understanding of) the main use case for these fields right now is to support yaml2ck proposed in Cantera/enhancements#52, I would prefer to wait for the more flexible implementation @speth has planned, and accept that yaml2ck won't be quite feature complete until that is released.

There seems to be a tension right now between releasing 2.5.0 ASAP, even though YAML serialization won't be ready, and providing some sort of limited access to certain fields in the YAML. I'm personally in the camp of let's get 2.5.0 out now and accept that it won't be feature-complete. It has been 2 years since 2.4.0 and there are over 800 commits on main since then (probably approaching 900 by now), including a lot of really useful features and substantially improved packaging with conda. The latter would make a big difference for any instructors who might use Cantera this fall...

To specifically address this:

Specifically, I don't see the point why it doesn't make sense to reserve a field name where users can expect to find metadata (if any are specified)?

I said on Cantera/enhancements#56 that what is metadata for Cantera may not be metadata for another tool. Rather than restricting those other tools to have to use a specific field if they want to support Cantera, I'd rather be flexible.

@ischoegl
Copy link
Member Author

I would prefer to wait for the more flexible implementation @speth has planned, and accept that yaml2ck won't be quite feature complete until that is released.

As I have mentioned, there isn’t really a contradiction of implementations whatsoever: what I was proposing is a front end that allows access to existing YAML data. All it would have done is formalize access.

@ischoegl ischoegl closed this Jul 10, 2020
@ischoegl ischoegl mentioned this pull request Jul 10, 2020
4 tasks
@ischoegl
Copy link
Member Author

ischoegl commented Apr 14, 2021

With #984 now merged and Cantera/enhancements#11 closed, I am reopening this PR. It will require some updates of course.

@ischoegl ischoegl reopened this Apr 14, 2021
@ischoegl ischoegl changed the title Preserve YAML annotations Access YAML annotations Apr 14, 2021
@tsikes
Copy link
Contributor

tsikes commented Apr 14, 2021

I will take a look at this as soon as I can. I'm happy that we'll have access to more fields!

@ischoegl
Copy link
Member Author

@tsikes ... not sure whether the PR is necessary at all (will likewise have to look at it again). I mainly re-opened it to ensure that the associated work won't get forgotten.

@ischoegl ischoegl force-pushed the support-yaml-notes branch from d15ae54 to 888d17d Compare April 15, 2021 16:38
@ischoegl ischoegl added the Input Input parsing and conversion (for example, ck2yaml) label Apr 15, 2021
@ischoegl ischoegl force-pushed the support-yaml-notes branch from 888d17d to c09833c Compare April 16, 2021 02:03
@ischoegl ischoegl force-pushed the support-yaml-notes branch from c09833c to c871bee Compare April 16, 2021 03:52
@ischoegl ischoegl force-pushed the support-yaml-notes branch from c871bee to 41540b8 Compare April 16, 2021 04:17
@ischoegl
Copy link
Member Author

ischoegl commented Apr 16, 2021

@speth (and @tsikes) ... I looked over the old PR and blended it with the serialization that was just implemented in #984. After the update, this PR has the exact same external interface for getters as before, and adds setters that had been only partially implemented. So at this point:

  • description and note fields are formalized and have getters and setters in both C++ and Python
  • description was - as far as I could see - missing from the serialization, so I left the original implementation for this addition Edit: this is now added (together with other non-essential fields at the base level of the YAML file)
  • note accesses (and potentially overwrites) input AnyMap, so there is no parallel storage (i.e. this is the major change from the previous implementation)

In a nutshell, the note getter is at this point a convenience function (a shorthand for input_data["note"]; an equivalent for the setter currently does not exist, as there is no python_to_anymap implemented. I still believe that providing these access functions has benefits for the community (e.g. yaml2ck in Cantera/enhancements#52); while the implementation may be refined later on, the properties themselves have a historical justification.

As an aside, it may make sense to add the fix for fields at the base level of the YAML input that are currently not buffered to Solution.h. It's a minor addition and easy to implement. Edit: now added

@ischoegl ischoegl requested review from speth and removed request for speth April 16, 2021 04:30
@bryanwweber
Copy link
Member

@ischoegl Is this still worth pursuing? Since python_to_anymap is now implemented, are there any updates?

@ischoegl
Copy link
Member Author

ischoegl commented Jul 1, 2021

As it stands, it contains a fix to #1013, and I’d like to hear @tsikes’ opinion on this too. I believe there are some useful bits here, but won’t invest time unless I know where this is going. The introduction of python_to_anymap imho doesn’t make everything here obsolete.

@tsikes
Copy link
Contributor

tsikes commented Jul 3, 2021

From what I understand python_to_anymap would allow access to most fields of interest, but without this change access to the description would be lost. Depending on how people use the description, this could be useful information that should be preserved as well.

@ischoegl
Copy link
Member Author

Superseded by #1129

@ischoegl ischoegl closed this Oct 23, 2021
@ischoegl ischoegl deleted the support-yaml-notes branch January 8, 2022 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Input Input parsing and conversion (for example, ck2yaml)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Serialization of root-level information in YAML
4 participants