Access YAML annotations #881

ischoegl · 2020-06-26T23:20:03Z

Checklist

There is a clear use-case for this code change
The commit message has a short title & references relevant issues
Build passes (scons build & scons test) and unit tests address code coverage
The pull request is ready for review

If applicable, fill in the issue number this pull request is fixing

Addresses annotations discussed in Cantera/enhancements#52

Changes proposed in this pull request

preserve top level description field
preserve note fields for Species, SpeciesThermo, Reaction, TransportData, and ThermoPhase
~~fix an issue where yaml-cpp does not recognize strings containing integers contained in quotes, e.g. '1234'~~ … already merged

Thoughts: description and note fields are part of the canonical YAML implementation (see ck2yaml); the explicit inclusion ensures that Sphinx docstrings are generated.

~~Some C++ classes already have a AnyMap input attribute, which may be used instead of a string m_note as envisioned in the initial draft.~~

codecov · 2020-06-27T00:12:47Z

Codecov Report

Merging #881 (922760d) into main (81cffde) will increase coverage by 0.03%.
The diff coverage is 96.55%.

❗ Current head 922760d differs from pull request most recent head 30ea112. Consider uploading reports for the commit 30ea112 to get more accurate results

@@            Coverage Diff             @@
##             main     #881      +/-   ##
==========================================
+ Coverage   72.39%   72.43%   +0.03%     
==========================================
  Files         364      364              
  Lines       46433    46489      +56     
==========================================
+ Hits        33616    33672      +56     
  Misses      12817    12817

Impacted Files	Coverage Δ
include/cantera/base/Solution.h	`100.00% <ø> (ø)`
include/cantera/kinetics/Reaction.h	`100.00% <ø> (ø)`
include/cantera/thermo/SpeciesThermoInterpType.h	`90.00% <ø> (ø)`
include/cantera/thermo/ThermoPhase.h	`31.16% <ø> (ø)`
include/cantera/transport/TransportData.h	`66.66% <ø> (ø)`
src/base/Solution.cpp	`93.65% <90.47%> (-1.70%)`	⬇️
src/kinetics/Reaction.cpp	`90.76% <100.00%> (+0.08%)`	⬆️
src/thermo/Species.cpp	`98.96% <100.00%> (+0.06%)`	⬆️
src/thermo/SpeciesThermoInterpType.cpp	`64.86% <100.00%> (+13.25%)`	⬆️
src/thermo/ThermoPhase.cpp	`77.33% <100.00%> (+0.15%)`	⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81cffde...30ea112. Read the comment docs.

ischoegl · 2020-07-02T14:17:28Z

@speth and @tsikes ... any comments would be welcome.

speth · 2020-07-03T17:07:30Z

As you have anticipated, this overlaps with Cantera/enhancements#11, and my preference would be to provide metadata not used directly by Cantera as a more generic auxiliary data structure. In the long term, I'd like to think that we could better organize this metadata rather than just calling it the "note" associated with a species / reaction / etc, which is about all we can do when translating from Chemkin input.

I'll see if I can extract the implementation I have of making the "extra" input data available in Python -- right now, it's a bit tangled up in my work in progress on YAML serialization, which I don't think will be ready for the 2.5.0 release. If that doesn't work out, I would consider as an interim option a version of this which reads the "note" directly from the AnyMap associated with each of these objects, and with the new methods marked as experimental methods which may be changed or removed without notice from future Cantera versions, as I have done on a few other methods related to handling of input data, for example:

cantera/include/cantera/base/AnyMap.h

Lines 506 to 510 in 047e74b

    
                * 
        
                * @warning This function is an experimental part of the %Cantera API and 
        
                *     may be changed or removed without notice. 
        
                */ 
        
               void applyUnits(const UnitSystem& units);

ischoegl · 2020-07-03T17:29:32Z

@speth ... thanks for your feedback.

[...] my preference would be to provide metadata not used directly by Cantera as a more generic auxiliary data structure.

I am aware of this goal; however, description and note are somewhat special, as they are directly used by ck2yaml (and presumably will be used by yaml2ck as promoted by @tsikes). So from that perspective, those fields are not completely custom, which is what I meant by some of my comments elsewhere. As such, associated properties deserve having a Python docstring (which likely won't be available for a generic custom auxiliary field).

If that doesn't work out, I would consider as an interim option a version of this which reads the "note" directly from the AnyMap associated with each of these objects, and with the new methods marked as experimental methods which may be changed or removed without notice from future Cantera versions

I have no problem with that.

Edit: The implementation / underlying data structure is secondary (whether AnyMap or string), as long as the Python interface remains intact.

tsikes · 2020-07-03T18:38:06Z

@speth, with how you envision the implementation of notes, how would they be accessible? I think keeping auxiliary information separate from the data required for Cantera to compute makes sense. I did find this implementation convenient because when I am iterating through species/reactions it is nice to be able to call the note directly. I started to write a version of soln2ck (soon to be yaml2ck) with this implementation, so I don't want to waste my time if the underlying data structure is going to change.

Edit: By underlying data structure, I meant the structure of the Python interface.

speth · 2020-07-05T15:06:48Z

I am aware of this goal; however, description and note are somewhat special, as they are directly used by ck2yaml (and presumably will be used by yaml2ck as promoted by @tsikes)

Just because ck2yaml currently creates a field named note doesn't mean should formalize that as the way of storing metadata. I know there are multiple pseudo-standards for organizing metadata in Chemkin mechanism comments which could be used to populate more semantically useful fields. For example, I know RMG puts the SMILES for each species as the comment following each species name in the SPECIES section, and information on the source of the reaction rate before each reaction. If ck2yaml gets smart enough to do this, I'd rather the note field be empty except when it doesn't know what else to do.

In this case, I think I would want yaml2ck or soln2ck to be able to have some level of customization for how it structures the comments surrounding any given species / thermo / reaction entry, so that the metadata can be structured in a way that is useful to whatever tool / user is going to consume the mechanism.

@speth, with how you envision the implementation of notes, how would they be accessible? I think keeping auxiliary information separate from the data required for Cantera to compute makes sense.

My current draft implementation provides a Python data structure made of simple types (e.g. nested maps and lists of strings and numbers) which correspond to the YAML entry that would recreate that same object. This includes both the fields used by Cantera as well as those that are not. But maybe Cantera/enhancements#52 is a better place to discuss how to do this without generating extra work as the YAML interface matures.

ischoegl · 2020-07-06T00:31:51Z

@speth ... thank you for your thoughts.

Just because ck2yaml currently creates a field named note doesn't mean should formalize that as the way of storing metadata.

I believe that's not necessarily the case. By including the fields you chose, you have created a precedent that will have to remain backward-compatible, right? I.e. if someone uses a YAML file generated by ck2yaml in 2.5, and then revisits a problem in, say, 3.1, the YAML note fields should still be supported/readable? I'm not saying that there won't be better ways of storing metadata in the future, but I'm not sure that note is easily replaced now that it is de facto formalized.

Regarding the overlap with ongoing work on Cantera/enhancements#11: it appears that there's a need to access note, etc. as soon as 2.5 is released, which ultimately motivated my PR. I.e. I'd like to discuss it some more before kicking things back to Cantera/enhancements#52.

My current draft implementation provides a Python data structure made of simple types (e.g. nested maps and lists of strings and numbers) which correspond to the YAML entry that would recreate that same object. This includes both the fields used by Cantera as well as those that are not.

In terms of YAML, I see the appeal of being fully flexible, but am afraid that this will result in clutter, especially if core-Cantera fields are difficult to differentiate from custom fields. Also, doesn't this approach mean that there will be a lot of redundant information (i.e. information stored by both object and AnyMap/AnyValue)? Would it make sense to populate values using a AnyMap.pop (Python dictionary inspired) rather than AnyMap.at, and store what's left over as an AnyMap custom? For the reverse, you could re-populate an AnyMap with class variables before emitting to YAML? I admit that a lot of this depends on personal coding style, but as mentioned above, I believe there is a need to differentiate essential fields from auxiliary ones.

[..] so that the metadata can be structured in a way that is useful to whatever tool / user is going to consume the mechanism.

Case in point. I'm not sure that it can be anticipated what users expect from YAML (and this goes beyond metadata), and I'm also not sure that ck2yaml can be programmed to anticipate all of them (it would certainly make sense to detect SMILES, though). I'd like to see Cantera as the quarry in the original sense, where users can use easily configurable building blocks. I'm not sure that it can be anticipated what those use cases will be ...

Other thoughts: ... it may make sense to rename note and description to meta effective immediately (I.e. prior to the official release of YAML) and define it as a generic AnyValue field. It thus can be a string (as in the current implementation) for 2.5 or any other structure down the road.

The advantage of this would be that meta is a field name that likely won’t have to change, and keeps all options open for the future. I’d still argue that it should be an official field, and not part of the auxiliary custom fields.

speth · 2020-07-09T20:37:21Z

I believe that's not necessarily the case. By including the fields you chose, you have created a precedent that will have to remain backward-compatible, right? I.e. if someone uses a YAML file generated by ck2yaml in 2.5, and then revisits a problem in, say, 3.1, the YAML note fields should still be supported/readable? I'm not saying that there won't be better ways of storing metadata in the future, but I'm not sure that note is easily replaced now that it is de facto formalized.

My point was more that ck2yaml isn't the only way to generate a YAML file. I would hope that eventually there will be input files that are generated directly as Cantera-compatible YAML files, for example from RMG, which may or may not want to use a field called note. If a user has an input file with note fields, then yes, I would expect Cantera to continue making that available, once the feature of making all information from the input file is finished.

Other thoughts: ... it may make sense to rename note and description to meta effective immediately (I.e. prior to the official release of YAML) and define it as a generic AnyValue field. It thus can be a string (as in the current implementation) for 2.5 or any other structure down the road.

I don't think we should restrict the organization of data not used directly by Cantera. I would like others who generate YAML files to be able to add fields without imposing this kind of structure.

As to your other comments, I would suggest that this discussion wait until I share a draft implementation of Cantera/enhancements#11. I think I agree with most of your goals, but it's apparently hard to show that what I'm doing actually achieves those ends without concrete examples. With the understanding that this addition will provide access to all YAML data, my suggestion is to close this PR.

ischoegl · 2020-07-09T21:30:13Z

I don't think we should restrict the organization of data not used directly by Cantera. I would like others who generate YAML files to be able to add fields without imposing this kind of structure.

The only constraint would be to use the field meta for metadata. This would ensure that metadata can be located in a predictable attribute; there would be no restrictions regarding the content of this field. See Cantera/enhancements#56.

With the understanding that this addition will provide access to all YAML data, my suggestion is to close this PR.

I have no objection to closing once the discussions on Cantera/enhancements#52 and Cantera/enhancements#56 have converged. Specifically, I don't see the point why it doesn't make sense to reserve a field name where users can expect to find metadata (if any are specified)?

If there's agreement that meta makes sense, then Cantera 2.5 could introduce attributes that access metadata where the underlying storage structure is replaced once Cantera/enhancements#11 is in place, with next to no change of any of the external interfaces.

bryanwweber · 2020-07-10T11:42:20Z

@ischoegl As I said on Cantera/enhancements#56, I have reservations about including access to YAML fields by attributes on a class instance. Since (my understanding of) the main use case for these fields right now is to support yaml2ck proposed in Cantera/enhancements#52, I would prefer to wait for the more flexible implementation @speth has planned, and accept that yaml2ck won't be quite feature complete until that is released.

There seems to be a tension right now between releasing 2.5.0 ASAP, even though YAML serialization won't be ready, and providing some sort of limited access to certain fields in the YAML. I'm personally in the camp of let's get 2.5.0 out now and accept that it won't be feature-complete. It has been 2 years since 2.4.0 and there are over 800 commits on main since then (probably approaching 900 by now), including a lot of really useful features and substantially improved packaging with conda. The latter would make a big difference for any instructors who might use Cantera this fall...

To specifically address this:

Specifically, I don't see the point why it doesn't make sense to reserve a field name where users can expect to find metadata (if any are specified)?

I said on Cantera/enhancements#56 that what is metadata for Cantera may not be metadata for another tool. Rather than restricting those other tools to have to use a specific field if they want to support Cantera, I'd rather be flexible.

ischoegl · 2020-07-10T13:26:44Z

I would prefer to wait for the more flexible implementation @speth has planned, and accept that yaml2ck won't be quite feature complete until that is released.

As I have mentioned, there isn’t really a contradiction of implementations whatsoever: what I was proposing is a front end that allows access to existing YAML data. All it would have done is formalize access.

ischoegl · 2021-04-14T18:05:29Z

With #984 now merged and Cantera/enhancements#11 closed, I am reopening this PR. It will require some updates of course.

tsikes · 2021-04-14T23:39:43Z

I will take a look at this as soon as I can. I'm happy that we'll have access to more fields!

ischoegl · 2021-04-14T23:50:40Z

@tsikes ... not sure whether the PR is necessary at all (will likewise have to look at it again). I mainly re-opened it to ensure that the associated work won't get forgotten.

ischoegl · 2021-04-16T04:30:06Z

@speth (and @tsikes) ... I looked over the old PR and blended it with the serialization that was just implemented in #984. After the update, this PR has the exact same external interface for getters as before, and adds setters that had been only partially implemented. So at this point:

description and note fields are formalized and have getters and setters in both C++ and Python
description was - as far as I could see - missing from the serialization, ~~so I left the original implementation for this addition~~ Edit: this is now added (together with other non-essential fields at the base level of the YAML file)
note accesses (and potentially overwrites) input AnyMap, so there is no parallel storage (i.e. this is the major change from the previous implementation)

In a nutshell, the note getter is at this point a convenience function (a shorthand for input_data["note"]; an equivalent for the setter currently does not exist, as there is no python_to_anymap implemented. I still believe that providing these access functions has benefits for the community (e.g. yaml2ck in Cantera/enhancements#52); while the implementation may be refined later on, the properties themselves have a historical justification.

~~As an aside, it may make sense to add the fix for fields at the base level of the YAML input that are currently not buffered to Solution.h. It's a minor addition and easy to implement.~~ Edit: now added

bryanwweber · 2021-07-01T13:55:06Z

@ischoegl Is this still worth pursuing? Since python_to_anymap is now implemented, are there any updates?

ischoegl · 2021-07-01T21:55:57Z

As it stands, it contains a fix to #1013, and I’d like to hear @tsikes’ opinion on this too. I believe there are some useful bits here, but won’t invest time unless I know where this is going. The introduction of python_to_anymap imho doesn’t make everything here obsolete.

tsikes · 2021-07-03T02:59:23Z

From what I understand python_to_anymap would allow access to most fields of interest, but without this change access to the description would be lost. Depending on how people use the description, this could be useful information that should be preserved as well.

ischoegl · 2021-10-23T22:50:52Z

Superseded by #1129

ischoegl marked this pull request as draft June 26, 2020 23:20

ischoegl force-pushed the support-yaml-notes branch from a33e030 to 80df2ba Compare June 27, 2020 04:48

ischoegl mentioned this pull request Jun 27, 2020

Export Solution to Chemkin Format Cantera/enhancements#52

Closed

ischoegl force-pushed the support-yaml-notes branch 3 times, most recently from 75a3181 to a9f2065 Compare June 27, 2020 17:05

ischoegl mentioned this pull request Jun 27, 2020

New CI test "CI / Build docs (pull_request)" fails #882

Closed

ischoegl marked this pull request as ready for review June 27, 2020 17:30

ischoegl force-pushed the support-yaml-notes branch from a9f2065 to d15ae54 Compare June 29, 2020 19:42

speth changed the base branch from master to main June 30, 2020 23:15

This was referenced Jul 6, 2020

Implement serialization to YAML format Cantera/enhancements#11

Closed

Formalize YAML metadata fields Cantera/enhancements#56

Closed

ischoegl closed this Jul 10, 2020

ischoegl mentioned this pull request Jul 10, 2020

Minor fixes in AnyValue and scons #899

Merged

4 tasks

ischoegl reopened this Apr 14, 2021

ischoegl changed the title ~~Preserve YAML annotations~~ Access YAML annotations Apr 14, 2021

ischoegl force-pushed the support-yaml-notes branch from d15ae54 to 888d17d Compare April 15, 2021 16:38

ischoegl added the Input Input parsing and conversion (for example, ck2yaml) label Apr 15, 2021

ischoegl force-pushed the support-yaml-notes branch from 888d17d to c09833c Compare April 16, 2021 02:03

ischoegl added 2 commits April 15, 2021 22:10

[CI] Switch selected tests to h2o2.yaml

b776f7d

[Input] Make YAML description available in Solution

93ea7e9

ischoegl force-pushed the support-yaml-notes branch from c09833c to c871bee Compare April 16, 2021 03:52

ischoegl added 5 commits April 15, 2021 23:17

[Input] Access YAML note field for Species

67ddd2a

[Input] Access YAML note field for SpeciesThermo

c27bd36

[Input] Access YAML note field for Reaction

472cd4d

[Input] Access YAML note field for TransportData

71d7ed4

[Input] Access YAML note field for ThermoPhase

41540b8

ischoegl force-pushed the support-yaml-notes branch from c871bee to 41540b8 Compare April 16, 2021 04:17

ischoegl requested review from speth and removed request for speth April 16, 2021 04:30

[Input] Preserve non-essential YAML fields

2e066ee

ischoegl mentioned this pull request Apr 16, 2021

Serialization of root-level information in YAML #1013

Closed

ischoegl requested a review from speth April 16, 2021 12:52

[Input] Implement access to root level fields of the YAML input

30ea112

ischoegl mentioned this pull request Oct 23, 2021

Read YAML using C++ newSolution in Python / access root level data #1129

Merged

5 tasks

ischoegl closed this Oct 23, 2021

ischoegl deleted the support-yaml-notes branch January 8, 2022 16:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Access YAML annotations #881

Access YAML annotations #881

ischoegl commented Jun 26, 2020 •

edited

Loading

codecov bot commented Jun 27, 2020 •

edited

Loading

ischoegl commented Jul 2, 2020

speth commented Jul 3, 2020

ischoegl commented Jul 3, 2020 •

edited

Loading

tsikes commented Jul 3, 2020 •

edited

Loading

speth commented Jul 5, 2020

ischoegl commented Jul 6, 2020 •

edited

Loading

speth commented Jul 9, 2020

ischoegl commented Jul 9, 2020 •

edited

Loading

bryanwweber commented Jul 10, 2020

ischoegl commented Jul 10, 2020

ischoegl commented Apr 14, 2021 •

edited

Loading

tsikes commented Apr 14, 2021

ischoegl commented Apr 14, 2021

ischoegl commented Apr 16, 2021 •

edited

Loading

bryanwweber commented Jul 1, 2021

ischoegl commented Jul 1, 2021 •

edited

Loading

tsikes commented Jul 3, 2021

ischoegl commented Oct 23, 2021

Access YAML annotations #881

Access YAML annotations #881

Conversation

ischoegl commented Jun 26, 2020 • edited Loading

codecov bot commented Jun 27, 2020 • edited Loading

Codecov Report

ischoegl commented Jul 2, 2020

speth commented Jul 3, 2020

ischoegl commented Jul 3, 2020 • edited Loading

tsikes commented Jul 3, 2020 • edited Loading

speth commented Jul 5, 2020

ischoegl commented Jul 6, 2020 • edited Loading

speth commented Jul 9, 2020

ischoegl commented Jul 9, 2020 • edited Loading

bryanwweber commented Jul 10, 2020

ischoegl commented Jul 10, 2020

ischoegl commented Apr 14, 2021 • edited Loading

tsikes commented Apr 14, 2021

ischoegl commented Apr 14, 2021

ischoegl commented Apr 16, 2021 • edited Loading

bryanwweber commented Jul 1, 2021

ischoegl commented Jul 1, 2021 • edited Loading

tsikes commented Jul 3, 2021

ischoegl commented Oct 23, 2021

ischoegl commented Jun 26, 2020 •

edited

Loading

codecov bot commented Jun 27, 2020 •

edited

Loading

ischoegl commented Jul 3, 2020 •

edited

Loading

tsikes commented Jul 3, 2020 •

edited

Loading

ischoegl commented Jul 6, 2020 •

edited

Loading

ischoegl commented Jul 9, 2020 •

edited

Loading

ischoegl commented Apr 14, 2021 •

edited

Loading

ischoegl commented Apr 16, 2021 •

edited

Loading

ischoegl commented Jul 1, 2021 •

edited

Loading