Support of pickle serialization of Cython objects in python API #10

Closed
ghost opened this issue Nov 7, 2018 · 9 comments · Fixed by Cantera/cantera#692
Labels
feature-request New feature request

Comments

ghost commented Nov 7, 2018

Serialization is used to transfer objects between processes, between nodes, or even between Windows and Linux, and it is widely used in parallel program design. However, Cython objects such as ct.Solution and ct.Reaction cannot currently be serialized, because they lack pickle support.

See https://snorfalorpagus.net/blog/2016/04/16/pickling-cython-classes/. It seems straightforward to support serialization in Python using pickle.

I tried monkey patching a __reduce__() method onto the Reaction object, but Python forbids adding attributes to extension types. So I think it is better implemented in the Cython layer.
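
For reference, a minimal sketch of the pattern from the linked blog post, using a plain Python stand-in class (the class and its attributes are illustrative, not the actual ct.Reaction API):

```python
import pickle

def _rebuild_reaction(equation, rate):
    # Module-level rebuild function: pickle needs a top-level callable
    # it can look up by name when reconstructing the object.
    return Reaction(equation, rate)

class Reaction:
    """Stand-in for a Cython extension type such as ct.Reaction."""
    def __init__(self, equation, rate):
        self.equation = equation
        self.rate = rate

    def __reduce__(self):
        # Tell pickle to call _rebuild_reaction(equation, rate) on load.
        return (_rebuild_reaction, (self.equation, self.rate))

restored = pickle.loads(pickle.dumps(Reaction("H2 + O2 <=> H2O2", 1.3e11)))
assert restored.equation == "H2 + O2 <=> H2O2"
```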

Think about this scenario: I save a Solution object, or even a FreeFlame object, to a *.pkl file on Linux and send it to someone or post it on a forum. The receiver can simply load it on Windows and help me check what is going wrong.

MicaelBoulet commented Feb 11, 2019

Interesting application example, but running parallel scripts might be the true motivation for this, e.g. a large parametric study on a multicore system or a cluster. There are ways around this, but they call for more effort; it would be neat to have support for pickle.

@skyreflectedinmirrors

@MicaelBoulet agreed -- pickle support would be huge for multiprocessing (and much more efficient than having to reinitialize your gas object in every new process). I think the issue may lie deeper, however: wouldn't pickling require implementations of copy constructors on the underlying C++ objects (or at the very least, a C++ __getstate__/__setstate__-like interface)?

speth (Member) commented Feb 11, 2019

I think serialization for storage and portability is distinct from serialization for parallel processing. In the latter case, you want to limit the data being transferred. For a Solution object in Python, this is already achieved by the (temperature, density, mass fractions) vector available as phase.state. If you actually transmitted a more complete representation, e.g. all of the thermo and reaction data, that would probably be comparable in cost to constructing the object from an input file, which you really want to do only once per thread/process.
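
As an illustration, a minimal sketch of passing only that compact state between two Solution objects (assuming the GRI-Mech 3.0 input file shipped with Cantera; in a real application, only the state array would cross the process boundary):

```python
import cantera as ct

# Each process constructs its own Solution from the input file, once.
gas_a = ct.Solution("gri30.yaml")
gas_b = ct.Solution("gri30.yaml")

gas_a.TPX = 1500.0, ct.one_atm, "CH4:1, O2:2, N2:7.52"

# Only the compact [temperature, density, mass fractions] array is
# transferred; the receiving object restores the full state from it.
state = gas_a.state
gas_b.state = state
assert abs(gas_a.T - gas_b.T) < 1e-9
```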

The C++ copy constructor doesn't really provide any utility here, since it just creates another instance of an object in the same process, not one that can be saved or transmitted.

I would also say that the originally suggested use of serialization isn't a great application -- if you're having a problem, the interesting thing isn't the end state but how you got there, for which you need the code that produced that state.

@bryanwweber (Member)

@speth It would be good to have a set of best practices for using multiprocessing (at least) in Python somewhere. I don't think I know what the best thing to do is, so I just end up creating new Solution objects in every process. Is there any way to share that object between processes? Does it even make sense to share it, if a Reactor or ReactorNet is going to modify the state during a solve? We have the multiprocessing_viscosity example, but that doesn't involve a Reactor.

speth (Member) commented Feb 12, 2019

For the common use case of doing some sort of parameter sweep, creating a new Solution object in each process would be the best thing to do even if there were full serialization support, exactly so that each process can do its work without stepping on the others. A version of the multiprocessing_viscosity.py example where the work done in each thread was a reactor calculation might make this more clear.
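
For illustration, a minimal sketch of that pattern (a hedged adaptation of the pool-initializer approach from multiprocessing_viscosity.py to a reactor calculation; the mechanism, conditions, and work function are placeholders):

```python
import multiprocessing
import cantera as ct

gas = None  # one Solution per worker process, built once by the initializer

def init_process(mech):
    global gas
    gas = ct.Solution(mech)

def final_temperature(T0):
    # Placeholder work item: advance a constant-volume reactor from the
    # given initial temperature and report the final temperature.
    gas.TPX = T0, ct.one_atm, "CH4:1, O2:2, N2:7.52"
    reactor = ct.IdealGasReactor(gas)
    net = ct.ReactorNet([reactor])
    net.advance(1.0)
    return reactor.T

if __name__ == "__main__":
    with multiprocessing.Pool(
        processes=4, initializer=init_process, initargs=("gri30.yaml",)
    ) as pool:
        print(pool.map(final_temperature, [1200.0, 1400.0, 1600.0, 1800.0]))
```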

I'm having trouble thinking of a case where a full Solution object is really what you want to pass around to a different process. A Solution object carries a lot more information than just the thermodynamic state, so it isn't the right container if that's all you need. For example, in the 1D flame classes, we use a single ThermoPhase object for all grid points, not a separate object for each point.

The one case where serializing a Solution does make sense to me is where the species/reactions present have been determined dynamically, and you want to write an input file that will recreate that same set of species and reactions. That's something that will eventually follow from the work I'm doing on Cantera/cantera#584.

ischoegl (Member) commented Aug 13, 2019

@speth ... now that yaml support is implemented with Cantera/cantera#584, is there a way to write the make-up of a Solution object (i.e. species list, reactions, etc.) back to a yaml string? If this were implemented, pickling a Solution appears feasible based on the yaml configuration plus the current state vector. Among other possible uses, this would be close to what PR Cantera/cantera#451 tried to accomplish (I believe this is what you're getting at in your last post). I may look into this once documentation for yaml is available.
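
If such an emitter existed, a __reduce__ built on the "yaml blueprint plus state vector" idea might look roughly like this (a hypothetical sketch: to_yaml_string is a placeholder for the not-yet-available emitter, and constructing a Solution from a YAML string is assumed):

```python
import cantera as ct

def to_yaml_string(sol):
    # Placeholder for the YAML emitter discussed above; it did not
    # exist at the time of this comment.
    raise NotImplementedError

def _rebuild_solution(yaml_str, state):
    # Recreate the Solution from its YAML "blueprint", then restore the
    # compact thermodynamic state vector.
    sol = ct.Solution(yaml=yaml_str)  # assumes YAML-string construction
    sol.state = state
    return sol

class PicklableSolution(ct.Solution):
    def __reduce__(self):
        return (_rebuild_solution, (to_yaml_string(self), self.state))
```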

In principle, I do agree that there is no way around reinitializing one Solution object per thread in a batch-processing task; pickling may just be another way to do that. Reactor networks are a different matter, as there is currently no machinery or agreed-upon syntax to specify their structure automatically.

speth (Member) commented Aug 13, 2019

No, I'm still working on that. It will obviously be a very useful feature, although I would still hesitate to use it to provide the implementation of __reduce__, for the reasons that I explained before.

ischoegl (Member) commented Aug 13, 2019

> No, I'm still working on that. It will obviously be a very useful feature,

agreed 👍

> I would still hesitate to use it to provide the implementation of __reduce__, for the reasons that I explained before.

I think that pickling the "blueprint" of a Solution (with subsequent instantiation), as suggested in PR Cantera/cantera#692, should be quite safe and follow the intent of your statements?

@speth speth transferred this issue from Cantera/cantera Dec 4, 2019
@speth speth changed the title Feature Request: support of pickle serialization of Cython objects in python API Support of pickle serialization of Cython objects in python API Dec 6, 2019
@speth speth added the feature-request New feature request label Dec 6, 2019
@ischoegl (Member)

Pickle support will at least be feasible once YAML serialization is implemented (see #11). As mentioned above, Cantera/cantera#692 illustrates a blueprint based on artifacts from the (soon-to-be-deprecated) CTI/XML import. The same can easily be accomplished once YAML emitters for various Cantera objects, as well as for the entire Solution object itself, become available.

There are caveats about parts of Cantera already using multiple threads internally (e.g. via Intel MKL); this can, however, be alleviated by setting the environment variable OMP_NUM_THREADS=1 (which works at least on Linux).
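
For example, pinning the thread count before the relevant imports (assuming the threading runtime reads the variable at import time):

```python
import os
os.environ["OMP_NUM_THREADS"] = "1"  # set before importing numpy/cantera

import cantera as ct  # MKL/OpenMP-backed code now runs single-threaded
```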
