Support of pickle serialization of Cython objects in python API #10

Closed
ghost opened this issue Nov 7, 2018 · 9 comments · Fixed by Cantera/cantera#692
Labels
feature-request New feature request

Comments

ghost commented Nov 7, 2018

Serialization is used to transfer objects between processes, between nodes, or even between Windows and Linux, and it is widely used in parallel program design. However, Cython objects such as ct.Solution and ct.Reaction cannot currently be serialized, because they lack pickle support.

See https://snorfalorpagus.net/blog/2016/04/16/pickling-cython-classes/. It seems straightforward to support serialization in Python using pickle.

I tried monkey patching a __reduce__() method onto the Reaction object, but Python forbids adding attributes to extension types. So I think it is better implemented in the Cython layer.
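
For reference, a minimal sketch of the pattern from the linked blog post, using a plain Python stand-in class (the class and its attributes are illustrative, not the actual ct.Reaction API):

```python
import pickle

def _rebuild_reaction(equation, rate):
    # Module-level rebuild function: pickle needs a top-level callable
    # it can look up by name when reconstructing the object.
    return Reaction(equation, rate)

class Reaction:
    """Stand-in for a Cython extension type such as ct.Reaction."""
    def __init__(self, equation, rate):
        self.equation = equation
        self.rate = rate

    def __reduce__(self):
        # Tell pickle to call _rebuild_reaction(equation, rate) on load.
        return (_rebuild_reaction, (self.equation, self.rate))

restored = pickle.loads(pickle.dumps(Reaction("H2 + O2 <=> H2O2", 1.3e11)))
assert restored.equation == "H2 + O2 <=> H2O2"
```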

Think about this scenario: I save a Solution object, or even a FreeFlame object, to a *.pkl file on Linux and send it to someone or post it on a forum. The receiver can simply load it on Windows and help me check what is going wrong.

MicaelBoulet commented Feb 11, 2019

Interesting application example, but running parallel scripts might be the true motivation for this, e.g. a large parametric study on a multicore system or a cluster. There are ways around this, but they call for more effort; it would be neat to have support for pickle.

@skyreflectedinmirrors

@MicaelBoulet agreed -- pickle support would be huge for multiprocessing (and much more efficient than having to reinitialize your gas object in every new process). I think the issue may lie deeper, however: wouldn't pickling require implementations of copy constructors on the underlying C++ objects (or at the very least, a C++ __getstate__/__setstate__-like interface)?

speth (Member) commented Feb 11, 2019

I think serialization for storage and portability is distinct from serialization for parallel processing. In the latter case, you want to limit the data being transferred. For a Solution object in Python, this is already achieved by the (temperature, density, mass fractions) vector available as phase.state. If you actually transmitted a more complete representation, e.g. all of the thermo and reaction data, that would probably be comparable in cost to constructing the object from an input file, which you really want to do only once per thread/process.
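
As an illustration, a minimal sketch of passing only that compact state between two Solution objects (assuming the GRI-Mech 3.0 input file shipped with Cantera; in a real application, only the state array would cross the process boundary):

```python
import cantera as ct

# Each process constructs its own Solution from the input file, once.
gas_a = ct.Solution("gri30.yaml")
gas_b = ct.Solution("gri30.yaml")

gas_a.TPX = 1500.0, ct.one_atm, "CH4:1, O2:2, N2:7.52"

# Only the compact [temperature, density, mass fractions] array is
# transferred; the receiving object restores the full state from it.
state = gas_a.state
gas_b.state = state
assert abs(gas_a.T - gas_b.T) < 1e-9
```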

The C++ copy constructor doesn't really provide any utility here, since it just creates another instance of an object in the same process, not one that can be saved or transmitted.

I would also say that the originally suggested use of serialization isn't a great application -- if you're having a problem, the interesting thing isn't the end state but how you got there, for which you need the code that produced that state.

@bryanwweber (Member)

@speth It would be good to have a set of best practices for using multiprocessing (at least) in Python somewhere. I don't think I know what the best thing to do is, so I just end up creating new Solution objects in every process. Is there any way to share that object between processes? Does it even make sense to share it, if a Reactor or ReactorNet is going to modify the state during a solve? We have the multiprocessing_viscosity example, but that doesn't involve a Reactor.

speth (Member) commented Feb 12, 2019

For the common use case of doing some sort of parameter sweep, creating a new Solution object in each process would be the best thing to do even if there were full serialization support, exactly so that each process can do its work without stepping on the others. A version of the multiprocessing_viscosity.py example where the work done in each thread was a reactor calculation might make this more clear.
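
For illustration, a minimal sketch of that pattern (a hedged adaptation of the pool-initializer approach from multiprocessing_viscosity.py to a reactor calculation; the mechanism, conditions, and work function are placeholders):

```python
import multiprocessing
import cantera as ct

gas = None  # one Solution per worker process, built once by the initializer

def init_process(mech):
    global gas
    gas = ct.Solution(mech)

def final_temperature(T0):
    # Placeholder work item: advance a constant-volume reactor from the
    # given initial temperature and report the final temperature.
    gas.TPX = T0, ct.one_atm, "CH4:1, O2:2, N2:7.52"
    reactor = ct.IdealGasReactor(gas)
    net = ct.ReactorNet([reactor])
    net.advance(1.0)
    return reactor.T

if __name__ == "__main__":
    with multiprocessing.Pool(
        processes=4, initializer=init_process, initargs=("gri30.yaml",)
    ) as pool:
        print(pool.map(final_temperature, [1200.0, 1400.0, 1600.0, 1800.0]))
```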

I'm having trouble thinking of a case where a full Solution object is really what you want to pass around to a different process. A Solution object carries a lot more information than just the thermodynamic state, so it isn't the right container if that's all you need. For example, in the 1D flame classes, we use a single ThermoPhase object for all grid points, not a separate object for each point.

The one case where serializing a Solution does make sense to me is where the species/reactions present have been determined dynamically, and you want to write an input file that will recreate that same set of species and reactions. That's something that will eventually follow from the work I'm doing on Cantera/cantera#584.

ischoegl (Member) commented Aug 13, 2019

@speth ... now that yaml support is implemented with Cantera/cantera#584, is there a way to write the make-up of a Solution object (i.e. species list, reactions, etc.) back to a yaml string? If this were implemented, pickling a Solution appears feasible based on the yaml configuration plus the current state vector. Among other possible uses, this would be close to what PR Cantera/cantera#451 tried to accomplish (I believe this is what you're getting at in your last post). I may look into this once documentation for yaml is available.
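
If such an emitter existed, a __reduce__ built on the "yaml blueprint plus state vector" idea might look roughly like this (a hypothetical sketch: to_yaml_string is a placeholder for the not-yet-available emitter, and constructing a Solution from a YAML string is assumed):

```python
import cantera as ct

def to_yaml_string(sol):
    # Placeholder for the YAML emitter discussed above; it did not
    # exist at the time of this comment.
    raise NotImplementedError

def _rebuild_solution(yaml_str, state):
    # Recreate the Solution from its YAML "blueprint", then restore the
    # compact thermodynamic state vector.
    sol = ct.Solution(yaml=yaml_str)  # assumes YAML-string construction
    sol.state = state
    return sol

class PicklableSolution(ct.Solution):
    def __reduce__(self):
        return (_rebuild_solution, (to_yaml_string(self), self.state))
```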

In principle, I do agree that there is no way around reinitializing one Solution object per thread in a batch-processing task; pickling may just be another way to do that. Reactor networks are a different matter, as there is currently no machinery or agreed-upon syntax to specify their structure automatically.

speth (Member) commented Aug 13, 2019

No, I'm still working on that. It will obviously be a very useful feature, although I would still hesitate to use it to provide the implementation of __reduce__, for the reasons that I explained before.

ischoegl (Member) commented Aug 13, 2019

> No, I'm still working on that. It will obviously be a very useful feature,

agreed 👍

> I would still hesitate to use it to provide the implementation of __reduce__, for the reasons that I explained before.

I think that pickling the "blueprint" of a Solution (with subsequent instantiation), as suggested in PR Cantera/cantera#692, should be quite safe and follow the intent of your statements?

@speth speth transferred this issue from Cantera/cantera Dec 4, 2019
@speth speth changed the title Feature Request: support of pickle serialization of Cython objects in python API Support of pickle serialization of Cython objects in python API Dec 6, 2019
@speth speth added the feature-request New feature request label Dec 6, 2019
@ischoegl (Member)

Pickle support will at least be feasible once YAML serialization is implemented (see #11). As mentioned above, Cantera/cantera#692 illustrates a blueprint based on artifacts from the (soon-to-be-deprecated) CTI/XML import. The same can easily be accomplished once YAML emitters for various Cantera objects, as well as for the entire Solution object itself, become available.

There are caveats about parts of Cantera already using multiple threads internally (e.g. via Intel MKL); this can, however, be alleviated by setting the environment variable OMP_NUM_THREADS=1 (which works at least on Linux).
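
For example, pinning the thread count before the relevant imports (assuming the threading runtime reads the variable at import time):

```python
import os
os.environ["OMP_NUM_THREADS"] = "1"  # set before importing numpy/cantera

import cantera as ct  # MKL/OpenMP-backed code now runs single-threaded
```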
