Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YTEP-0037: Code Styling #14

Merged
merged 19 commits into from
Jul 16, 2020
Merged

YTEP-0037: Code Styling #14

merged 19 commits into from
Jul 16, 2020

Conversation

neutrinoceros
Copy link
Member

@neutrinoceros neutrinoceros commented May 19, 2020

In this YTEP, I propose we start using black and isort to automate code formatting as part of the linting process with flake8.

Copy link
Member

@matthewturk matthewturk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot. The only thing that I think would help with this is a brief description of both how (or if) black handles cython code, and how we could manage existing PRs that have not been merged yet. Can we just run black on the PR?

A proof of concept for this is `#2598 <https://github.com/yt-project/yt/pull/2598>`_,
where CI builds are run correctly across all tested python versions (3.6, 3.7, 3.8).

A serious counter-argument to applying black is that it implies messing up with ``git
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we have done in the past in faked the authorship of those commits as something like "svn-convert" or something; I forget the specific one we did now. That would at least signal they weren't to be regarded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work for me. I don't want my GitHub profile to show ~20k lines of code "contributed", hiding the real contributions :)

88, as its authors claim it reduces the total number of lines by some 10% (as compared
to enforcing 80).

In first drafting the PR linked above, I chose a line-lenght of 100, so as to minimize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should go with something in line with either stdlib or pandas. I'd be fine with 79 or 80.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would align with black’s default, which is also compatible with Raymond Hettinger’ style of «90-ish». It makes for much shorter files (according to black).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd work for me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with 88.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw I double checked and actually pandas uses 88, not 80 as I previously reported.

Co-authored-by: Matthew Turk <[email protected]>
@neutrinoceros
Copy link
Member Author

neutrinoceros commented May 19, 2020

how (or if) black handles cython code

it doesn't. I have no opinion on what should be done with the Cython part of the library : either leave it roam free or impose max line width on a self-discipline basis. I'm adding this to the YTEP.

@neutrinoceros
Copy link
Member Author

neutrinoceros commented May 19, 2020

how we could manage existing PRs that have not been merged yet. Can we just run black on the PR?

One way to do it would be to ask PRs authors to:

  • rebase to the target branch
  • run black
  • force update the PR

though I'm nowhere near confident enough in that it would not create merge conflict, and we definitely don't want to put contributors in that situation.

I think a safer approach would be to provide a script/command line to apply black only to the lines they've updated, though I need to make some research on how to write this script.
@matthewturk

@neutrinoceros
Copy link
Member Author

@matthewturk it took a bit of learning in the subtleties of git merge but I think I came up with a satisfying startegy to handle the transition with existing PRs.

Copy link
Member

@brittonsmith brittonsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a very good proposal and everything you have down here I think it worth doing. In my opinion, one additional major point of discussion is when this would all be done, e.g., just before/after a yt-4.0 release. The how is also important, specifically whether it would be done all at once or piecemeal and possibly in a group event like a PR triage.


Moreover, ``isort`` is configurable so that it allows for the definition of custom
"sections" within import statements. This can be use to isolate imports from ``unyt``,
which falls somewhere in between the default sections "third party" (external
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I think we can treat unyt as a regular third party module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is up for discussion, of course ! I have no strong opinion myself.

Co-authored-by: Britton Smith <[email protected]>
@neutrinoceros
Copy link
Member Author

neutrinoceros commented May 20, 2020

In my opinion, one additional major point of discussion is when this would all be done, e.g., just before/after a yt-4.0 release.

Absolutely. My main concern with this is to avoid making @matthewturk 's work harder with the yt-4.0 -> master merge.

The how is also important, specifically whether it would be done all at once or piecemeal and possibly in a group event like a PR triage.

The shorter the transition, the easier, so I think that most of the PRs could be merged in a very narrow time window (a day or two), provided the appropriate conditions. However, because we want to ensure that each step passes the tests, which typically takes a least an hour or two per step, I propose that prep steps be done separately, and the big one (blackening) happen on a meeting.

A possible roadmap:

pre meeting

  • settle on a maximal line length and the status of unyt ("second" or third party)
  • merge isort pass on the code base + CI check + doc
  • optional (needs approval) merge Internally import from unyt yt#2597
  • merge (needs tweaking) Add pyproject.toml (YTEP0037 3/6) yt#2598
  • rebase blackening PR on the target branch (yt-4.0 ?) and prepare it with agreed line length
  • provided all CI check pass and the PR is reviewed & approved, this goes to a PR triage meeting

on the meeting

  • merge blackening + manual fixups + CI checks + doc
  • signal to open PR authors that they should apply black (see transitioning strategy)

can be done later

@neutrinoceros
Copy link
Member Author

@brittonsmith do you think it’s appropriate to include the roadmap in the YTEP directly ?

@munkm
Copy link
Member

munkm commented May 20, 2020

I haven't gotten to look too deeply at the YTEP yet, but what do you think about making this ytep even more broad to encompass all of your code styling PRs (like the imports) and other future styling proposals? This would be a nice place for all of them!

@neutrinoceros
Copy link
Member Author

@munkm the import stuff is already in here :)
I would gladly add other stylistic proposals if you guys want to add any. Right now I do not have more to contribute.

@munkm
Copy link
Member

munkm commented May 20, 2020

@munkm the import stuff is already in here :)

Ha! sorry about that then. Don't mind me!

@neutrinoceros
Copy link
Member Author

@munkm actually you got me thinking there's an important point I haven't covered at all: docstrings ! It seems to me that black will never update them (other than unirformizing their delimiters to """), so if we want to apply the line-length limit there, we need an additional validator. Pandas happens to have a script just for that here
https://github.com/pandas-dev/pandas/blob/master/scripts/validate_docstrings.py

Since they're also using the numpy convention for dosctrings, I'm hoping the script could be fairly easily adapted for yt.

@brittonsmith
Copy link
Member

@neutrinoceros, I'm certainly open to other opinions, but I do think the roadmap has a place in the ytep itself. I think the ytep should more or less contain all the major decisions on which consensus is sought.

@neutrinoceros
Copy link
Member Author

@brittonsmith here you go. I also added the docstring stuff, and I also propose we use a flake8 plugin (https://github.com/PyCQA/flake8-bugbear) to enforce additional rules.

.. _additional rules:
**additional rules & flake8 pluggings**

Since the oldest python version supported (as of yt-4.0) is 3.6, it means we can start
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setup.py indicates that we still support Python 3.5. I don't have much problem increasing that to 3.6, but I think this warrants a discussion about whether it's better to do this pre or post releasing yt-4.0.

Copy link
Member Author

@neutrinoceros neutrinoceros May 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right, but there is room for interpretation here. We don't run tests for any version older that the latest 3.6.x, hence my phrasing.
The fstring experiment I'm running in yt-project/yt#2605 has some flaws anyway, since the tool (flynt) is fairly new.
This part of the YTEP can absolutely be postponed.

Copy link
Member

@brittonsmith brittonsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this all looks good to me. Thanks for adding the roadmap section.

Independent of this being merged, I'd like to hear @matthewturk's thoughts about whether this should be done before or after a stable release of yt-4.0.

@matthewturk
Copy link
Member

@brittonsmith somehow I missed your question. I think we should do this before a stable 4.0 release. And, now that we've got 4.0 into the default branch, I think it might be time to finish up approving this.

@neutrinoceros
Copy link
Member Author

So in order to resolve this here's what needs to be agreed on (with my own suggestion in parenthesis)

  • what's the maximal line-length we want to enforce ? (88)
  • should unyt be treated as second party or third party by the import sorter ? (second)

the first point is crucial. The second one is not, but we're currently sitting on a 1 VS 1 vote so I'm soliciting additional votes here.

@Xarthisius
Copy link
Member

So in order to resolve this here's what needs to be agreed on (with my own suggestion in parenthesis)

  • what's the maximal line-length we want to enforce ? (88)
  • should unyt be treated as second party or third party by the import sorter ? (second)

the first point is crucial. The second one is not, but we're currently sitting on a 1 VS 1 vote so I'm soliciting additional votes here.

I'd vote for 88 and unyt as 3rd party.

Copy link
Member

@munkm munkm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really love the idea of enforcing our style and adding checks to our CI to support them. Thank you for pushing all of these things through!!! 🎉 🥇 🖤 <--- maybe we should use a black heart emoji for the black checks?? 😉

I've left a few questions here I'd like to get some clarification about!

`#2592 <https://github.com/yt-project/yt/pull/2592>`_.

To better highlight the way yt 4.0 depends on ``unyt``, I also propose that, within the
code base, we import directly from ``unyt`` as often as possible, so as to limit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so with unyt here do we want to create some specific guidelines beyond "as often as possible"? Where would we recommend importing from yt.units over unyt and vice-versa? Maybe @jzuhone has some opinions on this too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we shouldn't always use yt.units instead? That way we're always consistent with our units.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my guidelines would be very simple (following what I've done in yt-project/yt#2597):

  • YTArray and YTQuantity can be imported from yt.units
  • all the rest should be imported from unyt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I'm not convinced about this yet and I think we should have more discussion about it. Why would importing from unyt directly be better than using yt.units? unyt's default units (mks) are in a different system than yt's (cgs). Won't we increase the probability of getting weird conflicts if we do imports of both?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't we increase the probability of getting weird conflicts if we do imports of both?

tbh I didn't think that what even possible. My mental model for yt.units is purely a wrapper module that we merely keep for convenience (avoid breaking downstream code if we don't need to). If this view is incorrect in any way (which seems to be the case with cgs VS mks systems), then it's expected that my recommandations don't make much sense. :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to keep yt.units at all? I thought the whole point of splitting out unyt was for yt to fully depend on it. It's fine to keep some sort of thin wrapper to provide backward compatibility, but shouldn't we "just" switch? Even if that means embracing MKS everywhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we wanted to minimize back compat breaks for users who were already burned by the yt-2 to yt-3 transition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in 4.0, which I believe is the target here, most/all of yt.units now is a thin-layer. And that means, among other things, that they have MKS base:

>>> yt.units.cm.base_value          
0.01

There are a handful of additional things still included, like UnitContainer, display_ytarray, some create_code_unit_system stuff and a default registry that includes "h" but for the most part it now imports directly from unyt.

Copy link
Member

@ngoldbaum ngoldbaum Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So base_value is the value in unyt's internal unit system, which was indeed changed to an MKS unit system. However, the internal unit system is more of an implementation detail, and is not really exposed in the API beyond the base_value attribute Matt pointed out, unit objects each refer to a unit system distinct from the internal unit system that's used to generate default values for the results of operations (e.g. for unit simplication). The units in the yt namespace have a different unit system from the ones in the unyt namespace:

In [3]: yt.units.cm.registry.unit_system
Out[3]:
cgs Unit System
 Base Units:
  length: cm
  mass: g
  time: s
  temperature: K
  angle: rad
  luminous_intensity: cd
  logarithmic: Np
 Other Units:
  energy: erg
  specific_energy: erg/g
  pressure: dyn/cm**2
  force: dyn
  magnetic_field_cgs: G
  charge_cgs: statC
  current_cgs: statA
  power: erg/s

In [5]: unyt.cm.registry.unit_system
Out[5]:
mks Unit System
 Base Units:
  length: m
  mass: kg
  time: s
  temperature: K
  angle: rad
  current_mks: A
  luminous_intensity: cd
  logarithmic: Np
 Other Units:
  energy: J
  specific_energy: J/kg
  pressure: Pa
  force: N
  magnetic_field: T
  charge: C
  frequency: Hz
  power: W
  electric_potential: V
  capacitance: F
  inductance: H
  resistance: Ω
  magnetic_flux: Wb
  luminous_flux: lm

We did it this way to make it easier to keep backward compatibility in yt.

If you're interested in more detail about this in a slightly more abstract context, both the yt namespace and yt Dataset instances implement the pattern described in the unyt docs here: https://unyt.readthedocs.io/en/stable/usage.html#custom-unit-systems

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @ngoldbaum for the clarifications !

This means that if internally in yt a unit ultimately came from the unyt namespace, a user might end up with a result that will ultimately be associated with an MKS unit system, perhaps causing confusion or even outright buggy behavior.

I think in 4.0, which I believe is the target here, most/all of yt.units now is a thin-layer. And that means, among other things, that they have MKS base
There are a handful of additional things still included, like UnitContainer, display_ytarray, some create_code_unit_system stuff and a default registry that includes "h" but for the most part it now imports directly from unyt.

In yt-project/yt#2597 I only tried to avoid the additional import layer, I don't think any behaviour (internal or external) should be affected ?

Why do we want to keep yt.units at all? I thought the whole point of splitting out unyt was for yt to fully depend on it. It's fine to keep some sort of thin wrapper to provide backward compatibility, but shouldn't we "just" switch? Even if that means embracing MKS everywhere?

Because we wanted to minimize back compat breaks for users who were already burned by the yt-2 to yt-3 transition.

But is yt.units planned for deprecation at some point ? I don't know if we're the 3->4 transition will be comparable to 2->3 in terms of backward compatibility breaks, so maybe it's a good time to at least add deprecation warnings (or even get rid of the wrapper completely and just keep the yt-specific bits) ?

In any case, I feel like everyone here agrees it'd be best to avoid mixing imports from both unyt and yt.units
Please correct me if I'm mistaken but it seemed to me that the current state of yt.unit was meant to be transitory, and that yt.units would ultimately disappear in favour of unyt.

straight-forward option configuration to validate docstrings are numpy-styled. However,
there is currently a very large debt in errors caught by this tool, and no way to
automatically solve them. However, it could still be added to our linting CI, if check
for *new* errors only, such as
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using this to check for new errors introduced in PRs is a good idea.

@munkm
Copy link
Member

munkm commented Jun 26, 2020

Ok, it seems like everything in this YTEP is agreed on except the unyt imports. Is unyt importing even a style change? Maybe we could table that discussion and start an issue in the yt main repo about it and keep it out of this ytep?

I propose that we remove the unyt-related import changes from this YTEP and add them to a new issue in the yt repo, then merge this YTEP once we have a decision on the time and date of the dedicated maintainer meeting (as mentioned in the schedule of this YTEP). Once that time is decided, then we can update this PR with the time of the meeting and send out specific connection details to the mailing list and slack, and merge the YTEP.

Does this seem reasonable?

@matthewturk
Copy link
Member

matthewturk commented Jun 26, 2020 via email

@neutrinoceros
Copy link
Member Author

@munkm I fully agree with this. Keeping the controversial stuff out of the YTEP is an excellent way forward. Should I update this PR accordingly right now ?

@munkm
Copy link
Member

munkm commented Jun 26, 2020

Yeah I think that's a good idea! I'm happy to approve it once they're removed. (I'd rename the unyt-related PRs so they don't have the YTEP in the name too.)

@cphyc cphyc merged commit cbe0299 into yt-project:master Jul 16, 2020
@neutrinoceros neutrinoceros deleted the ytep_37 branch July 21, 2020 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants