Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a flag to normalize the output (remove machine-specific data) #168

Closed
itraviv opened this issue Nov 14, 2019 · 9 comments
Closed

Add a flag to normalize the output (remove machine-specific data) #168

itraviv opened this issue Nov 14, 2019 · 9 comments

Comments

@itraviv
Copy link
Contributor

itraviv commented Nov 14, 2019

We came up with a simple but strong usage – diff between traces.
We use it to find bugs or regressions between git-branches and code sections.
However, we deal with removing all the machine-specific data (time stamps, reprs containing memory addresses, absolute paths, thread data etc.).

I thought about adding a flag called “normalize” that will do it.
for example:
the trace without changes (normalize=False):

Source path:... C:\[my-absolute-path]\PySnooper\tests\test_pysnooper.py
Starting var:.. A = <class 'tests.test_pysnooper.test_see_output.<locals>.A'>
10:02:07.262743 call      1555     def stam():
10:02:07.263744 line      1556         a = A(19)
New var:....... a = <tests.test_pysnooper.test_see_output.<locals>.A object at 0x0000023BEB538710>
10:02:07.263744 line      1557         b = A(22)
New var:....... b = <tests.test_pysnooper.test_see_output.<locals>.A object at 0x0000023BEB538518>
10:02:07.263744 line      1558         res = a.a + b.a
New var:....... res = 41
10:02:07.263744 line      1559         return res
10:02:07.263744 return    1559         return res
Return value:.. 41

normalized trace (normalize=True):

Source path:... test_pysnooper.py
Starting var:.. A = <class 'tests.test_pysnooper.test_see_output.<locals>.A'>
 call      1555     def stam():
 line      1556         a = A(19)
New var:....... a = <tests.test_pysnooper.test_see_output.<locals>.A object>
 line      1557         b = A(22)
Modified var:.. a = <tests.test_pysnooper.test_see_output.<locals>.A object>
New var:....... b = <tests.test_pysnooper.test_see_output.<locals>.A object>
 line      1558         res = a.a + b.a
Modified var:.. a = <tests.test_pysnooper.test_see_output.<locals>.A object>
Modified var:.. b = <tests.test_pysnooper.test_see_output.<locals>.A object>
New var:....... res = 41
 line      1559         return res
Modified var:.. a = <tests.test_pysnooper.test_see_output.<locals>.A object>
Modified var:.. b = <tests.test_pysnooper.test_see_output.<locals>.A object>
 return    1559         return res
Return value:.. 41
@cool-RR
Copy link
Owner

cool-RR commented Nov 15, 2019

I like this idea. Do you have code that produces the output you included? If so please include it, possibly as a draft PR just for discussion.

I'd like you to list all the different items that should be dealt with, and how they will be dealt with. For example, timestamps are an easy case, because you just omit them. Memory addresses are a little harder, because you need to choose whether you'll switch the __repr__ method or post-process the output. Absolute paths are another thing where you'll need to decide what you're doing.

@alexmojaki
Copy link
Collaborator

Is there any part of this that can't be done by post-processing the output? This is literally what's done in some of the tests.

@cool-RR
Copy link
Owner

cool-RR commented Nov 15, 2019

Post-processing might work, but if you mean that we'll just leave the post-processing to the users, that means that many users are not gonna do it because they won't want to write that post-processing algorithm.

@alexmojaki
Copy link
Collaborator

No but I think it makes more sense as a separate function than a flag.

@cool-RR
Copy link
Owner

cool-RR commented Nov 15, 2019

Advantage of having a separate function: Avoids adding more complexity to our main algorithm.

Disadvantages of having a separate function:

  1. Post-processing logic is more difficult, might not work for all cases and testing might not be able to reveal that.
  2. More difficult for users to discover.

So I still support adding this as a flag-- Assuming that Itamar can give good answers to my question above, and write a good implementation.

@itraviv
Copy link
Contributor Author

itraviv commented Nov 15, 2019

I'll have everything written (answers to your questions and implementation ideas) in 24h

@itraviv
Copy link
Contributor Author

itraviv commented Nov 16, 2019

  • timestamps - if normalize=True use an empty string.
  • source path - use only the filename (use basename on the full path). thought about calculating some relative path (to some root point for example) but I don't know how to do it without extra knowledge and I don't want the user to give extra input and I found it useful enough this way.
  • repr - I think it'll be rude to override the user-defined __repr__ (if he explicitly wanted some kind of __repr__ I wouldn't override it). So basically now I am only overriding the default __repr__ with post-processing and cutting the memory address part (where at 0x... appears).
  • thread data - using normalize=True will force thread_info=False
  • max variable length - I think we should force some value to always get the same output.

Everything will be documented as detailed as possible (I'll add detailed info to the docs).

Code snippets for discussion:

now_string = pycompat.time_isoformat(now, timespec='microseconds') if not self.normalize else ''
source_path = source_path if not self.normalize else os.path.basename(source_path)
value_repr = utils.normalize_repr(value_repr)

Where normalize_repr is:
4.

def normalize_repr(item_repr):
    parts = item_repr.partition(' at')
    if parts[1]:
        return parts[0] + '>'
    return parts[0]

This is a naive implementation for truncating the default __repr__ at the 'at' word. i will gladly improve it.

In general - I tried to use the most efficient logic I could think of since I don't want to increase
the runtime.

@cool-RR
Copy link
Owner

cool-RR commented Nov 16, 2019

timestamps - if normalize=True use an empty string.

Agreed.

source path - use only the filename (use basename on the full path). thought about calculating some relative path (to some root point for example) but I don't know how to do it without extra knowledge and I don't want the user to give extra input and I found it useful enough this way.

Agreed.

repr - I think it'll be rude to override the user-defined __repr__ (if he explicitly wanted some kind of __repr__ I wouldn't override it). So basically now I am only overriding the default __repr__ with post-processing and cutting the memory address part (where at 0x... appears).

Agreed.

  • thread data - using normalize=True will force thread_info=False

Hmm, instead, when normalize=True and thread_info=True raise a NotImplementedError that explains that using these flags together isn't implemented.

  • max variable length - I think we should force some value to always get the same output.

I disagree, people who are savvy enough to specify a different max_variable_length should know that they need to set the same number on both runs.

Everything will be documented as detailed as possible (I'll add detailed info to the docs).

Code snippets for discussion:

now_string = pycompat.time_isoformat(now, timespec='microseconds') if not self.normalize else ''
source_path = source_path if not self.normalize else os.path.basename(source_path)
value_repr = utils.normalize_repr(value_repr)

Where normalize_repr is:
4.

def normalize_repr(item_repr):
    parts = item_repr.partition(' at')
    if parts[1]:
        return parts[0] + '>'
    return parts[0]

This is a naive implementation for truncating the default __repr__ at the 'at' word. i will gladly improve it.

In general - I tried to use the most efficient logic I could think of since I don't want to increase
the runtime.

Looks good, don't forget adding tests, I'll wait for your PR.

@cool-RR
Copy link
Owner

cool-RR commented Nov 19, 2019

Merged in #171

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants