Add a flag to normalize the output (remove machine-specific data) #168

itraviv · 2019-11-14T08:16:47Z

We came up with a simple but strong usage – diff between traces.
We use it to find bugs or regressions between git-branches and code sections.
However, we deal with removing all the machine-specific data (time stamps, reprs containing memory addresses, absolute paths, thread data etc.).

I thought about adding a flag called “normalize” that will do it.
for example:
the trace without changes (normalize=False):

Source path:... C:\[my-absolute-path]\PySnooper\tests\test_pysnooper.py
Starting var:.. A = <class 'tests.test_pysnooper.test_see_output.<locals>.A'>
10:02:07.262743 call      1555     def stam():
10:02:07.263744 line      1556         a = A(19)
New var:....... a = <tests.test_pysnooper.test_see_output.<locals>.A object at 0x0000023BEB538710>
10:02:07.263744 line      1557         b = A(22)
New var:....... b = <tests.test_pysnooper.test_see_output.<locals>.A object at 0x0000023BEB538518>
10:02:07.263744 line      1558         res = a.a + b.a
New var:....... res = 41
10:02:07.263744 line      1559         return res
10:02:07.263744 return    1559         return res
Return value:.. 41

normalized trace (normalize=True):

Source path:... test_pysnooper.py
Starting var:.. A = <class 'tests.test_pysnooper.test_see_output.<locals>.A'>
 call      1555     def stam():
 line      1556         a = A(19)
New var:....... a = <tests.test_pysnooper.test_see_output.<locals>.A object>
 line      1557         b = A(22)
Modified var:.. a = <tests.test_pysnooper.test_see_output.<locals>.A object>
New var:....... b = <tests.test_pysnooper.test_see_output.<locals>.A object>
 line      1558         res = a.a + b.a
Modified var:.. a = <tests.test_pysnooper.test_see_output.<locals>.A object>
Modified var:.. b = <tests.test_pysnooper.test_see_output.<locals>.A object>
New var:....... res = 41
 line      1559         return res
Modified var:.. a = <tests.test_pysnooper.test_see_output.<locals>.A object>
Modified var:.. b = <tests.test_pysnooper.test_see_output.<locals>.A object>
 return    1559         return res
Return value:.. 41

The text was updated successfully, but these errors were encountered:

cool-RR · 2019-11-15T10:01:25Z

I like this idea. Do you have code that produces the output you included? If so please include it, possibly as a draft PR just for discussion.

I'd like you to list all the different items that should be dealt with, and how they will be dealt with. For example, timestamps are an easy case, because you just omit them. Memory addresses are a little harder, because you need to choose whether you'll switch the __repr__ method or post-process the output. Absolute paths are another thing where you'll need to decide what you're doing.

alexmojaki · 2019-11-15T10:05:06Z

Is there any part of this that can't be done by post-processing the output? This is literally what's done in some of the tests.

cool-RR · 2019-11-15T10:07:45Z

Post-processing might work, but if you mean that we'll just leave the post-processing to the users, that means that many users are not gonna do it because they won't want to write that post-processing algorithm.

alexmojaki · 2019-11-15T10:22:56Z

No but I think it makes more sense as a separate function than a flag.

cool-RR · 2019-11-15T12:40:02Z

Advantage of having a separate function: Avoids adding more complexity to our main algorithm.

Disadvantages of having a separate function:

Post-processing logic is more difficult, might not work for all cases and testing might not be able to reveal that.
More difficult for users to discover.

So I still support adding this as a flag-- Assuming that Itamar can give good answers to my question above, and write a good implementation.

itraviv · 2019-11-15T12:46:35Z

I'll have everything written (answers to your questions and implementation ideas) in 24h

itraviv · 2019-11-16T16:51:49Z

timestamps - if normalize=True use an empty string.
source path - use only the filename (use basename on the full path). thought about calculating some relative path (to some root point for example) but I don't know how to do it without extra knowledge and I don't want the user to give extra input and I found it useful enough this way.
repr - I think it'll be rude to override the user-defined __repr__ (if he explicitly wanted some kind of __repr__ I wouldn't override it). So basically now I am only overriding the default __repr__ with post-processing and cutting the memory address part (where at 0x... appears).
thread data - using normalize=True will force thread_info=False
max variable length - I think we should force some value to always get the same output.

Everything will be documented as detailed as possible (I'll add detailed info to the docs).

Code snippets for discussion:

now_string = pycompat.time_isoformat(now, timespec='microseconds') if not self.normalize else ''

source_path = source_path if not self.normalize else os.path.basename(source_path)

value_repr = utils.normalize_repr(value_repr)

Where normalize_repr is:
4.

def normalize_repr(item_repr):
    parts = item_repr.partition(' at')
    if parts[1]:
        return parts[0] + '>'
    return parts[0]

This is a naive implementation for truncating the default __repr__ at the 'at' word. i will gladly improve it.

In general - I tried to use the most efficient logic I could think of since I don't want to increase
the runtime.

cool-RR · 2019-11-16T17:22:12Z

timestamps - if normalize=True use an empty string.

Agreed.

source path - use only the filename (use basename on the full path). thought about calculating some relative path (to some root point for example) but I don't know how to do it without extra knowledge and I don't want the user to give extra input and I found it useful enough this way.

Agreed.

repr - I think it'll be rude to override the user-defined __repr__ (if he explicitly wanted some kind of __repr__ I wouldn't override it). So basically now I am only overriding the default __repr__ with post-processing and cutting the memory address part (where at 0x... appears).

Agreed.

thread data - using normalize=True will force thread_info=False

Hmm, instead, when normalize=True and thread_info=True raise a NotImplementedError that explains that using these flags together isn't implemented.

max variable length - I think we should force some value to always get the same output.

I disagree, people who are savvy enough to specify a different max_variable_length should know that they need to set the same number on both runs.

Everything will be documented as detailed as possible (I'll add detailed info to the docs).

Code snippets for discussion:
now_string = pycompat.time_isoformat(now, timespec='microseconds') if not self.normalize else ''
source_path = source_path if not self.normalize else os.path.basename(source_path)
value_repr = utils.normalize_repr(value_repr)
Where normalize_repr is:
4.
def normalize_repr(item_repr):
    parts = item_repr.partition(' at')
    if parts[1]:
        return parts[0] + '>'
    return parts[0]
This is a naive implementation for truncating the default __repr__ at the 'at' word. i will gladly improve it.

In general - I tried to use the most efficient logic I could think of since I don't want to increase
the runtime.

Looks good, don't forget adding tests, I'll wait for your PR.

cool-RR · 2019-11-19T17:08:48Z

Merged in #171

cool-RR closed this as completed Nov 19, 2019

alexmojaki mentioned this issue Oct 9, 2021

Add a comparison mode: disable timestamps, obscure 0xblah repr pointers, & canonicalize pathnames alexmojaki/snoop#46

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a flag to normalize the output (remove machine-specific data) #168

Add a flag to normalize the output (remove machine-specific data) #168

itraviv commented Nov 14, 2019

cool-RR commented Nov 15, 2019

alexmojaki commented Nov 15, 2019

cool-RR commented Nov 15, 2019

alexmojaki commented Nov 15, 2019

cool-RR commented Nov 15, 2019

itraviv commented Nov 15, 2019

itraviv commented Nov 16, 2019

cool-RR commented Nov 16, 2019

cool-RR commented Nov 19, 2019

Add a flag to normalize the output (remove machine-specific data) #168

Add a flag to normalize the output (remove machine-specific data) #168

Comments

itraviv commented Nov 14, 2019

cool-RR commented Nov 15, 2019

alexmojaki commented Nov 15, 2019

cool-RR commented Nov 15, 2019

alexmojaki commented Nov 15, 2019

cool-RR commented Nov 15, 2019

itraviv commented Nov 15, 2019

itraviv commented Nov 16, 2019

cool-RR commented Nov 16, 2019

cool-RR commented Nov 19, 2019