Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print non-printable characters using caret notation #2443

Merged
merged 18 commits into from
Mar 14, 2023

Conversation

einfachIrgendwer0815
Copy link
Contributor

Implements feature request #2429.

When using --show-all, non-printable characters are printed using the Unicode Control Pictures. This pr adds the flag --caret-notation, which can only be used together with --show-all. When --caret-notation is set, non-printable characters are printed using caret notation instead.

For example, without --caret-notation:

This·is·an·example·text.␊␍

With --caret-notation:

This·is·an·example·text.^J^M

Instead of --caret-notation, the alias --caret can be used too.

When this flag is set, non-printable characters are
printed using caret notation.
Copy link
Collaborator

@Enselic Enselic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks sensible on first glance! Before a detailed review, please also add regression tests to integration_tests.rs

'\x1B' => output.push_str("^["),

// everything else
c => output.push_str(&c.escape_unicode().collect::<String>())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reduce the diff and levels of indentation I would prefer if you added if use_caret to each existing character arm instead is of doing the "refactoring" in the current PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I don't quite understand your comment. So should I refactor it or not?

Copy link
Collaborator

@Enselic Enselic Jan 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being vague. I mean that I would prefer e.g.

'\x1B' => output.push('␛'),

to be changed to

'\x1B' => output.push_str(if use_caret { "^[" } else  { "␛" }),

(+ formatted however cargo fmt prefers it)

Repository owner deleted a comment from leandro86elpiojo Jan 9, 2023
@Enselic
Copy link
Collaborator

Enselic commented Jan 15, 2023

Thank you for the update. I will review as soon as I get some time over.

Copy link
Collaborator

@Enselic Enselic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! To be cautious, I would like us to not have the --caret alias, at least not in the initial version. Otherwise this is Approved from me.

.arg(
Arg::new("caret-notation")
.long("caret-notation")
.alias("caret")
Copy link
Collaborator

@Enselic Enselic Jan 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we didn't make --caret a short form of --caret-notation. "caret notation" indeed seems to be the official term for this, but I don't think --caret is clear enough on its own. I wouldn't rule out us adding an alias in the future, but since it is much easier to add stuff than to remove stuff, I would prefer if we started out without it.

Enselic
Enselic previously approved these changes Jan 28, 2023
Copy link
Collaborator

@Enselic Enselic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@sharkdp
Copy link
Owner

sharkdp commented Jan 29, 2023

This looks good - thank you! Just one question regarding the CLI. To avoid introducing (1) a new option and (2) a new option that can only be used together with another option, we could also think about modifying the --show-all option. We could change it such that it takes a value (the "non printable character style"). For now, this could have two options: unicode and caret/caret-notation. The default could be unicode to maintain backwards-compatibility.

Downside: this might break cat compatibility in some edge cases?!

Another downside is that it looks a bit strange: --show-all=caret seems to imply something else (show all carets). If we like the idea above, another possibility would be to introduce a new option (like suggested here), but a bit more future-proof. Something like --show-nonprintables-as=unicode/caret. In this case, --show-all would be an alias for --show-nonprintables=unicode.

What do others think of this option?

@Enselic
Copy link
Collaborator

Enselic commented Jan 29, 2023

I think we should leave --show-all as is for compatibility with cat.

But I like the idea of making the name of the new option more future-proof. Both OpenBSD cat and GNU cat use the terminology nonprinting, so I propose that we rename --caret-notation to --nonprinting-notation=caret, with --nonprinting-notation=unicode being default.

@Enselic Enselic dismissed their stale review January 29, 2023 14:27

Sorry for changing my mind, but I think it would be good to make the new option future proof!

Copy link
Collaborator

@Enselic Enselic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this will affect the public API of the bat library, it is worth putting some extra effort into designing the public API.

My preferred way to see how the public API is impacted is to run cargo public-api diff latest after installing with cargo install --locked cargo-public-api.

Here is the output I get when diffing the public API of your branch against the public API of the latest bat release on crates.io (in the terminal there is syntax highlighting, but that gets lost on GitHub):

$ cargo public-api diff latest
Resolved `diff latest` to `diff 0.22.1`
 Documenting bat v0.22.1
    Finished dev [unoptimized + debuginfo] target(s) in 0.56s
 Documenting bat v0.22.1 (/home/martin/src/bat)
    Finished dev [unoptimized + debuginfo] target(s) in 0.63s
Removed items from the public API
=================================
(none)

Changed items in the public API
===============================
(none)

Added items to the public API
=============================
+pub bat::config::Config::nonprinting_notation: alloc::string::String
+pub fn bat::style::StyleComponents::clear(&mut self)
+pub fn bat::style::StyleComponents::insert(&mut self, component: bat::style::StyleComponent)

As we can see, bat::config::Config::nonprinting_notation: alloc::string::String has been added to the public API.

I really think we should use an enum instead of a free form String. See e.g. https://docs.rs/bat/latest/bat/enum.WrappingMode.html.

So can you introduce an enum please? Thanks!

@Enselic
Copy link
Collaborator

Enselic commented Feb 11, 2023

Thank you for the reminder. I am unusually busy both privately and professionally, so I do not have time to look againt at this PR right now.

I have not forgotten about it though. I would be surprised if it takes longer than a month before I get some time over. (Or maybe some other maintainer will take a look before that.)

Copy link
Collaborator

@Enselic Enselic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized we already use the term nonprintable in Config::show_nonprintable. So to be consistent with ourselves, I think we should rename NonprintingNotation to NonprintableNotation.

Sorry for changing my mind again :(

Comment on lines 1 to 11
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NonprintingNotation {
Caret,
Unicode,
}

impl Default for NonprintingNotation {
fn default() -> Self {
NonprintingNotation::Unicode
}
}
Copy link
Collaborator

@Enselic Enselic Feb 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add docs and automatically derive Default please? Proposal:

/// How to print non-printable characters with
/// [crate::config::Config::show_nonprintable]
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub enum NonprintingNotation {
    /// Use unicode notation (␇, ␊, ␀, ..)
    #[default]
    Unicode,

    /// Use caret notation (^G, ^J, ^@, ..)
    Caret,
}

@einfachIrgendwer0815
Copy link
Contributor Author

Can you add docs and automatically derive Default please? Proposal:

/// How to print non-printable characters with
/// [crate::config::Config::show_nonprintable]
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub enum NonprintingNotation {
    /// Use unicode notation (␇, ␊, ␀, ..)
    #[default]
    Unicode,

    /// Use caret notation (^G, ^J, ^@, ..)
    Caret,
}

As the failed check shows, this is not compatible with the minimum supported Rust version (1.60), as this feature wasn't stabilized until version 1.62

@Enselic Enselic mentioned this pull request Mar 3, 2023
@Enselic
Copy link
Collaborator

Enselic commented Mar 3, 2023

As the failed check shows, this is not compatible with the minimum supported Rust version (1.60), as this feature wasn't stabilized until version 1.62

Oh, I see. Well, that's annoying. Let's bump MSRV then! PR: #2496

Copy link
Collaborator

@Enselic Enselic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the updates, this looks good to me now!

For the record, here is the bat public API diff:

$ cargo public-api --omit blanket-impls --omit auto-trait-impls --omit auto-derived-impls diff origin/master..einfachIrgendwer0815/feature_caret_notation
Removed items from the public API
=================================
(none)

Changed items in the public API
===============================
(none)

Added items to the public API
=============================
+pub bat::config::Config::nonprintable_notation: bat::NonprintableNotation
+pub enum bat::NonprintableNotation
+pub bat::NonprintableNotation::Caret
+pub bat::NonprintableNotation::Unicode

/// How to print non-printable characters with
/// [crate::config::Config::show_nonprintable]
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub enum NonprintableNotation {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot one thing! We need to make the new enum #[non_exhaustive]. Otherwise adding more notations will require a major semver version bump.

Suggested change
pub enum NonprintableNotation {
#[non_exhaustive]
pub enum NonprintableNotation {

You don't have to add it though. I can add it before I merge this. I plan on merging it early next week, unless something else comes up.

@Enselic Enselic merged commit 8f99a78 into sharkdp:master Mar 14, 2023
@einfachIrgendwer0815 einfachIrgendwer0815 deleted the feature_caret_notation branch February 24, 2024 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants