Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of EscapeDebugInner. #138237

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

reitermarkus
Copy link
Contributor

I read the note on EscapeDebugInner and thought I'd give it a try.

@rustbot
Copy link
Collaborator

rustbot commented Mar 8, 2025

r? @thomcc

rustbot has assigned @thomcc.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 8, 2025
@reitermarkus reitermarkus force-pushed the remove-escape-debug-inner branch 2 times, most recently from e1c078d to 57c0a80 Compare March 8, 2025 20:35
@reitermarkus reitermarkus force-pushed the remove-escape-debug-inner branch from 57c0a80 to 0854482 Compare March 8, 2025 20:52
Comment on lines -309 to -312
// Note: It’s possible to manually encode the EscapeDebugInner inside of
// EscapeIterInner (e.g. with alive=254..255 indicating that data[0..4] holds
// a char) which would likely result in a more optimised code. For now we use
// the option easier to implement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the comment you reference, so perfrun seems to be in order.

}

/// Marker type to indicate that the character is always escaped,
/// use to optimized the iterator implementation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used to optimize?

pub(crate) struct AlwaysEscaped;

/// Marker type to indicate that the character may be escaped,
/// use to optimized the iterator implementation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

used to optimize?

#[derive(Clone, Copy)]
#[non_exhaustive]
pub(crate) struct MaybeEscaped;

/// An iterator over an fixed-size array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well fix an to a here

Comment on lines +184 to +185
// Invariant: For non-printable characters, `alive.start <= alive.end <= N`,
// for printable characters, `alive.end >= alive.start >= 128`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems more clear to me if this used <= throughout

Comment on lines +220 to +224
/// The caller must ensure that `self` contains an escape sequence.
#[inline]
pub(crate) fn as_ascii(&self) -> &[ascii::Char] {
// SAFETY: `self.alive` is guaranteed to be a valid range for indexing `self.data`.
unsafe fn as_ascii(&self) -> &[ascii::Char] {
// SAFETY: `self.data.escaped` contains an escape sequence, as is guaranteed
// by the caller, and `self.alive` is guaranteed to be a valid range for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the top level safety comment should state the more detailed requirement that callers are expected to uphold, and then the inner unsafe use can say that it relies on that.

Tangentially, I'm wondering if escaped would not be better as escape_seq or escape.

Comment on lines +155 to +157
union MaybeEscapedCharacter<const N: usize> {
pub escaped: [ascii::Char; N],
pub unescaped: char,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to me like the variants should be called something closer to escape_seq/escape and literal/lit.

Comment on lines +235 to +238
/// The caller must ensure that `self` contains an unescaped `char`.
#[inline]
pub(crate) fn as_str(&self) -> &str {
self.as_ascii().as_str()
unsafe fn as_char(&self) -> char {
// SAFETY: `self.data.unescaped` contains an unescaped `char`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "unescaped char", this might be clearer as "literal char".

Comment on lines +257 to +263
impl<const N: usize> EscapeIterInner<N, AlwaysEscaped> {
pub(crate) fn next(&mut self) -> Option<u8> {
let i = self.alive.next()?;

// SAFETY: `i` is guaranteed to be a valid index for `self.data`.
unsafe { Some(self.data.get_unchecked(usize::from(i)).to_u8()) }
// SAFETY: We just checked that `self` contains an escape sequence, and
// `i` is guaranteed to be a valid index for `self.data.escaped`.
unsafe { Some(self.data.escaped.get_unchecked(usize::from(i)).to_u8()) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the safety comment as written (and checked whether "let i = self.alive.next()?;" does any such checking, but no, it is just a Range of hopefully valid indices for the escape seq.). Shouldn't this refer to the AlwaysEscaped type state instead?

pub(crate) fn advance_by(&mut self, n: usize) -> Result<(), NonZero<usize>> {
self.alive.advance_by(n)
impl<const N: usize> EscapeIterInner<N, MaybeEscaped> {
const CHAR_FLAG: u8 = 0b10000000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either add an underscore in the middle or use hex: 0x80

Comment on lines +278 to +279
// This is the only way to create an `EscapeIterInner` with an unescaped `char`, which
// means the `AlwaysEscaped` marker guarantees that `self` contains an escape sequence.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the only way to create an EscapeIterInner<N, MaybeEscaped> with a literal char in its union member, then I don't see how that guarantees anything about the contents of EscapeIterInner<N, AlwaysEscaped>, besides that the union in an EscapeIterInner<N, AlwaysEscaped> contains a [ascii::Char; N] which may or may not be a valid escape sequence...

#[inline]
pub(crate) fn as_ascii(&self) -> &[ascii::Char] {
// SAFETY: `self.alive` is guaranteed to be a valid range for indexing `self.data`.
unsafe fn as_ascii(&self) -> &[ascii::Char] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more meaningful name seems to me as_escape or as_escape_seq


if self.is_unescaped() {
// SAFETY: We just checked that `self` contains an unescaped `char`.
return Some(unsafe { self.as_char() });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just inline as_char?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants