-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get rid of EscapeDebugInner
.
#138237
base: master
Are you sure you want to change the base?
Get rid of EscapeDebugInner
.
#138237
Conversation
e1c078d
to
57c0a80
Compare
57c0a80
to
0854482
Compare
// Note: It’s possible to manually encode the EscapeDebugInner inside of | ||
// EscapeIterInner (e.g. with alive=254..255 indicating that data[0..4] holds | ||
// a char) which would likely result in a more optimised code. For now we use | ||
// the option easier to implement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the comment you reference, so perfrun seems to be in order.
} | ||
|
||
/// Marker type to indicate that the character is always escaped, | ||
/// use to optimized the iterator implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used to optimize?
pub(crate) struct AlwaysEscaped; | ||
|
||
/// Marker type to indicate that the character may be escaped, | ||
/// use to optimized the iterator implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
used to optimize?
#[derive(Clone, Copy)] | ||
#[non_exhaustive] | ||
pub(crate) struct MaybeEscaped; | ||
|
||
/// An iterator over an fixed-size array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might as well fix an
to a
here
// Invariant: For non-printable characters, `alive.start <= alive.end <= N`, | ||
// for printable characters, `alive.end >= alive.start >= 128`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems more clear to me if this used <=
throughout
/// The caller must ensure that `self` contains an escape sequence. | ||
#[inline] | ||
pub(crate) fn as_ascii(&self) -> &[ascii::Char] { | ||
// SAFETY: `self.alive` is guaranteed to be a valid range for indexing `self.data`. | ||
unsafe fn as_ascii(&self) -> &[ascii::Char] { | ||
// SAFETY: `self.data.escaped` contains an escape sequence, as is guaranteed | ||
// by the caller, and `self.alive` is guaranteed to be a valid range for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that the top level safety comment should state the more detailed requirement that callers are expected to uphold, and then the inner unsafe use can say that it relies on that.
Tangentially, I'm wondering if escaped
would not be better as escape_seq
or escape
.
union MaybeEscapedCharacter<const N: usize> { | ||
pub escaped: [ascii::Char; N], | ||
pub unescaped: char, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me like the variants should be called something closer to escape_seq
/escape
and literal
/lit
.
/// The caller must ensure that `self` contains an unescaped `char`. | ||
#[inline] | ||
pub(crate) fn as_str(&self) -> &str { | ||
self.as_ascii().as_str() | ||
unsafe fn as_char(&self) -> char { | ||
// SAFETY: `self.data.unescaped` contains an unescaped `char`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "unescaped char", this might be clearer as "literal char".
impl<const N: usize> EscapeIterInner<N, AlwaysEscaped> { | ||
pub(crate) fn next(&mut self) -> Option<u8> { | ||
let i = self.alive.next()?; | ||
|
||
// SAFETY: `i` is guaranteed to be a valid index for `self.data`. | ||
unsafe { Some(self.data.get_unchecked(usize::from(i)).to_u8()) } | ||
// SAFETY: We just checked that `self` contains an escape sequence, and | ||
// `i` is guaranteed to be a valid index for `self.data.escaped`. | ||
unsafe { Some(self.data.escaped.get_unchecked(usize::from(i)).to_u8()) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the safety comment as written (and checked whether "let i = self.alive.next()?;" does any such checking, but no, it is just a Range of hopefully valid indices for the escape seq.). Shouldn't this refer to the AlwaysEscaped type state instead?
pub(crate) fn advance_by(&mut self, n: usize) -> Result<(), NonZero<usize>> { | ||
self.alive.advance_by(n) | ||
impl<const N: usize> EscapeIterInner<N, MaybeEscaped> { | ||
const CHAR_FLAG: u8 = 0b10000000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either add an underscore in the middle or use hex: 0x80
// This is the only way to create an `EscapeIterInner` with an unescaped `char`, which | ||
// means the `AlwaysEscaped` marker guarantees that `self` contains an escape sequence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the only way to create an EscapeIterInner<N, MaybeEscaped> with a literal char in its union member, then I don't see how that guarantees anything about the contents of EscapeIterInner<N, AlwaysEscaped>, besides that the union in an EscapeIterInner<N, AlwaysEscaped> contains a [ascii::Char; N] which may or may not be a valid escape sequence...
#[inline] | ||
pub(crate) fn as_ascii(&self) -> &[ascii::Char] { | ||
// SAFETY: `self.alive` is guaranteed to be a valid range for indexing `self.data`. | ||
unsafe fn as_ascii(&self) -> &[ascii::Char] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more meaningful name seems to me as_escape
or as_escape_seq
|
||
if self.is_unescaped() { | ||
// SAFETY: We just checked that `self` contains an unescaped `char`. | ||
return Some(unsafe { self.as_char() }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just inline as_char?
I read the note on
EscapeDebugInner
and thought I'd give it a try.