Allow text file preview for all UTF-8 encoded files #977

eulores · 2025-01-26T20:45:15Z

A number of PDF files converted to text (mutool convert...) still contain a few innocuous control characters (typically 0x0C, a form feed standing in for a page break). These characters make Broot display such files as hex dumps instead of text.

Please consider the patch below, which makes Broot more lenient so that it accepts all UTF-8 encoded files. It simply replaces every control character other than tab, newline and carriage return with a space (or an empty string).

Binary files are unaffected by this patch, because they are already rejected earlier for not being valid UTF-8.

```diff
diff --git a/src/syntactic/syntactic_view.rs b/src/syntactic/syntactic_view.rs
             self.total_lines_count += 1;
             let start = offset;
             offset += line.len();
+            line = line.replace(|ch: char| ch.is_control() && !"\t\n\r".contains(ch), " ");
-            for c in line.chars() {
-                if !is_char_printable(c) {
-                    debug!("unprintable char: {:?}", c);
-                    return Err(ProgramError::UnprintableFile);
-                }
-            }
```
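To illustrate the intended behavior outside Broot, here is a minimal standalone sketch. It assumes the file is first validated as a whole with `std::str::from_utf8` (standing in for the earlier UTF-8 check that rejects binary files) and then sanitized with the same character rule as the patch. The function name `sanitize_for_preview` is hypothetical and not part of Broot, which processes the file line by line rather than as one buffer.

```rust
/// Hypothetical helper (not Broot's API): turn raw file bytes into
/// previewable text. Returns None when the bytes are not valid UTF-8,
/// mirroring the earlier check that keeps binary files out.
fn sanitize_for_preview(bytes: &[u8]) -> Option<String> {
    // Strict UTF-8 validation; binary content almost always fails here.
    let text = std::str::from_utf8(bytes).ok()?;
    // Same rule as the patch: keep tab, newline and carriage return,
    // replace every other control character (e.g. 0x0C form feed) with a space.
    Some(text.replace(|ch: char| ch.is_control() && !"\t\n\r".contains(ch), " "))
}

fn main() {
    // A page break (form feed) as left behind by a PDF-to-text conversion.
    let pdf_text = b"end of page 1\x0cstart of page 2\n";
    assert_eq!(
        sanitize_for_preview(pdf_text).as_deref(),
        Some("end of page 1 start of page 2\n")
    );

    // A lone 0xFF byte is invalid UTF-8, so the input is rejected, not sanitized.
    let binary = [0x00u8, 0xFF, 0x10, 0x80];
    assert_eq!(sanitize_for_preview(&binary), None);

    println!("ok");
}
```

Because the rejection of binary data happens at the UTF-8 decoding step, the per-character pass can afford to be permissive: anything reaching it is already valid text, and the only remaining question is how to render the few control characters it contains.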


eulores added the "bug" label (Something isn't working) on Jan 26, 2025.
