Feature Request - Remove junk Characters from Text #23
Comments
Sorry, I don't quite understand what junk characters are. Can you give an example, like "input is blablabla and expected output (result) is blablabla"?
Sure. The following is the input text: "The quick *~~~Brown f%%ox jumps) right~ over$ the lazy! dog)." Expected output (clean): "The quick brown fox jumps right over the lazy dog."
Oh no, what happened to the text? 😂 I think this feature is easy to implement, but I wonder in what circumstances you would encounter such embarrassing text.
It typically happens when I use OCR on old documents (hand-typed documents), corrupted PDFs, or old dBase/FoxPro memo files (corrupted). I think it's mainly because old documents are yellowed and smudgy and hard to scan, so the OCR inserts characters on its own :)
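For illustration, the request above can be sketched in a few lines of Python. This is only a minimal sketch, not the plugin's implementation: the helper name `remove_junk` and the choice of what counts as "junk" (anything other than ASCII letters, digits, whitespace, and periods) are assumptions made for this example. Note that it does not change capitalization, so "Brown" stays capitalized, unlike in the expected output quoted above.

```python
import re

# Assumption for this sketch: "junk" is any character that is not an ASCII
# letter, a digit, whitespace, or a period. Periods are kept where they are.
JUNK = re.compile(r"[^A-Za-z0-9\s.]")


def remove_junk(text: str) -> str:
    """Hypothetical helper: delete junk characters and tidy leftover spaces."""
    cleaned = JUNK.sub("", text)
    # Collapse runs of spaces or tabs that removed characters may leave behind.
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()


sample = "The quick *~~~Brown f%%ox jumps) right~ over$ the lazy! dog)."
print(remove_junk(sample))
```

A whitelist (keep only known-good characters) is used here rather than a blacklist because the set of stray symbols OCR can produce is open-ended; the trade-off is that legitimate punctuation and non-ASCII letters are removed too unless they are added to the pattern.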
The pictures you provided are illegible even to me. What is the copied text like? My OCR result is below:
The preser.t rerort is Oric of 玉 numbr wiich dr:prr4lrex duri上i anJ 1945 fpr the Frreign poncnic 永ministration Lmembevs:st t. the unitedi St:tes Tarift' Cornissint:. Orine to the desire of thr 我 、” Econoniy Aininistration to obt ir this matcri.1. !.xs prompt1y as pces1.o, the reports yere Yot revievvd by the Trri:: Connissien. A11 st:tenont.s o1 fagt or opinion in tese renorts CI &ttributithlp t; the. irciyilei Etaef nembers tho prraredi th.em. Th.:Y. 1l,洲以:rieinlt:itsed f conf idential u: of Goverrnent xgencivs, ut .•r(〉 noR brin.: Hdpnniis with the consent oi the For(imr. Eocnnic iuiri:.istrtior:.
If the copied text is similar to this, I don't think directly deleting junk characters can produce the expected text.
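The point above can be shown on a short fragment of that OCR result. This is a rough sketch under the same assumption as before, namely that "junk" means anything other than ASCII letters, digits, whitespace, and periods; `strip_junk` is a hypothetical name used only here. Deleting characters removes the stray symbols, but misrecognized letters are still valid letters, so the words stay wrong.

```python
import re


def strip_junk(text: str) -> str:
    """Hypothetical helper: keep ASCII letters, digits, whitespace, periods."""
    cleaned = re.sub(r"[^A-Za-z0-9\s.]", "", text)
    # Collapse the double spaces left where a character was deleted.
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()


# Fragment copied from the OCR result quoted in the thread.
ocr_fragment = "The preser.t rerort is Oric of 玉 numbr wiich"
result = strip_junk(ocr_fragment)
print(result)

# "preser.t", "rerort", "Oric", "numbr" and "wiich" all survive the cleanup,
# because the errors are wrong letters rather than extra symbols.
```

Repairing text like this would need spelling correction or a better scan, which is a different problem from deleting characters.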
Thank you for responding! :) I am a newbie, and I posted the same issue on the Obsidian forum, requesting help ( https://forum.obsidian.md/t/replace-all-asterisks-in-a-given-file/35238/25?u=looney.apache ). The solution was not as elegant as yours, but it works for me (at least for now): I paste the text to be scrubbed into https://textcleaner.net/ and get back the cleaned text as needed. I value your assistance. Thank you!
The webpage is fantastic; it supports a lot of configuration options. The only fly in the ointment is that it cannot be used in Obsidian. It seems that I could refer to the code of textcleaner, but I'm not sure whether there are copyright issues. I also don't think adding so many options to Obsidian is a good idea, since they would take up a huge amount of space. Such a dilemma 😂
True! Unquestionably one of the GOOD plugins.
Hi,
Your plugin works great!
Is there any possibility of a feature that removes junk characters from a paragraph or pasted text, just leaving periods (".") where they are?
Thank you!