Req: replace ligatures in PDF text #24

Open
glocalglocal opened this issue Apr 27, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@glocalglocal

Possibly related to #23: it would be good to replace ligatures with their separate characters when cleaning up text that comes from a PDF file. The only time I see the ft, fl and fi ligatures is when I copy from a PDF, and I then have to replace them by hand. A complete list is here.
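To make the request concrete, here is a minimal sketch in Python, purely for illustration. The `LIGATURES` table and the `replace_ligatures` name are hypothetical and are not taken from the plugin; the table covers only a handful of Latin ligature code points.

```python
# Hypothetical mapping: a few Latin ligature code points and the
# separate letters they stand for. Not the plugin's actual table.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
}

# str.maketrans accepts a dict whose keys are single characters.
_TABLE = str.maketrans(LIGATURES)


def replace_ligatures(text: str) -> str:
    """Return text with each listed ligature replaced by plain letters."""
    return text.translate(_TABLE)


# Text as it might arrive from a PDF, with ligatures written as escapes.
print(replace_ligatures("The \ufb01rst \ufb02oor o\ufb03ce"))
# prints: The first floor office
```

Characters that are not in the table pass through unchanged, so ordinary text is left alone.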

@Benature Benature added the enhancement New feature or request label Aug 19, 2022
@Benature
Owner

To confirm your request: you want to replace ligatures like ﬀ with ff.

And in the Wikipedia page you linked, you want to replace the text in the Ligature column with the text in the Non-Ligature column.

image

Do I understand correctly?

@glocalglocal
Author

glocalglocal commented Aug 23, 2022 via email

Benature added a commit that referenced this issue Aug 24, 2022
@Benature
Owner

Benature commented Aug 24, 2022

Please try v1.8.1. If you still have problems, you can re-open this issue.

@glocalglocal
Author

glocalglocal commented May 3, 2023

Unfortunately, the problem is still there. E.g., take the sentence below from the Wikipedia page I referenced:

Other ligatures with the letter f include fj,[a] f‌l (fl), f‌f (ff), f‌f‌i (ffi), and f‌f‌l (ffl).

In every set of brackets there is a single character. In plain text these characters should be split. Ligatures are often found in PDFs (well, the ones I use anyway) and they are meant to make certain combinations of letters look good in typography. The problem is that when pasted into plain text, these ligatures are either replaced with funny-looking symbols if the plain text editor can't cope with Unicode, or displayed properly but not recognised by search, spellchecking, content indexing, etc. The latter is the problem I am having with Obsidian.
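The search problem can be reproduced outside any editor. The sketch below uses Python only as an illustration: a ligature is a single code point, so a plain substring search for the separate letters misses it. Unicode NFKC normalization, via the standard `unicodedata` module, is one possible way to split it. The sample string is made up.

```python
import unicodedata

# Made-up pasted text containing U+FB03 (ffi) and U+FB01 (fi) ligatures.
pasted = "a di\ufb03cult \ufb01le"

# The ligature is one code point, not the letters f, f, i,
# so a plain substring search does not find the word.
print("difficult" in pasted)   # False

# NFKC compatibility normalization decomposes these ligatures.
# Caveat: it also rewrites other compatibility characters
# (full-width forms, superscripts, etc.), which may be unwanted.
cleaned = unicodedata.normalize("NFKC", pasted)
print("difficult" in cleaned)  # True
print(cleaned)                 # a difficult file
```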

This plugin is the obvious place for fixing this. If you must be selective, almost all ligatures I see in practice start with f or s. I can't remember the last time I saw any other ligature in a PDF.

v2.2.1

@Benature
Owner

For a sentence like

Other ligatures with the letter f include fj,[a] f‌l (fl), f‌f (ff), f‌f‌i (ffi), and f‌f‌l (ffl).

The result of Replace ligatures is

Other ligatures vvith the letter f include fj,[a] f‌l (fl), f‌f (ff), f‌f‌i (ffi), and f‌f‌l (ffl).

Does the result not behave as you expect? (Maybe it is the w to vv?)
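One way to rule out over-replacement such as w to vv would be to expand only the f and s ligatures in Unicode's Alphabetic Presentation Forms block (U+FB00 to U+FB06) and leave every other character untouched. The following is only a sketch under that assumption, written in Python for illustration; `expand_f_s_ligatures` is a hypothetical name and this is not how the plugin is implemented.

```python
import unicodedata

# Assumption: only the Latin ligatures at U+FB00..U+FB06 should be
# expanded; everything else, including an ordinary "w", stays as-is.
FIRST, LAST = 0xFB00, 0xFB06


def expand_f_s_ligatures(text: str) -> str:
    """Expand only U+FB00..U+FB06; leave every other character alone."""
    return "".join(
        unicodedata.normalize("NFKC", ch) if FIRST <= ord(ch) <= LAST else ch
        for ch in text
    )


sample = "Other ligatures with the letter f include \ufb02 and \ufb03."
print(expand_f_s_ligatures(sample))
# prints: Other ligatures with the letter f include fl and ffi.
```

With this restriction "with" stays "with", and only the single-character ligatures are split.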

@glocalglocal
Author

glocalglocal commented Jul 24, 2023 via email
