markdown to docx unusually slow #2356
Comments
Very strange! It's just a giant code block. Looking at the code for Text.Pandoc.Writers.Docx, I can't see any obvious reason why there'd be a performance problem here, so this is a puzzle that needs looking into.
Some experiments: I changed the file from a fenced code block to an indented one, to allow testing arbitrary numbers of lines:
I also tried a version where the code block has just one enormously long line (converting newlines into spaces), and that also takes forever.
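For anyone wanting to repeat these experiments, test inputs of arbitrary size can be generated with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the numbers-as-content choice is an assumption borrowed from the original report rather than the exact file used above.

```python
def indented_code_block(n_lines: int) -> str:
    """Return Markdown for an indented code block holding the numbers 1..n_lines."""
    # Four leading spaces mark an indented code block in Markdown.
    return "".join(f"    {i}\n" for i in range(1, n_lines + 1))


def single_line_code_block(n_lines: int) -> str:
    """Same content with newlines converted to spaces: one very long line."""
    return "    " + " ".join(str(i) for i in range(1, n_lines + 1)) + "\n"
```

Writing the result of `indented_code_block(10000)` to a file and passing that file to pandoc gives an input whose size can be varied freely.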
Thanks for looking into this, John. I should've mentioned that it began as an issue over at the rstudio/rmarkdown repository, rstudio/rmarkdown#490.
Further experiment, breaking it down to its core (now using code spans and just a string of
Also, Confirmation:
also takes forever. This should just be a single long paragraph with regular text.
I found the cause: commit f3aa03e, which strips out invalid characters. I think this can easily be fixed by doing the stripping in the XML file rather than the Pandoc structure (
@mpickering, I solved this by doing the stripping in
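To illustrate the kind of stripping being discussed, here is a minimal Python sketch that removes characters not permitted by the XML 1.0 `Char` production from a string in a single pass. It is not pandoc's code (pandoc is written in Haskell), and the function name is hypothetical; it only shows that the filter can be applied once to text on its way into the XML output instead of in a separate traversal of the document structure.

```python
import re

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Drop characters that may not appear in an XML 1.0 document."""
    return _INVALID_XML_CHARS.sub("", text)
```

The regular expression is compiled once and each string is scanned once, so the cost grows linearly with the amount of text.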
Sorry! Didn't realise the files got so large.
This file prints the numbers from 1 to 10000. It takes seconds to render to HTML or PDF:
But it takes eight minutes to render to Word:
I notice this difference often when I use pandoc through R Markdown to report on data. If I try to do something more modest (like print the numbers from 1 to 1000) I do not notice much of a difference.
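The comparison can be measured outside R Markdown with a small timing harness. The sketch below is an assumption-laden stand-in for the original report: it writes the numbers as an indented Markdown code block (the real file was produced by R Markdown), uses made-up file and function names, and relies only on the standard `pandoc INPUT -o OUTPUT` invocation, where the output format is inferred from the file extension.

```python
import os
import subprocess
import time


def numbers_document(n: int) -> str:
    """Markdown with the numbers 1..n in an indented code block (an assumption)."""
    return "".join(f"    {i}\n" for i in range(1, n + 1))


def pandoc_command(src: str, dst: str) -> list[str]:
    """Build the pandoc invocation; the format is inferred from dst's extension."""
    return ["pandoc", src, "-o", dst]


def time_conversion(src: str, dst: str) -> float:
    """Run pandoc once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(pandoc_command(src, dst), check=True)
    return time.perf_counter() - start


def compare_formats(n: int, workdir: str) -> dict[str, float]:
    """Time HTML and docx output for the same n-line input; requires pandoc on PATH."""
    src = os.path.join(workdir, f"numbers_{n}.md")
    with open(src, "w", encoding="utf-8") as fh:
        fh.write(numbers_document(n))
    return {
        ext: time_conversion(src, os.path.join(workdir, f"numbers_{n}.{ext}"))
        for ext in ("html", "docx")
    }
```

Calling `compare_formats(1000, ".")` and then `compare_formats(10000, ".")` shows how each output format scales with input size.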