ALIGNMENT TO REF LONGER THAN 2000bp #17

TDDB-limagrain · 2022-02-23T09:23:44Z

hi @ksahlin ,
sorry for coming back with a new possible issue!
I used the very latest patch you've made to align a full sample against the public genome we discussed later.
The output was correctly piped to samtools for further duplicate marking and sorting, so thank you very much for allowing the redirection !

Total mapping sites tried: 260811459
Total calls to ssw: 83238353
Calls to ksw (rescue mode): 24136154
Did not fit strobe start site: 452500
Tried rescue: 8059431
Total time mapping: 4317.84 s.
Total time reading read-file(s): 178.746 s.
Total time creating strobemers: 188.722 s.
Total time finding NAMs (non-rescue mode): 618.305 s.
Total time finding NAMs (rescue mode): 341.146 s.
Total time sorting NAMs (candidate sites): 75.2316 s.
Total time reverse compl seq: 0 s.
Total time base level alignment (ssw): 2239.28 s.
Total time writing alignment to files: 525.32 s.

Everything goes well, but looking at the standard output, I obtained the following message several times:
ALIGNMENT TO REF LONGER THAN 2000bp - REPORT TO DEVELOPER.

For example:
ALIGNMENT TO REF LONGER THAN 2000bp - REPORT TO DEVELOPER. Happened for read: TGCCAATCCTGTGGGACGAATATGGGGCTCAAACCTCAGCCAAAACTCAATAGACACAGTGACGAATGTCTGGTAAAAAATTCAGACCAAAATACCAAAGGAGTAAGGCGTAGCAAGTCCCAGACCGAGAGTGAATAAAACCGGTTTTCCG ref len:53438756

Do you have any idea about such a behavior?
I ran strobealign with default parameters.

The text was updated successfully, but these errors were encountered:

ksahlin · 2022-02-23T10:42:17Z

Hehe, I'm the one who should apologize for user issues. I highly appreciate the feedback!

Ok, at least strobealign ran to completion and the alignment results can be analyzed downstream. I am eager to hear about the SNV and indel analysis.

I will try to fix this bug. It is known, hence the message to stderr. It is nothing crucial (I believe), but obviously still desirable to fix. I may already have such a case in my data. I will look if I identify this in my data so that I can fix it. If not, I may come back to you about obtaining at least one of the read pairs that caused this.

TDDB-limagrain · 2022-02-23T10:45:40Z

Happy to help!
I just wanted to be sure that this message would not affect the downstream analysis. So I will analyze other samples and go to the variant analysis benchmark!

ksahlin · 2022-04-01T13:37:14Z

Hi @TDDB-limagrain,

I have just released v0.7, which fixes this bug. The new release implements much more efficient multithreading impacting speed. I remember you did a comparison of runtime here. Not sure which version you tried, but at least compared to v0.6-0.6.1, the timing should be improved if you used several cores (the more cores the more the speedup). Happy for any feedback.

TDDB-limagrain · 2022-04-01T14:23:11Z

Hi Kristoffer, Thank you very much ! I’ve just compiled the v0.7 and started some mapping with; I am still really interested as a faster alternative to minimap2! My plan is to then used it together with deepvariant to speed up the re-processing of large public resequencing data. I’ll keep you in touch! Best regards, Thomas De : Kristoffer ***@***.***> Envoyé : vendredi 1 avril 2022 15:37 À : ksahlin/StrobeAlign ***@***.***> Cc : DUGE DE BERNONVILLE, Thomas ***@***.***>; Mention ***@***.***> Objet : Re: [ksahlin/StrobeAlign] ALIGNMENT TO REF LONGER THAN 2000bp (Issue #17) Hi @TDDB-limagrain<https://github.com/TDDB-limagrain>, I have just released v0.7, which fixes this bug. The new release implements much more efficient multithreading impacting speed. I remember you did a comparison of runtime here<#9 (comment)>. Not sure which version you tried, but at least compared to v0.6-0.6.1, the timing should be improved if you used several cores (the more cores the more the speedup). Happy for any feedback. — Reply to this email directly, view it on GitHub<#17 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AXSO73QIFTH4RQVNTCBM3C3VC33ZLANCNFSM5PDYR2JA>. You are receiving this because you were mentioned.Message ID: ***@***.***>

ksahlin · 2022-04-01T14:27:58Z

Excellent!

Yes, let me know how it goes and what your runtime and SV-calling experiments say.

TDDB-limagrain · 2022-04-05T09:01:22Z

Hi @ksahlin,
v0.7 seems to be indeed faster than the previous versions (avx2 compiled)
cputime previous version: 16262.03
cputime v0.7: 13876.03
with a very similar mapping rate.
Looking forward for the impact on downstream variant calling!

ksahlin · 2022-04-05T09:06:25Z

Cool B-) how many cores did you use?

TDDB-limagrain · 2022-04-05T09:08:16Z

I used -t 4 !

ksahlin · 2022-04-05T09:11:17Z

Oh, yeah, I should have realized that from the command line in your other answer :)

I'm positively surprised that it has this relatively "big" speedup only for 4 cores - its great though. The relative speedup will be even bigger with more cores.

TDDB-limagrain · 2022-04-05T09:17:12Z

My job was submitted on the same node, so indeed there was a substantial improvement! The output was piped to samtools command with 2 additional extra-cores in the same way. I usually perform several alignments in parallel, that’s why I didn’t increase the number of threads. Processors are: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz If you are interested, I can make a test with 8, 16 and 32 cores for one sample De : Kristoffer ***@***.***> Envoyé : mardi 5 avril 2022 11:11 À : ksahlin/StrobeAlign ***@***.***> Cc : DUGE DE BERNONVILLE, Thomas ***@***.***>; Mention ***@***.***> Objet : Re: [ksahlin/StrobeAlign] ALIGNMENT TO REF LONGER THAN 2000bp (Issue #17) Oh, yeah, I should have realized that from the command line in your other answer :) I'm positively surprised that it has this relatively "big" speedup only for 4 cores - its great though. The relative speedup will be even bigger with more cores. — Reply to this email directly, view it on GitHub<#17 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AXSO73SGPGRDFBWLP52JXJLVDP7T7ANCNFSM5PDYR2JA>. You are receiving this because you were mentioned.Message ID: ***@***.***>

ksahlin · 2022-04-05T10:13:05Z

Thanks for the offer. It's not in priority, but if you have time/interested, sure!

My biggest interest lies in that the variant calls are good. Next up on the schedule is to output supplementary alignments.

TDDB-limagrain · 2022-04-05T20:15:52Z

Hi,
I give a try with -t 16, ending up with a cputime of 14260.43.
runtime -t4: 3592s
runtime -t16: 1230s
almost a linear gain!

I just launched some job using synthetic reads simulated with the rtg tools suite. That will be easier for me to share some results rather than using private resequencing runs. Will keep you in touch!

ksahlin closed this as completed Apr 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ALIGNMENT TO REF LONGER THAN 2000bp #17

ALIGNMENT TO REF LONGER THAN 2000bp #17

TDDB-limagrain commented Feb 23, 2022 •

edited

Loading

ksahlin commented Feb 23, 2022 •

edited

Loading

TDDB-limagrain commented Feb 23, 2022

ksahlin commented Apr 1, 2022

TDDB-limagrain commented Apr 1, 2022 via email

ksahlin commented Apr 1, 2022

TDDB-limagrain commented Apr 5, 2022

ksahlin commented Apr 5, 2022

TDDB-limagrain commented Apr 5, 2022

ksahlin commented Apr 5, 2022

TDDB-limagrain commented Apr 5, 2022 via email

ksahlin commented Apr 5, 2022

TDDB-limagrain commented Apr 5, 2022

ALIGNMENT TO REF LONGER THAN 2000bp #17

ALIGNMENT TO REF LONGER THAN 2000bp #17

Comments

TDDB-limagrain commented Feb 23, 2022 • edited Loading

ksahlin commented Feb 23, 2022 • edited Loading

TDDB-limagrain commented Feb 23, 2022

ksahlin commented Apr 1, 2022

TDDB-limagrain commented Apr 1, 2022 via email

ksahlin commented Apr 1, 2022

TDDB-limagrain commented Apr 5, 2022

ksahlin commented Apr 5, 2022

TDDB-limagrain commented Apr 5, 2022

ksahlin commented Apr 5, 2022

TDDB-limagrain commented Apr 5, 2022 via email

ksahlin commented Apr 5, 2022

TDDB-limagrain commented Apr 5, 2022

TDDB-limagrain commented Feb 23, 2022 •

edited

Loading

ksahlin commented Feb 23, 2022 •

edited

Loading