--interleaved option and SAM flag for paired reads #273
Comments
Thanks for reporting! Thanks also for the test case, which allowed me to reproduce this problem. The issue appears to be the following: strobealign’s The problem now is that the R1 reads in your test dataset have a This is obfuscated a bit by a different part of the code (run later), where strobealign strips those same suffixes from the read names so that they do not appear in the SAM output. Maybe we can fix this by stripping the suffixes earlier; I will look into this. Alternatively, we may want
Thanks for the quick reply! That makes sense. Either of those two options sounds good! Would definitely be great if And pointing me to the suffix issue was actually all I needed for now, as I can just pipe the input through a sed one-liner. Just tried that and it works like a charm (and everything is still way faster than bwa). Thanks!
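The sed one-liner itself is not shown in the thread. As a rough illustration of the same workaround, here is a minimal Python sketch that removes a trailing `/1` or `/2` from each read name in a FASTQ stream. The function name `strip_pair_suffixes` is hypothetical, and the sketch assumes strict four-line FASTQ records (no wrapped sequence lines).

```python
def strip_pair_suffixes(lines):
    """Yield FASTQ lines with a trailing /1 or /2 removed from each read name.

    Assumption: every record is exactly four lines, so every fourth line
    (starting with the first) is a header line.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            # Header line: "@name" optionally followed by a space and a comment.
            name, sep, comment = line.rstrip("\n").partition(" ")
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name + sep + comment + "\n"
        else:
            # Sequence, "+" separator and quality lines pass through unchanged.
            yield line
```

Applied to standard input and written to standard output, a filter like this could sit in front of the aligner in a pipeline, in the same position as the sed command mentioned above.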
It seems that I can look into that code
strobealign trims the
This can be done when comparing the names for interleaved matching: compare them in a way that ignores the suffix. I'm on my phone now, but I can implement this later.
I am happy to reconsider this if it would offer a more convenient solution. I don't think PAF is commonly used for short reads. For long reads we won't typically have the paired format. Edit: Benchmarking mapping-only is useful for development of the seeding, scoring, etc.
Removing the suffixes when doing the name comparison during de-interleaving is totally doable; it is just slightly inelegant that this will then be done again when writing the SAM output. But that is the only downside; I cannot imagine this resulting in any type of measurable slowdown.
A read named `read/1` followed by `read/2` should form a pair in the interleaved format. This matches the behaviour of `bwa mem -p` as well. Closes ksahlin#273
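The pairing rule described here can be sketched in a few lines of Python. This is only an illustration of a suffix-insensitive name comparison, not strobealign's actual implementation; the helper names `strip_suffix` and `names_form_pair` are hypothetical, and the choice to strip `/1` only from the first name and `/2` only from the second is an assumption.

```python
def strip_suffix(name, suffix):
    """Return name without the given suffix, or name unchanged if absent."""
    return name[: -len(suffix)] if name.endswith(suffix) else name


def names_form_pair(first, second):
    """Decide whether two consecutive interleaved records belong together.

    Assumption: the first record may carry "/1" and the second "/2";
    the names are compared after removing those suffixes.
    """
    return strip_suffix(first, "/1") == strip_suffix(second, "/2")
```

Under this sketch, `read/1` followed by `read/2` is treated as a pair, as are two records that are both named `read`, while records with different base names are not.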
I might be missing something, but it seems that strobealign (v0.8.0) does not properly specify SAM flag values for paired reads.
When I run it on separate R1 + R2 fastqs, reads in the bam have flags like 77 and 141 (paired unmapped reads; indicating first and second read in pair), but when I run it with '--interleaved' on otherwise identical, interleaved fastq input, the flags do not contain any information on read pairs anymore (e.g., simply 4 for unmapped reads). Note that strobealign correctly indicates "paired-end mode" in both cases. The lack of read pair information causes issues for downstream applications for me.
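To make the flag values concrete, here is a small Python sketch that decodes a SAM flag into its individual bits. The bit values follow the SAM specification; the short labels and the function name `decode_flag` are chosen here for illustration only.

```python
# Bit values as defined by the SAM specification; labels are informal.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM flag, lowest bit first."""
    return [label for bit, label in SAM_FLAG_BITS.items() if flag & bit]


print(decode_flag(77))   # ['paired', 'unmapped', 'mate_unmapped', 'first_in_pair']
print(decode_flag(141))  # ['paired', 'unmapped', 'mate_unmapped', 'second_in_pair']
print(decode_flag(4))    # ['unmapped']
```

A flag of 4 has none of the pairing bits (0x1, 0x8, 0x40, 0x80) set, which is why the interleaved output carries no read pair information.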
Since everything works just fine using separate fastqs (great, actually! strobealign is amazingly fast; thanks for developing such a great tool), I hope it's not too hard to fix this for the --interleaved case. I'd really prefer interleaved mode over separate fastqs since it allows me to efficiently pipe input into strobealign.
Files and commands for a reproducible example below:
Separate R1 and R2 fastqs as input
strobealign -t 1 reference.fasta R1.fastq R2.fastq | head
Interleaved mode
strobealign -t 1 reference.fasta interleaved.fastq --interleaved | head
strobealign_interleaved_test.tar.gz