Size distribution figures of CEAGBiGR lncRNAs. Link to code.

Full size distribution

image

Binned size distribution

image

This is suspicious. The vast majority of sequences are longer than 1,000 nt, and I don't buy it. The FASTA also includes sequences shorter than 200 nt, the conventional minimum length for calling something a lncRNA. It seems like this FASTA from NCBI might just contain uncharacterized regions rather than lncRNAs. That would make sense, since most of the RNAcentral hits from this BLAST run are "uncharacterized regions" from the same annotation associated with this FASTA.
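
To put numbers on the two observations above, here is a minimal sketch of a length check, not the code linked at the top of this post. It uses only the Python standard library; the function names `fasta_lengths` and `summarize` and the 1,000 nt "long" cutoff are my own assumptions for illustration, while the 200 nt cutoff is the threshold mentioned above.

```python
import io


def fasta_lengths(handle):
    """Yield (header, sequence_length) for each record in a FASTA handle."""
    header, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, length  # emit the previous record
            header, length = line[1:], 0
        elif header is not None:
            length += len(line)  # sequences may wrap across lines
    if header is not None:
        yield header, length  # emit the final record


def summarize(lengths, min_len=200, long_len=1000):
    """Count sequences shorter than min_len and longer than long_len.

    min_len=200 is the lncRNA threshold discussed above;
    long_len=1000 is an assumed cutoff chosen for illustration.
    """
    return {
        "total": len(lengths),
        "below_min": sum(1 for n in lengths if n < min_len),
        "above_long": sum(1 for n in lengths if n > long_len),
    }


# Tiny in-memory example; swap in open("your_file.fasta") for real data
# (that path is a placeholder, not the actual file name).
demo = io.StringIO(">short\nACGT\nAC\n>long\n" + "A" * 1500 + "\n")
lengths = [n for _, n in fasta_lengths(demo)]
print(summarize(lengths))
```

If a large share of records fall below 200 nt, that would support the idea that this FASTA is not a curated lncRNA set.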