Biological sequence space is the theoretical set of all possible combinations of residues that could form a protein or nucleic acid. Among these combinations are the sequences that already carry out functions in modern organisms.
De novo genes arise from transcribed/translated non-genic sequences known as "protogenes". Although many protogenes appear over time, only a few become truly functional genes. We suspect that this is closely related to how possible functions are distributed across random, unexplored sequence space.
We have devised a high-throughput system to screen millions of random proteins in vivo for their evolutionary potential and biological activity. With it, we can recreate and control a phase in the evolution of new genes that has been difficult to approach through comparative genomics alone.
Preliminary results show that a large fraction of random sequences may have activities relevant to the fitness of the host. The system thus enables us to explore functional sequence space directly and to investigate many aspects of molecular innovation.