Human language, like birdsong, relies on the ability to arrange vocal elements in novel sequences. However, little is known about the ontogenetic origin of this capacity. We tracked the development of vocal combinatorial capacity in three species of vocal learners, combining an experimental approach in zebra finches with an analysis of the natural development of vocal transitions in Bengalese finches and pre-lingual human infants, and found a common, stepwise pattern of acquiring vocal transitions across species. In our first study, juvenile zebra finches were trained to perform one song, and then the training target was altered, prompting the birds to swap syllable order or insert a new syllable into a string. All birds solved these permutation tasks in a series of steps, gradually approximating the target sequence by acquiring novel pair-wise syllable transitions, sometimes too slowly to fully accomplish the task. Similarly, in the more complex songs of Bengalese finches, branching points and bidirectional transitions in song syntax were acquired in a stepwise manner, starting from a more restrictive set of vocal transitions. The babbling of pre-lingual human infants revealed a similar developmental pattern: instead of a single developmental shift from reduplicated to variegated babbling (i.e., from repetitive to diverse sequences), we observed multiple shifts, in which each novel syllable type slowly acquired a diversity of pair-wise transitions, asynchronously over development. Collectively, these results point to a common generative process that is conserved across species, suggesting that the long-noted gap between perceptual and motor combinatorial capabilities in human infants [1] may arise from the challenges in constructing new pair-wise transitions.