Routinely and effortlessly, we extract a wealth of socially relevant information from voices. This ability goes far beyond speech comprehension and allows us to recognize a variety of speaker attributes, such as identity or affect. Here we review evidence showing that this expertise starts developing very early in infancy and continues to be refined throughout childhood and part of adolescence. We also examine the maturation of dedicated voice processing mechanisms in the brain. We highlight similarities with the development of face processing and discuss implications and future directions for research.