According to Ars Technica, the speech can match the speaker's vocal timbre and emotional tone. It can also match the room's acoustics. Microsoft calls VALL-E a "neural ...