My guess: somewhere along the path from your phone, through the phone network, to their phone, there is a ring buffer for audio. The code that is meant to be writing new audio into it has stopped doing so, so the playback code just loops over the same fragment. The length of the loop is set by the size of the buffer. This is also why a skipping CD repeats a short fragment over and over: playback carries on reading from the buffer, but the disc reader isn’t putting new audio into it.
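To make that failure mode concrete, here is a toy Python sketch of a ring buffer whose reader never checks whether fresh audio has arrived. The names (`NaiveRingBuffer`, `loop_duration_seconds`), the 4-sample buffer and the 1600-sample figure are all made up for illustration; it is a model of the guess above, not how any particular phone actually works.

```python
class NaiveRingBuffer:
    """Hypothetical fixed-size audio ring buffer. The reader never checks
    whether the writer has supplied new samples (no underrun check)."""

    def __init__(self, size: int) -> None:
        self.buf = [0] * size
        self.size = size
        self.write_pos = 0
        self.read_pos = 0

    def write(self, samples: list[int]) -> None:
        # Writer: store samples, wrapping back to the start when full.
        for s in samples:
            self.buf[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.size

    def read(self, n: int) -> list[int]:
        # Reader: keeps advancing and wrapping regardless of the writer,
        # so once writes stop it replays whatever is left in the buffer.
        out = []
        for _ in range(n):
            out.append(self.buf[self.read_pos])
            self.read_pos = (self.read_pos + 1) % self.size
        return out


def loop_duration_seconds(buffer_samples: int, sample_rate_hz: int) -> float:
    """Length of one repetition of the stuck fragment."""
    return buffer_samples / sample_rate_hz


if __name__ == "__main__":
    rb = NaiveRingBuffer(4)
    rb.write([1, 2, 3, 4])
    print(rb.read(4))   # [1, 2, 3, 4]
    rb.write([5, 6, 7, 8])
    print(rb.read(4))   # [5, 6, 7, 8]
    # The writer stalls here: no more write() calls.
    print(rb.read(12))  # [5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8]
    # Assumed 1600-sample buffer at 8 kHz (a common narrowband telephony rate).
    print(loop_duration_seconds(1600, 8000))  # 0.2
```

A better-behaved player would notice the underrun, for example by tracking how many unread samples remain between the write and read positions, and play silence instead of looping stale audio.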
I’d say it’s most likely to be your phone or their phone, not the network.