As mentioned in my previous post, the transcription service in Microsoft Stream is not fantastic for users outside the US (and that’s an understatement).

A few people in the Stream team have been made aware of my post, so I thought I’d give a practical example to compare against.

Using Microsoft Teams, I recorded myself speaking a paragraph first in my native English (Australian) accent, and then again with an attempt at an English (US) accent.

The paragraph spoken was:

“‘The quick brown fox jumps over the lazy dog’ is an English-language pangram, a sentence that contains all of the letters of the alphabet. It is commonly used for touch-typing practice, testing typewriters and computer keyboards, displaying examples of fonts, and other applications involving text where the use of all letters in the alphabet is desired. Owing to its brevity and coherence, it has become widely known.”

Here’s the video of my recording:

(yes, I acknowledge that I borked a couple of words and my American accent is terrible – so this experiment isn’t perfect)


And here is the transcription:

00:00:07.320 --> 00:00:12.500
Ok, so, I am now recording we’re going to see how well strain does

00:00:12.500 --> 00:00:16.930
a transcription status is when speaking a us english and

00:00:16.930 --> 00:00:21.280
strong so this I’m going to say a quick the

00:00:21.280 --> 00:00:24.630
quick around fox jumps over the lazy dog is an english-language.

00:00:26.010 --> 00:00:30.680
And, rank senses that contains all the letters of the alphabet is commonly

00:00:30.680 --> 00:00:34.430
used the shopping practice fifty typewriters attributable

00:00:34.430 --> 00:00:39.420
to display examples of faults and other applications involving textured uses all that is the

00:00:39.420 --> 00:00:46.380
alphabet is designed audiences for every coherence is become one.

00:00:46.380 --> 00:00:51.680
And, now for the american version with my terrible american access.

00:00:51.680 --> 00:00:54.710
A quick round fox jumps over the lazy dog

00:00:54.710 --> 00:00:59.220
is an english-language new service that contains all letters of the

00:00:59.220 --> 00:01:03.780
alphabet is commonly used for touch typing practice testing typewriters

00:01:03.780 --> 00:01:08.380
and computer keyboard displaying examples of fonts and other applications

00:01:08.380 --> 00:01:12.920
involving text for the use of all levels of the alphabet is desire

00:01:12.920 --> 00:01:19.240
linked with gravity incoherence has become widely known let’s see how well stream works with that.


As you can see, the second version, where I spoke with an (albeit poor) American accent, is considerably more accurate than the first version with my Australian accent.
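For anyone who wants to put a rough number on "considerably better", one common measure is word error rate (WER): the word-level edit distance between what was said and what was transcribed, divided by the number of words actually said. The following is only a sketch, not something Stream or Teams provides. The function names are my own, and the two hypothesis strings are hand-trimmed from the transcript above so that they cover just the pangram paragraph (I dropped the intro, the "a quick" false start and the closing remarks), which is an assumption that will shift the exact figures.

```python
import re


def words(text):
    """Lowercase the text, turn punctuation into spaces, and split into words."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from transcript
                curr[j - 1] + 1,     # extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# The paragraph that was read aloud (punctuation is ignored by words()).
REFERENCE = (
    "The quick brown fox jumps over the lazy dog is an English-language "
    "pangram, a sentence that contains all of the letters of the alphabet. "
    "It is commonly used for touch-typing practice, testing typewriters and "
    "computer keyboards, displaying examples of fonts, and other applications "
    "involving text where the use of all letters in the alphabet is desired. "
    "Owing to its brevity and coherence, it has become widely known."
)

# Hand-trimmed from the transcript: Australian accent (assumed boundaries).
AUSTRALIAN = (
    "the quick around fox jumps over the lazy dog is an english-language "
    "and rank senses that contains all the letters of the alphabet is commonly "
    "used the shopping practice fifty typewriters attributable "
    "to display examples of faults and other applications involving textured "
    "uses all that is the alphabet is designed audiences for every coherence "
    "is become one"
)

# Hand-trimmed from the transcript: attempted US accent (assumed boundaries).
AMERICAN = (
    "a quick round fox jumps over the lazy dog is an english-language "
    "new service that contains all letters of the alphabet is commonly used "
    "for touch typing practice testing typewriters and computer keyboard "
    "displaying examples of fonts and other applications involving text for "
    "the use of all levels of the alphabet is desire linked with gravity "
    "incoherence has become widely known"
)

if __name__ == "__main__":
    print(f"Australian accent WER: {word_error_rate(REFERENCE, AUSTRALIAN):.0%}")
    print(f"American accent WER:   {word_error_rate(REFERENCE, AMERICAN):.0%}")
```

A lower WER means a closer match to what was said. With a single recording and a few words I stumbled over, treat the result as indicative rather than a proper benchmark.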

I look forward to this being fixed, because I record most of my client meetings to save me writing copious notes. In person I do this in OneNote, where the notes I take relate back to the audio. Unfortunately, because the transcription in Stream is so poor, I still have to take notes the same way for calls via Microsoft Teams or Skype for Business, as I can’t rely on being able to search the transcription.

