Another week of murderous rage as we get closer to our impending Summer of Code ending on the 31st July.
It's been a while since I complained about the state of Microsoft SDKs. I'll try not to fill the two-year gap with this post, but it's fair to say that I fell into the SDK trap yet again. For those who haven't read my serialised complaint stream over the years, you'll begin to understand from now on what a scratched record I am when it comes to Microsoft SDKs. To cut a long story short, many of them are fundamentally broken. I decided a few years ago that I would always start with the REST API and build my own tooling. I broke my own rule to keep up my development cadence for my team's "videocracker" entry, and have set fire to another 6 hours since my last post.
It's only my sheer determination to write something complete with my team and my muscle memory that will see me through here.
Let's start at the beginning. If you missed my previous post about the first wasted 14 hours of building dependencies on a Batch pool, read that first to get yourself in the mood.
This is part 2. Predictably but tragically, I moved my working code from MacOSX to Ubuntu 18 and Ubuntu 22 on a Batch node.
To run a transcription from a wav file asynchronously, the simplest approach is the following code.
import time

import azure.cognitiveservices.speech as speechsdk


# Method of a class that holds key, region and a transcribed_lines list
def transcribe_audio(self, wav_file):
    print("wav file: {}".format(wav_file))
    speech_config = speechsdk.SpeechConfig(subscription=self.key, region=self.region)
    speech_config.speech_recognition_language = "en-GB"
    audio_config = speechsdk.audio.AudioConfig(filename=wav_file)
    conversation_transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config, audio_config=audio_config)

    transcribing_stop = False

    def stop_cb(evt: speechsdk.SessionEventArgs):
        # Fires on session_stopped or canceled and ends the wait loop below
        nonlocal transcribing_stop
        print('CLOSING on {}'.format(evt))
        transcribing_stop = True

    def transcribed_cb(evt: speechsdk.SpeechRecognitionEventArgs):
        # Each finalised utterance arrives here with its diarised speaker ID
        line = '{}: {}'.format(evt.result.speaker_id, evt.result.text)
        print('TRANSCRIBED: {}'.format(line))
        self.transcribed_lines.append(line)

    conversation_transcriber.transcribed.connect(transcribed_cb)
    conversation_transcriber.session_started.connect(lambda evt: print("SESSION STARTED: {}".format(evt)))
    conversation_transcriber.session_stopped.connect(lambda evt: print("SESSION STOPPED: {}".format(evt)))
    conversation_transcriber.canceled.connect(lambda evt: print("CANCELED: {}".format(evt)))
    conversation_transcriber.session_stopped.connect(stop_cb)
    conversation_transcriber.canceled.connect(stop_cb)

    conversation_transcriber.start_transcribing_async()
    # Block until stop_cb flips the flag
    while not transcribing_stop:
        time.sleep(.5)
    conversation_transcriber.stop_transcribing_async()
Turns out this code doesn't work on Ubuntu and fails silently. Of course it does.
SESSION STARTED: SessionEventArgs(session_id=dc36012432ec4805a331aed11c8f72e7)
CANCELED: ConversationTranscriptionCanceledEventArgs(session_id=dc36012432ec4805a331aed11c8f72e7, result=ConversationTranscriptionResult(result_id=7a57e296583a40d098512c580c156d10, speaker_id=, text=, reason=ResultReason.Canceled))
CLOSING on ConversationTranscriptionCanceledEventArgs(session_id=dc36012432ec4805a331aed11c8f72e7, result=ConversationTranscriptionResult(result_id=7a57e296583a40d098512c580c156d10, speaker_id=, text=, reason=ResultReason.Canceled))
SESSION STOPPED: SessionEventArgs(session_id=dc36012432ec4805a331aed11c8f72e7)
CLOSING on SessionEventArgs(session_id=dc36012432ec4805a331aed11c8f72e7)
This sad little output is all you get when you try to transcribe. No reason for the cancellation and certainly no transcription. On my Mac, every single line of transcribed audio is passed to the transcribed event. On Ubuntu it just breaks.
My first thought was to look up the SDK online.
Given I'm now incapable of using anything other than an AI, I asked ChatGPT, and this is the link it gave me.
Yes, you get a 404. It's entirely possible, even highly probable, that ChatGPT made this up, BUT I didn't think so. I looked wider to see whether I could find any other SDKs for Speech Services. Sure enough, there's a Go SDK and a JavaScript one: just what I needed. I read through them and gave up; too much abstraction. I was chatting to Darsh and it dawned on me that I should just write my own SDK wrapping the REST API. So I did.
It looks something like this:
import json

import requests


def create_transcription(self):
    # Batch transcription REST API (v3.2): submit a job for the audio at self.content_url
    url = f"https://{self.region}.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions"
    headers = {
        "Ocp-Apim-Subscription-Key": self.subscription_key,
        "Content-Type": "application/json"
    }
    data = {
        "contentUrls": [self.content_url],
        "locale": "en-GB",
        "displayName": self.display_name,
        "properties": {
            "wordLevelTimestampsEnabled": False,
            "languageIdentification": {
                "candidateLocales": ["en-US", "en-GB"]
            },
            "diarizationEnabled": True,
            "punctuationMode": "DictatedAndAutomatic",
            "profanityFilterMode": "Masked"
        }
    }
    response = requests.post(url, headers=headers, data=json.dumps(data))
    response.raise_for_status()
    # The JSON body describes the new transcription, including the URL to poll
    return response.json()
Once you've created a transcription, you can check whether it's ready; the results get written to a file. Everything is synchronous, though. You just have to poll the transcription ID until you get a success status, then use the content link to download the transcription and its JSON metadata in all its glory. A little bit shit compared to the SDK, but I can live with that.
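A minimal sketch of that polling loop follows. It assumes the v3.2 responses have this shape: the transcription object has a "status" of NotStarted, Running, Succeeded or Failed, a "self" URL to poll, and a "links"/"files" URL whose "values" entries carry a "kind" and a "links"/"contentUrl" download link. The helper names (http_get_json, wait_for_transcription, transcription_content_urls) are hypothetical, it uses the standard library's urllib rather than requests, and it ignores paging of the file list.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, List

# Hypothetical signature: (url, subscription_key) -> decoded JSON body
JsonGetter = Callable[[str, str], Dict[str, Any]]


def http_get_json(url: str, subscription_key: str) -> Dict[str, Any]:
    """GET a Speech REST URL with the subscription key header and decode the JSON."""
    request = urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": subscription_key}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_transcription(
    transcription_url: str,
    subscription_key: str,
    get_json: JsonGetter = http_get_json,
    interval: float = 10.0,
    timeout: float = 3600.0,
) -> Dict[str, Any]:
    """Poll the transcription until it succeeds, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        transcription = get_json(transcription_url, subscription_key)
        status = transcription.get("status")
        if status == "Succeeded":
            return transcription
        if status == "Failed":
            # Assumed location of the error details in a failed transcription
            error = transcription.get("properties", {}).get("error")
            raise RuntimeError(f"Transcription failed: {error}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Transcription still {status} after {timeout}s")
        time.sleep(interval)  # NotStarted or Running: wait and ask again


def transcription_content_urls(
    transcription: Dict[str, Any],
    subscription_key: str,
    get_json: JsonGetter = http_get_json,
) -> List[str]:
    """Return download links for result files of kind "Transcription"."""
    files = get_json(transcription["links"]["files"], subscription_key)
    return [
        item["links"]["contentUrl"]
        for item in files.get("values", [])
        if item.get("kind") == "Transcription"
    ]
```

Pass the "self" URL from the JSON returned by the POST as transcription_url. The get_json parameter only exists so the HTTP call can be swapped out, for example with a fake in tests.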
I checked the JS SDK, which unlike the Python one hadn't been taken offline, thinking it must do something interesting to get the transcription line by line with JavaScript promises. Turns out it uses websockets. Checked the docs and there we go: the wss protocol. Okay, so now I'm thinking I can create my own async SDK.
Checked whether the Batch node could open a websocket with the following code. Damn, it worked.
import asyncio

import websockets


async def test_websocket():
    # Round-trip a message through a public echo server to prove outbound wss works
    uri = "wss://echo.websocket.org"
    async with websockets.connect(uri) as websocket:
        await websocket.send("Hello WebSocket!")
        response = await websocket.recv()
        print(f"Received: {response}")


asyncio.run(test_websocket())
Not in my happy place but feel like I'm closing in on something.
Okay, so now I'm thinking I must be able to get a more verbose view. I spent some more time looking through the JavaScript SDK and then the samples and, lo and behold, it turns out you can get verbose logging by setting a property.
speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, "speech_sdk.log")
As Mark Russinovich says, you can never have too much logging.
I checked the logs after this and boom! Something stands out.
[720977]: 3011ms SPX_TRACE_ERROR: exception.cpp:130 About to throw Runtime error: Failed to initialize platform (azure-c-shared). Error: 2176
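The verbose log gets long quickly, so a few lines of Python can pull out just the errors. This is a minimal sketch that assumes the log is plain text and that error entries carry the SPX_TRACE_ERROR tag seen in the line above; sdk_errors is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def sdk_errors(log_path: str, marker: str = "SPX_TRACE_ERROR") -> List[str]:
    """Return every line of a Speech SDK log that contains the error marker."""
    # errors="replace" so a stray non-UTF-8 byte doesn't abort the scan
    with Path(log_path).open(encoding="utf-8", errors="replace") as log:
        return [line.rstrip("\n") for line in log if marker in line]
```

Pointed at the file named in the Speech_LogFilename property (speech_sdk.log here), it returns the matching lines in the order they were written, which is usually enough to spot the first thing that went wrong.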
Okay, so I checked the library chain with ldd, and it looks like it depends on an older version of OpenSSL. A much older version. Of course it does. Right. I tracked down the OpenSSL 1.1 dependency and installed it directly from an older package like so.
sudo wget http://security.ubuntu.com/ubuntu/pool/main/o/openssl/libssl1.1_1.1.1f-1ubuntu2.22_amd64.deb && \
  sudo dpkg -i libssl1.1_1.1.1f-1ubuntu2.22_amd64.deb && \
  sudo wget http://security.ubuntu.com/ubuntu/pool/main/o/openssl/libssl-dev_1.1.1f-1ubuntu2.22_amd64.deb && \
  sudo dpkg -i libssl-dev_1.1.1f-1ubuntu2.22_amd64.deb
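Before and after running the install, a short Python check can show what ldd says is missing and whether the dynamic loader can now find OpenSSL 1.1. This is a minimal sketch: the helper names are hypothetical, it assumes a Linux node with ldd on the PATH, and it assumes the 1.1 libraries go by the sonames libssl.so.1.1 and libcrypto.so.1.1.

```python
import ctypes
import subprocess
from typing import Dict, Iterable, List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the sonames that ldd reports as 'not found'."""
    return [
        line.split("=>")[0].strip()
        for line in ldd_output.splitlines()
        if "not found" in line
    ]


def run_ldd(shared_object: str) -> str:
    """Run ldd on a shared library and return its stdout (Linux only)."""
    result = subprocess.run(
        ["ldd", shared_object], capture_output=True, text=True, check=True
    )
    return result.stdout


def can_load(sonames: Iterable[str]) -> Dict[str, bool]:
    """Ask the dynamic loader to open each soname; True if it loads."""
    results = {}
    for soname in sonames:
        try:
            ctypes.CDLL(soname)
            results[soname] = True
        except OSError:  # raised when the loader can't find or open the library
            results[soname] = False
    return results


if __name__ == "__main__":
    # Assumed OpenSSL 1.1 sonames
    print(can_load(["libssl.so.1.1", "libcrypto.so.1.1"]))
```

To check the SDK itself, feed run_ldd the Speech SDK's core shared library into missing_libraries. Its file name (assumed here to be something like libMicrosoft.CognitiveServices.Speech.core.so inside the installed azure/cognitiveservices/speech package directory) is an assumption, so confirm it with ls first.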
And voilà. Hours of fun and frolics cursing the Speech Services SDK team, then instant gratification: it works straight away.
I'll probably carry on and write my own SDK that isn't dependent on libraries from the neolithic era, so watch this space, and my GitHub or Elastacloud's, if you want something a bit lighter weight and Python native. I have to say I was pretty hardcore with C++, then C# and Java, then Scala. My love for Python hasn't really surfaced. I'm struggling not to loathe it currently, but this is the new world, so I'm going with it.
Happy trails!