Turn a routine into a fun adventure with AI: a story about how to create a simple audio-to-text transcription program using ChatGPT during a lunch break.
This is an example of writing programs for non-programmers. I believe that today the ability to hire AI for various tasks, including programming, is the key to personal efficiency. It’s like “confident mastery of Word and Excel” twenty years ago – a competitive advantage in the labor market that has become an integral requirement over time.
I’d like to apologize to the real developers right away: we, the “humanities people,” have no intention of taking your work away. The aim is rather to feed our own curiosity and get one step closer to technology.
ChatGPT will write all the necessary code for us, and it will also advise us on how to work with this code and on other technical issues.
For simplicity and better understanding, we’ll use the metaphor of a coffee shop when working with technical things.
Our coffee shop should have a kitchen, a place with all the tools to create dishes and drinks. In developer terminology, this is a “programming environment”. We will use GitHub Codespaces to write and store code.
Let’s create the app.py file in it, where all the following code will be stored.
First Python code
Let’s ask ChatGPT to write the first code. We specify the programming language – Python – and the name of the function that we’ll be modifying and refining later. A function is like a recipe in a coffee shop: step-by-step instructions on how to do something. The instructions can be complex, with “if so, do this” conditions, but it all starts simply: we ask ChatGPT to write Python code that runs immediately and prints the phrase “Hello, Roboworld” to the terminal.
The terminal is the restaurant administrator. You can ask it to do something, it will fulfill your request, and it can tell you something back. Unlike a waiter, it can’t and doesn’t want to chat much: it is busy and speaks its own terse language. The waiter, by contrast, is a friendly graphical interface to the program. In our case, the waiter could be a web form where we upload a file, but today we are making a simpler first version, and working with the terminal will be enough.
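The answer to such a request might look like the sketch below. The function name process_audio is a placeholder assumed for this example, since the article does not say which name was used; the greeting matches the message we expect to see in the terminal.

```python
# app.py: a first version that only prints a greeting.
# The function name `process_audio` is a placeholder chosen for this sketch.

def process_audio():
    """Our first 'recipe': a single step that prints a greeting."""
    print("Hello, Roboworld")


# Call the function right away so the greeting appears when the file runs.
process_audio()
```

Keeping the greeting inside a named function, rather than as a bare print, is what lets us ask ChatGPT to “modify the function” in the next steps.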
Code execution
To run the code, we use the terminal: type python, a space, and the file name, that is, python app.py. If you see the message “Hello, Roboworld” – congratulations, you have executed the first Python code of your life. Just don’t demand a pay raise tomorrow because you’re programmers now. But for you and me, this is an important step in understanding the capabilities of technology.
The first modification of the code – check for an audio file
Our code works, now let’s improve our function. At the moment, it only writes “hello”, but let’s ask it to check for the audio.mp3 file. We give ChatGPT the task the same way we would ask our colleague to write such a program – in simple words.
We get the modified code, transfer it to our app.py file. We delete the previous one, paste the new one, run the file, and the terminal shows the following message: Audio not found.
Indeed, there is no audio.mp3 next to the app.py file. We drag-n-drop the file with the audio, run it, and the file is found, great!
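A minimal sketch of what this version of app.py might look like. The function name process_audio and the “Audio found” message are assumptions made for this example; only the file name audio.mp3 and the “Audio not found” message come from the walkthrough above.

```python
from pathlib import Path

# The file name from the article. A relative path is resolved against the
# folder the terminal is currently in, which here is the folder with app.py.
AUDIO_FILE = Path("audio.mp3")


def process_audio(audio_path=AUDIO_FILE):
    """Report whether the audio file exists and return True if it does."""
    if Path(audio_path).is_file():
        print("Audio found")  # assumed wording, not quoted in the article
        return True
    print("Audio not found")
    return False


# Run the check as soon as the file is executed.
process_audio()
```

This is the “if so, do this” kind of recipe mentioned earlier: one condition, two possible outcomes.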
The second modification of the code – send the audio file for AI transcription
We return to ChatGPT, ask it to modify our function and, if the file is already found, send it to OpenAI for transcription using the API. How do I know about this feature? OpenAI themselves told me about it, but if I hadn’t known, I could have asked ChatGPT for advice on how to implement it.
In our coffee shop metaphor, the API is the partners’ menu. We can send them an order and get it delivered: order cakes and have the courier bring them, or hand over an audio file and get its transcription back as text.
We get the modified code and notice that ChatGPT tells us to add the openai library to our development environment. A library is like pre-prepared ingredients, such as syrups for drinks. We could prepare them ourselves, but why? This library can be installed by inserting a command into the terminal. It executes immediately, and everything works.
I would also like to mention the API key. You can get it in your OpenAI account, on the API platform, but you need to deposit $5 to use it. That will most likely last a long time: 20 students spent almost $6 in a month of work through a single key 🙂
We paste the code written by ChatGPT and get the first error.
Correction of errors!
A developer’s job often comes down to fixing bugs. We are no exception here 🙂
The error occurs because ChatGPT wrote the code using outdated knowledge of this particular library. In other words, ChatGPT either doesn’t know its own up-to-date documentation or didn’t think to use it.
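When the problem is a gap between what the AI remembers and what is actually installed, it can help to tell it the exact library version along with the error. A small sketch using only Python’s standard library; the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Print the version of the openai library (None means it is not installed).
print("openai version:", installed_version("openai"))
```

The printed line can be pasted into the chat next to the error message, so the AI knows which version of the library the fix has to work with.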
First attempt to fix the error
Let’s copy the error to ChatGPT and hope that it will fix it. It will definitely try.
We run the new code and get the same error.
Strategies for solving programming errors with ChatGPT:
- Ask it to correct the mistake and let it figure things out by itself. Very often this works;
- Find the documentation for what we are working with on the Internet (in our case, the openai library), give it to ChatGPT, and let it figure things out;
- Find working code on the Internet, feed it to ChatGPT, and let it figure out how to apply it to ours;
- Switch to another AI – not only ChatGPT can write code, Claude and Gemini can do it too. Let them fix other people’s mistakes;
- Create a new chat. ChatGPT has a memory within a single chat, and sometimes it goes in circles, repeating the same fixes. You can break the circle by saying “forget everything you wrote before”, but it’s easier and more reliable to create a new chat, copy the code and its error there, and ask for a fix.
Most mistakes can be avoided if you try to think of ChatGPT not as a “slave” to whom you dump work and demand quality performance, but as a consultant. Before giving a command on how to modify a function, you can ask what ways it can be implemented. And then choose the one that suits you best from the list.
And if you can’t choose, describe the task in more detail and ask ChatGPT to choose a solution for you and implement it.
We ask another AI, Claude, to fix ChatGPT’s bug by giving it our current version of the code and the error message.
Let’s run Claude’s version of the code.
Our code works!
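The article does not reproduce Claude’s final code, so what follows is only a sketch of what a working version might look like. It assumes version 1 or later of the openai library, the “whisper-1” model name, and an API key stored in the OPENAI_API_KEY environment variable; the function names transcribe_audio and main are placeholders.

```python
from pathlib import Path

AUDIO_FILE = Path("audio.mp3")  # the file name used in this walkthrough


def transcribe_audio(audio_path, client, model="whisper-1"):
    """Send the audio file to the transcription API and return the text.

    `client` is expected to be an openai.OpenAI instance; the model name
    is an assumption, not something the article specifies.
    """
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text


def main():
    # Same check as before: stop early if the file is missing.
    if not AUDIO_FILE.is_file():
        print("Audio not found")
        return
    # Imported here so the file check works even before the library
    # is installed (pip install openai).
    from openai import OpenAI

    client = OpenAI()  # reads the key from the OPENAI_API_KEY variable
    print(transcribe_audio(AUDIO_FILE, client))


if __name__ == "__main__":
    main()
```

Passing the client into transcribe_audio, instead of creating it inside, keeps the “order from the partner” step separate from the “which partner” decision, which makes it easier to ask an AI to change one without breaking the other.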
Once the result is achieved, you can contact ChatGPT again to ask for recommendations on what else you can do with this personalized microservice.
The development opportunities it offered me are interesting:
- If the file is too large (there is an API limit of 25 megabytes), cut it into pieces and transcribe them one by one, and then collect all the transcriptions into one file;
- Add the ability to add videos, not just audio, and automatically convert them to the format required for transcription;
- Store “used” files in a separate folder.
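Two of these ideas can be sketched with Python’s standard library alone: checking whether a file exceeds the limit and moving a finished file into a separate folder. The names needs_splitting and archive_file, the folder name “processed”, and the reading of 25 megabytes as 25 × 1024 × 1024 bytes are all assumptions for this example. Actually cutting audio into pieces or converting video requires extra tools and is left out.

```python
import shutil
from pathlib import Path

# Assumption: the 25 MB limit is counted in binary megabytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def needs_splitting(audio_path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file is larger than the upload limit."""
    return Path(audio_path).stat().st_size > limit


def archive_file(audio_path, archive_dir="processed"):
    """Move a 'used' file into a separate folder and return its new path."""
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    destination = archive / Path(audio_path).name
    shutil.move(str(audio_path), str(destination))
    return destination
```

In app.py these helpers would be called around the transcription step: needs_splitting before sending the file, archive_file after the text has been received.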
Conclusions
The ability to automate your personal routine is not only a great competitive advantage, but also a way to make your life easier. Within my team at LUN, we have a rule that any repetitive task that can be written down as instructions should take no more than 5 minutes to complete. Let the robots do the rest, and let people use their most valuable expertise to control the quality of routine tasks.