The One UI interface uses text and image recognition technology developed by Samsung R&D Institute Ukraine. We talked to the developers who worked on it


Taras Mishchenko Editor-in-Chief of Mezha.Media. Taras has more than 15 years of experience in IT journalism, writes about new technologies and gadgets.

7 September 2023, 11:24 AM

Users of Samsung smartphones and tablets from the last few generations, and not only flagships, may have noticed a very useful feature. It lets you select and copy objects in an image and extract text not only from images or PDFs, but even from videos. This feature is part of Samsung's proprietary One UI interface and was developed at the Samsung R&D Institute Ukraine research center in Kyiv. We talked to the developers behind this technology and asked them how it was created.

How long have you been working on this text recognition technology? How many people were involved in the development?

We came up with the idea of recognizing handwritten text from a photo back in 2016 when our team was developing technology to recognize handwritten text obtained with the S Pen, also called digital ink.

At that time, we were organizing the collection of handwritten text in more than 50 languages, entered with the S Pen, to train neural network models for recognizing digital ink. The functionality we created for the S Pen is already well known and loved by Samsung users, and it currently supports handwriting recognition in more than 100 languages.

To test the idea of recognizing handwritten text from photos, which came out of that research work, we decided to experiment: we converted the handwritten stroke points captured by the S Pen into images, created an innovative neural network architecture, and trained the model to recognize text from photos.
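The conversion step described here, turning pen points into an image, can be illustrated with a small sketch. It is not the team's actual code: the function name `rasterize_strokes`, the input format (a list of strokes, each a list of `(x, y)` points), and the binary output grid are all assumptions made for illustration.

```python
def rasterize_strokes(strokes, width, height):
    """Draw pen strokes (lists of (x, y) points) onto a binary image grid.

    Hypothetical helper for illustration only; the real pipeline is not public.
    """
    image = [[0] * width for _ in range(height)]
    points = [p for stroke in strokes for p in stroke]
    if not points:
        return image

    min_x = min(x for x, _ in points)
    max_x = max(x for x, _ in points)
    min_y = min(y for _, y in points)
    max_y = max(y for _, y in points)

    # One uniform scale for both axes keeps the shape of the handwriting.
    span = max(max_x - min_x, max_y - min_y) or 1
    scale = (min(width, height) - 1) / span

    def to_pixel(point):
        x, y = point
        return round((x - min_x) * scale), round((y - min_y) * scale)

    for stroke in strokes:
        pixels = [to_pixel(p) for p in stroke]
        if len(pixels) == 1:
            # A single tap of the pen becomes one dot.
            col, row = pixels[0]
            image[row][col] = 1
        # Connect consecutive points with straight segments.
        for (c0, r0), (c1, r1) in zip(pixels, pixels[1:]):
            steps = max(abs(c1 - c0), abs(r1 - r0), 1)
            for i in range(steps + 1):
                col = round(c0 + (c1 - c0) * i / steps)
                row = round(r0 + (r1 - r0) * i / steps)
                image[row][col] = 1
    return image


# Usage: a single diagonal stroke drawn onto a 5x5 grid.
for row in rasterize_strokes([[(0, 0), (10, 10)]], 5, 5):
    print("".join("#" if pixel else "." for pixel in row))
```

A production system would presumably also vary line thickness, add noise, and use realistic backgrounds so that the rendered ink resembles a photo, but those details are not described in the interview.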

The experiment was successful: we patented the technology, and our results were published in scientific articles at several well-known international conferences, such as UIST-2019, ICDAR-2021, and ICASSP-2023. In 2018-2019, long before competitors offered handwriting recognition from photos, Samsung R&D Institute Ukraine developed a product called CalliScan (Calligraphic Scanning), which was available to Samsung users through the Galaxy Store.

At the prototype stage, a very small team of seven engineers worked on recognizing handwriting from photos, and the technology supported only two languages. Today, our product is part of One UI, and a fairly large team is working on improving it.

The technology is built into many applications on the device: camera, gallery, browser, video player, smart highlighting. At the moment, in addition to Ukrainian, we support nine more languages (English, French, Italian, German, Spanish, Portuguese, Korean, Japanese, Chinese), and we are working on expanding this list.

Was the technology developed entirely at the Ukrainian R&D center, or did developers from other countries participate?

The main part of the technology was developed and continues to be developed in Ukraine. The development and training of neural network models for recognition, building language models for all languages, optimization for launching on a mobile device, and the commercialization process itself were all done by the Ukrainian team.

But it is worth noting the help of our colleagues from the Suwon office at the headquarters in Korea, who were involved in the integration of OCR (Optical Character Recognition) technology into the final product.

What were the main difficulties and challenges in developing a text recognition system for Samsung mobile devices?

The main requirement, one that Samsung software developers always adhere to, was the highest possible recognition quality. We also had to account for the fact that the user does not specify the language in advance. The text in the image can be in any language, so our algorithm must first detect the language automatically and only then recognize the text itself. In addition to automatic language detection, we took into account the angle of the text, the division of the text into lines, and different types of lighting. And, of course, all of this had to run on-device, within the resources available on a mobile device and without Internet access.
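The two-stage order described in this answer, language detection first and recognition second, can be sketched as a simple dispatch. Everything here is hypothetical: `recognize_text`, the `detect_language` callable, and the `recognizers` mapping are stand-ins for models the interview does not describe, and the fallback language is an assumption.

```python
def recognize_text(image, detect_language, recognizers, fallback="en"):
    """Detect the language first, then run the matching recognizer.

    `detect_language` and the values in `recognizers` are hypothetical
    callables standing in for neural network models.
    """
    language = detect_language(image)
    if language not in recognizers:
        # Assumption: unsupported languages fall back to a default model.
        language = fallback
    return language, recognizers[language](image)


# Usage with toy stand-ins instead of real models.
def toy_detector(image):
    return image["hint"]

toy_recognizers = {
    "uk": lambda image: "привіт",
    "en": lambda image: "hello",
}

print(recognize_text({"hint": "uk"}, toy_detector, toy_recognizers))
print(recognize_text({"hint": "xx"}, toy_detector, toy_recognizers))
```

Keeping detection separate from recognition means each language model only runs when it is needed, which fits the on-device resource limits mentioned in the answer.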

How difficult was it to use neural networks, and was the mobile processor powerful enough to implement everything you planned?

Several neural networks are involved in the text recognition process. All of them have to work as quickly as possible, but with minimal impact on power consumption. It's important to note that our team has extensive experience in developing on-device solutions using neural networks, and we ran many experiments during development to find a balance between quality and speed. As a result, we have a technology for recognizing text in images that runs entirely on a mobile device and does not require an Internet connection, which helps protect user data from leakage.

Did the transition to the Qualcomm Snapdragon platform help?

The Snapdragon 8 Gen 2 for Galaxy did indeed allow us to get the recognition result much faster, but we designed our solution in such a way that it would be available on all Samsung flagships, as well as on devices from the A, M, and S Lite series.

Were there any peculiarities in the implementation of text recognition on video and from a smartphone camera?

Each application has its peculiarities. For example, video is more likely to produce blurred text, and the algorithm also has to recognize text that is upside down or at a steep angle, including perspective distortion, as well as text captured from a computer monitor. In the keyboard, camera recognition works in real time, and we've added the ability to limit the algorithm's field of view to make it work even faster. We took all these details into account to get the best possible recognition result for our users.
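Limiting the field of view can be pictured as cropping each frame to a region of interest before recognition, so that fewer pixels are processed. The sketch below is only an illustration under that assumption: `crop_to_region` and its rectangle parameters are hypothetical names, and the frame is modeled as a plain list of pixel rows.

```python
def crop_to_region(image, left, top, right, bottom):
    """Return the part of `image` (a list of pixel rows) inside the rectangle.

    Hypothetical helper; coordinates are clamped to the image bounds,
    and `right` / `bottom` are exclusive.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [row[left:right] for row in image[top:bottom]]


# Usage: keep only a central band of a 6x4 frame.
frame = [[10 * r + c for c in range(6)] for r in range(4)]
region = crop_to_region(frame, 1, 1, 4, 3)
print(region)
print(len(region) * len(region[0]), "of", len(frame) * len(frame[0]), "pixels kept")
```

Since recognition cost generally grows with the number of pixels, processing only the cropped region is one plausible way such a limit speeds up real-time use.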

The Samsung R&D Institute Ukraine team, with the support of the head office in Korea, plans to continue to improve the quality of handwriting recognition and provide even more users with the opportunity to benefit from intelligent input technologies in their native language.
