
AI-Powered Baybayin Translator Being Developed by UP Mathematicians

Words by Eunice Jean Patron, UPD-CS SciComm

Filipino mathematicians have just invented a computerized method for converting entire paragraphs and even full documents written in the ancient Filipino Baybayin writing system into text that even non-native readers can easily understand. And they're now hard at work developing a full two-way translator.

By combining mathematics and technology, scientists from the University of the Philippines – Diliman College of Science Institute of Mathematics (UPD-CS IM) have made what is likely the world's first paragraph-level optical character recognition (OCR) system that can distinguish between entire blocks of Baybayin and Latin characters in a text image.

Baybayin OCR system 2

Thousands of images, months of hard work

In their paper entitled "Block-level Optical Character Recognition System for Automatic Transliterations of Baybayin Texts Using Support Vector Machine," master's student Rodney Pino and associate professors Dr. Renier Mendoza and Dr. Rachelle Sambayan developed an algorithm that converts a photograph of a block of text into binary data, which is then run through a support vector machine (SVM) character classifier to automatically determine whether the characters are Baybayin or Latin.
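To give a rough idea of what "converting a photograph into binary data" can mean, here is a minimal sketch in Python. It assumes a simple fixed brightness threshold and a tiny made-up grayscale patch; the function name `binarize`, the threshold value, and the sample pixels are illustrative assumptions, and the preprocessing in the published system may differ.

```python
def binarize(gray, threshold=128):
    """Turn a grayscale image (rows of 0-255 integers) into 0/1 pixels.

    1 marks a dark pixel (ink), 0 marks a light pixel (background).
    The fixed threshold is an assumption made for this sketch.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# Hypothetical 3x4 grayscale patch with a dark stroke in the second column.
patch = [
    [250, 30, 240, 245],
    [248, 25, 238, 250],
    [252, 40, 235, 247],
]

binary = binarize(patch)

# Flatten the 0/1 grid into a single list that a classifier could take as input.
flat = [px for row in binary for px in row]
```

In this sketch, the flattened list of zeros and ones stands in for the kind of input a character classifier would receive.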

"SVM is a machine learning algorithm used to solve regression or classification problems," Pino explained. "We have a dataset for Baybayin characters, let's say character A and then character BA. SVM uses techniques or mathematical methods that can separate the two datasets to determine characters BA and A."
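The idea of separating two sets of characters can be illustrated with a toy linear SVM written from scratch in Python. This is only a sketch under stated assumptions: each character image is reduced to two made-up numeric features, the training routine is a plain hinge-loss gradient method, and the names `train_linear_svm` and `predict` are hypothetical. It is not the classifier used in the paper.

```python
def train_linear_svm(points, labels, lam=0.01, eta=0.1, epochs=100):
    """Fit a linear SVM (weights w, bias b) by subgradient descent on the hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1 - eta * lam) for wi in w]
            if margin < 1:
                # The point is inside the margin, so push the boundary away from it.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b


def predict(w, b, x):
    """Return 'A' on the positive side of the boundary and 'BA' on the other."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "A" if score >= 0 else "BA"


# Made-up two-number features for a few images of each character.
features = [(2, 1), (1, 2), (2, 2), (-2, -1), (-1, -2), (-2, -2)]
labels = [1, 1, 1, -1, -1, -1]  # +1 stands for character A, -1 for character BA

w, b = train_linear_svm(features, labels)
print(predict(w, b, (3, 3)))    # an unseen point on the "A" side
print(predict(w, b, (-3, -3)))  # an unseen point on the "BA" side
```

The learned weights define a straight line between the two groups of points, and a new character is labeled according to the side of the line it falls on.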

It took the group more than three months to collect over a thousand images for each Baybayin character, gathering a total of 110 paragraphs from different websites that have either hand- or typewritten Baybayin, Latin, or Baybayin and Latin writing. "Adding more character images improves the recognition rate of SVM," Pino explained.

Developing a smart, two-way translator

Currently, the OCR system can spell out the Latin equivalent of the Baybayin characters on a page, thus producing a transliterated version of the text. But the researchers are looking to enable it to do so much more.

The mathematicians also plan to make the OCR system more aware of the context of Baybayin words and phrases, possibly paving the way for a full-fledged translator. They are also trying to make the system work both ways, with the ability to convert Latin words with foreign sounds into Baybayin.

"We're trying to refine the software we developed to make it easier for future users to navigate it. We also dream of creating a mobile application that automatically and accurately translates Baybayin characters just by hovering over the phone," Dr. Mendoza said.

However, there are some kinks to smooth out: Dr. Mendoza said that it was challenging to get the OCR system to translate Baybayin words and sentences accurately. "For now the system can't distinguish between some Baybayin characters that are similar in writing, such as E and I, and O and U. We also have a lot of words that have different Latin equivalents," he expounded. "The algorithm we used shows all possible translations of the Baybayin words."
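A small Python sketch can show why one Baybayin word may yield several Latin candidates. It assumes a simplified input in which each syllable is a consonant plus a vowel class, where the class "I" may be read as E or I and the class "U" as O or U. The table `VOWEL_OPTIONS`, the function `candidates`, and the sample word are illustrative assumptions, not the authors' algorithm.

```python
from itertools import product

# Each vowel class maps to the Latin vowels it may stand for (assumed for this sketch).
VOWEL_OPTIONS = {"A": ["a"], "I": ["e", "i"], "U": ["o", "u"]}


def candidates(syllables):
    """List every Latin spelling consistent with the ambiguous vowel classes."""
    per_syllable = [
        [consonant + vowel for vowel in VOWEL_OPTIONS[vowel_class]]
        for consonant, vowel_class in syllables
    ]
    return ["".join(parts) for parts in product(*per_syllable)]


# Hypothetical two-syllable word: B + (O or U), then T + (E or I).
word = [("b", "U"), ("t", "I")]
print(candidates(word))  # ['bote', 'boti', 'bute', 'buti']
```

Here both "bote" (bottle) and "buti" (good) are real Filipino words, so choosing between the candidates requires context.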

Baybayin OCR system 1

Preserving Filipino writing systems

Although still scant, interest in and research on Baybayin are slowly increasing, making the mathematicians hopeful that more Filipinos will become interested in protecting Baybayin through research. The team published their data to encourage more researchers to conduct studies on Baybayin and OCR. "We cleaned the data in such a way that researchers could use it in analyzing Baybayin through other algorithms," Dr. Mendoza shared. "We made the data readily available for use, so researchers wouldn't go through the difficulty we experienced in gathering data."

Philippine traditional writing systems, such as Baybayin, are representations of Filipino tradition and national identity. As such, the country's government officials created the "Philippine Indigenous and Traditional Writing Systems Act," which seeks to promote, protect, and preserve Baybayin and other traditional writing systems.

The proposed law urges using Baybayin as a tool for cultural development and safeguarding, therefore encouraging organizations and institutions to spearhead activities and projects that promote awareness of these traditional writing systems.

According to the scientists, Baybayin is living proof that we Filipinos have our own technically sophisticated traditions. While they aren't proposing that Baybayin become the Philippines' primary writing system, the group believes that conducting more research on Baybayin will help preserve this heritage. "This can be forgotten," Dr. Sambayan said. "It's important to have a record of each Baybayin character, even having digitized ones."

Dr. Sambayan expressed concern that the number of Filipinos who can read and write Baybayin is decreasing, adding to the importance of identifying and translating Baybayin characters into Latin. "We're hoping that through this OCR system, we could preserve and pass on the knowledge of understanding Baybayin to future Filipino generations," she said.

Baybayin and other traditional writing systems are a part of the Philippines' rich history. Several old Filipino documents are in Baybayin, and these documents can uncover more information about Filipino culture. The scientists are encouraging more Filipinos to join them in cultivating the body of knowledge the country has on Baybayin. "If no one does this, who will? Even though its implication already has a bit of a niche, I think this is still a vital research venture," Dr. Mendoza said.

For interview requests and other concerns, please contact media@science.upd.edu.ph.

Sources:

Pino, R., Mendoza, R., & Sambayan, R. (2022). Block-Level Optical Character Recognition System for Automatic Transliteration of Baybayin Texts using Support Vector Machine. Philippine Journal of Science, 151(1), 303-315.

Philippine Indigenous and Traditional Writing Systems Act, S. 1680, 19th Cong. (2022).