Deep learning creates Rosetta Stone for Medieval Japanese script
A wealth of historical information, books, and culturally important scripts lies unread by contemporary scholars because of the difficulty of transliterating them. Using deep learning techniques and tools, Rani Horev and his colleagues are working toward preserving these important documents and opening them up for new scholarly analysis.
Deep learning is a form of machine learning in which multi-layer neural networks learn representations of data from examples. It has become the dominant approach to machine learning because it often outperforms other methods and is useful in many problem domains.
Rani Horev, co-founder and machine learning researcher at Lyrn.ai, has used deep learning to develop tools to classify characters of a little-known but historically important cursive script called Kuzushiji. During a recent meeting of the San Francisco Deep Learning Meetup, Horev described a Rosetta Stone-like deep learning system for transliterating the medieval script. His entire presentation, Image Classification with Deep Learning: From Minst to Kuzushiji through ResNet, is available on YouTube.
Recognizing Characters and Digits
Image classification is a well-studied problem with an array of algorithms that reach human-level performance, explained Horev. Perhaps the best-known data set used to evaluate image classification models is the MNIST data set, which includes 70,000 examples of handwritten digits that have been normalized and centered. Building a model to classify MNIST digits is now so well understood that it has become the “Hello World” exercise of deep learning development.
Early attempts at classifying MNIST digits used K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms, but the highest-accuracy models used deep learning and convolutional neural networks (CNNs), according to Horev.
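To make the KNN baseline concrete, here is a minimal sketch in plain Python. It labels a query image by majority vote among its k closest training images, using Euclidean distance over pixel values. The tiny 2x2 "images", their labels, and the function name knn_predict are hypothetical and chosen only for illustration; they are not taken from Horev's talk or from MNIST.

```python
import math
from collections import Counter


def knn_predict(train_images, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training images."""
    # Euclidean distance between flattened pixel vectors.
    distances = [
        (math.dist(image, query), label)
        for image, label in zip(train_images, train_labels)
    ]
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 2x2 "images" flattened to 4 pixel intensities (hypothetical data).
train_images = [
    [0.9, 0.1, 0.9, 0.1],  # bright left column  -> label 1
    [1.0, 0.0, 0.8, 0.2],
    [0.1, 0.9, 0.1, 0.9],  # bright right column -> label 7
    [0.0, 1.0, 0.2, 0.8],
]
train_labels = [1, 1, 7, 7]

prediction = knn_predict(train_images, train_labels, [0.8, 0.2, 0.9, 0.0], k=3)
print(prediction)
```

KNN needs no training step, but every prediction compares the query against the whole training set, which is one reason learned models such as CNNs became more attractive on larger data sets.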
Convolution is an operation in which small patches of the image are multiplied by learned values to create a representation of the image; this representation is key to the higher accuracy of CNNs. CNNs typically stack multiple convolutional layers, with the output of one layer becoming the input to the next until the last part of the network. Finally, a single-layer fully connected network takes the output of the convolutional layers and produces the probability that the original input belongs to each category, such as a particular digit in the case of MNIST.
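The pipeline described above can be sketched in a few lines of plain Python: a small kernel slides over the image, a fully connected layer turns the resulting feature map into one score per class, and a softmax converts scores into probabilities. This is an illustrative sketch only. The kernel and the dense-layer weights are hand-picked rather than learned, the two class names are hypothetical, and a real CNN would use a framework such as TensorFlow or PyTorch.

```python
import math


def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the
    elementwise products at each position. Deep learning libraries call
    this convolution, although strictly it is cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def dense(features, weights):
    """Fully connected layer: one weighted sum of all features per class."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# 4x4 image with a dark left half and a bright right half.
image = [[0, 0, 1, 1] for _ in range(4)]
# Hand-picked kernel (an assumption, not learned): responds where
# brightness increases from left to right.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, edge_kernel)
flat = [value for row in feature_map for value in row]

# Hypothetical weights for two classes: "edge" and "no edge".
weights = [[0.5] * len(flat), [-0.5] * len(flat)]
probabilities = softmax(dense(flat, weights))
print(feature_map)
print(probabilities)
```

In a trained network the kernel values and dense-layer weights are learned by back-propagation rather than set by hand, and many kernels run in parallel in each layer.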
CNNs perform well in many domains, but as the networks grow deeper, says Horev, they increasingly suffer from the problem of vanishing gradients. This is the failure to train early layers: the error signal used by the back-propagation algorithm to learn the weights in the network shrinks as it is passed backward, so little of it reaches the early layers. A simple but highly effective way to reduce the impact of vanishing gradients is to pass each layer's original input, along with the values produced by its convolutions, on through the network. This limits the loss of information that hinders other deep learning models.
This approach is used in deep Residual Networks, also known as ResNet. ResNet and its variations achieve some of the best available performance on challenging and socially important problems, ranging from historical document analysis to power generation.
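A toy calculation shows why passing the input forward helps. Assume, purely for illustration, that each layer is a scalar function y = w * x with a small weight w. By the chain rule, the gradient reaching the first layer is the product of all the weights, which collapses toward zero as depth grows. If each layer instead computes y = x + w * x, the skip connection contributes a 1 to every factor and the product no longer vanishes. The scalar layers and the function names below are simplifying assumptions, not the actual ResNet architecture.

```python
def plain_gradient(weights):
    """d(output)/d(input) for a stack of layers y = w * x."""
    grad = 1.0
    for w in weights:
        grad *= w  # each layer scales the gradient by w
    return grad


def residual_gradient(weights):
    """Same stack, but each layer is y = x + w * x (skip connection)."""
    grad = 1.0
    for w in weights:
        grad *= 1.0 + w  # the skip path adds 1 to every factor
    return grad


# 20 layers, each with a small weight (hypothetical values).
weights = [0.1] * 20

plain = plain_gradient(weights)
residual = residual_gradient(weights)
print(plain)     # vanishingly small
print(residual)  # stays well away from zero
```

Real residual blocks apply the same idea to whole feature maps: the block's output is its input plus the result of its convolutions.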
Creating a Rosetta Stone for Japanese Scripts
Horev describes three Japanese scripts: Kanji, Hiragana, and Kuzushiji. Kanji is a logographic system in which a symbol represents a word or phrase. There are over 10,000 Kanji symbols. Hiragana is a syllabary, roughly similar to a phonetic alphabet, with 49 symbols. The third script, Kuzushiji, is a cursive script with dozens of variations. Few people can read this historically important script, making it difficult to transliterate the 3 million Kuzushiji documents and books that still exist.
Three data sets of Kuzushiji characters are available for training classifiers:
- Kuzushiji-MNIST, which is a set of 10 Hiragana characters with 70,000 examples
- Kuzushiji-49, which includes all 49 letters of Hiragana
- Kuzushiji-Kanji, which includes 3,832 characters across 140,426 examples
Machine learning researchers have applied a variety of algorithms to classifying Kuzushiji characters, including KNN, simple CNNs, and several variants of ResNet. The best performer was an ensemble of a ResNet model and a VGG model, another CNN architecture.
An ensemble combines the results of two or more models to produce its output. Since ResNet and VGG analyze images differently, the combined model takes advantage of the strengths of each approach while compensating for each approach's weaknesses.
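One common way to build such an ensemble is to average the class probabilities produced by each model and pick the class with the highest average. The article does not say how the ResNet and VGG outputs were combined, so the averaging below is an assumption, and the class names and probability values are made up for illustration.

```python
def ensemble_predict(model_probabilities, class_names):
    """Average per-class probabilities across models and return the
    winning class name together with the averaged probabilities."""
    n_models = len(model_probabilities)
    averaged = [
        sum(probs[i] for probs in model_probabilities) / n_models
        for i in range(len(class_names))
    ]
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return class_names[best], averaged


# Hypothetical outputs of two models for one character image.
class_names = ["a", "ki", "su"]
resnet_probs = [0.50, 0.45, 0.05]  # narrowly prefers "a"
vgg_probs = [0.20, 0.70, 0.10]     # strongly prefers "ki"

label, averaged = ensemble_predict([resnet_probs, vgg_probs], class_names)
print(label, averaged)
```

Here the two models disagree, and the average favors the class that one model is confident about and the other considers a close second.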
Deep Learning Tackles Modern Challenges
One example of how deep learning is helping to solve a real-world problem is DeepMind’s neural network for predicting weather around a wind farm. Horev noted that researchers trained the network using weather and turbine data to create a model that predicts wind power output 36 hours in the future. These predictions are used to optimize the wind farm’s resources when meeting power generation commitments.
While it is important to advance the state of the art in deep learning, it is just as important to apply the tools and techniques to solve real-world problems, said Horev. Existing tools, such as TensorFlow and PyTorch, simplify the development of deep learning models and can help solve real-world problems like the ones Horev is working on.
To learn more about practical applications of deep learning, start by viewing Horev’s Image Classification with Deep Learning: From Minst to Kuzushiji through ResNet.