Optical Character Recognition Using One-Shot Learning, RNN, and TensorFlow

Sophie Turol

Optical character recognition (OCR) converts typed, handwritten, or printed symbols into machine-encoded text. However, OCR output contains errors that need to be eliminated, and only the valuable data has to be extracted from an ever-growing amount of text.

At the recent TensorFlow meetup, attendees learned how a one-shot attention mechanism for token extraction, implemented in Keras with TensorFlow as a back end, can help. In addition, the meetup covered how to enable multilingual neural machine translation with TensorFlow.

 

Generating expense reports with machine learning

Mike Stark, a data scientist at Concur, shared his experience of enabling an application to automatically generate expense reports from photos of receipts. The solution relies on optical character recognition to convert images into text and then employs machine learning techniques to extract the important information from the OCR text. In contrast to regular expression matching, machine learning can learn a large number of features automatically and can be retrained continuously as the number of receipts grows.

The valuable receipt data to be extracted includes:

  • transaction amount
  • transaction date
  • currency
  • vendor
  • location

At the classification stage, the entire text of a receipt is split into words, which become features for a classifier. Then, candidate strings are extracted from the receipt by pattern matching. The text surrounding each candidate string provides the features for a regression algorithm that predicts the likelihood of that string being the result.
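The candidate extraction step can be sketched roughly as follows. This is only an illustration, not the code used at Concur: the `AMOUNT_PATTERN` regular expression, the `extract_candidates` helper, and the two-word context window are all assumptions made for the example.

```python
import re

# Hypothetical pattern for amount-like strings such as "3.85" or "1,234.00".
AMOUNT_PATTERN = re.compile(r"\d+(?:,\d{3})*\.\d{2}")


def extract_candidates(ocr_text, window=2):
    """Return (candidate, context_words) pairs for each amount-like word."""
    words = ocr_text.split()
    candidates = []
    for i, word in enumerate(words):
        match = AMOUNT_PATTERN.search(word)
        if match is None:
            continue
        # The words around a candidate are what a scoring model would
        # use as features to decide whether this is the value we want.
        left = words[max(0, i - window):i]
        right = words[i + 1:i + 1 + window]
        candidates.append((match.group(), left + right))
    return candidates


receipt = "COFFEE 3.50 TAX 0.35 TOTAL 3.85 VISA"
for candidate, context in extract_candidates(receipt):
    print(candidate, context)
```

In this toy receipt, all three amounts are candidates, and only the surrounding words (such as "TOTAL") distinguish the transaction amount from the rest, which is why the context is passed on to the scoring model.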

 

Training with the one-shot attention mechanism

Here, recurrent neural networks come into play. The text is fed into the neural network character by character, and the network is then triggered to generate either a classification or a sequence of characters.
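To feed text into a network character by character, each character first has to become a vector. The talk summary does not say which encoding was used, so the sketch below assumes plain one-hot encoding; `build_vocab` and `one_hot_encode` are hypothetical helper names.

```python
def build_vocab(texts):
    """Map each distinct character to an integer index (sorted for determinism)."""
    chars = sorted(set("".join(texts)))
    return {ch: i for i, ch in enumerate(chars)}


def one_hot_encode(text, vocab):
    """Turn a string into a list of one-hot vectors, one per character."""
    encoded = []
    for ch in text:
        vector = [0.0] * len(vocab)
        vector[vocab[ch]] = 1.0
        encoded.append(vector)
    return encoded


vocab = build_vocab(["TOTAL 3.85"])
sequence = one_hot_encode("TOTAL", vocab)
print(len(sequence), len(sequence[0]))
```

Under this assumption, the length of each vector is the size of the character vocabulary, and the length of the list is the number of time steps the recurrent network reads.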

Since not all the information in a receipt is valuable (e.g., expense type or a phone number), one needs to enable token extraction. For that purpose, Mike applied the one-shot attention mechanism, which is easy to train and is straightforward to code in Keras running on top of TensorFlow:

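from keras import backend as K
from keras import initializers, regularizers
from keras.layers import Layer, InputSpec


class Concurrence(Layer):
    # The constructor was not shown in the talk; this one and its defaults
    # are assumptions (Keras 2 API) so that self.init and self.W_regularizer exist.
    def __init__(self, init='glorot_uniform', W_regularizer=None, **kwargs):
        self.init = initializers.get(init)
        self.W_regularizer = regularizers.get(W_regularizer)
        super(Concurrence, self).__init__(**kwargs)

    def build(self, input_shape):
        self.input_spec = [InputSpec(shape=input_shape)]
        self.input_dim = input_shape[2]
        # One weight per feature: each time step is scored by a single dot product.
        self.W = self.add_weight(shape=(self.input_dim, 1),
                                 initializer=self.init,
                                 name='{}_W'.format(self.name),
                                 regularizer=self.W_regularizer)
        super(Concurrence, self).build(input_shape)

    def call(self, x, mask=None):
        # Softmax over the time steps yields attention weights of shape (batch, time).
        attention = K.softmax(K.squeeze(K.dot(x, self.W), 2))
        # Attention-weighted sum of the time steps, shape (batch, features).
        return K.batch_dot(x, attention, (1, 1))

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[2])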
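To see what the layer computes, here is the same arithmetic for a single sequence in plain Python, without Keras. It is a sketch for illustration only; the function name `one_shot_attention` and the toy numbers are made up for this example.

```python
import math


def one_shot_attention(sequence, weights):
    """sequence: list of T feature vectors; weights: one weight per feature."""
    # Score each time step with a dot product against the weight vector.
    scores = [sum(f * w for f, w in zip(step, weights)) for step in sequence]
    # Softmax over the time steps (shifted by the max for numerical stability).
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    # The attention-weighted sum of the time steps gives one summary vector.
    dim = len(sequence[0])
    summary = [sum(a * step[d] for a, step in zip(attention, sequence))
               for d in range(dim)]
    return attention, summary


steps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attention, summary = one_shot_attention(steps, weights=[2.0, 0.0])
print(attention, summary)
```

The attention weights sum to one, and time steps with a higher score contribute more to the summary vector. The whole input sequence is reduced to a single vector in one pass, whatever its length.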

Here is also sample code for a model built with Keras:

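from keras.models import Sequential
from keras.layers import Bidirectional, Dense, GRU, RepeatVector, TimeDistributed

# hidden_size, input_size, max_out_seq_len, and output_size must be defined by the caller.
model = Sequential()
# Encoder: reads the receipt text character by character in both directions.
model.add(Bidirectional(GRU(hidden_size, return_sequences=True), merge_mode='concat',
                        input_shape=(None, input_size)))
# Attention collapses the encoded sequence into a single vector.
model.add(Concurrence())
# Decoder: repeats that vector once per output position and emits characters.
model.add(RepeatVector(max_out_seq_len + 1))
model.add(GRU(hidden_size * 2, return_sequences=True))
model.add(TimeDistributed(Dense(output_size, activation="softmax")))
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")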
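The model outputs one softmax distribution over characters per output position. One possible way to turn that into a string is greedy decoding, sketched below. Note the assumptions: the talk summary does not describe decoding, and treating the extra `+ 1` position as room for an end-of-sequence marker is a guess. `END`, `greedy_decode`, and `index_to_char` are hypothetical names.

```python
END = "\n"  # hypothetical end-of-sequence marker, not confirmed by the source


def greedy_decode(probabilities, index_to_char, end_marker=END):
    """Pick the most likely character at each position until the end marker."""
    chars = []
    for step in probabilities:
        best = max(range(len(step)), key=lambda i: step[i])
        ch = index_to_char[best]
        if ch == end_marker:
            break
        chars.append(ch)
    return "".join(chars)


index_to_char = ["3", ".", "8", "5", END]
# Toy per-position distributions standing in for the model's output.
probabilities = [
    [0.90, 0.02, 0.03, 0.03, 0.02],
    [0.05, 0.80, 0.05, 0.05, 0.05],
    [0.10, 0.10, 0.60, 0.10, 0.10],
    [0.10, 0.10, 0.10, 0.60, 0.10],
    [0.05, 0.05, 0.05, 0.05, 0.80],
]
print(greedy_decode(probabilities, index_to_char))
```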

About the speaker

Mike Stark was an academic astronomer for many years, concentrating on black holes and neutron stars observed via satellites. In 2015, he followed his growing interest in machine learning out of academia and into Concur. At Concur, as part of a data science group, Mike is working on machine learning solutions to various problems created by and/or addressable with large volumes of data. He is particularly interested in the surprising power of recurrent neural networks. You can also check out Mike’s GitHub profile.

