Python-driven OCR (text detection) with Amazon Rekognition

As I’m wont to do on a Sunday evening, I had a chance to play around with a few Amazon services. It started with a general question/challenge: “Can I do OCR on a non-text PDF with Amazon Rekognition using the detect_text function?”

The short answer is yes. The long answer is sort of… Rekognition only returns up to 50 WORD or LINE text elements per image. So at this point in time, if you need a cloud-based OCR service that can handle dense text (like you would have in most PDF documents), Google Vision is probably a better choice.
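If you want to verify that cap yourself once the tooling below is installed, here’s a minimal sketch (the region and image file name are placeholders – substitute your own) that tallies how many WORD and LINE elements come back for a single page image:

import boto3
from collections import Counter

# Placeholder region and file name -- substitute your own
rekog = boto3.client('rekognition', 'us-east-1')

with open('image_1.png', 'rb') as f:
    response = rekog.detect_text(Image={'Bytes': f.read()})

# Each TextDetection is either a LINE or a WORD; the combined total is capped
counts = Counter(d['Type'] for d in response['TextDetections'])
print(counts)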

For this exercise you will need:

  • An AWS developer account
  • A spare EC2 instance
  • An IAM role assigned to your EC2 instance allowing S3 and Rekognition services
  • An S3 bucket we can write to
  • Python 3.x
  • ImageMagick (OSS suite for image manipulation/conversion)
  • Wand (ImageMagick wrapper for Python)
  • Boto3 (AWS SDK for Python)

For this experiment I used the standard Amazon Linux AMI (CentOS-based):

Start by updating yum:

sudo yum update -y

Next install Python 3.x (we’ll use Python 3.6 here):

sudo yum install python36 -y

Next install ImageMagick:

sudo yum install ImageMagick-devel -y

Then install wand and boto3:

sudo pip-3.6 install wand boto3
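A quick sanity check that both libraries installed cleanly (a minimal sketch – run it with python3.6):

# Verify the Python bindings import and report their versions
import boto3
import wand.version

print("boto3:", boto3.__version__)
print("ImageMagick:", wand.version.MAGICK_VERSION)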

IAM Policy

Now let’s make sure the IAM policy is set up correctly and attached to your EC2 instance:

In EC2, select your running instance and choose Actions->Instance Settings->Attach/Replace IAM Role

Since you probably don’t have a specific role just for S3 and Rekognition, go ahead and choose ‘Create New IAM Role’. Then follow these simple steps:

  • Click ‘Create Role’
  • Choose ‘EC2’ as the trusted service, select the ‘EC2’ use case, and click the ‘Next: Permissions’ button
  • Search for ‘AmazonS3FullAccess’ and ‘AmazonRekognitionFullAccess’ and select their checkboxes, then click the ‘Next: Review’ button
  • Finally, give your IAM role a name and click ‘Create Role’

Now back at the ‘Attach/Replace IAM Role’ screen, hit the refresh button and choose the newly created role in the dropdown. Your EC2 instance can now make trusted calls to S3/Rekognition!
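If you want to confirm the role is actually being picked up, a quick check from the instance (a sketch – no credentials are configured locally, so boto3 falls back to the instance profile):

import boto3

# The returned ARN should be an assumed-role ARN containing your new role's name
print(boto3.client('sts').get_caller_identity()['Arn'])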

And here’s the quick and dirty code (this was a 20-30 minute hack for proof of concept; yes, we would be catching errors, etc. if we were going to stand this up or roll it into an application):

import boto3
import json
from wand.image import Image
from wand.color import Color

fname = "./sample.pdf"

# Split the PDF into an individual image for each page
page_images = []
with Image(filename=fname, resolution=300) as pdf:
    for i, page in enumerate(pdf.sequence):
        with Image(page) as img:
            img_name = "image_{}.png".format(i+1)
            img.format = 'png'
            img.alpha_channel = False # Set to false to keep white background
            img.save(filename=img_name)
            page_images.append(img_name)


# Connect to S3 and upload images to OCR
s3 = boto3.resource('s3')
bucket = '<S3_BUCKET_NAME>'
for image in page_images:
    print("Uploading {} to S3...".format('./' + image))
    s3.meta.client.upload_file('./' + image, bucket, image)

# Call Rekognition to detect_text
rekog = boto3.client('rekognition', '<EC2_REGION_NAME>')
img_responses = {}
for image in page_images:
    response = rekog.detect_text(
        Image={
            "S3Object": {
                "Bucket": bucket,
                "Name": image,
            }
        }
    )
    # Write JSON response from Rekognition text detection out to file
    with open(image + ".json", 'w') as outfile:
        json.dump(response, outfile)
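To see what Rekognition actually found, a small follow-up (a sketch that just reuses the page_images list and the JSON files written above) pulls the LINE detections back out of each response:

import json

# Print the detected LINE text for each page, in page order
for image in page_images:
    with open(image + ".json") as infile:
        response = json.load(infile)
    lines = [d['DetectedText'] for d in response['TextDetections'] if d['Type'] == 'LINE']
    print("--- {} ---".format(image))
    print("\n".join(lines))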