Reading image text using Elixir
This post shows how to read text from an image using the Elixir programming language.
To do this we will use Optical Character Recognition (OCR), a technology that finds printed or handwritten characters in an image.
The steps we will take to complete this project are:
- Install the system package for tesseract (an OCR engine).
- Include the tesseract-ocr-elixir library in your Elixir dependencies.
- Test the library functionality using IEx.
Important note before installation:
If you don't install the tesseract engine before trying to use the Elixir library tesseract-ocr-elixir, you will likely be met with this error:
iex(1)> TesseractOcr.read(".lib/testocr.png")
** (ErlangError) Erlang error: :enoent
(elixir 1.12.0) lib/system.ex:1041: System.cmd("tesseract", [".lib/testocr.png", "stdout"], [])
(tesseract_ocr 0.1.5) lib/tesseract_ocr.ex:19: TesseractOcr.read/2
This error means that the tesseract executable could not be found, not that the image file is missing (which you might suspect from the :enoent error if you don't read the stack trace).
Once you install tesseract this error should go away. You can see the system request to tesseract in the screenshot below:
replace_image_0_x
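Because :enoent alone does not say which file is missing, it can help to check both candidates before calling the library. The sketch below is a hypothetical helper (the module name OcrPreflight and its return values are my own, not part of tesseract-ocr-elixir). It only uses System.find_executable/1 and File.exists?/1 from the standard library, and assumes the engine is invoked as "tesseract", as the stack trace above suggests.

```elixir
defmodule OcrPreflight do
  @moduledoc """
  Hypothetical helper, not part of tesseract-ocr-elixir.
  Distinguishes a missing OCR executable from a missing image file.
  """

  # Returns :ok when both the executable and the image can be found,
  # otherwise an error tuple naming what is missing.
  # The executable name defaults to "tesseract" (an assumption based on
  # the System.cmd call shown in the stack trace).
  def check(image_path, executable \\ "tesseract") do
    cond do
      # find_executable/1 returns nil when the name is not on the PATH
      System.find_executable(executable) == nil ->
        {:error, {:executable_not_found, executable}}

      not File.exists?(image_path) ->
        {:error, {:image_not_found, image_path}}

      true ->
        :ok
    end
  end
end
```

Calling OcrPreflight.check("test/resources/testocr.png") before TesseractOcr.read/1 would then return an error tuple that names the real problem instead of a bare :enoent.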
1. Installing Tesseract
I use Homebrew to install system dependencies on my Mac, so for me installing tesseract was straightforward:
brew install tesseract
The tesseract website has more options for installation.
2. Add tesseract-ocr-elixir lib to deps
In your Mix application, add the tesseract-ocr-elixir library to deps. It is an Elixir wrapper around the tesseract OCR engine.
def deps do
  [
    {:tesseract_ocr, "~> 0.1.5"}
  ]
end
Run mix deps.get to install the package.
3. Test the library functionality
To do a quick test of the library you will need an image with text, so grab your favorite meme and then load your application using iex -S mix.
Now you can test the library by calling the read function, which returns any text the OCR engine finds:
iex> TesseractOcr.read("test/resources/testocr.png")
"OH YOU FOUND SOME INTERNET MEMES YOU MUST BE FUNNY"
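Since read returns a plain string, you can post-process it with ordinary String functions. The following is a minimal sketch under the assumption that the recognized text may contain line breaks or extra spaces when the image has several lines of text. The OcrText module and its function names are hypothetical and not part of the library.

```elixir
defmodule OcrText do
  # Hypothetical post-processing helpers; not part of tesseract-ocr-elixir.

  # Collapses any run of whitespace (including line breaks) into a
  # single space and drops leading and trailing whitespace.
  def normalize(text) when is_binary(text) do
    text
    |> String.split()
    |> Enum.join(" ")
  end

  # Splits the recognized text into a list of words.
  def words(text) when is_binary(text), do: String.split(text)
end

# Made-up sample input that stands in for raw OCR output:
raw = "OH YOU FOUND\nSOME  INTERNET MEMES\n"

OcrText.normalize(raw)
# => "OH YOU FOUND SOME INTERNET MEMES"

OcrText.words(raw)
# => ["OH", "YOU", "FOUND", "SOME", "INTERNET", "MEMES"]
```

In a real session you would pipe the result of TesseractOcr.read/1 into these functions instead of the sample string.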
That was the last step! You can also use the library to read PDFs and do other fun things.