Applied computer vision: reading water consumption meters

Rodrigo Salles
BrData
6 min read · Jan 26, 2021

A small computer vision application project

Tasks that can be automated should be automated. But ordinary tasks that human beings perform with ease can prove complex for machines. Reading digits in images is one of them.

The task of reading digits in an image is easily performed by humans, but when we try to implement this ability in machines, difficulties arise: lighting conditions, image angle, noise, resolution, and so on. There are many variables that can affect the performance of the system.

This small project analyzes the application of computer vision techniques to the reading of water consumption meters.

The code used in this project was developed in Python and can be found here. Feel free to contact me if you have any questions or suggestions.

Problem definition

Nowadays, many companies accept electricity and water consumption readings sent as photos of the meters. As a result, a company can quickly accumulate thousands of images that need to be processed.

This small project aims to analyze techniques that would make it possible to automate this process.

A little research shows that several solutions are available. Some are closed source, and others only perform well under controlled conditions. Since these techniques have many possible applications, I found it interesting to look at the problem more closely, and thus learn a little more.

Briefly, the objective of the project is to read the digits in the meter images:

Photo by Roman Kucev (https://ieee-dataport.org/open-access/water-meter-dataset)

But how to do that? Let’s look at some possibilities.

Divide and conquer!

How to read digits in images? I will divide the problem into three parts.

Part 1: in the first stage, the objective is to find a way to identify and crop the region of interest, which contains the digits.

Part 2: the goal is then to segment the digits contained in the image cropped in the first step.

Part 3: read the digits resulting from step 2.

Let’s do it. Part 1

Photo by Gaurav Babbar

I found several possibilities for segmenting the region that contains the digits. Many are used to read license plates and are based on morphological operations such as erosion and dilation. The results are good, but I wanted to try something more robust, so I decided to use a convolutional neural network to accomplish the task.

To achieve this goal, the YOLOv3 system (You Only Look Once, version 3) was adopted. The algorithm makes use of a convolutional neural network with special characteristics: it is a fully convolutional network (FCN), with no fully connected layers. Called Darknet-53, it contains 53 convolutional layers, each followed by a batch normalization layer and a Leaky ReLU activation.

Mini-YOLOv3 - DOI: 10.1109/ACCESS.2019.2941547

I found a very interesting tutorial on YOLO (it can be seen here), and a dataset with more than 1,000 images of water meters (which can be found here).
To train the network, we need to create the image labels. Hard work. I labeled 500 images with the aid of the LabelImg software.
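For reference, LabelImg in YOLO mode saves one `.txt` file per image, with one line per bounding box in the format `class_id x_center y_center width height`, where all coordinates are normalized to the image dimensions. An illustrative label for a single "digit region" class (the numeric values below are made up) would look like:

```
0 0.512 0.430 0.310 0.085
```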

LabelImg in action(photo by Roman Kucev)

As my computer is already a little tired, I decided to train the model on Google Colab. With the help of the GPU offered by the platform, training took 5 hours, and the result was very satisfactory.

With the region of interest located, it is easy to crop out the region containing the numbers from the coordinates obtained.
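As a sketch of this step: YOLO reports each box as a normalized center point plus width and height, so turning a detection into pixel indices for a NumPy crop is simple arithmetic (the function name here is mine, not from the project code):

```python
def yolo_box_to_crop(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box (center x, center y, width, height)
    into clamped pixel coordinates (x1, y1, x2, y2)."""
    x1 = max(0, int((cx - w / 2) * img_w))
    y1 = max(0, int((cy - h / 2) * img_h))
    x2 = min(img_w, int((cx + w / 2) * img_w))
    y2 = min(img_h, int((cy + h / 2) * img_h))
    return x1, y1, x2, y2
```

The crop itself is then just `roi = image[y1:y2, x1:x2]` on the original array.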

Successfully targeted region

Part 2: digit segmentation of the cropped region

Cropped region

To segment the digits of the region obtained in part 1, traditional techniques were used. The steps were as follows:
• First, the image was converted to grayscale:

• Next, the best threshold type for the problem was chosen (the best result was obtained with TOZERO: if a pixel value is less than the adopted threshold, it is set to zero; the other values are kept):

• With OpenCV's findContours function, the contours were located. The contours found were ordered and filtered according to their area, and the regions whose areas were most likely to contain digits were selected.

The result obtained can be seen below:

As you can see, the number 6 was not identified.

Part 3: digit reading

MNIST dataset

The task of this last phase is to read the digits segmented in step 2. For this task, I will adopt another convolutional neural network, trained with the help of the well-known MNIST dataset.

MNIST stands for the Modified National Institute of Standards and Technology dataset. It is a dataset of small square 28 × 28 pixel grayscale images of handwritten digits between 0 and 9, with 60,000 training images and 10,000 test images.

The result obtained by the algorithm can be seen in the figure below. It can be seen that the network performed well, and that the error in identifying the digit 6 comes from phase two.

Could it be better?

One of the most complex parts of computer vision is segmentation. Many factors can affect the performance of the system, and there are many parameters to adjust. So how could the system be improved? The first and third phases showed good results; the error appeared in phase two. The improvement, which I intend to implement soon, is to replace phases two and three with another YOLO network that identifies the digits directly in the image obtained in the first phase. That should make the system more robust.

In the first phase, the YOLO network was trained to identify only one kind of object in the image. In this new format I will have to train the network to identify the digits from 0 to 9, that is, 10 possible classes of objects in an image. For the first phase I labeled 500 images; now I would have to label 5,000. It is a lot of work, but I think the result will be interesting.

Anyway, it was a small project that allowed me to learn a little more about computer vision. If you have any questions or suggestions, feel free to write to me. Bye.



Mechatronic Engineer — Physicist — Data Scientist. Always learning.