This stage's purpose is to detect each word in the input image. Its only input is the Source Image, and it outputs a SegmentLayout. The SegmentLayout is an ordered list of the lines detected in the Source Image. Each line is an ordered list of the words detected in it, each represented as an ImageSegment.
The ImageSegment instances produced by this stage will contain only the two co-ordinates that define the box around the image segment they represent. These co-ordinates may be adjusted in the User-verification stage before being used to splice the image in the Image Splicing stage.
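As a rough illustration of the output described above, the structures might be sketched in Python as follows. The field names top_left and bottom_right, the SegmentLine alias, and the choice of opposite corners as the two co-ordinates are assumptions made for this sketch; the design only states that an ImageSegment holds two co-ordinates and that a SegmentLayout is an ordered list of lines.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel co-ordinate; assumed representation


@dataclass
class ImageSegment:
    # Assumption: the two co-ordinates are opposite corners of the box.
    top_left: Point
    bottom_right: Point


# A line is an ordered list of word segments (hypothetical alias name).
SegmentLine = List[ImageSegment]
# The layout is an ordered list of lines, top to bottom.
SegmentLayout = List[SegmentLine]

# Example: two words on the first line, one word on the second.
example_layout: SegmentLayout = [
    [ImageSegment((0, 0), (40, 12)), ImageSegment((48, 0), (90, 12))],
    [ImageSegment((0, 20), (35, 32))],
]
```

Keeping the segment this small means the User-verification stage only has to move two points to adjust a box.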
We will use a custom algorithm (which uses the black-and-white version of the image):
- To segment into lines:
  - Scan rows of pixels until you find a row with some pixels on.
  - Keep scanning until you find an empty row.
  - Everything between those two rows is one line.
- To segment those lines into words:
  - Within a line, scan vertical columns of pixels until you find a column with some pixels on.
  - Keep scanning until you find an empty column.
  - Scan for another column with pixels in it.
  - If the distance between those two columns is above some threshold, interpret the gap as a space between words; otherwise treat it as part of the same word.
Try making the threshold proportional to the typical line height, to adjust for writing size.
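The steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the image is taken to be a list of rows of 0/1 values (1 meaning the pixel is on), boxes are plain ((x0, y0), (x1, y1)) tuples with the second corner exclusive, "typical line height" is taken to be the median height of the detected lines, and the function names and the gap_ratio default of 0.5 are invented for the example.

```python
from statistics import median
from typing import List, Tuple

Image = List[List[int]]          # assumed: rows of pixels, 1 = on, 0 = off
Box = Tuple[Tuple[int, int], Tuple[int, int]]


def find_runs(flags: List[bool]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges of consecutive True values."""
    runs, start = [], None
    for i, on in enumerate(flags):
        if on and start is None:
            start = i                      # first non-empty row/column
        elif not on and start is not None:
            runs.append((start, i))        # reached an empty row/column
            start = None
    if start is not None:                  # run touches the image edge
        runs.append((start, len(flags)))
    return runs


def segment_lines(image: Image) -> List[Tuple[int, int]]:
    """Find (top, bottom) row ranges that contain at least one on pixel."""
    return find_runs([any(row) for row in image])


def segment_words(image: Image, top: int, bottom: int,
                  threshold: float) -> List[Tuple[int, int]]:
    """Find (left, right) column ranges of words within one line."""
    width = len(image[0])
    col_on = [any(image[y][x] for y in range(top, bottom))
              for x in range(width)]
    words: List[Tuple[int, int]] = []
    for start, end in find_runs(col_on):
        if words and start - words[-1][1] <= threshold:
            # Gap is not above the threshold: still the same word.
            words[-1] = (words[-1][0], end)
        else:
            # Gap above the threshold (or first run): a new word starts.
            words.append((start, end))
    return words


def segment(image: Image, gap_ratio: float = 0.5) -> List[List[Box]]:
    """Return lines of word boxes; gap_ratio is a hypothetical tuning knob."""
    lines = segment_lines(image)
    if not lines:
        return []
    # Assumption: "typical" line height is the median detected height.
    threshold = gap_ratio * median(b - t for t, b in lines)
    return [[((x0, top), (x1, bottom))
             for x0, x1 in segment_words(image, top, bottom, threshold)]
            for top, bottom in lines]
```

Merging column runs whose gap does not exceed the threshold keeps the letters of one word together, while larger gaps start a new box. The right value for gap_ratio would need tuning against real handwriting samples.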
David Goodwin
2008-10-21