Forum Moderators: phranque


Check if user-uploaded image contains text

csdude55

5:48 am on Jul 9, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a Perl program written where users can upload images (JPG, PNG, GIF, or BMP) to the server. These images correspond to classified ads.

But is there ANY way to test whether the image contains text?

The reason for this is because I don't allow ads that promote competing sites, but lately I've had a rash of them uploading images that contain the text for the competing site. I don't have a way to filter it automatically, so it ends up showing until I manually remove it.

It's not always even intentional. Sometimes someone will post the ad on a competing site, then for some reason take a screenshot of it and upload it to an ad on my site. Why they think this is easier, I can't even begin to tell you, but it happens at least once a day.

I don't necessarily need for the script to know what the text says, just that it exists. Then I can flag any ad that contains an image with text, and manually approve it later if it's OK.
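A minimal Perl sketch of that flag-for-review flow, with the actual detection step left pluggable so any OCR-based check can be dropped in later. The sub name review_status, the status strings, and the stand-in detector are all hypothetical; a real detector would wrap an OCR call.

```perl
use strict;
use warnings;

# Hypothetical helper: decide an ad's status from its uploaded images.
# $has_text is any code ref that takes an image path and returns true
# if that image appears to contain text (e.g. an OCR-based check).
sub review_status {
    my ($has_text, @images) = @_;
    for my $path (@images) {
        # One image with text is enough to hold the whole ad for review.
        return 'pending_review' if $has_text->($path);
    }
    return 'approved';
}

# Usage with a stand-in detector that only "finds" text in screenshots.
my $fake_detector = sub { $_[0] =~ /screenshot/i };
print review_status($fake_detector, 'car.jpg', 'Screenshot_1.png'), "\n";  # pending_review
print review_status($fake_detector, 'car.jpg'), "\n";                      # approved
```

Keeping the detector as a code ref means the moderation logic doesn't change if the OCR approach is swapped out or moved to a separate server.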

I'm open to rebuilding in any programming language.

TIA!

piatkow

10:30 am on Jul 9, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It must be possible to detect text in an image, as Google does it (rather erratically) with Street View. I would guess that it takes a pretty brutal amount of processing, though.



> It's not always even intentional. Sometimes someone will post the ad on a competing site, then for some reason take a screenshot of it and upload it to an ad on my site. Why they think this is easier, I can't even begin to tell you, but it happens at least once a day.

Never underestimate the efforts people will go to in order to find inconvenient ways of doing something. When I was editing a print magazine, it took me ages to convince some people that emailing an ad as an attachment was easier than printing it, putting it in an envelope, and sending it by snail mail for me to scan, and then emailing me to say that it was in the post!

csdude55

7:46 pm on Jul 9, 2017 (gmt 0)




I'm really not concerned about the processing; I could always get a second small server just for image processing, and fairly cheaply. I've been thinking about that anyway, just to share some of the load.

phranque

8:53 pm on Jul 9, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Have you looked at Tesseract?

Image::OCR::Tesseract [search.cpan.org]
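Another option, if the CPAN wrapper turns out to be awkward, is to call the tesseract command-line program directly and capture what it prints. A rough sketch, assuming a Tesseract build that accepts `stdout` as the output base (versions from 3.03 on should); the sub name ocr_text and the optional $bin argument are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: run the tesseract CLI on one image and return
# whatever text it printed. $bin defaults to 'tesseract' on the PATH but
# can point at a full path to the binary instead.
sub ocr_text {
    my ($path, $bin) = @_;
    $bin //= 'tesseract';
    # List form of a piped open: no shell involved, so odd filenames are safe.
    # Only stdout is read; anything tesseract writes to stderr is not captured.
    open(my $fh, '-|', $bin, $path, 'stdout')
        or die "Cannot run $bin: $!";
    local $/;                 # slurp the whole output at once
    my $text = <$fh>;
    close($fh);               # the program's exit status ends up in $?
    return defined $text ? $text : '';
}
```

With that in place, `ocr_text('./hi.jpg') =~ /\S/` would be true whenever Tesseract printed anything other than whitespace for the image.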

csdude55

10:42 pm on Jul 9, 2017 (gmt 0)




I had not until now. It looks interesting, but the description is very confusing, probably written by an ESL author.

Have you used it yourself? If so, in the sample code:

use Image::OCR::Tesseract 'get_ocr';
my $image = './hi.jpg';
my $text = get_ocr($image);

What does $text contain? The text that was found (in any font, I guess), a boolean if text is found, or something else?

phranque

12:58 am on Jul 10, 2017 (gmt 0)




get_ocr()
...
Warns if no output.
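Assuming get_ocr() hands back the raw text Tesseract read as a string (the excerpt above only says it warns when there is none), a photo with no writing may still yield a few stray characters, so a threshold before flagging could help. A sketch of such a check; the name looks_like_text and the "at least two runs of 3+ letters or digits" rule are hypothetical choices, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical heuristic: treat OCR output as "real" text only if it
# contains at least $min_words runs of 3 or more letters/digits.
# Stray marks picked up from photos tend not to form such runs.
sub looks_like_text {
    my ($ocr, $min_words) = @_;
    $min_words = 2 unless defined $min_words;
    return 0 unless defined $ocr;
    my @words = $ocr =~ /[[:alnum:]]{3,}/g;
    return @words >= $min_words ? 1 : 0;
}

print looks_like_text("Visit example.com for more listings"), "\n";  # 1
print looks_like_text("~ | ,. i\n"), "\n";                           # 0
```

Lowering $min_words to 1 would also catch single-word images such as "SOLD", at the cost of more false positives from noise.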