Searching Google for Images Similar to a Specific Image

You can search Google for pictures similar to a given image, for example to detect plagiarism or to find people who look like you.

Here's how I did a test:

  • I used the Search by Image tool from Google
  • I chose the option Upload an image
  • I used a screenshot tool to capture a picture of myself, free of compromising metadata (see figures 1 and 3), then clicked the picture icon in the Google search box to upload the screenshot.

Google returned pictures of people who look like me (see figures 2 and 4). More precisely, it returned pictures that look like the picture I uploaded, whether or not they show a human being, and whether they show a man or a woman.

Figure 1: first picture of myself

Figure 2: pictures similar to picture in figure 1, according to Google

In this first test, the results returned by Google are based on metadata, despite my efforts to strip the metadata from the picture in figure 1. Unfortunately, as you can see, the results are wrong. Google correctly figured out that figure 1 represents Vincent Granville, and indeed the first picture it returned is the one from figure 1. But the remaining results are images related to the name Vincent Granville rather than pictures similar to the one I uploaded: images associated with my blog posts on Data Science Central, or images of another person who shares the name Vincent Granville (a friend of mine, incidentally).

Then I did a second test with the picture below. The results are a little more promising.

Figure 3: second picture of myself

Figure 4: pictures similar to picture in figure 3, according to Google

My daughter seems to have been totally ignored by Google's algorithm, but at least the image search did not rely on metadata this time. My gender was correctly identified in all but one image.

If you use an original image (not a screenshot), it carries metadata (probably stored in Google's image index database), and Google will tell you which websites display your image, based on that metadata. The screenshot tool alters the metadata, making image forensics (plagiarism detection) far more difficult; pirates could use it to upload copyrighted images and elude infringement detection.

Question:

How would you design a better algorithm? This is a computer vision problem, and the field has been very active for at least 30 years.


Comment by Rohan Kotwani on March 16, 2017 at 5:36pm

I designed two algorithms for my master's in analytics practicum project.

A cropped image would definitely make it more difficult to detect matches, even if the image is otherwise the same. This is especially true if the cropped portion looks nothing like the original. I'll present two solutions to this problem. The first only considers a non-cropped image of any size. The second uses standard computer vision techniques that can find similar images.

First Approach:

The image can be divided into four quadrants: upper left, upper right, lower left, and lower right. I found that quadrant brightness, along with overall brightness, measured to one decimal place, is relatively unique to each photo. With 1 decimal place there are about 2550 possible values for each quadrant. Naively, treating the four quadrant values and the overall value as five independent, uniformly distributed numbers, the probability that two unrelated photos match on all five by chance is (1/2550)^5 ≈ 9.274683493E-18. An image can thus be decomposed into identifiers, which can be used to query the database for matches and to reference an image located on a file system. For example, the actual pictures would be located in a folder corresponding to the scrape number: Image_Dump/Scrape_num/picture.jpg.

Please see my GitHub markdown document for more information.

https://github.com/Freedomtowin/MSA-Image-Database-With-Identifiers
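
As a rough illustration of the first approach (not the code from the repository above), the brightness identifiers might be computed and used as a lookup key along these lines. This is a minimal sketch that assumes a grayscale image given as a list of rows with values from 0 to 255 and at least 2 x 2 pixels; the names brightness_signature, register, and lookup are hypothetical, and a plain dictionary stands in for the database.

```python
from statistics import fmean


def brightness_signature(pixels):
    """Return (overall, upper_left, upper_right, lower_left, lower_right)
    mean brightness, each rounded to 1 decimal place.

    Assumption: `pixels` is a grayscale image as a list of rows (0..255),
    at least 2 x 2 in size.
    """
    height, width = len(pixels), len(pixels[0])
    mid_row, mid_col = height // 2, width // 2

    def mean_of(rows, cols):
        return round(fmean(pixels[r][c] for r in rows for c in cols), 1)

    top, bottom = range(0, mid_row), range(mid_row, height)
    left, right = range(0, mid_col), range(mid_col, width)
    return (
        mean_of(range(height), range(width)),
        mean_of(top, left),
        mean_of(top, right),
        mean_of(bottom, left),
        mean_of(bottom, right),
    )


def register(index, pixels, path):
    """Store the file path under the image's signature (dict as a stand-in database)."""
    index.setdefault(brightness_signature(pixels), []).append(path)


def lookup(index, pixels):
    """Return the paths of all stored images with an identical signature."""
    return index.get(brightness_signature(pixels), [])


# Naive chance of two unrelated photos agreeing on all five values,
# assuming independent, uniformly distributed identifiers.
collision_probability = (1 / 2550) ** 5

# Tiny 4 x 4 example image and a copy scaled up by a factor of 2.
original = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [100, 100, 45, 45],
    [100, 100, 45, 45],
]
enlarged = [[v for v in row for _ in range(2)] for row in original for _ in range(2)]

index = {}
register(index, original, "Image_Dump/Scrape_num/picture.jpg")
matches = lookup(index, enlarged)
```

In this toy case the enlarged copy yields the same five values as the original, so the lookup returns the stored path. Real resizing involves interpolation, so a practical version would probably need to tolerate small differences rather than require exact equality.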

Second Approach:

Satellite images that have overlapping geographic regions can be stitched together using a feature matching technique. This preprocessing technique can also be used to match a person's unique features across different images. A picture of a person can be compared with a set of new pictures for similarity. If the images have enough features in common, the picture can be warped so that it lines up with the new picture where the two are similar. Subtracting one picture from the other then shows whether the images are close enough. Various algorithms and parameters can be used to make this process less prone to false positives or false negatives.

Please see my GitHub markdown document for more information.

https://github.com/Freedomtowin/MSA-Facial-Recongition-With-Sat-Pre...
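
The final subtraction step of the second approach could look roughly like the sketch below. It covers only the comparison, and assumes the two pictures have already been matched and warped to the same size by some feature matching step (not shown here). The function names and the threshold of 10.0 are hypothetical choices for illustration, not values from the project above.

```python
def mean_abs_difference(image_a, image_b):
    """Average absolute pixel difference between two aligned grayscale images.

    Assumption: both images are lists of rows (0..255) with identical dimensions.
    """
    same_shape = len(image_a) == len(image_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    total = sum(
        abs(a - b)
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    )
    count = sum(len(row) for row in image_a)
    return total / count


def looks_similar(image_a, image_b, threshold=10.0):
    """True if the aligned images differ by at most `threshold` on average.

    The threshold is a hypothetical tuning parameter: lowering it reduces
    false positives, raising it reduces false negatives.
    """
    return mean_abs_difference(image_a, image_b) <= threshold


reference = [[10, 20], [30, 40]]
near_copy = [[12, 18], [33, 40]]
unrelated = [[200, 200], [200, 200]]

near_score = mean_abs_difference(reference, near_copy)
far_score = mean_abs_difference(reference, unrelated)
```

Here the near copy differs from the reference by 1.75 brightness levels on average and passes the check, while the unrelated image differs by 175 and does not.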

My name is Rohan Kotwani. I have a Bachelor's in Electrical Engineering and a Master's in Analytics. If you are interested in discussing potential job or project opportunities, please send me an email at [email protected]


© 2018   Data Science Central ®