Preventing posting dupes on Instagram
2024-09-27 ・ side-project, machine-learning, instagram, photography
What's now simple
One of the things I've done continuously for a while is photography. Recently, I did my first paid shoot, at an art gallery opening, for one of the artists. Otherwise, it's simply photos of things from my life, like anyone's. A few times, I've almost posted the same photo a second time, usually when it was from a previous month or year.
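As with others, I was recently reminded of XKCD 1425 (Tasks) [1] by Simon Willison [2]. In that comic, checking whether a photo shows a bird is the task that needs a research team and five years; with today's pretrained models, that kind of image problem has become simple. That means that I can easily solve this problem for myself.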
Models considered
Considering that I may want to do this on my iPhone, I looked at well-known models that could run on it. ResNet-50 and MobileCLIP came up as suitable options. ResNet-50 requires a modification, since its architecture is designed to classify an image into 1 of 1,000 distinct classes. I remove that final classification layer, so that the model outputs embeddings instead of class scores. Here is a common example of the architecture [3].
MobileCLIP was used as is.
Embedding similarity distribution analysis
As someone relatively new to in-depth analysis of ML model performance, I kept my attempt simple. I was also simply curious how similar my photos are to each other. I built a similarity matrix with Torch, using just the dot product of the embeddings as the similarity score. This is simple and possibly a little wrong (the dot product only equals cosine similarity when the embeddings are normalised to unit length), but sufficient.
The code is available in a repository online: jesse-c/instagram-image-already-posted-check.
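For illustration, here is a minimal sketch of a dot-product similarity matrix in Torch. It is not copied from the repository: the function names are hypothetical, and the optional `normalize` flag is an addition that turns the dot product into cosine similarity.

```python
import torch


def similarity_matrix(embeddings, normalize=False):
    """Pairwise dot products between the rows of an (N, D) embedding tensor."""
    if normalize:
        # With unit-length rows, the dot product equals cosine similarity.
        embeddings = torch.nn.functional.normalize(embeddings, dim=1)
    return embeddings @ embeddings.T


def pairwise_scores(sim):
    """Scores for each distinct pair (i < j), i.e. the values to histogram."""
    n = sim.shape[0]
    rows, cols = torch.triu_indices(n, n, offset=1)
    return sim[rows, cols]


def most_similar_pair(sim):
    """Indices (i, j) of the most similar pair of different images."""
    n = sim.shape[0]
    rows, cols = torch.triu_indices(n, n, offset=1)
    best = torch.argmax(sim[rows, cols])
    return rows[best].item(), cols[best].item()
```

Only the upper triangle is used because the matrix is symmetric and the diagonal compares each image with itself.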
Distribution of image similarities for ResNet-50:
Distribution of image similarities for MobileCLIP (S2):
I picked the S2 variant based on general recommendations I read online for similar tasks.
I had posted a dupe
When running the analysis, and looking at the most similar pair, I did discover that I had posted a dupe! It was the same photo, albeit with slightly different editing.
Why ResNet-50 may be preferable
There are higher similarity scores, with the distribution centred around higher similarity values. This suggests it may be more sensitive to subtle differences.
A tighter distribution means that it could be easier to set a clear threshold for deciding whether a candidate photo is a dupe. Ultimately, the final call would still be a manual one, but an automated check should avoid surfacing candidates for manual review unnecessarily.
The symmetry indicates that there isn't bias towards any particular kind of photo.
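One way the threshold idea could look in code is sketched below. Taking the threshold from a high quantile of the existing pairwise scores is an assumption made for illustration, not something the analysis above settled on; `threshold_from_quantile`, `is_possible_dupe`, and the default `q=0.99` are all hypothetical.

```python
import torch


def threshold_from_quantile(pair_scores, q=0.99):
    """Pick a cut-off so only the top (1 - q) share of existing pairs exceed it.

    Assumption: pair_scores is a 1-D float tensor of similarities between
    already-posted photos, and q is a tunable guess rather than a tested value.
    """
    return torch.quantile(pair_scores, q).item()


def is_possible_dupe(candidate_scores, threshold):
    """True if the candidate is at least `threshold` similar to any posted photo."""
    return bool((candidate_scores >= threshold).any())
```

Anything flagged this way would still go to a manual check; the threshold only decides what is worth looking at.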
Using it on a candidate photo
I ran it on a candidate photo and looked at the top k most similar images. Fundamentally, it was helpful! Here are the results:
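The top k lookup for a candidate photo could be sketched as follows, assuming the embeddings of previously posted photos are stacked in one tensor. The name `top_k_similar` and its parameters are hypothetical, and the dot product is used as the score to match the analysis above.

```python
import torch


def top_k_similar(candidate, library, k=5):
    """Return (index, score) pairs for the k library images most like the candidate.

    candidate: (D,) embedding of the new photo.
    library:   (N, D) embeddings of previously posted photos.
    """
    scores = library @ candidate  # dot product against every posted photo
    k = min(k, scores.shape[0])  # guard against libraries smaller than k
    values, indices = torch.topk(scores, k)
    return list(zip(indices.tolist(), values.tolist()))
```

The returned indices can then be mapped back to file names to display the matching photos next to the candidate.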
Next steps
Often if I'm editing photos, I'm not at my computer. Having this available as a private online service for myself would be helpful. Even better would be an offline iOS app! This could possibly be used by others.