This blog post is the third in the series about CommonsDB. If you don’t know about the project at all, I recommend checking out the first one; it also has a nice video introducing the project.
On a rainy weekend in March, about 80 people gathered in Arnhem, the Netherlands, for the Wikimedia Hackathon Northwestern Europe 2026. I came with the hope of getting some input on how CommonsDB could best be put to use in our projects, sharing our progress, and, of course, doing a bit of hacking to improve our prototype.
Already early on the first day, I showed an initial prototype, built by Sebastian Berlin, developer at Wikimedia Sverige, that helps you check whether an image you are about to upload is already declared in CommonsDB, and I saw people understanding the potential. Since CommonsDB is a registry for public domain and openly licensed works, it can assure uploaders that they are doing the right thing. However, in its state at the start of the hackathon, the prototype required some clicks and wasn’t making use of the CommonsDB search API. My mission was to make progress in that area.
Technical tinkering
The starting point was a user script that submits the image to a service that generates a code, the International Standard Content Code, or ISCC for short. In the interface, it also gave you a link to the CommonsDB Explorer with the generated ISCC, so that you could click through and see if the image was already declared. For example, it could be a link like this one: https://registry.commonsdb.org/explorer/KEC6SZK633OFYX5YSBRWGPNMK3M7AAWE6AGBUYQPMSLOMFE22TJJ5AA. That lets you look in the registry, see if there are any matches, and find the link to the source if you want to explore more.
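As a minimal sketch of how such a link can be put together, the function below appends an ISCC to the Explorer address seen in the example link. The function name is made up for illustration, and stripping an "ISCC:" prefix is an assumption based on the example link showing the bare code; this is not the user script's actual code.

```python
from urllib.parse import quote

# Base address taken from the example link in the post.
EXPLORER_BASE = "https://registry.commonsdb.org/explorer/"


def explorer_url(iscc: str) -> str:
    """Build a CommonsDB Explorer link for an ISCC (hypothetical helper)."""
    code = iscc.strip()
    # Assumption: the Explorer expects the bare code, as in the example link,
    # so a leading "ISCC:" scheme prefix is dropped if present.
    if code.upper().startswith("ISCC:"):
        code = code[len("ISCC:"):]
    if not code:
        raise ValueError("empty ISCC code")
    # Percent-encode the code so it is safe to use as a URL path segment.
    return EXPLORER_BASE + quote(code, safe="")
```

Calling explorer_url with the code from the example above should give back the same link as in the post.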
So far, so good. But it requires some manual work, which we can reduce thanks to the CommonsDB search API. My improvement during the hackathon was to use this API to get the answers in JSON format instead. For original images and images not yet declared in CommonsDB, the script just says that there was no match and lets the user continue as usual. If there is a match, the script looks at the answer to see who the source of the image is in CommonsDB. If the source is not Wikimedia, it shows which license the image has and who made that claim, with a direct link to the original source. This helps the user in the next step, when selecting the license, and lets them investigate further if something about the image looks odd.
If the results include a match with Wikimedia as the source, the user is told that there is already a similar image on Wikimedia Commons, and is shown a thumbnail and a link. The user can then decide whether they still want to upload a new version. Since the perceptual search also finds images that are similar but not identical, it would, for example, likely show a match for an image that has undergone digital restoration or similar work, and such versions are appropriate new uploads.
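The branching described above can be sketched roughly as follows. This is only an illustration: the shape of the search API's JSON answer is not given in this post, so the keys "source", "license" and "url", as well as all function and class names, are hypothetical stand-ins rather than the real response format.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class UploadHint:
    kind: str  # "no_match", "external_match" or "wikimedia_match"
    message: str
    link: Optional[str] = None


def classify_matches(matches: list[dict[str, Any]]) -> UploadHint:
    """Decide what to show the uploader, given already-parsed search matches.

    Assumption: each match is a dict with the hypothetical keys
    "source", "license" and "url".
    """
    if not matches:
        return UploadHint("no_match", "No match in CommonsDB, continue as usual.")

    # A match declared by Wikimedia hints at a possible duplicate on Commons,
    # so it takes priority over matches from other sources.
    for match in matches:
        source = str(match.get("source") or "")
        if "wikimedia" in source.lower():
            return UploadHint(
                "wikimedia_match",
                "A similar image already exists on Wikimedia Commons.",
                match.get("url"),
            )

    # Otherwise, report the license and who declared it, with a source link.
    first = matches[0]
    source = first.get("source") or "an unknown source"
    license_name = first.get("license") or "an unknown license"
    return UploadHint(
        "external_match",
        f"Declared by {source} under {license_name}.",
        first.get("url"),
    )
```

Checking for a Wikimedia match first is a design choice made for this sketch; how the real script orders multiple matches is one of the corner cases still to be worked out.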
In the movie below, we see two use cases. The first is an upload of an image from Europeana that is in CommonsDB. The user gets a link and, if they didn’t already know that this was the source, can click through and see that it looks correct. In the second, a user starts uploading a heavily downscaled version of an image that was declared by Wikimedia Sverige in CommonsDB. The user then gets information and a thumbnail and can decide what to do. (In this case, hopefully, cancel the upload.)
What could this look like in the future?
There’s still a lot of work to do, both on the visual design and on covering corner cases, such as how to show this when doing multiple uploads or what to do if there are multiple matches in the registry. The basic workflow would be similar, but even smoother for the user. If there is no match, for example, there is not really a reason to show anything to the user.
When there is a match, in addition to showing the license, it would be neat to also suggest which license template to use, making it much easier for the user to get it right.
Some feedback received during the hackathon was to also log when the user chose something other than the suggestion. This could be used to see whether the user made an error or made a judgment call that another rights statement was more appropriate for Wikimedia Commons.
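To make these two ideas a bit more concrete, here is a small sketch of a template suggestion and a check for when the user deviates from it. It assumes that a match carries the license as a Creative Commons URL, which this post does not confirm, and the lookup table covers only three well-known Commons templates. All names are made up for illustration.

```python
from typing import Optional

# Assumption: licenses arrive as Creative Commons URLs. The table is a tiny
# illustrative subset, keyed by the URL without scheme and trailing slash.
TEMPLATE_BY_LICENSE_URL = {
    "creativecommons.org/licenses/by/4.0": "{{Cc-by-4.0}}",
    "creativecommons.org/licenses/by-sa/4.0": "{{Cc-by-sa-4.0}}",
    "creativecommons.org/publicdomain/zero/1.0": "{{Cc-zero}}",
}


def _normalize(license_url: str) -> str:
    """Lowercase the URL and drop scheme, "www." and trailing slashes."""
    url = license_url.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url.rstrip("/")


def suggest_template(license_url: str) -> Optional[str]:
    """Return a Commons license template, or None if there is no mapping."""
    return TEMPLATE_BY_LICENSE_URL.get(_normalize(license_url))


def differs_from_suggestion(suggested: Optional[str], chosen: str) -> bool:
    """True if a suggestion existed and the user picked something else.

    Simplification: template names are compared case-insensitively.
    """
    if suggested is None:
        return False
    return chosen.strip().lower() != suggested.lower()
```

A tool could log the cases where differs_from_suggestion returns True and leave it to a human to judge whether the choice was an error or a deliberate call.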
Currently, this is a user script, and it could perhaps be turned into a gadget. Even better would be to integrate it more deeply, perhaps as an extension, together with a Wikimedia-hosted service for generating ISCC codes. This would significantly reduce the time it takes to get the code.
Finding a new use case
While we have a list of potential use cases on Meta-Wiki, and some other implementation ideas, during the hackathon we also heard about another use case from Wikimedia Ukraine and Wiki Loves Monuments. They would benefit from a tool for cases where users upload many similar images shot in “burst mode”, meaning images that are not exactly identical but often strikingly similar. Experiments with a couple of example images verified that such images could indeed be identified through the perceptual hash that ISCC provides. While we had already envisioned a tool that could compare existing images on Commons, this showed that there are more cases where potential “duplicates” can be found.
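Perceptual hashes are commonly compared by counting how many bits differ between them (the Hamming distance): the fewer differing bits, the more similar the images. The sketch below groups images that way. It is a simplified illustration that treats each hash as a plain integer, does not decode real ISCC codes, and uses an arbitrary threshold; all names are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def group_similar(hashes: dict[str, int], max_distance: int) -> list[list[str]]:
    """Greedily group file names whose hashes are close to a group's first member.

    Assumption: `hashes` maps a file name to a perceptual hash that has
    already been decoded into an integer of fixed bit length.
    """
    groups: list[tuple[int, list[str]]] = []
    for name, value in hashes.items():
        for representative, members in groups:
            if hamming_distance(representative, value) <= max_distance:
                members.append(name)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append((value, [name]))
    return [members for _, members in groups]


# Toy 8-bit hashes: the two burst shots differ in one bit, the third in many.
burst = {"shot1.jpg": 0b11110000, "shot2.jpg": 0b11110001, "other.jpg": 0b00001111}
similar_groups = group_similar(burst, max_distance=2)
```

With these toy values, the two burst shots end up in one group and the unrelated image in another. A real tool would need a threshold tuned on actual images.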
Other related development at the hackathon
CommonsDB also supports a declaration that provides an argument for why an image is in the public domain; they call this a PD rationale. Thanks to the groundwork laid by Paulina and others, CommonsDB has also chosen to use Wikidata items as the identifiers for these statements.
During the hackathon, a group of people made progress on how to model copyright statements in the structured data on Commons. When implemented at scale, this will provide a great way for us to submit those as the PD rationales to CommonsDB.
For now, there are a lot of templates on Wikimedia Commons giving this information for an image. This is great for a human, but it could be more accessible for machines. Currently, we are mapping license templates to Wikidata items about public domain reasons so that we can already use this in our declarations. It might also be useful for adding this information to the structured data on Commons in bulk. You can help by mapping more templates.
What’s happening next in the project?
We are continuing to declare files from Wikimedia Commons to CommonsDB. We just passed the one million mark, and hopefully we will have declared several million by the end of the project. There is still a long way to go to cover all files on Commons, but it is a start large enough to show the value of a registry like this.
We’ll also continue developing the prototype, to show the value for the community more clearly and possibly even have something useful for regular users. Not only will we be doing that work at home, we will also be at the hackathons in Milan and before Wikimania in Paris. Please talk to us if you are there and are curious or have ideas.
In parallel, we’ll also try to make more demos for some of the other ideas that we have. If you come up with any new ideas for how either the CommonsDB registry or access to perceptual hashes to find similar images could help you as a Wikimedian, please let us know!
Sharing knowledge
A hackathon is of course about meeting people and collaborating, not only about staring at your own screen and working on your own projects all the time. I was happy to be able to make myself useful to other participants as well.
One of my favorite tools on Wikidata, Integraality / Property dashboard, needed some input on how it is being used in practice, and the maintainer Jean-Frédéric interviewed me as part of user research.
User:Spinster had some ideas about data sets and Wikidata, and we had a great conversation about what could be done, notably in terms of outreach.
User:ItsNyoty wanted to add an image blurring filter for sensitive images as a gadget. I had seen one being deployed on Swedish Wikipedia only hours earlier, so I could connect them. This turned into a cross-wiki collaboration that included people who were not even at the hackathon, and Dutch Wikipedia quickly got an advanced version deployed.
Since I love hackathons, and this was a well-organized one, I also took the chance to record a podcast episode (also on Commons) with two of the organizers, in the hope of inspiring more people to run local and regional hackathons.
Finally, I want to point to the showcase page for the hackathon, where you can read more about what other people hacked on and also find links to other participants’ summarizing blog posts.