Avoid image indexing in robots.txt
People might not want images from their websites to be indexed in online searches for several reasons:
- To protect their intellectual property or copyrighted images from being copied or used without permission
- To reduce the bandwidth the website uses when people find and use its images
- To help protect the privacy of those included in images
I have no problem with images from this blog appearing in search engines but recently wanted to add some protection against usage/indexing for another site.
I did some thinking and research here. Search engines such as Google have help pages about their indexing, but each search engine's help pages only talk about its own crawler. In reality you probably want to use User-agent: * so that the rules apply to every crawler that honors robots.txt.
So here are some easy to copy and paste chunks that you could look at using…
If it’s a CMS (like WordPress)
For WordPress, or any CMS, you can probably just disallow the specific directory that your images are stored in.
User-agent: *
Disallow: /wp-content/uploads/
You may need to create a robots.txt file for your site, which lives at the root of the site (for example /robots.txt). In some cases you might already have one.
For WordPress there are plugins that allow easy robots.txt editing from within the WordPress settings UI.
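If you want to sanity-check the directory rule before deploying it, Python's standard library ships a robots.txt parser. This is only a sketch: the example.com URLs and the is_allowed helper are placeholders I made up for illustration, and urllib.robotparser does plain prefix matching, so it suits a directory rule like this one but not wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as the WordPress example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/uploads/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and both paths are placeholders, not real pages.
    for url in (
        "https://example.com/wp-content/uploads/2024/01/photo.jpg",
        "https://example.com/about/",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot-Image", url))
```

Anything under the uploads directory should come back as not allowed, while ordinary pages stay crawlable.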
All Images?
If your site is a little less organized and you can’t ignore specific directories, then you’ll have to take the approach of ignoring file extensions instead.
Looking at the MDN web docs Image file type and format guide, this list should cover you in most cases.
User-agent: *
Disallow: /*.apng$
Disallow: /*.avif$
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.png$
Disallow: /*.svg$
Disallow: /*.webp$
Disallow: /*.bmp$
Disallow: /*.ico$
Disallow: /*.tiff$
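To get a feel for what these patterns match, here is a rough Python sketch of the wildcard rules as I understand them: * stands for any run of characters and a trailing $ pins the pattern to the end of the URL. The function names are my own, and real crawlers may differ in the details, so treat it as an approximation rather than how any particular search engine implements matching.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the
    end of the URL; everything else is matched literally as a prefix.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*'.
    body = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url_path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow pattern matches url_path."""
    return any(rule_to_regex(rule).match(url_path) for rule in disallow_rules)


# Two rules taken from the list above.
RULES = ["/*.jpg$", "/*.png$"]

if __name__ == "__main__":
    for path in ("/photos/cat.jpg", "/photos/cat.jpg?size=large", "/about/"):
        print(path, is_blocked(path, RULES))
```

Two things stand out when you run it: because of the $, a URL with a query string after the extension is not matched, and matching is case-sensitive, so /photos/cat.JPG slips past a .jpg rule.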
I’m really not sure what sorts of files search engines choose to index in a more fancy way (probably all of them).
Depending on the files you use, you may need a more exhaustive list. So here are some more you might want to try, according to a post that claims to be a “Complete Image File Extension Lists for Developers”.
Disallow: /*.ai$
Disallow: /*.arw$
Disallow: /*.cdr$
Disallow: /*.cr2$
Disallow: /*.dib$
Disallow: /*.eps$
Disallow: /*.heif$
Disallow: /*.heic$
Disallow: /*.ind$
Disallow: /*.indd$
Disallow: /*.indt$
Disallow: /*.j2k$
Disallow: /*.jif$
Disallow: /*.jfif$
Disallow: /*.jfi$
Disallow: /*.jp2$
Disallow: /*.jpe$
Disallow: /*.jpf$
Disallow: /*.jpx$
Disallow: /*.jpm$
Disallow: /*.k25$
Disallow: /*.mj2$
Disallow: /*.nrw$
Disallow: /*.pdf$
Disallow: /*.psd$
Disallow: /*.raw$
Disallow: /*.svgz$
Disallow: /*.tga$
Disallow: /*.tif$
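With a list this long, it might be easier to generate the rules than to maintain them by hand. Here is a small Python sketch, assuming you keep your extensions in a plain list; build_image_rules is a name I made up, and it simply drops exact duplicates and emits one Disallow line per extension.

```python
def build_image_rules(extensions, user_agent="*"):
    """Build robots.txt text that disallows URLs ending in each extension."""
    seen = set()
    lines = [f"User-agent: {user_agent}"]
    for ext in extensions:
        # Accept "jpg" or ".jpg"; ignore blanks and exact repeats.
        ext = ext.strip().lstrip(".")
        if not ext or ext in seen:
            continue
        seen.add(ext)
        # Matching is case-sensitive, so "JPG" and "jpg" stay separate rules.
        lines.append(f"Disallow: /*.{ext}$")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_image_rules(["jpg", "jpeg", "png", ".webp", "webp"]), end="")
```

Paste the output into your robots.txt, or write it to a file as part of whatever build step you already have.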
NoIndex on the web server
Depending on what exactly you care about:
- Images showing up in search results
- Directories of images being findable on the web
You may also want to look at a NoIndex setting on your webserver.
Apache web servers have .htaccess files. Create one of these in the directory that you don’t want to be navigable, with the following line in it. It hides every file from Apache's auto-generated directory listing.
IndexIgnore *
If you’re using a different web server you’ll need to use a different method, but searching for your web server name and “no index” is probably the right way to go!