Back in 2016 I wrote a short, hacky script that takes the HTML from a Facebook data download and adds as much data as possible back to the image files that come with the download. I wrote it because I wanted to grab all of my photos from Facebook, upload them to Google Photos, and have Google automatically slot them into the correct place in the timeline. Recent news articles about Cambridge Analytica and the harvesting of Facebook data have led many people to decide to leave the platform, so I decided to check whether my old script still worked, and to make it a little easier to use.
Step #1 – Move it to GitHub
Originally I hadn’t really planned on anyone else using the script; in fact, I still don’t really plan on it. But let’s keep code on GitHub, not in aging blog posts.
Step #2 – Docker
The previous version of the script had hard-coded paths, so a user had to modify the script and also download things such as ExifTool before it would work.
Now the GitHub repo contains a Dockerfile that packages the script and all necessary dependencies.
If you have Docker installed, running the script is now as simple as:
docker run --rm -it -v //path/to/facebook/export/photos/directory://input facebook-data-image-exif
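For anyone curious what the ExifTool side of this kind of job looks like, here is a minimal sketch in Python of building an ExifTool call that sets the capture date on one image. It is not lifted from my script; the function name, file name and date are made up for illustration, and the only ExifTool features assumed are the `-DateTimeOriginal` tag and the `-overwrite_original` option.

```python
import shlex
from datetime import datetime


def build_exiftool_command(image_path, taken_at):
    """Return an ExifTool argument list that sets the capture date of one image."""
    # ExifTool expects dates formatted as "YYYY:MM:DD HH:MM:SS".
    stamp = taken_at.strftime("%Y:%m:%d %H:%M:%S")
    return [
        "exiftool",
        "-overwrite_original",         # do not keep a *_original backup copy
        f"-DateTimeOriginal={stamp}",  # the tag most photo tools read as capture time
        image_path,
    ]


if __name__ == "__main__":
    # Hypothetical file name and date, purely for illustration.
    cmd = build_exiftool_command("photos/example.jpg", datetime(2016, 3, 14, 9, 30, 0))
    # Print the command instead of running it, so the sketch works without ExifTool installed.
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run` would actually execute it, which is roughly the step the Docker image saves you from setting up by hand.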
Step #3 – Update the script for the new format
As far as I know, the format of the Facebook data dump downloads is not documented anywhere. The format totally sucks; it would be quite nice to have some JSON included, or anything slightly more structured than HTML.
The new format moved the location of the HTML files for each photo album, but luckily the format of the HTML itself remained mostly the same (or at least the crappy parsing I created still worked).
The new data download did, however, do something odd with the image sources. Instead of loading the images from the local directory (all of the data you have just downloaded), the srcs still point to the Facebook CDN. Not sure if this was intentional, but it’s rather crappy. I imagine if you delete your whole Facebook account these static HTML files will actually stop working. Sounds like someone needs to write a little script for this…
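A little script along those lines could look something like the sketch below. It is only a sketch and it makes one big assumption that I have not verified against a real export: that each downloaded image has the same file name as the last path segment of its CDN URL. The function name, the `photos` directory and the example URL are all hypothetical.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches src="..." or src='...' inside an <img> tag. Good enough for a sketch,
# but not a real HTML parser.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2', re.IGNORECASE)


def localise_srcs(html, local_dir="photos"):
    """Point remote <img> sources at same-named files under local_dir."""
    def swap(match):
        prefix, quote, src = match.groups()
        parsed = urlparse(src)
        if parsed.scheme not in ("http", "https"):
            return match.group(0)  # already local, leave it alone
        # ASSUMPTION: the local file shares the CDN URL's final path segment.
        filename = PurePosixPath(parsed.path).name
        if not filename:
            return match.group(0)  # nothing usable in the URL, leave it alone
        return f"{prefix}{quote}{local_dir}/{filename}{quote}"

    return IMG_SRC.sub(swap, html)


if __name__ == "__main__":
    # Hypothetical CDN URL, purely for illustration.
    page = '<img src="https://cdn.example.com/v/t1/12345_n.jpg?oh=abc" />'
    print(localise_srcs(page))
```

A regex keeps the sketch short; for anything beyond a one-off, a proper HTML parser would be the safer choice, and you would want to check that each rewritten path actually exists on disk.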
Step #4 – Profit!
Well, no profit, but hopefully some people can make use of this again, especially those currently fleeing Facebook.
You can find the “download a copy” of your data link at the bottom of your Facebook settings.
I wonder if there are any public figures for the rate of Facebook account deactivations and deletions…