Google Cloud Storage upload with s3cmd
I recently had to upload a large (1TB) file from a Wikimedia server into a Google Cloud Storage bucket. gsutil was not available to me, but s3cmd was. So here is a little how-to post for uploading to a Google Cloud Storage bucket using the S3 API and s3cmd.
S3cmd is a free command line tool and client for uploading, retrieving and managing data in Amazon S3 and other cloud storage service providers that use the S3 protocol, such as Google Cloud Storage or DreamHost DreamObjects.
https://s3tools.org/s3cmd
s3cmd was already installed on my system, but if you want to get it, see the download page.
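If it is not already on your system, s3cmd is packaged for most Linux distributions and is also published on PyPI. For example (assuming a Debian-based system):

# From the distribution's package manager:
sudo apt-get install s3cmd
# Or from PyPI:
pip install s3cmd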
Google Cloud secret & access keys
I took guidance from a post in the Bitmovin docs, with some modifications.
You need to navigate to your Bucket > Settings > Interoperability.
The easiest path here is to hit the “Create a Key” button for your user account. This will immediately generate an access key and secret that you can use below.
Service Account (optional)
You can also create a dedicated service account for s3cmd to use, which is what I did in my case.
Click “Create key for service account”.
You can then pick an existing service account, or create a new one; your keys will then be shown.
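If you have command line access to the project, HMAC keys for a service account can also be created with gsutil; the service account email below is a placeholder for your own:

# Create an HMAC access key and secret for a service account
# (the email address is a placeholder).
gsutil hmac create my-sa@my-project.iam.gserviceaccount.com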
You need to ensure that the service account has access to perform the needed operations within the bucket. So I added Storage Legacy Bucket Owner and Storage Legacy Object Owner, as this bucket is dedicated to this purpose.
If you want to access a “requester pays” bucket you’ll also need to grant something like Project Billing Manager to the service account. You can read more about the fine-grained permissions in the Google Cloud docs.
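For reference, here is a rough sketch of granting those roles from the command line with gsutil, if you have it available somewhere; the bucket name and service account email are placeholders:

# Grant the legacy bucket and object owner roles on the bucket
# (service account email and bucket name are placeholders).
gsutil iam ch serviceAccount:my-sa@my-project.iam.gserviceaccount.com:roles/storage.legacyBucketOwner gs://my-bucket
gsutil iam ch serviceAccount:my-sa@my-project.iam.gserviceaccount.com:roles/storage.legacyObjectOwner gs://my-bucket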
Basic usage
Without creating any sort of configuration file you can pass most arguments via command line options.
s3cmd --disable-multipart --host=storage.googleapis.com --host-bucket="%(bucket).storage.googleapis.com" --access_key XXX1 --secret_key XXX2 put ./large.txt s3://addshore-bucket/large.txt
At least one post that I read said that --disable-multipart was required, hence it is included in the examples; it worked fine for me. [1]
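Once the upload finishes, the same connection options can be reused to confirm the object arrived, for example with a simple listing:

s3cmd --host=storage.googleapis.com --host-bucket="%(bucket).storage.googleapis.com" --access_key XXX1 --secret_key XXX2 ls s3://addshore-bucket/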
Using a proxy
The only way that I could figure out how to use an HTTP/HTTPS proxy with s3cmd was via the config file rather than command line arguments alone.
You can use s3cmd --configure, which will guide you through the creation of a rather extensive ~/.s3cfg. You can find a small walkthrough of the --configure user experience in this post by Nicky Mathew. Or you can use the minimal config I have provided below.
[default]
access_key = XXX1
access_token =
host_base = storage.googleapis.com
host_bucket = %(bucket).storage.googleapis.com
proxy_host = webproxy.eqiad.wmnet
proxy_port = 8080
public_url_use_https = False
secret_key = XXX2
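Since this file contains your secret key, it is worth making it readable only by you:

chmod 600 ~/.s3cfg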
From this point, the command is quite simple, as most details are read from the config file.
s3cmd --disable-multipart put ./large.txt s3://addshore-bucket/large.txt
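As a final sanity check, s3cmd can report the remote object’s metadata, including its size and MD5 sum:

s3cmd info s3://addshore-bucket/large.txt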