Tag: google cloud

A first Wikidata query service JNL file for public use

This entry is part 2 of 3 in the series Your own Wikidata Query Service

Back in 2019 I wrote a blog post called Your own Wikidata Query Service, with no limits which documented loading a Wikidata TTL dump into your own Blazegraph instance running within Google cloud, a near 2 week process. I ended that post speculating that part 2 might be using a “pre-generated Blazegraph journal file to…

By addshore October 2, 2022 6

Google Cloud Storage upload with s3cmd

Tech

I recently had to upload a large (1TB) file from a Wikimedia server into a Google Cloud Storage bucket. gsutil was not available to me, but s3cmd was. So here is a little how to post for uploading to a Google Cloud Storage bucket using the S3 API and s3cmd. S3cmd is a free command line…

By addshore September 7, 2022 1

WBStack Infrastructure

Posts Tech

This entry is part 7 of 12 in the series WBStack

WBStack is a platform allowing shared scalable hosting of Wikibase and surrounding services. A year ago I made an initial post covering the state of WBStack infrastructure. Since then some things have changed, and I have also had more time to create a clear diagram. So it is time for the 2021 edition. WBStack currently…

By addshore April 29, 2021 4

Testing WDQS Blazegraph data load performance

Tech Posts

Toward the end of 2020 I spent some time blackbox testing data load times for WDQS and Blazegraph to try and find out which possible setting tweaks might make things faster. I didn’t come to any major conclusions as part of this effort but will write up the approach and data nonetheless incase it is…

By addshore February 10, 2021 1

Creating a new replica after purging binlogs with bitnami mariadb docker images

Tech Posts

I have been using the bitnami mariadb docker images and helmfiles for just over a year now in a personal project (wbstack). I have 1 master and 1 replica setup in a cluster serving all of my SQL needs. As the project grew disk space became pressing and from an early time I has to…

By addshore November 15, 2020 0

Faster munging for the Wikidata Query Service using Hadoop

Tech Posts

The Wikidata query service is a public SPARQL endpoint for querying all of the data contained within Wikidata. In a previous blog post I walked through how to set up a complete copy of this query service. One of the steps in this process is the munge step. This performs some pre-processing on the RDF…

By addshore October 14, 2020 3

WBStack Infrastructure (2020)

Tech Posts

This entry is part 3 of 12 in the series WBStack

UPDATE: You can find an up to date 2021 version of this post here. WBStack currently runs on a Google Cloud Kubernetes cluster made up of 2 virtual machines, one e2-medium and one e2-standard-2. This adds up to a current total of 4 vCPUs and 12GB of memory. No Google specific services make up any…

By addshore January 25, 2020 10

Your own Wikidata Query Service, with no limits

Tech Posts

This entry is part 1 of 3 in the series Your own Wikidata Query Service

The Wikidata Query Service allows anyone to use SPARQL to query the continuously evolving data contained within the Wikidata project, currently standing at nearly 65 millions data items (concepts) and over 7000 properties, which translates to roughly 8.4 billion triples. You can find a great write up introducing SPARQL, Wikidata, the query service and what…

By addshore October 23, 2019 15