How can I get data on all the dams in the world? Use Wikidata

During my first week at Newspeak House, I was explaining Wikidata and Wikibase to some folks on the terrace when the topic of dams came up, prompted by an old project that someone had worked on. Back in the day, collecting information about dams would have been quite an effort: compiling data from many different sources to try to get a complete worldwide view of the topic. Perhaps it is easier with Wikidata now?

Below is a very brief walkthrough of topic discovery and exploration using various Wikidata features and the SPARQL query service.

A typical known dam

To get an idea of the data space for the topic within Wikidata, I start with a dam that I already know about, the Three Gorges Dam (Q12514). Using this example I can see how dams are typically described.

Classification

The first thing I notice is that this dam is an “instance of” “gravity dam”. “Instance of” is the property P31, and “gravity dam” is the item Q3497167. Gravity dam is probably a subclass of a wider set, and navigating to it I see that it is indeed a “subclass of” “dam”. “Subclass of” is the property P279, and “dam” is the item Q12323. This feels like the top level of this ontological tree.

Looking at the talk page for the dam item, I can see some useful links allowing us to dive into the subclasses of “dam”, and also various instances of “dam”.

Properties

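Taking another look at the Three Gorges Dam item page, we see various properties used to describe the dam that might be useful to look at in the dam context. Those used in the queries later in this post are installed capacity (P2109), annual energy output (P4131), watershed area (P2053) and coordinate location (P625).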

The whole set

The best way to get an overview of the whole data set is to use the query service. The dam item talk page already included a link to the query service listing all instances of dam, so we can start there.

This query lists an arbitrary set of up to 1000 items that are an “instance of” (P31) “dam” (Q12323), or an instance of anything that is a “subclass of” (P279) “dam”. It also provides a label (name) for each item, in your interface language with English as the fallback.

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31/(wdt:P279)* wd:Q12323 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"  }  
}
LIMIT 1000
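To make the property path in the query above more concrete, here is a toy sketch in Python of what wdt:P31/(wdt:P279)* does: follow one “instance of” edge, then zero or more “subclass of” edges. The three Q-ids for the dam items come from this post; Q_EXAMPLE_BRIDGE and Q_BRIDGE are made-up placeholders, and the in-memory dictionaries stand in for Wikidata purely for illustration.

```python
# Toy model of the property path wdt:P31/(wdt:P279)*.
# Q_EXAMPLE_BRIDGE and Q_BRIDGE are hypothetical placeholders, not real ids.

INSTANCE_OF = {  # P31 edges: item -> classes it is an instance of
    "Q12514": {"Q3497167"},            # Three Gorges Dam -> gravity dam
    "Q_EXAMPLE_BRIDGE": {"Q_BRIDGE"},  # placeholder non-dam item
}

SUBCLASS_OF = {  # P279 edges: class -> parent classes
    "Q3497167": {"Q12323"},            # gravity dam -> dam
}


def superclasses(cls):
    """Return cls plus every class reachable over zero or more P279 edges."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def matches(item, target):
    """True if item reaches target via one P31 edge then any number of P279 edges."""
    return any(target in superclasses(cls) for cls in INSTANCE_OF.get(item, ()))


if __name__ == "__main__":
    print(matches("Q12514", "Q12323"))            # Three Gorges Dam is a dam
    print(matches("Q_EXAMPLE_BRIDGE", "Q12323"))  # the placeholder is not
```

Because the star means “zero or more”, an item that is directly an instance of “dam” matches as well as one that is an instance of a subclass.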

If we remove the LIMIT from the query we can see that 84215 dams are currently collected in Wikidata. However, removing the LIMIT is not always desirable, as the result lists can get pretty long.
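If only the total is needed, one alternative to listing every row is to ask the query service for a count. The sketch below is one way this might be done from Python using only the standard library. The COUNT query is my own variation on the query above, not something from the dam talk page, and the User-Agent string is a placeholder you would replace with your own details. It counts distinct items, so the figure may differ slightly from a raw row count.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed variation of the listing query: count distinct items instead of listing them.
COUNT_QUERY = """
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?item wdt:P31/(wdt:P279)* wd:Q12323 .
}
"""


def build_request(query, user_agent="dam-count-example/0.1 (placeholder contact)"):
    """Build a GET request that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(ENDPOINT + "?" + params,
                                  headers={"User-Agent": user_agent})


def read_count(response):
    """Pull the single ?count value out of a SPARQL JSON result document."""
    return int(response["results"]["bindings"][0]["count"]["value"])


def fetch_count():
    """Run the count query against the live service (needs network access)."""
    with urllib.request.urlopen(build_request(COUNT_QUERY)) as resp:
        return read_count(json.load(resp))


if __name__ == "__main__":
    print(fetch_count())
```

The number returned will change over time as dams are added to or removed from Wikidata.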

We can see more information about the dams on this list by expanding the query to also match triples for statements of installed capacity (P2109). Statements are the basic data building blocks in Wikidata, connecting an item to a value through a property, for example connecting the Three Gorges Dam through “instance of” to “gravity dam”. Triples are the subject, predicate, object representation of this data that SPARQL queries.

SELECT ?item ?itemLabel ?installedCapacity WHERE {
  ?item wdt:P31/(wdt:P279)* wd:Q12323 .
  ?item wdt:P2109 ?installedCapacity .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"  }  
}
LIMIT 1000

This will only return dams that have this statement defined (261). In order to still return the complete list, this needs to be added as an OPTIONAL triple.

SELECT ?item ?itemLabel ?installedCapacity WHERE {
  ?item wdt:P31/(wdt:P279)* wd:Q12323 .
  OPTIONAL{ ?item wdt:P2109 ?installedCapacity }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"  }  
}
LIMIT 1000
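The difference between a required triple and an OPTIONAL one is essentially the difference between an inner join and a left join. The toy Python sketch below shows that idea on made-up data; the item names and capacity figure are invented placeholders, not values from Wikidata.

```python
# Made-up placeholder data: two dams, only one with an installed capacity.
dams = ["dam_a", "dam_b"]
installed_capacity = {"dam_a": 100}


def required(items, values):
    """Like a plain triple pattern: items without a value are dropped."""
    return [(item, values[item]) for item in items if item in values]


def optional(items, values):
    """Like OPTIONAL: every item is kept, with None where the value is missing."""
    return [(item, values.get(item)) for item in items]


if __name__ == "__main__":
    print(required(dams, installed_capacity))
    print(optional(dams, installed_capacity))
```

In the query service results, the “None” case shows up as an empty cell in the installedCapacity column.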

More data points can be extracted in much the same way. Because the three extra triples below are not OPTIONAL, this query only returns dams that have all three statements; wrap each one in OPTIONAL, as above, to keep dams where some values are missing.

SELECT ?item ?itemLabel ?installedCapacity ?annualEnergyOutput ?watershedArea WHERE {
  ?item wdt:P31/(wdt:P279)* wd:Q12323 .
  ?item wdt:P2109 ?installedCapacity .
  ?item wdt:P4131 ?annualEnergyOutput .
  ?item wdt:P2053 ?watershedArea .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"  }  
}
LIMIT 1000
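When results like these are downloaded as JSON rather than viewed in the browser, each row arrives as a “binding” in the standard SPARQL JSON results format, and variables with no value are simply left out of the binding. A minimal sketch of flattening that into plain rows might look like this. The sample response is hand-written for illustration (the label and number are placeholders), and the helper name is my own.

```python
def flatten(response, variables):
    """Turn SPARQL JSON bindings into a list of dicts, using None for unbound variables."""
    rows = []
    for binding in response["results"]["bindings"]:
        rows.append({
            name: binding[name]["value"] if name in binding else None
            for name in variables
        })
    return rows


# Hand-written sample in the SPARQL JSON results shape; values are placeholders.
SAMPLE = {
    "head": {"vars": ["item", "itemLabel", "installedCapacity"]},
    "results": {"bindings": [
        {
            "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q12514"},
            "itemLabel": {"type": "literal", "value": "Three Gorges Dam"},
            "installedCapacity": {"type": "literal", "value": "1"},
        },
        {
            "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_PLACEHOLDER"},
            "itemLabel": {"type": "literal", "value": "placeholder dam"},
        },
    ]},
}

if __name__ == "__main__":
    for row in flatten(SAMPLE, SAMPLE["head"]["vars"]):
        print(row)
```

All values come back as strings, so numeric columns need converting before any arithmetic.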

Other views can also be used for the data. Adding a #defaultView comment at the top of the query selects the view that is displayed once the query has run.

#defaultView:Map
SELECT ?item ?geo WHERE {
  ?item wdt:P31/wdt:P279* wd:Q12323;
        wdt:P625 ?geo .
}
LIMIT 10000

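This map displays up to 10,000 dams that have a coordinate location (P625), and allows you to zoom in, hover and inspect them. As the query has no ORDER BY, which dams are included is arbitrary rather than truly random.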
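Outside the map view, the ?geo values come back as WKT text of the form Point(longitude latitude), with longitude first. If you wanted to plot them yourself, a small parser along these lines could work. It is a sketch that assumes the plain Point(...) form only and rejects anything else; the function name is my own.

```python
import re

# Matches "Point(<lon> <lat>)"; the numbers are validated by float() below.
_POINT = re.compile(r"^Point\(\s*([-+0-9.eE]+)\s+([-+0-9.eE]+)\s*\)$", re.IGNORECASE)


def parse_point(wkt):
    """Return (latitude, longitude) from a WKT point literal written as Point(lon lat)."""
    match = _POINT.match(wkt.strip())
    if match is None:
        raise ValueError(f"not a plain WKT point: {wkt!r}")
    longitude = float(match.group(1))
    latitude = float(match.group(2))
    return latitude, longitude


if __name__ == "__main__":
    # Illustrative coordinates, not copied from Wikidata.
    print(parse_point("Point(111.003 30.823)"))
```

Note the swap on return: many mapping libraries expect latitude first, which is the opposite of the WKT order.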

Conclusion

Starting with a topic area to explore and a single known example I have explored the way that Wikidata describes the topic and figured out where the topic fits within the larger tree of concepts. I have also expanded from a single example to a complete data set, visualizing that set on a map.

SPARQL and the query service allow much more than is discussed in this post, such as filtering and alternate data representations and visualizations.
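As one illustration of filtering, the sketch below builds a variant of the installed capacity query that adds a FILTER on a minimum value. The helper name and its parameters are my own invention. The comparison is on the raw number only and takes no account of units, which I believe can differ between items, so treat the threshold as a rough cut rather than a precise one.

```python
import math


def build_capacity_query(min_capacity, limit=1000):
    """Return a SPARQL query for dams whose installed capacity exceeds min_capacity."""
    threshold = float(min_capacity)  # raises ValueError for non-numeric input
    if not math.isfinite(threshold):
        raise ValueError("min_capacity must be a finite number")
    return f"""SELECT ?item ?itemLabel ?installedCapacity WHERE {{
  ?item wdt:P31/(wdt:P279)* wd:Q12323 .
  ?item wdt:P2109 ?installedCapacity .
  FILTER(?installedCapacity > {threshold})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }}
}}
LIMIT {int(limit)}"""


if __name__ == "__main__":
    # The printed text can be pasted into the query service.
    print(build_capacity_query(1000))
```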

1 Comment

  1. Mike Peel

    Brazil shows up nicely on the map because of this event: https://outreachdashboard.wmflabs.org/courses/Grupo_de_Usu%C3%A1rios_Wikimedia_no_Brasil/Open_Data_Day_2019_-_S%C3%A3o_Paulo

© 2020 Addshore