During May of 2015 Orain was the target of a DDoS attack. The attack lasted roughly 9 days and repeatedly brought the service to its knees. The ‘official’ timeline of events and write-up can be found here. Below I will discuss the details of the DDoS, why it took Orain down so easily, and the measures that have now been put in place.
Details of the DDoS
The DDoS was first detected on May 20th and immediately took down all of the Orain services. At this stage no one knew that it was a DDoS; we simply thought Orain was having IPv4 routing issues, as the site was still accessible over IPv6.
After we messaged DigitalOcean support, they revealed that they had nullrouted our load balancing instance (the main instance needed to access the website) due to an inbound DDoS. They also apologised for the lack of an automated email about this.
The image to the right (sorry for the poor quality; apparently I didn’t take a screenshot, just a photo on my phone) shows a spike of around 800Mbps inbound on the public network interface and a small increase in internal traffic. After this spike both interfaces fall to 0Mbps, because DigitalOcean had nullrouted the public IPv4 address.
After a period of time DigitalOcean would restore the original route for the IP to our box, and service would come back for a short while before the next round of the DDoS hit us, and the cycle repeated.
The DDoS did not concentrate on a single instance. As all of our public IPs were visible to the world on GitHub, the attackers could easily target every one of them and bring down every service Orain was running: mail, DNS, web, stats, etc.
What changes have been made
As discussed above, the attack hit Orain so hard because all of the public IPs for our servers were available to be abused. Combined with our services running on VPSs that are nullrouted whenever a DDoS is detected, this made attacking Orain really quite easy.
First, I made the switch to CloudFlare. This masks the IP addresses of the servers: a request for meta.orain.org, for example, previously resolved directly to the public IP address of our load balancer, but now resolves to an IP address belonging to CloudFlare’s CDN. Of course, with a change of name servers we had to wait for the change to propagate around the world.
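One way to confirm the switch took effect is to resolve a hostname and check that every returned IPv4 address sits inside CloudFlare’s published ranges. The Python below is a minimal sketch: the range list is an illustrative subset (an assumption; CloudFlare publishes the authoritative, current list at https://www.cloudflare.com/ips/), and the function names are hypothetical.

```python
import ipaddress
import socket

# Illustrative subset of CloudFlare's published IPv4 ranges (assumption:
# fetch the current, complete list from https://www.cloudflare.com/ips/).
CLOUDFLARE_V4 = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "173.245.48.0/20",
        "103.21.244.0/22",
        "141.101.64.0/18",
        "108.162.192.0/18",
        "162.158.0.0/15",
        "104.16.0.0/13",
        "172.64.0.0/13",
    )
]


def resolve_ipv4(hostname):
    """Return the distinct IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def behind_cloudflare(addresses, ranges=CLOUDFLARE_V4):
    """True only if there is at least one address and every one is in a range."""
    ips = [ipaddress.ip_address(a) for a in addresses]
    return bool(ips) and all(any(ip in net for net in ranges) for ip in ips)
```

For example, `behind_cloudflare(resolve_ipv4("meta.orain.org"))` would be expected to return True once the new name servers had propagated, and False while the name still pointed straight at the load balancer.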
This in itself was not enough: the attackers already had our IP addresses and continued to DDoS Orain directly even once it was behind CloudFlare. We therefore needed to rotate the IP addresses of all of our servers and make sure that the new IPs were not visible anywhere.
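Making sure the new IPs are not visible anywhere is easy to get wrong by hand. A rough audit, sketched here in Python under the assumption that the repositories are checked out locally, scans every file for dotted-quad strings and keeps only those the `ipaddress` module treats as globally routable, so private internal addresses are ignored. The function names are hypothetical, and the regex will also catch IP-like version strings, so treat hits as candidates to review rather than confirmed leaks.

```python
import ipaddress
import re
from pathlib import Path

# Dotted-quad candidates; real validity is checked with ipaddress below.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def public_ips_in_text(text):
    """Return the sorted, de-duplicated globally routable IPv4s found in text."""
    found = set()
    for match in IPV4_RE.findall(text):
        try:
            ip = ipaddress.IPv4Address(match)
        except ipaddress.AddressValueError:
            continue  # e.g. "999.1.1.1" is not a valid address
        if ip.is_global:
            found.add(str(ip))
    return sorted(found)


def scan_tree(root):
    """Map each readable file under root (skipping .git) to its public IPs."""
    leaks = {}
    for path in Path(root).rglob("*"):
        if ".git" in path.parts or not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        ips = public_ips_in_text(text)
        if ips:
            leaks[str(path)] = ips
    return leaks
```

Running something like `scan_tree("ansible-playbook")` before pushing would list any file that still contains a public address.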
The easiest place to find all of our IP addresses was our public DNS configuration repo at https://github.com/Orain/dns. With the move to CloudFlare this repo is no longer used, so, check: no new public IPs there!
The second place public IPs could be found was our Ansible playbook at https://github.com/Orain/ansible-playbook. In this change all public IPs were replaced with private internal IPs, and a hosts file was added to ensure all instances always resolve Orain domains locally rather than being pointed at CloudFlare.
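The hosts file idea can be sketched as follows: each private IP is written with the Orain names that should resolve to it, so internal traffic never goes via CloudFlare. This is a hypothetical Python illustration, not the actual playbook (which would more likely use an Ansible template); the 10.x addresses and the mapping are made up, and the function refuses any non-private address so a public IP cannot sneak back into the repo.

```python
import ipaddress


def render_hosts(entries, header="# Managed block: resolve Orain domains internally"):
    """Render /etc/hosts-style lines: a private IP followed by its hostnames.

    entries maps a private IP (string) to a list of hostnames. Any address
    that is not private raises ValueError. Lines are sorted numerically by IP.
    """
    lines = [header]
    for ip, names in sorted(entries.items(), key=lambda kv: ipaddress.ip_address(kv[0])):
        if not ipaddress.ip_address(ip).is_private:
            raise ValueError(f"{ip} is not a private address")
        lines.append(f"{ip}\t{' '.join(names)}")
    return "\n".join(lines) + "\n"


# Hypothetical mapping for illustration only.
print(render_hosts({
    "10.0.0.10": ["lb.orain.org", "meta.orain.org"],
    "10.0.0.2": ["prod10.orain.org"],
}))
```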
Lastly, IP rotation. DigitalOcean do not provide users with an easy way to get a new public IPv4 address for a machine, as this is something that could easily be abused, so the process takes a bit more time: each box must be shut down, a snapshot created, and then a new box created from the snapshot. For all of Orain’s servers this took me roughly 1 hour.
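The shut down / snapshot / recreate cycle maps onto DigitalOcean’s v2 REST API roughly as follows. This Python sketch assumes the droplet actions endpoint (`POST /v2/droplets/{id}/actions` with `type` set to `shutdown` or `snapshot`) and the create-droplet endpoint (`POST /v2/droplets` with an `image` ID); check the current API reference before relying on it. The `Droplet` type, helper names and slug values are hypothetical, and a real script would also need to poll each action until it completes and look up the snapshot’s image ID before the final step.

```python
import json
import urllib.request
from dataclasses import dataclass

API = "https://api.digitalocean.com/v2"


@dataclass
class Droplet:
    id: int
    name: str
    region: str  # region slug (hypothetical value in the example below)
    size: str    # size slug (hypothetical value in the example below)


def snapshot_steps(droplet, snapshot_name):
    """Steps 1 and 2: power the droplet off, then snapshot it."""
    path = f"/droplets/{droplet.id}/actions"
    return [
        ("POST", path, {"type": "shutdown"}),
        ("POST", path, {"type": "snapshot", "name": snapshot_name}),
    ]


def recreate_step(droplet, snapshot_image_id, new_name):
    """Step 3: create a new droplet from the snapshot; it gets a fresh public IPv4."""
    body = {
        "name": new_name,
        "region": droplet.region,
        "size": droplet.size,
        "image": snapshot_image_id,
    }
    return ("POST", "/droplets", body)


def send(token, method, path, body):
    """Send one JSON request to the DigitalOcean API and return the decoded reply."""
    req = urllib.request.Request(
        API + path,
        data=json.dumps(body).encode(),
        method=method,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping the step builders separate from `send` makes the plan easy to print and review for every box before anything is actually switched off.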
TADA! No more DDoS :)
Side effects of the changes
- The Orain DNS configuration is no longer publicly accessible, so contributing to it is more difficult. This isn’t something that needs to change much, though.
- CloudFlare only supports wildcard domains with full proxy services if you pay them $5000 per month, which Orain cannot do. Orain therefore now has a CNAME record for every wiki it hosts, and our CreateWiki extension has been changed to create these automatically.
- No SSHing to or accessing servers directly via domains. Previously we could simply ssh to ‘prod10.orain.org’, for example, but now that domain points to a CloudFlare IP address.
- Custom domains, of course, broke. They were pointing either at the old Orain name servers, which are no longer used, or at the IP address of the load balancer instance, which has now changed. As we want to keep our IP addresses secret, currently the only way to keep a custom domain working is to CNAME it to ‘lb.orain.org’, which per an RFC may not actually be allowed (a CNAME at the root of a domain, for example, is forbidden), but works with multiple hosts all the same. See this Stack Overflow question. Another solution would be a second load balancer with a public IP not routed through CloudFlare; if it were ever DDoSed, only custom domains would see downtime.
- Our traffic is now routed through the CloudFlare CDN, which according to their dashboard has saved around 25% of our bandwidth (roughly 3GB) from having to reach our servers.
- We are currently only on the free version of CloudFlare, which means we use SNI SSL, and that is not supported by some older browsers.
- I wish this post had more images in it.
- I wish it had not taken the team 9 days to resolve the issue (sorry for the downtime guys).
- I wish we were not being DDoSed in the first place…
- A more detailed analysis of the DDoS itself may be released or discussed in the future.
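For the per-wiki CNAME records mentioned in the side effects above, the record a wiki-creation hook might submit could look like this. It is a Python sketch, not the CreateWiki extension’s actual (PHP) code: it assumes CloudFlare’s v4 DNS records API, where a body like this is POSTed to `/client/v4/zones/{zone_id}/dns_records`, and it assumes wikis CNAME to `lb.orain.org`. The function name and the label check are illustrative.

```python
import re

# One DNS label: letters, digits and hyphens, 1-63 chars, no leading/trailing hyphen.
LABEL_RE = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def wiki_cname_record(subdomain, zone="orain.org", target="lb.orain.org"):
    """Build a CloudFlare v4 DNS-record body for a new wiki's proxied CNAME."""
    label = subdomain.lower()
    if not LABEL_RE.match(label):
        raise ValueError(f"invalid DNS label: {subdomain!r}")
    return {
        "type": "CNAME",
        "name": f"{label}.{zone}",
        "content": target,  # assumption: wikis point at the load balancer name
        "proxied": True,    # serve through CloudFlare so the origin IP stays hidden
        "ttl": 1,           # 1 means "automatic" in CloudFlare's API
    }
```

Validating the label up front means a wiki name that is not a legal hostname fails at creation time instead of producing a broken DNS record.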
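The RFC problem with custom domains, noted in the side effects above, comes down to where the CNAME sits. A subdomain such as `wiki.example.com` can be CNAMEd to `lb.orain.org`, but a bare `example.com` cannot: the zone apex always holds SOA and NS records, and RFC 1034 (section 3.6.2) says a name with a CNAME should hold no other data. A small hypothetical Python check for this, with `example.com` standing in for a user’s domain:

```python
def normalize(name):
    """Lower-case a domain name and drop any trailing root dot."""
    return name.lower().rstrip(".")


def cname_allowed(record_name, zone_apex):
    """Return True if a CNAME may be placed at record_name within zone_apex.

    The apex always carries SOA and NS records, and a name with a CNAME
    should carry no other data (RFC 1034, 3.6.2), so only names strictly
    below the apex qualify. Names outside the zone also return False.
    """
    record, apex = normalize(record_name), normalize(zone_apex)
    if record == apex:
        return False
    return record.endswith("." + apex)
```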