Apache's "deny from" access control can really slow website

  • Bigwebmaster
  • Site Admin
  • Site Admin
  • User avatar
  • Posts: 9089
  • Loc: Seattle, WA & Phoenix, AZ

Post 3+ Months Ago

If you don't want to read this entire post, I will get right to the point. Make sure Apache's HostnameLookups is disabled and make sure that if you use access controls to block certain kinds of visitors, that you are only doing it blocking actual IP addresses. Never try to block domain names, or using wildcards in domain names. While it will work, the performance impact on your site can be horrible causing pages to take longer than 5-20 seconds to load. The rest of this post will explain everything in more detail.

In the last few months I had noticed that all sorts of bad bots were coming from Amazon's AWS which were using and abusing the services at Ozzu and other websites we host. For those of you do not know, Amazon AWS stands for Amazon Web Services and part of it is basically a cloud computing platform that allows small companies, individuals, or start-ups to get access to an extremely powerful system that makes it super affordable and giving you the power to scale extremely well. In theory this is probably a good thing, but there are still bad apples out there that use it for other things.

For instance there may be a new search engine company that doesn't have the resources to obtain all the hardware necessary to have enough computing power to index the web, so they might start with Amazon's AWS since it provides them just that at a much more affordable cost. While this may be good to have people trying to develop new search technology, its bad when there are lots of people trying this and fetching pages from our servers. Our servers can only handle so much, and we much prefer to allow our visitors to use our services first, not consumed by all sorts of bots that may never return anything to Ozzu.

There may also be others out there that instead are out to simply scrape all of Ozzu's pages and duplicate them for their own dubious purposes. Finally the reason for us making the decision to ban Amazon AWS, is it seems there are individuals who have created click bots using their services that click on everything on our website, including ads which causes another set of problems.

So once we made the decision to ban Amazon's AWS, we took the easiest route to do so by adding to an .htaccess file the following:

Code: [ Select ]
<Limit GET POST HEAD>     
Order Allow,Deny
Allow from all    
Deny from *.amazonaws.com   
Deny from amazonaws.com
</Limit>
  1. <Limit GET POST HEAD>     
  2. Order Allow,Deny
  3. Allow from all    
  4. Deny from *.amazonaws.com   
  5. Deny from amazonaws.com
  6. </Limit>


This worked quickly and anybody using Amazon's Web Services quickly found they were only getting 403 errors. After a few days it appeared to be successful and things were looking much better as far as for the reasons we did it in the first place. After that it was basically forgotten.

Unknowingly to us this caused a new sporadic problem that was actually difficult to trace.

When ever you use Apache's Access Controls for denying or allowing IP addresses, its a simple operation that doesn't really have any bad consequences. So if I were to deny an IP address like this:

Code: [ Select ]
Deny from 123.123.123.123


That causes no issues as far as what I can see. The problem arises when you use actual domain names with Apaches Access Controls. So the following line:

Code: [ Select ]
Deny from *.amazonaws.com


even though being simple and to the point, does much more than you think. It makes sense when you think about it, but tracing it down to these lines were much more difficult. When a web page loads on Ozzu in order to be able to Deny any of these AWS domains, what is required by Apache is to do a Reverse DNS look up on the IP visiting Ozzu. If using Linux, it first look at your /etc/hosts file, and then uses your local resolver to translate the visiting IP address to the domain name if it exists. For many IPs this works quickly and there are no problems. However, for many IPs either no reverse IP record is setup, or network problems can occur with our resolver since we use an external service for that. The effect from this is that when visiting for the first time, Ozzu would sometimes take anywhere from 2-20 seconds to load. Completely unacceptable. Subsequent pages seemed to load very quickly, under half a second, and my theory behind that is that the local resolver caches the result for some set period.

I started to notice that every now and then a page would take forever to load. I thought it was my local connection as looking at debugging information show that the pages were all still generated quite quickly. Soon a few others mentioned this to me that pages would hang for awhile which indicated to me that more was going on, but it was hard to figure out because most of the time you didn't see the problem, just every now and then. I don't like tracking problems down like that as its more difficult to trace.

My theory is that pages load slow periodically because when you first load Ozzu, it then tries to do a reverse look up of your IP Address and its possible that if no record exists, or if the resolver is working slowly, the entire process to load the page on Ozzu is blocked until this operation finishes. Since all of this occurs before the PHP scripts even run, my debugging information still shows that pages are being generated quickly. In Firebug it was very clear that the page-load times were in excess of 5 seconds at times. Yuk! So I knew it wasn't the script, or MySQL database, it had something either to do with the network itself, or Apache. Getting closer.

My next step, because I really had no idea what was going on, was to use strace in SSH to look at all of the system calls and signals with Apache. So I ran the following code right before loading a page through my browser:

Code: [ Select ]
ps auxw | grep /bin/httpd | awk '{print"-p " $2}' | xargs strace -f


That code basically lists all of the processes, only showing httpd processes, and then using awk to output only the pids, and then pass those along to strace so we can just follow the system calls and signals with those processes. I would then load my web page and hit CTRL-C in SSH once the web page stopped loading.

The output from strace is long, but after looking you can find relevant portions that look something like this in my case:

Code: [ Select ]
[pid  637] munmap(0x7fa497fff000, 4096) = 0
[pid  637] socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 22
[pid  637] connect(22, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, 16) = 0
[pid  637] poll([{fd=22, events=POLLOUT}], 1, 0) = 1 ([{fd=22, revents=POLLOUT}])
[pid  637] sendto(22, "F\374\1\0\0\1\0\0\0\0\0\0\003112\00299\00222\00276\7in-add"..., 43, MSG_NOSIGNAL, NULL, 0) = 43
[pid  637] poll([{fd=22, events=POLLIN}], 1, 5000) = 1 ([{fd=22, revents=POLLIN}])
[pid  637] ioctl(22, FIONREAD, [91])  = 0
[pid  637] recvfrom(22, "F\374\201\0\0\1\0\1\0\0\0\0\003112\00299\00222\00276\7in-add"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, [16]) = 91
[pid  637] close(22)          = 0
[pid  637] socket(PF_NETLINK, SOCK_RAW, 0) = 22
[pid  637] bind(22, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
[pid  637] getsockname(22, {sa_family=AF_NETLINK, pid=593, groups=00000000}, [12]) = 0
[pid  637] sendto(22, "\24\0\0\0\26\0\1\3\30X[R\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
[pid  637] recvmsg(22, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"0\0\0\0\24\0\2\0\30X[RQ\2\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 488
[pid  637] recvmsg(22, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"@\0\0\0\24\0\2\0\30X[RQ\2\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 192
[pid  637] recvmsg(22, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0\30X[RQ\2\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
[pid  637] close(22)          = 0
[pid  637] open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 22
[pid  637] fstat(22, {st_mode=S_IFREG|0644, st_size=73, ...}) = 0
[pid  637] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa497fff000
[pid  637] read(22, "127.0.0.1\t\tlocalhost.localdomain"..., 4096) = 73
[pid  637] read(22, "", 4096)     = 0
[pid  637] close(22)          = 0
[pid  637] munmap(0x7fa497fff000, 4096) = 0
[pid  637] socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 22
[pid  637] connect(22, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, 16) = 0
[pid  637] poll([{fd=22, events=POLLOUT}], 1, 0) = 1 ([{fd=22, revents=POLLOUT}])
[pid  637] sendto(22, "1u\1\0\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 52, MSG_NOSIGNAL, NULL, 0) = 52
[pid  637] poll([{fd=22, events=POLLIN}], 1, 5000) = 1 ([{fd=22, revents=POLLIN}])
[pid  637] ioctl(22, FIONREAD, [52])  = 0
[pid  637] recvfrom(22, "1u\201\2\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, [16]) = 52
[pid  637] poll([{fd=22, events=POLLIN}], 1, 4961 <unfinished ...>
  1. [pid  637] munmap(0x7fa497fff000, 4096) = 0
  2. [pid  637] socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 22
  3. [pid  637] connect(22, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, 16) = 0
  4. [pid  637] poll([{fd=22, events=POLLOUT}], 1, 0) = 1 ([{fd=22, revents=POLLOUT}])
  5. [pid  637] sendto(22, "F\374\1\0\0\1\0\0\0\0\0\0\003112\00299\00222\00276\7in-add"..., 43, MSG_NOSIGNAL, NULL, 0) = 43
  6. [pid  637] poll([{fd=22, events=POLLIN}], 1, 5000) = 1 ([{fd=22, revents=POLLIN}])
  7. [pid  637] ioctl(22, FIONREAD, [91])  = 0
  8. [pid  637] recvfrom(22, "F\374\201\0\0\1\0\1\0\0\0\0\003112\00299\00222\00276\7in-add"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, [16]) = 91
  9. [pid  637] close(22)          = 0
  10. [pid  637] socket(PF_NETLINK, SOCK_RAW, 0) = 22
  11. [pid  637] bind(22, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
  12. [pid  637] getsockname(22, {sa_family=AF_NETLINK, pid=593, groups=00000000}, [12]) = 0
  13. [pid  637] sendto(22, "\24\0\0\0\26\0\1\3\30X[R\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
  14. [pid  637] recvmsg(22, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"0\0\0\0\24\0\2\0\30X[RQ\2\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 488
  15. [pid  637] recvmsg(22, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"@\0\0\0\24\0\2\0\30X[RQ\2\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 192
  16. [pid  637] recvmsg(22, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0\30X[RQ\2\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
  17. [pid  637] close(22)          = 0
  18. [pid  637] open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 22
  19. [pid  637] fstat(22, {st_mode=S_IFREG|0644, st_size=73, ...}) = 0
  20. [pid  637] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa497fff000
  21. [pid  637] read(22, "127.0.0.1\t\tlocalhost.localdomain"..., 4096) = 73
  22. [pid  637] read(22, "", 4096)     = 0
  23. [pid  637] close(22)          = 0
  24. [pid  637] munmap(0x7fa497fff000, 4096) = 0
  25. [pid  637] socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 22
  26. [pid  637] connect(22, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, 16) = 0
  27. [pid  637] poll([{fd=22, events=POLLOUT}], 1, 0) = 1 ([{fd=22, revents=POLLOUT}])
  28. [pid  637] sendto(22, "1u\1\0\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 52, MSG_NOSIGNAL, NULL, 0) = 52
  29. [pid  637] poll([{fd=22, events=POLLIN}], 1, 5000) = 1 ([{fd=22, revents=POLLIN}])
  30. [pid  637] ioctl(22, FIONREAD, [52])  = 0
  31. [pid  637] recvfrom(22, "1u\201\2\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, [16]) = 52
  32. [pid  637] poll([{fd=22, events=POLLIN}], 1, 4961 <unfinished ...>


Not the easiest thing to read, but if you go through line by line you can figure out what is going on. Basically here, for whatever reason it is trying to connect to our local IP of 10.0.80.11 to port 53 which indicates to me that its a DNS lookup since DNS uses port 53. If you keep going through the lines you can see it sends some data and then at the end it calls poll and its unfinished probably waiting for a response. Then I don't see anything with pid 637 for awhile, and many seconds pass until I finally see these few lines:

Code: [ Select ]
[pid  637] <... poll resumed> )    = 0 (Timeout)
[pid  637] socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 24
[pid  637] connect(24, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.12")}, 16) = 0
[pid  637] poll([{fd=24, events=POLLOUT}], 1, 0) = 1 ([{fd=24, revents=POLLOUT}])
[pid  637] sendto(24, "1u\1\0\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 52, MSG_NOSIGNAL, NULL, 0) = 52
[pid  637] poll([{fd=24, events=POLLIN}], 1, 5000) = 1 ([{fd=24, revents=POLLIN}])
[pid  637] ioctl(24, FIONREAD, [52])  = 0
[pid  637] recvfrom(24, "1u\201\2\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.12")}, [16]) = 52
[pid  637] poll([{fd=24, events=POLLIN}], 1, 4965 <unfinished ...>
  1. [pid  637] <... poll resumed> )    = 0 (Timeout)
  2. [pid  637] socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 24
  3. [pid  637] connect(24, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.12")}, 16) = 0
  4. [pid  637] poll([{fd=24, events=POLLOUT}], 1, 0) = 1 ([{fd=24, revents=POLLOUT}])
  5. [pid  637] sendto(24, "1u\1\0\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 52, MSG_NOSIGNAL, NULL, 0) = 52
  6. [pid  637] poll([{fd=24, events=POLLIN}], 1, 5000) = 1 ([{fd=24, revents=POLLIN}])
  7. [pid  637] ioctl(24, FIONREAD, [52])  = 0
  8. [pid  637] recvfrom(24, "1u\201\2\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.12")}, [16]) = 52
  9. [pid  637] poll([{fd=24, events=POLLIN}], 1, 4965 <unfinished ...>


The first line there is very telling:

Code: [ Select ]
[pid  637] <... poll resumed> )    = 0 (Timeout)


So the resolver timed out, and now with the other lines you can see its trying the next IP from /etc/resolv.conf which is 10.0.80.12. Then it calls poll again and I see nothing for awhile as it is simply waiting for a response. Many seconds later I then see this:

Code: [ Select ]
[pid  637] <... poll resumed> )    = 0 (Timeout)
[pid  637] poll([{fd=22, events=POLLOUT}], 1, 0) = 1 ([{fd=22, revents=POLLOUT}])
[pid  637] sendto(22, "1u\1\0\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 52, MSG_NOSIGNAL, NULL, 0) = 52
[pid  637] poll([{fd=22, events=POLLIN}], 1, 5000) = 1 ([{fd=22, revents=POLLIN}])
[pid  637] ioctl(22, FIONREAD, [52])  = 0
[pid  637] recvfrom(22, "1u\201\2\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, [16]) = 52
[pid  637] poll([{fd=22, events=POLLIN}], 1, 4982 <unfinished ...>
  1. [pid  637] <... poll resumed> )    = 0 (Timeout)
  2. [pid  637] poll([{fd=22, events=POLLOUT}], 1, 0) = 1 ([{fd=22, revents=POLLOUT}])
  3. [pid  637] sendto(22, "1u\1\0\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 52, MSG_NOSIGNAL, NULL, 0) = 52
  4. [pid  637] poll([{fd=22, events=POLLIN}], 1, 5000) = 1 ([{fd=22, revents=POLLIN}])
  5. [pid  637] ioctl(22, FIONREAD, [52])  = 0
  6. [pid  637] recvfrom(22, "1u\201\2\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, [16]) = 52
  7. [pid  637] poll([{fd=22, events=POLLIN}], 1, 4982 <unfinished ...>


So again you can see it timed out, and now its trying the first resolver IP again. This process continues a few more times before finally giving up and processing the rest of everything like normal.

Our problem now seems very likely that it has to do with doing DNS lookups. To get a few more hints I also did a TCPdump with traffic going to and from one of the resolvers:

Code: [ Select ]
tcpdump -i eth0 -A -s 0 '(src 10.0.80.12 or dst 10.0.80.12) and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'


This will show all the packets for anything that gets sent or received from 10.0.80.12. Many of the lines would look like this:

Code: [ Select ]
19:54:38.933723 IP 10.0.80.12.domain > 10.60.164.194.33286: 62627 ServFail- 0/0/0 (49)


So most ended up returning a status of ServFail, which probably just means no record exists for that query. The rate that the IPs were coming in, and for the ones that worked it seemed like the IPs tended to reverse back into a regular visitors IP addresses. This indicated to me that everybody visiting Ozzu was having their IP address trying to be translated back into a domain name. Finally going down the right track I found and remembered recently adding the access controls to block Amazon AWS which ended up being the cause for all of these reverse IP look ups. As soon as I removed those lines all of the reverse look ups stopped, and as far as I can tell there are no more long delays loading Ozzu.

Before getting to this point I started to think that I had Apache's HostnameLookups enabled which would also cause every page load to try to have its IP address translated, however in my case I already had it turned off.

Lesson learned here, never enable HostnameLookups, and be very careful when using Apache's access controls, or any process that requires doing an extra DNS look up as waiting for the response can hold everything up. In our case it was causing pages to load in excess of 5 seconds for a good number of our visitors. Not good, not good at all.

Since Apache makes this available to block IPs via their domain names, I am sure others out there are using this not knowing why their websites load slow occasionally. I hope the information above can help others speed up their websites.
  • Anonymous
  • Bot
  • No Avatar
  • Posts: ?
  • Loc: Ozzuland
  • Status: Online

Post 3+ Months Ago

  • joebert
  • Fart Bubbles
  • Genius
  • User avatar
  • Posts: 13502
  • Loc: Florida

Post 3+ Months Ago

Host names are a pain in the butt. Even on a home network, I've had host based operations cause computers sitting right next to eachother take multiple seconds to communicate.

Another gotcha I've been running into, is IPV6 based stuff causing timeouts due to old networking equipment.

Post Information

  • Total Posts in this topic: 2 posts
  • Users browsing this forum: No registered users and 29 guests
  • You cannot post new topics in this forum
  • You cannot reply to topics in this forum
  • You cannot edit your posts in this forum
  • You cannot delete your posts in this forum
  • You cannot post attachments in this forum
 
 

© 1998-2014. Ozzu® is a registered trademark of Unmelted, LLC.