Apache's deny from or allow from access controls can cause extremely slow Apache response times.

If you don't want to read this entire post, I will get right to the point:

  1. Make sure Apache's HostnameLookups is disabled.
  2. Don't use hostnames with access controls (deny from, allow from).
  3. Don't use wildcards in domain names.

Make sure that if you use access controls to block certain kinds of visitors, you are only doing it by blocking actual IP addresses. Never try to block domain names, or using wildcards in domain names. While it will work, the performance impact on your site can be horrible causing pages to take longer than 5-20 seconds to load.

Apache Access Controls

In the last few months, I noticed that all sorts of bad bots were coming from Amazon's AWS which were using and abusing the services at Ozzu and other websites we host. For those of you who do not know, Amazon AWS stands for Amazon Web Services and part of it is basically a cloud computing platform that allows small companies, individuals, or start-ups to get access to an extremely powerful system that makes it super affordable and giving you the power to scale extremely well. In theory, this is probably a good thing, but there are still bad apples out there that use it for other things.

For instance, there may be a new search engine company that doesn't have the resources to obtain all the hardware necessary to have enough computing power to index the web, so they might start with Amazon's AWS since it provides them just that at a much more affordable cost. While this may be good to have people trying to develop new search technology, it's bad when there are lots of people trying this and fetching pages from our servers. Our servers can only handle so much, and we much prefer to allow our visitors to use our services first, not be consumed by all sorts of bots that may never return anything to Ozzu.

There may also be others out there that instead are out to simply scrape all of Ozzu's pages and duplicate them for their own dubious purposes. Finally, the reason for us making the decision to ban Amazon AWS is it seems there are individuals who have created click bots using their services that click on everything on our website, including ads which causes another set of problems.

So once we made the decision to ban Amazon's AWS, we took the easiest route to do so by adding access control to a .htaccess file:

<Limit GET POST HEAD>           
Order Allow,Deny
Allow from all        
Deny from *.amazonaws.com      
Deny from amazonaws.com
</Limit>

This worked quickly and anybody using Amazon's Web Services quickly found they were only getting 403 errors. After a few days, it appeared to be successful and things were looking much better as far as the reasons we did it in the first place. After that, it was basically forgotten.

Avoid Using Domain Names in Access Controls

Unknowingly to us, this caused a new sporadic problem that was actually difficult to trace.

Whenever you use Apache's Access Controls for denying or allowing IP addresses, it's a simple operation that doesn't really have any bad consequences. So if I were to deny an IP address like this:

Deny from 123.123.123.123

That causes no issues as far as I can see. It simply looks at the remote user's IP address and if it matches that line it will then deny that user access. The problem arises when you use actual domain names with Apaches Access Controls. So the following line:

Deny from *.amazonaws.com

even though being simple and to the point, does much more than you think behind the scenes and can cause extremely poor Apache performance and response times. It makes sense when you think about it, but tracing it down to these lines was much more difficult.

How Apache Matches Domain Names in Access Controls

When a web page loads on Ozzu in order to be able to Deny any of these AWS domains, what is required by Apache is to do a Reverse DNS lookup on the IP address visiting Ozzu. If using Linux, it first looks at your /etc/hosts file and then uses your local resolver to translate the visiting IP address to the domain name if it exists. For many IPs, this works quickly and there are no problems. However, for many IPs either no reverse IP record is set up, or network problems can occur with our resolver since we use an external service for that. The effect of this is that when visiting for the first time, Ozzu would sometimes take anywhere from 2-20 seconds to load. Completely unacceptable. Subsequent pages seemed to load very quickly, under half a second, and my theory behind that is that the local resolver caches the result for some set period.

Using Strace to Confirm Hostname Lookups Caused the Slowness

I started to notice that every now and then a page would take forever to load. I thought it was my local connection as looking at debugging information showed that the pages were all still generated quite quickly. Soon a few others mentioned to me that pages would hang for a while which indicated to me that more was going on, but it was hard to figure out because most of the time you didn't see the problem, just every now and then. I don't like tracking problems down like that as it's more difficult to trace.

My theory is that pages load slowly periodically because when you first load Ozzu, it then tries to do a reverse lookup of your IP Address, and it's possible that if no record exists, or if the resolver is working slowly, the entire process to load the page on Ozzu is blocked until this operation finishes. Since all of this occurs before the PHP scripts even run, my debugging information still shows that pages are being generated quickly. In Firebug it was very clear that the page-load times were in excess of 5 seconds at times. Yuk! So I knew it wasn't the script or MySQL database, it had something either to do with the network itself, or Apache. Getting closer.

My next step, because I needed to confirm what was in fact happening, was to use strace in SSH to look at all of the system calls and signals with Apache. So I ran the following code right before loading a page through my browser:

ps auxw | grep /bin/httpd | awk '{print"-p " $2}' | xargs strace -f

That code basically lists all of the processes, only showing httpd processes, and then using awk to output only the pids, and then pass those along to strace so we can just follow the system calls and signals with those processes. I would then load my web page and hit CTRL-C in SSH once the web page stopped loading.

The output from strace is long, but after looking you can find relevant portions that look something like this in my case:

[pid   637] munmap(0x7fa497fff000, 4096) = 0
[pid   637] socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 22
[pid   637] connect(22, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, 16) = 0
[pid   637] poll([{fd=22, events=POLLOUT}], 1, 0) = 1 ([{fd=22, revents=POLLOUT}])
[pid   637] sendto(22, "F\374\1\0\0\1\0\0\0\0\0\0\003112\00299\00222\00276\7in-add"..., 43, MSG_NOSIGNAL, NULL, 0) = 43
[pid   637] poll([{fd=22, events=POLLIN}], 1, 5000) = 1 ([{fd=22, revents=POLLIN}])
[pid   637] ioctl(22, FIONREAD, [91])   = 0
[pid   637] recvfrom(22, "F\374\201\0\0\1\0\1\0\0\0\0\003112\00299\00222\00276\7in-add"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, [16]) = 91
[pid   637] close(22)                   = 0
[pid   637] socket(PF_NETLINK, SOCK_RAW, 0) = 22
[pid   637] bind(22, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
[pid   637] getsockname(22, {sa_family=AF_NETLINK, pid=593, groups=00000000}, [12]) = 0
[pid   637] sendto(22, "\24\0\0\0\26\0\1\3\30X[R\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
[pid   637] recvmsg(22, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"0\0\0\0\24\0\2\0\30X[RQ\2\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 488
[pid   637] recvmsg(22, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"@\0\0\0\24\0\2\0\30X[RQ\2\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 192
[pid   637] recvmsg(22, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0\30X[RQ\2\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
[pid   637] close(22)                   = 0
[pid   637] open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 22
[pid   637] fstat(22, {st_mode=S_IFREG|0644, st_size=73, ...}) = 0
[pid   637] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa497fff000
[pid   637] read(22, "127.0.0.1\t\tlocalhost.localdomain"..., 4096) = 73
[pid   637] read(22, "", 4096)          = 0
[pid   637] close(22)                   = 0
[pid   637] munmap(0x7fa497fff000, 4096) = 0
[pid   637] socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 22
[pid   637] connect(22, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, 16) = 0
[pid   637] poll([{fd=22, events=POLLOUT}], 1, 0) = 1 ([{fd=22, revents=POLLOUT}])
[pid   637] sendto(22, "1u\1\0\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 52, MSG_NOSIGNAL, NULL, 0) = 52
[pid   637] poll([{fd=22, events=POLLIN}], 1, 5000) = 1 ([{fd=22, revents=POLLIN}])
[pid   637] ioctl(22, FIONREAD, [52])   = 0
[pid   637] recvfrom(22, "1u\201\2\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, [16]) = 52
[pid   637] poll([{fd=22, events=POLLIN}], 1, 4961 <unfinished ...>

Not the easiest thing to read, but if you go through line by line you can figure out what is going on. Basically here, for whatever reason, it is trying to connect to our local IP of 10.0.80.11 to port 53 which indicates to me that it's a DNS lookup since DNS uses port 53. If you keep going through the lines you can see it sends some data and then at the end it calls poll and it's unfinished probably waiting for a response. Then I don't see anything with PID 637 for a while, and many seconds pass until I finally see these few lines:

[pid   637] <... poll resumed> )        = 0 (Timeout)
[pid   637] socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 24
[pid   637] connect(24, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.12")}, 16) = 0
[pid   637] poll([{fd=24, events=POLLOUT}], 1, 0) = 1 ([{fd=24, revents=POLLOUT}])
[pid   637] sendto(24, "1u\1\0\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 52, MSG_NOSIGNAL, NULL, 0) = 52
[pid   637] poll([{fd=24, events=POLLIN}], 1, 5000) = 1 ([{fd=24, revents=POLLIN}])
[pid   637] ioctl(24, FIONREAD, [52])   = 0
[pid   637] recvfrom(24, "1u\201\2\0\1\0\0\0\0\0\0\16c-76-22-99-112\4hsd1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.12")}, [16]) = 52
[pid   637] poll([{fd=24, events=POLLIN}], 1, 4965 <unfinished ...>

The first line there is very telling:

[pid   637] <... poll resumed> )        = 0 (Timeout)

So the resolver timed out, and now with the other lines, you can see it's trying the next IP from /etc/resolv.conf which is 10.0.80.12. Then it calls poll again and I see nothing for a while as it is simply waiting for a response. Many seconds later I then see this:

[pid   637] <... poll resumed> )        = 0 (Timeout)
[pid   637] poll([{fd=22, events=POLLOUT}], 1, 0) = 1 ([{fd=22, revents=POLLOUT}])
[pid   637] sendto(22, "c-76-22-99-112hsd1"..., 52, MSG_NOSIGNAL, NULL, 0) = 52
[pid   637] poll([{fd=22, events=POLLIN}], 1, 5000) = 1 ([{fd=22, revents=POLLIN}])
[pid   637] ioctl(22, FIONREAD, [52])   = 0
[pid   637] recvfrom(22, "c-76-22-99-112hsd1"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.80.11")}, [16]) = 52
[pid   637] poll([{fd=22, events=POLLIN}], 1, 4982 <unfinished ...>

So again you can see it timed out, and now it's trying the first resolver IP again. This process continues a few more times before finally giving up and processing the rest of everything like normal.

Utilizing TCPdump To Confirm Slow Hostname Lookups

Our problem now seems very likely that it has to do with doing DNS lookups. To get a few more hints I also did a TCPdump with traffic going to and from one of the resolvers:

tcpdump -i eth0 -A -s 0 '(src 10.0.80.12 or dst 10.0.80.12) and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)'

This will show all the packets for anything that gets sent or received from 10.0.80.12. Many of the lines would look like this:

19:54:38.933723 IP 10.0.80.12.domain > 10.60.164.194.33286: 62627 ServFail- 0/0/0 (49)

So most ended up returning a status of ServFail, which probably just means no record exists for that query. The rate that the IPs were coming in, and for the ones that worked it seemed like the IPs tended to reverse back into regular visitors' IP addresses. This indicated to me that everybody visiting Ozzu was having their IP address trying to be translated back into a domain name.

Conclusion

Finally going down the right track I found and remembered recently adding the access controls to block Amazon AWS which ended up being the cause for all of these reverse IP lookups. As soon as I removed those lines all of the reverse lookups stopped, and as far as I can tell there are no more long delays loading Ozzu.

Before getting to this point I started to think that I had Apache's HostnameLookups enabled which would also cause every page load to try to have its IP address translated, however in my case I already had it turned off. Definitely check in your Apache Conf files that this is disabled.

Lesson learned here, never enable HostnameLookups, and be very careful when using Apache's access controls, or any process that requires doing an extra DNS lookup as waiting for the response can hold everything up. In our case, it was causing pages to load in excess of 5 seconds for a good number of our visitors. Not good, not good at all.

Since Apache makes this available to block IPs via their domain names, I am sure others out there are using this not knowing why their websites load slowly occasionally. I hope the information above can help others speed up their websites.

This page was published on It was last revised on

Contributing Authors

0

1 Comment

  • Votes
  • Oldest
  • Latest
JO
184 4
Commented
Updated

Host names are a pain in the butt. Even on a home network, I've had host based operations cause computers sitting right next to eachother take multiple seconds to communicate.

Another gotcha I've been running into, is IPV6 based stuff causing timeouts due to old networking equipment.

add a comment
0