Apache's Allow/Deny; Overhead of using Hostnames

  • joebert
  • Fart Bubbles
  • Genius
  • Posts: 13504
  • Loc: Florida

Post 3+ Months Ago

Recently I've identified a source of webpage requests I would like to block altogether. I have a range of IP addresses that I could block, but it appears this range has changed at least once in the last 2 years. Furthermore, I believe there are multiple ranges of IP addresses for different but similar services which all resolve to sub-domains of one main domain.

One option is to find all IP address ranges associated with this domain, and block them specifically. This seems like a daunting task.
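Blocking by range boils down to a CIDR membership test. A minimal sketch in Python, assuming a hand-maintained list of ranges (the ranges below are illustrative examples, not a real or complete list for any provider):

```python
import ipaddress

# Hypothetical ranges to block; real ranges would have to come from the
# provider's published data and can change over time.
BLOCKED_RANGES = [
    ipaddress.ip_network("75.101.128.0/17"),
    ipaddress.ip_network("203.0.113.0/24"),
]

def is_blocked(ip: str) -> bool:
    """Return True if ip falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)
```

This is essentially what Apache does internally for an IP/CIDR-based Deny, which is why that form needs no DNS traffic at all.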
The other option is to add the host name with the Deny directive. According to the Apache manual page for the Allow / Deny Directives though, this will result in a double reverse DNS lookup regardless of the HostNameLookups setting.

This seems to mean that every request to the server will trigger two DNS lookups. As an educated guess, I would think the result would be cached for at least the length of KeepAliveTimeout, if not for the length of a timeout directive designed specifically for this functionality.
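The double lookup the manual describes is forward-confirmed reverse DNS: resolve the IP to a name, then resolve that name back and confirm the IP is among the results. A rough sketch of that behavior, plus the suffix matching a directive like `Deny from .amazonaws.com` relies on. This is an illustration of the documented behavior, not Apache's actual code:

```python
import socket
from typing import Optional

def hostname_matches(hostname: str, pattern: str) -> bool:
    """Mimic Apache's host matching: a pattern starting with '.'
    matches any hostname ending in that suffix; otherwise the
    names must be equal (case-insensitive)."""
    hostname = hostname.lower().rstrip(".")
    pattern = pattern.lower()
    if pattern.startswith("."):
        return hostname.endswith(pattern)
    return hostname == pattern

def double_reverse_lookup(ip: str) -> Optional[str]:
    """Reverse-resolve ip to a name, then forward-resolve that name
    and confirm ip is among the results (forward-confirmed rDNS).
    Returns the confirmed hostname, or None."""
    try:
        name, _, _ = socket.gethostbyaddr(ip)        # reverse lookup
        _, _, addrs = socket.gethostbyname_ex(name)  # forward lookup
    except OSError:
        return None
    return name if ip in addrs else None
```

Without some cache in between, a host-based Deny would pay for `double_reverse_lookup` on every single request.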

From here I also start to wonder: if a client's address results are cached, how much is cached? Does it cache the association for that one address, or does it cache the entire range(s) associated with the domain name and perform a check within that range before making a lookup outside of the cache?
  • Don2007
  • Web Master
  • Posts: 4923
  • Loc: NY

Post 3+ Months Ago

Writing rule sets for any allow/deny situation can be hard work. Is your reason for considering it that the offending IPs are attempting to run scripts?
  • joebert
  • Fart Bubbles
  • Genius
  • Posts: 13504
  • Loc: Florida

Post 3+ Months Ago

In a nutshell, I'm looking to block consumer-accessible cloud hosting networks, for starters Amazon's AWS/EC2 services. A lot of times when I look through my logs and find strange bots/crawlers that don't benefit me at all, they're coming from these cloud hosting services.

I'd like to do this using "deny,allow" order, so I can block the range and optionally add exceptions via Allow later on:

APACHE Code:
Order Deny,Allow
Deny from .amazonaws.com
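With Deny,Allow order, an exception can later be layered on top of the same block; a hypothetical sketch of what that might look like (the Allow range here is a made-up example, not a real crawler's range):

```apache
# Deny,Allow: Deny rules are evaluated first,
# then matching Allow rules override the denial.
Order Deny,Allow
Deny from .amazonaws.com
# hypothetical exception for one trusted address range
Allow from 203.0.113.0/24
```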
  • Don2007
  • Web Master
  • Posts: 4923
  • Loc: NY

Post 3+ Months Ago

They may not benefit you but if they aren't harming your site, it doesn't matter.
  • joebert
  • Fart Bubbles
  • Genius
  • Posts: 13504
  • Loc: Florida

Post 3+ Months Ago

It matters if I can block them without using more resources than it consumes to let them run amok.
  • Don2007
  • Web Master
  • Posts: 4923
  • Loc: NY

Post 3+ Months Ago

Then I guess you have to start writing rule sets.
  • joebert
  • Fart Bubbles
  • Genius
  • Posts: 13504
  • Loc: Florida

Post 3+ Months Ago

I started looking at the Apache 2.2.16 source code last night. Specifically the source of mod_authz_host.c and files containing functions called by that module.

It's worth noting that in Apache 2.0.* the Allow and Deny directives were provided by mod_access. Starting with Apache 2.1, those directives are provided by mod_authz_host.

So far it looks like Apache asks the operating system to resolve the client's host name on every request, which would leave any caching behavior up to the OS and its networking library. I'm not seeing Apache do any caching of its own within the context of mod_authz_host.

I jumped right in at the module entry point though. Apache could very well be going to a cache before actually calling the functions in mod_authz_host.

I'm going to apply a host based Deny directive on the server next to me and beat on it with Apache Bench. I want to see if I can get a noticeable difference in requests served when the directive is in place.
  • Don2007
  • Web Master
  • Posts: 4923
  • Loc: NY

Post 3+ Months Ago

I didn't think the OS participated in any of that but now that you say it does, would hosts file entries help in writing rule sets?
  • joebert
  • Fart Bubbles
  • Genius
  • Posts: 13504
  • Loc: Florida

Post 3+ Months Ago

Well, my main concern with all of this is how much overhead there is to using a host name with an Allow/Deny directive. I would like to simply use a domain name in that directive and let DNS take care of watching the IP ranges for me, but I have a feeling that it's more costly than it's worth in its current implementation.

I tested the copy of Apache 2.2.11 I have running on one server sitting next to me, by bombarding it with requests using Apache Bench (ab -c -n) on the computer I'm typing with right now. I ran the test with no Deny directive, with a host-based Deny directive, and with a CIDR Deny directive. All three benchmarks included an Order directive.

Here are the three sets of directives used for each benchmark.

Code:
# no deny directive
Order Deny,Allow

Code:
# host deny directive
Order Deny,Allow
Deny from .amazonaws.com

Code:
# CIDR deny directive
Order Deny,Allow
Deny from 75.101.128.0/17


I just attached a screenshot of the three benchmark results so it's easier to see them next to each other. I ran two of the three benchmarks multiple times and kept the average result; I only ran the host-based benchmark once because of how long it was taking. As you can see, the performance of host-based Deny directives leaves much to be desired here.

Attachments:
benchmarks.gif


I think host based Allow/Deny directives could be reworked to lookup the assigned range for an IP address and cache host associations. I can't help but wonder if there's already a module out there designed for this purpose. :scratchhead:
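A module like that might wrap the lookup in a small TTL cache. A minimal sketch, assuming a per-address cache (the resolver callable is injected so the caching logic itself can be exercised without DNS, and the 300-second default TTL is an arbitrary assumption, not anything Apache does):

```python
import time

class CachingResolver:
    """Cache IP -> hostname results for ttl seconds, so repeated
    requests from the same address skip the double DNS lookup."""

    def __init__(self, resolve, ttl=300.0, clock=time.monotonic):
        self._resolve = resolve   # callable: ip -> hostname or None
        self._ttl = ttl
        self._clock = clock
        self._cache = {}          # ip -> (hostname, expiry time)

    def lookup(self, ip):
        now = self._clock()
        hit = self._cache.get(ip)
        if hit is not None and hit[1] > now:
            return hit[0]         # fresh cache entry: no DNS traffic
        name = self._resolve(ip)  # miss or expired: do the real lookup
        self._cache[ip] = (name, now + self._ttl)
        return name
```

Caching per address (rather than per range) is the simple version; caching the whole range, as wondered about earlier in the thread, would need the resolver to also report the assigned netblock.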
  • Bigwebmaster
  • Site Admin
  • Posts: 9099
  • Loc: Seattle, WA & Phoenix, AZ

Post 3+ Months Ago

That is very interesting to see. It clearly shows how much slower the host-based directive is, since the lookup happens on every request. You would think some sort of cache-based system would be implemented somewhere. The only thing I can think of is that some hosts deny clients through server-based firewalls instead. For instance, on the servers I run I use Config Server & Firewall (CSF), which has many features built in to automatically block abusive IP addresses. Every day numerous IPs try to flood Ozzu with many requests per second (a lot of them from China for some reason), numerous IPs are always port scanning, and spammers run e-mail dictionary attacks trying to detect valid addresses. All of these are automatically blocked temporarily when a flood of requests is detected, whether on httpd port 80 or any other service.

Post Information

  • Total Posts in this topic: 10 posts

© 1998-2014. Ozzu® is a registered trademark of Unmelted, LLC.