Need script that blocks proxies

  • Carnix
  • Guru
  • Guru
  • User avatar
  • Posts: 1098

Post 3+ Months Ago

If it were me? I think I would go gather a bunch of URLs that list free open proxies. As many as I could find... Most of the time, those are listed as IP addresses, but if a list gives hostnames you can always resolve them to IPs from the command line.

Then, I would write an HTML scraper script that would suck down those pages, and hunt line by line for IP addresses. These would ALL be logged to a database. The database is important to avoid duplicate entries, since the sites could be cross-listing.

*EDIT: Forgot to add: This update script would run on a cron (or AT) job at intervals. Maybe every night at midnight, maybe once a week... whatever seems appropriate based on the update frequency of those proxy list sites.

Then, I would have a global on-entry script that compares the IP address of the entering user to my database of open proxies (which are the bad ones... legit traffic comes from closed, commercial ISP proxy servers, if any). If the IP matches, I'd bounce them to a page explaining that the use of open proxies is not allowed. If possible, try to figure out who they are, perhaps based on the user they're trying to access (log the incoming link), and send that person an e-mail reminding them that use of open proxies is not allowed, etc.
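In PHP, that entry check might look roughly like this (a minimal sketch; the table, column, and page names below are just placeholders, not anything that exists yet):

Code:
<?php
// Minimal on-entry check (sketch). Assumes a table named `open_proxies`
// with an `ip` column; those names are placeholders, not a real schema.
$pdo = new PDO('mysql:host=localhost;dbname=proxyblock', 'dbuser', 'dbpass');

$visitorIp = $_SERVER['REMOTE_ADDR'];

$stmt = $pdo->prepare('SELECT COUNT(*) FROM open_proxies WHERE ip = ?');
$stmt->execute(array($visitorIp));

if ($stmt->fetchColumn() > 0) {
    // Known open proxy: log where they came from, then bounce them to an
    // explanation page.
    error_log('Open proxy hit from ' . $visitorIp . ', referer: '
        . (isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : 'none'));
    header('Location: /no-open-proxies.html');
    exit;
}
?>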

If you start banning people, your users will get the picture... just be careful because it's not ALWAYS the user's fault. That's the tricky part you'll have to figure out.

.c

  • Rabid Dog
  • Web Master
  • Web Master
  • User avatar
  • Posts: 3245
  • Loc: South Africa

Post 3+ Months Ago

Okay, so what language would you use?

Assuming you would go with a low-level language?

I actually forgot that open proxies are listed publicly. Silly me, then of course you could block them if they came from the bad list, duh. (I think a lot of us, well, me at least, missed the plot because we were trying to figure out how to detect whether or not a proxy was being used, instead of taking a step back and asking: okay, what do all proxies have in common? Ah, an IP address. And spam... hmmm, sure, they trace open proxies to stomp out spam; hey, spam, proxy blocker, well I never.)

I like the way you think! (You've had to do this before, hmmm?)
  • Carnix
  • Guru
  • Guru
  • User avatar
  • Posts: 1098

Post 3+ Months Ago

I'd probably use PHP, or ASP if it was an IIS machine.

No, I haven't had to write a script for this specifically, but I have written a number of scripts like this, basically to emulate an RSS feed where none exists. At my org, we use an ASP service provider for online donations, advocacy and e-mail marketing. When we create a campaign on their system, they list it, along with other active campaigns, on our "center homepage".

Since I have no intention of manually syncing the two systems, I wrote a scraper that goes over to their system, grabs the HTML, finds the section where the advocacy campaigns are listed, harvests those titles and URLs, then rebuilds the list on our site. I happened to write that script in Perl, since our deployment system is built on a custom Perl and Tomcat engine, but I've done similar things to sync remote databases using ASP as well (I needed an HTTP solution, because the only ports open between the two systems were 80 and 443, so I used 443 to create a secure data sync using XML, not with the XMLHTTP object either, though I later rebuilt it using MS's built-in object).

I've never done anything like this in PHP, but it would certainly be possible.

.c
  • Rabid Dog
  • Web Master
  • Web Master
  • User avatar
  • Posts: 3245
  • Loc: South Africa

Post 3+ Months Ago

Well I am most def gonna give it a bash!

Will let you know when it is done; it sounds really interesting.
  • xfrozenxsoulsx
  • Novice
  • Novice
  • xfrozenxsoulsx
  • Posts: 30

Post 3+ Months Ago

Would this help me? I really hope so, because if it works I will even pay you to find me something or some way to block most proxies from being used on my site. I also need it so that a proxy can't be used to click my users' links. Basically I need to block proxies altogether, or at least most of them. If you can find a program or script that does this, let me know and I will pay.
  • Rabid Dog
  • Web Master
  • Web Master
  • User avatar
  • Posts: 3245
  • Loc: South Africa

Post 3+ Months Ago

Frozen, when I get this right I will give it to you for free. With the amount of stress and panic you have been subjected to, I would feel bad selling it to you.

Will PM you when it is done; in the meantime, fix the JavaScript errors ;o)
  • xfrozenxsoulsx
  • Novice
  • Novice
  • xfrozenxsoulsx
  • Posts: 30

Post 3+ Months Ago

Thanks a lot man. Will do. Btw, ThunderArena is down for now, lmao; my brother tried to put a firewall on the site to block proxies and it blocked everything... Anyway, it will be back up soon, we need this time to work on it anyway.

Thanks a lot man, reply when it's done.
  • Carnix
  • Guru
  • Guru
  • User avatar
  • Posts: 1098

Post 3+ Months Ago

RD, let me know if you need any help/advice; I'll be happy to lend whatever assistance I can.

.c
  • Rabid Dog
  • Web Master
  • Web Master
  • User avatar
  • Posts: 3245
  • Loc: South Africa

Post 3+ Months Ago

Will most def take you up on that, c.

Actually, right now I was doing a search for DBs that list open proxies and came up with a huge number. I was wondering if you had any URIs to throw into the mix?
  • Carnix
  • Guru
  • Guru
  • User avatar
  • Posts: 1098

Post 3+ Months Ago

I can't access most of them from work (silly gateway restrictions). Pretty much any warez or other hackeresque sites are restricted. There really are tons of sites, though I think most of them cross-link to a large degree. If you can find 10 large lists, you're probably OK. Those sites come and go though, so any application you write will need to be extensible enough to configure the input sites. The basic format of the IP search is simple, just a line-by-line, or maybe even " " by " " (space by space... word separation) search, in case the site owner didn't understand the use of carriage returns (heh).

Start with a database that populates an array of URLs based on the number of URLs entered in the database (you can use a DB admin tool or build a nice interface for that... I usually build some admin tool interface for these sorts of things, since not everyone knows enough about MySQL to actually use a real administration tool). When the scraper runs, log the response codes, and if it's anything but success, you can either automatically remove that entry (marking it as invalid without actually deleting the record is probably better), or set up a function to send a results mail at the end of the process detailing the number of IPs found, which URLs were valid and which were not... etc. Those details are up to you =]
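As a rough sketch of that nightly pass (the `proxy_sources` table and its column names are invented for illustration):

Code:
<?php
// Sketch of the source-list pass: fetch each configured URL, log the HTTP
// status, and flag sources that no longer respond. Table and column names
// (`proxy_sources`, `url`, `valid`) are made up for illustration.
$pdo = new PDO('mysql:host=localhost;dbname=proxyblock', 'dbuser', 'dbpass');
$summary = array();

foreach ($pdo->query('SELECT id, url FROM proxy_sources WHERE valid = 1') as $row) {
    $ch = curl_init($row['url']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $html = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($status !== 200) {
        // Mark the source invalid rather than deleting the record.
        $pdo->prepare('UPDATE proxy_sources SET valid = 0 WHERE id = ?')
            ->execute(array($row['id']));
    }
    $summary[] = $row['url'] . ' => HTTP ' . $status;
}

// Results mail at the end of the run, as suggested above.
mail('admin@example.com', 'Proxy list update', implode("\n", $summary));
?>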

.c
  • Rabid Dog
  • Web Master
  • Web Master
  • User avatar
  • Posts: 3245
  • Loc: South Africa

Post 3+ Months Ago

I have built a config file that allows you to enter the URIs into an array. (I was worried about the load on the DBMS with too many URIs.)

Then I just grab the HTML files and, with a handy little regex pattern (thanks to RTM), build an array containing the IP addresses found inside each page.

Then I make the array unique (remove duplicate values) and insert the values from the array into the DB.

So far so good?
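In outline it looks something like this (simplified; the source URL, regex, and table name here are stand-ins, not the real config):

Code:
<?php
// Sketch of the scrape-and-extract pass described above. The source list,
// regex, and `open_proxies` table are illustrative only.
$sources = array('http://example.com/proxy-list.html'); // really comes from the config file

$found = array();
foreach ($sources as $url) {
    $html = @file_get_contents($url);
    if ($html === false) {
        continue; // unreachable source, skip it this run
    }
    // Match dotted-quad IPs anywhere in the page, line breaks or not.
    preg_match_all('/\b(?:\d{1,3}\.){3}\d{1,3}\b/', $html, $matches);
    $found = array_merge($found, $matches[0]);
}

$found = array_unique($found); // drop cross-listed duplicates

$pdo = new PDO('mysql:host=localhost;dbname=proxyblock', 'dbuser', 'dbpass');
// INSERT IGNORE (MySQL) skips IPs already in the table, assuming `ip` is a unique key.
$insert = $pdo->prepare('INSERT IGNORE INTO open_proxies (ip) VALUES (?)');
foreach ($found as $ip) {
    $insert->execute(array($ip));
}
?>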
  • xfrozenxsoulsx
  • Novice
  • Novice
  • xfrozenxsoulsx
  • Posts: 30

Post 3+ Months Ago

Sounds good so far. Tell me when it's done so we can test it on TA and see if most of the proxying stops... I so hope it works, man; if it works you are the best genius on the internet! ...lol, we just need this badly ;)
  • rtm223
  • Mastermind
  • Mastermind
  • User avatar
  • Posts: 1855
  • Loc: Uk

Post 3+ Months Ago

Question for Carnix and RD: does the IP regex need to differentiate between ports? As I said, I'm not too good with the whole IP address thing. ATM if you feed it:

01.02.03.04:80
01.02.03.04:8080

it will see them both as 01.02.03.04, and remove the duplicate entry. Is this acceptable or not? I can modify the regex if needed.
  • Rabid Dog
  • Web Master
  • Web Master
  • User avatar
  • Posts: 3245
  • Loc: South Africa

Post 3+ Months Ago

No, I don't think it needs to. The remote client IP will not contain a port number, so the IP will be denied whether the proxy is on port 80 or port 8080.

Hmmm, I wonder if PHP can tell which port the client is connected to the proxy on, or the port of the incoming request.
  • Carnix
  • Guru
  • Guru
  • User avatar
  • Posts: 1098

Post 3+ Months Ago

Good question. It might, although most open proxies use 80 since it's generally open by default. 8080 is more of a convention than a true default, these days anyway.

I can't actually open any of the better-provisioned open proxy sites from here, so I can't really speak to this right now. I guess adding an optional (:[0-9]+) or something similar (heh, sorry, I can't spit out regexes from memory... that's what I have O'Reilly for!) would do the trick. If it's not known whether a port will be listed or not, the regex should probably be able to handle both.
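Off the top of my head, something like this should cope with either form (an untested sketch, so check it against the real list pages):

Code:
<?php
// Sketch: capture an IP with an optional :port suffix and keep only the IP.
$line = '01.02.03.04:8080';
if (preg_match('/\b((?:\d{1,3}\.){3}\d{1,3})(?::(\d+))?/', $line, $m)) {
    $ip   = $m[1];                        // "01.02.03.04"
    $port = isset($m[2]) ? $m[2] : null;  // "8080", or null when no port is listed
}
?>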

.c
  • Rabid Dog
  • Web Master
  • Web Master
  • User avatar
  • Posts: 3245
  • Loc: South Africa

Post 3+ Months Ago

Nope, PHP only sees the server port, so no revisions to the regex.
  • xfrozenxsoulsx
  • Novice
  • Novice
  • xfrozenxsoulsx
  • Posts: 30

Post 3+ Months Ago

Hmm, I hope y'all can help me and get this working... damn, it will help so much... lol, I hate proxies and the people that use them to enter sites... :(
  • Rabid Dog
  • Web Master
  • Web Master
  • User avatar
  • Posts: 3245
  • Loc: South Africa

Post 3+ Months Ago

Don't worry. What I was thinking about adding (if Carnix hasn't beaten me to it) is a list of good proxies, i.e. proxies you don't mind.

So effectively the proxies you allow will come in and the proxies you don't allow will be ejected.

Carnix: I noticed that HTTP_X_FORWARDED_FOR is the client IP sent along when the page is requested via a proxy, and that anonymous proxies claim not to send it. Suppose then we should do a check to see whether that variable is set and not empty.
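Something like this quick check, I suppose (just a sketch):

Code:
<?php
// Sketch: look at the X-Forwarded-For header a transparent proxy adds.
// Anonymous proxies strip it, so a missing value is only one signal,
// not proof of a direct connection.
$forwardedFor = isset($_SERVER['HTTP_X_FORWARDED_FOR'])
    ? trim($_SERVER['HTTP_X_FORWARDED_FOR'])
    : '';

if ($forwardedFor !== '') {
    // Came through a proxy that reveals the original client address.
    $clientIp = $forwardedFor;
    $proxyIp  = $_SERVER['REMOTE_ADDR'];
} else {
    // Direct connection, or an anonymizing proxy that hides the client.
    $clientIp = $_SERVER['REMOTE_ADDR'];
    $proxyIp  = null;
}
?>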

What about DNS lookups, or am I getting carried away?
  • Carnix
  • Guru
  • Guru
  • User avatar
  • Posts: 1098

Post 3+ Months Ago

well, DNS lookups will only give you the proxy's information.

If you didn't see it already, check this page:
http://www.freeproxy.ru/en/free_proxy/f ... nymity.htm

Seems to me that, with the exception of so-called "High Anonymity Proxies" (elite proxies), all proxies, even the anonymous ones, have SOME value in that header, even if it's fake. That it's fake isn't important, only that it isn't empty (or not determined... not sure what the result would be in that case... not an IP anyway).

The majority of free open proxies out there are not actually supposed to be open proxies. They are usually misconfigured private proxies, or other systems running some server that can act as a proxy. When we installed Interwoven TeamSite a couple of years ago here, it turned out that, by default, the internal proxy server it uses was open through Apache... It took me a month to figure out what all that damn traffic was and shut down the hole. I found that system's IP address on several free open proxy listing sites, and to this day, I think it must still be listed in a few, because there are still some folks trying to connect to it. I bet someone wrote and distributed some sort of script and hardcoded that IP as the proxy to use...

Anyway, that header is something you should certainly include. Although, it wouldn't be definitive, for sure.

.c
  • Rabid Dog
  • Web Master
  • Web Master
  • User avatar
  • Posts: 3245
  • Loc: South Africa

Post 3+ Months Ago

I had a look at that site. A lot of them cross-reference the same proxies.

Any comments on the code?
I have cleaned it up a lot more and added new features. Once this end is happy, I am going to work on the user interface, add the ability to decide whether to write to a file or the DB, and add error logging. The config file is pretty cool as well: it gives a not-so-PHP-savvy individual the chance to configure everything through one page instead of digging through all the code.
  • xfrozenxsoulsx
  • Novice
  • Novice
  • xfrozenxsoulsx
  • Posts: 30

Post 3+ Months Ago

Man, can't wait until y'all get this done, it's going to be sweet. Thank you so much!
  • Rabid Dog
  • Web Master
  • Web Master
  • User avatar
  • Posts: 3245
  • Loc: South Africa

Post 3+ Months Ago

The only problem I am picking up at the moment is that you are going to have to maintain the list quite closely. Legit users might get blocked.

The way I am going to set up the validation is: check that the forwarded-for variable is set. If it is not set, then reject; otherwise check the DB for the IP address. If it appears, then reject; else (just thought of this now) check the forwarded-for address against the DB or a list of banned users (hmmm, going to have to re-look at the sequence here). Anyway, the batch blacklist updating tool is done. I start on the whitelist updating tool today.
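As a literal sketch of that sequence (the table names are placeholders, and note the first branch would also reject direct visitors who use no proxy at all, which is the part of the ordering I still need to rethink):

Code:
<?php
// Sketch of the validation sequence described above. `open_proxies` and
// `banned_clients` are placeholder table names.
function isAllowed(PDO $pdo)
{
    $remoteIp     = $_SERVER['REMOTE_ADDR'];
    $forwardedFor = isset($_SERVER['HTTP_X_FORWARDED_FOR'])
        ? trim($_SERVER['HTTP_X_FORWARDED_FOR'])
        : '';

    if ($forwardedFor === '') {
        return false; // no forwarded-for header: reject (ordering to be revisited)
    }

    // Reject if the connecting address is a listed open proxy.
    $stmt = $pdo->prepare('SELECT COUNT(*) FROM open_proxies WHERE ip = ?');
    $stmt->execute(array($remoteIp));
    if ($stmt->fetchColumn() > 0) {
        return false;
    }

    // Reject if the forwarded-for address is on the banned list.
    $stmt = $pdo->prepare('SELECT COUNT(*) FROM banned_clients WHERE ip = ?');
    $stmt->execute(array($forwardedFor));
    if ($stmt->fetchColumn() > 0) {
        return false;
    }

    return true;
}

$pdo = new PDO('mysql:host=localhost;dbname=proxyblock', 'dbuser', 'dbpass');
if (!isAllowed($pdo)) {
    header('Location: /proxies-not-allowed.html');
    exit;
}
?>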
  • xfrozenxsoulsx
  • Novice
  • Novice
  • xfrozenxsoulsx
  • Posts: 30

Post 3+ Months Ago

Nice, thanks. Keep up the good work and let me know when it's done.
  • Rabid Dog
  • Web Master
  • Web Master
  • User avatar
  • Posts: 3245
  • Loc: South Africa

Post 3+ Months Ago

HA HAHAHAHHAHAHAHAHAHHHAHA

At last my greatest work is completed!!!!

Well almost - the engine is done, now just for the front end! (damn graphics nonsense)


Anyway, it is really simple to use:
a> Set the config.
b> At the top of every page, instantiate the class and call the function...

That simple! If the check fails it will redirect and do all the other fancy things, mailing etc. (if that is set in the config).
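So page-top usage ends up looking something like this (the file, class, and method names here are placeholders until it's released):

Code:
<?php
// Hypothetical usage: the class and method names are guesses, since the
// engine itself hasn't been posted yet.
require_once 'proxy_filter_config.php';   // step a> the config
require_once 'ProxyFilter.class.php';

$filter = new ProxyFilter($config);       // $config comes from the config file
$filter->check();                         // redirects, mails, etc. on failure
?>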

Very close to release - I can taste it!

Carnix: the empty FORWARDED_FOR variable, does it return an empty string? How would I check to see whether it is set or not? (I am using if (!isset(...)) at the moment.)
  • xfrozenxsoulsx
  • Novice
  • Novice
  • xfrozenxsoulsx
  • Posts: 30

Post 3+ Months Ago

Nice, nice, can't wait.
  • Fire90
  • Born
  • Born
  • Fire90
  • Posts: 1

Post 3+ Months Ago

Why not just add a verification script where people have to type a random verification code from an image? That will stop the amateurs. It's VERY simple: a proxy clicker ONLY clicks, it cannot write, and even if some can, they can't read an image, after all they aren't human. Will you try this and tell me if it works?

I bet this will work. Don't ask me to script one because I'm still learning my HTML, so asking me would be worthless, but hey, I do have a heck of an idea that will stop proxy clickers!!!
  • Rabid Dog
  • Web Master
  • Web Master
  • User avatar
  • Posts: 3245
  • Loc: South Africa

Post 3+ Months Ago

What if the person manually re-assigned their IP address through one of the hundreds of free proxies so you can't track the IP address?
  • rtm223
  • Mastermind
  • Mastermind
  • User avatar
  • Posts: 1855
  • Loc: Uk

Post 3+ Months Ago

Quote:
It's VERY simple: a proxy clicker ONLY clicks, it cannot write, and even if some can, they can't read an image, after all they aren't human

Fire90, are you talking about blocking people using proxies, or blocking automated bots?

In addition, the whole image thing really would not work. I have clicked on xfrozenxsoulsx's link, so he got a hit from that. That's fine by me. But I would never have bothered to type in the code from an image just to give him a hit; why waste those 30 seconds of my time for nothing?
  • Rabid Dog
  • Web Master
  • Web Master
  • User avatar
  • Posts: 3245
  • Loc: South Africa

Post 3+ Months Ago

Just to let you know that I haven't given up on the project. I am busy with the user interface and got stuck on a freaking div layer because I didn't spell "position" right and was too tired to even try to debug it.

Will be continuing on it this week.

So far all you will have to do is include the file (filter file). Anyone that enters with a listed proxy will get jacked and told that they are not allowed to use the proxy that is listed.
  • Carnix
  • Guru
  • Guru
  • User avatar
  • Posts: 1098

Post 3+ Months Ago

Fire90 wrote:
Why not just add a verification script where people have to type a random verification code from an image? That will stop the amateurs. It's VERY simple: a proxy clicker ONLY clicks, it cannot write, and even if some can, they can't read an image, after all they aren't human. Will you try this and tell me if it works?

I bet this will work. Don't ask me to script one because I'm still learning my HTML, so asking me would be worthless, but hey, I do have a heck of an idea that will stop proxy clickers!!!


Already been suggested:
Carnix wrote:
I'd suggest doing something like the domain name registrars and many spam-blocking validation systems have done. Use a non-machine-readable keyword to validate that a human clicked the link. Basically it's a random set of graphics that display short (4 or 5 digit) passwords that a user has to type in. Go do a whois at Network Solutions (https://www.networksolutions.com/en_US/ ... ndex.jhtml) and you'll see what I mean. I haven't looked, but there's bound to be some sort of GPL version of that somewhere.


Anyway, this sort of breaks the simplicity of the game itself. I think using RD's proxy prevention engine (or whatever he calls it) is the way to go at first. Perhaps, as an add-on, a human user validator could be added for additional security as well.
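For what it's worth, a homebrew version of that validator only takes a few lines with PHP's GD extension; a minimal sketch (the session key, image size, and font are arbitrary choices):

Code:
<?php
// captcha.php: minimal sketch of the "type the code from the image" idea
// using GD. Everything here (session key, image size, font) is illustrative.
session_start();

// Pick a 5-character code, avoiding easily confused characters.
$code = substr(str_shuffle('ABCDEFGHJKLMNPQRSTUVWXYZ23456789'), 0, 5);
$_SESSION['captcha_code'] = $code;

$img = imagecreate(120, 40);
imagecolorallocate($img, 255, 255, 255);   // first allocate sets the background
$fg = imagecolorallocate($img, 0, 0, 0);   // text colour
imagestring($img, 5, 30, 12, $code, $fg);

header('Content-Type: image/png');
imagepng($img);
imagedestroy($img);
?>

The form handler then just compares the submitted value against $_SESSION['captcha_code'] before counting the click.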

.c
