Help With Session IDs ! UPDATED !

  • Redcell
  • Graduate
  • Posts: 122
  • Loc: Loveland, Colorado

Post 3+ Months Ago

Quote:
Is your whole site built using this product...or just the shopping cart? Here's what the search engines are picking up as URLs from your site:

"www.redcellpaintball.com/Goggles-Safety-C9.aspx?UserID=31084&SessionID=qz8QA1M1kJb9y4LDAaie"

"www.redcellpaintball.com/Guns-Markers-C44.aspx?UserID=30646&SessionID=kNVWuLWxUHtZcAh016hk"

There are thousands of pages like this.

Search Google for: -aspx? site:www.redcellpaintball.com

This will return all the indexed pages that don't have "aspx?" in the URL.

There's only one.


Quote:
This is an excerpt from Ablecommerce's website:
http://www.ablecommerce.com/SEO-FAQ-3-B ... 78C76.aspx

....issues that might cause the search engines not to crawl or index your website, such as excessive use of JavaScript, linked images, and '?' (question marks) in your URLs......Research URL re-writing tools such as Apache mod_rewrite; for Windows 200x servers, ISAPI_Rewrite by Helicon Technologies is an excellent tool. ISAPI_Rewrite is incorporated deeply into the IIS 5.0/6.0 layer. It's possible to use these tools to create more search engine friendly page names. For example, with the following rule and some modifications to your shopping cart pages, your store's URLs could be dramatically improved.


They are now offering a rewrite component for the shopping cart, but you have to have a server that allows COM installs if you're on Windows, and you have to use their database.


Quote:
If you go here and enter your url

http://www.webconfs.com/search-engine-s ... ulator.php

you can see what the spiders see...mostly dynamic URLs with session IDs and other parameters. It's going to be difficult to pass any PR around the site with this type of setup, which means the interior pages will not rank well.


Ok, I am obviously a layman when it comes to this. I was under the impression that we had a tool installed already that rewrites our URLs?? Where would I find that? For instance, when I go to the site all I see is http://www.redcellpaintball.com/Tippman ... s-C70.aspx or http://www.redcellpaintball.com/Proto-P ... -C147.aspx; I don't see all the session IDs??? Also, when I use the spider sim (above) some pages come up with session IDs and some don't, why is this? Can someone PLEASE help me understand this better, I don't want to waste any more time submitting with Web Position Gold if our URLs aren't right! Thanks, Ian

  • phaugh
  • Professor
  • Posts: 796

Post 3+ Months Ago

Here's some background on the issue so you can understand it better. There are two issues that you face.

1. Query strings – dynamically generated URLs that contain '?' and '&' characters. These can be removed, or replaced, using 'URL rewriting' – a technique that involves a web-server plug-in (e.g. mod_rewrite for Apache or ISAPI_Rewrite for IIS) converting these query-string URLs into 'directory' style URLs (e.g. /news/document/latest.html).

This is probably what you have installed (a rewrite module using rules like the sketch below) to avoid the dynamic URLs. It looks to me like this is working, but when the spider gets to your site and refuses the tracking cookie, the cookieless option in your app kicks in and the URL becomes dynamic....read below to see why this happens.
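
To make that concrete, here's a minimal sketch of the kind of rule these plug-ins use, written in Apache mod_rewrite syntax and reusing the /news/document/latest.html example above. The page name news.aspx and the doc parameter are made up for illustration; an Ablecommerce store would have its own page and parameter names. ISAPI_Rewrite on IIS uses a very similar rule syntax in its httpd.ini file.

# Map the friendly URL /news/document/latest.html onto the real dynamic page.
RewriteEngine On
RewriteRule ^news/document/([a-z]+)\.html$ /news.aspx?doc=$1 [L,QSA]

The visitor (and the spider) only ever sees the clean /news/document/latest.html address; the mapping back to the query-string version happens inside the server.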


2. Session IDs. Many sites use 'sessions' to allow the persistent tracking of a user throughout a site (so that the user remains logged-in, or for user-path analysis, etc.). To allow this 'persistence' across multiple pages of a site, the CMS will create a unique number (session id) for the user, and store it in a) a cookie, b) a per-session cookie, or c) the query string (URL) of each internal link. As many users/browsers will not allow cookies, a) and b) are often replaced by c) when the CMS cannot create a cookie for the user.

The Google spider, amongst others, will not accept cookies, and the site may therefore include the session id in URLs for the Google spider. (This is happening to your site.) As Google needs to uniquely identify each page (so that it doesn't re-index the same page multiple times), this session id will present Google with different URLs for each visit (a new session is started on each visit), and as Google cannot obtain a single unique URL for each page, it won't index the site.

To prevent this, sessions (or at least URL-based session ids) should be switched off for any search-engine-spider visits. Search engine spiders can be detected (and sessions switched off accordingly) by detecting the robot's user_agent in the HTTP headers.

Detecting the spiders is easy.

This code will do it in an .aspx page:
<%
' Grab the visitor's user-agent string. The real string contains
' "Googlebot" as a substring (e.g. "Mozilla/5.0 (compatible; Googlebot/2.1; ...)"),
' so test with InStr rather than an exact comparison.
Dim spiderCheck
spiderCheck = Request.ServerVariables("HTTP_USER_AGENT")
If InStr(LCase(spiderCheck), "googlebot") > 0 Then
    ' EnableSessionState can only be set in the page directive, not from
    ' code at runtime, so drop this visitor's session instead.
    Session.Abandon
End If
%>

Make sure your developer reads this article, so that no other calls are made to the session object once the spider's session is disabled.
http://support.microsoft.com/kb/306996/EN-US/#2

The reason you can't see the session ID in the URL is that your browser is accepting cookies. If you turn off cookies, the cookieless setting in your app will let you see the session ID in the URL in the address bar.

I can't determine why some URLs are getting formatted with the session ID and some without...I would need to see the actual aspx code to figure this one out....PM it to me if you want me to take a look.

There may also be an easier fix to this issue. I came across an article where a user claimed that their shopping cart software had a directives area where they could enter user agents to automatically disable the session object.....they didn't mention what they were using, but maybe your cart app has a similar feature.
  • phaugh
  • Professor
  • Posts: 796

Post 3+ Months Ago

OK......So I disabled my cookies and went to your site....the menu on the left is the only set of links that gets session IDs....everything else on the page is OK....is the menu generated by the cart app?
  • Redcell
  • Graduate
  • Posts: 122
  • Loc: Loveland, Colorado

Post 3+ Months Ago

:oops:
I'm sorry, what is "cart app"?
I noticed that even if you don't use the menu nav (cookies off) and you go to any of the product pages, you do get the full session IDs?
If we input that code into the proper place, will that turn off the session IDs when a spider hits, or will it just let us know when a spider hit? Where do we put it? The problem is that our developer is Ablecommerce, and it's more of a program that the site is created in than a developer, and it's all server based, so we can't really change much.
  • ATNO/TW
  • Super Moderator
  • Posts: 23456
  • Loc: Woodbridge VA

Post 3+ Months Ago

This tool might be worth your time to look into:
http://www.qwerksoft.com/products/iisrewrite/

Not free, but it might make things easier for you.
If you are running on a Windows 2003 server with IIS 6, I'd check whether they have updated testing and compatibility information.
  • Redcell
  • Graduate
  • Posts: 122
  • Loc: Loveland, Colorado

Post 3+ Months Ago

I forgot to mention that everything is hosted off site at Ablecommerce!
  • phaugh
  • Professor
  • Posts: 796

Post 3+ Months Ago

Sorry I let my techie lingo slip....the cart app is the shopping cart application....essentially all the pages and files that make up the shopping cart.

That code was only a minor example and would only catch Googlebot; it would have to be expanded to catch other spiders. How your site is constructed will determine the best place to insert the code. If you have a web.config file on your server, then that's where I would look to put it; this would require a more advanced level of coding, where you build an exception object and use it to determine whether your visitor is a spider.... If not, it will have to be at the page level.

Check out this article: http://support.microsoft.com/kb/306996/EN-US/

Read the section about turning off sessions at the page level. In instruction #4 they tell you to set the value of <EnableSessionState> to "false"...this is where you would put your code....I would put that script into every page in the site using an include file (something like the sketch below)....this way, if you need to tweak the code, you don't have to edit any pages.....just the include file.
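
Just to show the shape of it, here is a rough sketch of the kind of helper such an include file could expose. This is not Ablecommerce's code; the function name IsSearchSpider and the list of user-agent fragments are made up for illustration.

Function IsSearchSpider(ByVal userAgent As String) As Boolean
    ' A sample list of crawler user-agent fragments (not exhaustive).
    Dim spiders() As String = New String() {"googlebot", "slurp", "msnbot", "teoma"}
    Dim ua As String = LCase(userAgent & "")
    Dim spider As String
    For Each spider In spiders
        ' Substring match, since real user-agent strings contain extra text.
        If InStr(ua, spider) > 0 Then
            Return True
        End If
    Next
    Return False
End Function

Each page would then call IsSearchSpider(Request.ServerVariables("HTTP_USER_AGENT")) before it reads or writes anything in the session, and skip those calls when it returns True.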

Yes...this will disable sessions when the spider visits.

"The problem is that our developer is ablecommerce and its more of a program that the site is created in then a developer and its all server based so we cant really change much." ...then I would only try this on one page just to make sure that you are doing it correctly. These's a program call "Web Reaper" that will allow you to spoof the user agent on your browser and allow you to see what the spider is seeing...or I think the spider simulator will work as well.
  • phaugh
  • Professor
  • Posts: 796

Post 3+ Months Ago

"I forgot to mention that everything is hosted off site at Ablecommerce!"

Did you contact them about this issue? The IIS rewrite will not help with the session IDs....they are created after the URL has been rewritten. It appears that Ablecommerce's rewrite is doing fine. The real issue has to do with overriding the cookieless session ID for only the spiders....the cookieless option was designed for visitors whose browsers will not work with a session cookie...when someone disables cookies this mode kicks in....thus screwing up the URLs for search spiders.
  • Redcell
  • Graduate
  • Posts: 122
  • Loc: Loveland, Colorado

Post 3+ Months Ago

I'm sorry that I'm so clueless when it comes to this stuff! But if money wasn't an issue, what would be the best way to go about fixing this problem? I have spent the last month submitting to free directories and using Web Position Gold, and now it seems that was all a waste of time, considering that the SEs reject our URLs with session IDs, and if they don't allow cookies that's all of them! I don't want to completely scrap the site and the use of Ablecommerce, so if there is a fix that could take place so that the spiders can at least get URLs without session IDs, that would be awesome. Also, did I mention that I am clueless? So if you have ideas about this, I would really appreciate it if you could give me a step by step. I know this is a lot to ask but I am desperate! Thanks guys!
Ian
  • phaugh
  • Professor
  • Posts: 796

Post 3+ Months Ago

The first thing would be to get hosting where you control everything...or at least can request modifications to the server. Then you'll need someone to figure out where the spider detection code should go, so you can turn off session IDs and the cookie requirement. Is the Ablecomm software portable....can you move it to a server of your choice?

Big Webmaster has some good hosting deals and they should be able to let you know how much control you will have over the server....I looked in the AC forum and didn't see anyone else reporting this issue....
  • Redcell
  • Graduate
  • Posts: 122
  • Loc: Loveland, Colorado

Post 3+ Months Ago

Problem solved, I think... I contacted Ablecommerce today and found out that we didn't have the patch that solves this issue. So we downloaded it, and all it contained was a new session header for our includes folder and a .txt file that the session header uses to identify search engines. Thanks
Ian
  • phaugh
  • Professor
  • Posts: 796

Post 3+ Months Ago

Hey Redcell...this should do it...it looks as though they are detecting if the string "UserID" is present in the URL...

If Len(Request.QueryString("UserID")) > 0 Then

Thus indicating that the session cookie was rejected and the variables were attached to the URL...this is what happens when cookies are turned off in a browser or if a search engine's robot visits your site.....then they get the list of spiders from the spider agents text file and read it into a cache object.

' Excerpt from the patch: load the list of spider user-agent names from the
' text file into the cached array. (intStoreID, sSpiderFile and arrSpiderAgents
' are presumably set up earlier in the same file.)
arrSpiderAgents = Cache.Item("arrSpiderAgents_" & intStoreID)
If File.Exists(sSpiderFile) Then
    Dim sr As StreamReader
    Dim line As String
    sr = File.OpenText(sSpiderFile)
    line = sr.ReadLine()
    While Not (line Is Nothing)
        ' Normalize each entry and skip blank lines.
        line = LCase(Trim(line))
        If line.Length > 0 Then
            arrSpiderAgents.Add(line)
        End If
        line = sr.ReadLine()
    End While
    sr.Close()
    sr = Nothing
End If
' Sorted so later lookups can search the list quickly.
arrSpiderAgents.Sort()

From here they do something pretty cool...they maintain the session by using a database to hold the session variables...this is a great solution....then they can track the movement of the bot.....once the system knows that a bot is present on the site, it cleans up the URLs so that no variables are present.
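
Just to illustrate the general idea, here is a very rough sketch of writing a "session" value to a database row instead of tacking it onto the URL. This is not Ablecommerce's actual code or schema; the BotSession table, its columns, and the connection string parameter are all invented for the example.

Imports System.Data.SqlClient

Module BotSessionStore
    ' Save one session value for a bot visit in a database table,
    ' keyed by however the app re-identifies the bot between requests.
    Sub SaveBotSessionValue(ByVal botKey As String, ByVal name As String, ByVal value As String, ByVal connString As String)
        Using conn As New SqlConnection(connString)
            conn.Open()
            Using cmd As New SqlCommand("INSERT INTO BotSession (BotKey, Name, Value) VALUES (@k, @n, @v)", conn)
                cmd.Parameters.AddWithValue("@k", botKey)
                cmd.Parameters.AddWithValue("@n", name)
                cmd.Parameters.AddWithValue("@v", value)
                cmd.ExecuteNonQuery()
            End Using
        End Using
    End Sub
End Module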

As for using this....it should either replace an existing file on your server or be a new file that will be called by existing pages as an include. You can test it, since there is code to also run a DB session when a bot name is spoofed as the user agent. I'm not sure how the search engine simulators work, but I have software that allows me to pass a search engine's user agent string as the user agent in my requests....this should trigger that code and start a DB session. Let me know when you have this file installed on your server and I'll test it.
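
If you want to run the same kind of check yourself, here is a rough sketch of the idea: request a page while presenting Googlebot's published user-agent string and look for session parameters in the returned HTML. This is not the tool I use, just an illustration; the URL is simply the store's home page.

Imports System.IO
Imports System.Net

Module SpoofedAgentTest
    Sub Main()
        Dim req As HttpWebRequest = CType(WebRequest.Create("http://www.redcellpaintball.com/"), HttpWebRequest)
        ' Present Googlebot's user-agent string so the spider-detection code fires.
        req.UserAgent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
        Using resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(resp.GetResponseStream())
                Dim html As String = reader.ReadToEnd()
                ' If the patch is working, the page's links should not carry session parameters.
                Console.WriteLine("Response contains SessionID=: " & html.Contains("SessionID=").ToString())
            End Using
        End Using
    End Sub
End Module

If "SessionID=" still shows up in the HTML when a known spider user agent is sent, the detection code isn't being hit for that page.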
  • Redcell
  • Graduate
  • Posts: 122
  • Loc: Loveland, Colorado

Post 3+ Months Ago

It's installed... Is this a security risk? Is this info that Ablecommerce wouldn't want me to post here?
  • ATNO/TW
  • Super Moderator
  • Posts: 23456
  • Loc: Woodbridge VA

Post 3+ Months Ago

Redcell wrote:
It's installed... Is this a security risk? Is this info that Ablecommerce wouldn't want me to post here?


I'm not familiar with the program. If it's open source, it's available to anyone for free and I wouldn't worry about it. However, if you had to pay for a license, then it may be against their TOS and you might want to consider editing it out of your post.
  • phaugh
  • Professor
  • Posts: 796

Post 3+ Months Ago

It seems to work. The search engine simulators don't try to spoof the user agent, so they don't really show what happens when the spider gets to the site. Compare these images:

No session variables
http://www.mvcomputers.com/stuff/webreaper.jpg

With session variables
http://www.mvcomputers.com/stuff/simulator.jpg

See how when the user agent is Googlebot (Web Reaper graphic) the URLs are just plain, but with the simulator they attach the user ID and session ID to the URL. So now you should see the pages listed in the Web Reaper graphic start to show up in the search engines. If you are going to link to internal pages, use these URLs to do so....then Google and the others will give you PR and all sorts of warm fuzzies. :lol:

I would remove your code post above....it has connection variables to their DB....they probably don't want those exposed. The snippets I took are OK...they use standard calls to system objects and don't divulge any secure info.....You should see a bump in your rankings in an update or two...once those pages are indexed they will pass internal PR around the site and make it stronger....keep linking! Not just to the home page.
