Dynamically Retrieving Photos From URL

  • twalters84
  • Graduate
  • Posts: 161
  • Loc: Mount Savage, MD

Post 3+ Months Ago

Greetings,

I have a rather interesting programming problem right now. I recently joined the Google Affiliate Network (GAN) and got accepted into some major affiliate programs such as Kmart, Target, and Sears. These businesses provide product feeds, which I downloaded and imported into my database with a script. One of the fields is a product image URL.

Using the product image URL, I had a script that retrieved the image, stored it on my local server, and then performed image manipulations such as resizing photos. Here is a quick example of this code:

Code:

<cfset myPath = '#ExpandPath("..\img\products\2\")#'>

<cfquery datasource="#dsnName#" name="PHOTO_LIST" maxrows="1">
SELECT PRODUCT.PRODUCT_GOOGLE_IMAGE_URL, PRODUCT.PRODUCT_ID, BUSINESS.BUSINESS_NAME, PRODUCT.PRODUCT_NAME, BUSINESS.BUSINESS_URL
FROM PRODUCT, BUSINESS
WHERE PRODUCT.BUSINESS_ID = BUSINESS.BUSINESS_ID
AND PRODUCT.PRODUCT_PHOTO IS NULL
AND PRODUCT.PRODUCT_GOOGLE_IMAGE_URL IS NOT NULL
</cfquery>

<cfif #PHOTO_LIST.RecordCount# NEQ 0>

 <cfloop index="i" from="1" to="#PHOTO_LIST.RecordCount#">
 
  <cfset businessURL = '#PHOTO_LIST.BUSINESS_URL[i]#'>
  <cfset photoURL = '#PHOTO_LIST.PRODUCT_GOOGLE_IMAGE_URL[i]#'>
  <cfset productID = '#PHOTO_LIST.PRODUCT_ID[i]#'>
  <cfset bizName = '#PHOTO_LIST.BUSINESS_NAME[i]#'>
  <cfset productName = '#PHOTO_LIST.PRODUCT_NAME[i]#'>
 
  <cfhttp method="get" url="#photoURL#" useragent="#CGI.http_user_agent#" getasbinary="no" result="objGET">  
   <cfhttpparam type="HEADER" name="referer" value="#businessURL#" />
  </cfhttp>

  <cfif FindNoCase("200",objGET.StatusCode)>
   
   <cfset acceptChars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'>   
   <cfset photoNameTemp = '#productName#'>
   <cfset photoName = ''>
    
   <cfset photoNameTemp = #Replace(photoNameTemp, "&quot;", "", "ALL")#>

   <!--- Use a separate index here so the outer loop's "i" is not overwritten --->
   <cfloop index="charIdx" from="1" to="#Len(photoNameTemp)#" step="1">
    
    <cfset strChar = Mid(photoNameTemp,charIdx,1)>
     
    <cfif #Find(strChar,acceptChars)#>     
     <cfset photoName = '#photoName##strChar#'>
    </cfif>
     
   </cfloop>

   <cfif #Len(photoName)# GT 45>
    <cfset photoName = '#Left(photoName,45)#'>
   </cfif>
        
   <cfset loopIndex = 1>
   <cfset nameAccepted = false>
   <cfset photoNameTemp = '#photoName#'>
    
   <cfloop condition="nameAccepted eq false">
    
    <cfset photoNameTemp = '#photoName##loopIndex#.jpg'>
    <cfset myFile = '#myPath##photoNameTemp#'>
    
    <cfif NOT FileExists(myFile)>
     <cfset nameAccepted = true>
     <cfset photoName = '#photoNameTemp#'>
    <cfelse>
     <cfset loopIndex = #loopIndex#+1>
    </cfif>
    
   </cfloop>

   <cffile action="write" file="#myPath##photoName#" output="#objGET.FileContent#"/>

   <!--- IMAGE MANIPULATIONS AND DATABASE UPDATE CODE HERE --->
      
  </cfif>
   
 </cfloop>

</cfif>


The code above is slightly modified from the following URL:

http://www.bennadel.com/blog/903-Passing-Referer-AS-ColdFusion-CFHttp-CGI-Value-vs-HEADER-Value-.htm
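
As an aside, the character-by-character filter in the script above could probably be collapsed into a regular expression. This is only a minimal sketch under the same rules (drop "&quot;", keep plain letters, cap at 45 characters); the function name sanitizePhotoName is made up for illustration and is not part of the original script.

```cfml
<cffunction name="sanitizePhotoName" returntype="string" output="false">
 <cfargument name="productName" type="string" required="true">
 <!--- Remove the escaped quote entity first, otherwise the letters "quot" would survive the filter --->
 <cfset var cleaned = Replace(arguments.productName, "&quot;", "", "all")>
 <!--- Strip every character that is not a plain ASCII letter --->
 <cfset cleaned = REReplace(cleaned, "[^A-Za-z]", "", "all")>
 <!--- Cap the length at 45 characters, as the original script does --->
 <cfreturn Left(cleaned, 45)>
</cffunction>

<!--- Usage: productName comes from the query row, as in the script above --->
<cfset photoName = sanitizePhotoName(productName)>
```

The loop that appends a counter and checks FileExists would still be needed afterwards, since two products can reduce to the same cleaned name.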

At first, I was using the CGI cfhttpparam type to retrieve the photos, which worked well for the first 2,000 or so. However, the product catalogue for the major website I am working with has about 200,000 products, and after those first 2,000 photos I started getting Forbidden responses (reported in objGET.StatusCode).

I was downloading 1 photo per minute from the affiliate server. I did not want to overload their server and thought that was a reasonable rate, but I guess I tripped a switch somewhere and started getting the Forbidden messages.
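
One option when this happens is to have the script back off rather than keep requesting once a minute. The fragment below is a rough sketch, not a known fix: it assumes an application scope is available, that the script runs as a scheduled task, and that photoURL, businessURL and productID are set as in the script above. The variable Application.photoFetchPausedUntil and the one-hour pause are invented for illustration, and nothing here is known to match the affiliate server's actual limits.

```cfml
<!--- Hypothetical pause marker; defaults to a date in the past so the first run proceeds --->
<cfparam name="Application.photoFetchPausedUntil" default="#CreateDate(2000,1,1)#">

<!--- Only fetch when any earlier pause has expired --->
<cfif DateCompare(Now(), Application.photoFetchPausedUntil) GTE 0>

 <cfhttp method="get" url="#photoURL#" useragent="#CGI.http_user_agent#" getasbinary="yes" timeout="30" result="objGET">
  <cfhttpparam type="header" name="referer" value="#businessURL#">
 </cfhttp>

 <!--- StatusCode looks like "200 OK" or "403 Forbidden"; Val() keeps only the leading number --->
 <cfset statusNum = Val(objGET.StatusCode)>

 <cfif statusNum EQ 200>
  <!--- Save objGET.FileContent to disk as in the script above --->
 <cfelseif statusNum EQ 403>
  <!--- Stop asking for a while (one hour is an arbitrary guess) instead of retrying every minute --->
  <cfset Application.photoFetchPausedUntil = DateAdd("h", 1, Now())>
  <cflog file="photoFetch" type="warning" text="Forbidden for product #productID#; pausing until #Application.photoFetchPausedUntil#">
 </cfif>

</cfif>
```

Comparing the numeric part of the status is also a little stricter than FindNoCase("200", ...), which matches "200" anywhere in the status string.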

At this point, I am wondering how to get the rest of the photos without having to download each one manually.

I have tried something like this:

Original Script Modification:

Code:

<cfset myLink = 'http://www.#myDomain#/scripts/displayProductPhoto.cfm?productID=#productID#'>

<cfhttp method="get" url="#myLink#" useragent="#CGI.http_user_agent#" getasbinary="no" result="objGET">  
 <cfhttpparam type="HEADER" name="referer" value="#businessURL#" />
</cfhttp>


Display Product Photo Page:

Code:

<cfparam name="URL.productID" default="-1" type="integer">
<cfset URL.productID = '#HTMLEditFormat(URL.productID)#'>

<cfif #URL.productID# NEQ -1>

 <cfquery datasource="#dsnName#" name="PHOTO_CHECK" maxrows="1">
 SELECT PRODUCT.PRODUCT_GOOGLE_IMAGE_URL
 FROM PRODUCT
 WHERE PRODUCT.PRODUCT_ID = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="#URL.productID#">
 </cfquery>
 
 <cfif #PHOTO_CHECK.RecordCount# EQ 1>
 
  <cfset imageURL="#PHOTO_CHECK.PRODUCT_GOOGLE_IMAGE_URL#">
  
  <cfoutput>
  
   <img src="#imageURL#" /> 
 
  </cfoutput>
  
 </cfif>

</cfif>


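On the display product webpage, the image displays correctly in a browser. However, the content returned by the cfhttp call is the web page itself and not the actual product photo, since cfhttp only fetches the page's HTML and does not load the image that the img tag points to.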
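
A quick way to check what cfhttp actually received is to print the response metadata. This is just a diagnostic sketch; it assumes myLink is set as in the modification above, and it only uses the standard keys of the cfhttp result structure.

```cfml
<cfhttp method="get" url="#myLink#" result="objGET"></cfhttp>

<cfoutput>
 <!--- e.g. "200 OK" or "403 Forbidden" --->
 Status: #objGET.StatusCode#<br>
 <!--- A page that only emits an img tag would normally be reported as text/html, not image/jpeg --->
 MIME type: #objGET.MimeType#<br>
 <!--- Whether FileContent came back as binary data rather than a string --->
 Is binary: #IsBinary(objGET.FileContent)#
</cfoutput>
```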

Thus, I tried modifying the display product page as follows:

Code:

<cfparam name="URL.productID" default="-1" type="integer">
<cfset URL.productID = '#HTMLEditFormat(URL.productID)#'>

<cfif #URL.productID# NEQ -1>

 <cfquery datasource="#dsnName#" name="PHOTO_CHECK" maxrows="1">
 SELECT PRODUCT.PRODUCT_GOOGLE_IMAGE_URL
 FROM PRODUCT
 WHERE PRODUCT.PRODUCT_ID = <cfqueryparam cfsqltype="CF_SQL_NUMERIC" value="#URL.productID#">
 </cfquery>
 
 <cfif #PHOTO_CHECK.RecordCount# EQ 1>
 
  <cfset imageURL="#PHOTO_CHECK.PRODUCT_GOOGLE_IMAGE_URL#">
  <cfimage source="#imageURL#" name="img" action="read">
  <cfset blob = ImageGetBlob(img)>
  <cfcontent type="image/jpeg" variable="#blob#">
  
 </cfif>

</cfif>


When cfimage tries to read the image, I get the following error:

Quote:
An exception occurred while trying to read the image.

javax.imageio.IIOException: Can't get input stream from URL!


The URL I am trying to retrieve the photo from appears to be on an image server. The actual URL for the product image has no file extension. When I view the source of the product image URL, it appears to be binary data. It looks like the image server may generate the image when somebody visits the URL and then output it in binary format.
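
Because the URL has no extension, one approach that might sidestep the cfimage error is to download the bytes with cfhttp first and only then hand a local file to cfimage. The sketch below assumes photoURL and businessURL are set as in the first script and that the server sends JPEG data (the first script already saves everything as .jpg); tempFile is a made-up name. It does not get around the Forbidden responses: if the server refuses the request, nothing is written.

```cfml
<!--- Hypothetical scratch file in the server's temp directory --->
<cfset tempFile = GetTempDirectory() & CreateUUID() & ".jpg">

<cfhttp method="get" url="#photoURL#" useragent="#CGI.http_user_agent#" getasbinary="yes" timeout="30" result="objGET">
 <cfhttpparam type="header" name="referer" value="#businessURL#">
</cfhttp>

<!--- Rely on the Content-Type the server reported, since the URL has no file extension to go by --->
<cfif Val(objGET.StatusCode) EQ 200 AND FindNoCase("image/", objGET.MimeType) EQ 1>

 <!--- Write the raw bytes to disk --->
 <cffile action="write" file="#tempFile#" output="#objGET.FileContent#">

 <!--- Read the local copy, so cfimage never has to open the remote URL itself --->
 <cfimage action="read" source="#tempFile#" name="img">

</cfif>
```

From there, img can be resized and saved under the final name with the usual image functions.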

So my question to you guys is how do I dynamically retrieve 198,000 photos at a reasonable rate and store the photos on my local server?

Thanks in advance for any suggestions or assistance.

Sincerely,
Travis Walters

© 1998-2014. Ozzu® is a registered trademark of Unmelted, LLC.