Regular expression help needed in PHP

  • ahevans
  • Graduate
  • Posts: 181

Post 3+ Months Ago

Hi,

I am trying to use a regular expression in the following code to take some HTML and pull all the RapidShare links into an array. I don't think I've quite got my regular expression right, and I've been pulling my hair out for three hours trying to fix it. I can't get case insensitivity to work, and I can't work out how to set a minimum and maximum length for the matched string, so that a match doesn't start at the top of a webpage and end three paragraphs down!

Code:
preg_match_all("/rapidshare.com\/files\/([0-9]{4,15})\/((.*?)(rar|zip|avi|mp3|html|mpg|mpeg|mp4|wmv|htm?))/i", $html, $matches);


Can anyone help please?
  • joebert
  • Fart Bubbles
  • Genius
  • Posts: 13504
  • Loc: Florida

Post 3+ Months Ago

Regular expressions in PHP let you choose the pattern delimiter. That gives you a chance to think about what type of pattern you're going to use and to pick a delimiter that needs as few escaped characters as possible, which makes the pattern easier to read.

Any time you plan on working with a URL, use a pattern delimiter other than the forward slash, such as the tilde or the pound sign. A URL normally contains at most one pound sign and tildes are rare, so you will seldom have to escape either one.
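To make the delimiter point concrete, here is a minimal sketch comparing the two styles. The sample URL and variable names are made up for illustration.

```php
<?php
// Hypothetical sample URL, for illustration only.
$url = 'http://rapidshare.com/files/12345678/example.rar';

// Slash delimiter: every slash inside the pattern has to be escaped.
$withSlashes = '/rapidshare\.com\/files\/(\d+)\//';

// Pound-sign delimiter: the slashes can be written as-is.
$withPounds = '#rapidshare\.com/files/(\d+)/#';

preg_match($withSlashes, $url, $a);
preg_match($withPounds, $url, $b);

// Both patterns describe the same thing, so the results are identical.
var_dump($a === $b);
```

Both patterns match the same text; the second one is simply easier to read.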

Also, be careful about using double quotes around your pattern, because the dollar sign means one thing in a pattern (end of the subject) and another in a double-quoted PHP string (variable interpolation). Most of the time a pattern is easier to understand if you use single quotes and bring variables in with sprintf or concatenation.
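As a rough sketch of the single-quote approach, assuming a made-up $host variable that has to be matched literally inside the pattern:

```php
<?php
// Hypothetical value that should be matched literally inside the pattern.
$host = 'rapidshare.com';

// Single quotes: PHP does no interpolation, so "$" keeps its regex meaning.
// preg_quote() escapes regex metacharacters in $host (and the "#" delimiter).
$pattern = sprintf('#%s/files/\d+/$#', preg_quote($host, '#'));

echo $pattern, "\n";

// The trailing "$" anchors the match to the end of the subject string.
var_dump(preg_match($pattern, 'rapidshare.com/files/123456/'));
```

preg_quote() is used here so that the period in the host name ends up escaped without having to remember to do it by hand.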

For "rapidshare.com", you need to escape the period. An unescaped period matches any character, so your pattern only works because the real text happens to have a period in that position. It also gets confusing when you escape it in some places and not in others.
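A quick way to see the difference, using a made-up look-alike string:

```php
<?php
// Hypothetical look-alike string: "X" sits where the period should be.
$lookalike = 'rapidshareXcom/files/';

// Unescaped: "." matches any character, so this still matches.
$unescaped = preg_match('#rapidshare.com/files/#', $lookalike);

// Escaped: only a literal period is accepted, so this does not match.
$escaped = preg_match('#rapidshare\.com/files/#', $lookalike);

var_dump($unescaped, $escaped);
```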

I don't see any problems with your case-insensitive modifier ("i").
You can use the "U" modifier to make the pattern's quantifiers match in a non-greedy way by default.
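One thing to watch with "U": it inverts greediness rather than just switching it off, so a quantifier followed by "?" becomes greedy again. A small sketch on a made-up subject string:

```php
<?php
// Hypothetical sample input, for illustration only.
$subject = '<b>one</b> <b>two</b>';

// Default: ".+" is greedy and runs to the last "</b>".
preg_match('#<b>.+</b>#', $subject, $greedy);

// With U: ".+" is lazy and stops at the first "</b>".
preg_match('#<b>.+</b>#U', $subject, $ungreedy);

// With U and "?": the quantifier is flipped back to greedy.
preg_match('#<b>.+?</b>#U', $subject, $flipped);

var_dump($greedy[0], $ungreedy[0], $flipped[0]);
```

That means the ".*?" in the original pattern should not be combined with "U" unchanged; use one or the other.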

Here is how I'd write that pattern to start out with based on what I see.

Code:
#rapidshare\.com/files/(\d{4,15})/(.+\.(?:rar|zip|avi|mp3|html|htm|mpe?g|mp4|wmv))#Ui
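Here is a minimal sketch of such a pattern in use. The HTML fragment is made up, and the extensions are wrapped in a (?:...) group so that the alternation applies only to the extension and not to the whole file name. "html" and "htm" are listed separately because under "U" a trailing "l?" would be lazy and could drop the "l".

```php
<?php
// Hypothetical HTML fragment, for illustration only.
$html = '<a href="http://rapidshare.com/files/12345678/holiday.zip">one</a>' . "\n"
      . '<a href="http://RapidShare.com/files/87654321/song.mp3">two</a>';

// Single-quoted pattern, "#" delimiter, escaped period, grouped extensions.
$pattern = '#rapidshare\.com/files/(\d{4,15})/(.+\.(?:rar|zip|avi|mp3|html|htm|mpe?g|mp4|wmv))#Ui';

preg_match_all($pattern, $html, $matches);

print_r($matches[0]); // full links
print_r($matches[1]); // numeric file IDs
print_r($matches[2]); // file names
```

With the default flags, preg_match_all() groups results by capture group, so $matches[0] holds the full links and $matches[2] the file names.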
  • ahevans
  • Graduate
  • Posts: 181

Post 3+ Months Ago

Thanks for the help. I think I was confusing myself by using online and downloadable tools that seem to use different characters in expressions. For example, one tool told me my expression was wrong while another accepted it with exactly the same test text.

I changed my code to build the pattern in a variable using single quotes, and I now pass that variable to preg_match_all.

Is it possible to limit the overall length of a match? For example, it could be possible to get a matching string that is a thousand characters long.
  • joebert
  • Fart Bubbles
  • Genius
  • Posts: 13504
  • Loc: Florida

Post 3+ Months Ago

Nothing immediately comes to mind.
I didn't see anything that stands out on the php.net regex manual pages.

It's probably possible with an overly complicated pattern, but you're likely better off just using strlen on the matched strings before working with them.
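A rough sketch of the strlen approach follows, with made-up links and an assumed limit of 100 characters. For comparison, it also shows one way to cap the length inside the pattern with a bounded quantifier; the character class, the limit of 60 and the shortened extension list are assumptions for illustration, not something tested against real RapidShare pages.

```php
<?php
// Hypothetical list of matched links, e.g. $matches[0] from preg_match_all().
$links = [
    'rapidshare.com/files/12345678/holiday.zip',
    'rapidshare.com/files/12345678/' . str_repeat('x', 500) . '.zip',
];

$maxLength = 100; // assumed limit; pick whatever suits your data

// Option 1: filter the matches afterwards with strlen().
$short = array_values(array_filter($links, function ($link) use ($maxLength) {
    return strlen($link) <= $maxLength;
}));

// Option 2: bound the file name inside the pattern itself.
// [^\s"'<>] keeps the match from running across whitespace, quotes or tags.
$bounded = '#rapidshare\.com/files/(\d{4,15})/([^\s"\'<>]{1,60}\.(?:rar|zip))#i';
preg_match_all($bounded, implode("\n", $links), $m);

print_r($short);
print_r($m[0]);
```

Both options keep only the first link here. The strlen filter is easier to read, while the bounded pattern stops overlong matches from being produced in the first place.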
  • ahevans
  • Graduate
  • Posts: 181

Post 3+ Months Ago

Thanks.


© 1998-2014. Ozzu® is a registered trademark of Unmelted, LLC.