explode() question. results not accurate.

  • CStrauss
  • Graduate
  • Joined: Mar 23, 2006
  • Posts: 119
  • Loc: St. Louis MO. USA
  • Status: Offline

Post May 5th, 2008, 10:27 am

Here is the code in question.

Code:
<?php
// This represents how data is stored in the database, with <p> tags.
$article_content = "<p>This is paragraph 1.</p><p>This is paragraph 2</p><p>This is paragraph 3</p><p>This is paragraph4</p>";

// Number of paragraphs per page
$para = 1;

// Split up the article by paragraphs
$paragraph = explode("<p>", $article_content);

$total_paragraph = count($paragraph);
echo $total_paragraph;

?>


When I run the code and print $total_paragraph, the result is 5, but there are only 4 paragraphs. Why is it showing 5? Could it be counting the </p> as well? Is there a better way, or another function other than explode(), to parse out each paragraph separately?
  • Bigwebmaster
  • Site Admin
  • Joined: Dec 20, 2002
  • Posts: 7349
  • Loc: Seattle, WA
  • Status: Offline

Post May 5th, 2008, 10:43 am

It is probably doing that because explode() uses the <p> as the divider or boundary, and since your string starts with a <p>, it also counts the empty string to the left of it as an element.
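
To see this concretely, here is a minimal sketch using the sample string from the question. The array_filter() step is only one possible workaround and is an assumption for illustration, not something from the original code. Note that each remaining piece still ends with its closing </p>.

```php
<?php
// Same sample string as in the question.
$article_content = "<p>This is paragraph 1.</p><p>This is paragraph 2</p><p>This is paragraph 3</p><p>This is paragraph4</p>";

// explode() returns everything between the delimiters, including the
// empty string that sits before the very first "<p>".
$pieces = explode("<p>", $article_content);
var_dump($pieces[0] === "");   // bool(true)
echo count($pieces), "\n";     // 5

// Assumed workaround: drop the empty pieces, then reindex from 0.
$paragraphs = array_values(array_filter($pieces, function ($piece) {
    return $piece !== "";
}));
echo count($paragraphs), "\n"; // 4
```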

If I were to do this I would actually use the preg_match function as in something like this:

Code:
preg_match("/<p>.*?<\/p>/i",$article_content,$matches);
$total_paragraph = count($matches);
echo $total_paragraph;


The PHP preg_match function basically uses regular expressions for matching and stores each match into the $matches array. At that point I just counted how many elements were put into the array to figure out how many paragraphs it found. I believe this should work correctly.
Ozzu Hosting - Want your website on a fast server like Ozzu?
Contact US for more information about our plans and rates
  • CStrauss
  • Graduate
  • Joined: Mar 23, 2006
  • Posts: 119
  • Loc: St. Louis MO. USA
  • Status: Offline

Post May 5th, 2008, 11:00 am

Thanks, Big. I will play with your code. I knew regex would be better, but I'm still really trying to learn it. I'm at the point where I understand what preg_match or any regex function does, but I get thrown off by which symbols to use in the pattern: the *, $, ?, etc. All those little symbols tend to throw me off and make me go cross-eyed.

Anyway, thanks for the tip. I will play with it and see how it works in the little script I'm writing.
  • Bigwebmaster
  • Site Admin
  • Joined: Dec 20, 2002
  • Posts: 7349
  • Loc: Seattle, WA
  • Status: Offline

Post May 5th, 2008, 11:05 am

Using regex can be confusing, especially if you have not had much practice with it. I would highly recommend you spend time learning and becoming comfortable with it though as it is very powerful for many types of things you will need to do. I use pattern matching in almost every program I write at some point.
  • CStrauss
  • Graduate
  • Joined: Mar 23, 2006
  • Posts: 119
  • Loc: St. Louis MO. USA
  • Status: Offline

Post May 5th, 2008, 11:27 am

Yeah, I've been looking for a good book that covers regex and PHP, but I haven't found one yet. I have found some articles on the web, but none that puts it in words I can understand too well.

As for your code, I tried it and it returns a result of 1 paragraph instead of 3 (there are 4 paragraphs, but the first array element is 0). I can still access all the elements of the array with echo $paragraph[2], and so forth.

So does that mean it's working, or should the result be 3 instead of 1?
  • CStrauss
  • Graduate
  • Joined: Mar 23, 2006
  • Posts: 119
  • Loc: St. Louis MO. USA
  • Status: Offline

Post May 5th, 2008, 11:39 am

Correction: I forgot to take the explode() out of my code, which is why I was able to access the $paragraph array. I'm still figuring out why it displays 1 instead of 4 paragraphs with your preg_match, though.
  • spork
  • HB
  • Silver Member
  • Joined: Sep 22, 2003
  • Posts: 5488
  • Loc: Rochester, NY
  • Status: Online

Post May 5th, 2008, 12:05 pm

Code:
preg_match("/<p>.*<\/p>/Ui", $article_content, $matches);
$total_paragraph = count($matches);
echo $total_paragraph;

Try adding the U option to the end of the regex to make it non-greedy.
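
As a rough illustration of what "non-greedy" changes, the sketch below runs the same pattern three ways on a shortened two-paragraph version of the sample string. The variable names are placeholders chosen for this example.

```php
<?php
// Shortened, two-paragraph version of the sample string (illustration only).
$article_content = "<p>This is paragraph 1.</p><p>This is paragraph 2</p>";

// Greedy: .* runs on to the LAST </p>, so the whole string is one match.
preg_match("/<p>.*<\/p>/", $article_content, $greedy);

// Lazy quantifier: .*? stops at the first </p> it can.
preg_match("/<p>.*?<\/p>/", $article_content, $lazy);

// U modifier: inverts the default, so a plain .* behaves lazily.
preg_match("/<p>.*<\/p>/U", $article_content, $ungreedy);

echo $greedy[0], "\n";   // <p>This is paragraph 1.</p><p>This is paragraph 2</p>
echo $lazy[0], "\n";     // <p>This is paragraph 1.</p>
echo $ungreedy[0], "\n"; // <p>This is paragraph 1.</p>
```

One thing to watch: because U inverts greediness, combining it with .*? would make that quantifier greedy again, so it is safer to use one approach or the other, not both.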
  • Bigwebmaster
  • Site Admin
  • Joined: Dec 20, 2002
  • Posts: 7349
  • Loc: Seattle, WA
  • Status: Offline

Post May 5th, 2008, 12:40 pm

The purpose of the ? in .*? is to make it non-greedy as well, so either way should have the same effect. I did make a typo earlier and had this line:

Code:
$total_paragraph = count($matches);

as:

Code:
$total_paragraph = $count($matches);

So hopefully you took the corrected version without the $.
  • joebert
  • Weathered
  • Genius
  • Joined: Feb 10, 2004
  • Posts: 11883
  • Loc: Clearwater, FL
  • Status: Offline

Post May 5th, 2008, 12:49 pm

count($matches) is good with preg_match_all, but realistically you're only ever going to get 0 or 1 returned from preg_match.

Using preg_match_all will likely clear this issue up. :)
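
A quick side-by-side sketch of the difference, using the sample string from the first post. The names $pattern, $first, $all, $found and $total are placeholders for this illustration.

```php
<?php
$article_content = "<p>This is paragraph 1.</p><p>This is paragraph 2</p><p>This is paragraph 3</p><p>This is paragraph4</p>";
$pattern = "/<p>.*?<\/p>/";

// preg_match() stops after the first match and returns 1 (or 0 if none).
$found = preg_match($pattern, $article_content, $first);
echo $found, "\n";         // 1
echo count($first), "\n";  // 1 (only the first full match is stored)

// preg_match_all() keeps going and returns the number of full matches.
$total = preg_match_all($pattern, $article_content, $all);
echo $total, "\n";          // 4
echo count($all[0]), "\n";  // 4
```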

Quote:
but still figuring out why it displays 1 instead of 4 paragraphs with your preg_match
Why yes, yes I am.
  • Bigwebmaster
  • Site Admin
  • Joined: Dec 20, 2002
  • Posts: 7349
  • Loc: Seattle, WA
  • Status: Offline

Post May 5th, 2008, 12:55 pm

I just did a quick test and joebert is correct. This code works:

Code:
preg_match_all("/<p>.*?<\/p>/",$article_content,$matches);
$total_paragraph = count($matches[0]);
echo $total_paragraph;


I figured it out after seeing the results returned from using the print_r($matches) function.
  • CStrauss
  • Graduate
  • Joined: Mar 23, 2006
  • Posts: 119
  • Loc: St. Louis MO. USA
  • Status: Offline

Post May 5th, 2008, 1:09 pm

I think I figured it out, duh. I found a little tutorial on regex and I'm reading it as I try to learn, so let me know if this sounds correct.

Code:
preg_match("/<p>.*?<\/p>/",$article_content,$matches);

If I'm understanding this tutorial correctly, preg_match returns a boolean-like value: 0 (false) when no match is found, or 1 (true) when a match is found? If so, that could be why $total_paragraph has a value of 1 when I echo it: it reflects that a match was found, not the actual number of paragraphs found?
  • CStrauss
  • Graduate
  • Joined: Mar 23, 2006
  • Posts: 119
  • Loc: St. Louis MO. USA
  • Status: Offline

Post May 5th, 2008, 1:48 pm

Okay, I was wrong about my last post, because if I type:

echo $matches[0];

it displays the content between the first <p> tags. If I replace [0] with another element number, it prints nothing, so all it's doing is finding the first paragraph. My overall goal in this project is to store all the paragraphs in an array, but with all your examples it's only storing 1 of the 4 paragraphs I want stored.
  • Bigwebmaster
  • Site Admin
  • Joined: Dec 20, 2002
  • Posts: 7349
  • Loc: Seattle, WA
  • Status: Offline

Post May 5th, 2008, 2:10 pm

Did you use the example I last posted above with the preg_match_all function? The problem with the preg_match function is that it will only match once, while preg_match_all is like using the 'g' option in Perl, where it matches globally until the whole string is used up. I probably should have tested my code the first time before posting here, but I did test the code above and it returns 4 correctly for me now.
  • CStrauss
  • Graduate
  • Joined: Mar 23, 2006
  • Posts: 119
  • Loc: St. Louis MO. USA
  • Status: Offline

Post May 5th, 2008, 2:12 pm

Okay, lol, I finally got it to work. I missed Bigwebmaster's last post. One thing kind of throws me off, though, and I'd like an explanation of what this statement means:

Code:
$total_paragraph = count($matches[0]);

What's the purpose of count($matches[0]) rather than just using $matches?

Since $matches is an array and [0] is the first element in that array, it looks to me like it counts only the first element of the array?

Anyway, if anyone could explain that so I can understand it better, that would be helpful. Overall, thanks to you all for the replies and the help.
  • Bigwebmaster
  • Site Admin
  • Joined: Dec 20, 2002
  • Posts: 7349
  • Loc: Seattle, WA
  • Status: Offline

Post May 5th, 2008, 2:51 pm

The reason is that $matches is a multidimensional array. If you look at the definition for preg_match_all here:

http://www.php.net/preg_match_all

It says:

Quote:
matches:
Array of all matches in multi-dimensional array ordered according to flags.


So for example if you set the flag: PREG_PATTERN_ORDER then it would order results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.

Since you used no flags, it defaults to PREG_PATTERN_ORDER and simply stored all of the matches in $matches[0]. If you had used parentheses for additional pattern matching, it would have stored another array in $matches[1].
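
As a small sketch of that layout, the example below adds one parenthesized subpattern to the pattern and uses a shortened two-paragraph string. The $sets variable and the PREG_SET_ORDER comparison are additions for illustration and were not part of the code discussed above.

```php
<?php
// Shortened, two-paragraph version of the sample string (illustration only).
$article_content = "<p>This is paragraph 1.</p><p>This is paragraph 2</p>";

// The parentheses add one subpattern that captures the text inside each tag pair.
preg_match_all("/<p>(.*?)<\/p>/", $article_content, $matches, PREG_PATTERN_ORDER);

print_r($matches[0]); // full pattern matches, <p> tags included
print_r($matches[1]); // text captured by the first subpattern, tags excluded

// PREG_SET_ORDER groups by match instead: $sets[1] holds the full match
// and the capture for the second paragraph.
preg_match_all("/<p>(.*?)<\/p>/", $article_content, $sets, PREG_SET_ORDER);
echo $sets[1][1], "\n"; // This is paragraph 2
```

With PREG_PATTERN_ORDER, count($matches) is the number of groups (full match plus subpatterns), which is why the paragraph count comes from count($matches[0]).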

© Unmelted Enterprises 1998-2009. Driven by phpBB © 2001-2009 phpBB Group.