
I create an output variable called $output, which holds a number of text lines separated by line breaks. These lines are generated on a webpage whose contents I then fetch with cURL.

I then create an array as follows:

$arrayoutput = explode("\n", $output);

This works quite well until $output becomes so large that I get the following error:

Allowed memory size of 134217728 bytes exhausted (tried to allocate 32 bytes) (that's the 128M limit)

The $output data is not always that big, but on occasion it can be.

Now, I do not want to increase the memory limit or anything like that. I would like to be able to limit the maximum length of the array.

Let's say the array is created with at most 100,000 lines in it, or whatever number stays within the memory limit. When it reaches 100,000 elements, it stops.

Is it possible to do this?


2 Answers

Don't use explode(). Instead, build the array by extracting one line at a time and appending it, stopping once you've reached the maximum size. For example:

$arrayLimit = 100000;
$arrayoutput = [];

$separator = "\r\n"; // strtok treats each character here as a delimiter
$line = strtok($output, $separator); // only the first call takes the string

while ($line !== false && count($arrayoutput) < $arrayLimit) {
    $arrayoutput[] = $line;
    $line = strtok($separator); // subsequent calls continue the same string
}

// $arrayoutput will contain at most 100000 elements

See strtok. Note that only the first call to strtok takes the string argument; every subsequent call needs only the token to use, as strtok keeps track of its position in the current string. Note also that strtok treats each character of $separator as a delimiter and skips empty tokens, so blank lines will not appear in the array.
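A small illustration of that stateful behavior, using a made-up three-line string:

$first  = strtok("one\ntwo\nthree", "\n"); // "one"   (string passed once)
$second = strtok("\n");                    // "two"   (continues same string)
$third  = strtok("\n");                    // "three"
$done   = strtok("\n");                    // false   (no tokens left)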

I believe you could also do this with preg_split. Note that without its $limit argument, preg_split() builds the entire array before any loop can break out of it, so pass a limit to keep memory bounded:

$arrayLimit = 100000;

// Asking for at most $arrayLimit + 1 pieces leaves the unsplit
// remainder of the string in the final element
$arrayoutput = preg_split("/\r\n|\r|\n/", $output, $arrayLimit + 1);

if (count($arrayoutput) > $arrayLimit) {
    // Drop the remainder so only complete lines remain
    array_pop($arrayoutput);
}

// $arrayoutput will contain at most 100000 elements

The second example also makes it easier to handle output with different line endings, which typically vary between Linux, Windows, and classic Mac OS.

It looks like you are parsing some type of delimited file in which each row is determined by the line break \n.

Since you load every row into an array, it makes sense that something with hundreds of thousands of rows is exhausting your memory. There are a few things you can look into.

  1. Write some PHP parsing logic that literally goes line by line, so you can simply limit the loop. This would not be an explode() call but something like fread(), or, if you are on PHP 5+, stream_get_line(), which may work nicely for you (see the first sketch after this list).

  2. Why do you need to store the text in an array at all? Figure out exactly what post-processing you want to do once you have your array; there may be a better way of storing that information, such as writing each row into a db table (a second sketch follows the list).

  3. Reconfigure your webserver or pick a new one. I did a ton of text-based processing for a supply chain back in 2008, and I ran into memory errors all the time. I ended up swapping from Apache to Lighttpd, and for my project I never hit a memory error again. My server hardware was just a workstation converted to a webserver, so I didn't have the best hardware, and Lighttpd seemed to work much better.
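A minimal sketch of the first option, assuming $output is the string already fetched via cURL and $arrayLimit is the cap from the question:

$arrayLimit = 100000;
$arrayoutput = [];

// php://temp keeps small payloads in memory and spools large ones to a
// temporary file, so wrapping the cURL result in a stream lets us read
// it back line by line without building a second giant array.
$stream = fopen('php://temp', 'r+');
fwrite($stream, $output);
rewind($stream);

// 8192 is an assumed maximum line length; stream_get_line() stops at
// the delimiter or at that many bytes, whichever comes first.
while (count($arrayoutput) < $arrayLimit
    && ($line = stream_get_line($stream, 8192, "\n")) !== false) {
    $arrayoutput[] = rtrim($line, "\r"); // tolerate \r\n line endings
}

fclose($stream);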
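And a sketch of the second option, assuming a hypothetical lines table with a single text column and an existing PDO connection in $pdo:

$stmt = $pdo->prepare("INSERT INTO lines (line) VALUES (?)");

$pdo->beginTransaction();
$separator = "\r\n";
$line = strtok($output, $separator);

while ($line !== false) {
    // Each row goes straight to the database, so PHP never holds
    // more than one line in memory at a time.
    $stmt->execute([$line]);
    $line = strtok($separator);
}
$pdo->commit();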
