PHP IMAP Reading Message Body Character Set Problem

  • devilwood
  • Silver Member

Post 3+ Months Ago

We have a POP/IMAP email bin set up that receives emails containing client follow-up information.

Here is how I get my message body from one of the messages:

Code:
$body = imap_qprint(imap_fetchbody($connection, $msgno, 2));


If I print $body to the screen, I get HTML-formatted output with this meta tag:

Code:
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
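The same charset is normally declared in the MIME headers of the part itself, and the part's transfer encoding says whether imap_qprint() is the right decoder in the first place. The sketch below reads both from a part object shaped like the ones imap_fetchstructure() returns (`encoding`, `ifparameters`, and `parameters` entries with `attribute` and `value`). `partCharset()` and `decodePart()` are hypothetical helper names, `$structure->parts[1]` assumes section 2 is a direct child of a multipart message, and the live IMAP calls are left as comments so the helpers can be tried with a hand-built object.

```php
<?php
// Hypothetical helpers. $part is assumed to look like an element of
// imap_fetchstructure($connection, $msgno)->parts.

// Return the part's declared charset, or $default when none is given
// (RFC 2045 makes US-ASCII the default for text parts).
function partCharset(object $part, string $default = 'US-ASCII'): string
{
    if (!empty($part->ifparameters)) {
        foreach ($part->parameters as $param) {
            if (strcasecmp($param->attribute, 'charset') === 0) {
                return $param->value;
            }
        }
    }
    return $default;
}

// Undo the Content-Transfer-Encoding. 3 and 4 are the values of the
// IMAP constants ENCBASE64 and ENCQUOTEDPRINTABLE.
function decodePart(string $raw, int $encoding): string
{
    switch ($encoding) {
        case 3:
            return base64_decode($raw);
        case 4:
            return quoted_printable_decode($raw);
        default:
            return $raw; // 7bit, 8bit or binary: nothing to undo
    }
}

// Usage against a live mailbox (not run here):
// $structure = imap_fetchstructure($connection, $msgno);
// $part      = $structure->parts[1];              // section "2"
// $body      = decodePart(imap_fetchbody($connection, $msgno, '2'), $part->encoding);
// $charset   = partCharset($part);                // e.g. "Windows-1252"
```

Knowing the declared charset up front means the later conversion step does not have to guess it.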



I need to parse the information, but there are no delimiters. However, the text is fairly structured, like:

Code:

TELEPHONE:
HOME: _ WORK #: _ MOBILE #: (555) 555-5555
Email: dan@mail.net
Father's Name:



It appears I can use strpos, substr and strlen to get the text between, for instance, 'HOME:' and 'WORK #:'. In this case that would be just an underscore. My problem is with a field like Father's Name:.
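A minimal version of that 'get text between' approach might look like this; `textBetween()` is a hypothetical name, not the poster's actual function. Because strpos() compares bytes, a label typed with a straight apostrophe cannot find a curly apostrophe stored as a different byte, so this kind of helper fails on exactly the Father's Name field.

```php
<?php
// Hypothetical "get text between" helper built from strpos, substr and strlen.
// Returns the trimmed text between $start and $end, or null if either marker is missing.
function textBetween(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);               // skip past the start marker
    $to = strpos($haystack, $end, $from);  // look for the end marker after it
    if ($to === false) {
        return null;
    }
    return trim(substr($haystack, $from, $to - $from));
}

$line = 'HOME: _ WORK #: _ MOBILE #: (555) 555-5555';
var_dump(textBetween($line, 'HOME:', 'WORK #:'));   // string(1) "_"
```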

My 'get text between' function does not seem to match the apostrophe, or commas for that matter.

I'm just not having any luck converting the character set.

Code:
$converTo = 'ISO-8859-1';
$current_encoding = mb_detect_encoding($body, 'auto');
$body = mb_convert_encoding($body,$converTo,$current_encoding);


I've tried several different methods. At this point I just need some guidance. How do I handle apostrophes when using PHP IMAP to read email message bodies?
  • Bigwebmaster
  • Site Admin
  • Loc: Seattle, WA & Phoenix, AZ

Post 3+ Months Ago

If it were me, I would probably just use preg_match and write something like:

PHP Code:
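$matches = array();

// Get Home: the text between "HOME:" and "WORK" on the same line.
// The m modifier lets ^ match at the start of any line, not just the start of $body.
if(preg_match("/^HOME:\s*(.*?)\s*WORK/m", $body, $matches)) {
   $home = $matches[1];
}

// Get Father's Name: everything after the label up to the end of its line.
// A lazy (.*?) with nothing after it always matches an empty string, so use (.*)$ instead.
if(preg_match("/^Father's Name:\s*(.*)$/m", $body, $matches)) {
   $father = trim($matches[1]);
}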


If your format always stays the same, you may be able to build one larger pattern that captures all of the fields at once (or use preg_match_all if the same block repeats within a message). I broke it apart above for simplicity.
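As a sketch of the single-pattern idea, assuming the labels always appear in the order shown in the first post, named groups can pull several fields out in one preg_match call. The sample `$body` is made up from the first post's example text. If a message could contain several such blocks, preg_match_all() with the same pattern would return every occurrence.

```php
<?php
// Sample body built from the example text in the first post (assumed layout).
$body = "TELEPHONE:\nHOME: _ WORK #: _ MOBILE #: (555) 555-5555\nEmail: dan@mail.net\n";

// One pattern, one named group per field. /m makes ^ and $ work per line.
$pattern = '/^HOME:\s*(?<home>.*?)\s*WORK #:\s*(?<work>.*?)\s*MOBILE #:\s*(?<mobile>.*?)\s*$'
         . '\s*^Email:\s*(?<email>\S+)/m';

if (preg_match($pattern, $body, $m)) {
    // prints: _ | _ | (555) 555-5555 | dan@mail.net
    echo $m['home'], ' | ', $m['work'], ' | ', $m['mobile'], ' | ', $m['email'], "\n";
}
```

The trade-off is brittleness: if one label is missing or reordered, the whole match fails, which is why separate patterns per field are often easier to debug.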
  • devilwood
  • Silver Member

Post 3+ Months Ago

Actually, some other issues prevent me from doing that for every element, even though the format is always the same. However, I can do it piecemeal: for the elements with apostrophes I can just use a preg_match and take them out of my 'get text between' function.

I'm going to work that up using your Get Father's Name example. Thanks BWM.
  • devilwood
  • Silver Member

Post 3+ Months Ago

I think I'm running into the same problem: preg_match won't match the string because of the apostrophe. I'm betting 100,000,000 to 1 it's the apostrophe. The preg_match 'if' statement isn't firing, which I believe means the pattern is not matching the string.

My problem is that I don't know exactly what PHP is reading. If I did, I could just put that in my preg_match or strpos call.
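One way to see exactly what PHP is reading is to dump the bytes around a label with bin2hex(). `showBytes()` is a hypothetical helper and the sample body is simulated. For reference, in Windows-1252 a curly apostrophe is the single byte 92, whereas in UTF-8 the same character (U+2019) is the three bytes e2 80 99, so the dump tells you which one the body contains.

```php
<?php
// Hypothetical helper: hex dump of $length bytes starting at $label.
// Returns an empty string if the label is not found.
function showBytes(string $body, string $label, int $length = 8): string
{
    $pos = strpos($body, $label);
    if ($pos === false) {
        return '';
    }
    return bin2hex(substr($body, $pos, $length));
}

// Simulated body: "\x92" is the Windows-1252 byte for a curly apostrophe.
$body = "Father\x92s Name: Dan";

// 4661746865729273: "Father" is 466174686572, then 92 is the apostrophe, 73 is "s"
echo showBytes($body, 'Father'), "\n";
```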

Otherwise, I would need a slightly less restrictive preg_match that ignores the apostrophe.
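A less restrictive pattern can accept the common apostrophe variants rather than depend on one exact byte. This sketch assumes the body has not been converted, so the apostrophe might be a straight `'`, the Windows-1252 byte 0x92, the UTF-8 bytes E2 80 99, or an HTML entity; that list of alternatives is a guess at what a mail client might produce, not something confirmed in this thread.

```php
<?php
// Alternatives for the apostrophe (assumptions about what the body might contain):
// straight ', Windows-1252 byte 0x92, UTF-8 bytes E2 80 99, or HTML entities.
$apostrophe = '(?:\'|\x92|\xE2\x80\x99|&#8217;|&rsquo;|&#39;)';

// No /u modifier: the pattern is matched byte by byte.
// /m lets $ match at the end of each line, so (.*) stops at the line break.
$pattern = '/Father' . $apostrophe . 's Name:\s*(.*)$/m';

$samples = [
    "Father's Name: Dan Sr.",
    "Father\x92s Name: Dan Sr.",
    "Father\xE2\x80\x99s Name: Dan Sr.",
    "Father&#8217;s Name: Dan Sr.",
];

foreach ($samples as $body) {
    if (preg_match($pattern, $body, $m)) {
        echo trim($m[1]), "\n";   // "Dan Sr." for every sample
    }
}
```

A looser alternative such as `Father.{1,7}s Name:` would also work, but listing the forms explicitly makes it obvious which encodings you expect to see.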
  • devilwood
  • Silver Member

Post 3+ Months Ago

Got it.

I connected Outlook to the mailbox and viewed the message there. In Outlook's reading pane I highlighted the apostrophe, copied it and pasted it into the expression in my PHP script. It pasted as a curly, backwards-looking apostrophe in my PHP editor, and the expression now recognizes it.


I should now be able to run comparisons for that symbol in my PHP script.
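Instead of pasting the character into the script, the comparison can be written with an escape sequence, which keeps working even if the editor later saves the file in a different encoding. Which escape matches depends on the encoding of `$body` at that point (an assumption to check, for example by dumping the bytes with bin2hex()); the sample body here is simulated. The `\u{...}` escape needs PHP 7.0 or later.

```php
<?php
// Two spellings of the same label with a curly apostrophe:
$cp1252Label = "Father\x92s Name:";      // raw Windows-1252 body (byte 0x92)
$utf8Label   = "Father\u{2019}s Name:";  // body converted to UTF-8 (bytes E2 80 99)

$body = "Father\x92s Name: Dan";         // simulated, unconverted body

var_dump(strpos($body, $cp1252Label));   // int(0)
var_dump(strpos($body, $utf8Label));     // bool(false)
```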

In this case, character conversion doesn't seem to do what I would expect. I would assume the PHP conversion functions would do the heavy lifting of finding the correct match. For instance, if I have a string in Windows-1252 and I want UTF-8, I would just use iconv or something similar and get the same string in UTF-8. I understand it's not always a perfect match, but apostrophes... c'mon man. PHP should find the equivalent in any character set. However, I get black diamonds with question marks inside of them.
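The black diamonds are most likely U+FFFD, the replacement character that browsers typically draw for bytes that are not valid in the encoding a page is displayed in. A plausible cause (an assumption, not confirmed in the thread) is that the raw byte 0x92 reached a page being read as UTF-8, where a lone 0x92 is invalid. Note also that ISO-8859-1 has no curly apostrophe at all (0x92 is a control code there), and mb_detect_encoding() with 'auto' generally does not consider Windows-1252. A minimal sketch, assuming the body really is Windows-1252 as its meta tag says, names the source charset explicitly instead of detecting it:

```php
<?php
// Simulated decoded body, assumed to be Windows-1252.
$raw = "Father\x92s Name: Dan";

// Name the source charset explicitly rather than relying on mb_detect_encoding().
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

// 0x92 becomes U+2019 RIGHT SINGLE QUOTATION MARK, i.e. bytes E2 80 99.
echo bin2hex(substr($utf8, 6, 3)), "\n";   // e28099

// Optional: fold typographic quotes to ASCII so plain "'" comparisons work.
$plain = str_replace(
    ["\u{2018}", "\u{2019}", "\u{201C}", "\u{201D}"],
    ["'", "'", '"', '"'],
    $utf8
);
echo $plain, "\n";                         // Father's Name: Dan
```

The page that prints the result must then also be served as UTF-8; otherwise the browser will misread the converted bytes in the same way.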

Thanks again for your help. This along with the preg_match should get me rolling.

© 1998-2014. Ozzu® is a registered trademark of Unmelted, LLC.