UTF BOM

  • celandine
  • Mastermind
  • Joined: Oct 30, 2007
  • Posts: 2008
  • Loc: Belgrade, Serbia

Post November 23rd, 2007, 8:59 am

I don't even know where to put this post......

W3C warned me I had a UTF BOM in a site I coded, and that it might cause problems. So I googled it, and I read and I read, and I still don't get what it even is, much less how it got there (I sure didn't put it in) or how to get it out (they give a perl script for it, but how do you use a perl script on a html document? paste it in? or attach it externally like css?), or even whether I should get it out at all.

Briefly, my question is this: what are the consequences of a UTF BOM in your code?

How does it get there?

How do you get it out?

Thanks in advance..........................
Eagles may soar in the sky but weasels don't get sucked into jet engines.

celandine designblog
  • joebert
  • Sledgehammer
  • Genius
  • Joined: Feb 10, 2004
  • Posts: 13455
  • Loc: Florida

Post November 23rd, 2007, 12:25 pm

Quote:
they give a perl script for it, but how do you use a perl script on a html document? paste it in? or attach it externally like css?


Chances are you're supposed to pass a directory name when running the script from the command line, then the script will save each file in that directory without the signature/BOM.

What does the script look like?
Strong with this one, the sudo is.
  • celandine
  • Mastermind
  • Joined: Oct 30, 2007
  • Posts: 2008
  • Loc: Belgrade, Serbia

Post November 23rd, 2007, 12:47 pm

Code:
# program to remove a leading UTF-8 BOM from a file
# works both STDIN -> STDOUT and on the spot (with filename as argument)
use strict;
use warnings;

if (@ARGV > 1) {
  print STDERR "Too many arguments!\n";
  exit 1;
  }

my @file;  # file content
my $lineno = 0;

my $filename = $ARGV[0];
if (defined $filename) {
  open( my $in, '<', $filename ) || die "Could not open source file for reading.";
  while (<$in>) {
    if ($lineno++ == 0) {
      # a quoted BOM literal does not survive copy/paste, so test the
      # substitution itself: it is true only if the bytes EF BB BF were there
      if ( s/^\xEF\xBB\xBF// ) {
        print "BOM found and removed.\n";
        }
      else { print "No BOM found.\n"; }
      }
    push @file, $_ ;
    }
  close ($in) || die "Can't close source file after reading.";

  # overwrite the original file with the BOM-free content
  open( my $out, '>', $filename ) || die "Could not open source file for writing.";
  foreach my $line (@file) {
    print $out $line;
    }
  close ($out) || die "Can't close source file after writing.";
  }
else { # STDIN -> STDOUT
  while (<STDIN>) {
    if (!$lineno++) {
      s/^\xEF\xBB\xBF//;
      }
    print;
    }
  }
  • joebert
  • Sledgehammer
  • Genius
  • Joined: Feb 10, 2004
  • Posts: 13455
  • Loc: Florida

Post November 23rd, 2007, 1:37 pm

It works with single files at once instead of a directory.

On Debian/Linux/etc. you would do something like this, assuming the script is named "debom.pl" and the troubled file is named "bom.html".
Code:
perl /home/me/debom.pl /var/www/bom.html


Windows wouldn't be much different, though Perl is less likely to be installed on a Windows system if you don't already know about it.
Code:
perl "c:\documents and settings\me\desktop\debom.pl" "c:\documents and settings\me\desktop\bom.html"


When it's finished, the script will print "BOM found and removed." if the file contained a BOM; otherwise it will print "No BOM found.".
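
Since the script only handles one file per run, here is a rough sketch of the directory idea from earlier in the thread, written in Python rather than Perl. It is only an illustration: the function names `strip_bom` and `strip_bom_in_dir` and the default `*.html` pattern are made up for this example and are not part of the Perl script.

```python
import codecs
from pathlib import Path

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def strip_bom(path: Path) -> bool:
    """Rewrite `path` without a leading UTF-8 BOM; return True if one was removed."""
    data = path.read_bytes()
    if not data.startswith(BOM):
        return False
    path.write_bytes(data[len(BOM):])
    return True


def strip_bom_in_dir(directory: str, pattern: str = "*.html") -> list[str]:
    """Strip the BOM from every matching file directly inside `directory`.

    Returns the names of the files that were changed. Subdirectories are
    not visited in this sketch.
    """
    cleaned = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file() and strip_bom(path):
            cleaned.append(path.name)
    return cleaned
```

Calling `strip_bom_in_dir("/var/www")` would rewrite only the files that actually start with a BOM and leave the rest untouched. Like the Perl script, it overwrites files in place, so try it on a copy first.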
  • celandine
  • Mastermind
  • Joined: Oct 30, 2007
  • Posts: 2008
  • Loc: Belgrade, Serbia

Post November 25th, 2007, 2:15 am

thanks for that!
  • joebert
  • Sledgehammer
  • Genius
  • Joined: Feb 10, 2004
  • Posts: 13455
  • Loc: Florida

Post December 6th, 2007, 6:45 am

I looked around a bit more on this subject instead of sleeping last night.

I put together a function in PHP to remove a UTF-8 BOM from a file, but couldn't quite figure out how to detect the BOM at the binary level. I ended up converting the 3 potential BOM bytes into an ASCII hexadecimal string and doing a string comparison. :scratchhead:
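
To make the hex-comparison trick concrete, here is a small sketch in Python (not the PHP function itself, which isn't shown in this thread). It puts the hex-string check next to a direct byte comparison; the names `has_bom_hex`, `has_bom_bytes` and `remove_bom` are invented for the example. The UTF-8 BOM is the character U+FEFF encoded as the bytes EF BB BF.

```python
import codecs


def has_bom_hex(data: bytes) -> bool:
    """The roundabout way: turn the first three bytes into hex text and compare strings."""
    return data[:3].hex() == "efbbbf"


def has_bom_bytes(data: bytes) -> bool:
    """The direct way: compare the raw bytes against EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def remove_bom(data: bytes) -> bytes:
    """Return `data` without a leading UTF-8 BOM, or unchanged if there is none."""
    return data[len(codecs.BOM_UTF8):] if has_bom_bytes(data) else data
```

Both checks give the same answer. The hex version just takes a detour through text, which is a reasonable fallback when comparing raw bytes feels awkward in a given language.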
  • celandine
  • Mastermind
  • Joined: Oct 30, 2007
  • Posts: 2008
  • Loc: Belgrade, Serbia

Post December 10th, 2007, 3:58 am

ok, that's so cool I don't even understand what you said :)

I'll read the blog entry with great interest....

© 2011 Unmelted, LLC. Ozzu® is a registered trademark of Unmelted, LLC.