UTF BOM

  • celandine
  • Mastermind
  • Posts: 2008
  • Loc: Belgrade, Serbia

Post 3+ Months Ago

I don't even know where to put this post......

W3C warned me I had a UTF BOM in a site I coded, and that it might cause problems. So I googled it and I read and I read, and I still don't get even what it is, much less how it got there (I sure didn't put it in) or how to get it out (they give a Perl script for it, but how do you use a Perl script on an HTML document? paste it in? or attach it externally like CSS?), or even whether I should get it out or not.

Briefly, my question is this: what are the consequences of a UTF BOM contained in your code?

How does it get there?

How do you get it out?

Thanks in advance..........................
  • joebert
  • Fart Bubbles
  • Genius
  • Posts: 13502
  • Loc: Florida

Post 3+ Months Ago

Quote:
they give a Perl script for it, but how do you use a Perl script on an HTML document? paste it in? or attach it externally like CSS?


Chances are you're supposed to pass a directory name when running the script from the command line; the script will then save each file in that directory without the signature/BOM.

What does the script look like?
  • celandine
  • Mastermind
  • Posts: 2008
  • Loc: Belgrade, Serbia

Post 3+ Months Ago

Code:
# program to remove a leading UTF-8 BOM from a file
# works both STDIN -> STDOUT and on the spot (with filename as argument)

use strict;
use warnings;

if ($#ARGV > 0) {
  print STDERR "Too many arguments!\n";
  exit 1;
  }

my @file;  # file content
my $lineno = 0;

my $filename = $ARGV[0];  # single element, so $ARGV[0] rather than the slice @ARGV[0]
if (defined $filename) {
  open( BOMFILE, '<', $filename ) || die "Could not open source file for reading.";
  while (<BOMFILE>) {
    if ($lineno++ == 0) {
      # the UTF-8 BOM is the byte sequence EF BB BF at the very start of the file;
      # s/// returns true only when it actually removed those bytes
      if ( s/^\xEF\xBB\xBF// ) {
        print "BOM found and removed.\n";
        }
      else { print "No BOM found.\n"; }
      }
    push @file, $_ ;
    }
  close (BOMFILE) || die "Can't close source file after reading.";

  # write the (possibly stripped) content back over the same file
  open (NOBOMFILE, '>', $filename) || die "Could not open source file for writing.";
  foreach my $line (@file) {
    print NOBOMFILE $line;
    }
  close (NOBOMFILE) || die "Can't close source file after writing.";
  }
else { # STDIN -> STDOUT
  while (<>) {
    if (!$lineno++) {
      s/^\xEF\xBB\xBF//;
      }
    push @file, $_ ;
    }

  foreach my $line (@file) {
    print $line;
    }
  }
  • joebert
  • Fart Bubbles
  • Genius
  • Posts: 13502
  • Loc: Florida

Post 3+ Months Ago

It works on a single file at a time rather than a whole directory.

On Debian/Linux/etc you would do something like this, assuming the script is named "debom.pl" and the troubled file is named "bom.html".
Code:
perl /home/me/debom.pl /var/www/bom.html


Windows wouldn't be much different, though Perl is less likely to be installed on a Windows system if you don't already know about it.
Code:
perl "c:\documents and settings\me\desktop\debom.pl" "c:\documents and settings\me\desktop\bom.html"


When it's finished, the script will print "BOM found and removed." if the file contained a BOM; otherwise it will print "No BOM found.".
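
If you want to double-check a file yourself, you can look at its first three bytes: a UTF-8 BOM is always the bytes EF BB BF. Here's a rough sketch for a Unix-like shell, assuming head, od and tr are available; the function name first_bytes is just something made up for this example.

```shell
# first_bytes FILE: print the first three bytes of FILE as lowercase hex.
# A file that starts with a UTF-8 BOM prints "efbbbf".
first_bytes() {
  # od dumps the bytes as hex; tr strips the padding and newline od adds
  head -c 3 "$1" | od -An -tx1 | tr -d '[:space:]'
  echo
}
```

Running `first_bytes /var/www/bom.html` before and after the Perl script should show whether the BOM was actually there and whether it is gone.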
  • celandine
  • Mastermind
  • Posts: 2008
  • Loc: Belgrade, Serbia

Post 3+ Months Ago

thanks for that!
  • joebert
  • Fart Bubbles
  • Genius
  • Posts: 13502
  • Loc: Florida

Post 3+ Months Ago

I looked around a bit more on this subject instead of sleeping last night.

I put together a function in PHP to remove a UTF-8 BOM from a file, but couldn't quite figure out how to detect the BOM at the binary level. I ended up converting the 3 potential BOM bytes into an ASCII hexadecimal string & doing a string comparison. :scratchhead:
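
For what it's worth, the same "turn the first 3 bytes into hex and compare strings" idea can be sketched in shell too. This is not the PHP function, just an illustration of the approach; the names has_bom and strip_bom are made up, and it assumes a Unix-like system with head, od, tr and tail.

```shell
# has_bom FILE: succeed if FILE starts with the UTF-8 BOM (bytes EF BB BF).
has_bom() {
  # convert the first three bytes to a hex string and compare it as text
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d '[:space:]')" = "efbbbf" ]
}

# strip_bom FILE: rewrite FILE without its leading BOM, if it has one.
strip_bom() {
  if has_bom "$1"; then
    tmp="$1.nobom.$$"   # temporary copy next to the original (hypothetical name)
    # tail -c +4 outputs everything from the 4th byte on, i.e. skips the BOM
    tail -c +4 "$1" > "$tmp" && cat "$tmp" > "$1" && rm -f "$tmp"
  fi
}
```

Calling strip_bom on a file with no BOM leaves it untouched, so it should be safe to run over the same file more than once.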
  • celandine
  • Mastermind
  • Mastermind
  • User avatar
  • Posts: 2008
  • Loc: Belgrade, Serbia

Post 3+ Months Ago

ok, that's so cool I don't even understand what you said :)

I'll read the blog entry with great interest....

© 1998-2014. Ozzu® is a registered trademark of Unmelted, LLC.