PDF - MOBI conversion question :)

  • celandine
  • Mastermind
  • Posts: 2008
  • Loc: Belgrade, Serbia

Post 3+ Months Ago

Aaaand here I am out of the blue again with a totally random question :D

Husband is trying to convert a PDF e-book into MOBI format so he can read it on his Kindle, except that when he does the conversion, the header and footer end up stuck in random places in the middle of the text on each page.

So he tried to remove the header and footer, and the program he's using (Calibre) offers a wizard that generates code for header and footer removal in something called REGEX (regular expressions), which I'd never heard of. But anywho, the generic code it offers looks like this:

Code: [ Select ]
(?i)(?<=<hr>)((\s*<a name=\d+></a>((<img.+?>)*<br>\s*)?\d+<br>\s*.*?\s*)|(\s*<a name=\d+></a>((<img.+?>)*<br>\s*)?.*?<br>\s*\d+))(?=<br>)


Aaaaaand I can't for the life of me figure out how that should be amended to actually work on this individual book. The actual header of the book, when you look at it in code form, looks like this:

Code: [ Select ]
<A name=2></a>Header text<br>


anybody got any ideas? Also, am I in the right forum? Thanks in advance :D
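
One way to see why the generic pattern does nothing on this book is to try it outside Calibre. The sketch below is only an illustration in Python: the two sample strings are made up (the book's real markup around the header is not shown in this thread), and it assumes Calibre's regex behaves like Python's `re` module here. Reading the pattern, both of its alternatives expect a page number (`\d+`) on its own line next to the header text, plus an `<hr>` just before the anchor, and the header quoted above has no page number.

```python
import re

# Calibre's generic header pattern, copied verbatim from the post above.
DEFAULT_PATTERN = (
    r"(?i)(?<=<hr>)((\s*<a name=\d+></a>((<img.+?>)*<br>\s*)?\d+<br>\s*.*?\s*)"
    r"|(\s*<a name=\d+></a>((<img.+?>)*<br>\s*)?.*?<br>\s*\d+))(?=<br>)"
)

# Hypothetical page starts, invented for this sketch.
# 1) header text followed by a page number line: the shape the pattern expects
with_page_number = "<hr><A name=2></a>Header text<br>\n2<br>Body text"
# 2) header text only, as in this book: no page number to latch onto
without_page_number = "<hr><A name=2></a>Header text<br>\nBody text<br>"


def default_matches(html: str) -> bool:
    """Return True if the generic pattern finds a header in html."""
    return re.search(DEFAULT_PATTERN, html) is not None


print(default_matches(with_page_number))     # True
print(default_matches(without_page_number))  # False
```

So if the book's pages look like the second sample, the pattern needs to be replaced with one that matches just the anchor, the header text, and the `<br>`.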
  • SpooF
  • ٩๏̯͡๏۶
  • Bronze Member
  • Posts: 3422
  • Loc: Richland, WA

Post 3+ Months Ago

Hmm, just so I understand: this program gave you some REGEX code. Did it also run the code on your book, and it didn't do what you needed?

I'm not much of a REGEX person, so I can't tell you right off the bat if the expression is wrong. I'd have to play around with it a bit first.
  • celandine
  • Mastermind
  • Posts: 2008
  • Loc: Belgrade, Serbia

Post 3+ Months Ago

Yeah, when we run the code as is, it doesn't seem to do anything. It could be that we're just hapless and doing it wrong though.

How to run it was not entirely clear to me either. The thing has a little checkbox that says 'remove header', and next to it is that line of REGEX code, which you can modify. So we were working on the assumption that when you tick the box it runs the code automatically (we saw no button that said 'run' or anything), and if you modify the code it runs the modified version, I guess.

If you were feeling extra special awesome and kind, you could always download Calibre (it's around 30 megs of freeware I think) and see for yourself :D
  • SpooF
  • ٩๏̯͡๏۶
  • Bronze Member
  • Posts: 3422
  • Loc: Richland, WA

Post 3+ Months Ago

Is the expression above the default, or did you use the little magic wand (wizard) to the right to make it?
  • joebert
  • Fart Bubbles
  • Genius
  • Posts: 13502
  • Loc: Florida

Post 3+ Months Ago

Based on what I can gather from your post, I would try this regex pattern:

Code: [ Select ]
<[aA]\s+[nNaAmMeE]{4}=\d+>\s*</[aA]>[^<]+<[bBrR]{2}
  • celandine
  • Mastermind
  • Posts: 2008
  • Loc: Belgrade, Serbia

Post 3+ Months Ago

Spoof - yeah, that was the default. The wizard wand thingie just lets you modify code, it didn't seem to generate anything new on its own. That we noticed anyway.

Joe - you are a prince! *kisses loudly on both cheeks*

Worked like a charm. We still have the '>' sign appearing at random places throughout the text (mostly at the beginning of each paragraph), but we can totally live with that :D

Big thanks to the both of you for bothering to help out :D That's what I love about this place - whatever the problem, I just know someone's gonna crack it. Thanks guys!!!
  • joebert
  • Fart Bubbles
  • Genius
  • Posts: 13502
  • Loc: Florida

Post 3+ Months Ago

Add a > at the end of the pattern. I omitted it before because I wasn't sure whether the input would use <br> or <br/> and I didn't know whether the regex engine being used needed to have forward slashes escaped. Looking back now though, I guess the forward slash in the </a> should have tipped me off.
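
To illustrate where the stray '>' comes from, here is a minimal Python sketch. The sample page string is made up, and it assumes Calibre's regex engine treats these patterns the same way Python's `re` module does. The `TOLERANT` variant, which also accepts `<br/>`, is an extra added for this example and was not tested against the actual book.

```python
import re

# The pattern as first posted: it stops after "<br", leaving the ">" behind.
POSTED = r"<[aA]\s+[nNaAmMeE]{4}=\d+>\s*</[aA]>[^<]+<[bBrR]{2}"

# The fix described above: add ">" at the end of the pattern.
FIXED = POSTED + ">"

# Assumption for illustration only: also accept "<br/>" and "<br />".
TOLERANT = POSTED + r"\s*/?>"

# Hypothetical page content built around the header quoted in the first post.
page = "<A name=2></a>Header text<br>First paragraph of the page.<br>"

print(re.sub(POSTED, "", page))  # >First paragraph of the page.<br>
print(re.sub(FIXED, "", page))   # First paragraph of the page.<br>
```

Note that in Python's `re` a forward slash is an ordinary character and needs no escaping, which is consistent with the unescaped `</[aA]>` already working in the first version.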
  • celandine
  • Mastermind
  • Posts: 2008
  • Loc: Belgrade, Serbia

Post 3+ Months Ago

thanks so much :D
