Searchable Word Documents Programming

  • devilwood
  • Silver Member
  • Joined: Nov 18, 2007
  • Posts: 429
  • Status: Offline

Post October 24th, 2012, 7:28 am

We have a folder full of Word documents that serve as examples for specific purposes. There are over 1000 of them. The filenames are not always descriptive; some are just named something like AC120.doc, and that document may be a thank-you letter template.

We would like a way to search the documents for keywords, e.g. enter the search term "Thank You" and have it find all the documents containing that phrase. My first guess would be to read each doc with PHP and then maybe run a regex over it, but that seems like a lot of processing and memory. Another, more time-consuming option would be for me to go through each document, tag it, build a tag reference database, and then just match search terms against the tags.

I figured there had to be a Python, Java, or VB program out there that allows searching for keywords within Word documents.

As a stretch goal, it would be nice to have a small preview of the results.

Is this possible and/or any recommendations?

My Google searches seem to pull up software that just recreates the Windows search utility.
  • spork
  • Brewmaster
  • Silver Member
  • Joined: Sep 22, 2003
  • Posts: 6134
  • Loc: Seattle, WA
  • Status: Offline

Post October 24th, 2012, 1:13 pm

Important distinction: are you working with .doc or .docx?
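
The distinction matters because the two formats are stored very differently: a .docx file is a ZIP archive of XML parts, while a legacy .doc file is a binary OLE2 compound file that needs a dedicated parser (or Word itself) to read. A minimal sketch, assuming Python is available, that sorts files by their leading bytes instead of trusting the extension. The function name `word_format` is made up for this example, and a ZIP signature alone does not prove the file is Word rather than some other ZIP-based format.

```python
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP container (.docx, but also .xlsx, .pptx)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # OLE2 compound file (legacy .doc)

def word_format(path):
    """Guess the container format from the first bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(ZIP_MAGIC):
        return "docx"
    if head.startswith(OLE2_MAGIC):
        return "doc"
    return "unknown"
```

Running this over the folder first shows how many files fall into each group, including any that were saved with the wrong extension.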
The Beer Monocle. Classy.
  • Zealous
  • Guru
  • Joined: Apr 15, 2011
  • Posts: 1201
  • Loc: Sydney
  • Status: Online

Post October 25th, 2012, 2:12 am

I assume this is for the office, and I like to use websites for database management. Say you had a local web server set up, with all the documents stored in that local site.

A simple version could be plain HTML with the text just slapped on, or better yet, run it through PHP and have a search feature and a tag feature backed by a MySQL database.

You could even get fancy and have some JavaScript build forms, so you can generate your documents by filling in a form for the blanks: names, addresses, numbers, and so on.

Something simpler would be to set up a folder system that sorts the documents into categories, then share the folders across the network for easy access.
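
The tag-database idea fits in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module in place of PHP and MySQL, and the table layout, function names, and sample tags are all invented for the example.

```python
import sqlite3

def create_tag_db(path=":memory:"):
    """Open (or create) a database with one row per (filename, tag) pair."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_tags ("
        " filename TEXT NOT NULL,"
        " tag TEXT NOT NULL,"
        " PRIMARY KEY (filename, tag))"
    )
    return conn

def tag_document(conn, filename, tags):
    """Attach tags to a document; tags are stored trimmed and lowercased."""
    conn.executemany(
        "INSERT OR IGNORE INTO doc_tags (filename, tag) VALUES (?, ?)",
        [(filename, t.strip().lower()) for t in tags],
    )
    conn.commit()

def find_by_tag(conn, term):
    """Return sorted filenames that have a tag containing the search term."""
    rows = conn.execute(
        "SELECT DISTINCT filename FROM doc_tags"
        " WHERE tag LIKE ? ORDER BY filename",
        ("%" + term.strip().lower() + "%",),
    )
    return [row[0] for row in rows]
```

With this in place, `tag_document(conn, "AC120.doc", ["thank you", "letter"])` followed by `find_by_tag(conn, "Thank You")` would return that filename. The lookup itself is cheap; the real cost is the manual tagging of 1000+ files.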
  • Satwant
  • Graduate
  • Joined: Dec 27, 2010
  • Posts: 126
  • Loc: Bangalore
  • Status: Offline

Post October 30th, 2012, 12:10 am

If you are using Windows, then Windows search can help.
Go to Search -> Documents (word processing, spreadsheet, etc.)

Click "Use advanced search options"; it will open a new text box labeled "A word or phrase in the document".

Type the word and it will find any Office document containing that keyword.
Thank You
Satwant Singh Hundal
http://www.mrhundal.com
  • devilwood
  • Silver Member
  • Joined: Nov 18, 2007
  • Posts: 429
  • Status: Offline

Post October 31st, 2012, 1:51 pm

They appear to be a mix of .doc and .docx. Windows search was alright, but we really need more of a preview. Ideas?

Zealous, that was kind of my question. You're saying read them in with PHP and create a tag database, which is what I was thinking, but it just seemed like a lot of memory and processing power.

We are actually trying Copernic Desktop Search, which appears to just extend Windows search functionality but does provide a nice preview pane. This may work for the few people who need it until I can perhaps clean the folder up into categories.
  • spork
  • Brewmaster
  • Silver Member
  • Joined: Sep 22, 2003
  • Posts: 6134
  • Loc: Seattle, WA
  • Status: Offline

Post October 31st, 2012, 3:50 pm

PHP? SQL? Javascript? That seems largely overkill for something like this.

devilwood: I'm assuming you're on Windows? You might want to consider using PowerShell to do this. I haven't done it personally but I know PS is easily capable of things like this. A quick search brings up a few promising results.
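
For the .docx files specifically, a script does not even need Word installed, since the body text lives in word/document.xml inside the ZIP. Below is a rough sketch in Python rather than PowerShell, so it is not what those search results show. The function names and the 40-character preview width are assumptions for illustration. Legacy .doc files are not handled because they need a different parser, and headers, footers, and text boxes are not treated carefully.

```python
import re
import zipfile
from pathlib import Path
from xml.etree import ElementTree

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text(path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as zf:
        root = ElementTree.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        # A paragraph's text is split across runs; join the <w:t> pieces.
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return "\n".join(paragraphs)

def search_folder(folder, term, width=40):
    """Yield (filename, preview) for each .docx under folder containing term."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    for path in sorted(Path(folder).rglob("*.docx")):
        try:
            text = docx_text(path)
        except (zipfile.BadZipFile, KeyError, ElementTree.ParseError):
            continue  # not a readable .docx; skip it
        match = pattern.search(text)
        if match:
            start = max(match.start() - width, 0)
            end = match.end() + width
            # Collapse whitespace so the preview fits on one line.
            yield path.name, " ".join(text[start:end].split())
```

Calling `for name, preview in search_folder(folder, "thank you"): print(name, "|", preview)` lists each matching file with the text surrounding its first hit, which covers the preview requirement for the .docx portion of the folder.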
  • devilwood
  • Silver Member
  • Joined: Nov 18, 2007
  • Posts: 429
  • Status: Offline

Post November 5th, 2012, 9:12 am

Got it. That's what I was looking for. Thanks for your help.

© 2011 Unmelted, LLC. Ozzu® is a registered trademark of Unmelted, LLC.