Searchable Word Documents Programming

  • devilwood
  • Silver Member
  • Posts: 436

Post 3+ Months Ago

We have a folder full of Word documents that serve as examples for specific purposes. There are over 1,000 of them. The filenames are not always descriptive; some are just named AC120.doc, and that document may turn out to be a thank-you letter template.

We would like a way to search the documents for keywords, for example entering the search term "Thank You" and finding every document that contains it. My first guess would be to read each doc with PHP and then maybe run a regexp over it, but that seems like a lot of processing and memory. Another, more time-consuming option would be for me to go through each document, tag it, build a tag reference database, and match search terms against the tags.

I figured there had to be a Python, Java, or VB program out there that allows searching for keywords within Word documents.

As a stretch goal, it would be nice to have a small preview of each result.

Is this possible, and are there any recommendations?

My Google searches seem to pull up software that just recreates the Windows search utility.
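
For the "small preview" part of the question, here is a minimal Python sketch of what such a preview could look like once a document's text has already been extracted. The function name `find_snippets` and the `width` parameter are made up for illustration; nothing here reads Word files by itself.

```python
import re


def find_snippets(text, phrase, width=40):
    """Return a short one-line preview around every case-insensitive match."""
    # re.escape keeps characters like "." or "?" in the phrase literal.
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        # Collapse line breaks and repeated spaces so the preview stays on one line.
        snippets.append(" ".join(text[start:end].split()))
    return snippets


if __name__ == "__main__":
    sample = "Dear customer,\nThank you for your order. We appreciate it."
    for snippet in find_snippets(sample, "thank you", width=15):
        print(snippet)
```

The search is a plain substring match with a little context on either side, which is probably enough for a results list; anything fancier (stemming, ranking) would be a separate step.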
  • spork
  • Brewmaster
  • Silver Member
  • Posts: 6252
  • Loc: Seattle, WA

Post 3+ Months Ago

Important distinction: are you working with .doc or .docx?
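
To illustrate why the distinction matters: a .docx file is a ZIP container holding XML, while a legacy .doc file is a binary OLE compound file, so the two need different readers. A rough Python sketch for telling them apart by their first bytes, regardless of what the extension says, might look like this. The helper names `sniff_word_format` and `sniff_file` are hypothetical.

```python
# Leading bytes of the two container formats.
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP local file header (.docx)
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"    # OLE compound file (.doc)


def sniff_word_format(header):
    """Guess the container type from the first bytes of a file.

    Note: this only identifies the container. Other Office files
    (.xlsx, .xls, ...) share the same signatures.
    """
    if header.startswith(ZIP_MAGIC):
        return "docx"
    if header.startswith(OLE_MAGIC):
        return "doc"
    return "unknown"


def sniff_file(path):
    """Read the first 8 bytes of a file and classify it."""
    with open(path, "rb") as handle:
        return sniff_word_format(handle.read(8))
```

Something like `sniff_file("AC120.doc")` would then report which reader a given file needs.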
  • Zealous
  • Guru
  • Posts: 1244
  • Loc: Sydney

Post 3+ Months Ago

I assume this is for an office. I like to use websites for database management, so you could set up a local web server and keep all the documents on that local site.

A simple version could use plain HTML or just paste the text onto pages. Better yet, run it through PHP and build a search feature and a tag feature on top of a MySQL database.

You could even get fancy and have some JavaScript build forms, so that you generate your documents by filling in the blanks, such as names, addresses, and numbers.

Something simpler would be to arrange the documents into a folder system by category and then share the folders across the network for easy access.
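
As a rough illustration of the tag-database idea, here is a small sketch. It swaps in Python's built-in `sqlite3` module for the PHP and MySQL setup described above, purely to keep the example self-contained. The table name `doc_tags`, both function names, and the filename AC121.doc are assumptions made up for the example.

```python
import sqlite3


def build_tag_db(tagged_docs):
    """Create an in-memory tag table from a {filename: [tags]} mapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_tags (filename TEXT NOT NULL, tag TEXT NOT NULL)"
    )
    # Store tags lower-cased so lookups can ignore case.
    rows = [
        (filename, tag.lower())
        for filename, tags in tagged_docs.items()
        for tag in tags
    ]
    conn.executemany("INSERT INTO doc_tags (filename, tag) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def files_for_tag(conn, tag):
    """Return the sorted filenames carrying the given tag."""
    cursor = conn.execute(
        "SELECT DISTINCT filename FROM doc_tags WHERE tag = ? ORDER BY filename",
        (tag.lower(),),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    db = build_tag_db({
        "AC120.doc": ["Thank You", "letter"],
        "AC121.doc": ["invoice"],  # hypothetical second file
    })
    print(files_for_tag(db, "thank you"))
```

The lookups are cheap, but the tagging itself still has to be done by hand or by some extraction step, so this mainly helps once the tags exist.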
  • Satwant
  • Graduate
  • Posts: 127
  • Loc: Bangalore

Post 3+ Months Ago

If you are using Windows, then Windows search can help.
Go to Search -> Documents (word processing, spreadsheet, etc.).

Click the advanced search option; it will open a new text box labeled "A word or phrase in the document".

Type the word and it will find any Office document containing that keyword.
  • devilwood
  • Silver Member
  • Posts: 436

Post 3+ Months Ago

They appear to be a mix of .doc and .docx. Windows search was all right, but we really need more of a preview. Ideas?

Zealous, that was kind of my question. You're saying to read them in with PHP and create a tag database, which is what I was thinking, but it just seemed like a lot of memory and processing power.

We are actually trying Copernic Desktop, which appears to just extend Windows search functionality but does provide a nice preview pane. This may work for the few who need it until I can perhaps clean the folder up into categories.
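
On the memory and processing worry: for the .docx files at least, a script can open one file at a time and keep only the matching paragraphs, so memory use stays small even with 1,000+ documents. Below is a hedged sketch using only the Python standard library. The function names `docx_text` and `search_folder` are made up, and it does not handle the legacy .doc files, which would need a separate reader or converting first.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# XML namespace used by WordprocessingML inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def search_folder(folder, phrase):
    """Yield (path, matching_paragraph) pairs, reading one file at a time."""
    needle = phrase.lower()
    for path in sorted(Path(folder).rglob("*.docx")):
        try:
            text = docx_text(path)
        except (zipfile.BadZipFile, KeyError, ET.ParseError):
            continue  # not a readable .docx archive, so skip it
        for line in text.splitlines():
            if needle in line.lower():
                yield path, line
```

Looping over `search_folder(some_folder, "Thank You")` and printing each pair would give a crude filename-plus-preview listing. Headers, footers, and text boxes live in other parts of the archive and are ignored here.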
  • spork
  • Brewmaster
  • Silver Member
  • Posts: 6252
  • Loc: Seattle, WA

Post 3+ Months Ago

PHP? SQL? Javascript? That seems largely overkill for something like this.

devilwood: I'm assuming you're on Windows? You might want to consider using PowerShell to do this. I haven't done it personally but I know PS is easily capable of things like this. A quick search brings up a few promising results.
  • devilwood
  • Silver Member
  • Posts: 436

Post 3+ Months Ago

Got it. That's what I was looking for. Thanks for your help.


© 1998-2014. Ozzu® is a registered trademark of Unmelted, LLC.