Find text in Word documents

Posted By:JosipK       Posted Date: September 10, 2013    Points: 25    Category: .Net Framework    URL: http://www.dotnetspark.com  

Small Windows application that reads multiple DOCX files and performs a search operation on their textual content.


This article shows how to perform a string or regex search on multiple DOCX files in a specified directory.

The accompanying application demonstrates how to read DOCX files, convert them to text, and search that text for a specific string or regex. It is based on the article Show Word file in WPF, which explains the DOCX file format and implements the DOCX reader used here, so I recommend reading that article first.


We will use the same DocxReader class from the article mentioned above to unzip the DOCX files and to read the main DOCX part (document.xml) with XmlReader. We will also implement a converter (DocxToStringConverter) that converts specific XML elements (or their content) from document.xml to strings.


The DocxToStringConverter class inherits from DocxReader and overrides its virtual reading methods to build the text as follows:

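  • while DocxReader is reading the document element (<document>), we create a new StringBuilder that collects all of the DOCX text content:

protected override void ReadDocument(XmlReader reader)
{
       this.text = new StringBuilder();
       // Let the base reader continue through the document's child elements.
       base.ReadDocument(reader);
}

  • after DocxReader has read a paragraph element (<p>), we append a new line to the StringBuilder:

protected override void ReadParagraph(XmlReader reader)
{
       // Read the paragraph's content first, then terminate the line.
       base.ReadParagraph(reader);
       this.text.AppendLine();
}

  • while DocxReader is reading a text element (<t>), we append the content of that element to the StringBuilder:

protected override void ReadText(XmlReader reader)
{
       // Append the text content of the <t> element.
       this.text.Append(reader.ReadString());
}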

This simple Windows Forms user interface lets you search DOCX files in a specified directory (and, optionally, its subdirectories) and shows the search results in a ListView control using the code below:

private void btnSearch_Click(object sender, EventArgs e)
{
       // ...

       foreach (var filePath in Search(this.txtDirectory.Text, this.txtSearch.Text, this.cBoxUseSubdirectories.Checked, this.cBoxCaseSensitive.Checked, this.rBtnRegex.Checked))
       {
              var file = new FileInfo(filePath);
              // Columns: file name, size in KB, full path.
              this.resultListView.Items.Add(new ListViewItem(new string[] { file.Name, string.Format("{0:0.0}", file.Length / 1024d), file.FullName }));
       }
}

Depending on the user's choice, we perform either a regex or a plain string search on the current DOCX file. To accomplish this, we use a Predicate<T> delegate to implement the two search options, as in the following code:

// IndexOf returns -1 when searchString is not found, so a match means a result >= 0.
var isMatch = useRegex
              ? new Predicate<string>
              (x => Regex.IsMatch(x, searchString, caseSensitive ? RegexOptions.None : RegexOptions.IgnoreCase))
              : new Predicate<string>
              (x => x.IndexOf(searchString, caseSensitive ? StringComparison.Ordinal : StringComparison.OrdinalIgnoreCase) >= 0);
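The same choice between the two search modes can be wrapped in a small factory method. This is only a sketch: the names SearchMatcher and Create are hypothetical and not part of the article's code. It constructs the Regex once, so an invalid pattern fails with an ArgumentException before any file is opened rather than in the middle of the search.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SearchMatcher
{
    // Builds a predicate that reports whether a text contains a match.
    public static Predicate<string> Create(
        string searchString, bool useRegex, bool caseSensitive)
    {
        if (useRegex)
        {
            RegexOptions options =
                caseSensitive ? RegexOptions.None : RegexOptions.IgnoreCase;

            // Throws ArgumentException here if the pattern is invalid.
            var regex = new Regex(searchString, options);
            return text => regex.IsMatch(text);
        }

        StringComparison comparison = caseSensitive
            ? StringComparison.Ordinal
            : StringComparison.OrdinalIgnoreCase;

        // IndexOf returns -1 when the string is not found.
        return text => text.IndexOf(searchString, comparison) >= 0;
    }
}
```

A caller would create the predicate once, for example `var isMatch = SearchMatcher.Create("invoice", false, false);`, and then apply it to the text of each document.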

The isMatch delegate is used in a method that iterates over all DOCX files in the specified directory, converts each one to text, and returns the path of every DOCX file that satisfies the delegate, using a C# iterator (yield return statement), as in the following code:

foreach (var filePath in Directory.GetFiles(directory, "*.docx", searchSubdirectories ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly))
{
       string docxText;

       // FileShare.ReadWrite allows reading a file that another program has open.
       using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
              docxText = new DocxToStringConverter(stream).Convert();

       if (isMatch(docxText))
             yield return filePath;
}
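Because the method is an iterator, none of its body runs until the caller starts enumerating, and each file is converted only when the caller asks for the next result. The sketch below illustrates that pattern in a self-contained form. It is an assumption-laden variant, not the article's Search method: FileSearch, Find and the readText parameter are hypothetical names, the text extraction is passed in as a delegate so the example does not depend on DocxToStringConverter, and Directory.EnumerateFiles (available since .NET 4.0) replaces Directory.GetFiles so that the file listing is lazy too.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileSearch
{
    // Lazily yields the path of every file whose extracted text matches.
    public static IEnumerable<string> Find(
        string directory,
        string pattern,
        bool searchSubdirectories,
        Func<Stream, string> readText,
        Predicate<string> isMatch)
    {
        SearchOption option = searchSubdirectories
            ? SearchOption.AllDirectories
            : SearchOption.TopDirectoryOnly;

        foreach (string filePath in Directory.EnumerateFiles(directory, pattern, option))
        {
            string text;

            // Open read-only and allow other processes to keep the file open.
            using (var stream = File.Open(
                filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            {
                text = readText(stream);
            }

            if (isMatch(text))
                yield return filePath;
        }
    }
}
```

One consequence of the lazy evaluation is that exceptions (for example, an unreadable or corrupt file) surface during the caller's foreach loop rather than at the call to Find, so any error handling belongs around the enumeration.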

The resulting DOCX files listed in the ListView control can be activated to show them in your default DOCX viewer (usually Microsoft Word).

private void resultListView_ItemActivate(object sender, EventArgs e)
{
       string filePath = ((ListView)sender).SelectedItems[0].SubItems[2].Text;
       if (File.Exists(filePath))
              Process.Start(filePath);
}
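Opening a document in its default viewer relies on the operating system shell resolving the file association. On .NET Framework, ProcessStartInfo.UseShellExecute defaults to true, but on .NET Core and .NET 5 or later it defaults to false, in which case starting a process directly on a .docx path fails. The following sketch, with the hypothetical names ShellOpen, CreateStartInfo and Open, sets the flag explicitly so the same code behaves the same way on both runtimes.

```csharp
using System.Diagnostics;

public static class ShellOpen
{
    // Builds start info that asks the OS shell to open the file
    // with whatever application is registered for its extension.
    public static ProcessStartInfo CreateStartInfo(string filePath)
    {
        return new ProcessStartInfo(filePath) { UseShellExecute = true };
    }

    // Opens the file in its default viewer.
    public static void Open(string filePath)
    {
        // Process.Start may return null when an existing process is reused.
        Process process = Process.Start(CreateStartInfo(filePath));
        if (process != null)
            process.Dispose();
    }
}
```

In the ItemActivate handler, a call such as `ShellOpen.Open(filePath);` would take the place of starting the process directly.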


The article Show Word file in WPF demonstrated how to convert DOCX to WPF's FlowDocument, and this article demonstrated how to convert DOCX to plain text using the same DOCX reading code. By combining the two, you could, for example, convert DOCX to HTML. Hopefully this article has shown you the basics of reading DOCX files and how to convert DOCX to other representations by reusing the same reading code for each conversion.


Author: arronlee         Company URL: http://www.dotnetspark.com
Posted Date: September 29, 2013

I wonder what the difference is between the tool you mentioned above and the Word document Creator. I prefer to do document processing work with the help of some manual tools. Do you have any ideas?
Author: JosipK         Company URL: http://www.dotnetspark.com
Posted Date: March 01, 2014

I apologize if I misunderstood you, but which tool are you referring to: the DocxReader class?
This article provides a pure .NET Framework solution for reading DOCX files, so in short it is free and open source.

You can rarely find resources about Word processing in .NET that do not use an SDK, and to me that is understandable because, as mentioned in the article, the solution written here is not a complete one. The intention was that people could use it for simple DOCX processing scenarios and, if needed, enhance it with ease.
Covering more complex scenarios will take some time and, depending on the level of coverage required, it can even take an extremely long time.

That is why, for more advanced document processing tasks, I prefer to use a third-party component as well.
The tool you mentioned looks useful; unfortunately, I haven't found time to try it out.
At first glance, though, I was unable to find mail merging features (if they have eluded me, please post a link). To me personally, mail merging and conversion to PDF are two key features that third-party word processing components need to cover. For example, take a look at a Word component I'm working on: http://www.gemboxsoftware.com/document/overview
