Show Word file in WPF

Posted By: JosipK    Posted Date: September 09, 2013    Category: WPF    URL: http://www.dotnetspark.com

A small WPF application that loads a DOCX file, reads it, and displays its content in WPF.


Word 2007 documents are Office Open XML documents: a combination of XML architecture and ZIP compression that stores XML and non-XML files together within a single ZIP archive. These documents usually have the DOCX extension, but there are exceptions for macro-enabled documents, templates, etc.

This article shows how you can read and view a DOCX file in WPF using only .NET Framework 3.0 (without any third-party code).

DOCX overview

A DOCX file is actually a zipped group of files and folders, called a package. A package consists of package parts (files that contain any type of data, such as text, images, or binary data) and relationship files. Each package part has a unique URI name, and the relationship XML files contain these URIs.

When you open the DOCX file with a zipping application you can see the document structure and its package's parts.
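The same structure can be inspected from code. The following is a minimal sketch, assuming the System.IO.Packaging API (part of WindowsBase on .NET Framework); the class name PackageInspector and the method ListParts are hypothetical names chosen for illustration and are not part of the article's application.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;

// Hypothetical helper, not part of the article's DocxReader.
public static class PackageInspector
{
    // Returns one line per package part: its URI followed by its content type.
    public static List<string> ListParts(Stream docxStream)
    {
        var result = new List<string>();

        // Opening the package reads the ZIP structure of the DOCX file.
        using (Package package = Package.Open(docxStream, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
                result.Add(part.Uri + " (" + part.ContentType + ")");
        }

        return result;
    }
}
```

For a typical Word document the list is expected to include a part such as /word/document.xml, although the exact set of parts depends on the file.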


The main content of a DOCX file is stored in the package part document.xml, which is usually located in the word directory, but it does not have to be. To find the URI (location) of document.xml, we should read the package-level relationships XML file inside the _rels directory and look for a relationship of type http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument.


The document.xml file contains XML elements defined primarily in the WordprocessingML XML namespace of the Office Open XML specification. Its basic structure is a document (<document>) element that contains a body (<body>) element. The body element consists of one or more block-level elements, such as paragraph (<p>) elements. A paragraph contains one or more inline-level elements, such as run (<r>) elements. A run contains one or more elements that carry the document's text content, such as text (<t>), break (<br>), and tab (<tab>) elements.
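To make the nesting concrete, here is a minimal sketch that walks a small hand-written WordprocessingML fragment with XmlReader and collects its text. The sample XML, the class name DocumentXmlSketch, and the method ExtractText are assumptions made for illustration; real document.xml files contain many more elements, and this sketch is not the article's DocxReader.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical illustration of the document > body > p > r > t nesting.
public static class DocumentXmlSketch
{
    public const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hand-written sample: one paragraph with two runs, the second starting with a tab.
    public const string Sample =
        "<w:document xmlns:w=\"" + W + "\"><w:body><w:p>" +
        "<w:r><w:t>Hello</w:t></w:r>" +
        "<w:r><w:tab/><w:t>World</w:t></w:r>" +
        "</w:p></w:body></w:document>";

    // Appends the text of <w:t> elements, a tab for <w:tab>,
    // and a newline at the end of every non-empty <w:p> element.
    public static string ExtractText(string documentXml)
    {
        var text = new StringBuilder();
        bool insideText = false;

        using (XmlReader reader = XmlReader.Create(new StringReader(documentXml)))
        {
            while (reader.Read())
            {
                bool isW = reader.NamespaceURI == W;
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (isW && reader.LocalName == "t") insideText = !reader.IsEmptyElement;
                        else if (isW && reader.LocalName == "tab") text.Append('\t');
                        break;
                    case XmlNodeType.EndElement:
                        if (isW && reader.LocalName == "t") insideText = false;
                        else if (isW && reader.LocalName == "p") text.Append('\n');
                        break;
                    case XmlNodeType.Text:
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        if (insideText) text.Append(reader.Value);
                        break;
                }
            }
        }

        return text.ToString();
    }
}
```

Calling DocumentXmlSketch.ExtractText(DocumentXmlSketch.Sample) returns "Hello", a tab, "World", and a trailing newline.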


In short, to retrieve and display the text content of a DOCX file, the application uses two classes: DocxReader and its subclass DocxToFlowDocumentConverter.

DocxReader unzips the file with the help of the System.IO.Packaging namespace, finds the document.xml file through its relationship, and reads it with XmlReader.

DocxToFlowDocumentConverter converts the XML elements read by XmlReader into the corresponding WPF FlowDocument elements.


The DocxReader constructor first opens (unzips) the package from the DOCX file stream and then retrieves the mainDocumentPart (document.xml) with the help of its PackageRelationship.

protected const string MainDocumentRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
private readonly Package package;
private readonly PackagePart mainDocumentPart;

public DocxReader(Stream stream)
{
    if (stream == null)
        throw new ArgumentNullException("stream");

    this.package = Package.Open(stream, FileMode.Open, FileAccess.Read);

    // Resolve document.xml through the main document relationship.
    foreach (var relationship in this.package.GetRelationshipsByType(MainDocumentRelationshipType))
    {
        this.mainDocumentPart = package.GetPart(PackUriHelper.CreatePartUri(relationship.TargetUri));
    }
}

After retrieving the document.xml PackagePart, we can read it with .NET's XmlReader class, a fast, forward-only XML reader that visits nodes in the same order as a depth-first traversal of a tree.

The first path, steps 1 to 4, is the simplest way to retrieve text from a paragraph element. The second path, from step 5 onward, covers more complex paragraph content. Along this path we also read paragraph properties (<pPr>) and run properties (<rPr>), which contain various formatting options.

We create a series of reading methods for every element we wish to support in this path trajectory.

protected virtual void ReadDocument(XmlReader reader)
{
    // Scan forward to the <body> element and hand its subtree to ReadBody.
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.NamespaceURI == WordprocessingMLNamespace && reader.LocalName == BodyElement)
            ReadXmlSubtree(reader, this.ReadBody);
    }
}

private void ReadBody(XmlReader reader) {...}
private void ReadBlockLevelElement(XmlReader reader) {...}
protected virtual void ReadParagraph(XmlReader reader) {...}
private void ReadInlineLevelElement(XmlReader reader) {...}
protected virtual void ReadRun(XmlReader reader) {...}
private void ReadRunContentElement(XmlReader reader) {...}
protected virtual void ReadText(XmlReader reader) {...}

A few things to note about the DocxReader reading methods:

- We use XmlNameTable to store XML namespace, element, and attribute names. This gives us cleaner code and also better performance: XmlReader returns atomized strings from its XmlNameTable for the LocalName and NamespaceURI properties, so these strings can be compared as objects (by reference) rather than with a more expensive string (value) comparison. String equality in .NET also checks reference equality first and only then falls back to value equality.

- We use the XmlReader.ReadSubtree method when passing the XmlReader into a specific DocxReader reading method, to create a boundary around that XML element. The reading method then has access only to that specific XML element rather than to the entire document.xml. This has some performance penalty, which we traded for safer and more intuitive code.

private static void ReadXmlSubtree(XmlReader reader, Action<XmlReader> action)
{
    using (var subtreeReader = reader.ReadSubtree())
    {
        // Position on the first node.
        subtreeReader.Read();
        if (action != null)
            action(subtreeReader);
    }
}
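
The name table technique from the first point can be sketched in isolation. This is a minimal illustration, assuming a NameTable that is handed to the reader through XmlReaderSettings; the class name AtomizedNameSketch and the method CountParagraphs are hypothetical and do not appear in the article's code.

```csharp
using System.IO;
using System.Xml;

// Hypothetical illustration of comparing atomized names by reference.
public static class AtomizedNameSketch
{
    public static int CountParagraphs(string xml)
    {
        var nameTable = new NameTable();

        // Add returns the atomized instance stored in the table.
        string wordNamespace = nameTable.Add("http://schemas.openxmlformats.org/wordprocessingml/2006/main");
        string paragraphName = nameTable.Add("p");

        // The reader is told to atomize its names in the same table.
        var settings = new XmlReaderSettings { NameTable = nameTable };
        int count = 0;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                // Reference comparison is enough because both sides come from one name table.
                if (reader.NodeType == XmlNodeType.Element
                    && object.ReferenceEquals(reader.NamespaceURI, wordNamespace)
                    && object.ReferenceEquals(reader.LocalName, paragraphName))
                {
                    count++;
                }
            }
        }

        return count;
    }
}
```

The comparison only works when the reader and the constants share one name table, which is why the table is passed in through the settings.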


DocxToFlowDocumentConverter inherits from DocxReader and overrides some of its reading methods to create the corresponding WPF FlowDocument elements.

For example, while reading the document element we create a new FlowDocument, while reading a paragraph element we create a new Paragraph, and while reading a run element we create a new Span.

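protected override void ReadDocument(XmlReader reader)
{
    this.document = new FlowDocument();
    base.ReadDocument(reader);
}

protected override void ReadParagraph(XmlReader reader)
{
    // The new Paragraph is the current element while its content is read.
    using (this.SetCurrent(new Paragraph()))
        base.ReadParagraph(reader);
}

protected override void ReadRun(XmlReader reader)
{
    using (this.SetCurrent(new Span()))
        base.ReadRun(reader);
}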

This class also sets some Paragraph and Span properties, which are read from the paragraph property element <pPr> and the run property element <rPr>. By the time XmlReader reads these property elements, we have already created the new Paragraph or Span element, so we only need to set its properties.

Because we move from the parent element (Paragraph) to its child elements (Spans) and back to the parent, we have to track our current element in the FlowDocument with a variable of type TextElement (an abstract base class of Paragraph and Span).

This is accomplished with the help of CurrentHandle and the C# using statement, which is syntactic sugar for a try-finally construct. The SetCurrent method sets the current TextElement, and the Dispose method restores the previous TextElement as the current one.

private struct CurrentHandle : IDisposable
{
    private readonly DocxToFlowDocumentConverter converter;
    private readonly TextElement previous;

    public CurrentHandle(DocxToFlowDocumentConverter converter, TextElement current)
    {
        this.converter = converter;
        this.previous = this.converter.current;
        this.converter.current = current;
    }

    public void Dispose()
    {
        this.converter.current = this.previous;
    }
}

private IDisposable SetCurrent(TextElement current)
{
    return new CurrentHandle(this, current);
}
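
The same save-and-restore idea can be tried outside WPF. Below is a minimal sketch that tracks a plain string instead of a TextElement; the class ScopeTracker and its members Enter, Current, and Demo are hypothetical names used only to show how nested using blocks restore the previous value.

```csharp
using System;

// Hypothetical stand-in for the converter: tracks a "current" name instead of a TextElement.
public sealed class ScopeTracker
{
    private string current = "document";

    public string Current
    {
        get { return this.current; }
    }

    // Makes the given name current until the returned handle is disposed.
    public IDisposable Enter(string name)
    {
        return new Handle(this, name);
    }

    private struct Handle : IDisposable
    {
        private readonly ScopeTracker owner;
        private readonly string previous;

        public Handle(ScopeTracker owner, string current)
        {
            this.owner = owner;
            this.previous = owner.current;
            owner.current = current;
        }

        public void Dispose()
        {
            // Restore whatever was current before this handle was created.
            this.owner.current = this.previous;
        }
    }

    // Records the current name inside nested scopes and after leaving them.
    public static string Demo()
    {
        var tracker = new ScopeTracker();
        string inner, middle;

        using (tracker.Enter("paragraph"))
        {
            using (tracker.Enter("run"))
                inner = tracker.Current;

            middle = tracker.Current;
        }

        return inner + "," + middle + "," + tracker.Current;
    }
}
```

ScopeTracker.Demo() returns "run,paragraph,document": each using block restores the value that was current when it was entered, even if an exception is thrown inside it.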

Using the Code

To get a FlowDocument, all we need to do is create a new DocxToFlowDocumentConverter instance from a DOCX file stream and call the Read method on that instance.

After that we can display the flow document content in WPF application using the FlowDocumentReader control.

using (var stream = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    var flowDocumentConverter = new DocxToFlowDocumentConverter(stream);
    flowDocumentConverter.Read();
    this.flowDocumentReader.Document = flowDocumentConverter.Document;
    this.Title = Path.GetFileName(path);
}


DOCX Reader is not a complete solution and is intended for simple scenarios (without tables, lists, pictures, headers/footers, styles, etc.). The application can be enhanced to read more DOCX features, but full DOCX support with all advanced features would require a lot more time and knowledge of the DOCX file format. Hopefully, this article and the accompanying application have given you some insight into the DOCX file format and can serve as a basis for more complex DOCX-related applications.

Also check out the article "Find text in Word document", which uses the same DocxReader class to convert DOCX files to a string.
