.NET Tutorials, Forums, Interview Questions And Answers

Remove HTML tags from text extracted from HTML content

Posted Date: October 17, 2010    Points: 0    Category: .NET Framework

Hi there

I have been trying to extract text from HTML content, but the extracted text still contains some HTML tags. What should I do to extract only plain text, without tags or scripts? I am testing on different web pages, so the HTML elements are unknown and change from time to time. I am also using the Html Agility Pack. Here is some of the code:

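HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(new StringReader(result));   // result holds the downloaded page (needs using System.IO;)
HtmlNode bodyNode = doc.DocumentNode.SelectSingleNode("//body");
string results = bodyNode.InnerText;  // InnerText is already a string, so .ToString() is not needed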

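Regex code:

public string removehtmltags_Regex(string source)
{
    // Needs using System.Text.RegularExpressions;
    // "." does not match a newline by default, so tags that span several lines are not removed
    return Regex.Replace(source, "<.*?>", string.Empty);
}

Result from this code: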


2:59:46 PM: var _GlobalNavHeaderUtf8Encoding=true;var includeHost="http://include.ebaystatic.com/";Skip to main contentBuyMy eBaySellCommunityContact usHelpBasketBasketvjo.darw
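A likely reason for the output above is that InnerText also returns the text inside <script> and <style> elements (that is where "var _GlobalNavHeaderUtf8Encoding=true;..." comes from), and the pattern "<.*?>" cannot match a tag that spans several lines. Below is a minimal regex-only sketch, assuming the page HTML is already in a string; the class name HtmlToText is hypothetical, and regular expressions are only an approximation of a real HTML parser. With the Html Agility Pack, the equivalent step would be to remove the script and style nodes (for example via Descendants("script") and Remove()) before reading InnerText.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: a rough HTML-to-text conversion using regular expressions only.
public static class HtmlToText
{
    // Singleline lets "." match "\n", so elements and comments spanning lines are caught
    private const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    public static string Convert(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // 1. Remove <script>, <style> and <noscript> elements together with their contents;
        //    stripping only the tags would leave the JavaScript/CSS text behind
        string s = Regex.Replace(html, @"<(script|style|noscript)\b[^>]*>.*?</\1\s*>", " ", Opts);

        // 2. Remove HTML comments
        s = Regex.Replace(s, @"<!--.*?-->", " ", Opts);

        // 3. Replace every remaining tag with a space so words from adjacent elements do not run together
        s = Regex.Replace(s, @"<[^>]*>", " ", Opts);

        // 4. Decode entities such as &amp; and &nbsp;
        s = WebUtility.HtmlDecode(s);

        // 5. Collapse runs of whitespace (including the non-breaking space from &nbsp;) into one space
        return Regex.Replace(s, @"\s+", " ").Trim();
    }
}
```

Replacing tags with a space keeps "Buy</a><a>Sell" from becoming "BuySell", at the cost of inserting a space when an inline tag such as <b> sits in the middle of a word.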


More Related Resource Links

How to eliminate HTML tags from text boxes

Hi all, in my application I am using a login screen. When a new user registers, he has to fill in the first name and last name. There is a user search option, and the user's details are shown on the user details page. Some users try to add HTML tags like </td></tr> to the first name and last name, so when the values are rendered from the DB, the page layout breaks. How do I fix this? How can I prevent HTML tags from being entered in the user details? Please give me a solution. Thank you
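Rather than stripping tags, a common approach is to HTML-encode user-entered values when they are written into the page, so markup such as </td></tr> is displayed as literal text and cannot close the surrounding table; validation on the registration form can additionally reject such input. A minimal sketch using System.Net.WebUtility.HtmlEncode; the class and method names are hypothetical.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for handling user-entered names.
public static class UserInputSafety
{
    // Output side: encode before writing into the page, so "</td>" becomes "&lt;/td&gt;"
    // and is displayed as text instead of closing the table cell
    public static string EncodeForHtml(string value)
    {
        return WebUtility.HtmlEncode(value ?? string.Empty);
    }

    // Input side (optional): flag values containing something that looks like a tag,
    // so the registration form can reject them with a validation message
    public static bool ContainsMarkup(string value)
    {
        return !string.IsNullOrEmpty(value) && Regex.IsMatch(value, @"<[^>]*>");
    }
}
```

If the pages are ASP.NET Web Forms, Server.HtmlEncode or the <%: %> syntax (ASP.NET 4 and later) performs the same encoding while rendering.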

Query SharePoint list, rich text columns returning HTML tags

Hi! I have an issue that maybe some of you have encountered as well. I am querying a SharePoint list and creating a report from the data collected from this list. This all works fine, except that one of the columns in the list is of type "Multiple lines of text" and supports "rich text" with different fonts, sizes and so on. As a result, the text returned from this column is wrapped in HTML tags that specify how the text should be formatted. Naturally I do not want this HTML code to appear in my report, so for now I have used custom code that does a string.Replace, replacing for example <div></div> and <br> with "". However, this column also contains URL references to documents, and these links show up like this: "/servername/site/subsite/Gemensamma%20dokument/Mwh.docx">http://servername/site/subsite/Gemensamma%20dokument/Mwh.docx There is a sort of double reference to the document; I guess that, like the HTML tags, it is embedded in the value returned from the rich text column, and it is not easy to handle with string.Replace because the URL varies with each document and document name. My question is whether any of you have built a similar report, encountered this phenomenon with this SharePoint column, and how you solved it. I have tried to creat
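One way to avoid a growing list of string.Replace calls is to treat the value as HTML: replace each link with its visible text, turn <br> and closing </div> or </p> tags into line breaks, drop every other tag, and decode entities. The sketch below assumes the column value is ordinary HTML with <a href="...">...</a> links, as the fragment quoted above suggests; RichTextToPlain is a hypothetical name, not a SharePoint API.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: turns a rich text column value into plain report text.
public static class RichTextToPlain
{
    public static string Convert(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Keep only the visible text of each link: <a href="...">text</a> -> text
        string s = Regex.Replace(html, @"<a\b[^>]*>(.*?)</a>", "$1",
                                 RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Line breaks and the ends of block elements become newlines
        s = Regex.Replace(s, @"<br\s*/?>|</div>|</p>", "\n", RegexOptions.IgnoreCase);

        // Drop every remaining tag, then turn entities such as &amp; back into characters
        s = Regex.Replace(s, @"<[^>]+>", string.Empty);
        return WebUtility.HtmlDecode(s).Trim();
    }
}
```

Because the link's href is discarded and only its text is kept, the "double reference" collapses into a single URL whenever the link text is the URL itself.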

Reading WPF rich text box content in HTML format

Hi, I am using a WPF RichTextBox. I am reading its content in RTF format using the following code:

TextRange textRange = new TextRange(paramRichTextBox.Document.ContentStart, paramRichTextBox.Document.ContentEnd);
MemoryStream msData = new MemoryStream();
textRange.Save(msData, DataFormats.Rtf);
// get the data from the selected range
//strData = textRange.Text;
strData = ASCIIEncoding.Default.GetString(msData.ToArray());

I also found that there is an option called DataFormats.Html, so I tried using it but ended up with an error. Is there any other way to read the content of the RichTextBox in HTML format? Thanks in advance!

Best regards,
Subalakshmi Vijayarajan

HTML Content into Text


Hey all,


Is it possible to convert HTML into text in a field?

I've got e-mails in my report that are in HTML, and I need to print and display them without the HTML.




WCF returning "The content type text/html of the response message does not match the content type of

I have a WCF service I am trying to run on a new installation of 64-bit Windows Server 2008 IIS. Although it runs fine on Windows 2003 IIS, it is throwing the error in the thread title, which appears to be a server config issue, but I am not sure. Googling and searching the MSDN forums did not turn up a solution. I tried running WCF Logging, but that didn't help either.

Does anyone have any suggestions on how to solve this problem?

Here is the error:

The content type text/html of the response message does not match the content type of the binding (application/soap+xml; charset=utf-8). If using a custom encoder, be sure that the IsContentTypeSupported method is implemented properly. The first 1024 bytes of the response were: '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<meta http-equiv="Content-Type" content=&qu

Need HTML tags removed from output text



I have some code written in C# to obtain text and numbers from a web interface that requires user input.

My problem is that the text I get from the web is displayed in HTML form (i.e. it has tags all over the place).

Hence, I would like to filter out the HTML tags. Is that possible?

I searched for some code, but it didn't help, as it brought up even more problems when compiling.

Any help is greatly appreciated.

Remove HTML tags


INSERT [dbo].[HTMLTable] ([ID], [HTMLCode], [PlainText]) VALUES (1, N'<B>HEAD</B>', NULL)

INSERT [dbo].[HTMLTable] ([ID], [HTMLCode], [PlainText]) VALUES (2, N'<B>Title</B>', NULL)

How to bind text containing HTML and ASP.NET tags from a database



I have to show dynamic charts to clients.

I created a table, ChartCodes:

ChartCodeID     Code

1                      <asp:SqlDataSource ID='SqlDataSource_Chart1' runat='server' ConnectionString='<%$ ConnectionStrings:ConnectionString %>' SelectCommand='Usp_GetTotalEmissions_Org_chart' SelectCommandType='StoredProcedure'><SelectParameters>< . . .. . . . .

I stored the chart markup, including its SqlDataSource, in a database column.

My problem is that I am not able to bind it. I used a DataList control with a template and a Literal.

I bound the Literal with <%# Eval("Code") %>

But the SQL value contains ASP.NET tags (<% %>), and when %> is processed, ASP.NET assumes the code block has ended and writes the rest of the text to the output as-is.

Please help me find out how to bind this text, or suggest any other idea for displaying graphs from the database according to the user's selection.

Background of the HTML/rich text editor in the Content Editor Web Part?


Hi All,

Does anyone know what in the CSS controls the background of the rich text/HTML editor in the Content Editor Web Part? Mine is picking up the background of my page layout, so I must have accidentally changed it somewhere; I'm just not sure where.


Client found response content type of 'text/html; charset=utf-8', but expected 'text/xml'.


When I access this web service (C#) from a Windows project, it shows this error:

Client found response content type of 'text/html; charset=utf-8', but expected 'text/xml'.

Please reply soon.

Remove unwanted parts of a string of HTML tags


Hi all, I have a string of tags like:

<A href

Rendering data with HTML tags in the DD Gridview for a selected column


I am having trouble finding out where and how to HTML-encode a cell's data in the Dynamic Data (v4.0) GridView of List.aspx. As a simple case, suppose I have formatted cell data that is A<br/>B in the DB. Obviously, I want A stacked on B in the cell.

It seems gridView1.HtmlEncode = true has gone away.

So maybe I'll try to catch it in the RowDataBound event:

protected void GridView1_RowDataBound(Object sender, GridViewRowEventArgs e)
{
    if (e.Row.RowType == DataControlRowType.DataRow)
    {
        // Html Encode the cells
    }
}

But this event never fires?

Has anyone figured out how to properly render HTML tag data in List.aspx's GridView1?

If I figure this out, then I can add a MetaAttribute called something like [EncodeAsHtml(true)] and be on my way. Thanks!

Remove HTML formatting


Does anyone know how to convert an HTML-formatted string to plain text programmatically?

I want to get the same result as pressing Ctrl+A in a browser page and then Ctrl+V in Notepad.

I googled how to strip HTML from a string, but didn't find a good result.

Removing tags with the regex "<(.|\n)*?>" causes some problems:

* HTML tables are not nicely converted to tab-separated plain text. It produces a *crazy* table in plain text.

* The images' alternate text is lost.

* Lists (<ul> and <ol>) are not converted well: there is no numbering or bullets in the plain text.
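The three problems listed above can be approached by converting specific elements into text equivalents before the generic tag strip: an image becomes its alt text in brackets, table cells end with a tab and rows with a newline, and list items get a number inside <ol> or a bullet otherwise. Below is a rough regex-based sketch (HtmlLayoutToText is a hypothetical name). It assumes reasonably well-formed HTML, does not handle nested lists or cells spanning columns, and will not reproduce a browser's copy-and-paste output in every case.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: layout-aware HTML-to-text conversion using regular expressions.
public static class HtmlLayoutToText
{
    private const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    public static string Convert(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Images: keep the alternate text as [alt]
        string s = Regex.Replace(html, @"<img\b[^>]*?\balt\s*=\s*""([^""]*)""[^>]*>", "[$1]", Opts);

        // Ordered lists: number each <li> inside an <ol> block (nested lists are not handled)
        s = Regex.Replace(s, @"<ol\b[^>]*>(.*?)</ol\s*>", m =>
        {
            int n = 0; // numbering restarts for every <ol>
            return Regex.Replace(m.Groups[1].Value, @"<li\b[^>]*>", _ => "\n" + (++n) + ". ", Opts) + "\n";
        }, Opts);

        // Unordered lists: every remaining <li> gets a bullet
        s = Regex.Replace(s, @"<li\b[^>]*>", "\n* ", Opts);

        // Tables: a cell ends with a tab, a row ends with a newline
        s = Regex.Replace(s, @"</t[dh]\s*>", "\t", Opts);
        s = Regex.Replace(s, @"</tr\s*>", "\n", Opts);

        // Line breaks, paragraphs, divs and list ends become newlines
        s = Regex.Replace(s, @"<br\s*/?>|</p\s*>|</div\s*>|</[ou]l\s*>", "\n", Opts);

        // Drop all remaining tags and decode entities
        s = Regex.Replace(s, @"<[^>]*>", string.Empty);
        s = WebUtility.HtmlDecode(s);

        // Tidy up: remove spaces/tabs at line ends and collapse long runs of blank lines
        s = Regex.Replace(s, @"[ \t]+\n", "\n");
        s = Regex.Replace(s, @"\n{3,}", "\n\n");
        return s.Trim();
    }
}
```

The order matters: images and lists are rewritten first so that the final "drop all tags" step does not throw away the alt text or the list structure.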

Extract text from HTML source code

I want to extract the desired title, desired text and desired link from a web page's source code and paste them into richTextBox1. The source looks like this:

<a href="/site/show/link/0/type/0">Desired Title</a> <a title="&lt;b&gt;Text preview&lt;/b&gt; &lt;br/&gt;Some Text" class="pdtip" href="/site/show/link/0/type/0">Text</a><br />Desired text <br /> <label class="url" for="">http://Desiredlink.com </label>

Output:
Desired Title
Desired text
Desired link

How can I do this?
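If the page always uses the layout quoted in this question, a single regular expression with named groups can pull out the three values. The sketch below assumes exactly that structure (a title link, a class="pdtip" preview link, the text between two <br /> tags, and a label with class="url"); ListingItem and ListingExtractor are hypothetical names. Any change in the site's markup will break the pattern, so an HTML parser such as the Html Agility Pack is the sturdier option for real pages.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical result type for one extracted item.
public sealed class ListingItem
{
    public string Title { get; set; } = "";
    public string Text { get; set; } = "";
    public string Link { get; set; } = "";
}

public static class ListingExtractor
{
    // Assumes each item looks like:
    //   <a href="...">Title</a> <a ... class="pdtip" ...>...</a><br />Text <br /> <label class="url" ...>Link </label>
    private static readonly Regex ItemPattern = new Regex(
        @"<a\s+href=""[^""]*"">(?<title>[^<]*)</a>\s*" +        // first link: the title
        @"<a\b[^>]*class=""pdtip""[^>]*>.*?</a>\s*<br\s*/?>" +    // preview link: skipped
        @"(?<text>[^<]*)<br\s*/?>\s*" +                           // text between the two <br />
        @"<label\s+class=""url""[^>]*>(?<link>[^<]*)</label>",    // link inside the url label
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<ListingItem> Extract(string html)
    {
        var items = new List<ListingItem>();
        foreach (Match m in ItemPattern.Matches(html ?? string.Empty))
        {
            items.Add(new ListingItem
            {
                Title = Clean(m.Groups["title"].Value),
                Text = Clean(m.Groups["text"].Value),
                Link = Clean(m.Groups["link"].Value)
            });
        }
        return items;
    }

    // Decode entities such as &amp; and trim surrounding whitespace
    private static string Clean(string s) => WebUtility.HtmlDecode(s).Trim();
}
```

In a Windows Forms application, each result could then be appended to the box, for example with richTextBox1.AppendText(item.Title + Environment.NewLine).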

Showing highlighted text (HTML) in calculated field?

Greetings all, we're porting our 2007 list to 2010. In SharePoint 2007, we used the following calculated column code to display a highlight over items that had different statuses:

=IF(Date="","<DIV style='background-color:#ff6666'>Not <b>Yet</b> Set</DIV>", IF(Finalized=TRUE,"<DIV style='background-color:#66ff66'>Finalized</DIV>", "<DIV style='background-color:#fffa91'>Tentative</DIV>"))

When we use this in SharePoint 2010, it doesn't seem to apply the DIV structure. Any ideas on how we can change our code to make this work? Thanks! --Dave

No matter: I just needed to add a code snippet ( http://marijnsomers.blogspot.com/2010/01/write-html-code-in-sharepoint-via.html ) to the page... thanks!

How can I get the whole HTML text of a page from an external website?

For example, there is a web page whose HTML shows only the text "Hello World". How can I get the whole HTML of a page from an external website? I want to get the HTML text returned by the server and parse it for my own needs. The WebClient class doesn't seem to work on localhost. Any clue? Thanks.