.NET Tutorials, Forums, Interview Questions And Answers

Regular expression for Ms Word HTML Markup

Posted Date: April 10, 2011    Category: Windows Application

I am trying to create a regular expression that searches for all HTML style properties that start with 'mso' and end with a semicolon. If one is found, it should be replaced with an empty string.

E.g., input string:

<span style="font-family: Arial, sans-serif; font-size: 11pt; mso-bidi-font-size: 12.0pt; mso-fareast-font-family: Times New Roman; mso-bidi-font-family: Times New Roman; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA">My text goes here</span>

Desired output:

<span style="font-family: Arial, sans-serif; font-size: 11pt;">My text goes here</span>

None of my attempts has matched anything. I have tried (\bmso)\w*[;]\b and \bmso\w+(?=;\b), among many other variations of the two. Any help would be appreciated.
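One likely reason both attempts find nothing: \w does not match a hyphen, colon, space or dot, so \bmso\w* stops right after "mso" and never reaches the semicolon. Below is a minimal sketch of one possible fix, written in Python rather than .NET for brevity. The names MSO_DECLARATION and strip_mso are made up for illustration, and the pattern assumes the mso-* properties sit in a double-quoted style attribute.

```python
import re

# Matches one "mso-..." CSS declaration, e.g. " mso-bidi-font-size: 12.0pt;".
# The lookbehind stops names such as "foo-mso-bar" from matching in the middle.
# The value runs up to the next ';' or the closing '"' of the attribute.
MSO_DECLARATION = re.compile(r'\s*(?<![\w-])mso-[\w-]+\s*:[^;"]*;?', re.IGNORECASE)

def strip_mso(markup):
    """Remove every mso-* declaration from the given markup string."""
    return MSO_DECLARATION.sub("", markup)

sample = ('<span style="font-family: Arial, sans-serif; font-size: 11pt; '
          'mso-bidi-font-size: 12.0pt; mso-fareast-font-family: Times New Roman; '
          'mso-bidi-language: AR-SA">My text goes here</span>')
print(strip_mso(sample))
```

The pattern should carry over to Regex.Replace in .NET largely unchanged. Note that it runs over the whole string, so text content that happens to look like "mso-x: y" would be removed as well.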



More Related Resource Links

Regular expression for MS Word HTML Markup


I am trying to validate a rich text box to remove some MS Word content, but I want to keep the other attributes that are not MS Office related.

input: <span style="font-size: 11pt; mso-bidi-font-size: 12.0pt; mso-fareast-font-family: &#34;Times New Roman&#34;; mso-bidi-font-family: Times New Roman&#34;; mso-ansi-language: EN-US; mso-fareast-language: EN-US; mso-bidi-language: AR-SA; font-family: Arial&#34;, &#34;sans-serif&#34;">mso-bidi-language: AR-SA; mso-My Text Here; mso-My Text Here2</span>

I first run a regex to remove the stray HTML entities (\&\#34\;). Then I remove everything between =" and the closing ".

regex: (?<=\=\")(.*?)(?=\")
output: <span font="" style="">mso-bidi-language: AR-SA; mso-My Text Here; mso-My Text Here2</span>

Now it removes everything inside the tags and not the text. With the following, it removes the matching pattern, but it also removes the text that I need and the end tags.

output: <span font="font-size: 11pt;" style="font-family: Arial, sans-serif;">

When I try to combine the two expressions, I do not get the correct output.
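A possible way around combining two expressions is to limit the clean-up to the style attribute itself, so the element text can never be touched. The sketch below is in Python for brevity; STYLE_ATTR, clean_style_value and clean_word_markup are hypothetical names, and it assumes the attribute is double-quoted and that the &#34; entities are debris to be dropped.

```python
import re

# Captures the opening 'style="', the attribute value, and the closing quote.
STYLE_ATTR = re.compile(r'(\bstyle\s*=\s*")([^"]*)(")', re.IGNORECASE)

def clean_style_value(value):
    """Drop &#34; debris and every mso-* declaration from one style value."""
    value = value.replace("&#34;", "")  # must go first: the entity contains ';'
    declarations = (part.strip() for part in value.split(";"))
    kept = [d for d in declarations if d and not d.lower().startswith("mso-")]
    return "; ".join(kept)

def clean_word_markup(html):
    """Rewrite only style attributes; element text is left untouched."""
    return STYLE_ATTR.sub(
        lambda m: m.group(1) + clean_style_value(m.group(2)) + m.group(3), html)
```

In .NET the same idea can be expressed with Regex.Replace and a MatchEvaluator callback. Splitting on ";" instead of using a second regex keeps the "starts with mso-" test easy to read.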

Regular Expression to Match


Here is the kind of text I want to match with a regular expression.

The id="dgSchedule" is always present in the tag, but its location may differ.

The table spans multiple lines and contains white space and tabs.

I have regular expressions that match the start and end tags individually on a single line:

\<table .*\>


The problem is matching the whole table when it spans multiple lines:

<table cellspacing="0" rules="all" border="1" id="dgSchedule" style="border-style:None;height:100%;width:100%;border-collapse:collapse;">
	<tr class="blackbar" align="center" style="background-color:#7FB4DE;font-family:Verdana;font-weight:bold;height:20px;">
		<td>From Place</td><td>To Place</td><td>Time</td><td>Bus Type</td>

	</tr><tr style="background-color:#DEECF5;">
	</tr><tr style="background-color:#EFF5FA;">
	</tr><tr style="background-color:#DEECF5;">
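By default "." does not match a line break, which is why a single-line pattern cannot span the table. A minimal sketch of one approach, in Python: turn on the option that lets "." cross newlines (re.DOTALL here, RegexOptions.Singleline in .NET) and use a lazy quantifier so the match ends at the first closing tag. SCHEDULE_TABLE and find_schedule_table are illustrative names, and the sketch assumes the table contains no nested tables.

```python
import re

# [^>]* on both sides of the id lets the attribute sit anywhere in the tag.
# re.DOTALL lets "." cross line breaks; ".*?" stops at the first </table>.
SCHEDULE_TABLE = re.compile(
    r'<table\b[^>]*\bid="dgSchedule"[^>]*>.*?</table>',
    re.DOTALL | re.IGNORECASE)

def find_schedule_table(page):
    """Return the markup of the dgSchedule table, or None if it is absent."""
    match = SCHEDULE_TABLE.search(page)
    return match.group(0) if match else None
```

For pages with nested tables, an HTML parser is usually a safer choice than a regular expression.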


Regular Expression Word Count :: Problem when word wrapped in "", need "eg" to be counted as 1 word


Hi, I am very new to regular expressions. I have the following JS function to return the number of words displayed in a text area.


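 // Replace tags and &nbsp; with spaces, then collect every word boundary.
 var matches = textarea.replace(/<[^<>]+?>|&nbsp;/gi, ' ').match(/\b/g);
 var count = 0;
 if (matches) {
  count = matches.length / 2; // each word contributes two \b boundaries
 }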


This function works perfectly except when a word is contained in double quotes: eg "word"

This will come back as 3 words, and I need it to be 1

Any help would be much appreciated.
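A plausible cause, assuming the text arrives HTML-encoded: a quote stored as &quot; contains the letters "quot", so &quot;word&quot; has three runs of word characters (quot, word, quot) and is counted as 3. Under that assumption, decoding all entities before counting fixes the total. Here is a sketch in Python (count_words is an illustrative name); the same idea applies in JavaScript by decoding entities before calling match.

```python
import html
import re

TAG = re.compile(r"<[^<>]+>")

def count_words(markup):
    """Count runs of word characters after removing tags and entities."""
    text = TAG.sub(" ", markup)   # tags become separators
    text = html.unescape(text)    # &quot; -> ", &nbsp; -> non-breaking space
    return len(re.findall(r"\w+", text))

print(count_words("eg &quot;word&quot;"))
```

Counting \w+ runs gives the same number as halving the count of \b boundaries, without the division.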


Security Briefs: Regular Expression Denial of Service Attacks and Defenses


Microsoft security expert Bryan Sullivan believes denial-of-service blackmail attacks will become more common as privilege escalation attacks become more difficult to execute. He demonstrates how to protect your apps against regular expression DoS threats.

Bryan Sullivan

MSDN Magazine May 2010

Help with regular expression


I am using this regular expression: /.*-lyrics-.*$

and I need the expression to find URLs like this:

and it really does that!

The problem is that it also finds this URL:

What regular expression should I use to exclude URLs that end with /lyrics?

Thanks :]
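One common technique for "match this, but not when it ends with that" is a negative lookahead at the start of the pattern. The sketch below is in Python; because the example URLs are missing from the post, the example.com URLs in the usage lines are invented purely for illustration.

```python
import re

# (?!.*/lyrics$) rejects any input whose last path segment is exactly "lyrics";
# the rest is the original pattern, anchored at the start.
LYRICS_URL = re.compile(r"^(?!.*/lyrics$).*/.*-lyrics-.*$")

def is_lyrics_page(url):
    return LYRICS_URL.match(url) is not None

# Hypothetical URLs, not taken from the original post.
print(is_lyrics_page("http://example.com/artist-lyrics-song.html"))
print(is_lyrics_page("http://example.com/artist-lyrics-song/lyrics"))
```

Lookaheads are supported by .NET and JavaScript regular expressions as well, so the pattern should carry over as written.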



Ms word to HTML


Hi all, I designed some content in MS Word, saved it as a web page, opened that web page in a browser, and viewed the source. I copied the source and pasted it into the AJAX HTML Editor, then clicked the button so that the content is displayed in a GridView on another page. The problem is that when I apply this content, the design of the display page totally collapses. I think there is some problem with the MS Word to HTML conversion. What is the problem, and how can I solve it?

Need a regular expression


I have a required field validator for a textbox, but I just found out that the textbox can have a space in it (for a delimiter, let's say).

I checked the .net and regexlib.com, but couldn't find what I was looking for.

I simply need a regular expression that basically accepts any character string, including one with spaces in it or even a space by itself.

Can someone help me out?
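A pattern that accepts any non-empty string, spaces included, can be as small as [\s\S]+ (any whitespace or non-whitespace character, one or more times). A quick sketch in Python, with accepts_anything as an illustrative name:

```python
import re

ANYTHING = re.compile(r"[\s\S]+")

def accepts_anything(value):
    """True for any non-empty string, including a single space."""
    return ANYTHING.fullmatch(value) is not None

print(accepts_anything(" "))
```

Whether a validator control passes a whitespace-only value to the pattern at all depends on the control, so that part needs checking separately.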

regular expression

Hi, can anybody help me write a regular expression for a textbox? The textbox should accept only numbers; otherwise an alert message should be displayed, saying that you cannot enter that character in the textbox. I want this in JavaScript.
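The check itself is a single character class. The sketch below shows it in Python for brevity; the pattern ^[0-9]+$ behaves the same way in JavaScript with test(). The name is_digits_only is illustrative, and the alert message is left to the page script.

```python
import re

DIGITS_ONLY = re.compile(r"[0-9]+")

def is_digits_only(value):
    """True only when every character is an ASCII digit and value is non-empty."""
    return DIGITS_ONLY.fullmatch(value) is not None

print(is_digits_only("12345"))
print(is_digits_only("12a45"))
```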

Matching set of characters via regular expression

I need to be able to match sets of quote and space characters in a string and replace these accordingly. I currently have this done in 2 lines of code, but would like to use a regular expression so that I can use only 1 line of code. See below:

 var Qt = unescape("%22"); //quote char
 var Sp = unescape("%20"); //space
 var Cr = unescape("%0d"); //carriage return
 var Lf = unescape("%0a"); //line feed
 var CrLf = Cr + Lf;  //carriage return & line feed
 stringOut = stringIn.replace(Qt+Qt+Sp+Qt,Qt+Qt+CrLf+Qt);
 stringOut = stringOut.replace(Qt+Qt+Sp+Qt+Qt,Qt+Qt+CrLf+Qt+Qt);

I want to replace the space in a string matching the patterns below with a carriage return and line feed (in hex): 22 22 20 22 or 22 22 20 22 22. This needs to return 22 22 0D 0A 22 or 22 22 0D 0A 22 22.

The line in green above will only match and replace 22 22 20 22, but I want a regular expression in 1 line of code which will match and replace either 22 22 20 22 or 22 22 20 22 22.
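Both byte sequences share the same core: a space with two quotes before it and at least one quote after it. A sketch of a single expression that covers both, in Python, uses a lookbehind and a lookahead so that only the space is replaced. The names are illustrative. JavaScript engines without lookbehind support can capture the quotes in groups instead, for example replace(/(\"\") (\")/g, '$1\r\n$2').

```python
import re

# A space preceded by two quotes and followed by at least one quote.
QUOTED_SPACE = re.compile(r'(?<="") (?=")')

def break_on_quoted_space(text):
    """Replace each matching space with CR LF, leaving the quotes in place."""
    return QUOTED_SPACE.sub("\r\n", text)

print(repr(break_on_quoted_space('"" ""')))
```

Unlike a string-based replace in JavaScript, this replaces every occurrence in the input, not only the first.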

Regular Expression for Digits, comma & space combination

Hi,

Can you suggest a regular expression for the following case? A text box which should contain:

Max size: 53
0-9 digits + a comma + a space
The 0-9 numbers should be repeated 5 times and separated by a comma and a space

Eg: 123456789, 123456789, 123456789, 123456789, 123456789

Thanks, David.
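The maximum size is consistent with five groups of at most nine digits: 5 x 9 digits plus 4 separators of 2 characters gives 45 + 8 = 53. Assuming that reading (each group has 1 to 9 digits, which the post does not state outright), a sketch in Python looks like this; is_five_numbers is an illustrative name.

```python
import re

# One group of 1-9 digits, then exactly four more, each preceded by ", ".
FIVE_NUMBERS = re.compile(r"[0-9]{1,9}(?:, [0-9]{1,9}){4}")

def is_five_numbers(value):
    return FIVE_NUMBERS.fullmatch(value) is not None

print(is_five_numbers("123456789, 123456789, 123456789, 123456789, 123456789"))
```

If every group must have exactly nine digits, replace {1,9} with {9}. For a validator, wrap the pattern in ^ and $.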

Need Regular Expression for d/mm/yyyy hh:mm:ss

Hi everybody,

I am stuck generating a regular expression for the format m/dd/yyyy hh:mm:ss, because I am dealing with date values like the ones below:

3/29/2007 13:28:34
3/27/2007 17:12:36

So I need a regular expression to validate this. I already have a regular expression like this:

^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9]{2}$

But this works only for 03/29/2007. If I use 3/29/2007 it won't work, and if I use 03/27/2007 17:12:36 it still won't work. So I really need your help to build my regular expression.

Thanks and regards,
sukavi
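Two small changes to the existing pattern would cover the failing cases: make the leading zero optional with 0?, and append an optional time part. The sketch below, in Python, assumes 24-hour times and treats the time as optional so that 03/29/2007 keeps working; DATE_TIME and is_valid_date_time are illustrative names.

```python
import re

DATE_TIME = re.compile(
    r"^(0?[1-9]|1[0-2])[- /.]"          # month, leading zero optional
    r"(0?[1-9]|[12][0-9]|3[01])[- /.]"  # day, leading zero optional
    r"(19|20)[0-9]{2}"                  # four-digit year
    r"( ([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9])?$"  # optional hh:mm:ss
)

def is_valid_date_time(value):
    return DATE_TIME.match(value) is not None

for candidate in ("3/29/2007 13:28:34", "03/27/2007 17:12:36", "13/29/2007"):
    print(candidate, is_valid_date_time(candidate))
```

Like the original, this checks only the shape of the value; it would still accept impossible dates such as 2/31/2007, so a date parser is needed for full validation.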

need regular expression to get message body

I have text like this:

<tr class="NormalRow_Small">
 <td colspan=2>&nbsp;</td>
 <td colspan=3> </td>
</tr>
<tr class="NormalRow_Small">
 <!--description-->
 <td colspan="5">
  <font size="2">Hi All,<div><!br /></div><div>I'm glad to introduce ...</div><div><!br /></div><div>Actually we already have ... <font size=1>(see link for full text)</font></font><br>
  <font size="1">
 </td>
</tr>
<tr>
 <td colspan=5><hr /></td>
</tr>

I want to get the bold text, please help me.

Thanks, Alex.

Need a Specific Regular Expression

I am clueless when it comes to building regular expressions. I know what they are, but haven't been able to master them. I found this regular expression, which will validate a Sprint/Nextel Direct Connect number (all numbers, only asterisks allowed):

^\d+\*\d+\*\d+$

http://www.regexlib.com/REDetails.aspx?regexp_id=1730

Can someone modify it for me so that it checks for a minimum and maximum number of digits in the string and checks that there are (2) asterisks? If you can use a minimum of (6) and a maximum of (12), I should be able to figure out how to modify it myself after I call Sprint on Monday.

David H
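The existing pattern already forces exactly two asterisks, so only the digit count is missing. One way to add it is a lookahead that counts digits across the whole string. The Python sketch below assumes the limits of 6 and 12 apply to the total number of digits, not to each segment; DIRECT_CONNECT and is_direct_connect are illustrative names.

```python
import re

# Lookahead: the whole string holds 6 to 12 digits (non-digits are skipped).
# Main pattern: digits, asterisk, digits, asterisk, digits.
DIRECT_CONNECT = re.compile(r"^(?=(?:\D*\d){6,12}\D*$)\d+\*\d+\*\d+$")

def is_direct_connect(value):
    return DIRECT_CONNECT.match(value) is not None

print(is_direct_connect("123*456*7890"))
```

Changing the two numbers inside {6,12} adjusts the limits once the real values are known.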

regular expression: Read a multi line paragraph

I am trying to build a regular expression that will capture a paragraph of any length that starts with "Ordinance Summary:" and ends with "<=>" but doesn't actually include them in the selection. And I need it to stop at the first instance of <=>. Here is an example of what I might encounter:

"Ordinance Summary: Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Maecenas porttitor congue massa. Fusce posuere, magna sed pulvinar ultricies, purus lectus malesuada libero, sit amet commodo magna eros quis urna. Nunc viverra imperdiet enim. Fusce est. Vivamus a tellus. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Proin pharetra nonummy pede. Mauris et orci. Aenean nec lorem.<=> Ordinance Sponsor:Rhuarch<=>"

And here is what I would need it to actually capture:

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Maecenas porttitor congue massa. Fusce posuere, magna sed pulvinar ultricies, purus lectus malesuada libero, sit amet commodo magna eros quis urna. Nunc viverra imperdiet enim. Fusce est. Vivamus a tellus. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Proin pharetra nonummy pede. Mauris et orci. Aenean nec lorem.

And here is the expression I tried which didn't work: (?<=Ordinance Titl
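A lookbehind for the start marker, a lazy ".*?", and a lookahead for the end marker is one way to express this. The Python sketch below relies on re.DOTALL (RegexOptions.Singleline in .NET) so the paragraph may contain line breaks; SUMMARY and extract_summary are illustrative names.

```python
import re

# Lazy ".*?" ends the match at the first "<=>"; neither marker is captured.
SUMMARY = re.compile(r"(?<=Ordinance Summary:).*?(?=<=>)", re.DOTALL)

def extract_summary(text):
    """Return the summary paragraph without its markers, or None."""
    match = SUMMARY.search(text)
    return match.group(0).strip() if match else None

sample = "Ordinance Summary: Lorem ipsum\ndolor sit amet.<=> Ordinance Sponsor:Rhuarch<=>"
print(extract_summary(sample))
```

The strip() call removes the space that follows the colon; drop it if leading whitespace matters.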

regular expression for file upload box

I have a regular expression that checks filenames in the upload file dialog box. The main aim is to disallow any special characters in filenames.

ValidationExpression="^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w\-\. ]*))+\.([a-zA-Z]*)$"

But this one will not allow filenames with forward slashes.

I want to modify the regex to accept both of the following kinds of filenames: c:\temp\abcd.txt, and also the way filenames are displayed in the Firefox browser (with forward slashes): file:///c:/BIDS2/abcd.txt

I need help with modifying the regex so that it accepts both.

Thank you
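One possible modification is to allow an optional file:/// prefix and to accept either slash style as the separator. This is a sketch in Python that otherwise keeps the structure of the original expression; FILE_PATH and is_valid_path are illustrative names, and only the two sample paths from the post were considered.

```python
import re

FILE_PATH = re.compile(
    r"^(?:file:///)?"              # optional URL prefix as shown by Firefox
    r"(?:[a-zA-Z]:|\\{2}\w+\$?)"   # drive letter, or \\server with optional $
    r"(?:[\\/]\w[\w\-. ]*)+"       # folders and file name, either slash style
    r"\.[a-zA-Z]*$"                # extension (letters only, as in the original)
)

def is_valid_path(path):
    return FILE_PATH.match(path) is not None

print(is_valid_path(r"c:\temp\abcd.txt"))
print(is_valid_path("file:///c:/BIDS2/abcd.txt"))
```

The character class [\\/] is the only change to the separator; everything it precedes still has to start with a word character, so special characters stay rejected.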

Function count through Regular Expression

Hi,

I have a table with a varchar(max) column. This column stores all the function calls to the application. Example:

1    SUM(SALARY); AVG(LEAVES)/COUNT(DAYS);
2    SUM(SALARY)/AVG(SALARY); AVG(LEAVES)/COUNT(DAYS); COUNT(EMPLOYEES);

The above output shows the requests of users 1 and 2. User 1 called the SUM, AVG and COUNT functions once. User 2 called SUM once and AVG and COUNT twice. I need to calculate these counts through a regular expression in SQL Server 2005. Following is the script for table and data generation:

CREATE TABLE [dbo].[UserRequests]
(
 [UserID] [int] NOT NULL,
 [Requests] [varchar](255) NULL
) ON [PRIMARY]

insert into [dbo].[UserRequests] values (1, 'SUM(SALARY); AVG(LEAVES)/COUNT(DAYS);')
insert into [dbo].[UserRequests] values (2, 'SUM(SALARY)/AVG(SALARY); AVG(LEAVES)/COUNT(DAYS); COUNT(EMPLOYEES);')

Any suggestion or idea would be highly appreciated. PS: if someone has another optimized solution for this problem, kindly suggest that as well.

Regards.
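T-SQL in SQL Server 2005 has no built-in regular expression functions, so a regex there would most likely have to run in a CLR function or in application code. The counting logic itself is small: find every identifier followed by an opening parenthesis and tally the names. A sketch in Python, with the two rows from the post hard-coded and illustrative names throughout:

```python
import re
from collections import Counter

# An identifier immediately followed by "(" is treated as a function call.
FUNCTION_CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")

def count_calls(requests):
    """Return a Counter mapping function name to number of calls."""
    return Counter(name.upper() for name in FUNCTION_CALL.findall(requests))

rows = [
    (1, "SUM(SALARY); AVG(LEAVES)/COUNT(DAYS);"),
    (2, "SUM(SALARY)/AVG(SALARY); AVG(LEAVES)/COUNT(DAYS); COUNT(EMPLOYEES);"),
]
for user_id, requests in rows:
    print(user_id, dict(count_calls(requests)))
```

For a known, short list of functions, a pure T-SQL alternative is to compare string lengths before and after REPLACE for each function name followed by "(".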

Html editor word wrap problem

I created an HTML editor which uses an iframe for designing. The iframe doesn't wrap the text within it; the text keeps extending to the right on the same row. I tried setting scrolling='no', but it didn't work. I also tried word-wrap:break-word, but that was of no use either. Please help!