Working with Strings with Combining Characters

Posted By: Mohammad Elsheimy    Posted Date: July 07, 2010    Category: C#    URL: http://www.dotnetspark.com

In this article we will see how to work with strings that contain combining characters (like the diacritics in Arabic).

This article was previously published in my blog, Just Like a Magic.


In some languages, like Arabic and Hebrew, you attach combining characters to base characters according to the pronunciation of the word. Combining characters (diacritics, for example) are characters that are combined with base characters to change the pronunciation of the word; this is sometimes called vocalization. Here are some examples that use Arabic diacritics:

#  Case                          Base Character              Combining Character(s)                             Result
1  Combining a single character  Arabic Letter Teh (0x062A)  Arabic Damma (0x064F)                              Letter Teh + Damma
2  Combining two characters      Arabic Letter Teh (0x062A)  Arabic Shadda (0x0651) + Arabic Fathatan (0x064B)  Letter Teh + Shadda + Fathatan

When you combine a base character with one combining character, the string stores two characters that are displayed as one. When you combine a base character with two combining characters, the string stores three characters that are displayed as one, and so on.
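The difference between stored characters and displayed characters can be checked in code. The following is a minimal sketch, not part of the original article: the class name CombiningDemo and the helper CountTextElements are illustrative names, and the strings are written with Unicode escapes taken from the code points in the table above.

```csharp
using System;
using System.Globalization;

public static class CombiningDemo
{
    // Counts text elements: a base character together with the
    // combining characters attached to it counts as one element.
    public static int CountTextElements(string text)
    {
        return new StringInfo(text).LengthInTextElements;
    }

    public static void Main()
    {
        string tehDamma = "\u062A\u064F";                // Teh + Damma
        string tehShaddaFathatan = "\u062A\u0651\u064B"; // Teh + Shadda + Fathatan

        // Length counts stored chars; CountTextElements counts displayed units.
        Console.WriteLine("{0} chars, {1} text element(s)",
            tehDamma.Length, CountTextElements(tehDamma));
        Console.WriteLine("{0} chars, {1} text element(s)",
            tehShaddaFathatan.Length, CountTextElements(tehShaddaFathatan));
    }
}
```

The first string reports 2 chars and the second reports 3 chars, yet each one forms a single text element.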

Enumerating a String with Base Characters

Now we are going to try an example. This example uses a simple word, Muhammad (Mohammad; the name of the prophet of Islam).

This word (with the diacritics) consists of 9 characters, in the following order:
  1. Meem
  2. Damma (a combining character combined with the previous Meem)
  3. Kashida
  4. Hah
  5. Meem
  6. Shadda (a combining character)
  7. Fatha (a combining character; both the Shadda and the Fatha are combined with the Meem)
  8. Kashida
  9. Dal
After the combining characters are combined with their bases, we end up with 6 characters, in the following order:
  1. Meem (has a Damma above)
  2. Kashida
  3. Hah
  4. Meem (has a Shadda and a Fatha above)
  5. Kashida
  6. Dal
The following code simply enumerates the string and displays a message box with each character along with its index:

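// Requires: using System.Windows.Forms;
// Meem, Damma, Kashida, Hah, Meem, Shadda, Fatha, Kashida, Dal
string someName = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";

for (int i = 0; i < someName.Length; i++)
  MessageBox.Show(string.Format("{0}\t{1}", i, someName[i]));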

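What do we get? The loop visits every character of the string separately, so we get 9 message boxes: each combining character is shown on its own, detached from its base character.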
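To see exactly what the index loop visits, here is a console sketch of the same loop that also labels each character. It is an illustration rather than the article's code: the class CharLoopDemo and its helpers are hypothetical names, and a combining character is taken to be any character whose Unicode category is NonSpacingMark, which covers the three diacritics used in this word.

```csharp
using System;
using System.Globalization;

public static class CharLoopDemo
{
    // Hypothetical helper: true for nonspacing combining marks (Unicode category Mn).
    public static bool IsCombiningMark(char c)
    {
        return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;
    }

    // Counts how many chars in the string are combining marks.
    public static int CountCombiningMarks(string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (IsCombiningMark(c))
                count++;
        }
        return count;
    }

    public static void Main()
    {
        // Meem, Damma, Kashida, Hah, Meem, Shadda, Fatha, Kashida, Dal
        string someName = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";

        // The indexer returns one char at a time, marks included.
        for (int i = 0; i < someName.Length; i++)
        {
            char c = someName[i];
            Console.WriteLine("{0}\tU+{1:X4}\t{2}", i, (int)c,
                IsCombiningMark(c) ? "combining mark" : "not a combining mark");
        }

        Console.WriteLine("{0} chars, {1} combining marks",
            someName.Length, CountCombiningMarks(someName));
    }
}
```

The loop reports 9 chars, 3 of which (the Damma, the Shadda and the Fatha) are combining marks.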

Enumerating a String with Combining Characters

The .NET Framework provides a way to enumerate a string together with its combining characters: the TextElementEnumerator and StringInfo types (both reside in the System.Globalization namespace). The following code demonstrates how you can enumerate a string so that each base character comes with its combining characters:

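// Requires: using System.Globalization; using System.Windows.Forms;
// Meem, Damma, Kashida, Hah, Meem, Shadda, Fatha, Kashida, Dal
string someName = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";

TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(someName);

while (enumerator.MoveNext())
  MessageBox.Show(string.Format("{0}\t{1}", enumerator.ElementIndex, enumerator.Current));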
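For readers without a Windows Forms project, the same enumeration can be sketched as a console program. The class TextElementDemo and the helper GetElementIndexes are illustrative names that do not appear in the original article; only TextElementEnumerator and StringInfo come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TextElementDemo
{
    // Returns the start index of every text element in the string.
    public static List<int> GetElementIndexes(string text)
    {
        var indexes = new List<int>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(text);
        while (enumerator.MoveNext())
            indexes.Add(enumerator.ElementIndex);
        return indexes;
    }

    public static void Main()
    {
        // Meem, Damma, Kashida, Hah, Meem, Shadda, Fatha, Kashida, Dal
        string someName = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";

        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(someName);
        while (enumerator.MoveNext())
        {
            // GetTextElement returns the base character plus its combining characters.
            string element = enumerator.GetTextElement();
            Console.WriteLine("{0}\t{1} char(s)", enumerator.ElementIndex, element.Length);
        }
    }
}
```

For this word the enumerator yields 6 text elements starting at indexes 0, 2, 3, 4, 7 and 8, which matches the list of 6 combined characters given earlier.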

Comparing Strings
Sometimes you will need to compare two strings that are identical except for their diacritics (combining characters). If you compare them the usual way (using String.Compare, for instance), they are reported as different because of the combining characters.

To overcome this you will need to use a special overload of the String.Compare method:

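// Requires: using System.Globalization; using System.Threading;
string withCombiningChars = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";
string withoutCombiningChars = "\u0645\u062D\u0645\u062F";

// Default comparison, which takes the combining characters into account
Console.WriteLine(string.Compare(withCombiningChars,
  withoutCombiningChars) == 0 ? "Both strings are the same." : "The strings are different!");

// Comparison with the special overload
if (string.Compare(withCombiningChars,
  withoutCombiningChars, Thread.CurrentThread.CurrentCulture, CompareOptions.IgnoreSymbols) == 0)
  Console.WriteLine("Both strings are the same.");
else
  Console.WriteLine("The strings are different!");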

The Kashida (ـ, 0x0640) isn't a letter of the Arabic alphabet. It only stretches the connection between letters, so it behaves much like a space. That is why the CompareOptions.IgnoreSymbols option leaves it out of the comparison.
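Culture-sensitive comparisons rely on the collation data of the runtime and the platform, so their results may vary between environments. CompareOptions.IgnoreNonSpace is the option documented for ignoring nonspacing combining characters such as diacritics, and it can be combined with CompareOptions.IgnoreSymbols. When a fully predictable result is needed, one alternative (a different technique from the String.Compare overload used in this article) is to strip the marks yourself and compare ordinally. The sketch below assumes that "combining character" means Unicode category NonSpacingMark and that the Kashida should be dropped too; DiacriticStripper, StripMarks and EqualsIgnoringMarks are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticStripper
{
    private const char Kashida = '\u0640'; // Arabic Tatweel

    // Hypothetical helper: removes nonspacing marks and the Kashida.
    public static string StripMarks(string text)
    {
        var builder = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            bool isMark =
                CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;
            if (!isMark && c != Kashida)
                builder.Append(c);
        }
        return builder.ToString();
    }

    // Compares the stripped strings char by char (ordinal comparison).
    public static bool EqualsIgnoringMarks(string left, string right)
    {
        return string.Equals(StripMarks(left), StripMarks(right), StringComparison.Ordinal);
    }

    public static void Main()
    {
        string withCombiningChars = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";
        string withoutCombiningChars = "\u0645\u062D\u0645\u062F";

        Console.WriteLine(EqualsIgnoringMarks(withCombiningChars, withoutCombiningChars)
            ? "Both strings are the same."
            : "The strings are different!");
    }
}
```

This version does not consult the current culture at all, which makes it predictable but also blind to any language-specific comparison rules.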

Writing Arabic diacritics

The following table summarizes the Arabic diacritics and the keyboard shortcut for each character:

Unicode Representation  Character  Name      Shortcut
0x064B                  ◌ً          Fathatan  Shift + W
0x064C                  ◌ٌ          Dammatan  Shift + R
0x064D                  ◌ٍ          Kasratan  Shift + S
0x064E                  ◌َ          Fatha     Shift + Q
0x064F                  ◌ُ          Damma     Shift + E
0x0650                  ◌ِ          Kasra     Shift + A
0x0651                  ◌ّ          Shadda    Shift + ~
0x0652                  ◌ْ          Sukun     Shift + X
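The code points in the table can also be used from code, for example to report which diacritics a string contains. The following is a rough sketch under that assumption; the class ArabicDiacritics and its methods are hypothetical names, and the lookup table holds only the eight characters listed above.

```csharp
using System;
using System.Collections.Generic;

public static class ArabicDiacritics
{
    // The eight diacritics from the table, keyed by code point.
    private static readonly Dictionary<char, string> Names = new Dictionary<char, string>
    {
        { '\u064B', "Fathatan" },
        { '\u064C', "Dammatan" },
        { '\u064D', "Kasratan" },
        { '\u064E', "Fatha" },
        { '\u064F', "Damma" },
        { '\u0650', "Kasra" },
        { '\u0651', "Shadda" },
        { '\u0652', "Sukun" }
    };

    // Returns the names of the listed diacritics found in the text, in order.
    public static List<string> ListDiacritics(string text)
    {
        var found = new List<string>();
        foreach (char c in text)
        {
            string name;
            if (Names.TryGetValue(c, out name))
                found.Add(name);
        }
        return found;
    }

    public static void Main()
    {
        // Meem, Damma, Kashida, Hah, Meem, Shadda, Fatha, Kashida, Dal
        string someName = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";
        Console.WriteLine(string.Join(", ", ListDiacritics(someName)));
    }
}
```

For the word used throughout this article, the program prints Damma, Shadda, Fatha.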

Using the Character Map Application

Microsoft Windows comes with an application that helps you browse the characters that a font supports. This application is called Character Map.
You can access this application by typing charmap.exe into Run, or by choosing Start->Programs->Accessories->System Tools->Character Map.

Try it out!

Code examples for the reader to discover:


string someName = "\u0645\u064F\u0640\u062D\u0645\u0651\u064E\u0640\u062F";



// Requires: using System.Globalization; using System.Threading;
string a = "Adam";
string b = "Ádam";

Console.WriteLine(string.Compare(a, b) == 0 ? "They are the same." : "No, they are different.");

// Also try changing the CultureInfo object
if (string.Compare(a, b, Thread.CurrentThread.CurrentCulture, CompareOptions.IgnoreNonSpace) == 0)
  Console.WriteLine("They are the same.");
else
  Console.WriteLine("No, they are different.");
