This article was previously published on my blog, Just Like a Magic.
Overview
Are you somewhat confused about the difference between
Serialization and Marshaling? This article should clear up that
confusion. It gives you a basic understanding of the process of
Serialization and the process of Marshaling, and of how you can get the
most out of each.
Serialization
Serialization is the process of
converting a data structure or object into a sequence of bits so that it
can be stored in a file or a memory buffer, or transmitted across a
network connection, to be "resurrected" later in the same or another
computer environment. This sequence of bits can take any format the
user chooses; in practice, it is usually XML or binary.
Serialization comes in many forms in
the .NET Framework. It can be observed in ADO.NET, Web services, WCF
services, Remoting, and others.
For example, calling the WriteXml()
function of a DataSet serializes this DataSet into an XML file:
ds.WriteXml("data.xml");
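And if we have a structure like this:
public struct User
{
    public int id;
    public string name;
}
we can get the following result if we
serialize a collection of that structure into XML:
<users>
    <user>
        <id>12</id>
        <name>Mark</name>
    </user>
    <user>
        <id>13</id>
        <name>Charles</name>
    </user>
    <user>
        <id>14</id>
        <name>John</name>
    </user>
</users>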
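As a rough illustration, XML like the output described above could be produced with XmlSerializer from System.Xml.Serialization. This is only a sketch under assumptions: the root element name "users", the item element name "user", and the helper class UserXmlDemo are names picked for this example, and XmlSerializer is just one of several serializers available in .NET.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Same shape as the User structure in the article.
// XmlType names each list item "user" (an assumed element name).
[XmlType("user")]
public struct User
{
    public int id;
    public string name;
}

public static class UserXmlDemo
{
    // The root element name "users" is an assumption made for this sketch.
    private static XmlSerializer CreateSerializer()
    {
        return new XmlSerializer(typeof(List<User>), new XmlRootAttribute("users"));
    }

    // Serialization: list of User -> XML text.
    public static string ToXml(List<User> users)
    {
        using (var writer = new StringWriter())
        {
            CreateSerializer().Serialize(writer, users);
            return writer.ToString();
        }
    }

    // Deserialization: XML text -> list of User.
    public static List<User> FromXml(string xml)
    {
        using (var reader = new StringReader(xml))
        {
            return (List<User>)CreateSerializer().Deserialize(reader);
        }
    }

    public static void Main()
    {
        var users = new List<User>
        {
            new User { id = 12, name = "Mark" },
            new User { id = 13, name = "Charles" },
            new User { id = 14, name = "John" }
        };
        Console.WriteLine(ToXml(users));
    }
}
```

The serializer also writes an XML declaration and namespace attributes on the root element, so the real output is slightly more verbose than a hand-written listing.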
Serialization can be observed in Web and
WCF services too. The request and the parameter information for a function
are serialized into XML, and when the function returns, the response and
the returned data are serialized into XML as well.
Actually, you don't
have to think about this XML data; the CLR handles it for you.
In the same vein, when it comes to
Remoting, the sender and the recipient must agree on the same format for
the serialized data (SOAP, which is XML, when the HTTP channel is used, for
instance). That is, when you send some data, the CLR serializes it for you
before sending it to the target process. When the target process
receives the serialized data, it turns it back (deserializes it)
into its original form so that it can handle it.
Thus, the process of converting data
structures and objects into a series of bits is called Serialization.
The reverse of this process, converting these bits back to the original
data structures and objects, is called Deserialization.
Therefore, the following ADO.NET lines
deserialize the XML file:
DataSet ds = new DataSet();
ds.ReadXml("data.xml");
And when your application receives a
response from the server or from another process, the CLR deserializes
that XML data for you.
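The WriteXml()/ReadXml() pair can be tried end to end without touching the disk. The sketch below is an assumption-laden example rather than anything from the original post: the table name "user", the column names, and the class DataSetRoundTrip are made up, and it writes to a string instead of data.xml so it is easy to run.

```csharp
using System;
using System.Data;
using System.IO;

public static class DataSetRoundTrip
{
    // Builds a small DataSet; table and column names are hypothetical.
    public static DataSet BuildSample()
    {
        var ds = new DataSet("users");
        DataTable table = ds.Tables.Add("user");
        table.Columns.Add("id", typeof(int));
        table.Columns.Add("name", typeof(string));
        table.Rows.Add(12, "Mark");
        table.Rows.Add(13, "Charles");
        return ds;
    }

    // Serialization: DataSet -> XML text.
    // The schema is written too, so column types survive the trip.
    public static string Serialize(DataSet ds)
    {
        using (var writer = new StringWriter())
        {
            ds.WriteXml(writer, XmlWriteMode.WriteSchema);
            return writer.ToString();
        }
    }

    // Deserialization: XML text -> a new DataSet.
    public static DataSet Deserialize(string xml)
    {
        var ds = new DataSet();
        using (var reader = new StringReader(xml))
        {
            ds.ReadXml(reader, XmlReadMode.ReadSchema);
        }
        return ds;
    }

    public static void Main()
    {
        string xml = Serialize(BuildSample());
        DataSet copy = Deserialize(xml);
        Console.WriteLine(copy.Tables["user"].Rows.Count);
    }
}
```

Writing the schema along with the data is a design choice for this sketch; without it, ReadXml() has to infer the schema from the XML.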
So why is XML often preferred over binary
serialization? Because XML is text-based. It can be passed easily
from one process to another or over a network connection, and because it
typically travels over HTTP, firewalls usually let it through.
Marshaling
Marshaling is the process of converting
managed data types to unmanaged data types. There are big differences
between the managed and unmanaged environments. One of those differences
is that the data types of one environment are not available (and not
acceptable) in the other.
For example, you can't call a function
like SetWindowText(), which sets the text of a given window, with a
System.String as-is, because this function accepts an LPCTSTR and not a
System.String. In addition, you can't interpret (handle) the return
type of the same function, BOOL, directly, because your managed
environment (C#, in the context of this article) doesn't have a
BOOL; it has a System.Boolean instead.
To be able to interact with the other
environment, you often don't need to change the format of the data; you
only need to change its type name.
For example, a System.String is a series
of characters, and an LPCTSTR is a series of characters too! Why not
just change the name of the data type and pass it to the other
environment?
Consider the following situation. You
have a System.String that contains the value "Hello":
System.String str = "Hello";
The same data can be represented as an
array of System.Char too, as in the following line:
System.Char[] ch = str.ToCharArray();
So, what is the difference between that
System.String variable and that System.Char array? As far as the
character data goes, nothing. Both contain the same characters, laid out
in the same order. That is the idea behind Marshaling.
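To make the SetWindowText() case concrete, here is a minimal sketch of how such a function is usually declared for P/Invoke, together with a hand-written round trip through unmanaged memory. The class name StringMarshalingDemo and the RoundTrip() helper are made up for this example; the declaration assumes the Unicode version of the function (so the parameter is effectively an LPCWSTR), and it is only declared here, not called, because calling it needs a real window handle on Windows.

```csharp
using System;
using System.Runtime.InteropServices;

public static class StringMarshalingDemo
{
    // P/Invoke declaration (Windows only, not called in this sketch).
    // The runtime marshals System.String to a native string pointer and
    // the Win32 BOOL return value to System.Boolean.
    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    public static extern bool SetWindowText(IntPtr hWnd, string lpString);

    // Does by hand roughly what the marshaler does for a string:
    // copy the characters to unmanaged memory, then read them back.
    public static string RoundTrip(string text)
    {
        IntPtr unmanaged = Marshal.StringToHGlobalUni(text);
        try
        {
            return Marshal.PtrToStringUni(unmanaged);
        }
        finally
        {
            // Unmanaged memory is not garbage collected; free it explicitly.
            Marshal.FreeHGlobal(unmanaged);
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip("Hello"));
    }
}
```

The characters that come back are the same ones that went in; only the type they are seen through changes on each side of the boundary.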
So what is the difference between
Serialization and Marshaling?
C# has a System.Int32, and the Windows API
has an INT, and both refer to a 32-bit signed integer (on 32-bit and
64-bit Windows alike). When you marshal a System.Int32 to an INT, you just
change its type name; you don't change its contents or lay it out
differently (usually). When you serialize a System.Int32, you convert it
to another form (XML, for instance), so its representation changes completely.
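The contrast can be seen side by side in a small sketch. The class name Int32Demo and its two helpers are hypothetical names for this example: the first writes the integer to unmanaged memory and reads it back unchanged, the second turns the same value into XML text with XmlSerializer.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Xml.Serialization;

public static class Int32Demo
{
    // Marshaling: the same 4 bytes go to unmanaged memory and come back.
    public static int ThroughUnmanagedMemory(int value)
    {
        IntPtr buffer = Marshal.AllocHGlobal(sizeof(int));
        try
        {
            Marshal.WriteInt32(buffer, value);
            return Marshal.ReadInt32(buffer);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }

    // Serialization: the value becomes XML text, a different representation.
    public static string ToXml(int value)
    {
        var serializer = new XmlSerializer(typeof(int));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThroughUnmanagedMemory(42));
        Console.WriteLine(ToXml(42));
    }
}
```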
Summary
After going back to the Wikipedia
article on Marshaling, I realized that my answer was very
specific to C#!
What I mean is that Marshaling is a very
general term used to describe transformations of memory. Theoretically,
it's more general than Serialization. In Python, for instance, the terms
Marshaling and Serialization are used interchangeably: there,
Marshaling = Serialization and Serialization = Marshaling, with no
difference. In computer science terminology, there's a subtle
difference between Marshaling and Serialization (check the Wikipedia
definition).
So what about the
System.MarshalByRefObject class? Why was that name, specifically, chosen?
The System.MarshalByRefObject class allows objects to be passed by
reference rather than by value in applications that use Remoting.
Personally, I like to think that the Microsoft
.NET Framework team was being very precise when they named
that class "MarshalByRefObject", with respect to that subtle difference
between serialization and marshaling. Or maybe the name was derived from
Python, I don't know!
After all, we should keep in mind that
in .NET terminology there's a big difference between Serialization and
Marshaling: Marshaling usually refers to Interop Marshaling. In .NET
Remoting, however, it refers to the serialization process.
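By the way, one often-repeated account, which I have only seen stated
without a source, says that Marshalling is so named
because it was first studied in 1962 by Edward Waite Marshall, then with
the General Electric corporation. Treat it with caution.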
That's all.
Have a nice day!