Converting existing html form into xml format?
Tidy is a command-line utility which runs on a wide variety of operating systems; it uses various command-line switches (parameters) to control its processing. At a minimum, it simply cleans up your HTML by ensuring that elements are properly nested and so on; it also warns you if your HTML uses non-standard code that's likely to cause cross-browser compatibility problems. One of the most useful command-line options is -asxml ("as XML," see?), which does w
hat you seem to be asking. It will properly balance elements, per usual, but it also adds some extra information to the document.
For instance, it tacks on an XML declaration, , and a statement, which unambiguously mark this as an XML document. To the root html element it also adds a namespace-declaring attribute that identifies all elements in the document as conforming to the specific XML vocabulary known as XHTML. It even forces all element names to lowercase, since the XHTML standard requires it.
If you're asking about converting HTML to a less generic form of XML than XHTML, your task may turn out to be quite complex. For example, if you've been using HTML to mark up customer invoices, not only the customer's name but also their number, item(s) ordered, quantity, and price are probably all wrapped up inside and tags. How do you know which "kind of paragraph" contains a given kind of information, so you can turn one instance of the p element into a custname element, another into custnumber, another into price, and so on?
If you've been using CSS for styling your HTML, you may have supplied the different p elements with class="custname" (etc.) attributes and so on; if that's the case, you may be able to generate meaningful XML using an XSLT stylesheet. There may also be customized software to do the sort of conversion you want. Otherwise you're probably looking down the barrel of an ugly gun.