Sunday, August 05, 2007

Henri Sivonen: HOWTO Avoid Being Called a Bozo When Producing XML

Here is the whole thing, but I am also copying in the beginning and the table of contents:

Note about the scope of this document: This document focuses on the Unicode layer, the XML 1.0 layer and the Namespaces in XML layer. Getting higher layers like XHTML and Atom right is outside the scope of this document. Also, anything served as text/html is outside the scope of this document, although the methods described here can be applied to producing HTML. In fact, doing so is even a good idea.

Contents

  1. Don’t think of XML as a text format
  2. Don’t use text-based templates
  3. Don’t print
  4. Use an isolated serializer
  5. Use a tree or a stack (or an XML parser)
  6. Don’t try to manage namespace declarations manually
  7. Use unescaped Unicode strings in memory
  8. Use UTF-8 (or UTF-16) for output
  9. Use NFC
  10. Don’t expect software to look inside comments
  11. Don’t rely on external entities on the Web
  12. Don’t bother with CDATA sections
  13. Don’t bother with escaping non-ASCII
  14. Avoid adding pretty-printing white space in character data
  15. Don’t use text/xml
  16. Use XML 1.0
  17. Test with astral characters
  18. Test with forbidden control characters
  19. Test with broken UTF-*
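
To make a few of these points concrete (don't print, use a serializer, keep unescaped Unicode strings in memory, use UTF-8 for output, test with astral characters), here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It is not taken from the article; the function name build_entry and the element names entry, title and body are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_entry(title: str, body: str) -> bytes:
    """Build a tiny document as a tree and let the serializer do the escaping.

    The function and element names are hypothetical, chosen only for this sketch.
    """
    root = ET.Element("entry")
    # Strings are stored unescaped in memory; no manual "&amp;" or "&lt;" here.
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "body").text = body
    # Serialize to UTF-8 bytes in one place instead of printing markup by hand.
    return ET.tostring(root, encoding="utf-8")


# Input with markup-significant characters, non-ASCII text and an
# astral character (U+1D11E MUSICAL SYMBOL G CLEF).
data = build_entry("Fish & Chips <daily>", "na\u00efve caf\u00e9 \U0001D11E")

# Round-trip through a real XML parser as a well-formedness check.
parsed = ET.fromstring(data)
```

Because the markup is never assembled by string concatenation, the ampersand and the angle brackets in the title cannot break well-formedness, and the non-ASCII text goes out as plain UTF-8 rather than as character references.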