Saturday, June 11, 2005

Greg Wilson: Top Ten Data Crunching Tips and Tricks

Every day, all over the world, programmers have to recycle legacy data, translate from one vendor's proprietary format into another's, check configuration files, and yank data out of web server logs. This kind of programming is usually called data crunching, and while it's not glamorous, knowing how to do it with the least amount of effort can make the difference between meeting a deadline and making another pot of coffee. These ten tips will take the headache out of crunching data.

1. Master the Classic Tools
2. Separate Input, Processing, and Output
3. Store Format Information in the Data Itself
4. Understand International Character Sets
5. Use Reluctant Matches and Graphical Tools
6. Nest and Subtract to Negate
7. Check for Holes
8. Use XSLT (with Caveats)
9. Use XPath Without Using XSLT
10. Check Your Work
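
As a small illustration of tip 5, here is a minimal sketch in Python of how a reluctant (non-greedy) match differs from a greedy one. The sample line and variable names are made up for this example and do not come from the original tips.

```python
import re

# Hypothetical sample input: two bold spans on one line.
line = '<b>first</b> and <b>second</b>'

# Greedy: .* consumes as much as it can, so the match runs to the LAST </b>.
greedy = re.findall(r'<b>(.*)</b>', line)

# Reluctant: .*? consumes as little as it can, so each match stops at the
# nearest </b>.
reluctant = re.findall(r'<b>(.*?)</b>', line)

print(greedy)     # ['first</b> and <b>second']
print(reluctant)  # ['first', 'second']
```

The only difference between the two patterns is the `?` after `.*`, which is easy to miss when reading someone else's regular expression.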