The new formats in the newer versions of Microsoft Office (.docx etc.) are XML-based, hence the x at the end of the extension. The problem with this, however, is something we all know: all MS programs are bloated pieces of shit. Those of you who occasionally fiddle with HTML will probably have experimented with the oh-this-is-convenient “save as HTML” function in Word, only to stare in stunned admiration at the amount of crap MS has managed to program Office to include in the resulting code.
This is a different version of the same story (i.e. a rant):
When I transcribe documents, I work in plain text editors and then save the resulting transcriptions as RTFs (i.e. “rich text format”: plain text + moderate formatting). Mostly, I use the TextEdit program that comes with Macs; occasionally I have used MS Word and saved as RTF. The results look alike, but are different under the hood – and what gives this away is the size of the file: the same document can be either 4KB or 49KB, depending on whether I’ve done it in TextEdit or MS Word.
Right, so here’s a clip of the text document in question – as you can see, there is very little formatting to deal with:
Looking at the code of this rtf file (I use a nifty little code editor called Smultron 😀 ), you can see that there’s not much code in there – as it should be:
But when I save it as rtf in MS Word, the difference is obvious and pronounced. This is the beginning of the resulting document viewed in Smultron:
...and this is the section of the text corresponding to the one in images 1 and 2 above.
Check out and compare the stats at the bottom of images 2 and 4. Clearly MS Word is insane. The RTF saved from Word is twelve times the size it needs to be, and fifteen times the length in characters. The amount of code in the sane version is about 1,000 characters; in the MS Word version, it’s about 44,000 characters. Forty-four thousand characters!!
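If you'd rather not eyeball this in an editor, here is a rough Python sketch of how one might count how much of an RTF file is code and how much is actual text. It is not how the numbers above were obtained (those come from Smultron's stats). The names `RTF_MARKUP` and `markup_stats` are made up for this illustration, and the regex is a simplification: it treats control words, control symbols and group braces as code, and everything else as text, so things like font names inside Word's header tables still get counted as text.

```python
import re

# Simplified notion of "RTF code" (an assumption, not a full RTF parser):
#   \word123 plus one optional trailing space   -> control word
#   backslash + any non-letter character         -> control symbol
#   { or }                                       -> group braces
RTF_MARKUP = re.compile(r"\\[a-zA-Z]+-?\d* ?|\\[^a-zA-Z]|[{}]")


def markup_stats(rtf: str) -> dict:
    """Return total, code and text character counts for an RTF string."""
    markup = sum(len(m.group(0)) for m in RTF_MARKUP.finditer(rtf))
    total = len(rtf)
    return {"total": total, "markup": markup, "text": total - markup}


if __name__ == "__main__":
    # Tiny hand-written sample, not taken from either of the files above.
    sample = r"{\rtf1\ansi {\b Hello} world\par}"
    print(markup_stats(sample))
```

To try it on real files, read each one into a string (for example with `open(path, encoding="latin-1").read()`, where `path` is whatever your file is called) and compare the `markup` counts of the TextEdit and Word versions.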
So what’s my point? I guess this: Know Your Tools. At the very least, learn their failings, weaknesses and limitations.