Assumptions I’ve Seen in the HTML 5 Debate

It wasn’t until the recent flurry of posts about HTML 5 from the godfather of web standards, Jeffrey Zeldman (see: In Defense of Web Developers, HTML 5 Nav Ambiguity, HTML 5 Is A Mess, and so forth), that I began looking into HTML 5.  I’m busy being pragmatic with my code today, making tough choices about browser support and figuring out how I can make HTML 4.01 work consistently in the browsers that I do support.  But with the promise of change coming I need to be on the hunt for details rather than waiting for the browsers to fully implement the spec.  If we wait for browsers to fully implement specs, I could very well be out of the web development industry before they’re implemented.  I wish that last line were a funny joke, but sadly 100% implementation is not likely in the next few years because the spec isn’t complete.  The reason the spec isn’t complete is that people on both sides of a bunch of arguments have been making assumptions.  Let’s take a look at those assumptions, shall we?

Assumption 1: The Needs of the Web Are or Are Not Going To Be The Same in ‘N’ Years

I love this assumption whichever direction you take it.  It’s awesome if it stays the same because then we can clean up our markup so that WordPress 4.0 (or your blogging platform of choice) can have 16 elements repeated over and over.  We won’t have table nesting issues, we won’t have DIV-itis, we’ll have semantically pure documents.  Or not.  Because as long as we’re using the Sliding Door technique or any number of other hacks to get markup to map to CSS (note: the spec will change for this technology, too) we’ll be polluting the DOM.  Also, if the web will change: how will it change?  Assuming that we can create “The Perfect” markup language in HTML 5 is naive at best and possibly stupid at worst.  What did we need in HTML, JavaScript and CSS before the iPhone, Tablet PCs and the user interface in “Minority Report”?  Assuming that the human interface to data on the web and elsewhere will remain stagnant is horribly flawed.  If we ever get HTML 5 out the door, we’ll get HTML 6 out the door some decades after that (or HTML will die and we’ll move to some other markup format).  What if we’re closer to Matrix-like data input than we think and you stop taking in the web through what you consider a browser?

Assumption 2: We Need More or Fewer Markup Elements

There are great arguments for newer or different markup elements.  There are great arguments for using the old ones and just styling them with CSS.  There are great arguments, but many of them rest, on some core level, on more assumptions.  Arguments for newer elements are valid for present web content if you are looking for semantic markup.  If you take the negative view (and assumption) that there will be no really powerful algorithm in your lifetime that can really, truly process semantic markup, then this argument is void and you move on.  The assumption and expectation trumps the ideal nature of semantic markup.  You will make no headway here.  If you assume that semantic markup will lead to better programs to parse the data, then what you’re really looking for is XML plus some sort of namespace and doctype information that will help computers parse the data beyond what the browser is doing.  Microformats help in this area, but are not complete enough to make all document data fully parsable.  Also, if you’re talking about web standards, I should note that data synchronization standards are diverse and incomplete in most implementations as well.

We may need less markup if we can style the div and span elements however we want.  Or so you may think.  That sort of thinking is based on the assumption that all markup is basically the same with different styling.  An oversimplification for sure, but I’ve found that oversimplification makes life easier, and so others and I are often caught doing it here, there, or in other places we don’t talk about in polite company.  It’s the Internet, so we do it publicly, but on sites that anonymize our usage, of course.  Hacks make Internet Explorer bugs more bearable, Firefox bugs less annoying, Opera work like every other browser, and Safari and Chrome like Internet Explorer (or not).

The Rise of the Pragmatist

I like to fancy myself a pragmatist.  It doesn’t buy me anything, but I can pretend that I’m practical, which is nice when someone asks you to do something and you don’t have an unlimited amount of time or money.  I don’t have time to wait for all of HTML 5 to be implemented.  I don’t have time for the web to catch up with desktop software of the ’80s and ’90s that had option/select/text entry elements that allowed the user to input any type of text, but also choose from some pre-populated options.  I don’t have time for the web to allow my DIV-itis to look much prettier when presented in a hierarchical tree-like structure.  I need markup that works now.  HTML 5 will be a step in the right direction, but it will be slowly implemented in ways we don’t know yet and we’ll see what happens.  If the Canvas element replaces Adobe Flash and SVG replaces VML, I’ll cheer.  If I don’t ever get to use either of those elements or technologies because their implementers within the walls of various browser vendors never get off their collective backsides, then I’ll be practical about what I do.  I’m going to make several assumptions about the future – as the above statements reveal.  I believe that the web will change through JavaScript and CSS libraries and hacks.  I believe that HTML standards will come, and I think that just like HTML 3, HTML 4 will one day be old.  I’m going to assume that practical software development is all about satisfying the demands of your boss, your client, or yourself with what you can do now.

But you wouldn’t be a pragmatist if you didn’t also believe that you could practically bring about resolution and change for HTML 5, HTML 6 or HTML 7.  Practical change.  Good change.  The things you discover you needed from HTML 5 that its purveyors could not foresee.  Keep pushing for better web standards, but don’t keep fighting for better web standards.  The fighting causes delay, the fighting isn’t pragmatic, and the fighting doesn’t clean up the web, it worsens it.  Data wants to be free, do your part to continue to free it.  I’m going to start by reading through the proposed HTML 5 specs to see what I’ll be up against in Internet Explorer 9.5 – whenever that comes out.

Go Native: JSON Vs eval is evil

In the world of browser performance you can find yourself looking for the little things to make big differences, or even a lot of little things to make a bigger difference together.  I’ve been researching one particular change that is coming down the pike: native JSON handling.  John Resig wrote about the need for native JSON support in the browser in 2007 and it’s finally come.  The difference it makes between Firefox 3.0 and 3.5 is major, the difference between Internet Explorer 7 and 8 is important, and the safety that native support brings for prevention of cross site scripting (XSS) is critical.

I’ve created two tests that you can try for yourself: the eval test and the JSON test.  The tests loop 20 times to give you a broader test range and reveal the average time.  There are notes in the test pages to clarify a few observations, but I’ll put them here just for the sake of a single source.  The test pulls in 1600 JSON objects and either evaluates them using the JavaScript eval function (eval(/*JSON String*/);), or it parses them with the native JSON parser (JSON.parse(/*JSON String*/)).  For consistency’s sake I used the data from John Resig’s test which I have copied onto my server to reduce the load on his server and not steal bandwidth.  My tests were run locally to reduce bandwidth latency influencing results, but you can see that over the Internet, even on a broadband connection, the performance only gets worse.
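As a rough sketch of the two approaches (this is not the code from the test pages), the snippet below prefers the native parser and only falls back to eval when JSON.parse is missing, then times a number of passes and averages them.  The names parseJson and averageParseTime and the sample data are made up for illustration.

```javascript
// Hypothetical helper: use the native parser when the browser has one,
// and fall back to eval only when it does not.
function parseJson(text) {
  if (typeof JSON !== "undefined" && typeof JSON.parse === "function") {
    return JSON.parse(text);
  }
  // eval runs whatever code the string contains, so only use this
  // fallback on data from a source you trust.
  return eval("(" + text + ")");
}

// Hypothetical timing harness: parse the same string several times
// and return the average time per pass in milliseconds.
function averageParseTime(text, passes) {
  var total = 0;
  for (var i = 0; i < passes; i++) {
    var start = new Date().getTime();
    parseJson(text);
    total += new Date().getTime() - start;
  }
  return total / passes;
}

// Illustrative sample data, not the data set used in the tests.
var sample = '[{"id": 1, "name": "first"}, {"id": 2, "name": "second"}]';
var parsed = parseJson(sample);
var avgMs = averageParseTime(sample, 20);

// The native parser refuses anything that is not JSON, which is where
// the safety benefit comes from: this string is code, so it throws.
var rejected = false;
try {
  JSON.parse("(function () { return 1; })()");
} catch (e) {
  rejected = true;
}
```

With a data set this small the average will be close to zero; the gap only becomes visible with a payload of the size described above.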

Firefox 3.5 has JavaScript tracing enabled, and the typical test results will show a much slower first pass with subsequent passes being much, much faster. You should never assume that the user will get the exact same data back repeatedly the way this test does, so the slower first-pass performance is what you should expect.

Internet Explorer 8’s eval test appears to be almost as fast as the JSON test, and its eval execution is pretty fast already. However, JSON.parse() is much safer to use and is thus preferable.

The final results are based on the averages (which are much more consistent than comparing the ‘best’ numbers): eval is roughly 500 milliseconds (or 500%) slower in Firefox 3.5 the first time and nearly the same speed in Internet Explorer 8 with an average of 10 seconds slower in 20 passes.  So for either identical or much faster performance, plus greater safety against XSS, it is a no-brainer to prefer native JSON parsing over eval when dealing with JSON data.

I do want to note that I was impressed by Internet Explorer 8’s eval speed, which was much greater than I had expected, and generally disappointed with Firefox’s.  But since eval is an evil function to use, that’s not all bad.

Do Yourself a Favor: Learn Regular Expressions

The first programming language I learned was Perl. Perl was easy to do many things with and it also allowed me to manipulate text strings. Except that instead of doing it the easy way I would often write very, very convoluted chunks of code in an attempt to get the data into or out of a string that existed. I was afraid of this monster that they called RegEx, or as it is properly known: Regular Expressions. Regular Expressions allow you to write an abbreviated syntax structure that will look for matches and patterns within the text string and then, depending on the function you’re using, do a comparison (match) or do a text replacement.

Just last night I was trying to manipulate a URL in JavaScript and get one parameter out of it, the view parameter. Instead of a bunch of indexOf calls and burying myself in lines of code, I got the value with one line of code:
tempStr.replace(/(^.+)(view=)([a-z_]+)(&.+)/, "$3");
JavaScript syntax allows you to use forward slashes to wrap the beginning and end of a regular expression; within those slashes I used several groups to find the view value (groups in this context are wrapped in parentheses). When the pattern matches, the whole string is replaced with the text captured by the third group, and that is the value that is returned.
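One thing to be aware of: the one-liner above only matches when something comes before view= and another &-separated parameter follows it; otherwise replace returns the original string unchanged.  Here is a minimal sketch of a variant that handles those cases, assuming the same lowercase-and-underscore values.  The function name getViewParam and the example URLs are made up for illustration.

```javascript
// Hypothetical helper: return the value of the "view" parameter,
// or null when the URL does not contain one.
function getViewParam(url) {
  // [?&] makes sure we match a parameter named exactly "view",
  // not the tail end of something like "preview=".
  var match = /[?&]view=([a-z_]+)/.exec(url);
  return match ? match[1] : null;
}

// The parameter can sit in the middle, at the end, or be absent.
var middle = getViewParam("http://example.com/page?id=7&view=list_view&sort=asc");
var last = getViewParam("http://example.com/page?view=grid");
var missing = getViewParam("http://example.com/page?id=7");
```

Using exec and reading the captured group directly means a failed match gives you null instead of the whole URL, which is easier to check for.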

You can learn more about regular expressions here, but I recommend you find a tutorial for using regular expressions in the languages you code in.

Software Architecture Tips

I’m probably not the world’s foremost expert in software architecture, but I’m the son of an engineer. A civil engineer. That is to say that he’s nice to most people, and it shows in his engineering, too. Software architecture is something that I understood the least when I was first learning about software development. At that stage in the game I needed to learn how to write code and didn’t grasp the critical nature of designing the code in such a way that I didn’t have to rebuild the wheel every time I did something. “Hello, World!”, meet impulsive idea man.

As time went on and I stubbed my toes on various ideas (accidents can lead to learning) I learned more by learning how not to do things. I didn’t go to school to be a software developer, I majored in history. As the infamous quote goes, “Those who don’t learn from history are doomed to repeat it.” Boy, did I repeat myself. Eventually I learned more about how to do things the right way and more recently I learned more about Model/View/Controller (MVC) coding. Here are some architecture tips I’ve learned, hopefully you’ll find them helpful as you learn:

Consume Open Source Code
Open source projects such as WordPress, Xinha and YUI are where I learned about software architecture – free, and usually commented. See how and why things are done. Don’t just learn a technique; learn why to use it by working out why the open source software used it. Check out SourceForge.

Learn Existing Code Libraries
If you have the chance, learn some code libraries and their functions. Often you’ll find that a code library is useful in more than one place. Matt Mullenweg has said that he re-uses components from WordPress in other projects. If your software architecture doesn’t fit with these libraries then you’ve probably got more thinking to do about your architecture. Closed systems can be a headache. I’m currently learning the CakePHP library, and that’s powerful!

Build Your Own Libraries
Build your own libraries. No, not huge open source libraries necessarily, but find snippets of code that you can re-use as needed. Know your coding style and know how it will work for you. When you’re designing the structure of your code, these libraries will play a role in this.

Go To Libraries
Libraries house books. They are free for you to look at and often to check out and take home to learn from. Don’t underestimate the power of a good paper, physical book :) Your software architecture is going to develop over time, and your learning and reading are going to grow if you invest in them. It won’t happen by magic (usually).

Learn What Optimized Code Looks Like
I have studied optimized coding practices for every language I’ve learned. In PHP I learned about commas as concatenators, various loop type speeds, and of course MySQL optimization to help keep queries fast. Your code architecture will need to employ these techniques too.

Join a Community
Find an online forum or email list that you can participate in. You’ll quickly find out that you, too, can help others learn. This will most certainly help you develop better software architecture practices.

Microsoft and PHP (via a Yahoo! Merger)

Matt Mullenweg asks, “If Microsoft were to buy Yahoo, I wonder if that would have an impact on PHP?” I think that everyone will have to say, “Yes.” on some level. There are two ways that this impact could take place:

1) PHP is challenged to compete with other languages in the market, that’s either ASP, Ruby, Perl, Python or some new language that comes around. JSP could be re-written to be easier (ha! Like that’s going to happen). PHP is going to be challenged by these languages, at least one of which is tied into Microsoft.
2) PHP will be challenged because Yahoo! needs more of something to deal with their demand. If Microsoft is after Yahoo! for revenue/ad related things then the engines running the machines will be left alone over time. People will adopt PHP (or whatever language) because of its functionality. I personally think that WordPress is a compelling reason to use PHP; Yahoo!’s use of it is not as strong a reason as WordPress, because with WordPress it’s easy to get your hands on the source and learn PHP from it.

Yahoo! has a commitment to PHP at present. Unless Microsoft dumps the entire staff in charge of making Yahoo! what it is, it’s going to be a slow transition if it were to change over to .NET/ASP. Industry leaders in the web development/software development community are at Yahoo! working on code and making choices; if Yahoo! loses its appeal for them they’ll move somewhere else and employ their killer PHP skills there.

PHP libraries like CakePHP (which I’m using for a new project) make using PHP fun and easy. I think PHP will be around for years to come because of what it is: fast, easy and powerful. I hope Microsoft causes PHP to change, Yahoo! or not. I also hope that Yahoo! opens up more than their YUI library so that coders can learn PHP stuff from them as well.

Thanks Matt for the interesting question!

The Double Save

I just finished proving my residency as a Colorado state citizen for some continued education. I needed to do this so that I didn’t have to pay out-of-state tuition fees of over $1000.00. While the staff at the Community College were trying to change my status to resident they went through all sorts of heck, with generic error messages that didn’t help and Java dialogs asking them if they wanted to try again until either their computer crashed, their computer became outdated, or the miraculous happened and things actually saved as they were supposed to (I am not making up the number 48 when I say they had to try 48 times to get things to save).

What really concerned me, though, was that the staff had been trained to get things to work by saving the information twice. You see, the first save worked, but the second save refreshed the information from the database. Whoever coded the sorry Java/Oracle/Purgatory application was too lazy to write the code correctly and failed to understand the need to refresh tainted data. If people use your software, please don’t train them to double save. It just isn’t worth keeping your code around if you work that way.

Quite the Character

I am amazed at how fast character arrays are compared to their string alternatives in every programming language I’ve ever used. This shouldn’t be a huge surprise, but I thought I’d point out that by switching from Standard Template Library strings to char arrays in C++ on a project I’ve got going, things sped up from around 25 seconds to process 55,000 lines of files to 15 seconds. Furthermore, in Java, using threading and database insertion on a previous project, I was able to get even more performance increases (it was an older version of the runtime, and I switched from String to StringBuffer, which is basically an array). In JavaScript, if you have to accumulate characters into a buffer, I strongly recommend using an array and either pushing the data (if you can afford to not support IE 5) or using arrayVar[arrayVar.length] = value;.
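The JavaScript technique can be sketched like this: collect the pieces in an array and join them once at the end, rather than growing a string inside the loop.  The function name buildWithArray and the sample data are made up for illustration, and how much time it saves depends on the browser, so measure in the ones you support.

```javascript
// Hypothetical helper: accumulate pieces in an array and join once.
function buildWithArray(pieces) {
  var buffer = [];
  for (var i = 0; i < pieces.length; i++) {
    // Appending by index works even where Array.prototype.push is missing;
    // buffer.push(pieces[i]) does the same thing where it is available.
    buffer[buffer.length] = pieces[i];
  }
  // A single join builds the final string in one step.
  return buffer.join("");
}

var pieces = ["alpha", "beta", "gamma"];
var result = buildWithArray(pieces);
```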

By learning to think this way you’ll have faster code out of the box, which is always nice because it means you don’t have to spend time refactoring. You do refactor, don’t you?

Thanks to Tony Nuzzi for the String->StringBuffer conversion in Java, and Craig Kaes for helping me to understand char * pointers and char arrays a little better.

The ‘Less is…’ Controversy

Within the last couple of years I have gotten more into design methodology, planning methodology and the evaluation of code quality versus quantity. I read several blogs that attempt to broach this subject: lesscode.org and 37signals’ blog. They’ve had a disagreement recently, which is fine; it’s none of my business… I don’t care. However, Jason’s post that I read about 2 minutes ago reminded me of something: Steve Wozniak. Steve was the keynote speaker at Gnomedex several years back and that speech was presented as an MP3 on IT Conversations. What really grabbed me about his talk was not just the funny stories, which were sometimes hilarious, but also that he took the limitations of his components as a challenge to optimize them. If I can only have X then how can I make X as fast as possible? This is the mindset that all hardware designers, coders and product developers should have (unless the device is SUPPOSED to be slow :) ).

Learn how to write code that is fast from the start. For example, read books on speed optimization so that when you code you automatically use the loop type that is quickest for the language in your scenario (that means you need to know what type of loop is fastest in what scenario). I know that in PHP some ways of appending text strings are faster than others (with echo, ‘,’ is faster than ‘.’), and that using single quotes instead of double quotes makes the page get processed faster because double quotes are always parsed for PHP variables. I recommend that you learn your language; know on a more basic level why C++ uses character arrays instead of string objects. In fact, learn why Java does that as well… if you’re using Java. But learn it and know it; you’ll be glad you did, and so will your employer if you have one. And if you’re self employed… this should be old news for you.

Do You Need This Feature?

A client has been evaluating various bug tracking software, and one of the packages, Mercury Quality Center, is written with many Active-X controls. Strangely enough they have a thesaurus button on the bug entry dialog. My co-contractor and friend Matt had the following to say:

Did you notice that the defect entry screen in mercury has a thesarus button?
Why do you need a thesarus when entering a defect… to find another meaning “the darn thing doesn’t work”?

I agree wholeheartedly and want to know why someone would put a feature like that in a piece of software that requires clarity and precision for entering bugs. Quality assurance means reproducing the bug, the developer fixing it, and then quality control confirming it’s fixed. A thesaurus is not needed for that process.