Constraints, expectations and real estate

One of my favorite shows on TV these days is (don’t laugh) Property Virgins on HGTV. In it, an experienced Realtor walks first-time home buyers through the house selection and offer process.

A lot of times the “let’s watch a couple pick a house” type shows highlight the inexperience of the buyers. Buyers tend to focus on the wrong thing, like paint colors or light fixtures, and gloss over things that are hard to change, like room layout. But the best part of the show happens right at the beginning, in a brilliant move to reset the usually unrealistic buyer expectations.

At the beginning of the show, the host asks the buyers a few key questions to identify both their budgetary constraints and the aspects of a home they value most. It could be location, layout, lot size, etc. But the buyers always have some sort of dollar amount they won’t go over.

The next step is rather brilliant – the Realtor takes the first-time home buyers to a house that has everything they want in an ideal home. It has the perfect layout, the perfect location, all the amenities they want, in the right condition, at the right size and so on. The buyers inevitably love this home, at which point the Realtor asks the buyers to guess how much the home costs. Also inevitably, the buyers are pretty far off the actual price: their ideal home typically costs at least double, if not triple or quadruple, the buyers’ budget.

And inevitably, the buyers have sticker shock!

Talking buyers off the ledge

This fantastic approach accomplishes two important goals:

  • Convey what perfection costs
  • Force a prioritization

The host is never mean about showing the expensive house, but instead presents it as something to aspire towards. This also resets the buyers’ expectations: every house they see from there on out is going to have only some subset of the aspects they value.

But instead of discouraging the first-time home buyers, this approach tends to force them to focus on what is most important to them in a house.

Resetting expectations

We often have clients that are first-time software buyers, or first-time-with-someone-who-knows-what-they-are-actually-doing software buyers. A lot of what we do in the initial part of the project is framing the project for success. We look at everything the client wants, talk about scope and budget, then reset expectations back down to an attainable level.

What we don’t want to see happen are those buyers that look at dozens or even hundreds of homes looking for that house that checks every single checkbox and comes in at their budget. That house might exist, or it might not, but a lot of time can be wasted searching and searching.

Constraints force prioritization and hard decisions. Having an experienced guide (like that Realtor host) ensures that the buyer understands what’s feasible for their budget, and the guide can share experience on which things matter (the electric wiring needs replacing) and which do not (the bedroom has shag carpet).

Delivering value is really only half the equation. It’s up to us as developers to make sure the buyer understands what they are getting for their money (or time), relative to what’s out there in the market. If you bought a house in isolation, it’s hard to know whether you’re getting taken to the cleaners. But by having frames of reference (Product XYZ is similar to yours, and took ABC man hours/dollars to build), we can center the conversation around “what’s most valuable to me” instead of “how can we squeeze everything in”.

JSON Serialization / Deserialization of DateTime Not Equal

I used the DataContractJsonSerializer to serialize a class and noticed that for a DateTime field, the deserialized result appeared identical to the original value, but the equals operator failed. The following simple program reproduced the problem. using System; using System.Runtime.Serialization.Json; using System.IO; using System.Runtime.Serialization; public static class JsonHandler { public static byte [] SerializeToJsonBytes( object obj) { DataContractJsonSerializer dcjs ...(read more)

Improving the Git Windows experience: Downloads

I love Git. It’s a very powerful tool that lets me bend my repository to my will, with commands and features that blow the other source control providers I’ve used out of the water.

However, the tooling just doesn’t do it justice. From the download, installation, integration and CLI experience, it always feels (in Windows land) like you’re playing in someone else’s back yard.

Over the next few posts, I’m going to compare the experience of using Git with that of Mercurial, which has, in my opinion, lesser features, but a much MUCH better experience.

The Mercurial download experience

Let’s look at searching for and downloading the Mercurial client. When I google “Mercurial” or “Mercurial Windows” or “Mercurial Windows Download” or variants, two of the top results are the official Mercurial home page and the official Windows client, TortoiseHg.

From there, I want to download Mercurial. Both websites offer very clear ways of doing so. The Mercurial site:

image

And the Tortoise Hg site:

image

Two very large “download buttons”. These buttons are interesting in that:

  • They link directly to the file to be downloaded.
  • They both link to the exact same installer
  • They know what OS you’re using, and display the correct installer accordingly

TortoiseHg is the official Hg client for Windows, and includes:

  • The command line interface
  • Windows Explorer integration
  • Visual tools (Workbench etc.)
  • Visual Studio integration
  • Merge tools

It’s a completely out-of-the-box client that includes EVERYTHING that you might need to run Mercurial, all in one package, and consistently presented to the end user.

Next, let’s look at the Git download experience.

The Git download experience

When searching for Git downloads, you’re primarily directed to one of two sites – the official Git site, or the official Git tools site for Windows, hosted on Google Code (and also GitHub, curiously enough). The Git site is clean enough:

image

Except I have 3 download links instead of one. Not a big deal most of the time, but already the end user is presented with more choices than on the Mercurial site. Clicking on the Windows link takes me to this page:

image

Instead of linking me directly to the installer file to download immediately, I’m directed to the downloads page of the Google Code site, where I am presented with even more options. There is nothing in this screen that screams “THIS IS THE INSTALLER YOU WANT IGNORE THE OTHERS”. As someone new to Git, how do I know which to choose? Probably the first one, and most people would choose the first one, but presenting choices here is pointless and confusing.

Not to mention, I’m whisked away to a site that has nothing to do with the original Git site. The official Git site didn’t mention “msysgit” but now I’m on the msysgit Google Code site. Even more confusing is that the file has the name “preview” in it, and the installer is labeled as “Beta”. So is this the right one or not? I might be inclined to search for the last “good” release and not a beta/preview one.

The Git installer is also less featured than the Mercurial one. The official Git Windows tools include:

  • Windows Explorer integration (very limited)
  • A CLI through the Git Bash or directly in a command prompt
  • Visual tools (OK tools)

However, I typically don’t point folks to the official Git client. Instead, I point them to Git Extensions, a more fully featured toolset that includes:

  • Windows Explorer integration
  • Visual Studio integration
  • Richer visual tools
  • Bundled merge tool
  • Bundled Git installer

This isn’t the official Git Windows client, so you basically have to know it exists to find it. Almost none of the online tutorials recommend it, even though it much more closely matches what Mercurial provides out of the box.

Improving the Git download experience

In two easy steps:

  • Have the official Windows client be as full featured as the Hg one. Could just start with Git Extensions and go from there.
  • Copy the Mercurial website’s behavior

In short, prefer Simplicity over Choices. Have defaults, and obvious ways to get to the non-defaults.

Multiple messages and transport messages in NServiceBus

Andreas Öhlund posted recently on the concept of the “transport message” in NServiceBus. One of the mistakes I often see (and made myself) is misunderstanding the boundary of the unit of work NServiceBus applies to messages, especially around sending multiple messages.

In many of our systems, we consume flat files from third party integration partners. We take these flat files and deserialize each line of the file into a distinct message, so we first tried to do something like this:

ProcessLineInFileMessage[] messages = ConvertFileToMessages(file);

Bus.Send(messages); // Sends all logical messages in 1 transport message

The problem we hit was that the unit of work boundary in NServiceBus is around the transport message, not the logical message. In a file of a million lines, that’s a million logical messages bundled together into one single transport message, and one transaction boundary! We had assumed that the overload for sending multiple messages was simply a helper method that encapsulated a “foreach”. Well, no, it isn’t. All the messages are wrapped in an envelope known as a “transport message”, and it’s the transport message that defines the unit of work boundary (since that’s the physical message sitting in the queue).

Needless to say, we saw database connection timeouts pretty much immediately. We modified our use of the bus to instead send one logical message per physical transport message, with our friend the “foreach”:

ProcessLineInFileMessage[] messages = ConvertFileToMessages(file);

foreach (var message in messages)
    Bus.Send(message); // Send one logical message at a time

So when would you use the overloads for sending multiple messages? I’m not sure, but I’ll update if I find out!

ASP.NET MVC DropDownList, MultiSelect and jQuery

The most frequent question posted in the ASP.NET MVC forums was on using the DropDownList (DDL) helper. I wrote a tutorial and a blog to address these questions; you can find my tutorial here and a blog entry here. One common UI requirement the DDL doesn't cover: how do you insert new categories in the DDL list? The image below shows the completed project which allows you to insert new genres and new artists into the popular Music Store tutorial. Providing a simple and elegant mechanism...(read more)

Hazards of Converting Binary Data To A String

Back in November, someone asked a question on StackOverflow about converting arbitrary binary data (in the form of a byte array) to a string. I know this because I make it a habit to read randomly selected questions in StackOverflow written in November 2011. Questions about text encodings in particular really turn me on.

In this case, the person posing the question was encrypting data into a byte array and converting that data into a string. The conversion code he used was similar to the following:

string encoded = System.Text.Encoding.UTF8.GetString(data);

That isn’t exactly their code, but this is a pattern I’ve seen in the past. In fact, I have a story about this I want to tell you in a future blog post. But I digress.

The infamous Jon Skeet answers:

You should absolutely not use an Encoding to convert arbitrary binary data to text. Encoding is for when you've got binary data which genuinely is encoded text - this isn't.

Instead, use Convert.ToBase64String to encode the binary data as text, then decode using Convert.FromBase64String.

Yes! Absolutely. Totally agree. As a general rule of thumb, agreeing with Jon Skeet is a good bet.
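Here is a minimal sketch of what that advice looks like in practice. It only uses Convert.ToBase64String and Convert.FromBase64String; the class and method names are made up for illustration, and the sample bytes are arbitrary.

```csharp
using System;
using System.Linq;

public static class Base64RoundTrip
{
    // Binary data -> text that is safe to store or transmit as a string.
    public static string ToText(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // Text produced by ToText -> the original binary data.
    public static byte[] FromText(string text)
    {
        return Convert.FromBase64String(text);
    }

    public static void Main()
    {
        // Includes bytes above 127, which are the ones that cause trouble for text encodings.
        var data = new byte[] { 0, 127, 128, 255 };

        string text = ToText(data);
        byte[] roundTripped = FromText(text);

        Console.WriteLine("As text:\t" + text);
        Console.WriteLine("Round Tripped:\t" + String.Join(", ", roundTripped));
        Console.WriteLine("Identical:\t" + data.SequenceEqual(roundTripped));
    }
}
```

The string is about a third longer than the raw data, which is the price you pay for a representation that contains nothing but plain ASCII characters.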

Not to give you the impression that I’m stalking Skeet, but I did notice that this wasn’t the first time Skeet answered a question about using encodings to convert binary data to text. In response to an earlier question he states:

Basically, treating arbitrary binary data as if it were encoded text is a quick way to lose data. When you need to represent binary data in a string, you should use base64, hex or something similar.

This piqued my curiosity. I’ve always known that if you need to send binary data in text format, base64 encoding is the safe way to do so. But I didn’t really understand why the other encodings were unsafe. What are the cases in which you might lose data?

Round Tripping UTF-8 Encoded Strings

Well let’s look at one example. Imagine you’re receiving a stream of bytes and you store it as a UTF-8 string and pop it in the database. Later on, you need to relay that data so you take it out, encode it back to bytes, and send it on its merry way.

The following code simulates that scenario with a byte array containing a single byte, 128.

var data = new byte[] { 128 };
string encoded = System.Text.Encoding.UTF8.GetString(data);
var decoded = System.Text.Encoding.UTF8.GetBytes(encoded);

Console.WriteLine("Original:\t" + String.Join(", ", data));
Console.WriteLine("Round Tripped:\t" + String.Join(", ", decoded));

The first line of code creates a byte array with a single byte. The second line converts it to a UTF-8 string. The third line takes the string and converts it back to a byte array.

If you drop that code into the Main method of a Console app, you’ll get the following output.

Original:      128
Round Tripped: 239, 191, 189

WTF?! The data was changed and the original value is lost!

If you try it with 127 or less, it round trips just fine. What’s going on here?

UTF-8 Variable Width Encoding

To understand this, it’s helpful to understand what UTF-8 is in the first place. UTF-8 is a format that encodes each character in a string with one to four bytes. It can represent every Unicode character, but is also backwards compatible with ASCII.

ASCII is an encoding that represents each character with seven bits of a single byte, and thus consists of 128 possible characters. The high order bit in standard ASCII is always zero. Why only 7-bits and not the full eight?

Because seven bits ought to be enough for anybody:

When you counted all possible alphanumeric characters (A to Z, lower and upper case, numeric digits 0 to 9, special characters like "% * / ?" etc.) you ended up a value of 90-something. It was therefore decided to use 7 bits to store the new ASCII code, with the eighth bit being used as a parity bit to detect transmission errors.

UTF-8 takes advantage of this decision to create a scheme that’s both backwards compatible with the ASCII characters and able to represent all Unicode characters by leveraging the high-order bit that ASCII ignores. Going back to Wikipedia:

UTF-8 is a variable-width encoding, with each character represented by one to four bytes. If the character is encoded by just one byte, the high-order bit is 0 and the other bits give the code value (in the range 0..127).

This explains why bytes 0 through 127 all round trip correctly. Those are simply ASCII characters.
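If you’d rather check that claim than take it on faith, here is a rough sketch that pushes every possible single byte through the same UTF-8 round trip and counts the survivors. The class and method names are invented for this example.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf8SingleByteSurvey
{
    // True when a lone byte survives bytes -> UTF-8 string -> bytes unchanged.
    public static bool RoundTrips(byte b)
    {
        var data = new[] { b };
        string text = Encoding.UTF8.GetString(data);
        byte[] back = Encoding.UTF8.GetBytes(text);
        return back.SequenceEqual(data);
    }

    public static void Main()
    {
        int survivors = 0;
        int firstFailure = -1;

        // Use an int counter so the loop can terminate after 255.
        for (int b = Byte.MinValue; b <= Byte.MaxValue; b++)
        {
            if (RoundTrips((byte)b))
                survivors++;
            else if (firstFailure < 0)
                firstFailure = b;
        }

        Console.WriteLine("Bytes that round trip: " + survivors);
        Console.WriteLine("First failure at:      " + firstFailure);
    }
}
```

On my understanding of the default decoder, only the 128 ASCII values survive and the first failure is at 128.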

But why does 128 expand into multiple bytes when round tripped?

If the character is encoded by a sequence of more than one byte, the first byte has as many leading "1" bits as the total number of bytes in the sequence, followed by a "0" bit, and the succeeding bytes are all marked by a leading "10" bit pattern.

How do you represent 128 in binary? 10000000

Notice that it’s marked with a leading 10 bit pattern which means it’s a continuation character. Continuation of what?

the first byte never has 10 as its two most-significant bits. As a result, it is immediately obvious whether any given byte anywhere in a (valid) UTF-8 stream represents the first byte of a byte sequence corresponding to a single character, or a continuation byte of such a byte sequence.
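The bit patterns from those quotes can be sketched as a small classifier. This is an illustration only: it looks at the leading bits of one byte and nothing else, so it does not catch every byte value that UTF-8 forbids, and the names are hypothetical.

```csharp
using System;

public static class Utf8ByteKind
{
    // Classifies a byte purely by its leading bits, as described in the quoted text.
    public static string Classify(byte b)
    {
        if ((b & 0x80) == 0x00) return "single byte (ASCII)";            // 0xxxxxxx
        if ((b & 0xC0) == 0x80) return "continuation byte";              // 10xxxxxx
        if ((b & 0xE0) == 0xC0) return "lead byte of a 2-byte sequence"; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return "lead byte of a 3-byte sequence"; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return "lead byte of a 4-byte sequence"; // 11110xxx
        return "not a valid UTF-8 bit pattern";                          // 11111xxx
    }

    public static void Main()
    {
        // 65 is 'A'; 128 is the problem byte; 239, 191, 189 are the round-tripped bytes.
        foreach (byte b in new byte[] { 65, 128, 239, 191, 189 })
        {
            string bits = Convert.ToString(b, 2).PadLeft(8, '0');
            Console.WriteLine(b + "\t" + bits + "\t" + Classify(b));
        }
    }
}
```

Run against the round-tripped output from earlier, it reports 239 as the lead byte of a 3-byte sequence and 191 and 189 as its continuation bytes, while the lone 128 is a continuation byte with nothing to continue.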

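So in answer to the question of why 128 expands into multiple bytes when round tripped: a single byte of 128 isn’t a valid UTF-8 character, so the decoder has nothing sensible to map it to. The three bytes that come back (239, 191, 189) are the UTF-8 encoding of U+FFFD, the Unicode replacement character, which .NET’s default UTF-8 decoder substitutes for bytes it can’t decode.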

I’ve noticed a lot of invalid UTF-8 values expand into these three bytes. But that’s beside the point. The point is that using UTF-8 encoding to store binary data is a recipe for data loss and heartache.
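A quick way to poke at those three bytes yourself: the sketch below decodes the single byte 128 with Encoding.UTF8 and prints the code point of the character that comes out, which should be U+FFFD, the Unicode replacement character (its UTF-8 encoding is exactly 239, 191, 189). It then tries the same input with a UTF8Encoding constructed to throw on invalid bytes instead of substituting. The class and method names are my own invention.

```csharp
using System;
using System.Text;

public static class ReplacementCharacterDemo
{
    // The default UTF-8 decoder substitutes a character for bytes it cannot decode.
    public static string DecodeLenient(byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }

    // A strict decoder refuses invalid input instead of silently changing it.
    public static bool TryDecodeStrict(byte[] data, out string text)
    {
        var strict = new UTF8Encoding(false, true); // no BOM, throw on invalid bytes
        try
        {
            text = strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = String.Empty;
            return false;
        }
    }

    public static void Main()
    {
        var data = new byte[] { 128 };

        string lenient = DecodeLenient(data);
        Console.WriteLine("Decoded code point: U+" + ((int)lenient[0]).ToString("X4"));
        Console.WriteLine("Re-encoded bytes:   " + String.Join(", ", Encoding.UTF8.GetBytes(lenient)));

        string ignored;
        Console.WriteLine("Strict decode ok:   " + TryDecodeStrict(data, out ignored));
    }
}
```

The strict variant is handy when you would rather fail loudly than discover the substitution later in a database column.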

What about Windows-1252?

Going back to the original question, you’ll note that the code didn’t use UTF-8 encoding. I took some liberties in describing his approach. What he did was use System.Text.Encoding.Default. This could be different things on different machines, but on my machine it’s the Windows-1252 character encoding also known as “Western European Latin”.

This is a single byte encoding and when I ran the same round trip code against this encoding, I could not find a data-loss scenario. Wait, could Jon be wrong?

To prove this to myself, I wrote a little program that cycles through every possible byte and round trips it.

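// Assumes: using System; using System.Linq; using System.Text;
// On .NET Core and later, code page 1252 is only available after calling
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
var encoding = Encoding.GetEncoding(1252);
for (int b = Byte.MinValue; b <= Byte.MaxValue; b++)
{
    var data = new[] { (byte)b };
    string encoded = encoding.GetString(data);
    var decoded = encoding.GetBytes(encoded);

    // Stop at the first byte that does not survive the round trip
    if (!decoded.SequenceEqual(data))
    {
        Console.WriteLine("Round Trip Failed At: " + b);
        return;
    }
}

Console.WriteLine("Round trip successful!");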

The output of this program shows that you can encode every byte, then decode it, and get the same result every time.

So in theory, it could be safe to use Windows-1252 encoding of binary data, despite what Jon said.

But I still wouldn’t do it. Not just because I believe Jon more than my own eyes and code. If it were me, I’d still use Base64 encoding because it’s known to be safe.

There are five unmapped code points in Windows-1252. You never know if those might change in the future. Also, there’s just too much risk of corruption. If you were to store this string in a file that converted its encoding to Unicode or some other encoding, you’d lose data (as we saw earlier).

Or if you were to pass this string to some unmanaged API (perhaps inadvertently) that expected a null-terminated string, it’s possible this string would include an embedded null character and be truncated.
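To make the embedded null scenario concrete, here is a rough simulation. Two loud assumptions: it uses ISO-8859-1 rather than Windows-1252 (another single-byte encoding, picked only because it is available everywhere without extra setup), and the "unmanaged API" is faked by a hypothetical helper that cuts the string at the first null character.

```csharp
using System;
using System.Text;

public static class EmbeddedNullDemo
{
    // Hypothetical stand-in for an API that treats the first '\0' as the end of the string.
    public static string AsNullTerminated(string text)
    {
        int nul = text.IndexOf('\0');
        return nul >= 0 ? text.Substring(0, nul) : text;
    }

    public static void Main()
    {
        // Swapped in for Windows-1252 for portability; both map one byte to one character.
        var latin1 = Encoding.GetEncoding("iso-8859-1");

        // Arbitrary binary data that happens to contain a zero byte.
        var data = new byte[] { 200, 17, 0, 42, 255 };

        string text = latin1.GetString(data);
        string seen = AsNullTerminated(text);
        byte[] survived = latin1.GetBytes(seen);

        Console.WriteLine("Original bytes:     " + String.Join(", ", data));
        Console.WriteLine("Bytes after the cut: " + String.Join(", ", survived));
    }
}
```

Everything after the zero byte is gone, even though the encoding itself round trips perfectly.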

In other words, the safest bet is to listen to Jon Skeet as I’ve said all along. The next time I see Jon, I’ll have to ask him if there are other reasons not to use Windows-1252 to store binary data other than the ones I mentioned.


Shipped: xUnit.net 1.9

On January 2nd, Jim and I shipped xUnit.net 1.9. We updated NuGet with the 1.9 build binaries, and for the first time, we're including the MSBuild runner inside the "xunit" NuGet package.

There are a few big new features that are worth calling out.

Async Unit Tests

Late in 2010, the C# team announced a new feature that was coming in C# 5: the "async" and "await" keywords. These keywords allow the developer to consume Task-based asynchronous APIs with code that looks linear and procedural, and mimics the code that the developer would write when calling synchronous APIs. In addition, .NET 4.5 is introducing new Task-based async APIs to supplement the existing event-based asynchronous APIs, and new async APIs will only offer Task-based versions.

Before now, unit testing these asynchronous APIs meant resorting to calls like .Wait() and .ContinueWith(), since unit testing frameworks are inherently synchronous. With the release of 1.9, xUnit.net allows you to write asynchronous unit tests by marking your test method with the "async" keyword, and changing the return type from void to Task.

Prior to 1.9, a unit test around an asynchronous API might look something like this:

[Fact]
public void MyAsyncUnitTest()
{
    // ... setup code here ...

    Task task = CallMyAsyncApi(...)
               .ContinueWith(innerTask =>
    {
        var result = innerTask.Result;

        // ... assertions here ...
    });

    // Block the test thread until the continuation has run
    task.Wait();
}

The code gets significantly more complex with every additional asynchronous API you add into the mix (for example, calling some async APIs during the setup phase of the unit tests). Adding try/catch logic becomes difficult and/or redundant, as the exception handling logic needs to be duplicated for both the setup code and the ContinueWith handler.

The same unit test can be simplified by using xUnit.net 1.9 (and either the Async CTP, or the pre-release version of .NET 4.5 which includes C# 5):

[Fact]
public async Task MyAsyncUnitTest()
{
    // ... setup code here ...

    var result = await CallMyAsyncApi(...);

    // ... assertions here ...
}

xUnit.net does the rest of the work. It sees that you're returning a Task and waits for it to complete. Traditional sequential coding mechanics like try/catch/finally and using are much easier to reason about when using async/await, and the compiler takes care of the boilerplate code necessary to ensure that it all works properly.

Generic Theories

One of the most used features in the xUnit.net Extensions project is support for theories. In quick review: Facts are for expressing tests which are invariants; Theories are used for expressing tests that are only necessarily true for a given set of input data. As such, theories are sometimes called "data-driven unit tests", because part of testing a theory is providing sets of conforming data.

The new Generic Theories feature allows the developer to write their theories using the .NET generic method syntax. xUnit.net will attempt to determine the best signature for the generic types of the method based on the provided data, and it makes this decision individually on a data-row by data-row basis. This is most useful for writing unit tests against generic APIs, wherein you want to choose the generic API to call based on the type of the input data.

For example, here is a unit test which is testing a generic get-and-cast behavior of a container:

[Theory]
[InlineData(42)]
[InlineData(21.12)]
[InlineData("Hello, world!")]
public void CastingGetTest<T>(T value)
{
    MyContainer container = new MyContainer();
    container.SetValue("name", value);

    T result = container.Get<T>("name");

    Assert.Equal(value, result);
}

This generic theory will be called 3 times as expected, and the type of T will be inferred from the value that was provided. We've also updated the output from theories to include any generic types that were used, so if this theory fails, its output method names would be:

CastingGetTest<Int32>(value: 42)
CastingGetTest<Double>(value: 21.12)
CastingGetTest<String>(value: "Hello, world!")

This highlights another new feature of 1.9 as well: theory names include parameter names in addition to parameter values.

The rules for matching are fairly straightforward:

  • If the generic type has no matching parameters, we use Object
  • If the generic type has one matching parameter:
    • If the parameter value is non-null, we use value's concrete type
    • If the parameter value is null, we use Object
  • If the generic type has two or more matching parameters:
    • If the parameters are the exact same concrete type, we use the concrete type (note that a null value is type-compatible with any reference type, but type-incompatible with any value type)
    • If the parameters are not the exact same concrete type, we use Object

We could've gone with a more complex matching algorithm, but we wanted the results to be easy to understand and reasonably predictable without stashing away dozens of matching rules in your head.
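To make the rules above concrete, here is a rough sketch of them as a single function. This is not xUnit.net's actual implementation; the class name, the method name and the idea of passing in "the values of all parameters that use the generic type" are assumptions made for illustration. It also assumes that when every matching value is null, the answer is Object.

```csharp
using System;
using System.Linq;

public static class GenericTheoryTypeSketch
{
    // matchingValues: the data-row values for every parameter declared with the generic type.
    public static Type ResolveGenericType(object[] matchingValues)
    {
        // No matching parameters: fall back to Object.
        if (matchingValues == null || matchingValues.Length == 0)
            return typeof(object);

        bool hasNull = matchingValues.Any(v => v == null);
        Type[] concreteTypes = matchingValues
            .Where(v => v != null)
            .Select(v => v.GetType())
            .Distinct()
            .ToArray();

        // Only nulls (including the single null parameter case): Object.
        if (concreteTypes.Length == 0)
            return typeof(object);

        // Values of different concrete types: Object.
        if (concreteTypes.Length > 1)
            return typeof(object);

        Type candidate = concreteTypes[0];

        // A null is compatible with any reference type but with no value type.
        if (hasNull && candidate.IsValueType)
            return typeof(object);

        return candidate;
    }

    public static void Main()
    {
        Console.WriteLine(ResolveGenericType(new object[] { 42 }).Name);
        Console.WriteLine(ResolveGenericType(new object[] { "Hello", null }).Name);
        Console.WriteLine(ResolveGenericType(new object[] { 42, 21.12 }).Name);
    }
}
```

The three calls in Main correspond to a single non-null value, a reference type paired with null, and two mismatched value types.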

An important note: this support is limited to generic test methods only. xUnit.net does not support generic test classes.

Speaking at San Diego DNUG tonight

If you’re in the San Diego area tonight, I’ll be giving my talk on domain modeling. Details below:

http://www.sandiegodotnet.com/

I’ve been told that there is free pizza. If not, I might be able to score some stale bagels from my hotel’s lobby, but no promises there.

Hope to see you there!

CodeMash 2012 wrap up

This year was my first to attend the bacon debauchery that is CodeMash. I had been urged to go by pretty much everyone I’ve met who has gone, and this year I was fortunate enough to be selected as a speaker.

My talk was “Crafting Wicked Domain Models”. It was not recorded, but all the slides and code can be found on my GitHub:

https://github.com/jbogard/presentations

Although that event wasn’t recorded, check out Claudio Lassala’s blog, where he recorded me doing the talk at the Houston Code Camp a couple months before.

A couple of folks came up to me afterwards telling me that they could never code/refactor in front of a Live Studio Audience. But, since the example was straight out of a real project I had lived through, it made going through the code a lot easier.

The questions afterwards were great, too. I always get some discussion around “it’s great and all, but now I have a bunch more classes to deal with”. It’s a fair criticism, and one to keep an eye out for. The way I see it, if the code is more understandable and more representative of real-world concepts, it’s a win. If, however, I’m having a hard time thinking of names for the classes I’m building responsibilities around, it’s a bit of a smell that I’m just inventing abstractions.

CodeMash really was a blast. The people were fantastic, the location was great, the food was fantastic, the beer and bacon were plentiful, and as always, the conversations were unforgettable. Hope to see everyone there next year!

Getting Older

Birthdays are a funny thing, aren’t they? Let’s look at this tweet for example,

It's @haacked's birthday. Give him crap about getting old.

No gifts, please. Especially not what Charlie suggests.

Of course I’m getting older. We’re all getting older. Every second of every day and twice on Monday. Every femtosecond even. Perhaps the only time we’re not getting older is the moment within a Planck time interval. But once that interval is up, yep, you’re older.

Yet people apparently live their lives completely oblivious to this fact until their next birthday comes along. As the chronometer slides the next number into place, the realization dawns, “Damn! I’m older!” What? You didn’t know this?!

Feeling Older

The odd thing to me is that I don’t really feel older, mentally. I mean, I consciously know I’m older, but I feel like there’s this smooth continuum from my first memory to now. While the things I spend time thinking about have changed, the way I think about others and about myself feels like it hasn’t changed. I’m the same person then as I am now, and that kind of blows my mind.

For example, I still think fart jokes are funny.

In my mind, old people tell you how they used to walk miles uphill both ways to get to school. But I realize that these days, old people tell you about how they used to have to use their phone to connect online at 1200 baud. And there was no internet!!! OMG! What the hell were we connecting to?

Rather than feeling older, I am observing the evidence that I’m older. For example, I used the word “baud” in this blog post. Another example is how injuries now take much longer to heal. I have two kids, a four-year-old and a two-year-old, and I’m pretty sure that if I were to slice them clean in half, that’d only put them out of commission for a week. They’d heal up and have no scars! Meanwhile, if I get a paper cut on a finger I can pretty much kiss that finger goodbye. Write it off as a loss and start practicing typing with two bloody stumps for hands.

Getting Experienced

But it’s not just physical. I do notice that while I don’t feel older, I do have the benefit of many more years of experience to draw upon. But more importantly, I’m finally actually paying attention to that. Go figure.

Last week, we had our GitHub summit and Friday was our field trip day to a distillery then a bar. This was the night set aside to party hard. Which is amazing to me because the night before I’m pretty sure we as a company consumed enough alcohol to bring elephants to extinction.

But I drew upon my experience and took it easy because I had a flight early the next morning and I did not want to be sick on an airplane. Contrast this to a few years before at Tech-Ed Hong Kong when I was out with some local friends and at 5:00 AM I had to leave the bar early to catch a flight. For the first time in my life, I contemplated suicide.

Some might call that getting wiser. I call it pain avoidance.

Knowing Less

The other evidence of my getting older is that I know a lot less now than I did when I was younger. Certainly that can’t be true in the absolute sense since I don’t have Alzheimer’s (that I’m aware of anyways). But I remember as a young programmer I knew everything!

I knew the right way to do all things in all situations with absolute conviction. But these days, I’m not so sure. About anything. All I have is the breadth of my experience and pattern matching at my disposal. Each new situation is simply a pattern matching exercise against my database of experience followed by an experiment to see if what I thought I knew produces good results.

The great thing about this approach is that I’m constantly learning. When you know everything, you have nothing to learn. Many of my experiments fail because many of my experiences are no longer relevant today. The world changes. Quickly. But each experiment is an opportunity to learn.

Staying Young

So yeah, I’m getting older, but I’ve found a loophole. Remember the kids I mentioned slicing in half? I’m not going to do that because I’m worried I’d end up with four of them then and two are already a handful.

These two do a great job of making me feel young because they will laugh at every fart joke I can come up with.

So thanks for all the birthday wishes on Twitter, Facebook, and elsewhere. Here’s to getting older!