In the old days, when arguments like DataSets versus Objects still mattered and ORMs were unknown to the majority, I held a firm position on working with objects rather than raw database data, regardless of the size of the application.

Things have changed now, and objects dominate almost every application. In the .NET world, with tools like LINQ to <everything> and a whole bunch of different ORMs, using raw database data seems redundant and possibly dangerous, since it makes business and validation rules harder to enforce.

 

One thing that has always bugged me, though, is finding ways to minimize the memory wasted on unnecessary copies of objects (especially when serialization takes place across layer boundaries), reduce the chances of memory leaks (leaked events, anyone?), and so on.

Imagine a Product object. Let’s say that this object contains a reference to the User that registered it, a collection of Users that bought it, a collection of Products that are related to it, a collection of Comments (which in turn contain references to the Users that authored each comment), and the list goes on and on. You can see that the nesting can go quite deep. Thankfully, when all references live in the same application domain, this nesting costs no extra memory, because every reference points to one shared instance.

When designing for n-tier, separate application domains must be taken into consideration, whether they are on the same machine or not. This is where serialization comes into play, since you need it to transport the objects between the tiers. This means increased communication traffic (since you need to serialize pretty much the whole object graph). After the other side receives the object, those nice live references are gone; two Product objects that referenced the same registering User now point to separate, yet identical, User objects.
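To make the doppelgänger effect concrete, here is a minimal sketch of a serialization round trip. It assumes System.Text.Json as a stand-in for whatever serializer actually sits between your tiers, and the SerializationDemo class and its RoundTrip method are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Product
{
    public int Id { get; set; }
    public User RegisteredByUser { get; set; }
}

// Hypothetical helper, not part of any library.
public static class SerializationDemo
{
    // Serializes the products and reads them back, as a tier boundary would.
    public static List<Product> RoundTrip(List<Product> products)
    {
        string json = JsonSerializer.Serialize(products);
        return JsonSerializer.Deserialize<List<Product>>(json);
    }

    public static void Run()
    {
        var user = new User { Id = 1, Name = "alice" };
        var products = new List<Product>
        {
            new Product { Id = 10, RegisteredByUser = user },
            new Product { Id = 11, RegisteredByUser = user }
        };

        // Same application domain: both products share one User instance.
        Console.WriteLine(object.ReferenceEquals(
            products[0].RegisteredByUser, products[1].RegisteredByUser)); // True

        List<Product> copies = RoundTrip(products);

        // After the round trip: same data, but two distinct User instances.
        Console.WriteLine(object.ReferenceEquals(
            copies[0].RegisteredByUser, copies[1].RegisteredByUser)); // False
    }
}
```

Note that the User is also written twice into the serialized payload, which is the extra traffic mentioned above.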

 

You can fix most of these problems manually, but it’s boring and error-prone. A practice I personally follow is to create two properties for each expensive child. For example, if we had a property RegisteredByUser of type User on the Product class, I’d create a second property named RegisteredByUserId of type int. The RegisteredByUser property would be read-only and would retrieve its value from a repository by using the value of the RegisteredByUserId property. It would look something like this:

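// The Id is the only state that is stored (and serialized).
public int RegisteredByUserId { get; set; }
// Read-only: the User is resolved through the repository on every access.
public User RegisteredByUser { get { return ARepository.Find(this.RegisteredByUserId); } }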

If null instances are allowed, I’d implement it as follows:

public int? RegisteredByUserId { get; set; }
public User RegisteredByUser
{
    get { return this.RegisteredByUserId.HasValue ? ARepository.Find(this.RegisteredByUserId.Value) : null; }
}

I do this manually (I’ve found that most ORMs are a bit difficult and inelegant to use in n-tier architectures; let’s hope that Entity Framework 2.0 will improve things), but it’s not such a big deal. What this trick offers is the ability to avoid a trip to the repository when I only need the Id of a child object, plus a smaller serialized object: no child User object is serialized. Of course, a similar repository must exist on the other side, otherwise the call to ARepository will simply throw a NullReferenceException. By holding only indirect references, I am also guaranteed that no doppelgängers exist, since every instance is retrieved from the repository.

The repository that handles the above request can then load the required instance on first demand, or have it loaded preemptively. It’s also much easier to implement cache expiration policies, because I can guarantee that a Product with an ID of 5 will be the same instance in every object that holds a reference to it.
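A repository with those properties might look roughly like the sketch below. Everything in it is an assumption made for illustration: the UserRepository class, the loader delegate standing in for the real data source, and the Expire method. It also hands the repository to Product through the constructor rather than through a static ARepository, just to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical repository: loads each User once, then always hands back
// the same instance for the same id.
public class UserRepository
{
    private readonly Dictionary<int, User> cache = new Dictionary<int, User>();
    private readonly Func<int, User> loader;
    private readonly object sync = new object();

    public UserRepository(Func<int, User> loader)
    {
        this.loader = loader;
    }

    public User Find(int id)
    {
        lock (sync)
        {
            User user;
            if (!cache.TryGetValue(id, out user))
            {
                user = loader(id);   // first demand: hit the real data source
                cache[id] = user;
            }
            return user;
        }
    }

    // Expiring one entry affects every holder of that id at once,
    // because holders keep only the id and never the instance.
    public void Expire(int id)
    {
        lock (sync) { cache.Remove(id); }
    }
}

public class Product
{
    private readonly UserRepository users;

    public Product(UserRepository users) { this.users = users; }

    public int? RegisteredByUserId { get; set; }

    public User RegisteredByUser
    {
        get { return RegisteredByUserId.HasValue ? users.Find(RegisteredByUserId.Value) : null; }
    }
}
```

Two products with the same RegisteredByUserId get the very same User instance back, and the loader runs only once per id until that id is expired.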

 

So, the above works pretty nicely, but… what about collections? More specifically, what about dictionaries? How can I achieve this shallow-by-repository reference on a Dictionary<int, Product>? This is where the FlexibleDictionary comes into play.

Note: FlexibleDictionary has been designed to work with the Static Autofac locator (read about and download it) for convenience. You register an implementation of the interface IFlexibleRetrievable<TKey, TValue>, which contains a method Find(TKey key) that returns an instance of type TValue. This registration is used to automatically resolve the repository that an instance of FlexibleDictionary<TKey, TValue> will use to retrieve its values, so that you don’t have to specify it at runtime.


Internally, the dictionary is nothing but a HashSet of keys, and does not store any values. Suppose we have an instance of FlexibleDictionary<int, Product> named fd, configured to use a repository ProductsRepository, which implements IFlexibleRetrievable<int, Product>. When you want to add a product to fd, you add only the Id of that product, not an instance of it! Since every retrieval goes through the repository, and the repository itself is responsible for producing the values requested from it, there would be no point in adding a reference directly to fd.

Now, let’s say that you want to retrieve a product with an ID of 5 from fd (of course, you must have added the ID 5 to fd first; whether the product already existed in the repository at that point is irrelevant). FlexibleDictionary will check its internal HashSet to verify that it contains the ID 5, and then return the product from the repository.
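For readers who want the idea without opening the download, here is a stripped-down sketch of the mechanism just described. It is not the actual FlexibleDictionary source: the FlexibleDictionarySketch name is made up, the repository is passed through the constructor instead of being resolved by the Autofac locator, and the in-memory ProductsRepository with its Save method is only a stand-in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IFlexibleRetrievable<TKey, TValue>
{
    TValue Find(TKey key);
}

// Sketch only: stores keys and resolves every value through the repository.
public class FlexibleDictionarySketch<TKey, TValue> : IEnumerable<KeyValuePair<TKey, TValue>>
{
    private readonly HashSet<TKey> keys = new HashSet<TKey>();
    private readonly IFlexibleRetrievable<TKey, TValue> repository;

    public FlexibleDictionarySketch(IFlexibleRetrievable<TKey, TValue> repository)
    {
        this.repository = repository;
    }

    public int Count { get { return keys.Count; } }
    public bool Add(TKey key) { return keys.Add(key); }
    public bool Remove(TKey key) { return keys.Remove(key); }
    public bool ContainsKey(TKey key) { return keys.Contains(key); }

    public TValue this[TKey key]
    {
        get
        {
            // Membership is decided by the key set, the value by the repository.
            if (!keys.Contains(key)) throw new KeyNotFoundException();
            return repository.Find(key);
        }
    }

    public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
    {
        foreach (TKey key in keys)
            yield return new KeyValuePair<TKey, TValue>(key, repository.Find(key));
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical in-memory repository used only for this example.
public class ProductsRepository : IFlexibleRetrievable<int, Product>
{
    private readonly Dictionary<int, Product> store = new Dictionary<int, Product>();
    public void Save(Product product) { store[product.Id] = product; }
    public Product Find(int key) { return store[key]; }
}

public static class FlexibleDictionaryDemo
{
    public static void Run()
    {
        var repository = new ProductsRepository();
        repository.Save(new Product { Id = 5, Name = "Widget" });

        var fd = new FlexibleDictionarySketch<int, Product>(repository);
        fd.Add(5);                               // only the Id is stored

        Product product = fd[5];                 // resolved by the repository
        int matches = fd.Where(kv => kv.Value.Name == "Widget").Count();
        Console.WriteLine("{0}: {1} match(es)", product.Name, matches);
    }
}
```

Because the sketch implements IEnumerable<KeyValuePair<TKey, TValue>>, LINQ operators such as Where work on it, with each value fetched from the repository as the enumeration reaches it.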

 

This gives us the same nice advantages as the shallow properties. I can hear you asking, though: “Yeah, but what about performance? Those requests and seeks surely hurt, and iterations must be very slow.” I was worried about that myself (since every item is retrieved from the repository individually), so apart from profiling my applications, I ran a small benchmark involving a simple value retrieval and a Where operator. Here are the results (Q9550 quad core with lots of RAM):

Retrieving a single value, 5,000,000 times:

Normal dictionary: 0.135 seconds (0.145 without optimizations)

Flexible dictionary: 0.385 seconds (0.683 without optimizations)

Performing a Where operator in a dictionary containing 5,000,000 items:

Normal dictionary: 0.198 seconds (0.198 without optimizations)

Flexible dictionary: 0.313 seconds (0.355 without optimizations)

 

The FlexibleDictionary is roughly 1.5 to 3 times slower in optimized builds (and up to about 4.7 times slower for single retrievals without optimizations), but judging from the test volume, that’s a non-issue. I would increase the test numbers, but then the Visual Studio test process would run out of memory!
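If you want to take a similar measurement yourself, a rough Stopwatch harness along these lines should do. This is not the benchmark that produced the numbers above: the LookupBenchmark class is made up, it covers only the single-value retrieval case, and it imitates the flexible lookup with a HashSet check followed by a lookup in a plain dictionary standing in for the repository.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical harness, for illustration only.
public static class LookupBenchmark
{
    // Times `iterations` calls of `lookup` and returns the elapsed seconds.
    public static double Time(int iterations, Func<int, int> lookup)
    {
        long sink = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink += lookup(i);
        watch.Stop();
        GC.KeepAlive(sink);   // keep the results observable
        return watch.Elapsed.TotalSeconds;
    }

    public static void Run(int itemCount, int iterations)
    {
        var values = new Dictionary<int, int>();
        var keys = new HashSet<int>();
        for (int i = 0; i < itemCount; i++)
        {
            values[i] = i * 2;
            keys.Add(i);
        }

        // Direct lookup in a normal dictionary.
        double direct = Time(iterations, i => values[i % itemCount]);

        // Key check followed by a "repository" lookup (here: the same dictionary).
        double indirect = Time(iterations, i =>
        {
            int key = i % itemCount;
            if (!keys.Contains(key)) throw new KeyNotFoundException();
            return values[key];
        });

        Console.WriteLine("Normal dictionary: {0:F3} s", direct);
        Console.WriteLine("Key set + lookup:  {0:F3} s", indirect);
    }
}
```

Calling LookupBenchmark.Run(1000, 5000000) mirrors the iteration count used above; expect the absolute numbers to differ from mine, since they depend on the machine, the runtime and the build configuration.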

 

All comments, criticism and suggestions for improvement are welcome!

FlexibleDictionary.rar (68.78 kb)