Entity Framework 4.2

What's the deal with mapping foreign keys using the Entity Framework?

Since EF4 it has been possible to map foreign key columns to properties of entity classes. But is it really a good idea to do this? In this post I'll explain the reasons for keeping foreign keys out of your object model and contrast that with how mapping foreign keys may make your life easier.

Foreign keys are an artifact of relational databases. It would be very unusual to find a foreign key in a purely object-oriented domain model. The relationships in such a model are instead represented by normal programming language constructs—references, collections, object identities, and so on. Including foreign key properties just because the entities will be stored in a database means that storage concerns are leaking into the domain model. This is a violation of the separation of concerns principle. Put another way, the domain model is losing a certain degree of persistence ignorance.

So this means mapping foreign keys is bad, right? In an ideal world, yes, mapping foreign keys should be avoided. But in reality mapping foreign keys can often make working with and manipulating relationships much easier. There are two main reasons for this:

It is often useful to have a simple, accessible scalar property that represents a relationship. This is especially true when the entire graph is not loaded, when entities are moved between tiers, or when doing data binding.
It is much easier to work with the “foreign key associations” that EF creates when foreign keys are mapped than it is to work with the “independent associations” that EF creates when foreign keys are not mapped.
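
To make the two mapping styles concrete, here is a minimal sketch of what the entity classes might look like. The Customer and Order classes and all of their property names are hypothetical and exist only for illustration; they are not taken from any real model.

```csharp
using System.Collections.Generic;

// Hypothetical principal entity, used only for illustration.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Order> Orders { get; set; }
}

// Foreign key mapped: CustomerId is a plain scalar property that mirrors the
// foreign key column and sits next to the navigation property.
public class Order
{
    public int Id { get; set; }
    public int? CustomerId { get; set; }
    public virtual Customer Customer { get; set; }
}

// Foreign key not mapped: the relationship exists only as an object reference,
// so EF would treat it as an independent association.
public class OrderWithoutForeignKey
{
    public int Id { get; set; }
    public virtual Customer Customer { get; set; }
}
```

With the first Order class the related customer's key can be read or set through CustomerId even when the Customer reference has not been loaded. With the second class the only way to reach that key is through the Customer object itself.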

There are many scenarios where using foreign keys can make things easier for the above reasons. I won't cover all the scenarios here, but two specific areas that illustrate the point are N-tier applications and dealing with optimistic concurrency.

N-tier applications with foreign key associations

On calling SaveChanges EF needs to know what updates to send to the database. To do this it needs to know what has changed since the database was queried. While entities are being tracked by a context, EF does this for relationships by detecting when references between entities change or when entities are added to or removed from collections.

But what if a graph of entities is sent to another tier where changes cannot be tracked by the EF? Changes are made on this tier and then the graph is sent back. How does the EF then know what has changed?

There are many ways to tackle this problem and it isn't the point of this post to cover them. However, mapping foreign keys makes all of the solutions to this problem easier because changes to a relationship are determined by tracking a foreign key property's original value and comparing this value to its current value. This is easy to do on any tier. Furthermore, it is simple to tell EF what the original and current values are so that SaveChanges knows what to do.
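
As a rough sketch of why this is easy on any tier, the object that travels between tiers can simply carry both values. The OrderDto class and its members below are hypothetical names chosen for this example, not part of EF.

```csharp
using System;

// Hypothetical object that travels between tiers; nothing here depends on EF.
public class OrderDto
{
    public int Id { get; set; }

    // Foreign key value as it stands now, possibly edited on the other tier.
    public int? CustomerId { get; set; }

    // Foreign key value captured when the order was originally queried.
    public int? OriginalCustomerId { get; set; }

    // The relationship has changed exactly when the two values differ.
    public bool RelationshipChanged
    {
        get { return CustomerId != OriginalCustomerId; }
    }
}

public static class OrderDtoDemo
{
    public static void Main()
    {
        var dto = new OrderDto { Id = 1, CustomerId = 7, OriginalCustomerId = 7 };
        Console.WriteLine(dto.RelationshipChanged); // False

        dto.CustomerId = 9; // the order is re-pointed at another customer
        Console.WriteLine(dto.RelationshipChanged); // True
    }
}
```

Back on the tier that owns the context, the two values can be handed to EF through the property entry returned by DbContext.Entry(...).Property(...), which exposes OriginalValue and CurrentValue in the DbContext API.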

N-tier applications with independent associations

When foreign key properties are not mapped it becomes harder to track and record changes on another tier because it is no longer as simple as using original and current values. That being said, this is still not rocket science, so let's assume that the changes have been tracked and transferred back. Now telling EF about these changes becomes much more difficult because you have to deal with independent associations.

Independent associations are so called because the relationships between entities can theoretically be manipulated independently of the entities involved. For example, if the relationship between two entities is deleted, then the entities themselves don't change and will not be marked as Modified. Only the relationship is marked as Deleted. Contrast this to foreign key associations where deleting a relationship means setting the foreign key on the dependent entity to null and thereby modifying one of the entities itself.

Independent associations are theoretically interesting but practically speaking quite hard to use. EF uses two types of special ObjectStateEntry to handle independent associations. The first type are called “relationship entries” and represent the actual relationships. A relationship entry stores a relationship as the keys of the two entities that are related.

The second type of special ObjectStateEntry are called “stub” or “key” entries. These are used when a relationship is being tracked but one of the related entities is not being tracked. To handle this case the EF creates a stub entry containing only the key information for the entity that isn't being tracked.

What this all means is that telling EF about changes coming from another tier involves creating and manipulating relationship and stub entries. A method called ChangeRelationshipState was added in EF4 to help make this easier, and compared to how it was in EF1 it is much easier…but it's still not easy. In fact, even just understanding the semantics of ChangeRelationshipState for simple graphs of entities is quite hard.

What if I don't care what has changed?

All of the talk above is about trying to figure out which relationships have changed and therefore need to be updated. But what if instead the code just pushes whatever relationship currently exists to the database? This may result in an update being sent when it's not needed because it doesn't change anything, but this may be acceptable depending on the requirements of the application. (This is actually what many web apps do; whether it is a good idea or not is a different matter.)

This approach is easy and works fine when foreign keys are mapped because the current foreign key value is pushed to the database regardless of whether or not it is different to the value already stored in the database.

With independent associations things go wrong. The reason is that EF treats the foreign key as a concurrency token even though it is not mapped. This means that it will only update the foreign key column if that column has the value that it had when the entity was queried. In other words, the update needs to have the original value of the foreign key column. This value is stored as one of the keys in the relationship entry. So going this route ends up back in the realm of dealing with changes to relationships and using ChangeRelationshipState.
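
The effect of treating the foreign key as a concurrency token can be imitated with a small in-memory sketch. This is not EF code and not the SQL that EF generates. The dictionary stands in for a hypothetical Orders table, and UpdateCustomer is an invented helper that behaves roughly like an UPDATE statement whose WHERE clause includes the original foreign key value.

```csharp
using System;
using System.Collections.Generic;

public static class ForeignKeyTokenSketch
{
    // Stand-in for a hypothetical Orders table: order id -> CustomerId column.
    // Behaves roughly like:
    //   UPDATE Orders SET CustomerId = @new
    //   WHERE Id = @id AND CustomerId = @original
    // and returns the number of rows affected.
    public static int UpdateCustomer(
        IDictionary<int, int?> orders, int orderId,
        int? originalCustomerId, int? newCustomerId)
    {
        int? stored;
        if (!orders.TryGetValue(orderId, out stored) || stored != originalCustomerId)
        {
            return 0; // original value does not match what is stored
        }

        orders[orderId] = newCustomerId;
        return 1;
    }

    public static void Main()
    {
        var orders = new Dictionary<int, int?> { { 1, 7 } };

        // The correct original value (7) is supplied, so the row is updated.
        Console.WriteLine(UpdateCustomer(orders, 1, 7, 9)); // 1

        // The original value is now stale (the column holds 9), so nothing happens.
        Console.WriteLine(UpdateCustomer(orders, 1, 7, 11)); // 0
    }
}
```

The second call is the situation described above: without the right original value the update matches no rows, which is why the original key held in the relationship entry cannot simply be ignored.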

Optimistic concurrency

When using optimistic concurrency, updates are sent to the database with the expectation that nothing significant has changed in the database since the data was queried. If the data has changed then an exception is thrown and the application must somehow resolve the conflict before saving. Part 9 of my Using DbContext in EF 4.1 blog series described some patterns for dealing with optimistic concurrency.

Let's assume that an exception is thrown because the relationship between two entities has changed in the database between the time the entities were queried and the time SaveChanges is called. How do you deal with that new relationship? How can you see what it is? How can you accept the change or overwrite it? When foreign keys are mapped this is all very easy because the new relationship is simply a different value of the foreign key. However, if the foreign key is not mapped, then you are back to dealing with independent associations and relationship entries. In this situation even figuring out what new relationship exists in the database is quite hard.
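
When the foreign key is mapped, resolving such a conflict comes down to juggling three scalar values. The sketch below models this in plain C#. ForeignKeyConflict and its members are hypothetical names invented for the example, and the two resolution methods only mirror the usual "store wins" and "client wins" ideas rather than any specific EF call.

```csharp
using System;

// Hypothetical holder for the three values of a mapped foreign key property
// that matter when a concurrency conflict is being resolved.
public class ForeignKeyConflict
{
    public int? OriginalValue { get; set; } // value when the entity was queried
    public int? CurrentValue { get; set; }  // value the application wants to save
    public int? DatabaseValue { get; set; } // value now stored in the database

    // True when the relationship was re-pointed in the database after the query.
    public bool RelationshipChangedInDatabase
    {
        get { return DatabaseValue != OriginalValue; }
    }

    // "Store wins": accept the relationship that is now in the database.
    public void AcceptDatabaseValue()
    {
        CurrentValue = DatabaseValue;
        OriginalValue = DatabaseValue;
    }

    // "Client wins": keep the application's value, but refresh the original
    // value so the next save is compared against what is really stored.
    public void OverwriteDatabaseValue()
    {
        OriginalValue = DatabaseValue;
    }
}

public static class ForeignKeyConflictDemo
{
    public static void Main()
    {
        var conflict = new ForeignKeyConflict
        {
            OriginalValue = 7, CurrentValue = 9, DatabaseValue = 11
        };

        Console.WriteLine(conflict.RelationshipChangedInDatabase); // True

        conflict.OverwriteDatabaseValue();
        Console.WriteLine(conflict.CurrentValue);  // 9
        Console.WriteLine(conflict.OriginalValue); // 11
    }
}
```

In the DbContext API the database side of this comparison is available from GetDatabaseValues on the entry returned by DbContext.Entry, so the new relationship shows up as just another property value.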

Aren't there other ways to handle foreign keys?

Some of the difficulties with not mapping foreign keys stem from the use of independent associations and the way they work in EF. The APIs for dealing with independent associations could also be easier to use. So this is an area where EF has room to improve.

Alternately, foreign keys could be mapped but then kept in shadow state. This would remove the need to deal with independent associations while also not polluting the object model with foreign key properties. This is something that we would like to implement in EF, but it is not currently being worked on.

Summary

Mapping foreign keys is not a good idea from a purist's point of view because it results in storage concerns leaking into the domain model. However, practically speaking, this is often a small price to pay to achieve greater simplicity in the application as a whole. For this reason mapping foreign keys is usually the most pragmatic approach to take.

Thanks for reading.
Arthur

This page is up-to-date as of December 11th, 2011. Some things change. Some things stay the same. Use your noggin.