What’s the deal with mapping foreign keys using the Entity Framework?

Since EF4 it has been possible to map foreign key columns to properties of entity classes. But is it really a good idea to do this? In this post I’ll explain the reasons for keeping foreign keys out of your object model and contrast that with how mapping foreign keys may make your life easier.

Foreign keys are an artifact of relational databases. It would be very unusual to find a foreign key in a purely object-oriented domain model. The relationships in such a model are instead represented by normal programming language constructs—references, collections, object identities, and so on. Including foreign key properties just because the entities will be stored in a database means that storage concerns are leaking into the domain model. This is a violation of the separation of concerns principle. Put another way, the domain model is losing a certain degree of persistence ignorance.
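To make the contrast concrete, here is a minimal C# sketch that assumes a hypothetical Customer/Order model (the names are illustrative). The first Order expresses the relationship purely through an object reference and a collection. The second also carries a CustomerId property, which exists only because the underlying table has a CustomerId column.

```csharp
using System.Collections.Generic;

// Purely object-oriented: the relationship is a reference plus a collection.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<Order> Orders { get; } = new List<Order>();
}

public class Order
{
    public int Id { get; set; }
    public Customer Customer { get; set; }
}

// The same relationship with the foreign key column mapped to a property.
// CustomerId is here only because the (hypothetical) Orders table has a CustomerId column.
public class OrderWithForeignKey
{
    public int Id { get; set; }
    public int? CustomerId { get; set; }
    public Customer Customer { get; set; }
}
```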

So this means mapping foreign keys is bad, right? In an ideal world, yes, mapping foreign keys should be avoided. But in reality mapping foreign keys can often make working with and manipulating relationships much easier. There are two main reasons for this:

  • It is often useful to have a simple, accessible, scalar property that represents a relationship. This is especially true when the entire graph is not loaded, when entities are moved between tiers, or when doing data binding.
  • It is much easier to work with the “foreign key associations” that EF creates when foreign keys are mapped than it is to work with the “independent associations” that EF creates when foreign keys are not mapped.
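As a small illustration of the first point, assume a hypothetical Order class with both a CustomerId property and a Customer navigation property. A customer id posted back from a drop-down list, or carried in a message from another tier, can then be assigned directly, without querying or materializing the Customer entity. In this plain-object sketch nothing keeps Customer in step with CustomerId.

```csharp
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class Order
{
    public int Id { get; set; }
    public int? CustomerId { get; set; }     // mapped foreign key (scalar)
    public Customer Customer { get; set; }   // navigation property; may not be loaded
}

public static class OrderEditing
{
    // selectedCustomerId might come from a drop-down list or another tier.
    // With the FK mapped, the relationship changes through a plain int:
    // the Customer entity never has to be queried or constructed.
    public static void AssignCustomer(Order order, int selectedCustomerId)
    {
        order.CustomerId = selectedCustomerId;
    }
}
```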

There are many scenarios where using foreign keys can make things easier for the above reasons. I won’t cover all the scenarios here, but two specific areas that illustrate the point are N-tier applications and dealing with optimistic concurrency.

N-tier applications with foreign key associations

When SaveChanges is called, EF needs to know what updates to send to the database. To do this it needs to know what has changed since the data was queried. While entities are being tracked by a context, EF does this for relationships by detecting when references between entities change or when entities are added to or removed from collections.
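As a rough illustration of the idea (a toy model, not EF’s actual implementation), a snapshot-based tracker can remember which Customer object each tracked Order referenced and compare that against the current reference at save time. ToyTracker and its members are hypothetical names.

```csharp
using System.Collections.Generic;

public class Customer { public int Id { get; set; } }

public class Order
{
    public int Id { get; set; }
    public Customer Customer { get; set; }
}

// Hypothetical snapshot-based tracker; a toy model, not EF's implementation.
public class ToyTracker
{
    // Remembers which Customer object each Order referenced when tracking began.
    private readonly Dictionary<Order, Customer> _snapshot = new Dictionary<Order, Customer>();

    public void Track(Order order) => _snapshot[order] = order.Customer;

    // Returns the orders whose Customer reference no longer matches the snapshot.
    public List<Order> DetectRelationshipChanges()
    {
        var changed = new List<Order>();
        foreach (var pair in _snapshot)
        {
            if (!ReferenceEquals(pair.Key.Customer, pair.Value))
                changed.Add(pair.Key);
        }
        return changed;
    }
}
```

The comparison depends on the tracker holding the snapshot and on object identity, and neither survives a trip to another tier.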

But what if a graph of entities is sent to another tier where changes cannot be tracked by the EF? Changes are made on this tier and then the graph is sent back. How does the EF then know what has changed?

There are many ways to tackle this problem and it isn’t the point of this post to cover them. However, mapping foreign keys makes all of the solutions to this problem easier because changes to a relationship are determined by tracking a foreign key property’s original value and comparing this value to its current value. This is easy to do on any tier. Furthermore, it is simple to tell EF what the original and current values are so that SaveChanges knows what to do.
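Here is a minimal sketch of the foreign key approach. It assumes a hypothetical OrderDto that the server sends to the client tier. The client keeps the foreign key value it received next to the value it edits, so a relationship change is just a comparison of the two. Back on the server, EF 4.1’s DbContext API exposes the same pair through `context.Entry(order).Property(o => o.CustomerId)`, whose `OriginalValue` and `CurrentValue` can both be set. That call needs a real context, so it is not part of the sketch.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical DTO sent to a client tier. It carries the FK value as it was
// when queried, alongside the value the client may edit.
public class OrderDto
{
    public int Id { get; set; }
    public int? OriginalCustomerId { get; set; }
    public int? CustomerId { get; set; }

    // A relationship change is simply "original != current".
    public bool RelationshipChanged => OriginalCustomerId != CustomerId;
}

public static class ChangeDetection
{
    // Ids of the orders whose Customer relationship changed on the client.
    public static List<int> ChangedOrderIds(IEnumerable<OrderDto> orders) =>
        orders.Where(o => o.RelationshipChanged).Select(o => o.Id).ToList();
}
```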

N-tier applications with independent associations

When foreign key properties are not mapped it becomes harder to track and record changes on another tier because it is no longer as simple as using original and current values. That being said, this is still not rocket science, so let’s assume that the changes have been tracked and transferred back. Now telling EF about these changes becomes much more difficult because you have to deal with independent associations.

Independent associations are so called because the relationships between entities can theoretically be manipulated independently of the entities involved. For example, if the relationship between two entities is deleted, then the entities themselves don’t change and will not be marked as Modified. Only the relationship is marked as Deleted. Contrast this with foreign key associations, where deleting a relationship means setting the foreign key on the dependent entity to null and thereby modifying one of the entities itself.
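The difference can be modeled in a few lines. This is a hypothetical, simplified model: EntryState and the classes below are illustrative stand-ins, not EF types. Severing a foreign key association modifies the Order. Severing an independent association changes only the state of a separate relationship entry.

```csharp
public enum EntryState { Unchanged, Added, Modified, Deleted }

// FK association: the relationship is a property of the dependent entity.
public class OrderWithFk
{
    public int Id { get; set; }
    public int? CustomerId { get; set; }
    public EntryState State { get; set; } = EntryState.Unchanged;
}

// Independent association: the entity has no FK property...
public class Order
{
    public int Id { get; set; }
    public EntryState State { get; set; } = EntryState.Unchanged;
}

// ...and the relationship is a separate entry recording the keys of both ends.
public class RelationshipEntry
{
    public RelationshipEntry(int orderId, int customerId)
    {
        OrderId = orderId;
        CustomerId = customerId;
    }

    public int OrderId { get; }
    public int CustomerId { get; }
    public EntryState State { get; set; } = EntryState.Unchanged;
}

public static class SeverRelationship
{
    // Clearing the FK changes the Order itself, so the Order becomes Modified.
    public static void WithForeignKey(OrderWithFk order)
    {
        order.CustomerId = null;
        order.State = EntryState.Modified;
    }

    // The Order is left alone; only the relationship entry is marked Deleted.
    // Its keys still identify which relationship is being removed.
    public static void WithIndependentAssociation(RelationshipEntry relationship)
    {
        relationship.State = EntryState.Deleted;
    }
}
```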

Independent associations are theoretically interesting but, practically speaking, quite hard to use. EF uses two types of special ObjectStateEntry to handle independent associations. The first type is called a “relationship entry” and represents an actual relationship. A relationship entry records the relationship by storing the keys of the two entities that are related.

The second type of special ObjectStateEntry is called a “stub” or “key” entry. These are used when a relationship is being tracked but one of the related entities is not being tracked. To handle this case EF creates a stub entry containing only the key information for the entity that isn’t being tracked.
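A toy state manager makes the two kinds of entry easier to picture. Everything here is a hypothetical simplification of ObjectStateEntry, not EF’s API. That includes ToyStateManager, EntityEntry, RelationshipEntry, and the IsStub flag. Attaching an Order whose Customer is not tracked produces three entries: an entity entry for the Order, a stub entry holding only the Customer’s key, and a relationship entry storing both keys.

```csharp
using System.Collections.Generic;
using System.Linq;

public enum EntryState { Unchanged, Added, Modified, Deleted }

// Hypothetical stand-ins for the kinds of ObjectStateEntry (not EF's API).
public abstract class Entry
{
    public EntryState State { get; set; }
}

public class EntityEntry : Entry
{
    public string EntitySet { get; set; } = "";
    public int Key { get; set; }
    public bool IsStub { get; set; }   // stub ("key") entry: key only, no entity object
}

public class RelationshipEntry : Entry
{
    // A relationship is recorded as the keys of the two related entities.
    public int OrderKey { get; set; }
    public int CustomerKey { get; set; }
}

public class ToyStateManager
{
    public List<Entry> Entries { get; } = new List<Entry>();

    // Track an Order whose Customer is known only by its key.
    public void AttachOrder(int orderKey, int customerKey)
    {
        Entries.Add(new EntityEntry { EntitySet = "Orders", Key = orderKey });

        bool customerTracked = Entries.OfType<EntityEntry>()
            .Any(e => e.EntitySet == "Customers" && e.Key == customerKey);
        if (!customerTracked)
            Entries.Add(new EntityEntry { EntitySet = "Customers", Key = customerKey, IsStub = true });

        Entries.Add(new RelationshipEntry { OrderKey = orderKey, CustomerKey = customerKey });
    }
}
```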

What this all means is that telling EF about changes coming from another tier involves creating and manipulating relationship and stub entries. A method called ChangeRelationshipState was added in EF4 to help make this easier, and compared to how it was in EF1 it is much easier… but it’s still not easy. In fact, even just understanding the semantics of ChangeRelationshipState for simple graphs of entities is quite hard.
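To see why this is harder than changing a foreign key, consider moving Order 10 from Customer 1 to Customer 2. The model below uses illustrative types. It is not EF’s API and does not describe exactly what ChangeRelationshipState does internally. In this model the independent association needs two relationship entries, one of which must carry the original customer key. The foreign key association needs one modified value.

```csharp
using System.Collections.Generic;

public enum EntryState { Unchanged, Added, Modified, Deleted }

public class RelationshipEntry
{
    public int OrderKey { get; set; }
    public int CustomerKey { get; set; }
    public EntryState State { get; set; }
}

public class OrderWithFk
{
    public int Id { get; set; }
    public int? CustomerId { get; set; }
    public EntryState State { get; set; }
}

public static class Reparent
{
    // Independent association: the old relationship must be identified by its
    // ORIGINAL keys and deleted, and the new one added. (Stub entries for
    // customers that are not tracked would also be needed; omitted here.)
    public static List<RelationshipEntry> WithIndependentAssociation(
        int orderKey, int originalCustomerKey, int newCustomerKey) =>
        new List<RelationshipEntry>
        {
            new RelationshipEntry { OrderKey = orderKey, CustomerKey = originalCustomerKey, State = EntryState.Deleted },
            new RelationshipEntry { OrderKey = orderKey, CustomerKey = newCustomerKey, State = EntryState.Added },
        };

    // FK association: the same change is one modified scalar property.
    public static void WithForeignKey(OrderWithFk order, int newCustomerKey)
    {
        order.CustomerId = newCustomerKey;
        order.State = EntryState.Modified;
    }
}
```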

What if I don’t care what has changed?

All of the talk above is about trying to figure out which relationships have changed and therefore need to be updated. But what if instead the code just pushes whatever relationship currently exists to the database? This may result in an update being sent when it’s not needed because it doesn’t change anything, but this may be acceptable depending on the requirements of the application. (This is actually what many web apps do; whether it is a good idea or not is a different matter.)

This approach is easy and works fine when foreign keys are mapped because the current foreign key value is pushed to the database regardless of whether or not it is different to the value already stored in the database.
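Below is a minimal sketch of the “push whatever is current” approach with a mapped foreign key. It uses a hypothetical in-memory stand-in for an Orders table. The modeled update is keyed only on the primary key, so the original foreign key value is never needed.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory stand-in for an Orders table: order id -> CustomerId column.
public class OrdersTable
{
    public Dictionary<int, int?> CustomerIdByOrder { get; } = new Dictionary<int, int?>();

    // Models: UPDATE Orders SET CustomerId = @current WHERE Id = @id
    // Only the primary key is in the WHERE clause, so the current FK value is
    // written whether or not it differs from what is stored.
    public int UpdateCustomerId(int orderId, int? currentCustomerId)
    {
        if (!CustomerIdByOrder.ContainsKey(orderId))
            return 0; // no such row
        CustomerIdByOrder[orderId] = currentCustomerId;
        return 1; // rows affected
    }
}
```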

With independent associations things go wrong. The reason is that EF treats the foreign key as a concurrency token even though it is not mapped. This means that it will only update the foreign key column if that column still has the value it had when the entity was queried. In other words, the update needs the original value of the foreign key column. This value is stored as one of the keys in the relationship entry. So going this route ends up back in the realm of dealing with changes to relationships and using ChangeRelationshipState.
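Here the same kind of hypothetical in-memory table models the update described above for an independent association. The original foreign key value appears in the WHERE clause. An update built without it, or with a stale value, therefore affects zero rows, and EF reports a zero-row update as an optimistic concurrency failure. Null handling is simplified; real SQL would need IS NULL rather than = NULL.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory stand-in for an Orders table: order id -> CustomerId column.
public class OrdersTable
{
    public Dictionary<int, int?> CustomerIdByOrder { get; } = new Dictionary<int, int?>();

    // Models: UPDATE Orders SET CustomerId = @current
    //         WHERE Id = @id AND CustomerId = @original
    // The unmapped FK acts as a concurrency token, so the caller must supply the
    // original value. (Nulls compare equal here for simplicity.)
    public int UpdateCustomerId(int orderId, int? originalCustomerId, int? currentCustomerId)
    {
        if (!CustomerIdByOrder.TryGetValue(orderId, out var stored) || stored != originalCustomerId)
            return 0; // zero rows affected: treated as a concurrency failure
        CustomerIdByOrder[orderId] = currentCustomerId;
        return 1;
    }
}
```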

Optimistic concurrency

When using optimistic concurrency, updates are sent to the database with the expectation that nothing significant has changed in the database since the data was queried. If the data has changed, then an exception is thrown and the application must somehow resolve the conflict before saving. Part 9 of my Using DbContext in EF 4.1 blog series described some patterns for dealing with optimistic concurrency.

Let’s assume that an exception is thrown because the relationship between two entities has changed in the database between the time the entities were queried and the time SaveChanges is called. How do you deal with that new relationship? How can you see what it is? How can you accept the change or overwrite it? When foreign keys are mapped this is all very easy because the new relationship is simply a different value of the foreign key. However, if the foreign key is not mapped, then you are back to dealing with independent associations and relationship entries. In this situation even figuring out what new relationship exists in the database is quite hard.
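With a mapped foreign key, resolving such a conflict comes down to comparing three values of one property. The sketch below is a hypothetical helper; ForeignKeyConflict and Resolution are illustrative names. In EF 4.1 the database value could be obtained with `DbEntityEntry.GetDatabaseValues()`. In this sketch, “store wins” accepts the database value. “Client wins” keeps the application’s value but uses the database value as the new original, so that a retried update can succeed.

```csharp
public enum Resolution { StoreWins, ClientWins }

// Hypothetical: the three versions of one FK after a concurrency exception.
public class ForeignKeyConflict
{
    public int? OriginalCustomerId { get; set; }  // value when queried
    public int? CurrentCustomerId { get; set; }   // value the app wants to save
    public int? DatabaseCustomerId { get; set; }  // value in the database now

    // "What is the new relationship?" is just a value comparison.
    public bool ChangedInDatabase => DatabaseCustomerId != OriginalCustomerId;

    // Returns the (original, current) pair to retry the save with.
    public (int? Original, int? Current) Resolve(Resolution resolution) =>
        resolution == Resolution.StoreWins
            ? (DatabaseCustomerId, DatabaseCustomerId)   // accept the database's relationship
            : (DatabaseCustomerId, CurrentCustomerId);   // overwrite it on retry
}
```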

Aren’t there other ways to handle foreign keys?

Some of the difficulties with not mapping foreign keys stem from the use of independent associations and the way they work in EF. The APIs for dealing with independent associations could also be easier to use. So this is an area where EF has room to improve.

Alternatively, foreign keys could be mapped but then kept in shadow state. This would remove the need to deal with independent associations while also not polluting the object model with foreign key properties. This is something that we would like to implement in EF, but it is not currently being worked on.
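Conceptually, shadow state means that the state tracker, rather than the entity class, stores the foreign key value. The ShadowStateTracker below is a hypothetical illustration of that idea only and is not an EF API. The Order class stays free of a CustomerId property.

```csharp
using System.Collections.Generic;

// The entity class has no foreign key property.
public class Order
{
    public int Id { get; set; }
}

// Hypothetical tracker that keeps "shadow" values outside the entity instances.
public class ShadowStateTracker
{
    // Keyed by entity instance (reference equality, since Order doesn't override Equals).
    private readonly Dictionary<object, Dictionary<string, object>> _values =
        new Dictionary<object, Dictionary<string, object>>();

    public void SetShadowValue(object entity, string property, object value)
    {
        if (!_values.TryGetValue(entity, out var props))
            _values[entity] = props = new Dictionary<string, object>();
        props[property] = value;
    }

    // Returns null when no shadow value has been recorded.
    public object GetShadowValue(object entity, string property) =>
        _values.TryGetValue(entity, out var props) && props.TryGetValue(property, out var value)
            ? value
            : null;
}
```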

Summary

Mapping foreign keys is not a good idea from a purist’s point of view because it results in storage concerns leaking into the domain model. However, practically speaking, this is often a small price to pay to achieve greater simplicity in the application as a whole. For this reason mapping foreign keys is usually the most pragmatic approach to take.

Thanks for reading.
Arthur

About Arthur Vickers

Developer on the Entity Framework team at Microsoft.
This entry was posted in Change Tracking, Entity Framework, Foreign Keys, Independent Associations.

4 Responses to What’s the deal with mapping foreign keys using the Entity Framework?

  1. Greg Foote says:

    Arthur,

    I have an error that is caused by a foreach enumeration in the System.Data.Objects.DataClasses.RelatedEnd class.

    This error happens when I have this EF Model

    I get this exception:
    Collection was modified; enumeration operation may not execute.

    Probably because there is no .ToList() on wrappedEntity.RelationshipManager.Relationships?

    internal abstract void VerifyType(IEntityWrapper wrappedEntity);

    private static void WalkObjectGraphToIncludeAllRelatedEntities(IEntityWrapper wrappedEntity, bool addRelationshipAsUnchanged, bool doAttach)
    {
        foreach (RelatedEnd end in wrappedEntity.RelationshipManager.Relationships)
        {
            end.Include(addRelationshipAsUnchanged, doAttach);
        }
    }

    I have a repro of this issue and others… if you guys need one

    Thanks
    Greg

  2. Keyse says:

    Please don’t take my comment as me flaming EF, but OMG my head hurts. Reading this article reminds me of why Ted Neward coined the phrase: “Object/relational mapping is the Vietnam of Computer Science”. I am currently knee deep in ORM hell because I thought Code First was the silver bullet that would finally free me from Database First hell. To be fair, it’s my fault for thinking so, but still, I had no idea that I would have to RELEARN how to map basic one-to-one and one-to-many associations using the shiny new Fluent interface.

    I am loving and reading your articles because of the geek inside of me, but the adult part of me is saying: it’s 2012, so why am I still learning how to specify associations? And all the various ways I can map them? And the pros and cons of each way? Shouldn’t the ORM just shield all the low-level stuff from me already so I can concentrate on the real business functionality?

    Yes, I am impressed with all the convention over configuration smarts in EF, and I really, really, really appreciate the smart developers and architects who built EF, but still I am kinda hating on them for making me read pages and pages of blog posts and books just to figure out the difference between “uni-directional and bi-directional” one-to-one relationships :))

    Sorry about my rant, but it’s 1:30 am ET and I had a Vietnam kinda day trying to figure out why my ORM layer is adding an extra Customer record whenever I save a new Order object! I know I will eventually get out of the quagmire and figure out the correct mapping that will fix my simple one-to-many relationship, but hopefully it will not be too late ;)

    Thanks for listening to a frustrated fellow developer!

    • Sorry for your frustration. I agree that learning all the terminology can be difficult–do you think it would be useful to have a glossary on msdn.com/ef?

      The reason an additional Customer object is being added is probably because both the Customer and the Order are in the “Added” state, meaning that EF thinks they are both new entities. This doesn’t happen when you query for a Customer, add an Order, and save, but it does happen when the Customer is created in some other way and then added to the context, as might be happening in the controller of a web app or other n-tier scenario. If you know the Customer hasn’t changed, then you can set its state to Unchanged using something like “context.Entry(customer).State = EntityState.Unchanged;” before saving.

      Thanks,
      Arthur

  3. Mohamoud says:

    Arthur, first, THANK you for your reply and for not only publishing my semi-ranting comment but also asking for my feedback and at the same time offering a solution to my one-to-many issue. I REALLY appreciate it, and it shows how much you care about us day-to-day devs that use your great frameworks. You guys Rock!

    A glossary would be very helpful, but publishing usage patterns for EF/Code First in a Problem/Solution format would be very helpful too. For example, the explanation you just gave me about why an additional customer is being added when I save the order entity makes a lot of sense and tells me that I should’ve followed this pattern:

    Problem: You have a one-to-many relationship between Customer and Orders and you want to save a newly created Order entity in the DbContext

    Solution: (Given you have the correct Mappings)
    a. First Query the Customer Object from the DbContext
    b. Add the newly created Order Entity to DbContext.Orders
    c. SaveChanges

    By the way, I have not tried this solution yet, but if the above EF usage pattern (and others) will work for 80% of web and desktop applications, then my suggestion is to “formally” publish them somewhere on the msdn.com/ef site.

    Sorry for yet another lengthy comment, and thanks again for listening to me and for building great frameworks that help me make money, pay the rent and have FUN developing great apps all at the same time ;)
