Deleting orphans with Entity Framework

It is common for a single parent entity to be related to many child entities. This relationship may be required or optional. A  required relationship means that the child cannot exist without a parent, and if the parent is deleted or the relationship between the child and the parent is severed, then the child becomes orphaned. In such situations it is often useful to have the orphaned child automatically deleted.

An example model

Consider the following model representing students, report cards, and honors advisors.

public class Student
{
    public int Id { get; set; }
    public string Name { get; set; }

    public int? HonorsAdvisorId { get; set; }
    public virtual HonorsAdvisor HonorsAdvisor { get; set; }

    public virtual ICollection ReportCards { get; set; }
}

public class ReportCard
{
    public int Id { get; set; }
    public decimal Gpa { get; set; }
    public string Remarks { get; set; }

    public int StudentId { get; set; }
    public virtual Student Student { get; set; }
}

public class HonorsAdvisor
{
    public int Id { get; set; }
    public string Name { get; set; }

    public virtual ICollection Students { get; set; }
}

Each student has many report cards and it doesn’t make sense for a report card to exist without it belonging to a student, so the relationship is required. Note that this doesn’t mean that the student must have report cards—she could have none. What it does mean is that any report card that exists must be associated with a student.

On the other hand, a student who is not enrolled in the honors program can happily exist without having an honors advisor, so this relationship is optional.

Relationships and foreign keys

When using foreign keys in your model a required relationship is usually represented by using a non-nullable foreign key. For example, the StudentId property of ReportCard.

Conversely, optional relationships are usually represented by nullable foreign keys. For example, the HonorsAdvisorId property of Student. If HonorsAdvisorId is set to null it means that the student does not have an honors advisor.

You can also use the [Required] attribute or the fluent API to force relationships to be required or optional regardless of FK nullability. This can be useful when using nullable types such as strings as keys. For example, here’s how the student/advisor relationship could be made required even though it has a nullable FK:

modelBuilder
    .Entity()
    .HasRequired(s => s.HonorsAdvisor)
    .WithMany(r => r.Students);

Cascade delete for required relationships

Let’s say that a student leaves the school and our application handles that by deleting the student from the database:

public static void StudentLeaves(string name)
{
    using (var context = new SchoolContext())
    {
        context.Students.Remove(context.Students.Single(s => s.Name == name));
        context.SaveChanges();
    }
}

What will happen to the student’s report cards? Dumping my test database contents before and after shows this:

Before:

Students:
  Student Pinky Pie with advisor Princess Celestia
  Student Rainbow Dash with advisor Princess Celestia
Report cards:
  Report card for Pinky Pie has GPA 4.00 and remarks 'Best student ever.'
  Report card for Pinky Pie has GPA 4.00 and remarks 'Still doing great.'
  Report card for Rainbow Dash has GPA 2.10 and remarks 'Spends too much time flying.'
  Report card for Rainbow Dash has GPA 2.20 and remarks 'Needs to sit still.'

After:

Students:
  Student Rainbow Dash with advisor Princess Celestia
Report cards:
  Report card for Rainbow Dash has GPA 2.10 and remarks 'Spends too much time flying.'
  Report card for Rainbow Dash has GPA 2.20 and remarks 'Needs to sit still.'

The answer is that they will be deleted automatically because Code First has setup a cascade delete between Student and ReportCard. A cascade delete means that if the parent is deleted then all the children will also be deleted. Code First did this because the relationship between students and report cards is required.

Code First not only placed the cascade delete in the model but also configured it in the database. This is important—it is expected that if an EF cascade delete exists then it must also exist in the database. If the two are not in sync then you risk getting constraint exceptions from the database. It is because there is a cascade delete in the database that the report cards were deleted without even loading them into the context.

No cascade delete for optional relationships

Let’s say that an honors advisor leaves the school:

public static void AdvisorLeaves(string name)
{
    using (var context = new SchoolContext())
    {
        context
            .HonorsAdvisors
            .Remove(context.HonorsAdvisors
                        .Include(a => a.Students)
                        .Single(a => a.Name == name));

        context.SaveChanges();
    }
}

What will happen to the advisor’s students? Dumping students before and after shows this:

Before:

Students:
  Student Pinky Pie with advisor Princess Celestia
  Student Rainbow Dash with advisor Princess Celestia

After:

Students:
  Student Pinky Pie with advisor
  Student Rainbow Dash with advisor

The answer is that each HonorsAdvisorId FK property and HonorsAdvisor navigation property is set to null but the students are not deleted. This is because Code First did not setup a cascade delete for the optional relationship.

Note that in this case I needed to load the students into memory so that EF could set the FKs to null before saving.

If you do want to force a cascade delete on an optional relationship you can do so using the fluent API:

modelBuilder
    .Entity()
    .HasRequired(s => s.HonorsAdvisor)
    .WithMany(r => r.Students)
    .WillCascadeOnDelete();

Severing relationships

Imagine a student complains about her report card and the teacher agrees to write a new one. (Maybe a parent agreed to donate a large sum to the school.) There might be some code like:

public static void DoctorReport(string name)
{
    using (var context = new SchoolContext())
    {
        var student = context.Students.Single(s => s.Name == name);

        student.ReportCards.Remove(
            student.ReportCards.OrderBy(r => r.Id).Last(r => r.Student.Name == name));

        student.ReportCards.Add(new ReportCard { Gpa = 3.5m, Remarks = "Doing better at staying still." });

        context.SaveChanges();
    }
}

What will happen when SaveChanges is called? The answer is that you get an exception reading:

System.InvalidOperationException: The operation failed: The relationship could not be changed because one or more of the foreign-key properties is non-nullable. When a change is made to a relationship, the related foreign-key property is set to a null value. If the foreign-key does not support null values, a new relationship must be defined, the foreign-key property must be assigned another non-null value, or the unrelated object must be deleted.

This is because EF cascade delete only kicks in when a parent is deleted. It doesn’t do anything when the parent still exists but the relationship has been severed. This is something that is on our backlog to fix.

You can solve this problem by directly deleting the orphaned child or by overriding SaveChanges to find and delete orphans:

public override int SaveChanges()
{
    ReportCards
        .Local
        .Where(r => r.Student == null)
        .ToList()
        .ForEach(r => ReportCards.Remove(r));

    return base.SaveChanges();
}

This code does the following:

  • Uses DbSet.Local to get access to non-deleted report card entities currently being tracked by the context without running any database query.
  • Filters this list for any that do not reference a student.
  • Makes a copy of this filtered list to avoid modifying a collection while enumerating it.
  • Marks each orphaned report card as deleted.

Summary

By default, Code First makes an optional relationship when the FK is nullable and a required relationship when the FK is non-nullable. Required relationships are configured to cascade delete so that if the parent is deleted then all the children will also be deleted.

The optional/required nature of a relationship can be changed with the fluent API or data annotations and cascade delete can be configured with the fluent API.

A cascade delete will not delete orphans that have been severed from their parent, but this can be done by overriding SaveChanges.

EF Trivia

The exception message above is known on the team as the “conceptual null message.” This is because normally when a relationship is severed the FK for that relationship is set to null. However, if the property is non-nullable then EF instead conceptually sets it to null without actually doing so. Such “conceptual nulls” cannot be saved to the database, hence the exception.

The text itself is a message that describes in detail what the problem is…in a way that most people don’t understand. It is therefore not very helpful. We often use it as an example of a pitfall to avoid when writing exception messages. That is, the message needs to describe the problem and suggest a way to fix the problem from the user’s perspective. Writing something in terms of the implementation details often does not help much.

We should really write a new message that is more helpful, but better still would be to fix the delete orphans problems so that the exception is no longer needed. Now if I could only stop blogging long enough to do that…

About Arthur Vickers

Developer on the Entity Framework team at Microsoft.
This entry was posted in Code First, Data Annotations, DbContext API, Entity Framework, Foreign Keys and tagged , , , , , , , . Bookmark the permalink.

33 Responses to Deleting orphans with Entity Framework

  1. Alan says:

    Sure, I’m being dim here but in your code first example (without modelBuilder) how does the Student & Report relationship exist? Or do you have to use modelBuilder to create it? Very new to CF EF so not obvious to me. Thanks.

    • Code First discovers the relationship by convention. It sees the ICollection property and the Student property and matches them up as the two navigation properties for the relationship and then matches the StudentId property by name to be the foreign key.

  2. Greg Foote says:

    Help ! I cannot track down WHICH reference is being nulled out.. What is the best way to figure out what reference EF is thinking should NOT be nulled out in this situation ???

    The operation failed: The relationship could not be changed because one or more of the foreign-key properties is non-nullable. When a change is made to a relationship, the related foreign-key property is set to a null value. If the foreign-key does not support null values, a new relationship must be defined, the foreign-key property must be assigned another non-null value, or the unrelated object must be deleted.

    Why is there no reference to the entity and naviagtion which raised the exception ?

    It makes it very difficult to debug, especially where there are multiple entities that need to be persisted in the saveChanges() call

    This is an MVC applicaiton so the entity is being loaded into the context and then I issue this code to change it:

    Thanks
    Greg

  3. Greg Foote says:

    I am trying to send back over the wire as little info as possible and use only fk’s to update the entity…

  4. Greg Foote says:

    Obviously, I am not populating a navigation property of some entity in the graph or perhaps a collection… But I don’t know which one it is… As long as the FKs are still set it should be ok to save correct ?

    What is the best way to tell exactly which ‘foreign-key property’ it is complaining about with this error ?

    • @Greg. I agree that it would be very useful to make the entities/relationships available in the exception object. I will file a bug for this. If you are _only_ setting FKs then you should not be getting this exception. This exception happens when a navigation property is set to null. If you just set the FK (not the nav prop) or if you set the FK after setting the nav prop to null then the new value of the FK should be used and you should not see this exception.

  5. glmnet says:

    Will this also work properly?

    public override int SaveChanges()
    {
    Set()
    .Local
    .Where(r => !Parent.Local.SelectMany(s => s.Children).Contains(r))
    .ToList()
    .ForEach(r => Set().Remove(r));

    return base.SaveChanges();
    }

  6. Chrisn says:

    This is a great post explaining the problem really well. I presume this problem hasn’t been addressed in EF5?

  7. smaganazook says:

    “The text itself is a message that describes in detail what the problem is…in a way that most people don’t understand. ”

    This statement is true of almost every entity framework (code first) exception I have come across. Most of the time, the exception message is absolutely useless in pointing to the cause of the exception. This is incredibly frustrating to users of your framework/API, and has on more than one occasion made my team consider moving to nHibernate.

    • It would be great if you could provide examples. We work really hard on new messages. A lot of the old ones from the core stack are terrible but specific examples would be useful. There is a particular problem with “MappingExceptions” in that they come from part of the stack that only understands the XML model and has no context about Code First. Those ones are hard to fix without significant changes to the stack.

  8. dabblernl says:

    Thanks for this tip. Implementing it can be very slow though due to the fact that on every Remove call DetectChanges is called under the covers. Detecting changes is paramount before using the Local call (which EF automatically does), but it can be safely turned off when calling the For Each {remove} call. My code became ten times faster after doing this when the context held just 360 Local entities that had te be checked for a loss of their parents.

    public override int SaveChanges()
    {
    var cardsToRemove=ReportCards
    .Local
    .Where(r => r.Student == null)
    .ToList();

    try
    {
    Configuration.AutoDetecChangesEnabled=false;
    for each card in cardsToRemove
    ReportCards.Remove(card);
    }
    finally
    {
    Configuration.AutoDetectChangesEnabled=true;
    }
    return base.SaveChanges();
    }

  9. Adrian says:

    Thanks for this great post!.
    I have a similar situation, but I don’t find a way to address a proper solution.
    Consider the following model representing offers formed by one pack, where each pack can contain many services.

    public class Service
    {
    public int ServiceId { get; set; }
    public string Name { get; set; }

    public virtual ICollection Packs { get; set; }
    }
    public class Pack
    {
    public int PackId { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }

    public virtual ICollection Offers { get; set; }
    public virtual ICollection Services { get; set; }
    }
    public class Offer
    {
    public int OffeId { get; set; }
    public int PackId { get; set; }
    public string Name { get; set; }

    public virtual Pack Pack { get; set; }
    }

    a required relationship is used between Offers and Packs.

    I f I want to delete a pack included in one offer I get an exception due to “restrict” constrain of the FK, so the operation fail. EF mark the EntityEntry and RelationshipEntry as deleted. If after this operation I tried to add a pack the operation fail with the exception you point out in this post.

    So, how can I solve this situation?.

  10. @Adrian I don’t completely follow your question. Can you post some code showing what exactly you are doing with the entities?

    • Adrian says:

      @Arthur I’m using EF with a repository pattern and a UnitofWork in a WPF application. The lifetime of the EF context is the lifetime of the wpf window in most of cases.

      Delete and UpdateOrInsert methods in my generic o base repository are.

      public void AddOrUpdate(T entity)
      {
      var objContext = ((IObjectContextAdapter)UnitOfWork.DbContext).ObjectContext;
      object originalItem = null;
      ObjectStateManager objectStateManager = objContext.ObjectStateManager;
      ObjectStateEntry stateEntry = null;
      bool isPresent = objectStateManager.TryGetObjectStateEntry(entity, out stateEntry);
      EntityKey key = isPresent ? stateEntry.EntityKey : objContext.CreateEntityKey(entity.GetType().Name, entity);

      if (objContext.TryGetObjectByKey(key, out originalItem))
      {
      UnitOfWork.DbContext.Entry(originalItem).State = EntityState.Detached;
      UnitOfWork.DbContext.Entry(entity).State = EntityState.Modified;
      }
      else
      {
      UnitOfWork.DbContext.Set().Add(entity);
      }
      }

      public void Delete(T entity)
      {
      UnitOfWork.DbContext.Set().Remove(entity);
      }

      public void Save()
      {
      UnitOfWork.DbContext.SaveChanges();
      }

      My bussines layer(BLL) uses the generic repository’s methods in the following way

      public void AddOrUpdatePack(Pack pack)
      {
      _packsrepository.AddOrUpdate(pack);
      _packsrepository.Save();
      }

      public bool DeletePack(Pack pack)
      {
      var isError = false;
      try
      {
      _packsrepository.Delete(pack);
      _packsrepository.Save();
      isError = false;
      }
      catch (DbUpdateException ex)
      {
      isError = true;
      }
      catch (DataRelatedException ex)
      {
      isError = true;
      }
      catch (Exception ex)
      {
      isError = true;
      }
      return isError;
      }

      If I’m working with the entity Pack and attempted to delete one, with an associated offer, the delete operation fail due to a (Pack – Offer) relationship constraint , that is set to restrict”. Although the operation has failed, the EntityEntry (Pack) and the RelationshipEntry (Pack – Service) remains marked as deleted.

      If after this operation, I attempt to add a pack, the call to SaveChanges method fails because the context contains objects marked as deleted, that in fact I can’t delete.

      • @Adrian The relationship between Pack and Offer is required, which means that you cannot delete a Pack if an Offer is related to that Pack. There are various options for handling this. First, consider whether or not the relationship should be required. If it is acceptable to have an Offer without some associated Pack, then make the PackId FK nullable. This will allow the Pack to be deleted and the FK in the Offer set to null and then later another Pack can be associated with the Offer. If it is not acceptable to ever have an Offer without a Pack then the relationship can be left as-is, but you’ll need to handle the re-parenting of an Offer with a new Pack. By this I mean that the new Pack must be associated with the Offer _before_ SaveChanges is called. This should cause the new pack to be inserted before the old Pack is deleted and hence no constraint violations should happen.

  11. Adrian says:

    @Arthur Thanks for the comments and your time. The relationship Offer-Pack is always required. About the reparenting, in this case is no the way. let me explain in more detail my scenario.

    I have a window where the user can manage some IPTV packs, compound by some service, products and other properties. The manage of the Offers takes place in other window. So when the user is managing a pack, is not necessary to assign a pack to an offer. In fact , is interesting for me, that the exeception takes place in order to prevent the end user to delete a pack assigned to an offer.

    The annoying situation is the behavior of EF. I do not understand that if the delete operation fails and triggers the exception, EF do not rollback the state of the entity and its relationships to the unchanged state, and remains marked to delete during the lifetime of the context.

    • @EF leaves the state as it is so that the application has a chance to fix the problem and call SaveChanges again, or possibly just call SaveChanges again a bit later if the failure is external to the app and transient. If you don’t want the entities to be deleted (because there is no way for this to be saved in the database) then change the states of the entities. But then why make something deleted in the first place if it is impossible to save that state in the database? One reason you might do this is if the state that can’t be saved is an intermediate state, such as part way through a reparenting. But then the state needs to be changed again to something that can be saved before calling SaveChanges. To put it another way, if something is marked as deleted, then it will stay in this state until it is either successfully deleted or its state is changed to something else. EF is never going to make the assumption that if you mark something as deleted and then saving fails that you now don’t want the deleted thing to be deleted. Doing so would open up massive possibilities for data corruption.

  12. brockallen says:

    Arthur — due to some constraints I can’t use the approach you suggested. Instead I’m considering using the various events to attempt to achieve the same thing. Mind taking a quick peek and comment on how bad (or good) this idea is? And if it seems ok, I’m sure I’m missing something so any feedback on the approach would be appreciated. Thanks. Here’s the code:

    http://codepaste.net/awjm5d

  13. rednbi says:

    Thank you for your post. To me, orphans deleting is not even a problem if I have a custom repository for the data type and have access to a context, etc… as described in your post. This missing feature is really a problem in a DDD perspective, when we want to keep the deleting logic in the domain model. I’m using EF6 in my current project, everything was working perfect but then I suddently noticed the orphans deleting “issue” (between quotes because your idea was to allow the app to reattach the entity).

    But I would like to show you something, you’ve made a great job with EF and I was able to do that :
    1/ Entirely create my domain without worrying about persistence
    2/ When it came to using EF, I just had to add some “virtual” keywords and ICollection
    3/ I configured my model mapping by overriding onModelCreating
    4/ I’m using the Unit Of Work/Repository Pattern and the UoW only exposes the Repositories of my root aggregates, the other aggregates access through navigation properties without the need of having a repo. The root repositories and not even specialized, I just use a generic repository.
    4/ However it is amazing efficient to add, modify (and of course retrieve) entities from Collections without having to dig into any Repository/Context code. And up to this point my model is clean and the persistence mechanism is almost transparent. I just have to retrieve my root entity from Db, and call SaveChanges() at the end. No context call is required, I can work with my in-class business methods and add/update items to my collection without even thinking about the persistence mecanism. The key point is just to configure the mapping properly. It’s like a dream you made become reality.
    5/ But….. but! What about when i said : “Well, I’ll go ahead and test the deleting methods just in case. But that shouldn’t be issue”. It actually was.

    So therefore I had to make a choice : specializing my Generic repository to tweak the delete logic. But that would result in having to specifically call the specialized delete method of the repository. That means that I’d either had to : make my domain being aware of persistence by using “IRepository” interfaces or let the external world handle part of my deleting logic.

    The other choice is what I actually did, use Identity Relationship. The tradeof is that it messed my domain up by having to put foreign key in my models that didn’t make any sense in my domain. Well, one can actually see some kind of sense in them but there are clearly code smells.

    If I have to be honest, I was very frustrated when I discovered this missing feature. I don’t understand why it’s not part of your top priorities. It’s a top feature for DDD. Two “simple” features are really missing to make EF a killing ORM for DD : Orphans Deleting and an easy Fluent mapping of private properties. That would make EF the the best ORM for DDD with a persistence mecanism which is completely transparent for the domain, and with a number of repositories reduced to a minimum.

    It’s so bad that at the last minute I had to introduce smells in my model =(

  14. Jason says:

    Arthur,

    The information you have presented here seems to be a missing piece in a larger puzzle of mine. However, I cannot get the code provided to work. In the SaveChanges method, the list that contains the orphaned entities is empty as the navigation property is not set to null. I have observed that, if I change the foreign key property to be nullable, the same code will work as the navigation property will now be null.

    Is this the expected behavior? If not, do you have any suggestions or ideas where I may be making a mistake?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s