Entity Framework 6.0

Database initializer and Migrations Seed methods

Entity Framework contains two different methods both called Seed that do similar things but behave slightly differently. The first was introduced in EF 4.1 and works with database initializers. The second was introduced in EF 4.3 as part of Code First Migrations. This post describes how these two methods are used, when they are called, and how they differ from each other.

The basic idea

Regardless of the specific Seed method being used the general idea of a Seed method is the same. It is a way to get initial data into a database that is being created by Code First or evolved by Migrations. This data is often test data, but may also be reference data such as lists of known countries, states, etc.

In both cases the Seed method is a virtual Template Method that is overridden by application code to write seed data into the database.

Database initializers

Starting with EF 4.1 Code First and DbContext can be used to create a database for you. This behavior is encapsulated in objects called “database initializers” that implement the IDatabaseInitializer interface. Database initializers run the first time that a DbContext is used and can do things like check if a database already exists and create a new database if needed.

DropCreateDatabaseIfModelChanges

Several IDatabaseInitializer implementations are included in the EntityFramework assembly. Let's take DropCreateDatabaseIfModelChanges as an example since it defines a Seed method and is the most interesting initializer with regards to this discussion. This initializer does the following: This initializer and others like it are intended to be used during initial development of an application where no real data exists yet, or for tests that run against a test database that can be dropped and re-created at will. You can find many uses of this initializer in the open source EF tests on CodePlex.

Database initializer Seed

With respect to Seed the important thing to notice is that Seed is only ever called immediately after a new, empty database has just been created. Seed is never called for an existing database that might already have data in it. This has two important consequences:

Enter Migrations

EF 4.3 introduced Code First Migrations. Migrations provide a way for the database to be evolved without needing to drop and recreate the entire database. Use of Migrations commonly involves using PowerShell commands to manage updates to the database explicitly. That is, database creation and updates are usually handled during development from PowerShell and do not happen automatically when the applications runs. (See The Migrations initializer below for how this can be changed.)

Migrations Seed

Migrations introduced its own Seed method on the DbMigrationsConfiguration class. This seed method is different from the database initializer Seed method in two important ways: This second point is really important and is the reason why the AddOrUpdate extension method is included with Migrations. This method can check whether or not an entity already exists in the database and then either insert a new entity if it doesn't already exist or update the existing entity if it does exist.

Seeding when the model hasn't changed

In the section on database initializers I mentioned that the initializer Seed method will not be called if the database already exists and the model has not changed. This often turned out to be quite an inconvenience. Consider adding a new entity to the model and then running the application without remembering to update the Seed method. The database is dropped and recreated with a table for the new entity. However the new table is empty. So now you update the Seed method and run again…but the table is still empty because the model has not changed since the last run.

People would usually work around this by either:

The Migrations situation

So what should happen if you are using Migrations in a similar situation? The analogous case is that a new entity is added, a migration is created for it, and Update-Database is used to apply the migration without remembering to update the Seed method. As before, the new table is empty and you realize this, so now you update the Migrations Seed method to AddOrUpdate data into the new table. You now run Update-Database again; should the Seed method run?

If we were following the database initializers pattern then the Seed method would not run because the model has not changed since Update-Database was called last time. In other words, there is no new migration to apply. You would then need to create some sort of artificial migration just to get the Seed method to run—note that just deleting the database doesn't work in this case since the database is being evolved by Migrations.

However, since the Seed method must be able to handle existing data anyway why not just run the Seed method when Update-Database is executed regardless of whether or not there is a migration to apply? This is indeed what happens and it means that Seed can be updated and run at anytime without a change to the model being needed.

The Migrations initializer

The two worlds of database initializers and Migrations come together with the MigrateDatabaseToLatestVersion initializer. This is an IDatabaseInitializer implementation that uses your DbMigrationsConfiguration to programmatically run Update-Database when the application starts.

Since Update-Database causes the DbMigrationsConfiguration.Seed method to be called it follows that using this initializer causes that Seed method to be called. And given that Seed is always called when Update-Database is executed it means that Seed will be called every time that the application is started, regardless of whether or not any migrations were actually applied. So when using the Migrations initializer you never need to do anything artificial to get Seed to run.

Long Seed methods

As discussed above, the Migrations Seed method must handle a database that already contains data and the AddOrUpdate method is often used for this purpose. However, the AddOrUpdate method makes an intentional trade-off between ease-of-use in a Seed method and efficiency. This is because it queries the database each time it is called to get any already existing entity.

This behavior is fine for short Seed methods or even for Seed methods that are only run manually when Update-Database is executed in PowerShell. However, it can become a problem when a long running Seed method is used with the Migrations initializer since the Seed method is run every time the application starts.

The best way to handle this is usually to not use AddOrUpdate for every entity, but to instead be more intentional about checking the database for existing data using any mechanisms that are appropriate. For example, Seed might check whether or not one representative entity exists and then branch on that result to either update everything or insert everything.

There is also this CodePlex work item to allow the Migrations Seed method to be run only when a migration is actually applied. This can help with long running Seed methods, but keep in mind that, once this is implemented, switching it on will likely lead to situations where something special has to be done to run Seed because Seed changed but the model did not.

Summary

Both database initializers and Migrations have Seed methods which can be used to seed a database with initial data. However, the original initializer Seed methods only run immediately after database creation, whereas the Migrations Seed method runs anytime Update-Database is used.
This page is up-to-date as of May 28th, 2013. Some things change. Some things stay the same. Use your noggin.