Database initializer and Migrations Seed methods

Entity Framework contains two different methods both called Seed that do similar things but behave slightly differently. The first was introduced in EF 4.1 and works with database initializers. The second was introduced in EF 4.3 as part of Code First Migrations. This post describes how these two methods are used, when they are called, and how they differ from each other.

The basic idea

Regardless of which Seed method is used, the general idea is the same: it is a way to get initial data into a database that is being created by Code First or evolved by Migrations. This data is often test data, but it may also be reference data such as lists of known countries, states, and so on.

In both cases the Seed method is a virtual Template Method that is overridden by application code to write seed data into the database.

Database initializers

Starting with EF 4.1, Code First and DbContext can be used to create a database for you. This behavior is encapsulated in objects called “database initializers” that implement the IDatabaseInitializer interface. Database initializers run the first time that a DbContext is used and can do things like check whether a database already exists and create a new database if needed.

DropCreateDatabaseIfModelChanges

Several IDatabaseInitializer implementations are included in the EntityFramework assembly. Let’s take DropCreateDatabaseIfModelChanges as an example, since it defines a Seed method and is the most interesting initializer with regard to this discussion. This initializer does the following:

  • Checks whether or not the target database already exists
  • If it does, then the current Code First model is compared with the model stored in metadata in the database
  • The database is dropped if the current model does not match the model in the database
  • The database is created if it was dropped or didn’t exist in the first place
  • If the database was created, then the initializer Seed method is called
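To make the order of those steps concrete, here is a minimal Python model of them. It is not EF code: FakeDatabase, FakeServer, and drop_create_if_model_changes are hypothetical names, and model_id is an assumed stand-in for the model metadata EF stores in the database. What it illustrates is that seed is reached only on the path that has just created an empty database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class FakeDatabase:
    """Hypothetical in-memory database; model_id stands in for stored model metadata."""
    model_id: str
    tables: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class FakeServer:
    """Hypothetical server that holds at most one target database."""
    database: Optional[FakeDatabase] = None


def drop_create_if_model_changes(
    server: FakeServer,
    current_model_id: str,
    seed: Callable[[FakeDatabase], None],
) -> bool:
    """Model of the initializer steps listed above. Returns True if seed was called."""
    existing = server.database
    # Steps 1-2: the database exists and its stored model matches the current one.
    if existing is not None and existing.model_id == current_model_id:
        return False  # nothing is dropped or created, so seed is NOT called
    # Step 3: a mismatched database is dropped, taking all of its data with it.
    if existing is not None:
        server.database = None
    # Step 4: create a new, empty database stamped with the current model.
    server.database = FakeDatabase(model_id=current_model_id)
    # Step 5: seed only ever sees this freshly created, empty database.
    seed(server.database)
    return True
```

Calling the function twice with the same model_id runs seed only once, and any data added between runs is lost when the model changes. That is exactly why an initializer Seed method never needs to check for existing rows.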

This initializer and others like it are intended to be used during initial development of an application where no real data exists yet, or for tests that run against a test database that can be dropped and re-created at will. You can find many uses of this initializer in the open source EF tests on CodePlex.

Database initializer Seed

With respect to Seed, the important thing to notice is that Seed is only ever called immediately after a new, empty database has been created. Seed is never called for an existing database that might already have data in it. This has two important consequences:

  • Database initializer Seed methods do not have to handle existing data. That is, new entities can be inserted without any need to check whether or not the entities already exist in the database.
  • The Seed method will not be called when the application is run if the database already exists and the model has not changed since the last run. We’ll come back to this point later.

Enter Migrations

EF 4.3 introduced Code First Migrations. Migrations provide a way for the database to be evolved without needing to drop and recreate the entire database. Using Migrations commonly involves running PowerShell commands to manage updates to the database explicitly. That is, database creation and updates are usually handled during development from PowerShell and do not happen automatically when the application runs. (See “The Migrations initializer” below for how this can be changed.)

Migrations Seed

Migrations introduced its own Seed method on the DbMigrationsConfiguration class. This Seed method differs from the database initializer Seed method in two important ways:

  • It runs whenever the Update-Database PowerShell command is executed. Unless the Migrations initializer is being used, the Migrations Seed method will not be executed when your application starts.
  • It must handle cases where the database already contains data because Migrations is evolving the database rather than dropping and recreating it.

This second point is really important and is the reason why the AddOrUpdate extension method is included with Migrations. This method can check whether or not an entity already exists in the database and then either insert a new entity if it doesn’t already exist or update the existing entity if it does exist.
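The insert-or-update behavior can be modeled in a few lines of Python. This is a simplified, hypothetical sketch: add_or_update and the list-of-dicts “table” are illustrations, not EF types. In EF itself the comparable call is the AddOrUpdate extension method, which has an overload that matches on the primary key and another that takes an identifier expression saying which property identifies an existing entity. The model below matches on a caller-supplied key function in the same spirit.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def add_or_update(table: List[Row], key: Callable[[Row], Any], *entities: Row) -> None:
    """Simplified model: insert each entity, or update the row with the same key."""
    for entity in entities:
        # One lookup per entity in this model.
        existing = next((row for row in table if key(row) == key(entity)), None)
        if existing is None:
            table.append(dict(entity))  # not found: insert a copy
        else:
            existing.update(entity)     # found: overwrite its values with the seed values
```

Because repeating the same call leaves the table unchanged, a Seed method written this way can run safely against a database that already holds the seed data.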

Seeding when the model hasn’t changed

In the section on database initializers I mentioned that the initializer Seed method will not be called if the database already exists and the model has not changed. This often turned out to be quite an inconvenience. Consider adding a new entity to the model and then running the application without remembering to update the Seed method. The database is dropped and recreated with a table for the new entity. However, the new table is empty. So now you update the Seed method and run again…but the table is still empty because the model has not changed since the last run.

People would usually work around this by either:

  • Making a temporary artificial change to the model
  • Switching to DropCreateDatabaseAlways, with the consequence that the database is often dropped and recreated when it is not needed
  • Manually deleting the database

The Migrations situation

So what should happen if you are using Migrations in a similar situation? The analogous case is that a new entity is added, a migration is created for it, and Update-Database is used to apply the migration without remembering to update the Seed method. As before, the new table is empty and you realize this, so now you update the Migrations Seed method to AddOrUpdate data into the new table. You now run Update-Database again; should the Seed method run?

If we were following the database initializers pattern, then the Seed method would not run because the model has not changed since Update-Database was last called. In other words, there is no new migration to apply. You would then need to create some sort of artificial migration just to get the Seed method to run. Note that just deleting the database doesn’t work in this case, since the database is being evolved by Migrations.

However, since the Seed method must be able to handle existing data anyway, why not just run it whenever Update-Database is executed, regardless of whether there is a migration to apply? This is indeed what happens, and it means that Seed can be updated and run at any time without a change to the model being needed.
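A minimal Python model of this rule, under assumed names: MigratedDatabase and update_database are hypothetical and only record migration names rather than performing real schema changes. Pending migrations are applied first, and seed is called unconditionally afterwards.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MigratedDatabase:
    """Hypothetical database that records applied migration names and holds some data."""
    applied: List[str] = field(default_factory=list)
    tables: Dict[str, List[str]] = field(default_factory=dict)


def update_database(
    db: MigratedDatabase,
    migrations: List[str],
    seed: Callable[[MigratedDatabase], None],
) -> List[str]:
    """Model of Update-Database: apply pending migrations in order, then always seed."""
    pending = [name for name in migrations if name not in db.applied]
    for name in pending:
        # Stand-in for executing the migration's schema changes.
        db.applied.append(name)
    # Seed runs even when nothing was pending, so it must cope with existing data.
    seed(db)
    return pending
```

A second call with no new migrations returns an empty list but still runs seed, which is what lets an edited Seed method take effect without any change to the model.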

The Migrations initializer

The two worlds of database initializers and Migrations come together with the MigrateDatabaseToLatestVersion initializer. This is an IDatabaseInitializer implementation that uses your DbMigrationsConfiguration to programmatically run Update-Database when the application starts.

Since Update-Database causes the DbMigrationsConfiguration.Seed method to be called, it follows that using this initializer causes that Seed method to be called. And because Seed is always called when Update-Database is executed, Seed will be called every time the application starts, regardless of whether any migrations were actually applied. So when using the Migrations initializer you never need to do anything artificial to get Seed to run.

Long Seed methods

As discussed above, the Migrations Seed method must handle a database that already contains data, and the AddOrUpdate method is often used for this purpose. However, AddOrUpdate makes an intentional trade-off between ease of use in a Seed method and efficiency: it queries the database each time it is called to find any already existing entity.

This behavior is fine for short Seed methods, or even for Seed methods that are only run manually when Update-Database is executed in PowerShell. However, it can become a problem when a long-running Seed method is used with the Migrations initializer, since the Seed method is run every time the application starts.

The best way to handle this is usually to not use AddOrUpdate for every entity, but to instead be more intentional about checking the database for existing data using any mechanisms that are appropriate. For example, Seed might check whether or not one representative entity exists and then branch on that result to either update everything or insert everything.
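The difference between per-entity checks and a single representative check can be sketched by counting lookups. In this hypothetical Python model, CountingTable.find stands in for one database query, insert and update raise on a duplicate or missing key the way a keyed table would reject such writes, and the sentinel key is an assumed representative entity. seed_per_entity mimics the one-lookup-per-entity pattern of AddOrUpdate; seed_with_sentinel does one lookup and branches.

```python
from typing import Dict, Optional


class CountingTable:
    """Hypothetical keyed table that counts lookups as a stand-in for database round trips."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.queries = 0

    def find(self, key: str) -> Optional[str]:
        self.queries += 1  # each find models one query
        return self.rows.get(key)

    def insert(self, key: str, value: str) -> None:
        # Models a key constraint rejecting duplicates (not counted as a query).
        if key in self.rows:
            raise KeyError(f"duplicate key {key!r}")
        self.rows[key] = value

    def update(self, key: str, value: str) -> None:
        # Models an update that expects the row to exist (not counted as a query).
        if key not in self.rows:
            raise KeyError(f"missing key {key!r}")
        self.rows[key] = value


def seed_per_entity(table: CountingTable, data: Dict[str, str]) -> None:
    """AddOrUpdate-style seeding: one existence lookup for every entity."""
    for key, value in data.items():
        if table.find(key) is None:
            table.insert(key, value)
        else:
            table.update(key, value)


def seed_with_sentinel(table: CountingTable, data: Dict[str, str], sentinel: str) -> None:
    """Check one representative entity, then insert everything or update everything."""
    already_seeded = table.find(sentinel) is not None
    for key, value in data.items():
        if already_seeded:
            table.update(key, value)  # assumes every seed row exists if the sentinel does
        else:
            table.insert(key, value)  # assumes no seed row exists if the sentinel doesn't
```

For example, with 100 seed rows and two application starts, the per-entity version performs 200 lookups and the sentinel version performs 2. The price is the assumption built into the sentinel check: it treats the seed data as all-or-nothing, so rows added to the seed set later would make update raise and would need a check of their own.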

There is also a CodePlex work item to allow the Migrations Seed method to be run only when a migration is actually applied. This can help with long-running Seed methods, but keep in mind that, once this is implemented, switching it on will likely lead to situations where something special has to be done to run Seed because Seed changed but the model did not.

Summary

Both database initializers and Migrations have Seed methods that can be used to seed a database with initial data. However, the original initializer Seed methods only run immediately after database creation, whereas the Migrations Seed method runs any time Update-Database is used.

About Arthur Vickers

Developer on the Entity Framework team at Microsoft.
This entry was posted in Code First, Code First Migrations, Database Initializers, DbContext API, Entity Framework.

6 Responses to Database initializer and Migrations Seed methods

  1. Gary says:

    I have an existing database with table A. I want to start using EF, so I used “add-migration InitialSchema -IgnoreChanges” to give me an initial migration that does nothing but contains the metadata of the current model.
    I have a new model C which I want to create with a couple of default records.
    The first time I use the C model in the code, the table C is created OK. I wanted to add the default records in the Seed method, but it doesn’t get called at this time. Not sure why, as a new model C is added at this point.
    The Seed method gets called only when I restart the application once the C table is created.

    Not sure why Seed doesn’t get called until I restart the application. Is there a way to call the Seed method without restarting ?

    I’m using the MigrateDatabaseToLatestVersion initializer.

    • @Gary If you are using the MigrateDatabaseToLatestVersion initializer, then the initializer is just running update-database for you automatically when the application starts. To get the Seed method in the Migrations config to run at any other time, just do a manual update-database from the Package Manager Console.

      Thanks,
      Arthur

  2. mathewleising says:

    Hey Arthur,

    The PowerShell commands are amazing. I have also recently started playing around with a project Steve Sanderson and some others put together called t4scaffolding. In their project, they build out the scaffolders using a Cmdlet attribute (http://msdn.microsoft.com/en-us/magazine/cc163293.aspx). Looking through the EF source code, it seems like you guys created yours a little differently. I was curious why you decided to choose a different route. I wanted to know because I wanted to see if I could maybe contribute to the project by extending the number of cmdlets that EF has. I like all the open source stuff and want to get involved and help.

    Thanks,
    Matt

    • @mathewleising The approach we took was driven by the need to use EntityFramework.dll remotely without locking it in the PowerShell app domain. This was especially problematic in earlier versions of NuGet.

  3. Bhushan Shah says:

    As far as I understand, the MigrateDatabaseToLatestVersion initializer just ensures that Update-Database is run whenever the application starts. However, I still need to run Add-Migration manually before running the application, which defeats the purpose of having such an initializer in the first place. I might as well run Update-Database manually along with Add-Migration before running the application. Is this right or am I missing something?

    Thanks in advance.

    • If automatic migrations are enabled in the migrations config then Add-Migration is not needed to run Update-Database. This was one intention for this initializer. However, I would not recommend automatic migrations because not every change can be done automatically by Migrations and also because they make management of the migrations difficult especially in a team environment. The other use for this initializer is to allow the database to be automatically migrated in, for example, a test environment at the time that the application/tests are run. So for example, migrations can be added/managed/applied on a dev box and then the initializer could automatically apply them when code is run on a CI server.
