Entity Framework contains two different methods both called Seed that do similar things but behave slightly differently. The first was introduced in EF 4.1 and works with database initializers. The second was introduced in EF 4.3 as part of Code First Migrations. This post describes how these two methods are used, when they are called, and how they differ from each other.
The basic idea
Regardless of the specific Seed method being used the general idea of a Seed method is the same. It is a way to get initial data into a database that is being created by Code First or evolved by Migrations. This data is often test data, but may also be reference data such as lists of known countries, states, etc.
In both cases the Seed method is a virtual Template Method that is overridden by application code to write seed data into the database.
Starting with EF 4.1 Code First and DbContext can be used to create a database for you. This behavior is encapsulated in objects called “database initializers” that implement the IDatabaseInitializer interface. Database initializers run the first time that a DbContext is used and can do things like check if a database already exists and create a new database if needed.
Several IDatabaseInitializer implementations are included in the EntityFramework assembly. Let’s take DropCreateDatabaseIfModelChanges as an example since it defines a Seed method and is the most interesting initializer with regards to this discussion. This initializer does the following:
- Checks whether or not the target database already exists
- If it does, then the current Code First model is compared with the model stored in metadata in the database
- The database is dropped if the current model does not match the model in the database
- The database is created if it was dropped or didn’t exist in the first place
- If the database was created, then the initializer Seed method is called
This initializer and others like it are intended to be used during initial development of an application where no real data exists yet, or for tests that run against a test database that can be dropped and re-created at will. You can find many uses of this initializer in the open source EF tests on CodePlex.
Database initializer Seed
With respect to Seed the important thing to notice is that Seed is only ever called immediately after a new, empty database has just been created. Seed is never called for an existing database that might already have data in it. This has two important consequences:
- Database initializer Seed methods do not have to handle existing data. That is, new entities can be inserted without any need to check whether or not the entities already exist in the database.
- The Seed method will not be called when the application is run if the database already exists and the model has not changed since the last run. We’ll come back to this point later.
EF 4.3 introduced Code First Migrations. Migrations provide a way for the database to be evolved without needing to drop and recreate the entire database. Use of Migrations commonly involves using PowerShell commands to manage updates to the database explicitly. That is, database creation and updates are usually handled during development from PowerShell and do not happen automatically when the applications runs. (See The Migrations initializer below for how this can be changed.)
- It runs whenever the Update-Database PowerShell command is executed. Unless the Migrations initializer is being used the Migrations Seed method will not be executed when your application starts.
- It must handle cases where the database already contains data because Migrations is evolving the database rather than dropping and recreating it.
This second point is really important and is the reason why the AddOrUpdate extension method is included with Migrations. This method can check whether or not an entity already exists in the database and then either insert a new entity if it doesn’t already exist or update the existing entity if it does exist.
Seeding when the model hasn’t changed
In the section on database initializers I mentioned that the initializer Seed method will not be called if the database already exists and the model has not changed. This often turned out to be quite an inconvenience. Consider adding a new entity to the model and then running the application without remembering to update the Seed method. The database is dropped and recreated with a table for the new entity. However the new table is empty. So now you update the Seed method and run again…but the table is still empty because the model has not changed since the last run.
People would usually work around this by either:
- Making a temporary artificial change to the model
- Switching to DropCreateDatabaseAlways, with the consequence that the database is often dropped and recreated when it is not needed
- Manually deleting the database
The Migrations situation
So what should happen if you are using Migrations in a similar situation? The analogous case is that a new entity is added, a migration is created for it, and Update-Database is used to apply the migration without remembering to update the Seed method. As before, the new table is empty and you realize this, so now you update the Migrations Seed method to AddOrUpdate data into the new table. You now run Update-Database again; should the Seed method run?
If we were following the database initializers pattern then the Seed method would not run because the model has not changed since Update-Database was called last time. In other words, there is no new migration to apply. You would then need to create some sort of artificial migration just to get the Seed method to run—note that just deleting the database doesn’t work in this case since the database is being evolved by Migrations.
However, since the Seed method must be able to handle existing data anyway why not just run the Seed method when Update-Database is executed regardless of whether or not there is a migration to apply? This is indeed what happens and it means that Seed can be updated and run at anytime without a change to the model being needed.
The Migrations initializer
The two worlds of database initializers and Migrations come together with the MigrateDatabaseToLatestVersion initializer. This is an IDatabaseInitializer implementation that uses your DbMigrationsConfiguration to programmatically run Update-Database when the application starts.
Since Update-Database causes the DbMigrationsConfiguration.Seed method to be called it follows that using this initializer causes that Seed method to be called. And given that Seed is always called when Update-Database is executed it means that Seed will be called every time that the application is started, regardless of whether or not any migrations were actually applied. So when using the Migrations initializer you never need to do anything artificial to get Seed to run.
Long Seed methods
As discussed above, the Migrations Seed method must handle a database that already contains data and the AddOrUpdate method is often used for this purpose. However, the AddOrUpdate method makes an intentional trade-off between ease-of-use in a Seed method and efficiency. This is because it queries the database each time it is called to get any already existing entity.
This behavior is fine for short Seed methods or even for Seed methods that are only run manually when Update-Database is executed in PowerShell. However, it can become a problem when a long running Seed method is used with the Migrations initializer since the Seed method is run every time the application starts.
The best way to handle this is usually to not use AddOrUpdate for every entity, but to instead be more intentional about checking the database for existing data using any mechanisms that are appropriate. For example, Seed might check whether or not one representative entity exists and then branch on that result to either update everything or insert everything.
There is also this CodePlex work item to allow the Migrations Seed method to be run only when a migration is actually applied. This can help with long running Seed methods, but keep in mind that, once this is implemented, switching it on will likely lead to situations where something special has to be done to run Seed because Seed changed but the model did not.
Both database initializers and Migrations have Seed methods which can be used to seed a database with initial data. However, the original initializer Seed methods only run immediately after database creation, whereas the Migrations Seed method runs anytime Update-Database is used.