Entity Framework 4.1

Code First: Inside DbContext Initialization

A lot of stuff happens when you use a DbContext instance for the first time. Most of the time you don't worry about this stuff, but sometimes it's useful to know what's happening under the hood. And even if it's not useful, it's hopefully interesting for its geek value alone.

Note that even though there is a lot of detail below, I've actually simplified things quite a lot to avoid getting totally bogged down in code-level details. Also, I'm writing this from memory without looking at the code, so forgive me if I forget something. :-)

Creating a DbContext instance

Not very much happens when the context instance is created. Initialization is mostly lazy, so if you never use the instance you pay very little for creating it.

It's worth noting that calling SaveChanges on an uninitialized context will not cause the context to be initialized either. This allows patterns that use auto-saving to be implemented very cheaply when the context has not been used and there is therefore nothing to save.

One thing that does happen at this stage is that the context is examined for DbSet properties, and those that have public setters are initialized to DbSet instances. This stops you getting null reference exceptions when you use the sets but still allows the sets to be defined as simple automatic properties. The delegates used to do this are cached in a mechanism similar to the one described here.
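To make the lazy behaviour concrete, here is a minimal Python sketch of the idea. It is not EF code: `LazyContext`, `add`, and `_ensure_initialized` are hypothetical names invented for illustration, and the "expensive" work is just a flag.

```python
class LazyContext:
    """Toy stand-in for a context whose expensive setup is deferred."""

    def __init__(self):
        # Construction only records state; no model or connection work yet.
        self.initialized = False
        self.pending = []

    def _ensure_initialized(self):
        if not self.initialized:
            # Placeholder for the expensive steps (connection, model, database).
            self.initialized = True

    def add(self, entity):
        # Any operation that needs the model or database triggers initialization.
        self._ensure_initialized()
        self.pending.append(entity)

    def save_changes(self):
        # An unused context has nothing to save, so it is not initialized here.
        if not self.initialized:
            return 0
        saved = len(self.pending)
        self.pending.clear()
        return saved
```

In this sketch, calling `save_changes()` on a fresh instance returns 0 and leaves `initialized` as False, which is the cheap auto-save case described above.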

DbContext initialization

The context is initialized when the context instance is used for the first time. “Use” means any operation on the context that requires database access or use of the underlying Entity Data Model (EDM). The initialization steps are:

The context tries to find a connection or connection string:

If a DbConnection was passed to the constructor, then this is used.
Else, if a full connection string was passed, then this is used.
Else, if the name of a connection string was passed and a matching connection string is found in the config file, then this is used.
Else, the database name is determined from the name passed to the constructor or from the context class name, and the registered IConnectionFactory instance is used to create a connection by convention.
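The fallback order above can be sketched as a small Python function. This is only an illustration of the ordering, not how EF implements it: `resolve_connection`, the `config` dictionary, and the `factory` callable are hypothetical, and the "contains an equals sign" test for a full connection string is an assumption made for the sketch.

```python
def resolve_connection(context_name, config, factory,
                       connection=None, name_or_connection_string=None):
    """Return (source, value) following the fallback order described above."""
    if connection is not None:
        # 1. An existing connection object always wins.
        return ("existing connection", connection)
    arg = name_or_connection_string
    if arg is not None and "=" in arg:
        # 2. Assumption: anything containing '=' is a full connection string.
        return ("connection string", arg)
    if arg is not None and arg in config:
        # 3. A name that matches an entry in the config file.
        return ("config file", config[arg])
    # 4. Convention: database name from the given name, else the context class name.
    database_name = arg if arg is not None else context_name
    return ("convention", factory(database_name))
```

Here `config` plays the role of the connection strings section of the config file and `factory` plays the role of the registered IConnectionFactory.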

The connection string is examined to see if it is an Entity Framework connection string containing details of an EDM to use or if it is a raw database connection string containing no model details.

If it is an EF connection string, then an underlying ObjectContext is created in Model First/Database First mode using the EDM (the CSDL, MSL, and SSDL from the EDMX) in the connection string.
If it is a database connection string, then the context enters Code First mode and attempts to build the Code First model as described in the next section.
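As a rough illustration of that decision, the Python sketch below classifies a connection string by looking for a `metadata` keyword, which is where an EF connection string points at its CSDL, SSDL, and MSL. The function name `choose_mode` is hypothetical, the keyword check is an assumption about how the two kinds of string can be told apart, and the parsing is deliberately naive (it ignores quoting).

```python
def choose_mode(connection_string):
    """Classify a connection string as carrying an EDM or being a plain database string."""
    # Naive parse: split on ';' and take the text before the first '=' as the keyword.
    keys = [part.split("=", 1)[0].strip().lower()
            for part in connection_string.split(";") if "=" in part]
    # Assumption: an EF connection string is recognised by its 'metadata' keyword.
    return "model/database first" if "metadata" in keys else "code first"
```

A string such as `Data Source=.;Initial Catalog=Blogs` has no `metadata` keyword, so the sketch reports Code First mode for it.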

I made a post on the EF Team blog that describes some of the connection handling in more detail.

Building the Code First model

The EDM used by Code First for a particular context type is cached in the app-domain as an instance of DbCompiledModel. This caching ensures that the full Code First pipeline for building a model only happens once when the context is used for the first time. Therefore, when in Code First mode:

DbContext checks to see if an instance of DbCompiledModel has been cached for the context type. If the model is not found in the cache, then:

DbContext creates a DbModelBuilder instance.

By default, the model builder convention set used is Latest. A specific convention set can be used by applying the DbModelBuilderVersionAttribute to your context.

The model builder is configured with each entity type for which a DbSet property was discovered.

The property names are used as the entity set names, which is useful when you're creating something like an OData feed over the model.

The IncludeMetadataConvention convention is applied to the builder. This will include the EdmMetadata entity in the model unless the convention is later removed.
The ModelContainerConvention and ModelNamespaceConvention are applied to the builder. These will use the context name as the EDM container name and the context namespace as the EDM namespace. Again, this is useful for services (like OData) that are based on the underlying EDM.
OnModelCreating is called to allow additional configuration of the model.
Build is called on the model builder, which produces a DbModel.

The model builder builds an internal EDM representation based on the configured types and the types reachable from them, then runs all the Code First conventions, which further modify the model and configuration.

The connection is used in this process since the SSDL part of the model depends on the target database, as represented by the provider manifest token.

Compile is called on the DbModel to create a DbCompiledModel. DbCompiledModel is currently a wrapper around the MetadataWorkspace.

The model hash is also created by the call to Compile.

The DbCompiledModel is cached.

The DbCompiledModel is used to create the underlying ObjectContext instance.
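The caching behaviour in this section can be sketched in a few lines of Python. Everything here is hypothetical and stands in for the real thing: `_model_cache` plays the role of the app-domain cache, `build_model` stands in for the whole builder/conventions/Build/Compile pipeline, and the returned dictionary stands in for a DbCompiledModel.

```python
_model_cache = {}   # context type -> compiled model (stand-in for DbCompiledModel)
build_log = []      # records each time the expensive pipeline actually runs

def build_model(context_type):
    """Placeholder for the expensive pipeline: builder, conventions, Build, Compile."""
    build_log.append(context_type.__name__)
    return {"container": context_type.__name__,
            "sets": sorted(context_type.entity_sets)}

def get_compiled_model(context_type):
    # The pipeline runs only on a cache miss for this context type.
    if context_type not in _model_cache:
        _model_cache[context_type] = build_model(context_type)
    return _model_cache[context_type]

class BlogContext:
    # Stand-in for the discovered DbSet property names.
    entity_sets = {"Blogs", "Posts"}
```

Asking for the model of `BlogContext` twice returns the same object and leaves a single entry in `build_log`, which is the "only happens once" property described above.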

Database initialization

At this point we have an underlying ObjectContext, created either through Code First or using the EDM in the connection string.

DbContext now checks whether or not database initialization has already happened in the app-domain for the type of the derived DbContext in use and for the database connection specified. If initialization has not yet happened, then:

DbContext checks whether or not an IDatabaseInitializer instance has been registered for the context type.

If no initializer (including null) has been explicitly registered then a default initializer will be automatically registered.

In Code First mode, the default initializer is CreateDatabaseIfNotExists.
In Database/Model First mode, the default initializer is null, meaning that no database initialization will happen by default. (Because your database almost always already exists in Database/Model First mode.)

If a non-null initializer has been found, then:

A temporary ObjectContext instance is created that is backed by the same EDM as the real ObjectContext. This temporary context is used by the DbContext instance for all work done by the initializer and is then thrown away. This ensures that work done in the initializer does not leak into the context later used by the application.
The initializer is run. Using the Code First default CreateDatabaseIfNotExists as an example, this does the following:

A check is made to see whether or not the database already exists.
If the database does not exist, then it is created:

This happens through the CreateDatabase functionality of the EF provider. Essentially, the SSDL of the model is the specification used to generate the DDL for the database schema, which is then executed.

If the EdmMetadata entity was included in the model, then the table for this is automatically created at the same time since it is part of the SSDL just like any other entity.

If the EdmMetadata entity was included in the model, then the model hash generated by Code First is written to the database by saving an instance of EdmMetadata.
The Seed method of the initializer is called.
SaveChanges is called to save changes made in the Seed method.

If the database does exist, then a check is made to see if the EdmMetadata entity was included in the model and, if so, whether there is also a table with a model hash in the database.

If EdmMetadata is not mapped or the database doesn't contain the table, then it is assumed that the database matches the model. This is what happens when you map to an existing database, and in this case it is up to you to ensure that the model matches the database. (Note that DropCreateDatabaseIfModelChanges would throw in this situation.)
Otherwise, the model hash in the database is compared to the one generated by Code First. If they don't match, then an exception is thrown. (DropCreateDatabaseIfModelChanges would drop, recreate, and re-seed the database in this situation.)

The temporary ObjectContext is disposed.

Control returns to whatever operation it was that caused initialization to run.
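The decision flow above can be sketched in Python as follows. This is a toy model under stated assumptions, not EF's implementation: the database is a plain dictionary, `create_if_not_exists`, `initialize_once`, and `ModelMismatch` are hypothetical names, and the temporary ObjectContext is not modelled at all.

```python
class ModelMismatch(Exception):
    """Raised when the stored model hash differs from the current one."""

_initialized = set()   # (context type name, connection) pairs already handled

def create_if_not_exists(db, model_hash, include_metadata, seed):
    """Toy version of the default Code First initializer flow described above."""
    if not db["exists"]:
        db["exists"] = True                 # schema created from the model
        if include_metadata:
            db["hash"] = model_hash         # hash saved alongside the schema
        seed(db)                            # seed data, then changes are saved
        return "created"
    if not include_metadata or db.get("hash") is None:
        return "assumed compatible"         # existing database, no hash to check
    if db["hash"] != model_hash:
        raise ModelMismatch("model has changed since the database was created")
    return "compatible"

def initialize_once(context_name, connection, db, initializer):
    """Run the initializer at most once per (context type, connection) pair."""
    key = (context_name, connection)
    if key in _initialized or initializer is None:
        return "skipped"                    # already done, or a null initializer
    _initialized.add(key)
    return initializer(db)
```

A DropCreateDatabaseIfModelChanges-style initializer would differ only in the last two branches: it would raise in the "no hash" case and drop, recreate, and re-seed on a mismatch.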

Those are the basics. Like I mentioned above, I left out some details intentionally, and I probably missed some more by mistake. Hopefully it was somewhat useful/interesting anyway.

Thanks for reading!
Arthur

P.S. There is an alternate theory of how DbContext works that suggests nuget introduces a herd of unicorns into your machine which then run on treadmills to create magic entity juice that in turn magically connects your objects to your database. I cannot comment on this theory without breaking confidentiality agreements I have signed with the unicorn king. Or something.

This page is up-to-date as of April 15th, 2011. Some things change. Some things stay the same. Use your noggin.