Automating the creation of NuGet packages with different .NET versions


I created a couple of straightforward libraries to be used in almost every project. So evidently these libraries are good candidates for NuGet. This will decouple the libraries from the projects that they are used in. It also enforces the Single Responsibility principle because every NuGet package can be used on its own, with dependencies only on (very few) other NuGet packages.

Creating the packages for one version of .NET is quite straightforward, and well-documented. But the next request was: “Can you upgrade all our projects from .NET 4.5.2 to .NET 4.6.1, and later to .NET 4.7?”

The plan

We have over 200 projects that should preferably all be compiled against the same .NET version. So opening each project, changing its version, and compiling it is not really an option…

  <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
  <Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
  <TargetFrameworkProfile />


Investigating the .csproj files I noticed that there is 1 instance of the <TargetFrameworkVersion> element that contains the .NET version. When I change it, in Visual Studio the .NET version property is effectively changed. So this is easy: using Notepad++ I replace this in all *.csproj files and recompile everything. This works but …
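The Notepad++ replace can also be scripted. The following is a minimal sketch in Python, assuming classic (non-SDK) .csproj files that each contain one <TargetFrameworkVersion> element; the function names retarget_text and retarget_tree are made up for this illustration.

```python
import re
from pathlib import Path

# Matches the <TargetFrameworkVersion> element of a classic .csproj file.
TFV_PATTERN = re.compile(
    r"(<TargetFrameworkVersion>)v[\d.]+(</TargetFrameworkVersion>)"
)


def retarget_text(csproj_text, new_version):
    """Return the project XML with its target framework set to new_version."""
    return TFV_PATTERN.sub(
        lambda m: m.group(1) + new_version + m.group(2), csproj_text
    )


def retarget_tree(root, new_version):
    """Rewrite every *.csproj below root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.csproj"):
        # Work on bytes so that the original line endings are preserved.
        original = path.read_bytes().decode("utf-8-sig")
        updated = retarget_text(original, new_version)
        if updated != original:
            path.write_bytes(updated.encode("utf-8-sig"))
            changed += 1
    return changed
```

Calling retarget_tree(r"C:\_Projects", "v4.6.1") would then update all projects below that folder in one go; running it on a copy or under source control first seems wise.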

What about the NuGet packages?

The packages that I created work for .NET 4.5.2, but now we’re at .NET 4.6.1. So this is at least not optimal, and the assemblies may not link together properly. So I want to update the NuGet packages to contain both versions. That way developers who are still at 4.5.2 with their solutions will get that version automatically, and developers at 4.6.1 will get theirs. Problem solved. But how …

Can this be automated?

Creating the basic NuGet package

This is explained quite well on the NuGet website. These are the basic steps:

Technical prerequisites

Download the latest version of nuget.exe, saving it to a location of your choice. Then add that location to your PATH environment variable if it isn’t already.
Note: nuget.exe is the CLI tool itself, not an installer, so be sure to save the downloaded file from your browser instead of running it.

I copied this file to C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\Tools, which is already in my PATH variable (Developer command prompt for VS2015). So now I have access to the CLI from everywhere, provided that I use the Dev command prompt of course.

So now we can use the NuGet CLI, which is described in the NuGet CLI Reference.

Nice to have

From their website:

NuGet Package Explorer is a ClickOnce & Chocolatey application that makes it easy to create and explore NuGet packages. After installing it, you can double click on a .nupkg file to view the package content, or you can load packages directly from NuGet feeds or your own NuGet server.

This tool will prove invaluable when you are trying some more exotic stuff with NuGet.

It is also possible to change a NuGet package using the package explorer. You can change the package metadata, and also add content (such as binaries, readme files, …).


Prerequisites for a good package

An assembly (or a set of assemblies) is a good candidate to be a package when it has as few dependencies as possible. For example, a logging package would only do logging, and nothing more. That way NuGet packages can be used everywhere, without special conditions. When dependencies are necessary, they are preferably on other NuGet packages.

Creating the package

In Visual Studio, create a project of your choice. Make sure that it compiles well.

Now open the DEV command prompt and enter

nuget spec

in the folder containing your project file. This will generate a template .nuspec file that you can use as a starting point. This is an example .nuspec file:

<?xml version="1.0"?>
<package xmlns="">
  <metadata>
    <!-- The identifier that must be unique within the hosting gallery -->

    <!-- The package version number that is used when resolving dependencies -->

    <!-- Authors contain text that appears directly on the gallery -->

    <!-- Owners are typically identities that allow gallery
         users to easily find other packages by the same owners.  -->

    <!-- License and project URLs provide links for the gallery -->
    <!-- The icon is used in Visual Studio's package manager UI -->
    <!-- If true, this value prompts the user to accept the license when
         installing the package. -->

    <!-- Any details about this particular release -->
    <releaseNotes>Added binaries for .NET 4.6.1</releaseNotes>

    <!-- The description can be used in package manager UI. Note that the gallery uses information you add in the portal. -->
    <description>Logging base class </description>
    <!-- Copyright information -->
    <copyright>Copyright ©2017</copyright>

    <!-- Tags appear in the gallery and can be used for tag searches -->
    <tags>diagnostics logging</tags>

    <!-- Dependencies are automatically installed when the package is installed -->
    <dependencies>
      <!--<dependency id="EntityFramework" version="6.1.3" />-->
    </dependencies>
  </metadata>

  <!-- A readme.txt will be displayed when the package is installed -->
  <files>
    <file src="readme.txt" target="" />
  </files>
</package>

Now run

nuget pack

in your project folder, and a NuGet package will be generated for you.

Verifying the package

If you want to know whether the contents of your package are correct, use NuGet Package Explorer to open your package.


Here you see a package that I created. It contains some metadata on the left side, and the package in 2 versions on the right side. You can use this tool to add more folders and to change the metadata. This is all very nice, but not very automated. For example, how can we create a NuGet package like this one, that contains 2 .NET versions of the libraries?

Folder organization

I wanted to separate the creation of the package from the rest of the build process. So I created a NuGet folder in my project folder.

I moved the .nuspec file into this folder to have a starting point, and then I created a batch file that performs the following steps:

  1. Create the necessary folders
  2. Build the binaries for .NET 4.5.2
  3. Build the binaries for .NET 4.6.1
  4. Pack both sets of binaries in a NuGet package

I also wanted this package to be easily configurable, so I used some variables.

The script

Initializing the variables

set ProjectLocation=C:\_Projects\Diagnostics.Logging
set Project=Diagnostics.Logging

set NugetLocation=%ProjectLocation%\NuGet\lib
set ProjectName=%Project%.csproj
set ProjectDll=%Project%.dll
set ProjectNuspec=%Project%.nuspec
set BuildOutputLocation=%ProjectLocation%\NuGet\temp

set msbuild="C:\Program Files (x86)\MSBuild\14.0\bin\msbuild.exe"
set nuget="C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\Tools\nuget.exe"

The first 2 variables are the real parameters. All the other variables are built from these 2 variables.

The %msbuild% and %nuget% variables allow running the commands easily without changing the path. Thanks to these 2 lines this script will run in any “DOS prompt”, not just in the Visual Studio Command Prompt.

Setting up the folder structure

cd /d %ProjectLocation%\NuGet
md temp
md lib\lib\net452
md lib\lib\net461
copy /Y %ProjectNuspec% lib
copy /Y readme.txt lib

In my batch file I don’t want to rely on the existence of a specific folder structure, so I create it anyway. I know that I can first test if a folder exists before trying to create it, but the end result will be the same.

Notice that I created lib\lib. The first level contains the necessary “housekeeping” files to create the package, the second level will contain the actual content that goes into the package file. The 2 copy statements copy the “housekeeping” files.

Building the project in the right .NET versions

%msbuild% "%ProjectLocation%\%ProjectName%" /t:Clean;Build /nr:false /p:OutputPath="%BuildOutputLocation%";Configuration="Release";Platform="AnyCPU";TargetFrameworkVersion=v4.5.2
copy /Y "%BuildOutputLocation%"\%ProjectDll% "%NugetLocation%"\lib\net452\%ProjectDll%

%msbuild% "%ProjectLocation%\%ProjectName%" /t:Clean;Build /nr:false /p:OutputPath="%BuildOutputLocation%";Configuration="Release";Platform="AnyCPU";TargetFrameworkVersion=v4.6.1
copy /Y "%BuildOutputLocation%"\%ProjectDll% "%NugetLocation%"\lib\net461\%ProjectDll%

The secret is in the /p switch

When we look at a .csproj file we see that there are <PropertyGroup> elements with a lot of elements in them. Here is an extract:

<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="12.0" DefaultTargets="Build" xmlns="">
  <Import Project="$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props" Condition="Exists('$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props')" />
    <!--  …   -->
    <!--  …   -->
    <!--  …   -->

Each element under the <PropertyGroup> element is a property that can be set, typically in Visual Studio (Project settings). So compiling for another .NET version is as simple as changing the <TargetFrameworkVersion> element and executing the build.

But the /p flag makes this even easier:

%msbuild% "%ProjectLocation%\%ProjectName%" ^
          /t:Clean;Build /nr:false ^
          /p:OutputPath="%BuildOutputLocation%";Configuration="Release";Platform="AnyCPU";TargetFrameworkVersion=v4.5.2

In this line MSBuild is executed, and the properties OutputPath, Configuration, Platform and TargetFrameworkVersion are set using the /p switch. Values passed this way override the ones in the project file, which makes building for different .NET versions easy. You can find more information about the MSBuild switches in the MSBuild Reference.

So now we are able to script the compilation of our project in different locations for different .NET versions. Once this is done we just need to package and we’re done!
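To support more .NET versions, the build step can be generated per version instead of copied. This is a minimal sketch in Python of that idea; the helper names nuget_folder, msbuild_arguments and build_plan are made up for this illustration, and the Platform value AnyCPU is an assumption taken from the project file default shown earlier.

```python
def nuget_folder(framework_version):
    """Map an MSBuild version such as 'v4.6.1' to the lib folder name 'net461'."""
    return "net" + framework_version.lstrip("v").replace(".", "")


def msbuild_arguments(project, output_dir, framework_version):
    """Build the MSBuild argument list for one Clean;Build run."""
    properties = {
        "OutputPath": output_dir,
        "Configuration": "Release",
        "Platform": "AnyCPU",
        "TargetFrameworkVersion": framework_version,
    }
    # All properties go into a single /p switch, separated by semicolons.
    switch = "/p:" + ";".join(
        f"{name}={value}" for name, value in properties.items()
    )
    return [project, "/t:Clean;Build", "/nr:false", switch]


def build_plan(project, output_dir, framework_versions):
    """Return (lib folder, MSBuild arguments) pairs, one per .NET version."""
    return [
        (nuget_folder(version), msbuild_arguments(project, output_dir, version))
        for version in framework_versions
    ]
```

Each argument list could then be handed to subprocess.run together with the path to msbuild.exe, followed by a copy of the resulting DLL into the matching lib folder.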

cd /d "%NugetLocation%"
%nuget% pack %ProjectNuspec%


We automated the creation of NuGet packages with an extensible script. In the script as much as possible is parameterized so it can be used for other packages as well.

It is possible to add this to your build scripts, but be careful not to build and deploy your NuGet packages every time when nothing has changed in them. This is why I like to keep the script handy and run it when the packages are modified (and only then).


MSBuild Reference

NuGet CLI Reference

Posted in .Net, Architecture, Codeproject, Development

Implementing the Excel Simulator in F#


In my previous post we talked about how to structure an Excel workbook to make it easy to perform changes. As a side effect we have now a good idea of what is the data, and what is the functionality that the workbook implements. This allows us to replace the workbook by an equivalent F# program, which was our “hidden agenda” all along 😉

What we want to achieve:

  • high-level: use a set of input parameters to generate a data set of output records
  • mid-level: some calculations can already be done on the input parameters without the corresponding records in the database. We want to separate these out.
  • low-level: we want to read data from the database to lookup some values, to filter the rows that we need and to transform them into the output records.
  • We will need to implement some Excel functionality, for example the VLOOKUP function.

I am not an expert in F# (yet), so feel free to drop comments on how this can be done better or differently.

Attack Plan

There are 2 main ways to start this journey: top-down or bottom-up. Actually there is also a third way in which we work top-down and bottom-up at the same time, to meet in the middle. This may be what we need here.

First helper functions (aka bottom-up)

let round100 (x:decimal) = // ROUND(i_ConsJr*r.NormalPricePerkWh/100;2)
    // Excel's ROUND rounds midpoints away from zero; Math.Round defaults to banker's rounding
    let y = System.Math.Round(x, System.MidpointRounding.AwayFromZero)
    y / 100M

This is an easy function to start with: it divides a decimal by 100 and rounds the result to 2 digits after the decimal point. This function is used quite often, and it is very easy to write, so why not use it 😉

let n2z (x:Nullable<decimal>) =
    if x.HasValue then x.Value else 0M

let n2zi (x:Nullable<int>) =
    if x.HasValue then x.Value else 0

let n2b (x:Nullable<bool>) =
    if x.HasValue then x.Value else false

Another set of easy functions, this time to cope with database NULL values. Very simple, but useful!

let rec LookupNotExact ls f v =
    match ls with
    | [x] -> x
    | h::t::s -> 
        if f t > v then h
        else LookupNotExact (t::s) f v
    | [] -> raise (OuterError("LookupNotExact over empty list"))

(* Tests for LookupNotExact
let testls1 = [1;2;3;4;5]
let res1_3 = LookupNotExact testls1 (fun x -> x) 3
let res1_5 = LookupNotExact testls1 (fun x -> x) 7

let testls2 = [1;2;3;6;7]
let res2_3 = LookupNotExact testls2 (fun x -> x) 3
let res2b_3 = LookupNotExact testls2 (fun x -> x) 5
let res2_7 = LookupNotExact testls2 (fun x -> x) 7
let res2b_7 = LookupNotExact testls2 (fun x -> x) 9
*)

The LookupNotExact function mimics the behavior of the Excel VLOOKUP function with an approximate match. In a sorted list it finds the last element whose key is less than or equal to v. For example, looking up 5 in the list 1, 2, 3, 6, 7 returns 3. The nice thing is that this function can easily be tested using F# interactive. Just remove the comment markers around the tests, select the function with its tests and hit Alt+Enter. This will execute the selected code and display the results in the F# interactive window.
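For comparison, the same lookup can be expressed with a binary search instead of a linear walk through the list. This is a minimal sketch in Python, not the code of the simulator; the name lookup_not_exact is made up, and the fallback to the first row is an assumption that copies what the F# function does when the value is smaller than every key.

```python
from bisect import bisect_right


def lookup_not_exact(rows, key, value):
    """Return the last row whose key is <= value; rows must be sorted by key."""
    if not rows:
        raise ValueError("lookup_not_exact over empty list")
    keys = [key(row) for row in rows]
    index = bisect_right(keys, value) - 1
    # If value is smaller than every key, fall back to the first row,
    # like the F# function does (Excel itself would return #N/A here).
    return rows[max(index, 0)]
```

The results match the F# tests: looking up 5 in [1, 2, 3, 6, 7] gives 3, and looking up 9 gives 7.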

Some data structures

The next data structures serve only to make the code more readable. We could do without them just as easily. Some examples:

type Currency = decimal

type GasVolume =
    | KWH of decimal
    | M3 of decimal

type Languages = NL | FR

Using Currency instead of decimal makes it easy to see the purpose of a variable. It takes away the guessing about what a variable holds. Technically it is no different from decimal.

The gas volume can be expressed in either kWh or cubic meters. That is what we see in this data type. Again, using 2 different constructors makes clear what we are dealing with.

Languages is just an enumeration of 2 languages, as we would write in C#.

With these functions in place (and then some more boring ones) we can start to emulate the Excel formulas that we need. I’m not going into detail on this because I don’t want lawsuits 😉

Converting to CSV

In the end the results are exported. We export the values as a CSV file, which can be easily read back into Excel (for validation purposes). This will involve some reflection, here is the code:

module Csv

let ref = box "#REF!"

// prepare a value for writing to CSV
let prepareStr (obj: obj) =
    if obj = null then "null"
    else
        obj.ToString()
            .Replace("\"","\"\"") // double any embedded double quotes
            |> sprintf "\"%s\""   // surround with quotes

let combine s1 s2 = s1 + ";" + s2   // used for reducing

let mapReadableProperties f (t: System.Type) =
    t.GetProperties()
        |> Array.filter (fun p -> p.CanRead)
        |> Array.map f
        |> Array.toList

let getPropertyvalues (x: obj) =
    let t = x.GetType()
    t   |> mapReadableProperties (fun p -> 
            let v = p.GetValue(x)
            if v = null then ref else v)

let getPropertyheaders (t: System.Type) =
    t   |> mapReadableProperties (fun p -> box p.Name)
        |> List.map prepareStr
        |> Seq.reduce combine

let getNoneValues (t: System.Type) =
    t   |> mapReadableProperties (fun p -> ref)

let toCsvString (x: obj) =
    x |> getPropertyvalues
      |> List.map prepareStr
      |> Seq.reduce combine

Let’s start with the toCsvString function. It almost says what it does:
  • Get the property values from x (which is the part using reflection).
  • Each property value is mapped to a good CSV value (any double quote in the value is doubled, and the value is surrounded by double quotes).
  • Everything is combined into a semicolon-separated string using the Seq.reduce function.

The other functions are quite easy to understand as well.
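The quoting rule is the part that is easiest to get wrong, so here it is once more in isolation. This is a minimal sketch in Python that follows the same rules as the F# module (unquoted "null" for missing values, ";" as separator); the names prepare_str and to_csv_line are made up for this illustration.

```python
def prepare_str(value):
    """Quote one value for CSV: double embedded quotes, then wrap in quotes."""
    if value is None:
        return "null"
    return '"' + str(value).replace('"', '""') + '"'


def to_csv_line(values, separator=";"):
    """Join the prepared values into one line of the CSV file."""
    return separator.join(prepare_str(value) for value in values)
```

A value such as 12" pipe becomes "12"" pipe", which Excel reads back as the original text.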

The actual program

let main  = 
    let newOutput = Calculate myInput

    // output
    let newOutputHeaders = newOutput |> List.head |> newOutputPropertyHeaders
    let newOutputCsv = newOutput |> newOutputPropertyValues

    System.IO.File.WriteAllLines(@"C:\temp\NewOutput.csv", newOutputHeaders :: newOutputCsv)

    printfn "Find the output in %s" @"C:\temp\NewOutput.csv"

    printfn "Press enter to terminate..."
    Console.ReadLine() |> ignore

The program is composed of some simple statements that use the functions that we previously described. This makes the program very easy to read. Not much explanation is needed, but here goes:

  • newOutput will contain the result of the calculations using the input. This is the main purpose of the program. If this were implemented as a service, newOutput would be returned and that’s it.
  • For debugging purposes we output this as a CSV file, using the functions in the Csv module.


Writing this simulation as an F# program was not too hard. Immutability is baked into the FP paradigm, which was perfect for this case. So you could say that this is a nice match.

The Excel workbook itself is quite complex (and big). It is hard to maintain and to extend. The F# code on the other hand is quite readable. A nice (and unexpected) side effect is that we now understand much better what goes on in the Excel workbook, which helps us to maintain it for as long as it is still used. Another nice thing is that the (non-technical) end user is able to understand the F# code (with some explanation).


Posted in .Net, Codeproject, Development

Structuring your Excel – “the hidden agenda”


Most developers don’t like Excel as a “development platform”. Many projects are migration projects from an Excel solution to a “real application”. And often this is worth the trouble. But in some cases Excel has a lot of advantages.

An Excel workbook can be set up in many ways, and it usually starts off very small, to end in a big spaghetti where nobody knows what is going on. And nobody dares to change a thing. Sounds like a typical spaghetti .NET (or enter your favorite language) project. So let’s try to make our Excel project manageable.

Structuring your Excel

Every workbook starts with the first sheet. And initially there are some cells that we use as input, and then some cells are used as output. If you want to keep a nice overview of what is happening it is worth creating separate sheets for separate concerns.


Input sheet(s)

Use this sheet to gather all the input parameters from your users. This can be a simple sheet. In the end this is the user interface to your workbook, so make it easy for users to enter data. Use data validation and lookups to limit errors. The input sheets should be the only thing that can be modified by the end users.

Enriching the input sheet

You can also do some simple calculations that are local to the input sheet. For example adding 2 input fields together to see their total may be handy here. This also gives immediate feedback to the user. Looking up the city that goes with a ZIP code is another good example. I know that most of us have a problem remembering the syntax of “lookup” in Excel, hence the link 😉.

Depending on the nature of your application there can be 1 or more input sheets. For example a simulation using some parameters will typically contain 1 input sheet, whereas an accounting application (with cash books, bank books, …) may contain multiple input sheets.

Principle: all the calculations that only concern the input data can be done already here. Using some data from the datasheets (as static data) can also be done here. This will give us the first intermediate results.

Output sheet(s)

These sheets will contain the output for your users. The sheets should not contain calculations, only formatting. They will present the results from the calculation sheet(s). Of course formatting means changing fonts, colors, … and also cell formats.

Principle: This sheet contains no calculations, only formatting. Everything should be calculated in the calculations sheets already.

Data sheet(s)

Your workbook will probably need some data, and if you’re lucky this data is in a structured format. This data serves as static input for your calculations.

Often you will want to calculate some results per data row. This can be done in a separate sheet per data sheet that will contain all the necessary calculations that are only using data from that data sheet. Eventually you will want this data to be stored in a database to be able to access it from many applications (think reporting, for example).

To accommodate this you can create the data sheets to contain only raw data, and then

  • Add columns to the raw data that contain calculations on the data per row. Maybe you have some fields to be added already in the data that you’ll need later, some Lookups to do. All that does not involve the input data can be done here. Make sure you put the calculations away from the data (separate them by some empty columns for later expansion of the raw data). It may also be a good idea to use colors to indicate which are the calculated fields.
  • Add a new sheet that will contain all the calculations for this data sheet that depend on the input parameters. This sheet will calculate intermediate results that are needed later.

Principle: Separate data from calculations, either by adding calculated columns at the end of the raw data, or by adding dedicated sheets to calculate these intermediate results.

Calculation sheet(s)

This is where you perform more complex calculations. Some simple calculations can be done on the input sheets and the data sheets already, but all the complex work should happen here. The calculation sheets will use the (calculated) data from many sheets and combine this data to become the final (unformatted) results. Because all the simple calculations have been done in the input- and datasheets, the calculation sheet is just consolidating this data.

Principle: This is where the sheets in the workbook are combined to produce the end result. The local calculations per sheet are done already, so that only the consolidation logic remains.

Keep your calculations as local as possible

In the proposed structure it is easy to see that calculations are done in the input sheet over only the input data (and maybe some lookups using the data sheets). Calculations over the data are done in a separate sheet per data sheet.

It is only in the calculation sheets that data will be combined from different sheets. In this way the workbook remains manageable, even when it grows bigger.

Structuring your workbook like this will also make intermediate results visible. These intermediate results will be calculated only once, and can be used everywhere. This has a couple of advantages:

  • The workbook is simpler because the intermediate results are calculated as close as possible to their data.
  • It is easier to create more complex formulas when the intermediate results are calculated already. Instead of trying to squeeze everything in 1 formula there are now building blocks available to be used.
  • The results become more consistent. The formulas for calculating the intermediate results are not repeated all over the workbook. So there is less of a risk that in some instances a formula is (slightly) different. This will prevent hard to solve errors.
  • The results are easy to test. The intermediate results are simple, so verifying if they are correct is easy. The more complex formulas are using the intermediate results as building blocks, so they become much easier to test too. If something goes wrong it is easier to track back where it went wrong. This can be seen as a kind of unit testing.

Did you notice how I don’t care about performance in this case? That is because the other advantages outweigh this by far. But of course there could be a performance gain as well.

Named cells and regions

This is an easy one. Excel allows you to name cells or ranges of cells. Try to use a naming convention, for example:

  • Cells containing input fields will have a name starting with I_  (for example I_ZipCode)
  • Ranges containing data for lookups can start with l_  (for example l_locations)
  • Cells containing intermediate results can start with c_  (for example c_TotalConsumption)

In this way the purpose of each cell or range is clear.

An additional advantage is that you can now move the named cells and ranges elsewhere in the workbook if you need to. As long as the new zone gets the same name, all the formulas referring to it will still work.

Excel as a functional language

If you look closely at an Excel workbook, then you’ll notice that the cells either contain input values which can be modified by the users, or output values. The output values are obtained by calculating formulas in various sheets, and with many dependencies. But each formula contains functions that have no side effects (unless you start to use non-deterministic functions of course).

So calling a function will never do something bad to your input data. It will of course update the cell which contains the formula, in a deterministic way.

Conclusion – Our hidden agenda

Once the workbook is properly structured it becomes easy to separate the data from the functions. The functions are now very clear and easy to implement in some programming language. The named ranges can become the names of variables in your program, making the match easy.

We use the Excel workbook here as input – processing – output, which can be perfectly done in a Functional language, such as F#. You will end up with data coming from the database, input parameters and functions to be applied. This will then give the end results. More on this in a later post.

Posted in Architecture, Codeproject

Improving throughput by using queue-based patterns


In my current project we let the users run simulations. For flexibility, the calculations are performed by Excel spreadsheets. There are many different spreadsheets available, for different kinds of simulations. When the calculation rules for one of the simulations change it is just a matter of modifying the corresponding Excel spreadsheet. So in the current workflow this is how things happen (simplified):


Some of the spreadsheets contain a lot of data. The advantage of this is that the spreadsheet is completely self-contained, and there is no dependency on a database. This also means that the spreadsheets can easily be tested by the domain specialists.

But as you can see in the picture, some of the spreadsheets turn very big. So loading the spreadsheet in memory can sometimes take up to 10 seconds. We are lucky enough (?) to not have too many users using the simulations simultaneously. It is not always possible to keep all the different spreadsheets in memory, as memory is still not unlimited.

In some situations it is necessary to recalculate simulations. This is done by a daily batch job. As you can imagine this job runs pretty long. And the users must wait until the next day to see the new results because the job only runs once per day.

As a practical remark: Most of the pictures below contain a link to a page with more explanations.


As we can see, the design is really flexible but it has its drawbacks:

In the user interface it can take some time for a simulation to finish. When not too many users are performing simulations at the same time, this is more or less acceptable. Once the load on the server becomes heavier the throughput is not sufficient.

This solution is not easily scalable. We can put several servers behind a load balancer, but the servers will require a lot of memory and CPU(s) to perform the calculations timely.

The spreadsheets have a limit on their size. I don’t know by heart the maximum size of an Excel spreadsheet, but I am sure that it will be far less than any database. Also a lookup in a database happens very fast (if the indexes are set up properly), whereas a VLookup (or a similar function) in Excel will plough through the data sequentially. So the solution is not very extensible.

When many users are performing calculations the server will be heavily loaded, and at quiet moments the server does nothing. But when the batch job starts, the server will be heavily loaded again. Luckily we have a timeframe during the night where the server is less used, so we can then run the batch jobs without disturbing our users.

Introducing a queue


To spread the load of the simulations better in time we can start by putting requests on a queue. Several clients can put their requests on the queue at a variable rate. The messages are processed by a worker at a consistent rate. As long as there are messages on the queue, the worker will handle them one by one. So no processing time is lost.

So now when a simulation needs to be recalculated, it can be put on the queue and handled in time. The lengthy batch job is not needed anymore.

Of course with this initial design we introduce new problems. It is clear that requests coming from the UI should be processed before the background messages (fka “the batch job”).

A second problem (that is out of scope for this post) is that we put a message on the queue in a “fire and forget” mode. We just count on the fact that the queue with its worker will handle the message eventually. For the background messages this is ok, but the requests coming from the UI must return a result. This can be done in several ways, one of the most obvious ways being a response queue. When the worker has finished a calculation the result is put on the response queue, which is handled by a worker on the UI side. This design will require the use of correlation IDs to work properly.

Improving the throughput

The previous solution will improve the average throughput because the batch requests are handled together with the UI requests. But it may take a while to empty the queue.

So the “Competing Consumers Pattern” was invented.


The left side of the queue can still receive requests from multiple (types of) clients. On the right side there can be more than 1 worker processing the messages. More handlers mean that more work is done in a shorter period.

Depending on the queue depth (the number of messages that are waiting on the queue) more workers can be added or removed. This is what we call elasticity.

So now the average time that a message is on the queue will be shorter, and we don’t use more CPU than necessary. This can be important in cloud scenarios where you have to pay per cycle.

The problem that background requests are still mixed with UI requests remains, but they will be handled faster, and in a scalable way.
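To make the pattern concrete, here is a minimal in-process sketch in Python, with threads standing in for the worker roles and queue.Queue standing in for a real message queue. The function name run_workers and the None sentinel used to stop the workers are choices made for this illustration only.

```python
import queue
import threading


def run_workers(requests, handle, worker_count):
    """Let worker_count consumers compete for the requests on one queue."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: nothing left to do for this worker
                return
            outcome = handle(item)
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for request in requests:
        work.put(request)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results
```

The results arrive in whatever order the workers finish, which is exactly why a response needs a correlation ID when it has to be matched to its request.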

Giving some priorities

We want UI requests to be executed first. So ideally they are put first on the queue. This takes us right into the “Priority Queue Pattern”.


Each request will receive a priority. When a high priority request is placed on the queue, it will be executed before the lower level requests. So we put the UI requests on the queue with a high priority to make our users happy again.

This pattern can either be implemented with 1 queue handling the priorities, or by creating a high-priority queue and a low-priority queue. The high-priority queue can have more workers than the low-priority queue, saving CPU cycles on the low-priority queue again.
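The single-queue variant can be sketched in a few lines. This is a minimal in-process sketch in Python based on queue.PriorityQueue; the class name RequestQueue and the two priority levels UI and BACKGROUND are made up for this illustration.

```python
import itertools
import queue

UI = 0          # lower number = handled first
BACKGROUND = 1


class RequestQueue:
    """One queue that hands out UI requests before background requests."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        # The counter keeps requests of equal priority in arrival order.
        self._counter = itertools.count()

    def put(self, priority, request):
        self._queue.put((priority, next(self._counter), request))

    def get(self):
        _priority, _sequence, request = self._queue.get()
        return request

    def empty(self):
        return self._queue.empty()
```

A UI request that is put on the queue after two background requests is still the first one a worker receives.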

What about security?

We can create a small component (with its own endpoint) before the queue. This component can verify for each request if the user has the necessary rights to execute the request. We call this the “Gatekeeper Pattern”.


We can also validate the requests before they go on the queue (fire and forget), so we can immediately return a fault to the client. We want to prevent exceptions in the workers, because this poses a problem: we can’t always report the error back to the client. Some solutions are to log the errors, or to create an error queue that can be handled elsewhere. This is also out of scope for this post.
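A gatekeeper that combines both checks could look like this. It is a minimal sketch in Python; the request layout (user, simulation, parameters), the permissions dictionary and the RequestRejected exception are all assumptions made for this illustration.

```python
class RequestRejected(Exception):
    """Raised when a request must not be placed on the queue."""


def gatekeeper(request, permissions, enqueue):
    """Check rights and content first; only then pass the request on."""
    user = request.get("user")
    simulation = request.get("simulation")
    if simulation not in permissions.get(user, set()):
        raise RequestRejected("user is not allowed to run this simulation")
    parameters = request.get("parameters")
    if not isinstance(parameters, dict) or not parameters:
        raise RequestRejected("input parameters are missing")
    enqueue(request)
```

The caller gets the rejection immediately, before anything is on the queue, so the workers only ever see requests that passed both checks.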

Intermediate result 1


The solution that we have so far:

  • On the left side we can have many (different) clients sending requests to the queue.
  • Before the requests are handled, the gatekeeper verifies the permissions, and also the content of the message. This provides a fail fast mechanism and an initial feedback to the client.
  • Then the request is put on the queue with a specific priority.
  • The worker roles handle the messages in order of priority. When needed, more workers can be spawned to improve the throughput.
  • Finally the results are stored. Possibly the results of the calculations are put on a response queue to be consumed by the client.

Further improving the throughput

Currently the simulation spreadsheet is self-contained. It contains all the necessary data to execute a simulation. Let’s say that one of the input parameters is a zip code, and that we want to look up a record for this zip code. This means that the spreadsheet now contains thousands of rows with zip codes and their associated data, of which only 1 row is needed.

So we could pass the request to a dedicated queue that will enrich the request with the zip data and then pass it to a next queue to perform the rest of the calculations.

Maybe after the calculation is done we want to enrich the results (as an example, we may perform translations). Of course there is a pattern for this: the “Pipes and Filters Pattern“.


Example pipeline:

  • Task A: enrich input parameters with data from the database (ex: lookups)
  • Task B: perform the Excel simulations
  • Task C: enrich the results (ex: translations)

There are some obvious advantages to this approach:

  • The spreadsheet doesn’t need to contain the data for the lookups, so it becomes smaller. This means that it will load faster, its memory footprint will be less, and it will be more performant because the (sequential) lookups aren’t necessary anymore.
  • The simulation becomes less complex. The domain experts can now concentrate on the problem at hand instead of performing all the lookups.
  • Tasks A and C will probably be much faster than task B (the simulation itself). So we can assign more workers to task B to balance the workload.
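
The example pipeline (tasks A, B and C) can be sketched as three small functions that are chained together. The lookup table and the "simulation" are placeholders I made up; in the real solution each task would be a worker reading from its own queue.

```typescript
interface SimulationMessage {
    zipCode: string;
    city?: string;       // filled in by task A
    result?: number;     // filled in by task B
    resultText?: string; // filled in by task C
}

type PipelineTask = (message: SimulationMessage) => SimulationMessage;

// Hypothetical lookup data that would otherwise live inside the spreadsheet.
const cityByZipCode: { [zipCode: string]: string } = { "1000": "Brussels", "9000": "Ghent" };

// Task A: enrich the input parameters with a lookup.
const enrichInput: PipelineTask = message => ({
    ...message,
    city: cityByZipCode[message.zipCode] || "unknown"
});

// Task B: placeholder for the actual Excel simulation.
const simulate: PipelineTask = message => ({
    ...message,
    result: (message.city || "").length * 100
});

// Task C: enrich the result for presentation.
const enrichOutput: PipelineTask = message => ({
    ...message,
    resultText: `${message.city}: ${message.result}`
});

// Each task receives the output of the previous one.
function runPipeline(tasks: PipelineTask[], input: SimulationMessage): SimulationMessage {
    return tasks.reduce((message, task) => task(message), input);
}

const pipelineOutput = runPipeline([enrichInput, simulate, enrichOutput], { zipCode: "9000" });
// pipelineOutput.resultText is "Ghent: 500"
```

Because every task has the same signature, tasks can be added, removed or reordered without touching the others.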

Adding more queues to simplify the work

In the current design every request must be analyzed to see which type of simulation must be executed. It would be simpler to have a queue or a pipeline per type of simulation. This can be accomplished by the “Message Router Pattern“.


The first queue implements the Message Router. Based on the content of the request, the message is routed to one of the queues.

Each type of simulation gets its own queue, making the processing per simulation simpler. Of course more queues will be involved, and it may be a good idea to start drawing the solution now.
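
A content-based router is not much more than a lookup. In this sketch one array per simulation type stands in for one real queue. The class name and the extra "unroutable" list are my own assumptions, in the spirit of the error queue mentioned earlier.

```typescript
interface RoutableRequest {
    simulationType: string;
    payload: string;
}

class SimulationRouter {
    // One in-memory array per simulation type stands in for one real queue.
    private queues: { [simulationType: string]: RoutableRequest[] } = {};
    // Requests for an unknown simulation type end up here (an assumption, see above).
    readonly unroutable: RoutableRequest[] = [];

    constructor(simulationTypes: string[]) {
        for (const simulationType of simulationTypes) {
            this.queues[simulationType] = [];
        }
    }

    route(request: RoutableRequest): void {
        const known = Object.prototype.hasOwnProperty.call(this.queues, request.simulationType);
        const target = known ? this.queues[request.simulationType] : undefined;
        if (target === undefined) {
            this.unroutable.push(request);
            return;
        }
        target.push(request);
    }

    depth(simulationType: string): number {
        const known = Object.prototype.hasOwnProperty.call(this.queues, simulationType);
        return known ? this.queues[simulationType].length : 0;
    }
}

const router = new SimulationRouter(["simulation1", "simulation2", "simulation3"]);
router.route({ simulationType: "simulation1", payload: "a" });
router.route({ simulationType: "simulation3", payload: "b" });
router.route({ simulationType: "simulation1", payload: "c" });
router.route({ simulationType: "simulation9", payload: "d" });
// depth("simulation1") is 2, depth("simulation2") is 0, and 1 request is unroutable.
```

The workers behind each queue no longer need to inspect the request type; they only ever receive the kind of simulation they were built for.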

Intermediate Result 2


The flow now becomes:

  • A client sends a request to the Gatekeeper endpoint. If the request is allowed and valid, it is passed to the Message Router.
  • The Message Router analyzes the content of the request, and sends it to the corresponding queue (Simulation 1, 2 or 3).
  • The simulation queues are implemented as a pipeline where the input is enriched, the simulation is performed and the output is enriched. Finally the result is stored.
  • Depending on the tasks to be performed in each queue one or more workers can be assigned. This makes the solution highly scalable.

There are some more advantages:

  • Separation of Concerns. The implementation of each worker is simple because the whole workload is separated over multiple simple jobs.
  • Monitoring. It is easy to see where messages are in the process. This is impossible in a monolithic implementation.
  • Find the bottleneck. We only need to check the queue depths to find out where a possible bottleneck is. We can then assign more workers to this queue (or let Azure do this automatically).


Performing the simulation in one service made it very hard to cache the spreadsheets. The spreadsheets were big, and there are many types of simulations that would reside in one address space. Now we can load the spreadsheet in the worker role(s) where it is needed, resulting in the “Cache-Aside Pattern“.


The data for the lookups (enriching of the input parameters) can easily be kept in memory and the data for the translations as well.
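
In its simplest form cache-aside means: look in the cache first, and only go to the slow source on a miss. A minimal sketch, where the loader function is a hypothetical stand-in for opening a spreadsheet or querying the lookup tables:

```typescript
class CacheAsideStore<T> {
    private entries: { [key: string]: T } = {};
    private loadFromSource: (key: string) => T;
    loadCount = 0; // how many times the slow source was actually hit

    constructor(loadFromSource: (key: string) => T) {
        this.loadFromSource = loadFromSource;
    }

    get(key: string): T {
        if (Object.prototype.hasOwnProperty.call(this.entries, key)) {
            return this.entries[key]; // cache hit
        }
        // Cache miss: load from the source and remember the value.
        const value = this.loadFromSource(key);
        this.loadCount++;
        this.entries[key] = value;
        return value;
    }

    // Call this when the source changes, so the next get() reloads the value.
    invalidate(key: string): void {
        delete this.entries[key];
    }
}

// Hypothetical loader: a real worker would open the spreadsheet here.
const spreadsheets = new CacheAsideStore<string>(simulationType => "workbook for " + simulationType);
const firstLoad = spreadsheets.get("loan");  // miss: loads from the source
const secondLoad = spreadsheets.get("loan"); // hit: served from memory
spreadsheets.invalidate("loan");
const thirdLoad = spreadsheets.get("loan");  // miss again after invalidation
// spreadsheets.loadCount is 2
```

Because each worker only handles one type of simulation, its cache stays small: it only ever holds the spreadsheet and lookup data that this worker needs.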

Final Result


By separating all the workers it is easy to cache only the data that is needed. Client processes can be on different servers, and the worker processes as well. So we have effectively decoupled the clients from the workers. The throughput can be easily managed by playing with the number of workers, and the efficiency can be greatly enhanced by using caching.

In the end the implementation looks more complicated, but it is implemented in small, simple pieces that work together.


In this post I tried to take you through a couple of cloud design patterns. It is clear that this solution is very well suited to run in the cloud, because a lot of the functionality is already available. For example in Azure it is quite easy to set up the necessary queues, define the workers, and make it work.

There are many variations on this solution, each with its own advantages and drawbacks. So this is not THE solution to all problems. But it does show that we can create scalable, performant solutions by decoupling functionality using queue patterns.

If you have any ideas to add more patterns, or use different ones in this solution, feel free to use the comments section!


Cloud Design Patterns

Enterprise Integration Patterns

Posted in Architecture, Azure, Codeproject, Design Patterns, Development

Knockout, self, this, TypeScript. Are you still following?


I’m working on an MVC application with simple CRUD operations. I want the following functionality (don’t mind the Flemish (Dutch) titles):


Remember, I’m “graphically handicapped”, so I’m afraid that my users will have to make do with the standard Bootstrap layout for now.

The buttons are self-explanatory and always show the same dialog box. The blue button (Bewaren = Save) changes to Insert or Delete, depending on which action the user chooses.

I will need this simple functionality on some more pages, so I want to make it a bit more generic. I don’t want to use an existing grid because the functionality I need for now is so simple that any grid would be overkill. And of course I’m suffering from the NIH syndrome. I will describe the generic solution in another post later.

Knockout and the “self” thingy

If you have worked with Knockout before then you know that it is advisable to do something like this:

function TaskListViewModel() {
    // Data
    var self = this;
    self.tasks = ko.observableArray([]);
    self.newTaskText = ko.observable();
    self.incompleteTasks = ko.computed(function () {
        return ko.utils.arrayFilter(self.tasks(), function (task) {
            return !task.isDone() && !task._destroy;
        });
    });

    // Operations
    self.addTask = function () {
        self.tasks.push(new Task({ title: this.newTaskText() }));
    };

    // ...
}

TaskListViewModel is actually a function behaving like a class. As JavaScript doesn’t have classes (yet, wait for ES6), this is a common way to emulate them. In every OO language, there is an implicit binding on “this”, referring to the object on which a method is called. As you may expect by now, this is different in JavaScript. The value of “this” is determined by how a function is called, not by where it is defined, so it is not necessarily the [emulated] class instance. This is one of the reasons that we all love JavaScript so much. </sarcasm>

There are some ways to tackle this problem, and in the Knockout library they chose to use the pattern that you see in the code above. When the TaskListViewModel is created, this refers to the new instance. So we then assign this to a variable in the model:

var self = this;

The nice thing is now that we can call the functions in TaskListViewModel  from anywhere (that is, with possibly a different “this”) and that they will operate on the correct “self”.
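
To see the difference in isolation, here is a small sketch without Knockout. It is the same “self” pattern, written with TypeScript type annotations; the TaskList class and its two functions are invented for this example.

```typescript
class TaskList {
    tasks: string[] = [];
    addWithThis: (title: string) => void;
    addWithSelf: (title: string) => void;

    constructor() {
        const self = this; // capture the instance once, at construction time

        // Relies on "this": only works when called as list.addWithThis(...).
        this.addWithThis = function (this: TaskList, title: string): void {
            this.tasks.push(title);
        };

        // Relies on the captured "self": works no matter how it is called.
        this.addWithSelf = function (title: string): void {
            self.tasks.push(title);
        };
    }
}

const list = new TaskList();

// A framework that stores a handler and calls it later does something like this:
const detachedThis = list.addWithThis;
const detachedSelf = list.addWithSelf;

let thisVersionFailed = false;
try {
    detachedThis("write tests"); // "this" is no longer the TaskList
} catch (error) {
    thisVersionFailed = true;
}
detachedSelf("write tests"); // still operates on the right instance
// thisVersionFailed is true, and list.tasks contains exactly one task.
```

The function that uses “this” breaks as soon as it is called without its object in front of it; the one that uses “self” doesn’t care.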

Let’s try this in TypeScript

In TypeScript the problem remains the same but is even more tricky to detect. The code looks and feels like C# (thank you, Microsoft :-) ) but eventually it is just JavaScript in disguise. So the “this” problem remains. And actually it gets worse, check out the following (incomplete) code:

class Color {
    ColorId: KnockoutObservable<number>;
    ShortDescription: KnockoutObservable<string>;
    Description: KnockoutObservable<string>;

    constructor(id: number, shortDescription: string, description: string) {
        this.ColorId = ko.observable(id);
        this.ShortDescription = ko.observable(shortDescription);
        this.Description = ko.observable(description);
    }
}

In TypeScript every access to a class member must be prefixed with this. So that should take care of the scoping problem, right?

Let’s add the ColorsModel class to this and then investigate some behavior:

class ColorsModel {
    Action: KnockoutObservable<Actions> = ko.observable<Actions>();
    CurrentItem: KnockoutObservable<Color> = ko.observable<Color>();
    Items: KnockoutObservableArray<Color> = ko.observableArray<Color>();

    Empty(): Color {
        return new Color(0, "", "");
    }

    Create(c: any): Color {
        return new Color(c.colorId, c.shortDescription, c.description);
    }

    InsertColor(): void {
        var newColor: Color = this.Empty();
        // ...
    }

    RemoveColor(Color: Color): void {
        // ...
    }

    UpdateColor(Color: Color): void {
        // ...
    }
}

var model = new ColorsModel();

In short, we create the ColorsModel class, which contains an array of colors (Items). This model is then bound to the page containing this script.

In the page we have the following (partial!) html:

    <button class="btn btn-info" data-bind='click: $root.InsertColor'><span class="glyphicon glyphicon-plus" aria-hidden="true"></span>  Kleur toevoegen</button>
    <table class="table table-striped">
        <thead>
            <tr>
                <th>Korte beschrijving</th>
            </tr>
        </thead>
        <tbody data-bind="foreach: Items">
            <tr>
                <td data-bind='text: ShortDescription'></td>
                <td data-bind='text: Description'></td>
                <td>
                    <div class="btn-group" role="toolbar">
                        <button title="Update" type="button" class="btn btn-default" data-bind='click: $root.UpdateColor'><span class="glyphicon glyphicon-pencil" aria-hidden="true"></span></button>
                        <button title="Delete" type="button" class="btn btn-default" data-bind='click: $root.RemoveColor'><span class="glyphicon glyphicon-trash" aria-hidden="true"></span></button>
                    </div>
                </td>
            </tr>
        </tbody>
    </table>

As you can see on the <tbody> element, we bind the Items collection from the ColorsModel to the rows. Each item in the collection will create a new <tr> with its values. We also create an update and a delete button, both bound to the $root methods UpdateColor(…) and RemoveColor(…).

The problem

Running the application in the browser and clicking on the “update” button doesn’t seem to work. So using the debugger in the browser we discover the following:


“this” is not the $root (thus the ColorsModel). In a real OO language this would have been the case. Here “this” points to the current color, the one whose “update” button we clicked. The debugging console then rubs it in further:

SCRIPT438: Object doesn't support property or method 'Action'

As you can see in the RemoveColor(…) method, I found a quick workaround involving the use of the global variable model. Maybe that isn’t the right solution after all…

Next attempt to solve the problem

First of all, this attempt didn’t last long, you’ll quickly notice why.

class ColorsModel {
    Self: ColorsModel = this;

    UpdateColor(Color: Color): void {
        // ...
    }
}

Remember that in TypeScript when you want to use a property you need to prefix it with “this”? As we now know “this” points to the wrong object, so it won’t have a property “Self”. I feel a Catch-22 coming up.

A clean solution: arrow notation

    UpdateColor = (item: Color): void => { /* ... */ };

When using the => to assign the function to the member UpdateColor, TypeScript does what is necessary to make this work. Looking in the debugger we see this:


And yet, everything is working fine.

If you can’t beat them, confuse them

So how is this possible? The bottom line: this is not this. Let’s look at the JavaScript that TypeScript generates for our arrow function:

var _this = this;
this.UpdateColor = function (item) {
    // ...
};

So the TypeScript “this” is translated into “_this”, and is implemented in just the same way as “self” was before. That solves the scoping problem. The debugger doesn’t catch this subtlety and shows us the content of “this”, hence the confusion. But clearly everything works as it should and our problem is solved in an elegant way.
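
Here is the same effect without Knockout, as a minimal sketch. The Palette class is made up for this example; it compares a normal method with an arrow function member when both are called “detached”, the way a binding library calls a click handler.

```typescript
class Palette {
    selected: string[] = [];

    // Normal method: "this" depends on how the method is called.
    selectWithMethod(color: string): void {
        this.selected.push(color);
    }

    // Arrow function member: "this" is captured when the instance is created.
    selectWithArrow = (color: string): void => {
        this.selected.push(color);
    };
}

const palette = new Palette();

// Detached, as a binding library would store and call it:
const arrowHandler = palette.selectWithArrow;
arrowHandler("red"); // works: "this" is still the palette

let methodFailed = false;
try {
    const methodHandler = palette.selectWithMethod;
    methodHandler("blue"); // "this" is not the palette anymore
} catch (error) {
    methodFailed = true;
}
// methodFailed is true, and palette.selected only contains "red".
```

Another option, if you prefer to keep normal methods, is to hand out palette.selectWithMethod.bind(palette) instead of the bare method. The arrow member just saves you from having to remember that at every call site.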


I’m sorry about this confusing post in which I tried to explain that this is not _this, but in plain JavaScript this is self, but in TypeScript this is _this. If you understand this conclusion then you have read the article quite well. Congratulations.

Do you know another solution to this problem? Feel free to share it in the comments!



Posted in Codeproject, Debugging, Development, JavaScript, TypeScript

Structuring an MVC app for easy testing


In this article I want to give you some handles to structure your MVC applications in such a way that they become easier to test. The article isn’t really about MVC, but if you want more information on MVC, I put some references at the end of this article.

MVC is a pattern typically used for creating web applications. Of course it can be (and is) applied for other types of applications as well. In this article I only talk about ASP.NET MVC.

One of the reasons to separate the application in (at least) model – view – controller is to promote testability. We want to be able to test the application with as few dependencies as possible.

In this post I only concentrate on testing the server side, which will be more than enough for 1 post.

MVC Responsibilities

I only give a short introduction, to make sure that we’re all on the same page.

<Yoda voice> More to say there is! </Yoda voice>


The View

To start with the easiest one: the view will be sent to the user, typically as HTML. The view can contain all the logic if you wish, because in the Razor syntax you can use everything in C# that you can use elsewhere. BUT that is not the idea. The view should bind variables, walk over collections, and generate the HTML from ready made data. In the view there should be as little processing as possible (server side).

Your view can also contain logic in JavaScript for code that is to be executed in the client browser, ultimately resulting in a Single Page Application.

The Controller

The client request is routed from the browser to the controller. Of course this is a simplification, but it will do for now. The controller contains methods with zero or more arguments that the routing system will call. This is done automatically in the framework.

The responsibility of the controller is to use the arguments to generate the output needed in the view. For example, arguments can be used to obtain customers for a certain ZIP code. In this case the controller will obtain only the required customers and send them into the view. As we saw before the view will receive this data from the controller and represent it.

Slightly more advanced: the controller can also choose to display a different view. The principle remains the same though: the view receives the data and renders it.

We want to keep the controller as simple as possible, so we let the Model classes help us. You may notice that I’m already trying to split up complex parts into simple parts to make them easy to test – as is the purpose of this article.

The Model

The Model contains most of the classes that will be used in the Controllers and in the Views. Try to keep these classes as simple as possible as well and separate responsibilities. The Model is often split into 2 specific parts:

Data Model

Typically these are classes generated by Entity Framework (if you use database first), or your code first classes. I also like to include the repositories in the data model.

Other classes that can go in here are classes that are generated from SOAP or REST web services (by adding a web service proxy to your project).

These classes are mainly used in the Controllers to either modify data, or to obtain data (or both).


ViewModel

As the name implies, the ViewModel is used by the views. In a small application it may be overkill to create a separate ViewModel, and you can use the classes from the data model. But very soon the ViewModel will contain more (or other) information than the Data model:

  • the ViewModel may contain only those fields that are necessary in the view(s)
  • It may contain other field names, in case this is clearer for the View. Sometimes field names in a database have to follow some (company) rules, or names from a web service may be very generic. In those cases translating them into something more descriptive may help your designers. The developer who creates the Controllers and other back-end code and the front-end developer are often not the same person.
  • It may contain calculated fields, aggregated fields, transformed fields, …
  • It may contain extra annotations to indicate to the user interface that fields are mandatory, have specific validations, have default values, different (localized) captions. These can then be picked up in the View to automatically generate some validation rules.
  • etc.

This means that the responsibility of the Controller now becomes:

  • Use the arguments to obtain data. The data will be in the Data Model format
  • Convert the obtained data into ViewModel classes
  • Pass this ViewModel to the View, which can then represent the data

Converting from the Data Model to the View Model

If the conversion is straightforward then it may be handy to use a library like AutoMapper, which will take care of the mapping of the fields for you. If the mappings become more complex I would advise writing specific conversion classes and methods. AutoMapper is capable of a lot of customization in its mapping, but you risk complicating your code more than by writing a simple conversion function. Think also about the poor guy who needs to debug this code. Usually that’s you!

It will be clear now that the conversions must be tested as well. The tests can be simple / obvious, but when you extend or modify your classes the tests will fail if you don’t adapt your conversions as well. This will create a nice TODO list for you…
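
To show what such a conversion and its test data can look like, here is a small sketch. I wrote it in TypeScript to keep it short; the same shape carries over to a C# conversion method. The row and ViewModel types are invented for the example.

```typescript
// Hypothetical data model class, shaped like a database row with "company rule" names.
interface CustomerRow {
    CUST_ID: number;
    CUST_FNAME: string;
    CUST_LNAME: string;
    CUST_BIRTHDATE: Date;
}

// Hypothetical ViewModel: only what the view needs, with friendlier names.
interface CustomerViewModel {
    id: number;
    fullName: string; // calculated field
    age: number;      // calculated field
}

// "today" is a parameter so that a test can pass in a fixed date.
function toCustomerViewModel(row: CustomerRow, today: Date): CustomerViewModel {
    const birth = row.CUST_BIRTHDATE;
    let age = today.getFullYear() - birth.getFullYear();
    // Subtract one year when the birthday has not happened yet this year.
    const hadBirthday =
        today.getMonth() > birth.getMonth() ||
        (today.getMonth() === birth.getMonth() && today.getDate() >= birth.getDate());
    if (!hadBirthday) {
        age--;
    }
    return { id: row.CUST_ID, fullName: row.CUST_FNAME + " " + row.CUST_LNAME, age: age };
}

const customerRow: CustomerRow = {
    CUST_ID: 7,
    CUST_FNAME: "Jane",
    CUST_LNAME: "Doe",
    CUST_BIRTHDATE: new Date(1980, 5, 15) // 15 June 1980 (months are zero-based)
};
const dayBeforeBirthday = toCustomerViewModel(customerRow, new Date(2020, 5, 14));
const onBirthday = toCustomerViewModel(customerRow, new Date(2020, 5, 15));
// dayBeforeBirthday.age is 39, onBirthday.age is 40
```

Passing the current date in as an argument, instead of asking the system clock inside the function, is what makes the calculated field testable.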

Setting up the tests

Now that we have briefly determined the responsibilities of the MVC parts, we can set up and implement tests.

Setting up the test project

If you haven’t already created a test project, do so now (refer to my previous posts about this if you are not sure how to do this). A shortcut can be to right-click the Index method and then select “Create Unit test”. This will present you with a nice dialog and do all the hard work for you.

Because we are going to test an MVC application, based on the ASP.NET MVC classes, we’ll also need to install the Microsoft.AspNet.Mvc NuGet package. You can do this in the Package Manager Console (Tools > Package Manager > Package Manager Console) by typing

install-package Microsoft.AspNet.Mvc

Also add a reference to the “Microsoft.CSharp” assembly. This will make sure that you can use the “dynamic” type in your tests.

Testing the Model

This should be easy because these are just normal classes. Feel free to read my other articles on this in the Testing category of this site.

Typically the Model classes will access a database, or call web services. For the unit tests this must be mocked of course, for example using a library such as Moq. The other MVC classes will depend on the Model. So make sure you put enough effort in testing the Model classes.

Testing the Controller

As we saw before, the controller must orchestrate some calls into the Model, and bring together the results. These results are then passed into the view to be represented. So in most cases you don’t want to generate representation data in the controller, as that is the view’s responsibility.

Let’s take the example of a simple (stubbed) AgendaController:

    public class AgendaController : Controller
    {
        // GET: Admin/Agenda
        public ActionResult Index()
        {
            List<Agenda> agendas = new List<Agenda>();
            agendas.Add(new Agenda { Description = "Dr.X", Id = 1 });
            agendas.Add(new Agenda { Description = "Dr.No", Id = 2 });
            agendas.Add(new Agenda { Description = "Dr.Who", Id = 3 });

            List<String> resources = new List<String>();
            ViewBag.Resources = resources;
            return View(agendas);
        }
    }


In the Index( ) function a list of Agendas is created, and in this case filled in with some random data. The same is done with a list of resources and then 2 methods of data passing are used:

  • ViewBag: this will create a new property on the dynamic object ViewBag, to be able to pass the resources collection in the View.
  • return View(agendas): This will use the Model property in the view, which will contain this collection. The data type of Model is determined in this line:
@model IEnumerable<Planning365.Data.Agenda>

This prevents us from having to cast Model everywhere in the View.

Writing the test for AgendaController.Index( )

I chose an easy example to test, with no parameters; but the principles remain the same.

    [TestClass]
    public class AgendaControllerTests
    {
        [TestMethod]
        public void IndexTest()
        {
            // arrange
            AgendaController sut = new AgendaController();

            // act
            ViewResult res = (ViewResult)sut.Index();
            List<String> resources = res.ViewBag.Resources;
            List<Agenda> agendas = (List<Agenda>) res.Model;

            // assert
            Assert.IsTrue(agendas.Any(a => a.Id == 1));
            Assert.IsTrue(agendas.Any(a => a.Id == 2));
            Assert.IsTrue(agendas.Any(a => a.Id == 3));
            Assert.AreEqual("", res.ViewName);
        }
    }

Using the AAA pattern, we first arrange the test. In this case it is only 1 line, instantiating the controller. The controller is just a normal CLR class, which happens to derive from the MVC Controller base class, so its instantiation is simple. If you use dependency injection then the steps in the “arrange” phase may be:

  • Create a mocked instance of the classes to be injected
  • Use the right constructor to pass this into the class.
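
The two “arrange” steps for dependency injection can be sketched like this. This is not ASP.NET code: it is a language-neutral illustration written in TypeScript, with a hand-written fake instead of a mocking library, and all the names are hypothetical.

```typescript
interface AgendaItem {
    id: number;
    description: string;
}

// The dependency, expressed as an interface so that a test can replace it.
interface AgendaRepository {
    getAll(): AgendaItem[];
}

class AgendaListController {
    private repository: AgendaRepository;

    // The dependency is injected through the constructor.
    constructor(repository: AgendaRepository) {
        this.repository = repository;
    }

    // Returns what would be handed to the view.
    index(): { viewName: string; model: AgendaItem[] } {
        return { viewName: "", model: this.repository.getAll() };
    }
}

// arrange: create the fake and pass it into the class under test
const fakeRepository: AgendaRepository = {
    getAll: () => [
        { id: 1, description: "Dr.X" },
        { id: 2, description: "Dr.No" }
    ]
};
const controllerUnderTest = new AgendaListController(fakeRepository);

// act
const indexResult = controllerUnderTest.index();

// assert
const hasBothAgendas =
    indexResult.model.some(a => a.id === 1) && indexResult.model.some(a => a.id === 2);
```

The controller never knows whether it talks to a database or to the fake, which is exactly what makes the test fast and repeatable.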

In the “act” phase we call the Index method. I also create some local variables to store the results that are to be tested. As we said when describing the controller, we use 2 ways to pass data into the view, and that is the reason that we have these 2 lines here. The “resources” variable retrieves the controller data via the ViewBag, the “agendas” variable retrieves its data via the Model. Notice that the Model needs to be cast to use it.

The assertions use the obtained controller data to make sure that it is correct. These are normal Assert statements, nothing fancy going on.

In the last assertion I test that the ViewName == “”. This is the case when you don’t specify a ViewName in the Controller. If you return different Views from the Controller depending on some arguments, then this is the way to test if the correct View is returned.

Testing the View (?)

There is no straightforward way to test your Views in the MVC Framework, which is a good indication that this is not a good idea. There are some libraries and Open Source projects to test Views but is it worth it?

If all is well your View doesn’t contain any business logic. This should be in the Model classes, and possibly also in the Controllers of your project. So the view only contains representation logic. It will bind values to HTML controls, possibly loop over a collection and that’s it.

Also Views may change a lot. This is what a user sees from the application, so users will probably want to change lay-out, order of fields, colors, … You don’t want to adapt your tests each time a lay-out change has occurred.

So in my opinion it usually is a smell if you need to test your Views. And it is better to deal with the smell than to invest time in testing your Views.


The MVC framework has been created to be testable from the beginning. This is clearly demonstrated by the nice Separation of Concerns (MVC) in the framework. But as always it is still possible to mess things up 😉

The advice I try to give in this article is to make small Model classes that are easy to test. Then put them together in the Controller, with the necessary conversions. The conversions are preferably in separate (testable) methods.

The Controller methods can then be tested in the usual way. There are some small caveats in the setup of the test project.

Did I forget something? Would you do things differently? Say it in the Comments section!


ASP.NET home

Introduction to ASP.NET MVC

Creating Unit Tests for ASP.NET MVC Applications








Posted in .Net, Codeproject, Development, MVC, Testing

What is the role of the database?


Most non-trivial applications will use a database to store data. This database can be relational or not, but some data store will be needed.

Most modern database management systems (DBMS) can do much more than just storing data, and as we know: with great power comes great responsibility.

What should a DBMS do?

A DBMS has 3 main responsibilities:

  • Store correct data, preferably in a durable way (but not always, check out in-memory OLTP). So simple data validation is part of this responsibility.
  • Retrieve data correctly and fast.
  • Security. Make sure that data can only be modified or retrieved by the right people. This is outside the scope of this post.

A DBMS has many possibilities to fulfill the first 2 responsibilities. In this post I will discuss mostly a general DBMS, using SQL Server as an example. So Oracle fans, please don’t shoot me.

Storing data

Data is stored in tables. That could conclude this chapter, but let’s see what happens when we insert / update / delete a record in SQL Server. I simplified things, and I probably forgot some actions left and right. Feel free to let me know in the comments section!

Let’s use this table as an initial example:


Inserting a record

When a record is inserted in the database, multiple validations are performed:

  • Check if all the mandatory fields are filled. In the example these are all the fields where “Allow Nulls” is not checked.
  • Check if all the fields are in a correct format. It will not be possible to store ‘ABC’ in the MarkID field, because its data type is int.
  • Check the conditions between fields. Maybe you’re storing a period with a from- and a to-date. Probably you’ll want the from-date to be before the to-date. This can be enforced using check constraints.
  • Check unique constraints. A primary key has to be unique, and this must be validated. In the example the ProductID is the primary key (as you can see by the key symbol next to it). Other unique constraints need to be validated as well. You can for example make sure that the combination of first name and last name is unique by creating a unique index over these 2 fields.
  • Check referential integrity. We saw in last week’s post how to normalize a database, which will create related tables in the database. The relationships (foreign keys) must be checked as well. Creating these relationships will enforce this, and tools (like Entity Framework designer) can use this information to automatically create relations in the model.
  • Set fields with default values. Typically these are fields like:
    • ID, which can be an identity field in SQL Server, or come from a sequence in Oracle (and SQL Server too in the latest versions).
    • A GUID (unique identifier) that can be generated as a default value.
    • A date field (example: CreationDate) which can be set to the current date (or timestamp)
    • Specific defaults that have been created for this table.

When all this is done the row will be inserted. This means that the log file is updated, and that the row is added in the right memory page. When a checkpoint occurs the page will be written to disk.

After the row is written the insert triggers are fired (if any). Insert triggers can also perform validations, and roll back the transaction. This is AFTER the row has been written, so it is not the most efficient way of validating data. The trigger execution is part of the same transaction, so if the record is invalid we can roll back in the trigger. This means that the record has been written in the log file already, and this action is now undone.

Usually the need for triggers indicates some data denormalization. This can be done for performance reasons, or to perform more exotic validations.

Updating a record

Most of the actions for an insert will be performed for an update as well. Default values will not be set; this only occurs when a record is inserted. Update triggers can be used to set fields like [LastUpdated] for auditing purposes. Referential integrity is verified, and can be handled in several ways.


In the update rule we see 4 possible actions:

  • No Action – When the field is updated and referential integrity is broken an exception will be thrown and the update will not be performed.
  • Cascade – The value of this field will set the value of the foreign key in all the detail records as well.
  • Set Null – All the detail records’ foreign key will be set to null.
  • Set Default – You only get one guess ;-)
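
To make the 4 rules tangible, here is a small in-memory simulation of what happens to the detail rows when the key of a master row changes. This only mimics the behavior described above; it is not how the DBMS implements it, and the names are invented.

```typescript
type UpdateRule = "NoAction" | "Cascade" | "SetNull" | "SetDefault";

interface DetailRow {
    detailId: number;
    headerId: number | null; // foreign key to the master table
}

// Returns the detail rows as they look after the master key changed from oldKey to newKey.
function applyKeyUpdate(
    details: DetailRow[],
    oldKey: number,
    newKey: number,
    rule: UpdateRule,
    defaultKey: number
): DetailRow[] {
    const referenced = details.some(row => row.headerId === oldKey);
    if (rule === "NoAction" && referenced) {
        // Referential integrity would be broken, so the update is refused.
        throw new Error("update rejected: detail rows still reference key " + oldKey);
    }
    return details.map(row => {
        if (row.headerId !== oldKey) {
            return row; // not related to the updated master row
        }
        switch (rule) {
            case "Cascade":
                return { detailId: row.detailId, headerId: newKey };
            case "SetNull":
                return { detailId: row.detailId, headerId: null };
            case "SetDefault":
                return { detailId: row.detailId, headerId: defaultKey };
            default:
                return row;
        }
    });
}

const detailRows: DetailRow[] = [
    { detailId: 1, headerId: 10 },
    { detailId: 2, headerId: 11 }
];
const cascaded = applyKeyUpdate(detailRows, 10, 20, "Cascade", 0);   // row 1 now points to 20
const nulled = applyKeyUpdate(detailRows, 10, 20, "SetNull", 0);     // row 1 now points to null
const defaulted = applyKeyUpdate(detailRows, 10, 20, "SetDefault", 0); // row 1 now points to 0

let noActionRejected = false;
try {
    applyKeyUpdate(detailRows, 10, 20, "NoAction", 0);
} catch (error) {
    noActionRejected = true; // the update is refused
}
```

In all cases the detail row that points to key 11 is left alone; only rows related to the updated master row are touched.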


After the update the update triggers are executed, with the same undo logic.

Deleting a record

The data page for the row is retrieved and the row is marked as deleted. Referential integrity is checked, and can be handled in the same ways as when updating a record. When the row is deleted the delete triggers are executed.

Retrieving data

It is important to retrieve data as fast as possible. Users don’t want to wait for minutes for a web request to return. And often, when a request is slow it is due to the database. Of course there can be many other causes, but the database is usually the first thing to look at. Retrieving the data is the simplest operation, but it is very critical because when inserting / updating / deleting records, the DBMS must also retrieve the correct record(s) to work with.

Indexes are very important at this stage, because without indexes the only mechanism to retrieve data is a full table scan. This will soon become a problem for performance. Indexes are outside the scope of this article.

In a well-normalized database queries can quickly become complex. Most development tools (like Visual Studio, SQL Server Management Studio, …) have query builders that take away the heavy typing. These tools work best when you have also created all the relationships between your tables.


SQL Server has a nice designer to maintain your relationships, the Database Diagram Designer. It allows you to create relationships in a graphical way.


When you create relationships you do yourself some favors:

  • The database is automatically documented. When you see the database diagram, you understand immediately how tables are related. In our little example we see that a SalesOrderHeader can have many SalesOrderDetails, and must be linked to one customer.
  • The DBMS can now enforce relational integrity. A SalesOrderDetail must be linked to a SalesOrderHeader.

What should the application do?

Data validation (again)

Now that the database is set up the application can focus on using the data. We are sure that most data can only be entered in a correct way in the database. This doesn’t mean that no verification must be done at the application level, but if we fail to verify, the DBMS will make sure that no invalid data can be entered. This also applies when data is modified using other tools than the application (for example by using SQL Server Management Studio or linking tables in Access or Excel and directly modifying the data).

Some good reasons to still perform data validation at the application level are:

  • Users want to have immediate feedback about the data they enter. It is a pity if a user enters all their data, only to find out at the last step (saving into the database) that some things are incorrect and they have to start again.
  • Referential integrity means that we store the (often meaningless) key to the related table. If you would for example use GUIDs for your primary keys, then users would be required to know, remember and type these GUIDs in the user interface. No user will do this. Combo boxes and other mechanisms are more user friendly. Many tools will generate these automatically if the relations are properly defined in the database.
  • We preferably enter the data correctly in the database. The DBMS will verify your data, but rejecting the data means that a lot of time is wasted. If you’re alone on the database this isn’t a problem, but on a loaded system with thousands of users this will impact the performance for everybody. It is better then to verify the data on the user side as much as possible.
  • Not all verifications can be done on the user side. Unique constraints are typically only checked at database level (and therefore require indexes to be created).
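The split between the two levels could look roughly like the sketch below: cheap checks run in the application for immediate feedback, while uniqueness is left to the database. This is an illustration under assumptions, not a prescribed design. It uses SQLite via Python's sqlite3 module instead of SQL Server, and the Customer table, its Email column and the function names are all hypothetical.

```python
import sqlite3

def make_customer_db():
    """In-memory database where the DBMS guarantees unique e-mail addresses."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Customer (
            CustomerId INTEGER PRIMARY KEY,
            Name       TEXT NOT NULL,
            Email      TEXT NOT NULL UNIQUE
        )
    """)
    return conn

def validate_customer(name, email):
    """Checks the application can do without a round trip to the database."""
    errors = []
    if not name.strip():
        errors.append("Name is required.")
    if "@" not in email:
        errors.append("Email address looks invalid.")
    return errors

def save_customer(conn, name, email):
    """Return a list of error messages; an empty list means the row was saved."""
    errors = validate_customer(name, email)
    if errors:
        return errors  # immediate feedback, the database is never bothered
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO Customer (Name, Email) VALUES (?, ?)", (name, email)
            )
    except sqlite3.IntegrityError:
        # In this schema only the UNIQUE constraint can still be violated here.
        return ["A customer with this email address already exists."]
    return []
```

Only the database can reliably answer the uniqueness question, because another user may insert the same address between any check and the insert.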


Entities must be stored in the database. Depending on how your application is set up entities can correspond to single tables, or be stored in multiple tables. This can be done by simple insert statements (or by your O/RM, which will do the same in the end).
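A minimal sketch of an entity that spans two tables, written as plain insert statements inside one transaction. It assumes SQLite via Python's sqlite3 module as a stand-in for SQL Server. The Order and OrderLine classes, the simplified columns and the save_order function are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class OrderLine:
    product: str
    quantity: int

@dataclass
class Order:
    customer: str
    lines: list = field(default_factory=list)

def create_order_schema(conn):
    conn.executescript("""
        CREATE TABLE SalesOrderHeader (
            SalesOrderId INTEGER PRIMARY KEY,
            Customer     TEXT NOT NULL
        );
        CREATE TABLE SalesOrderDetail (
            SalesOrderDetailId INTEGER PRIMARY KEY,
            SalesOrderId       INTEGER NOT NULL,
            Product            TEXT NOT NULL,
            Quantity           INTEGER NOT NULL CHECK (Quantity > 0)
        );
    """)

def save_order(conn, order):
    """Store one Order entity as a header row plus one detail row per line."""
    with conn:  # one transaction: either everything is stored or nothing
        cursor = conn.execute(
            "INSERT INTO SalesOrderHeader (Customer) VALUES (?)",
            (order.customer,),
        )
        order_id = cursor.lastrowid
        conn.executemany(
            "INSERT INTO SalesOrderDetail (SalesOrderId, Product, Quantity) "
            "VALUES (?, ?, ?)",
            [(order_id, line.product, line.quantity) for line in order.lines],
        )
    return order_id
```

If one of the lines violates the CHECK constraint, the exception rolls back the header insert as well, so no half-stored entity is left behind.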

How about stored procedures for CRUD?

This used to be a no-brainer. 10 years ago we would create a procedure for each insert / update / delete, and one or more procedures to read data. Advantages of this approach are:

  • Consistency. Every modification in the database is done via a procedure.
  • Data validation. More advanced evaluations can be done in the procedure, before issuing the actual DML statement.
  • Automatic auditing / logging. In the same stored procedure we can insert / update / delete records, and then write some entries in a log table. This is typically done in the same transaction.
  • Security. For the 2 previous reasons we may want to block direct DML access to the tables and only allow modifications (and possibly reads) through stored procedures.

As always there are also disadvantages:

  • Many procedures. If we need at least 4 procedures per table, the number of procedures will grow fast. In a moderately big database this means hundreds of stored procedures to be written and maintained. For this reason there are tools available that will generate the CRUD procedures for a table. This doesn’t solve the problem of the many procedures, but it will at least reduce typing!
  • Code organization. Some code will be repeated in many procedures. So either we copy / paste or we create a new stored procedure for this common code. In SQL Server it is not possible to create “internal / private” procedures that can only be called from other procedures and “public” procedures that can be called from the outside. So everything will rely on good naming conventions and discipline.
  • Tools. Most tools and frameworks (ADO.NET, EF) are capable of calling stored procedures for update statements. But in the procedures a lot can be going on, and the tool doesn’t know about all the possible side effects. Cached data can be invalid after calling a stored procedure.

For all these reasons, nowadays we usually choose to have the O/RM generate its own DML (data manipulation language) statements and don’t generate all the stored procedures, unless there is a good reason for it. Usually we create repositories that will be called whenever an application needs data or wants to modify data. In these repositories we can add some more validation if needed. Of course this doesn’t prevent other applications from entering data directly in the database!
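A repository of this kind might look like the sketch below, which also keeps the auditing advantage by writing a log row in the same transaction as the update. It is a sketch under assumptions, not the implementation used in any particular project. It issues SQL directly through Python's sqlite3 module rather than through an O/RM, and CustomerRepository, the AuditLog table and all method names are hypothetical.

```python
import sqlite3

def create_customer_schema(conn):
    conn.executescript("""
        CREATE TABLE Customer (
            CustomerId INTEGER PRIMARY KEY,
            Name       TEXT NOT NULL
        );
        CREATE TABLE AuditLog (
            AuditLogId INTEGER PRIMARY KEY,
            Action     TEXT NOT NULL,
            CustomerId INTEGER NOT NULL
        );
    """)

class CustomerRepository:
    """Single entry point the application uses to read and modify customers."""

    def __init__(self, conn):
        self._conn = conn

    def get_name(self, customer_id):
        row = self._conn.execute(
            "SELECT Name FROM Customer WHERE CustomerId = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None

    def rename(self, customer_id, new_name):
        # Extra validation lives here, before any statement is sent.
        if not new_name.strip():
            raise ValueError("Name must not be empty.")
        with self._conn:  # update and audit entry share one transaction
            cursor = self._conn.execute(
                "UPDATE Customer SET Name = ? WHERE CustomerId = ?",
                (new_name, customer_id),
            )
            if cursor.rowcount == 0:
                raise LookupError("Customer not found.")
            self._conn.execute(
                "INSERT INTO AuditLog (Action, CustomerId) VALUES ('rename', ?)",
                (customer_id,),
            )
```

Because every modification goes through the repository, consistency and logging are handled in one place without a stored procedure per table. Access that bypasses the repository is still only guarded by the constraints in the database.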

Business logic

In many applications business logic is implemented in stored procedures. This is sometimes referred to as a 2 1/2 tier application. This approach makes it possible to change application logic without recompiling / redeploying the application. Sometimes it can be faster because stored procedures are executed “close to the data”. But (in no specific order):

  • Code organization is a problem again, for exactly the same reasons I have already given before.
  • SQL is not the best language for programming business logic. Even though the most basic language constructs (loops, selection, …) are available, it usually is easier (and more maintainable) to write the code in an OO language (or, for fans of functional languages, in F# or Haskell). Compiler support will be much better.
  • Your application code depends closely on the DBMS. If your software becomes popular and must run on different DBMSs you’re out of luck!
  • It is hard to debug SQL stored procedures. Since the latest versions of Visual Studio it is possible to debug stored procedures, but it still complicates things and it is not as powerful as the C# debugger.
  • You need a good SQL developer who knows what he is doing on the database. Don’t forget that the database layer is used by many users. So if a procedure is badly written it may impact the performance of the DBMS, and hence the performance of every application that depends on it.
  • By default database objects (such as procedures) are not versioned. So if someone modifies a stored procedure and now your application doesn’t work anymore you’ll have a good time finding what has been changed. Some would describe this as “job security”.
  • It doesn’t scale! I kept the best for last. When you discover that stored procedures are the bottleneck of your application, you can’t just put another server on the side to add some more power.

So even though it is possible to use stored procedures to implement business logic, there are some good reasons not to go that way!

So when are stored procedures OK then?

Simple: when you implement functionality that has to do with your database, and which doesn’t implement business logic. For example just copying client data may qualify.

Having said that this is simple, I can tell you that if you put a couple of DBAs together and throw this question “in the group”, you can expect long and heated discussions.


The database is a very important part of your application. So it is a good idea to not make it your bottleneck. Try to use the DBMS for guaranteeing that data is stored as correctly as possible and retrieved efficiently. Stored procedures (and user functions for that matter) are very useful, but must not be used to implement business logic. With modern O/RM frameworks the use of stored procedures for CRUD operations is not encouraged anymore.
