Why would you use Common Table Expressions?

Introduction

In this article I assume that you already have a good understanding of SQL. I will introduce some concepts very briefly before moving on to Common Table Expressions.

Below you can find the relevant database diagram of the database that I will use in this article:

[Image: database diagram]

How is SQL processed by SQL Server?

When we look at a basic SQL statement, the general structure looks like this:

SELECT <field list>
FROM <table list>
WHERE <row predicates>
GROUP BY <group by list>
HAVING <aggregate predicates>
ORDER BY <field list>

As a mental picture we see the order of execution as:

First determine where the data will come from. This is indicated in the <table list>. This list can contain zero or more tables. When there are many tables, they can be joined using inner or outer join operators, and possibly also cross join operators. At this stage we consider the Cartesian product of all the rows in all the tables.

select count(*) from [HR].[Employees]                    -- 9
select count(*) from [Sales].[Orders]                    -- 831
select count(*) from [HR].[Employees], [Sales].[Orders]  -- 7479
select 9 * 831                                           -- 7479

In the third query we combine the tables, without a join operator. The result will be all the combinations of employees with orders, which explains the 7479 rows. This can escalate quickly.

As a side remark: this is valid SQL, but when I encounter this in a code review it will make me suspicious. One way to make clear that you want all these combinations is the CROSS JOIN operator:

select count(*) from [HR].[Employees] cross join [Sales].[Orders]    -- 7479

This will be handled exactly the same as query 3, but now I know that this is on purpose.


Once we know which data we are talking about, we can then filter using the <row predicates> in the where clause. This makes sure that the number of rows is limited early in the process. In most join operators there is a condition (inner join T1 on <join condition>) which would be applied here, again limiting the number of rows.

select count(*) 
from [HR].[Employees] E 
inner join [Sales].[Orders]    O on E.empid = O.empid    -- 831

The predicate E.empid = O.empid will make sure that only the relevant combinations are returned.

If there is a group by clause, that happens next, followed by the filtering on aggregated values.

Then finally SQL looks at the <field list> to determine which fields / expressions / aggregates to make available, and then the order by clause is applied.

Of course this is all just a mental picture

Imagine a join between 3 tables, each containing 1000 rows. The resulting virtual table would contain 1,000,000,000 rows, from which SQL Server would then have to select the right ones. Through the use of indexes SQL Server will only obtain the relevant row combinations. Each DBMS (Database Management System) contains a query optimizer that will intelligently use indexes to obtain the rows in the <table list>, combined with the <row predicates> from the where condition, and so on. So, if the right indexes are created in the database, only the necessary data pages will be retrieved.

Inner queries

The table list can also contain the result of another SQL statement. The following is a useless example of this:

select count(*) 
from (select * from [HR].[Employees]) E

This example will first create a virtual table named E as the result of the inner query, and use this table to select from. We can now use E as a normal table, that can be joined with other tables (or inner queries).

Tip: It is mandatory to give the inner select statement an alias, otherwise it will be impossible to work with it. Even if this is the only data source that you use, an alias is still needed.

As an example I want to know the details of the 3 orders that gave me the highest revenue. To start with, I first find those 3 orders:

select top 3 [orderid], [unitprice] * [qty] as LineTotal
from [Sales].[OrderDetails]
order by LineTotal desc

This gives us the 3 biggest orders:

orderid LineTotal
10865 15810,00
10981 15810,00
10353 10540,00

Now I can use these results in a query like

select *
from [Sales].[OrderDetails]
where orderid in (10865, 10981, 10353)

which will give the order details for these 3 orders, at this point in time. I can use the result of the previous query in the where condition to make the query work at any point in time:

select *
from [Sales].[OrderDetails]
where orderid in 
(
    select top 3 [orderid]
    from [Sales].[OrderDetails]
    order by [unitprice] * [qty] desc 
)

This query will give me the correct results. I just had to adapt some things from the initial query: the IN clause expects a list of single values, so the subquery can only return one column (the [orderid]), and the order by clause then needs to use the full expression. Don't worry, no more calculations than needed will be done. Trust the optimizer!

To further evolve this query we can now use an inner join instead of WHERE … IN. The resulting execution plan will be the same again, and the results too.

select *
from [Sales].[OrderDetails] SOD
inner join (select top 3 [orderid]
    from [Sales].[OrderDetails]
    order by [unitprice] * [qty] desc) SO 
on SO.orderid = SOD.orderid

Common Table Expressions

With all this we have gently worked toward CTEs. A first use would be to separate the inner query from the outer query, making the SQL statement more readable. Let’s first start with another senseless example to make the idea of CTEs more clear:

;with cte as
(
    select top 3 [orderid]
    from [Sales].[OrderDetails]
    order by [unitprice] * [qty] desc
)
select * from cte

What this does is to create a (virtual) table called cte, that can then be used in the following query as a normal data source.

Tip: the semicolon at the front of the statement is not needed if you just execute this statement. If the “with” statement follows another SQL statement then both must be separated by a semicolon. Putting the semicolon in front of the CTE makes sure you never have to search for this problem.

The CTE is NOT a temporary table that you can use. It is part of the statement that it belongs to, and it is local to that statement. So later in the script you can't refer to the CTE again. Given that the CTE is part of this statement, the optimizer will use the whole statement to make an efficient execution plan. SQL is a declarative language: you define WHAT you want, and the optimizer decides HOW to do this. The CTE will not necessarily be executed first; that depends on the query plan.

Let’s make this example more useful:

;with cte as
(
    select top 3 [orderid]
    from [Sales].[OrderDetails]
    order by [unitprice] * [qty] desc
)
select *
from [Sales].[OrderDetails] SOD 
inner join cte on SOD.orderid = cte.orderid

Now we have split the query into 2 parts for us humans: we first calculate the 3 best orders, and then we use the result of that to select their order details. This way we can show the intent of our query.

In this case we use the CTE only once, but if you used it multiple times in this query it would become even more useful.

Hierarchical queries

[Image: the HR.Employees table]

In this table we see a field empid, and a field mgrid. (Almost) every employee has a manager, who can have a manager, … So clearly we have a recursive structure.

This kind of structure often occurs with:

  • compositions
  • categories with an unlimited level of subcategories
  • folder structures
  • etc.

So let’s see how things are organized:

select [empid], [firstname], [title], [mgrid]
from [HR].[Employees]

This gives us the following 9 rows:

[Image: query results]

We can see here that Don Funk has Sara Davis as a manager.

If we want to make this more apparent, we can join the Employees table with itself to obtain the manager info (self-join):

select E.[empid], E.[lastname], E.[firstname], 
       E.[title], E.[mgrid],
       M.[empid], M.[lastname], M.[firstname]
from [HR].[Employees] E
left join [HR].[Employees] M on E.mgrid = M.empid

Notice that a LEFT join operator is needed because otherwise the CEO (who doesn’t have a manager) would be excluded.

[Image: query results]

We could continue this with another level until the end of the hierarchy. But if a new level is added, or a level is removed, this query wouldn’t be correct anymore. So let’s use a hierarchical CTE:

;with cte_Emp as
(
select [empid], [lastname], [firstname], [title], 
       [mgrid], 0 as [level]
from [HR].[Employees]
where [mgrid] is null

union all

select E.[empid], E.[lastname], E.[firstname], E.[title], 
       E.[mgrid], [level] + 1 
from [HR].[Employees] E 
inner join cte_Emp M on E.mgrid = M.empid
)
select *
from cte_Emp

I’ll first give the result before explaining what is going on:

[Image: query results]

As explained before we start with a semicolon, to avoid frustrations later.

We then obtain the highest level of the hierarchy

select [empid], [lastname], [firstname], [title], 
       [mgrid], 0 as [level]
from [HR].[Employees]
where [mgrid] is null

This is our starting point for the recursion. Using UNION ALL we now obtain all the employees that have Sara as a manager. This is added to our result set, and then for each row that is added, we do the same, effectively implementing the recursion.

To make this more visual I added the [level] field, so you can see how things are executed. Row 1 has level 0, because it comes from the anchor part of the query (0 as [level]). Then for each pass through the recursive part, the level is incremented. This shows nicely how the query is executed.

Conclusion

Common Table Expressions are one of the more advanced query mechanisms in T-SQL. They can make your queries more readable, or perform queries that would otherwise be impossible, such as outputting a hierarchical list. In this case the real power is that a CTE can reference itself, making it possible to handle recursive structures.

Reference

https://docs.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql


Creating a Visio Add-In in VS2017

Problem statement

A good friend asked me the following question:

How can I in Visio change the color to a previously selected color just by selecting a shape?

Sounds simple enough, but there are some caveats, so here is my attempt to tackle this problem. The main caveats were:

  • hooking up the SelectionChanged event,
  • keeping and accessing the state in the Ribbon (for the default color),
  • setting the color of the selected shape.

The code for this project can be found at https://github.com/GVerelst/ActOnShapeSelection.

Visio Add-In – Preparation

I decided to use Visual Studio 2017 to create a Visio Add-In. To do this we need to install the "Office/SharePoint development" workload. Since Visual Studio 2017, the installer allows for modular installation, so we just need to add this workload to our installation.

Start Visual Studio Installer (Start -> type “Visual Studio Installer”). In the installer window select “More > Modify”:

[Image: Visual Studio Installer]

After a little while this takes us to the workloads selection screen. Select Office/SharePoint development and then click “Modify”.


When you launch Visual Studio again you'll find a bunch of new project templates.

Creating the Visio Add-in

In VS create a new project (File > New > Project…) like this:

[Image: New Project dialog]

As you can see there are new project templates for "Office/SharePoint". I chose the Visio Add-in project and gave it an appropriate name: "ActOnShapeSelection".

The result is a project with one C# file (ThisAddIn.cs). This is where we will initialize our events. As a test we show a message box when starting our add-in, and another one when closing it:

public partial class ThisAddIn
{
    private void ThisAddIn_Startup(object sender, System.EventArgs e)
    {
        MessageBox.Show("Startup ActOnShapeSelection");
    }

    private void ThisAddIn_Shutdown(object sender, System.EventArgs e)
    {
        MessageBox.Show("Shutdown ActOnShapeSelection");
    }

    // …
}

Remark: by default the namespace System.Windows.Forms is not included in the using list, so we will need to add it. An easy way to do this is by clicking on the red-underlined MessageBox and typing ctrl+; (control + semicolon that is). Now we can choose how to solve the “using problem”.
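
For reference, these are the using lines involved (the Visio alias should already be generated by the add-in template; System.Windows.Forms is the one we add ourselves):

using System.Windows.Forms;                      // for MessageBox
using Visio = Microsoft.Office.Interop.Visio;    // Visio COM interop types (Visio.Window, Visio.Shape, ...)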

Starting the application (F5) will now start Visio and our first message box is indeed shown. Closing Visio shows the second message box. No rocket science so far, but this proves that our add-in is loaded properly into Visio.

Convincing Visio to do something when a shape is selected is just a bit harder.

Wiring the SelectionChanged event

public partial class ThisAddIn
{
    private void ThisAddIn_Startup(object sender, System.EventArgs e)
    {
        Application.SelectionChanged += Application_SelectionChanged;
    }

    private void Application_SelectionChanged(Visio.Window Window)
    {
        MessageBox.Show("SelectionChanged ActOnShapeSelection");
    }
// …
}

The SelectionChanged event must be wired in the startup event. Later we will do the same for the ShapeAdded event. Once you know this “trick” things become easy.

When running this code, each time we select something in Visio we see our fancy message box. So the event wiring works. Now we want to be able to only execute code when a shape is selected. Let’s investigate if we can find out something about the selected object(s) in the “Window” parameter:

[Image: debugger view of the Window parameter]

As expected this is a dynamic object. Visio is accessed over COM. Luckily the debugger allows us to expand the dynamic view members. Looking through the members of this object we find a property "Selection". This looks promising! Looking a bit further, "Selection" is an implementation of the interface IVSelection, and this interface inherits from IEnumerable.

So Selection is actually an enumerable collection of all the selected items, hence it can be handled using a standard foreach( ). Let’s try this:

private void Application_SelectionChanged(Visio.Window Window)
{
    //MessageBox.Show("SelectionChanged ActOnShapeSelection");
    Visio.Selection selection = Window.Selection;
    foreach (dynamic item in selection)
    {
        Visio.Shape shp = item as Visio.Shape;
        if (shp != null)
        {
            shp.Characters.Text = "selected";
        }
    }
}

We run the add-in again (F5) and add 2 shapes on the page. When we select the shapes, they get the text “selected”. So now we are capable of knowing which shapes are selected and doing something useful with them. Let’s add a new ribbon with actions to perform on our shapes. After all, that is the purpose of this exercise.

Adding a ribbon

This can easily be done by right-clicking on the project > Add > New Item, and then selecting Office/SharePoint > Ribbon (Visual Designer).


Name this ribbon “ActionsRibbon.”

Opening the ribbon, we can now use the visual designer. Make sure that the toolbox window is visible (View > Toolbox).

Now we add 3 ToggleButtons on the design surface, named btnRed, btnGreen, and you guessed it: btnBlue.  For each of the buttons we add a Click event by double-clicking on the button. Using the GroupView Tasks we also add a DialogBoxLauncher. This will open a ColorDialog for selecting a custom color.


Double-click the group to implement the “DialogLauncherClick” event, which will handle this.

The ActionsRibbon will contain its own data, being the 3 components for a color (Red, Green, Blue):

public byte Red { get; private set; }
public byte Green { get; private set; }
public byte Blue { get; private set; }

Each of the toggle buttons will toggle its own color component:

private void btnRed_Click(object sender, RibbonControlEventArgs e)
{
    Red = (byte)(255 - Red);
}

private void btnGreen_Click(object sender, RibbonControlEventArgs e)
{
    Green = (byte)(255 - Green);
}

private void btnBlue_Click(object sender, RibbonControlEventArgs e)
{
    Blue = (byte)(255 - Blue);
}

Remark: This code works fine as long as there is no possibility for a custom color. When selecting a custom color the values can become something other than 0 or 255 and no longer correspond to the UI. I leave it as an exercise to the reader to make a better implementation.
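
One possible direction for that exercise (just a sketch, assuming the designer-generated toggle buttons expose their Checked state) is to derive each component from the toggle state instead of flipping it:

private void btnRed_Click(object sender, RibbonControlEventArgs e)
{
    // 255 when the toggle is pressed, 0 when it is released; same idea for green and blue.
    Red = btnRed.Checked ? (byte)255 : (byte)0;
}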

Choosing a custom color:

private void group1_DialogLauncherClick(object sender, RibbonControlEventArgs e)
{
    ColorDialog dlg = new ColorDialog { Color = Color.FromArgb(Red, Green, Blue) };

    if (dlg.ShowDialog() == DialogResult.OK)
    {
        Red = dlg.Color.R;
        Green = dlg.Color.G;
        Blue = dlg.Color.B;
    }
}

In the SelectionChanged event of our AddIn class we now need to refer to the RGB values from the Ribbon. Here is the full code for the event handler:

private void Application_SelectionChanged(Visio.Window Window)
{
    ActionsRibbon rib = Globals.Ribbons.ActionsRibbon;
    Visio.Selection selection = Window.Selection;
    foreach (dynamic item in selection)
    {
        Visio.Shape shp = item as Visio.Shape;
        if (shp != null)
        {
            shp.Characters.Text = "selected";
            shp.CellsSRC[(short)Visio.VisSectionIndices.visSectionObject,3, 0].FormulaU = $"THEMEGUARD(RGB({rib.Red}, {rib.Green}, {rib.Blue}))";
        }
    }
}

There is some Visio-fu to set the color. Consider this as a cookbook recipe. When you do this, you’ll get the desired result. Visio is not always straightforward, one could say.

Now we have a working Visio add-in, that does what was asked. BUT when we add a new shape, it will automatically receive the selected color. To solve this we add another event handler:

Application.ShapeAdded += Application_ShapeAdded;
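
Both subscriptions live in the startup handler, which now looks roughly like this:

private void ThisAddIn_Startup(object sender, System.EventArgs e)
{
    Application.SelectionChanged += Application_SelectionChanged;
    Application.ShapeAdded += Application_ShapeAdded;
}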

We also add a boolean indicating if we are adding a shape or not.

bool _isAddingAShape = false;

In the ShapeAdded event we set its value to true:

private void Application_ShapeAdded(Visio.Shape Shape)
{
    _isAddingAShape = true;
}

And we modify the SelectionChanged event to do nothing when a shape was added. This event will be called when a shape is selected, AND when a shape is added (which indeed selects it). The code:

private void Application_SelectionChanged(Visio.Window Window)
{
    if (! _isAddingAShape)
    {
        ActionsRibbon rib = Globals.Ribbons.ActionsRibbon;
        Visio.Selection selection = Window.Selection;
        foreach (dynamic item in selection)
        {
            Visio.Shape shp = item as Visio.Shape;
            if (shp != null)
            {
                shp.Characters.Text = "selected";
                shp.CellsSRC[(short)Visio.VisSectionIndices.visSectionObject, 3, 0].FormulaU = $"THEMEGUARD(RGB({rib.Red}, {rib.Green}, {rib.Blue}))";
            }
        }
    }
    _isAddingAShape = false;
}

Conclusion

Using Visual Studio it took us about 100 lines of code to implement the desired behavior. It would not be hard to add more actions to the ribbon. But in that case it would be wise to move the code that performs the action into the ActionsRibbon class instead of using the values of this class in the AddIn class.
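
As a sketch of that refactoring (the method name is hypothetical), the ActionsRibbon could expose something like this, so the add-in only has to call rib.ApplyColorTo(shp):

// In the ActionsRibbon class: apply the currently selected color to a shape.
public void ApplyColorTo(Visio.Shape shape)
{
    shape.CellsSRC[(short)Visio.VisSectionIndices.visSectionObject, 3, 0].FormulaU =
        $"THEMEGUARD(RGB({Red}, {Green}, {Blue}))";
}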

The hardest part was to find how to obtain the newly created Ribbon from within the AddIn class. All the rest is just a straightforward implementation.


Architecture of a Polyglot Azure Application

Introduction

I started working on a C# project that will communicate requests to several different partners, and receive feedback from them. Each partner will receive requests in their own way. This means that sending requests can (currently) be done by

  • calling into a REST service,
  • preparing a file and putting it on an FTP share,
  • sending a mail (SMTP).

Needless to say, the formats of these requests are never the same, which complicates things even more.

Receiving feedback can also be done in different ways:

  • receiving a file through FTP. These files can be CSV files, JSON files, XML files, each in their own format,
  • polling by calling a web service on a schedule.

So we need an open architecture that is able to send a request, and store the feedback received for this request. This feedback consists of state changes for a request. I noticed that this is a stand-alone application that can easily be moved into the cloud. We use Microsoft Azure.

Here is a diagram for the current specifications:

[Image: current specifications]

First observations

When I analyzed this problem, I immediately noticed some things that could make our lives easier. And when I can make things simpler, I'm happy!

The current flow

Currently everything is handled in the same application, which is just a plain simple C# solution. In this solution a couple of the protocols are implemented. This is OK because currently there are only 2 partners. But this will be extended to 20 partners by the end of the year.

There are adapters that transform the request into the right format for the corresponding partner, and then send it through a REST service. So we already have a common format to begin with. If the "PlaceOrder" block can receive this common format we know at least what comes in. And we know what we can store in the "Feedback Store" as well; this will be a subset of the "PlaceOrder" request.

PlaceOrder then will have to switch per partner to know in which data format to transform the request, and send it to that partner.

On the feedback side, we know that feedback comes in several formats, over several channel types. So in the feedback handler we need to normalize this data so that we can work with it in a uniform way. Also, some feedback will come as a file (SFTP) with several feedback records, or one record at a time (for example when polling). This needs to be handled as well.

So now we can think about some more building blocks. The green parts are new:

[Image: updated architecture diagram]

  • The "Initiator Service" will receive a request from the application (and in the future from multiple applications). All it will do is transform the request into a standard format and put it on the "Request Queue". Some common validations can be done here already. Creating a separate service allows future applications to use this functionality as well (a minimal sketch of the enqueue side follows this list).
  • We introduce the “Request Queue”, which will receive the standardized request.
  • And now we can create the “PlaceOrder queue handler” which will wake up when a request arrives on the queue, and then handles all the messages on the queue.
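
As promised above, here is a minimal sketch of what the enqueue side of the "Initiator Service" could look like. It assumes Azure Storage queues (which is also what makes testing with the Storage Explorer easy) and the classic WindowsAzure.Storage package; the queue name, connection string and request type are illustrative, not part of the actual design.

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;
using Newtonsoft.Json;

public class StandardRequest
{
    // The standardized request format shared by all callers (real fields omitted).
    public string OrderId { get; set; }
    public string PartnerCode { get; set; }
}

public class InitiatorService
{
    private readonly CloudQueue _requestQueue;

    public InitiatorService(string storageConnectionString)
    {
        var account = CloudStorageAccount.Parse(storageConnectionString);
        var client = account.CreateCloudQueueClient();
        _requestQueue = client.GetQueueReference("requests");   // illustrative queue name
        _requestQueue.CreateIfNotExists();
    }

    public void PlaceOrder(StandardRequest request)
    {
        // Validate / transform into the standard format, then enqueue as JSON.
        var message = new CloudQueueMessage(JsonConvert.SerializeObject(request));
        _requestQueue.AddMessage(message);
    }
}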

Advantages of adding queues

  • Separation. A nice (and simple) separation between the caller (Application -> “Initiator Service“) and the callee (the “PlaceOrder Queue Handler“).
  • Synchronization. In the queue handler we only need to bother about 1 request at a time. Everything is nicely synchronized for us.
  • Elasticity. When needed we can add more Queue Handlers. Azure can handle this automatically for us, depending on the current queue depth.
  • Big loads will never slow down the calling applications, because all they have to do is to put a message on the queue. So each side of the queue can work at its own pace.
  • Testing. Initiating the Queue Handler means putting a message on the queue. This can be done using tools such as the Storage Explorer. This makes testing a lot easier.
    • Testing the “Initiator Service“: call the service with the right parameters, and verify if the message on the Request Queue is correct.
    • Testing the “Queue Handler“: put in some way (ex: storage explorer) a request in the correct format on the queue and take it from there.
    • Both are effectively decoupled now.

We can do the same for the feedback handler. Each partner application can receive feedback in its own way, and then send the feedback records one by one to the Feedback Queue in a standard format. This takes away a lot of the complexity again. The feedback programs just need to interpret the feedback from their partner and put it in the right format on the Feedback Queue. The Feedback Queue Handler just needs to handle these messages one by one.

To retrieve the feedback status we’ll need a REST service to answer to all the queries. You’ll never guess the name of this service: “Feedback Service“. I left this out of scope for this post. In the end it is just a REST service that will talk to the data store via the “Repository Service.”

I also don’t want to access the database directly, so a repository service is created as well. Again, this is a very simple service to write.

But there is still a lot of complexity

[Image: architecture diagram]

The “Place Order Queue Handler” handles each request by formatting the message and sending it to the specific partner. Having this all in 1 application doesn’t seem wise because

  • This application will be quite complex and large
  • When a new partner needs to receive calls we need to update (and test, deploy) this application again.
  • This is actually what we do currently, so there would be little or no advantage in putting all this effort into it if we stopped here.

So it would be nice to find a way to extend the application by just adding some assemblies in a folder. The first idea was to use MEF for this. Using MEF we can dynamically load the modules and use them, effectively splitting out the complexity per module. Again, each module has only 1 responsibility (formatting & sending the request).

The same would apply (more or less) for the feedback part.

But thinking a bit further, I realized that this is actually nothing but a workflow application (comparable to BizTalk). And Azure provides us with Logic Apps, which are created to handle workflows. So let’s see how we can use this in our architecture…

[Image: architecture with Logic Apps]

I left out the calling applications from this diagram. A couple of things have been modified:

  • DLQ. For each queue I also added a Dead Letter Queue (DLQ). This is sometimes also called a poison queue. The “Initiator Service” puts a request on the queue to be handled. But if the Queue Handler has a problem (for example, the Partner web service sends back a non-recoverable error code), we can’t let the Initiator Service know that. So we’ll put those failed messages on the DLQ to be handled by another application. A possible handling could be to send an e-mail to a dedicated address to resolve the problem manually.
  • Logic App. The "Request Q Handler" now is a Logic App. This is a workflow that can be executed automatically by Azure when a trigger is fired. In our case the trigger is that one or more requests are waiting on the "Request Queue". In this post I won't go into detail about the contents of this Logic App, but this is the main logic:
    • Parse the content of the request message as JSON
    • Store the relevant parts of the message in the database with a “Received” status.
    • Execute the partner specific logic using Logic App building blocks, and custom made blocks.
    • Update the status of the request in the database to “Sent”
    • When something goes wrong put the request on the DLQ.
  • Configuration. The nice thing is that this all can be done using available building blocks in the Logic App, so no “real” programming is needed – only configuration. Adding a new partner requires just adding a new branch in the switch and implementing the partner logic.
  • The database is accessed by a REST service, and there are Logic actions that can communicate with a REST service. So accessing the database can be done in a very standard way.

The feedback part is also a bit simpler now

  • One Logic App will poll every hour for those partners who work like that. This means that this App will have a block per “polling partner” which will retrieve the states for the open requests, transform them into a standard format and put them in the Feedback Queue. So the trigger for this Logic App is just a schedule.
  • Some partners communicate their feedback by putting a file on an FTP location. This is the trigger and the handling is a bit different:
    • Interpret the file contents and transform them into JSON.
    • For each row in the JSON collection execute the same logic as before.
    • Delete the file.
    • Again, these are all existing blocks in a Logic App, so no “real” programming!

The “Feedback Q handler” is again simple. Because the FTP Logic Apps (Notice the plural!) make sure that the feedback records are stored one by one on the “Feedback Queue“, all we have to do is to update the status in the database, and possibly execute a callback web service.

Conclusion

Thanks to MS Azure I was able to easily split the application into several small blocks that are easy to implement and to test. In the end we reduced a programming problem to a configuration problem. Of course some programming remains to be done, for example the "Repository Service" and possibly some other services to cover more exotic cases.


Areas in ASP.NET Core

Introduction

In a default MVC application everything is organized by Controllers and Views. The controller name determines the first part of your URL, and the controller method the second part. By default the view that will be rendered has the same name as the method in the controller, although this isn’t required.

So when you create a HomeController in your application, with a method About( ) you have defined the URL Home/About for your application. Easy. For small web applications this is sufficient, but when things start to get bigger you may want to add another level in your URL.

Areas

This is done by creating separate areas in the application. You can create as many areas as you like, and you can consider each area as a separate part of your application, with its own controllers, views and models. So now you can make an "Admin" area for user management and other "admin stuff." The nice thing is that now your URL will have an additional level, such as Admin/Users/Create.

This allows organizing your project in a logical way, but don’t exaggerate. I have seen projects where areas only contain 1 controller. In that case the advantage of using an area is gone, and worse yet: you haven’t simplified your application, but added an extra layer for nothing. The KISS principle is still one of the most important principles in software engineering!

The problem

In the old ASP.NET MVC all you had to do was:

  1. Right-click on the project level, select “Add area”
  2. Enter the name of the area
  3. Everything is done for you: you get a nice solution folder for your area, routing is set up, …

Looking for this menu item in ASP.NET Core was disappointing: it is not there anymore. I couldn't imagine that areas would have disappeared, so I consulted my friend Google. This led me to this page in Microsoft Docs: https://docs.microsoft.com/en-us/aspnet/core/mvc/controllers/areas.

So how does it work in ASP.NET Core?

I want to create a new area in my application, called “Reports”. We already know that right-clicking doesn’t work anymore, so here are the steps.

Create a folder “Areas”

Right-click on the project > Add > Folder, enter "Areas".

MVC will by default search for /Areas/… which is why you need this folder. If you want to give it a different name you also need to configure your RazorViewEngineOptions. More info on https://docs.microsoft.com/en-us/aspnet/core/mvc/controllers/areas.

Now right-click the Areas folder and add a new folder called “Reports”. And under Reports, create the 3 folders “Controllers”, “Models” and “Views”.

Caveat

The views under your area don't share the _ViewImports.cshtml and _ViewStart.cshtml. This means that your site layout will not be automatically applied to your area's views. The solution is simple: copy both files under the corresponding Views folder.

The standard _ViewStart.cshtml looks like this:

@{
    Layout = "_Layout";
}

If you want to use the same layout in your areas you should change the copied file to

@{
    Layout = "~/Views/Shared/_Layout.cshtml";
}

Of course, if you want your area to have a different layout you don't have to do this. You can then create a "Shared" folder under the Views folder and create a new _Layout.cshtml there.

We’re ready to add some code now.

Create a HomeController in the Reports Area

Right-click on the newly created Controllers folder > Add > Controller. This takes you to the familiar "Add Scaffold" dialog box; we choose to add an empty controller.


Name the controller “HomeController” and let VS2017 do its scaffolding work for you. We now have a controller with the Index( ) method already implemented. This controller is created under the areas folder structure, but for ASP.NET Core this isn’t enough. We need to indicate which area it belongs to. This is easy:


I added the [Area("Reports")] attribute on the controller class, which does the trick. This means that areas and folder structure are now decoupled.

As you may notice I also changed the return type of Index( ) to string, and I return … a string 😉. This string will be returned literally to the browser when this page is requested. Of course we could have gone through the trouble of creating a view, but let's keep things simple in this demo.
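
For reference (the screenshot is not reproduced here), the controller ends up looking roughly like this; the namespace is illustrative and simply follows the Areas folder structure:

using Microsoft.AspNetCore.Mvc;

namespace MyWebApp.Areas.Reports.Controllers
{
    [Area("Reports")]   // this attribute ties the controller to the Reports area
    public class HomeController : Controller
    {
        // Returning a string instead of a view keeps the demo simple.
        public string Index()
        {
            return "Hello from the Reports area!";
        }
    }
}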

Inform ASP.NET Core that areas are involved

MVC determines which controller to instantiate, and which method in the controller to call by means of routing. Routing is implemented by templated routing tables, as you can see below. By default there is 1 route template defined:

routes.MapRoute(
    name: "default",
    template: "{controller=Home}/{action=Index}/{id?}");

In the template we see {controller=Home}, which will interpret the URL (ex: http://localhost:12345/Test/Index): the Test part is used to determine that the class TestController must be instantiated. The second part is easy to explain too: the method Index( ) will be called. That is basically how routing works.

When we start the site we don't want (our users) to type http://localhost:12345/Home/Index, which is why default values are provided: when we just type http://localhost:12345 the default HomeController will be instantiated, and the default Index( ) method will be called.

URLs are mapped against the routes in the routing table, and the first match will be used. This means that the "areaRoute" should come before the default route in the routing table. This is all defined in the Startup.cs file in the project folder. Go to the Configure method and find where the routes are mapped.

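
The screenshot of my Startup.cs is not reproduced here; the mapping follows the standard pattern from the areas documentation referenced at the end of this post (a sketch):

app.UseMvc(routes =>
{
    // The area route comes first, so it is matched before the default route.
    routes.MapRoute(
        name: "areaRoute",
        template: "{area:exists}/{controller=Home}/{action=Index}/{id?}");

    routes.MapRoute(
        name: "default",
        template: "{controller=Home}/{action=Index}/{id?}");
});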

Now we can check whether our work has paid off:

  1. Start the application (ctrl + F5). This will show the default home page (no areas involved).
  2. Browse to http://localhost:12345/Reports/Home/Index. Of course 12345 depends on your configuration. We now see the string that we returned from the area controller. And of course http://localhost:12345/Reports/Home/ and http://localhost:12345/Reports/ return the same, because Home and Index are indicated as default values in the area route mapping.

Generating a link to the Reports/Home controller

Somewhere in the application we'll want to refer to the newly created controller. This is typically done from the _Layout.cshtml view, which serves as a template for all your pages. By default a top menu is created for easy navigation between your pages.

We don’t want to hard-code links, because then part of the advantage of using the MVC framework disappears (and we have to think about the link, which always provides room for error). In the navigation we find links like this:

<ul class="nav navbar-nav">
    <li><a asp-area="" asp-controller="Home" asp-action="Index">
		Home
	</a>
    </li>
    <!--   other links  -->
</ul>

The TagHelpers clearly indicate the intent of this line: a link to Home/Index is created.

So for our Reports home page we just need to fill in the area:

<li><a asp-area="Reports" asp-controller="Home" asp-action="Index">
	Reports
    </a>
</li>

This will do the trick. We have created a new (top-level) menu item that will open our great Reports page. The link will be http://localhost:12345/Reports. The /Home/Index part is left out because MVC knows from its routing tables that these are default values.

Conclusion

Adding an area is slightly more complex now, but the documentation was quite clear. I will need to do this more than once, hence this post 🙂

References

https://docs.microsoft.com/en-us/aspnet/core/mvc/controllers/areas

https://docs.microsoft.com/en-us/aspnet/core/mvc/views/tag-helpers/intro

https://docs.microsoft.com/en-us/aspnet/core/mvc/views/layout


Automating the creation of NuGet packages with different .NET versions

Introduction

I created a couple of straightforward libraries to be used in almost every project. So evidently these libraries are a good candidate for NuGet. This will decouple the libraries from the projects that they are used in. It also forces the Single Responsibility principle because every NuGet package can be used on its own, with dependencies on only (very few) other NuGet packages.

Creating the packages for one version of .NET is quite straightforward, and well-documented. But the next request was: “can you upgrade all our projects from .NET 4.5.2 to .NET 4.6.1, and later to .NET  4.7?”.

The plan

We have over 200 projects that preferably are all compiled with the same .NET version. So opening each project, changing its version, and compiling it is not really an option…

<PropertyGroup>
  <Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
  <Platform Condition=" '$(Platform)' == '' ">AnyCPU</Platform>
  <ProjectGuid>{21E95439-7A66-4C75-ACC5-1B9A5FF4A32D}</ProjectGuid>
  <OutputType>Library</OutputType>
  <AppDesignerFolder>Properties</AppDesignerFolder>
  <RootNamespace>MyProject.Clients</RootNamespace>
  <AssemblyName>MyProject.Clients</AssemblyName>
  <TargetFrameworkVersion>v4.6.1</TargetFrameworkVersion>
  <FileAlignment>512</FileAlignment>
  <TargetFrameworkProfile />
</PropertyGroup>


Investigating the .csproj files I noticed that there is 1 instance of the <TargetFrameworkVersion> element that contains the .NET version. When I change it, the .NET version property in Visual Studio is effectively changed. So this is easy: using Notepad++ I replace this in all *.csproj files and recompile everything. This works, but …

What about the NuGet packages?

The packages that I created work for .NET 4.5.2, but now we’re at .NET 4.6.1. So this is at least not optimal, and it will possibly not link properly together. So I want to update the NuGet packages to contain both versions. That way developers who are still at 4.5.2 with their solutions will use this version automatically, and developers at 4.6.1 too. Problem solved.  But how …

Can this be automated?

Creating the basic NuGet package

This is explained quite well on the nuget.org website. These are the basic steps:

Technical prerequisites

Download the latest version of nuget.exe from nuget.org/downloads, saving it to a location of your choice. Then add that location to your PATH environment variable if it isn’t already.
Note:  nuget.exe is the CLI tool itself, not an installer, so be sure to save the downloaded file from your browser instead of running it.

I copied this file to C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\Tools, which is already in my PATH variable (Developer command prompt for VS2015). So now I have access to the CLI from everywhere, provided that I use the Dev command prompt of course.

So now we can use the NuGet CLI, described here.

Nice to have

https://github.com/NuGetPackageExplorer/NuGetPackageExplorer

From their website:

NuGet Package Explorer is a ClickOnce & Chocolatey application that makes it easy to create and explore NuGet packages. After installing it, you can double click on a .nupkg file to view the package content, or you can load packages directly from nuget feeds like nuget.org, or your own Nuget server.

This tool will prove invaluable when you are trying some more exotic stuff with NuGet.

It is also possible to change a NuGet package using the package explorer. You can change the package metadata, and also add content (such as binaries, readme files, …).


Prerequisites for a good package

An assembly (or a set of assemblies) is a good candidate to become a package when it has the fewest dependencies possible. For example a logging package would only do logging, and nothing more. That way NuGet packages can be used everywhere, without special conditions. When dependencies are necessary, they are preferably on other NuGet packages.

Creating the package

In Visual Studio, create a project of your choice. Make sure that it compiles well.

Now open the DEV command prompt and enter

nuget spec

in the folder containing your project file. This will generate a template .nuspec file that you can use as a starting point. This is an example .nuspec file:

<?xml version="1.0"?>
<package xmlns="http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd">
  <metadata>
    <!-- The identifier that must be unique within the hosting gallery -->
    <id>Diagnostics.Logging</id>

    <!-- The package version number that is used when resolving dependencies -->
    <version>1.1.0</version>

    <!-- Authors contain text that appears directly on the gallery -->
    <authors>Gaston</authors>

    <!-- Owners are typically nuget.org identities that allow gallery
         users to easily find other packages by the same owners.  -->
    <owners>Gaston</owners>

    <!-- License and project URLs provide links for the gallery -->
<!--
    <licenseUrl>http://opensource.org/licenses/MS-PL</licenseUrl>
    <projectUrl>http://github.com/contoso/UsefulStuff</projectUrl>
-->
    <!-- The icon is used in Visual Studio's package manager UI -->
<!--
    <iconUrl>http://github.com/contoso/UsefulStuff/nuget_icon.png</iconUrl>
-->
    <!-- If true, this value prompts the user to accept the license when
         installing the package. -->
    <requireLicenseAcceptance>false</requireLicenseAcceptance>

    <!-- Any details about this particular release -->
    <releaseNotes>Added binaries for .NET 4.6.1</releaseNotes>

    <!-- The description can be used in package manager UI. Note that the
         nuget.org gallery uses information you add in the portal. -->
    <description>Logging base class </description>
    <!-- Copyright information -->
    <copyright>Copyright ©2017</copyright>

    <!-- Tags appear in the gallery and can be used for tag searches -->
    <tags>diagnostics logging</tags>

    <!-- Dependencies are automatically installed when the package is installed -->
    <dependencies>
      <!--<dependency id="EntityFramework" version="6.1.3" />-->
    </dependencies>
  </metadata>

  <!-- A readme.txt will be displayed when the package is installed -->
  <!--
  <files>
    <file src="readme.txt" target="" />
  </files>
  -->
</package>

Now run

nuget pack

in your project folder, and a Nuget package will be generated for you.

Verifying the package

If you want to know if the contents of your package are correct, use Nuget Package Explorer to open your package.

[Image: the package opened in NuGet Package Explorer]

Here you see a package that I created. It contains some metadata on the left side, and the package in 2 versions on the right side. You can use this tool to add more folders and to change the metadata. This is all good and nice, but not very automated. For example, how can we create a NuGet package like this one, that contains 2 .NET versions of the libraries?

Folder organization

I wanted to separate the creation of the package from the rest of the build process. So I created a NuGet folder in my project folder.

I moved the .nuspec file into this folder to have a starting point, and then I created a batch file that solves the following problems:

  1. Create the necessary folders
  2. Build the binaries for .NET 4.5.2
  3. Build the binaries for .NET 4.6.1
  4. Pack both sets of binaries in a NuGet package

I also wanted this package to be easily configurable, so I used some variables.

The script

Initializing the variables

set ProjectLocation=C:\_Projects\Diagnostics.Logging
set Project=Diagnostics.Logging

set NugetLocation=%ProjectLocation%\NuGet\lib
set ProjectName=%Project%.csproj
set ProjectDll=%Project%.dll
set ProjectNuspec=%Project%.nuspec
set BuildOutputLocation=%ProjectLocation%\NuGet\temp

set msbuild="C:\Program Files (x86)\MSBuild\14.0\bin\msbuild.exe"
set nuget="C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\Tools\nuget.exe"

The first 2 variables are the real parameters. All the other variables are built from these 2 variables.

The %msbuild% and %nuget% variables allow running the commands easily without changing the path. Thanks to these 2 lines this script will run in any “DOS prompt”, not just in the Visual Studio Command Prompt.

Setting up the folder structure

cd /d %ProjectLocation%\NuGet
md temp
md lib\lib\net452
md lib\lib\net461
copy /Y %ProjectNuspec% lib
copy /Y readme.txt lib

In my batch file I don't want to rely on the existence of a specific folder structure, so I create it anyway. I know that I can first test if a folder exists before trying to create it, but the end result will be the same.

Notice that I created Lib\Lib. The first level contains the necessary “housekeeping” files to create the package, the second level will contain the actual content that goes into the package file. The 2 copy statements copy the “housekeeping” files.

Building the project in the right .NET versions

%msbuild% "%ProjectLocation%\%ProjectName%" /t:Clean;Build /nr:false /p:OutputPath="%BuildOutputLocation%";Configuration="Release";Platform="Any CPU";TargetFrameworkVersion=v4.5.2
copy /Y "%BuildOutputLocation%"\%ProjectDll% "%NugetLocation%"\lib\net452\%ProjectDll%

%msbuild% "%ProjectLocation%\%ProjectName%" /t:Clean;Build /nr:false /p:OutputPath="%BuildOutputLocation%";Configuration="Release";Platform="Any CPU";TargetFrameworkVersion=v4.6.1
copy /Y "%BuildOutputLocation%"\%ProjectDll% "%NugetLocation%"\lib\net461\%ProjectDll%

The secret is in the /p switch

When we look at a .csproj file we see that there are <PropertyGroup> elements with a lot of elements in them; here is an extract:

<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="12.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <Import Project="$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props" Condition="Exists('$(MSBuildExtensionsPath)\$(MSBuildToolsVersion)\Microsoft.Common.props')" />
  <PropertyGroup>
    <!--  …   -->
    <OutputType>Library</OutputType>
    <!--  …   -->
    <TargetFrameworkVersion>v4.6.1</TargetFrameworkVersion>
    <!--  …   -->
  </PropertyGroup>

Each element under the <PropertyGroup> element is a property that can be set, typically in Visual Studio (Project settings). So compiling for another .NET version is as simple as changing the <TargetFrameworkVersion> element and executing the build.

But the /p flag makes this even easier:

%msbuild% "%ProjectLocation%\%ProjectName%" 
          /t:Clean;Build /nr:false 
          /p:OutputPath="%BuildOutputLocation%";Configuration="Release";Platform="Any CPU";TargetFrameworkVersion=v4.5.2

In this line MSBuild is executed, and the properties OutputPath, Configuration, Platform and TargetFrameworkVersion are set using the /p switch. This makes building for different .NET versions easy. You can find more information about the MSBuild switches here.

So now we are able to script the compilation of our project in different locations for different .NET versions. Once this is done we just need to package and we’re done!

cd /d "%NugetLocation%"
%nuget% pack %ProjectNuspec%

Conclusion

We automated the creation of NuGet packages with an extensible script. In the script as much as possible is parameterized so it can be used for other packages as well.

It is possible to add this to your build scripts, but be careful not to always build and deploy your NuGet packages when nothing has changed in them. This is a reason that I like to keep the script handy and run it when the packages are modified (and only then).

References

MSBuild Reference

NuGet CLI Reference


Implementing the Excel Simulator in F#

Introduction

In my previous post we talked about how to structure an Excel workbook to make it easy to perform changes. As a side effect we have now a good idea of what is the data, and what is the functionality that the workbook implements. This allows us to replace the workbook by an equivalent F# program, which was our “hidden agenda” all along 😉

What we want to achieve:

  • high-level: use a set of input parameters to generate a data set of output records
  • mid-level: some calculations can already be done on the input parameters without the corresponding records in the database. We want to separate these out.
  • low-level: we want to read data from the database to look up some values, to filter the rows that we need and to transform them into the output records.
  • We will need to implement some Excel functionality, for example the VLOOKUP function.

I am not an expert in F# (yet), so feel free to drop comments on how this can be done better or differently.

Attack Plan

There are 2 main ways to start this journey: top-down or bottom-up. Actually there is also a third way in which we work top-down and bottom-up at the same time, to meet in the middle. This may be what we need here.

First helper functions (aka bottom-up)

let round100 (x:decimal) = // ROUND(i_ConsJr*r.NormalPricePerkWh/100;2)
    let y = System.Math.Round x 
    y / 100M

This is an easy function to start with: it rounds x to a whole number and then divides by 100, which is equivalent to Excel's ROUND(x/100; 2). This function is used quite a few times, and it is very easy to write, so why not start with it 😉
let n2z (x:Nullable<decimal>) =
    if x.HasValue then x.Value else 0M

let n2zi (x:Nullable<int>) =
    if x.HasValue then x.Value else 0

let n2b (x:Nullable<bool>) =
    if x.HasValue then x.Value else false

Another set of easy functions to cope with database NULL values. Very simple, but useful!
let rec LookupNotExact ls f v =
    match ls with
    | [x] -> x
    | h::t::s -> 
        if f t > v then h
        else LookupNotExact (t::s) f v
    | [] -> raise (OuterError("LookupNotExact over empty list"))

(* Tests for LookupNotExact
let testls1 = [1;2;3;4;5]
let res1_3 = LookupNotExact testls1 (fun x -> x) 3
let res1_5= LookupNotExact testls1 (fun x -> x) 7

let testls2 = [1;2;3;6;7]
let res2_3 = LookupNotExact testls2 (fun x -> x) 3
let res2b_3 = LookupNotExact testls2 (fun x -> x) 5
let res2_7 = LookupNotExact testls2 (fun x -> x) 7
let res2b_7 = LookupNotExact testls2 (fun x -> x) 9
*)

The LookupNotExact function mimics the behavior of the Excel VLookup function with approximate matching: in a sorted list it finds the last element whose key (given by f) is not greater than v. The nice thing is that this function can easily be tested using F# Interactive. Just remove the comment from the tests, select the function with its tests and hit alt+enter. This will execute the selected code and display the results in the F# Interactive window.

Some data structures

The next data structures serve only to make the code more readable. We could do without them just as easily. Some examples:

type Currency = decimal

type GasVolume =
    | KWH of decimal
    | M3 of decimal

type Languages = NL |FR

Using Currency instead of decimal makes it easy to see what the purpose of a variable is. It takes away the guessing about what a variable holds. Technically it is not different from decimal.

The gas volume can be expressed in either KWH or cubic meters. That is what we see in this data type. Again, using 2 different constructors makes clear what we are dealing with.

Languages is just an enumeration of 2 languages, as we would do in C#.

With these functions and types in place (and then some more boring ones) we can start to emulate the Excel formulas that we need. I'm not going into detail on this because I don't want lawsuits 😉

Converting to CSV

In the end the results are exported. We export the values as a CSV file, which can be easily read back into Excel (for validation purposes). This will involve some reflection, here is the code:

module Csv

let ref = box "#REF!"

// prepare a string for writing to CSV            
let prepareStr obj =
    if obj = null then "null"
    else
        obj.ToString()
            .Replace("\"","\"\"") // replace single with double quotes
            |> sprintf "\"%s\""   // surround with quotes

let combine s1 s2 = s1 + ";" + s2   // used for reducing

let mapReadableProperties f (t: System.Type) =
    t.GetProperties() 
        |> Array.filter (fun p -> p.CanRead)
        |> Array.map f
        |> Array.toList

let getPropertyvalues x =
    let t = x.GetType()
    t   |> mapReadableProperties (fun p -> 
            let v = p.GetValue(x)
            if v = null then ref else v
            )

let getPropertyheaders (t: System.Type) =
    t   |> mapReadableProperties (fun p -> p.Name)
        |> Seq.map prepareStr
        |> Seq.reduce combine

let getNoneValues (t: System.Type) =
    t   |> mapReadableProperties (fun p -> ref)

let toCsvString x =
    x |> getPropertyvalues
      |> Seq.map prepareStr
      |> Seq.reduce combine

Let's start with the toCsvString function. It almost says what it does:
  • Get the property values from x (which is the part using reflection).
  • Each property value is mapped to a CSV-safe value (double quotes are doubled, and the value is surrounded by double quotes).
  • Everything is combined into a single delimited string (with ";" as separator) using the Seq.reduce method.

The other functions are quite easy to understand as well.

The actual program

let main  = 
    let newOutput = Calculate myInput

    // output
    let newOutputHeaders = newOutput |> List.head |> newOutputPropertyHeaders
    let newOutputCsv = newOutput |> List.map newOutputPropertyValues

    System.IO.File.WriteAllLines(@"C:\temp\NewOutput.csv", 
	newOutputHeaders :: newOutputCsv);

    printfn "Find the output in %s" @"C:\temp\NewOutput.csv"

    printfn "Press enter to terminate..."
    Console.ReadLine() |> ignore

The program is composed of some simple statements that use the functions that we previously described. This makes the program very easy to read. Not much explanation is needed, but here goes:

  • newOutput will contain the result of the calculations using the input. This is the main purpose of the program. If this were implemented as a service, newOutput would be returned and that’s it.
  • For debugging purposes we output this as a CSV file, using the functions in the Csv module.

Conclusion

Writing this simulation as an F# program was not too hard. Immutability is baked into the FP paradigm, which was perfect for this case. So you could say that this is a nice match.

The Excel workbook itself is quite complex (and big). It is hard to maintain and to extend. The F# code on the other hand is quite readable. A nice (and unexpected) side effect is that now we understand much better what goes on in the Excel workbook, which helps us to maintain it for as long as it is still used. Another nice thing is that the (non-technical) end-user is able to understand the F# code (with some explanation).

 


Structuring your Excel – “the hidden agenda”

Introduction

Most developers don’t like Excel as a “development platform”. Many projects are migration projects from an Excel solution to a “real application”. And often this is worth the trouble. But in some cases Excel has a lot of advantages.

An Excel workbook can be set up in many ways, and it usually starts off very small, to end in a big spaghetti where nobody knows what is going on. And nobody dares to change a thing. Sounds like a typical spaghetti .NET (or enter your favorite language) project. So let’s try to make our Excel project manageable.

Structuring your Excel

Every workbook starts with the first sheet. And initially there are some cells that we use as input, and then some cells are used as output. If you want to keep a nice overview of what is happening, it is worth creating separate sheets for separate concerns.

[Image: suggested workbook structure]

Input sheet(s)

Use this sheet to gather all the input parameters from your users. This can be a simple sheet. In the end this is the user interface to your workbook, so make it easy for users to enter data. Use data validation and lookups to limit errors. The input sheets should be the only thing that can be modified by the end-users.

Enriching the input sheet

You can also do some simple calculations that are local to the input sheet. For example, adding 2 input fields together to see their total may be handy here; this also gives immediate feedback to the user. Looking up the city that goes with a ZIP code is another good example. I know that most of us have a problem remembering the syntax of “lookup” in Excel, hence the link ;-)

Depending on the nature of your application there can be one or more input sheets. For example, a simulation using some parameters will typically contain one input sheet, whereas an accounting application (with cash books, bank books, …) may contain multiple input sheets.

Principle: all the calculations that only concern the input data can already be done here. Using some data from the data sheets (as static data) can also be done here. This gives us the first intermediate results.

Output sheet(s)

These sheets contain the output for your users. They should not contain calculations, only formatting: they present the results from the calculation sheet(s). Of course formatting means changing fonts, colors, … and also cell formats.

Principle: This sheet contains no calculations, only formatting. Everything should already be calculated in the calculation sheets.

Data sheet(s)

Your workbook will probably need some data, and if you’re lucky this data is in a structured format. This data serves as static input for your calculations.

Often you will want to calculate some results per data row. This can be done in a separate sheet per data sheet that contains all the necessary calculations that only use data from that data sheet. Eventually you will want this data to be stored in a database, to be able to access it from many applications (think reporting, for example).

To accommodate this, you can create the data sheets so that they contain only raw data, and then:

  • Add columns to the raw data that contain calculations on the data per row. Maybe there are some fields in the data that need to be added together for later use, or some lookups to do; everything that does not involve the input data can be done here. Make sure you keep the calculations away from the data (separate them by some empty columns, to leave room for later expansion of the raw data). It may also be a good idea to use colors to indicate which fields are calculated.
  • Add a new sheet that will contain all the calculations for this data sheet that depend on the input parameters. This sheet will calculate intermediate results that are needed later.

Principle: Separate data from calculations, either by adding calculated columns at the end of the raw data, or by adding dedicated sheets to calculate these intermediate results.

Calculation sheet(s)

This is where you perform the more complex calculations. Some simple calculations can already be done on the input sheets and the data sheets, but all the complex work should happen here. The calculation sheets use the (calculated) data from many sheets and combine it into the final (unformatted) results. Because all the simple calculations have already been done in the input and data sheets, the calculation sheets are mostly consolidating this data.

Principle: This is where the sheets in the workbook are combined to produce the end result. The local calculations per sheet are done already, so that only the consolidation logic remains.

Keep your calculations as local as possible

In the proposed structure it is easy to see that the calculations in the input sheet only use the input data (and maybe some lookups using the data sheets). Calculations over the data are done in a separate sheet per data sheet.

It is only in the calculation sheets that data will be combined from different sheets. In this way the workbook remains manageable, even when it grows bigger.

Structuring your workbook like this will also make intermediate results visible. These intermediate results will be calculated only once, and can be used everywhere. This has a couple of advantages:

  • The workbook is simpler because the intermediate results are calculated as close as possible to their data.
  • It is easier to create more complex formulas when the intermediate results are already calculated. Instead of trying to squeeze everything into one formula, there are now building blocks available to be used (see the sketch after this list).
  • The results become more consistent. The formulas for calculating the intermediate results are not repeated all over the workbook, so there is less risk that in some instances a formula is (slightly) different. This prevents hard-to-solve errors.
  • The results are easy to test. The intermediate results are simple, so verifying if they are correct is easy. The more complex formulas are using the intermediate results as building blocks, so they become much easier to test too. If something goes wrong it is easier to track back where it went wrong. This can be seen as a kind of unit testing.
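
The same idea carries over directly to code: calculate an intermediate result once, give it a name, and build the more complex formulas from those building blocks. A small F# illustration (all names are invented for this example):

// intermediate results as named building blocks, mirroring the
// intermediate cells in the workbook (hypothetical names)
let invoiceTotal dayConsumption nightConsumption (unitPrice: decimal) fixedFee =
    // calculated once, reused below
    let totalConsumption = dayConsumption + nightConsumption
    // the "complex" formula is now just a combination of building blocks
    let energyCost = totalConsumption * unitPrice
    energyCost + fixedFee

// usage: invoiceTotal 120m 80m 0.25m 10m evaluates to 60m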

Did you notice how I don’t care about performance in this case? That is because the other advantages outweigh this by far. But of course there could be a performance gain as well.

Named cells and regions

This is an easy one. Excel allows you to name cells or ranges of cells. Try to use a naming convention, for example:

  • Cells containing input fields will have a name starting with I_ (for example I_ZipCode)
  • Ranges containing data for lookups can start with l_  (for example l_locations)
  • Cells containing intermediate results can start with c_  (for example c_TotalConsumption)

In this way the purpose of each cell or range is clear.

An additional advantage is that you can now move the named cells and ranges elsewhere in the workbook if you need to. As long as the new zone gets the same name, all the formulas referring to it will still work.

Excel as a functional language

If you look closely at an Excel workbook, you’ll notice that cells contain either input values, which can be modified by the users, or output values. The output values are obtained by calculating formulas in various sheets, with many dependencies. But each formula is composed of functions that have no side effects (unless you start to use non-deterministic functions, of course).

So calling a function will never do something bad to your input data. It will of course update the cell which contains the formula, in a deterministic way.
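
In programming terms, most Excel formulas behave like pure, deterministic functions, while something like NOW() or RAND() is the non-deterministic exception. A tiny F# illustration of the difference:

// deterministic, like a typical cell formula: same inputs, same output
let add (a: float) b = a + b

// non-deterministic, like Excel's NOW() or RAND():
// every evaluation may give a different result
let now () = System.DateTime.Now
let random = System.Random()
let rand () = random.NextDouble()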

Conclusion – Our hidden agenda

Once the workbook is properly structured it becomes easy to separate the data from the functions. The functions are now very clear and easy to implement in some programming language. The named ranges can become the names of variables in your program, making the match easy.

We use the Excel workbook here as input – processing – output, which can be done perfectly in a functional language such as F#. You will end up with data coming from the database, input parameters, and functions to be applied. This then gives the end results. More on this in a later post.
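
To already give an idea of that mapping, here is a sketch (all names are invented for this example): the named input cells become fields of a record, and the calculation sheets become pure functions from input to output.

// hypothetical sketch of the mapping: named cells -> record fields,
// calculation sheets -> a pure function from input to output
type Input =
    { ZipCode: string             // I_ZipCode
      DayConsumption: decimal     // I_DayConsumption
      NightConsumption: decimal } // I_NightConsumption

type Output =
    { TotalConsumption: decimal } // c_TotalConsumption

// processing: same role as the calculation sheet(s)
let calculate (input: Input) : Output =
    { TotalConsumption = input.DayConsumption + input.NightConsumption }

// usage
let result =
    calculate { ZipCode = "2000"; DayConsumption = 120m; NightConsumption = 80m }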
