Most developers don’t like Excel as a “development platform”. Many projects are migration projects from an Excel solution to a “real application”. And often this is worth the trouble. But in some cases Excel has a lot of advantages.
An Excel workbook can be set up in many ways, and it usually starts off very small, to end in a big spaghetti where nobody knows what is going on. And nobody dares to change a thing. Sounds like a typical spaghetti .NET (or enter your favorite language) project. So let’s try to make our Excel project manageable.
Structuring your Excel
Every workbook starts with the first sheet. And initially there are some cells that we use as input, and then some cells are used as output. If you want to keep a nice overview of what is happening is is worth creating separate sheets for separate concerns.
Use this sheet to gather all the input parameters from your users. This can be a simple sheet. In the end this us the user interface to your workbook, so make it easy for users to enter data. Use data validation and lookups to limit errors. The input sheets should be the only thing that can be modified by the end-users.
Enriching the input sheet
You can also do some simple calculations that are local to the input sheet. For example adding 2 input fields together to see their total may be handy here. This also gives immediate feedback to the user. Looking up the city that goes with a ZIP code is another good example. I know that most of us have a problem remembering the syntax of “lookup” in Excel, hence the link .
Depending on the nature of your applications there can be 1 or more input sheets. For example a simulation using some parameters will typically contain 1 input sheet, where an accounting application (with cash books, bank books, …) may contain multiple input sheets.
Principle: all the calculations that only concern the input data can be done already here. Using some data from the datasheets (as static data) can also be done here. This will give us the first intermediate results.
These sheets will contain the output for your users. The sheets should not contain calculations, only formatting. They will present the results from the calculation sheet(s). Of course formatting means changing fonts, colors, … and also cell formats.
Principle: This sheet contains no calculations, only formatting. Everything should be calculated in the calculations sheets already.
Often you will want to calculate some results per data row. This can be done in a separate sheet per data sheet that will contain all the necessary calculations that are only using data from that data sheet. Eventually you will want this data to be stored in a database to be able to access it from many applications (think reporting, for example).
To accommodate for this you can create the data sheets to contain only raw data, and then
- Add columns to the raw data that contain calculations on the data per row. Maybe you have some fields to be added already in the data that you’ll need later, some Lookups to do. All that does not involve the input data can be done here. Make sure you put the calculations away from the data (separate them by some empty columns for later expansion of the raw data). It may also be a good idea to use colors to indicate which are the calculated fields.
- Add a new sheet that will contain all the calculations for this data sheet that depend on the input parameters. This sheet will calculate intermediate results that are needed later.
Principle: Separate data from calculations, either by adding calculated columns at the end of the raw data, or by adding dedicated sheets to calculate these intermediate results.
This is where you perform more complex calculations. Some simple calculations can be done on the input sheets and the data sheets already, but all the complex work should happen here. The calculation sheets will use the (calculated) data from many sheets and combine this data to become the final (unformatted) results. Because all the simple calculations have been done in the input- and datasheets, the calculation sheet is just consolidating this data.
Principle: This is where the sheets in the workbook are combined to produce the end result. The local calculations per sheet are done already, so that only the consolidation logic remains.
Keep your calculations as local as possible
In the proposed structure it is easy to see that calculations are done in the input sheet over only the input data (and maybe some lookups using the data sheets). Calculations over the data is done in a separate sheet per data sheet.
It is only in the calculation sheets that data will be combined from different sheets. In this way the workbook remains manageable, even when it grows bigger.
Structuring your workbook like this will also make intermediate results visible. These intermediate results will be calculated only once, and can be used everywhere. This has a couple of advantages:
- The workbook is simpler because the intermediate results are calculated as close as possible to their data.
- It is easier to create more complex formulas when the intermediate results are calculated already. Instead of trying to squeeze everything in 1 formula there are now building blocks available to be used.
- The results become more consistent. The formulas for calculating the intermediate results are not repeated all over the workbook. So there is less of a risk that in some instances a formula is (slightly) different. This will prevent hard to solve errors.
- The results are easy to test. The intermediate results are simple, so verifying if they are correct is easy. The more complex formulas are using the intermediate results as building blocks, so they become much easier to test too. If something goes wrong it is easier to track back where it went wrong. This can be seen as a kind of unit testing.
Did you notice how I don’t care about performance in this case? That is because the other advantages outweigh this by far. But of course there could be a performance gain as well.
Named cells and regions
This is an easy one. Excel allows to name cells or ranges of cells. Try to use a naming convention, for example:
- Cells containing input fields will have a name starting with I_ (for example I_Zip code)
- Ranges containing data for lookups can start with l_ (for example l_locations)
- Cells containing intermediate results can start with c_ (for example c_TotalConsumption)
In this way the purpose of each cell or range is clear.
An additional advantage is that you can now move the named cells and ranges elsewhere in the workbook if you need to. As long as the new zone gets the same name, all the formulas referring to it will still work.
Excel as a functional language
If you look closely at an Excel workbook, then you’ll notice that the cells either contain input values which can be modified by the users, or output values. The output values are obtained by calculating formulas in various sheets, and with many dependencies. But each formula contains functions that have no side effects (unless you start to use non-deterministic functions of course).
So calling a function will never do something bad to your input data. It will of course update the cell which contains the formula, in a deterministic way.
Conclusion – Our hidden agenda
Once the workbook is properly structured it becomes easy to separate the data from the functions. The functions are now very clear and easy to implement in some programming language. The names ranges can be the names of variables in your program, making the match easy.
We use the Excel workbook here as input – processing – output, which can be perfectly done in a Functional language, such as F#. You will end up with data coming from the database, input parameters and functions to be applied. This will then give the end results. More on this in a later post.