Harder, better, faster, stronger: understanding and improving the tractability of large energy system models

Background: Energy system models based on linear programming have been growing in size with the increasing need to model renewables with high spatial and temporal detail. Larger models lead to high computational requirements. Furthermore, seemingly small changes in a model can lead to drastic differences in runtime. Here, we investigate measures to address this issue. Results: We review the mathematical structure of a typical energy system model


Introduction
Mathematical optimisation, in particular linear programming (LP), has become the method of choice for the majority of established and emerging energy system modelling tools [1][2][3][4]. Renewable energy technologies need to be represented with a high spatio-temporal resolution and scope to capture the effect of their intermittency on system stability, to ensure that weather variability can be captured across years [5], and to exploit the balancing effect of geographically distant weather systems [6]. This has led to the development of ever larger models [7][8][9]. Model size increases further when using Monte-Carlo or scenario-based methods to deal with structural and parametric uncertainty [10][11][12][13].
[14] identified tractability as one key challenge for energy system optimisation models. Most commonly, temporal resolution is reduced [8,15,16] or time periods with similar features are clustered and represented only by a subset of 'typical days' [11,17,18]. Advancements in the field of time-series aggregation include preselection of 'critical' days prior to clustering [19], identification of k-medoids as the most reliable aggregation algorithm [20,21], and separation of storage into multiple decision variables to allow information to be linked within and between clustered periods [22,23]. [24] summarised these methods, finding that existing feature aggregation using k-means, k-medoids, or hierarchical clustering is still the state of the art. However, they reiterated warnings made in several previous studies that time-series aggregation alters model results, both qualitatively and quantitatively, and should therefore be used with caution [19,21]. Recent work [25] aims to improve the tractability of energy system models on the algorithmic level: it develops a parallel algorithm that exploits the inherent structure of energy system models to break the problem up into weakly related subproblems that can then be solved individually.
Similarly to [25], we take a step back from application-specific methods to reduce model complexity. Instead, we examine how the structure of the underlying optimisation problem affects its tractability, and empirically examine a range of options to improve tractability for typical energy system models. We also develop and test a method to facilitate tractability, in which we automatically scale the parameters of an energy system model. To do so, we draw on literature from the broader field of operations research [26][27][28]. The paper proceeds as follows. First, we discuss the mathematical properties of a typical high-resolution energy system model and compare the characteristics of the two main solution methods: simplex and barrier (interior point). We identify scaling of the optimisation problem as a particular area of concern for the performance of these methods and introduce a method to automatically scale an energy system model. We then proceed with a series of systematic experiments investigating different solver configurations, examining whether we can save computational effort by settling for a potentially less accurate solution, and testing the performance of our scaling method. Finally, we discuss what general guidelines for energy system modelling we can draw from these experiments. We use the open-source Calliope modelling tool [29] for our experiments.

Background
A high-level view of the problem we want to solve is the following: given the geographically distributed consumption and production of "goods" such as electricity or heat (from now on referred to as 'carriers'), a network over which these carriers can be transferred, and the ability to store and convert between carriers, we want to find the cheapest way to allocate these carriers over time and between locations such that all demand is met. In other words, we want to solve a minimum-cost flow problem with multiple commodities. In practice we find that network flow problems are not expressive enough to formulate realistic energy system models. In particular, policy-based constraints on carrier consumption and production, such as minimum shares of renewable generation, mean that energy system models cannot be described just as network flow problems. Instead, we must consider them as generic linear programs (LPs).

LP problems and solution methods
LPs are a very general framework of mathematical optimization [30]. The general form of an LP is

minimize c⊺x subject to Ax = b, x ≥ 0,    (1)

where the linear objective function x ↦ c⊺x is to be minimized by choosing suitable variable values x ∈ R^n_+. The constraint matrix A ∈ R^(m×n) and right-hand side vector b ∈ R^m together constrain the choice of variables x with m linear constraints.
A vector x ∈ R^n_+ is called feasible if it satisfies all m constraints. A set of m independent columns of A induces a basis A_B of the column space of A. A feasible point x can be obtained by solving Ax = b while setting all variables not corresponding to basis columns to 0. We call such a solution x a basic feasible solution of equation 1. Geometrically, a basic feasible solution is an extreme point (a vertex) of the feasible region.
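As a concrete illustration of the general form and of basic feasible solutions, the following sketch solves a tiny equality-constrained LP with SciPy's HiGHS-based `linprog` (a generic solver chosen for illustration; it is not part of the toolchain discussed in this paper):

```python
import numpy as np
from scipy.optimize import linprog

# minimise c'x  subject to  Ax = b, x >= 0  (general form of an LP)
c = np.array([1.0, 2.0])          # objective coefficients
A = np.array([[1.0, 1.0]])        # one equality constraint (m = 1, n = 2)
b = np.array([1.0])

# linprog's default bounds are x >= 0, matching the general form above
res = linprog(c, A_eq=A, b_eq=b, method="highs")
print(res.x, res.fun)  # the optimum is the vertex x = (1, 0) with cost 1
```

The returned solution sets one variable per basis column (here a single column, since m = 1) and all others to zero, i.e. it is a basic feasible solution.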
Formulating an energy system as an LP, in analogy to minimum-cost flow problems, is straightforward: x_i is the variable holding the amount of flow through the i-th edge. Row j of A contains a non-zero entry at index i, which is 1 or −1 if the i-th edge is an incoming or outgoing edge of node j, respectively. b_j is the flow demand of node j. c_i is the cost incurred by one unit of flow at edge i. Additionally, edge capacities and many constraints that we find in energy system models, such as efficiencies of carrier conversion or complex policies governing the operation of certain plants, can either be expressed directly or approximated by linear constraints.
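This incidence-matrix encoding can be sketched on a toy network (the node/edge data below is invented for illustration, solved with SciPy's `linprog`): a supply node s, a transit node m, and a demand node t, where a cheap two-hop path competes with an expensive direct edge.

```python
import numpy as np
from scipy.optimize import linprog

# Edges: e0 = s->m, e1 = m->t, e2 = s->t; x_i is the flow on edge i.
# Row j of A has +1 for incoming and -1 for outgoing edges of node j;
# b_j is the flow demand of node j (negative = supply).
A = np.array([[-1.0,  0.0, -1.0],   # node s: supplies 1 unit
              [ 1.0, -1.0,  0.0],   # node m: pure transit
              [ 0.0,  1.0,  1.0]])  # node t: demands 1 unit
b = np.array([-1.0, 0.0, 1.0])
c = np.array([1.0, 1.0, 3.0])       # per-unit cost of each edge

res = linprog(c, A_eq=A, b_eq=b, method="highs")
# The two-hop path costs 1 + 1 = 2, beating the direct edge's cost of 3.
print(res.x)    # -> flow of 1 on e0 and e1, 0 on e2
print(res.fun)  # -> 2.0
```

Edge capacities would enter as upper bounds on the x_i; the policy constraints mentioned above would add further rows to A.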
As discussed above, energy system models consider the allocation of resources in time and space. Modelling time requires its discretisation into timesteps, which leads to a very particular structure of the constraint matrix A, depicted in Figure 1. This structure is often referred to as arrowhead structure. Specialized LP solution methods which try to exploit this structure to speed up the solution process are in development [31].
At the moment, the two families of algorithms most widely used to solve energy system models are simplex algorithms and barrier (interior-point) methods, of which many variants exist. Both approaches are available in open-source software packages such as Coin-OR [32] as well as commercial ones such as Gurobi [33]. In the following we sketch some important characteristics of both simplex and barrier methods, based on the treatment in [26]. Table 1 lists the main points for both methods side by side.
Although the two algorithms solve the same problem, they operate in entirely different ways. Simplex maintains and updates a basic feasible solution x_i in every step i of the algorithm. To find x_{i+1}, simplex chooses one of the neighboring vertices of x_i that improves the objective value. This jump from x_i to x_{i+1} is performed algebraically by exchanging a column in the current basis of A. Convexity of the feasible region and linearity of the objective function imply that an x_i which cannot be improved in this way is optimal.
Barrier maintains an interior point x of the feasible region. In every step of the algorithm, barrier finds a vector along which x can be improved with respect to c, and takes a step along this direction without leaving the feasible region. As x approaches the optimum, it also moves towards the boundary of the feasible region. Once the update step of x is shorter than some threshold, barrier considers x to be close enough to the optimal value and returns it as the solution.
Simplex always returns a basic feasible solution, while barrier in general returns an interior point of the feasible region. This has several implications. First, an interior point is, strictly speaking, not an optimal solution to an LP. However, the "gap" can in theory be made arbitrarily small. Moreover, optimality of a basic feasible solution can also only be determined with a certain limited precision. Thus in practice, solvers typically allow controlling the precision of both methods. Neither the "optimal" solution returned by simplex nor the one returned by barrier is generally better; the precision of the solution depends on the solver configuration.
Second, there are cases where a basic feasible solution is needed: (1) when solving a MILP problem, and (2) if the obtained solution is to be used to warm-start subsequent solver runs on the same (or a slightly different) model.
Third, an interior solution has potentially many more variables away from their bounds than a basic solution. Since variables representing real-life quantities often have a lower bound of 0, this means that solutions returned by barrier usually have many more non-zeros than basic feasible solutions; potentially these non-zeros are unrealistically small for the considered problem. A solution with many non-zeros may be harder to interpret and may thus be undesirable.

Update step. Simplex: moves from the current basic feasible solution to a neighboring vertex by exchanging one column of the current basis. Barrier: computes a vector along which the objective is improved, projects it onto the nullspace of A, and updates x along this vector while maintaining x > 0.

Main computation. Simplex: the main computational cost is solving m × m linear systems of equations for the pivot step. Barrier: the main computational cost is projecting the update vector onto the nullspace of A; this entails inverting the m × m matrix ÂÂ⊺ in every step, where Â is a scaled version of A.

Termination condition. Simplex: terminates if no basis update can improve the objective value over the current basic feasible solution. Barrier: terminates if the last update step was shorter than some threshold.

Return value. Simplex: returns a basic feasible solution. Barrier: returns an interior point that is close to the boundary of the feasible region (we will denote this an interior solution); a crossover method can be added to retrieve an optimal basic feasible solution from an interior solution.

Table 1: Comparison of major characteristics of simplex and barrier methods
To rectify these downsides of interior solutions, a dedicated crossover method can be used to find an optimal basic feasible solution from an optimal interior solution [34].
Neither of the two methods is generally faster than the other. However, the factorization step of ÂÂ⊺ in barrier is more amenable to parallelization than the pivot operations of simplex [26]. Which algorithm is most suitable depends on the structure of the LP at hand [26,35]. Two LPs of the same size can lead to vastly different solution times for both simplex and barrier. This is mostly for two reasons: numerical solvers can recognize and exploit problems with special structures, and they are susceptible to numerical problems. Three aspects are particularly influential: sparsity (structure), degeneracy (structure), and the numerical range.
The first aspect impacting the solution time is sparsity. Energy system models typically lead to LPs that are highly sparse. This follows directly from the arrowhead structure depicted in Figure 1: the number of zero entries scales quadratically with the number of time steps, whereas the number of nonzero entries is linear in the number of time steps. A typical model might consider a full year at a granularity of 1-hour time steps, causing most entries of the matrix to be zero. In general, sparsity is a desirable property of a linear program, because both simplex and barrier solvers have ways of profiting from sparsity in the constraint matrix A. Interestingly, barrier is very sensitive to the sparsity pattern of A, while it matters very little to simplex. The reason is that for barrier, the product AA⊺ needs to be sparse. Note that if A has some dense columns, then AA⊺ will be dense too. On the other hand, if A has no dense columns but some dense rows, then AA⊺ will be sparse. Simplex, by contrast, relies on the LU decompositions of bases of A being sparse. [36] observe that a large part of most bases of A can be permuted to triangular form. Thus, for most bases only a very small part actually needs to be factorized; the largest part corresponds directly to the basis itself, and the resulting LU decomposition will be almost as sparse as the original basis.
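The dense-column effect on AA⊺ is easy to verify numerically; a small sketch (matrix size chosen arbitrarily):

```python
import scipy.sparse as sp

m = 50
# Start from a sparse (identity) matrix and add one dense column / one dense row.
A_col = sp.eye(m, format="lil")
A_col[:, 0] = 1.0                     # one dense column
A_row = sp.eye(m, format="lil")
A_row[0, :] = 1.0                     # one dense row

nnz_col = (A_col.tocsr() @ A_col.tocsr().T).nnz  # AA' becomes fully dense
nnz_row = (A_row.tocsr() @ A_row.tocsr().T).nnz  # AA' stays sparse
print(nnz_col, nnz_row)  # -> 2500 vs. a few hundred at most
```

A single dense column couples every pair of rows, so AA⊺ fills in completely (m² non-zeros), whereas a dense row only densifies one row and column of AA⊺.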
The second aspect that influences solution time is degeneracy. Degeneracy in the context of linear programming means that multiple bases of the constraint matrix A lead to the same basic feasible solution. Standard minimum-cost flow problems are often inherently degenerate [35,37]. Degeneracy can pose significant problems to the simplex method: because multiple bases lead to the same solution, simplex updates may fail to improve the objective and stall progress. Computational studies have shown that up to 90% of all pivot operations of the simplex method on min-cost flow problems can be degenerate [37]. Barrier methods maintain an interior point of the feasible region and never actually encounter a basic feasible solution. Degeneracy thus matters less to the operation of barrier methods [26].
The third and final aspect is a large numerical range, given by the absolute values of coefficients in the constraint matrix, the right-hand side, and the objective function. Energy system models often have a very large numerical range. This is a consequence of two facts. Firstly, the input data of energy system models correspond to different physical quantities (e.g. energy, area, cost); choosing inappropriate combinations of units to represent these quantities will lead to large numerical ranges. Secondly, even within quantities of the same unit we might encounter large numerical ranges, for instance between operating costs and investment costs for new generation infrastructure, or even just between the operating costs of very different technologies like photovoltaics and gas-fired power generation. A large numerical range can result in increased solution times and even lead to non-convergence in extreme cases. It can furthermore lead to loss of precision due to round-off errors. To address these issues, it is possible to scale the linear problem before solving it.
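As a toy illustration of how unit choices drive the numerical range (all coefficient values below are invented): investment costs in dollars next to efficiencies and small energy flows span many orders of magnitude, while a different unit choice for the same data does not.

```python
import numpy as np

def numerical_range(values):
    """max |v| / min |v| over the nonzero coefficients of a problem."""
    v = np.abs(np.asarray(values, dtype=float))
    v = v[v > 0]
    return v.max() / v.min()

# Invented coefficients: investment costs in dollars, efficiencies, energy flows
raw = [2.5e9, 8.0e8, 0.97, 0.43, 1.2e-3, 5.0e-4]
print(numerical_range(raw))     # ~5e12: costs and flows are far apart

# The same data with costs in billions of dollars and flows in larger units
scaled = [2.5, 0.8, 0.97, 0.43, 1.2, 0.5]
print(numerical_range(scaled))  # less than one order of magnitude
```

The same max/min ratio over all entries of A, b, and c is the quantity the scaling method below sets out to minimize.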

Scaling
Choosing positive scaling factors r_1, …, r_m, s_1, …, s_n ∈ R_+, we can scale each row i of A by r_i and each column j of A by s_j by multiplying A from the left and right with the diagonal matrices R := diag(r_1, …, r_m) and S := diag(s_1, …, s_n).
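Applying such a scaling numerically is a matter of two diagonal matrix products; a sketch with an invented, badly scaled LP (solved here with SciPy's `linprog` for illustration), where the solution y of the scaled problem maps back to the original variables via x = Sy:

```python
import numpy as np
from scipy.optimize import linprog

# A small, badly scaled LP (coefficients invented for illustration)
A = np.array([[1e6, 1.0], [1.0, 2e-6]])
b = np.array([1e6, 1.0])
c = np.array([1e3, 1e-3])

# Row scaling R and column scaling S as diagonal matrices
R = np.diag([1e-6, 1.0])
S = np.diag([1.0, 1e6])

res = linprog(c, A_eq=A, b_eq=b, method="highs")
# Scaled problem: minimise (Sc)'y  subject to  (R A S) y = R b, y >= 0
res_scaled = linprog(S @ c, A_eq=R @ A @ S, b_eq=R @ b, method="highs")

x_back = S @ res_scaled.x        # map the scaled solution y back via x = S y
print(res.x, x_back, res.fun)    # both solves agree after mapping back
```

Here R A S has entries of order 1, while A spans twelve orders of magnitude; the objective values of the two solves coincide since c⊺x = c⊺Sy = (Sc)⊺y.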
Perhaps unintuitively, the original problem

minimize c⊺x subject to Ax = b, x ≥ 0,    (2)

is equivalent to the scaled problem

minimize (Sc)⊺y subject to (RAS)y = Rb, y ≥ 0,    (3)

whose solution y corresponds to the original variables via x = Sy. A short proof of this is in Appendix A. While being equivalent, problem 3 may have better numerical properties than the original problem 2. In the following we consider the numerical range and the condition number of A as two factors that decide whether an LP has good or bad numerical properties. The numerical range of an LP, max_{x ∈ A,b,c} |x| / min_{x ∈ A,b,c} |x|, impacts the performance of the solution methods. The reason is that most solvers use absolute tolerances to compare numbers. For instance, a central parameter of the Gurobi solver is the feasibility tolerance τ_f, set by default to τ_f = 10^−6. To check if a point x satisfies some constraint A_i x ≤ b_i, Gurobi checks whether A_i x − b_i ≤ τ_f. If such computations are on the order of 10^10, then the relative error of 10^−16 induced by floating-point arithmetic is on the order of the tolerance τ_f. In other words, feasibility can no longer be reliably computed. Similarly, if some computation is on the order of 10^−6, the result will likely be meaningless to Gurobi. If our model includes both very large and very small numbers, it is likely that both types of problems appear during the solution process. It is thus desirable to limit the numerical range in our model formulation. Note that this applies equally to both simplex and barrier methods.
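The loss of precision at large magnitudes is easy to reproduce in isolation: in double precision, the spacing between adjacent floats near 10^10 is about 2 × 10^−6, the same order as a 10^−6 feasibility tolerance (a generic floating-point illustration, not Gurobi-specific code):

```python
import numpy as np

# Spacing between adjacent double-precision floats near 1e10
print(np.spacing(1e10))  # ~1.9e-06, the same order as a 1e-6 tolerance

# A constraint residual of 1e-6 survives at magnitude 1, ...
residual_small = (1.0 + 1e-6) - 1.0
# ... but is distorted beyond the tolerance at magnitude 1e10
residual_large = (1e10 + 1e-6) - 1e10

print(residual_small)  # ~1e-6, as expected
print(residual_large)  # ~1.9e-6: off by almost a full tolerance
```

A residual that a solver should classify as "just feasible" can thus be reported as nearly twice the tolerance, purely due to rounding.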
A different notion is the condition number κ(M) := ‖M‖ ‖M^−1‖ of a matrix M. Intuitively, the condition number of a matrix measures the effect of small rounding errors when solving the linear system of equations associated with the matrix: for a matrix with a small condition number, the rounding errors will barely impact the solution; for a matrix with a large condition number, the effect of the rounding errors can be significant. Note that both simplex and barrier rely heavily on solving linear systems of equations. In the case of simplex, the relevant matrices are the square submatrices of A corresponding to the visited bases of A. In the case of barrier, the important matrices are the ÂÂ⊺ that are inverted in each step (see Table 1). Both factors, the numerical range and the condition number, thus impact the accuracy of the computations performed during both simplex and barrier, and both are affected by scaling. The relationship between numerical range and condition number, however, is complicated: improving one may make the other worse.
We elaborate more on this in Appendix B. The accuracy of computations impacts the performance of these algorithms in multiple ways. First, the algorithms make decisions based on numerical computations. For instance, simplex chooses the next basic feasible solution based on some metric which is computed numerically. Large errors in the computation of this metric may cause simplex to make a suboptimal choice and increase the number of steps necessary to reach optimality. Second, to avoid the previous issue, solvers have ways to detect and deal with loss of precision. One way is switching to a numeric data type with higher precision. Such types incur a higher computational cost for each operation and thus slow down the algorithm overall (a secondary impact is the higher memory cost of higher-precision data types, which in itself can be significant for very large models).
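The effect of row scaling on the condition number can be illustrated with a small invented matrix whose rows differ by six orders of magnitude:

```python
import numpy as np

# An invented, badly scaled matrix: rows differ by six orders of magnitude
M = np.array([[1.0, 2.0],
              [1e6, 3e6]])
print(np.linalg.cond(M))         # on the order of 1e7

# Scale the second row by 1e-6 (a row scaling factor r_2 = 1e-6)
R = np.diag([1.0, 1e-6])
M_scaled = R @ M
print(np.linalg.cond(M_scaled))  # ~15: orders of magnitude smaller
```

Here scaling helps dramatically; as noted above, the opposite can also happen, and [27] documents cases where scaling increases the average condition number of simplex basis matrices.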

An automated scaling method
Both the numerical range and the condition number of a matrix are affected by scaling rows and columns of the matrix. While reducing the numerical range with scaling is straightforward, reducing the condition number of all relevant matrices that are encountered during simplex and barrier is not. In the case of simplex, [27] compares many different scaling methods and their effect on the condition number. For certain problems, scaling can lead to an increase of the condition number averaged over all basis matrices encountered during a simplex run.
We have developed an automatic scaling method that focuses on minimizing the numerical range of the input data and ignores the issue of the condition number (we refer to this method as "auto-scaling" from here on). The basic idea is to group input values by type (such as area, cost, energy, etc.) and scale all values in one group with the same factor. The assumption is that values of the same type will often be contained within an acceptable range. This approach corresponds to choosing suitable units (such as m², dollars, kWh, etc.) for each type of quantity. Apart from being a natural approach to reducing the numerical range, grouping values together this way considerably reduces the search space for good scaling factors. We formulate the choice of good scaling factors as a small, auxiliary optimization problem. Details are given in Appendix C.
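The grouping idea can be sketched as follows. This is a simplified stand-in for the method of Appendix C: instead of solving the auxiliary optimization problem, each group is scaled by a power of two derived from the geometric mean of its values (group names and numbers are invented):

```python
import math

def group_scale_factors(groups):
    """One power-of-two scale factor per quantity type, centring each
    group's values around 1 (powers of two avoid extra rounding error)."""
    factors = {}
    for name, values in groups.items():
        mean_log2 = sum(math.log2(abs(v)) for v in values) / len(values)
        factors[name] = 2.0 ** (-round(mean_log2))
    return factors

# Invented input data, grouped by physical quantity
groups = {
    "cost":   [2.0e6, 8.0e6, 5.0e5],     # e.g. dollars
    "energy": [1.5e-3, 4.0e-3, 9.0e-4],  # e.g. flows in ill-chosen units
}

factors = group_scale_factors(groups)
scaled = [v * factors[g] for g, vs in groups.items() for v in vs]
flat = [v for vs in groups.values() for v in vs]

def num_range(vals):
    return max(abs(v) for v in vals) / min(abs(v) for v in vals)

print(num_range(flat), num_range(scaled))  # range shrinks by many orders
```

Choosing one factor per group is exactly the "choose better units" intuition: within each group the spread is untouched, but the offset between groups is removed.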

Experimental Procedures
We consider models generated by the Calliope modelling framework [29] and want to investigate (1) which algorithm solves these problems in the least amount of time, (2) what the trade-offs are when choosing between a basic feasible solution and an interior solution, and (3) what the impact of our scaling approach on the solution time is. The models used for the experiments are specified in Table 2.

Table 2: Calliope models used in experiments. BangaloreLP is the same as the Bangalore model but without integrality constraints [39]. Note that the Euro model by default includes manual scaling to improve performance; for the experiments here we undo this manual scaling.
All models are implemented with the Calliope modelling framework. We will often add a postfix to model names to indicate the time range, e.g. Euro 15d and Euro 5m refer to the Euro-Calliope model run over a 15-day and a 5-month time range respectively, starting from January 1, always at a time step resolution of 1 hour. Note that the Bangalore model is set in a leap year while Euro and UK are not. Because we set the time ranges by date, the selected time range in the Bangalore model is often one day longer than in the other two models.
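The one-day offset can be checked directly (a generic illustration with Python's datetime module, not Calliope code; the years are chosen only as examples of a non-leap and a leap year):

```python
from datetime import date

# The same calendar range covers one day more in a leap year
days_common = (date(2015, 6, 30) - date(2015, 1, 1)).days  # common year
days_leap = (date(2016, 6, 30) - date(2016, 1, 1)).days    # leap year
print(days_common, days_leap)  # -> 180 181
```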
We run all benchmarks on the Euler cluster of ETH Zurich. All experiments where solution times are measured are performed on the same setup: compute nodes with two 18-core Intel Xeon Gold 6150 processors (2.7-3.7 GHz) and 192 GB of DDR4 memory clocked at 2666 MHz. Each experiment was run alone on a full node, i.e. taking up all 36 cores, to prevent competing processes from other cluster users from affecting the solution time. Note, however, that the number of cores does not correspond to the number of software threads: except where explicitly noted, we set the number of software threads for the solvers to 4. To diagnose numerical issues in our models we closely analyze the logs produced by the solver. This methodology is showcased, for instance, in [26].
Solver choice. We compare the suitability of the commercial Gurobi solver and the open-source Coin-OR Clp/CBC solver for our problems. We compare both simplex and barrier methods of these two solvers. For Gurobi specifically, we also investigate whether we can improve solution times through solver parameters. Other authors have performed more thorough solver comparisons on a wider class of problems. The authors of [40] observe that the performance of MILP solvers is non-deterministic and is often subject to large variation due to seemingly unimportant changes: different machines, compilers, and libraries can all lead to vastly different performance. We observe the same behaviour for Gurobi's LP solvers, mostly in the crossover phase (including the final simplex cleanup). To account for this variability in our benchmarks, we perform each measurement several times, changing only the random seed value of Gurobi. For each benchmark we indicate the number of repetitions and the average and min/max solution times.
Interior vs. basic solution. As discussed in Section 4.1, the barrier method run by itself returns an interior solution which is approximately optimal, whereas the simplex method returns a basic feasible solution. When using the barrier method, it is possible to run the crossover method to find a basic feasible solution, taking the interior solution as the initial value. We consider the up- and downsides of using an interior solution as compared to obtaining a basic solution.
We consider two questions: (1) how quickly we can obtain both interior and basic feasible solutions, and (2) how "good" either solution is. To answer (1) we perform benchmarks for each of the solution methods and on several models. To answer (2) we analyze the returned solution in terms of several relevant metrics: solution convergence, objective function value, and the fraction of non-zeros in the solution. We then discuss how interior and basic solutions can imply different policy decisions.
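These comparison metrics are straightforward to compute from two solution vectors; a sketch (toy vectors invented, using a 10^−10 zero threshold and the signed relative error of the interior objective against the basic one):

```python
import numpy as np

def solution_metrics(x_basic, x_interior, obj_basic, obj_interior, tol=1e-10):
    """Compare a basic feasible and an interior solution of the same model."""
    nz_basic = np.mean(np.abs(x_basic) > tol)      # fraction of non-zeros
    nz_interior = np.mean(np.abs(x_interior) > tol)
    eps = (obj_interior - obj_basic) / obj_basic   # signed relative error
    return nz_basic, nz_interior, eps

# Invented toy vectors: the interior point spreads flow over more variables
x_basic = np.array([2.0, 0.0, 0.0, 1.0])
x_interior = np.array([1.2, 0.8, 1e-12, 1.0])
nzb, nzi, eps = solution_metrics(x_basic, x_interior, 3.0, 3.000003)
print(nzb, nzi, eps)  # fractions 0.5 and 0.75; eps of roughly 1e-6
```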
When interpreting the solution to a model we are mostly interested in the following Calliope decision variables:

• capacity: indicates for each location and technology the installed capacity (from now on denoted cap).

• carrier production: indicates for each time step, location and technology how much of a given carrier is produced (from now on denoted prod).

• systemwide levelised cost: indicates the total per-unit cost paid for each carrier across the entire energy system (from now on denoted lcoe).
Scaling. As discussed in Sections 4.1 and 4.2, energy system models often have large numerical ranges, and it is desirable to apply scaling to avoid numerical issues. The models we examine are often numerically problematic, and we investigate whether our auto-scaling method as developed in Appendix C can effectively mitigate these problems. To do so, we benchmark the solution time of various models with and without auto-scaling and quantify the effects of scaling.

Solver choice
In this comparison, we only consider pure LP models without integrality constraints, using the Gurobi and Coin-OR solvers, for primal simplex, dual simplex, and the barrier method with and without crossover. For each model we create instances of different sizes by varying the number of time steps. The shortest time frames are of little practical relevance but are included here to give a more complete picture of solver capabilities.
We aim to configure each solver and algorithm equivalently despite the inherent differences in configurability between Coin-OR's and Gurobi's solvers: in particular, we instruct each solver to use 4 threads, and we set optimality and feasibility tolerances to 10^−5 (changing from the 10^−6 default in Gurobi). We group our observations in the following paragraphs roughly by solver.
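This kind of cross-algorithm comparison can be reproduced at small scale with any solver exposing both algorithm families; for instance, SciPy's HiGHS interface (not one of the solvers benchmarked here) lets one select dual simplex or interior point explicitly on the same randomly generated, feasible LP:

```python
import numpy as np
from scipy.optimize import linprog

# A random but guaranteed-feasible equality-constrained LP
rng = np.random.default_rng(0)
n = 40
c = rng.uniform(1.0, 2.0, n)
A = rng.uniform(0.0, 1.0, (5, n))
b = A.sum(axis=1) / 4.0            # x = 0.25 * ones is feasible by construction

kwargs = dict(A_eq=A, b_eq=b, bounds=(0, None))
res_ds = linprog(c, method="highs-ds", **kwargs)    # dual simplex
res_ipm = linprog(c, method="highs-ipm", **kwargs)  # interior point

# Both methods agree on the optimal objective up to tolerances
print(res_ds.fun, res_ipm.fun)
```

On a toy instance the two methods are indistinguishable in runtime; the differences discussed below only emerge at realistic model sizes.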

Coin-OR barrier
As an important caveat we must note that the barrier implementation of Coin-OR is meant as a baseline implementation which the user is supposed to extend with problem-specific implementations. In particular, it is stated on the Clp homepage that "the sparse factorization [of barrier] requires a good ordering algorithm, which the user is expected to provide (perhaps a better factorization code as well)." We nevertheless use this baseline implementation for our comparison, and perhaps unsurprisingly, Coin-OR's default barrier algorithm struggles with all but the smallest problems (Table 3). Beyond a certain size, Coin-OR fails to perform a single barrier iteration and either times out (error condition (2)) or chooses to run the simplex method instead of barrier (error condition (1)). Even for small model sizes, the barrier method of Coin-OR is not competitive with any of the other algorithms.
Primal and dual simplex

The primal and dual simplex implementations of Gurobi and Coin-OR each seem to be comparable in their capabilities, but Gurobi is generally slightly better (Table 3). Both dual simplex implementations solve almost the same set of problem instances, each with comparable solution times; the same holds for the two primal simplex implementations. Generally, dual simplex seems to be at least as good as primal simplex (with some exceptions for Gurobi, for instance UK 181d). In Table 3, Bar denotes the barrier method, Crossover barrier+crossover, and Primal and Dual the primal and dual simplex methods, respectively. Each solution time is the average of two runs. Runs that did not terminate successfully within 4 hours and with 72 GB of RAM are indicated with −; in those cases, we indicate the returned error: (1) wrong algorithm, (2) timeout, (3) solver error. See the text for more detailed information on what these errors mean.
Gurobi barrier methods

Gurobi's barrier method, with and without crossover, solves all problem instances and more instances than all other solvers (Table 3). Except for the smallest problem instances, barrier methods have the shortest solution times. Turning on crossover has an unpredictable effect on runtimes: in some cases the solution time is barely affected (e.g. UK 181d), in other cases it triples (e.g. BangaloreLP 182d and Euro 181d).

To crossover or not
We observe that the barrier method often solves models much faster than simplex methods do, but that the crossover step can have a significant impact on the total solution time. It is tempting to conclude that disabling crossover is therefore an easy way to dramatically improve solution times. However, time to solution alone is not sufficient to choose a method, because the solutions returned by barrier and those returned by simplex or barrier+crossover are inherently different. In this section we thus compare the interior solutions obtained by running barrier without crossover with the basic feasible solutions obtained by running either the simplex method or barrier+crossover. Table 4 compares solutions to various Calliope models that were obtained using barrier and barrier+crossover. The models considered in Table 4 are all slight variations of the models listed in Table 2. These variations were obtained by varying the costs of certain technologies and, in the case of the Euro model, by reducing the size of the network. The concrete modifications performed are described in Appendix E.
Solutions are compared with respect to the objective value obtained, the time it took to find the solution, and the fraction of non-zero values in the solution. In this section we consider all numbers with absolute value ≤ 10^−10 to be zero. For the objective value we additionally list the signed relative error ε of the interior solution as compared with the basic feasible solution.

First, we note that Bangalore 2 181d was not successfully solved by the barrier method alone. Closer investigation shows that this happens consistently for this model instance, while it never happens for the closely related model Bangalore 1 181d. The two models are identical apart from slightly different costs associated with technologies. As described in Appendix E, Bangalore 2 181d scales the cost contribution of carbon by 0.365 compared to the original model and by a factor of 26 compared to Bangalore 1 181d. Moreover, the Bangalore model associates dummy carbon costs with all technologies, whose sole purpose is to prevent the solver from allocating unused capacities. The use of these technologies is otherwise unbounded. The combined effect of these modelling decisions is that Bangalore 2 181d contains a set of unbounded variables whose cost contribution is almost zero (and lower than in other instances of this model). Geometrically, this formulation leads to unbounded faces that are almost "flat", i.e. even distant points on the face have almost the same cost.
It is known that unbounded optimal faces may lead to numerical issues for barrier methods [41]. Indeed, associating slightly higher costs with electricity transmission in Bangalore 2 181d resolves the issue entirely: barrier consistently solves the modified problem to optimality. This case illustrates that barrier can be more susceptible to numerical issues than barrier+crossover. We will return to this insight in Section 7, where we develop guidelines for model formulation.
Next, we observe that the interior solution is generally slightly worse than the basic feasible solution. However, the differences are negligible in all cases. Moreover, we recall that a better approximation can be obtained by tightening the barrier convergence tolerance parameter.
Barrier+crossover always performs additional work compared to barrier alone, so obtaining a basic feasible solution takes longer than obtaining an interior solution. A counter-example to this intuitive rule is UK 1 180d, where obtaining an interior solution took longer than obtaining a basic feasible solution. This is most likely due to the solution time variability discussed in Section 5.
As expected, we observe that the fraction of non-zeros is significantly higher in interior solutions: in many cases the interior solution contains twice as many non-zero decision variables as the basic feasible solution, and almost all interior solutions have more than half of their decision variables away from their bounds. Table 5 breaks down the non-zeros in the Bangalore 1 181d model by decision variable. Decision variables in Calliope can have multiple dimensions, e.g. the decision variable energy cap is a vector of installed technology capacities with one entry for each combination of technology and location. Table 5 highlights why a basic feasible solution might be "nicer" than an interior solution: the interior solution allocates many more technologies that produce and store electricity (energy cap, storage cap > 0). It also schedules "facilities" to consume and produce energy in more distinct timesteps (carrier prod, carrier con > 0). Clearly, both interior and basic feasible solutions achieve the necessary allocation of carriers at (almost) identical cost, but the actual energy system configurations they imply are quite different.
As discussed in Section 5, the most central variables of Calliope energy system models are the installed technology capacities (cap) and the energy "production" amounts (prod), in addition to the cost parameters which directly and indirectly control these two variables. Figure 2 shows prod for the interior and basic feasible solutions of the Bangalore 1 181d model side by side. More precisely, it shows the production at "facilities" summed over all time steps. Figure 2 suggests an intuitive interpretation of the differences between the interior and basic feasible solution: in the interior solution, more electricity is produced centrally at location F and then distributed to many other locations via electricity lines. These other locations, in turn, produce less energy themselves. This result arises because the cost of electricity line capacity is very low and their usage is free. While a basic solution has as many variables as possible set to zero and thus allocates and uses fewer technologies, an interior solution will in general have all variables that do not have an associated cost away from zero. In this example, electricity lines are used excessively in the interior solution because they incur no cost.
In this case the basic feasible solution is arguably more interpretable and more realistic than the interior solution. Likely, this effect is pronounced because of the degree of degeneracy of the model we chose: solutions with almost-optimal cost can be very different.

Table 5: Fraction of non-zero elements in the decision variables of the interior and basic feasible solutions of Bangalore 1 181d. length is the number of entries of the decision variable; the fractions of non-zero elements for the basic feasible solution and the interior solution are indicated as basic and inter, respectively.

Automatically improving scaling
The major indicators of numerical problems we find across all three examined models are:
1. Crossover makes very slow progress and sometimes needs to be restarted.
2. Simplex makes only very slow progress on the objective value or jumps wildly in the solution space.
3. Barrier returns a sub-optimal objective.
4. The solver detects numerical issues and tries to counteract them by switching to higher precision, dropping variables from the current basis, or tightening the Markowitz tolerance.
(1) and (2) often occur for the Euro model. (3) is common for the Bangalore model. We conclude that these models exhibit numerical issues that need to be addressed. The UK model, on the other hand, seldom shows any of these issues. We find that our autoscaling approach can significantly reduce the solution time of the barrier+crossover method for numerically difficult problems. Figure 3 shows the average solution time of the Euro model for different time frames. Scaling considerably reduces the solution time. Runs that did not successfully converge were not included in the average solution time; instead, their number is indicated in brackets. We notice that autoscaling not only reduces average solution time, but also leads to more regular solution time behaviour and reduces the number of times the algorithm does not converge. However, there are two large outliers among the scaled solution times that did not converge in time. This behaviour seems surprising because in both cases the remaining four runs terminated quickly and with similar solution times. We recall that only the random seed changes within a set of 5 runs of the same model instance. We will discuss this issue of extreme outliers due to scaling below.
Figure 4 shows the effect of autoscaling on Euro 6m, UK 12m and BangaloreLP 6m. We solve each model using Gurobi's barrier+crossover method both with and without autoscaling enabled. In the Euro 6m model the solution time improves by about 3x when applying autoscaling. In BangaloreLP there is still a noticeable improvement in solution time after scaling. In the UK model the solution time is hardly affected. This supports our hypothesis that the UK model is generally well formulated already and does not cause numerical issues. Looking at the numerical range before and after scaling of the three model instances in Table 6 shows that the original Euro 6m model has a numerical range close to 16 orders of magnitude. As discussed in Section 4.2, this may very well be the cause of grave numerical issues. Moreover, scaling achieves a significant reduction in the range of the Euro 6m model. The other two model instances, however, have more moderate numerical ranges, and scaling achieves a much smaller improvement of the numerical range, which explains why the solution time for these problems is less affected. Gurobi recommends numerical ranges of at most 10^9, further supporting these conclusions.
Improvement of κ′ by scaling In Figure 4 we report the fraction of the solution time spent in each phase of the barrier algorithm (to be precise, the different sections of each bar represent the fraction between the average of the corresponding phase and the average of the total solution time). We split the crossover phase into its two parts: crossover, where a basic point (a vertex) close to the interior solution is found, and simplex, where this basic point is re-optimized using simplex steps. Interestingly, for the Euro and Bangalore models different phases of the algorithm are affected by scaling: Euro 6m gains most in the crossover phase (red), whereas for BangaloreLP 6m the crossover phase with scaling takes even longer than without scaling. BangaloreLP 6m instead gains most in the simplex phase (cyan). The results also suggest that the barrier phase is generally least sensitive to scaling: whereas the crossover and simplex phases often speed up noticeably, the barrier phase usually shows only very minor improvement.
An important insight is that every phase of the algorithm can encounter numerical difficulties, and that the exact nature of the problems leading to deteriorating solution time is subtle. In particular, a measure that improves the solution time of one phase may badly affect the solution time of another phase. Finding a scaling that always works and never deteriorates solution time proved to be a major difficulty in designing a suitable autoscaling approach.

Performance variability caused by scaling
As seen above, while scaling often reduces the average time to solve a specific model using the barrier+crossover method, it seems to increase the probability of experiencing extremely long solution times for certain models (consider for instance the outlier behaviour in Figure 3). In particular, we regularly observe that the final simplex phase does not converge. As described above, there are two different flavours of this: either simplex simply stops making progress, or it jumps wildly in the solution space. The latter seems to be a strategy Gurobi applies to deal with stalling progress in the simplex solver. An intuitive explanation for why scaling might induce this behaviour is given by [27] and briefly discussed in Section 4.2: while scaling can improve the average condition number, it can increase the condition number of certain bases in the matrix. [27] gives an illustrative example where scaling increases the condition number of all relevant bases in an LP by an arbitrarily large amount. As the bases encountered during simplex are subject to randomness, this may explain why scaling in some cases leads to very long solution times or even non-convergence.
In the following, we want to quantify this effect. Pragmatically, to judge whether autoscaling is still useful we want to answer the following questions:
1. What is the average time it takes to solve a given model?
2. How often will we encounter outlier cases with very long solution times or timeouts?
We have already seen that scaling can effectively improve the average solution time of a model. To address the second question, we perform the following experiment: we consider the Euro 6m model, which exhibits the outlier behaviour discussed above, and run it 100 times with scaling and 100 times without scaling. We then evaluate the empirical distribution of these solution times.
Figure 5 shows the histogram of runtimes of both the scaled and the unscaled runs. The runtimes at the right end correspond to timed-out runs. While the runtimes of the scaled model have a 40% lower mean than those of the unscaled model, their variance is 25% higher. This supports our hypothesis that scaling increases the probability of "exceptionally long" solution times; in other words, a distribution fitted to the scaled solution times would have a heavier tail than a distribution fitted to the unscaled solution times.
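This kind of mean/variance/tail comparison can be reproduced from raw solution times with standard library tools. The runtime samples below are synthetic and only mimic the qualitative pattern described here (lower mean, higher variance, heavier tail for the scaled runs).

```python
# Sketch: summary statistics of solution times, as compared in Figure 5.
# The runtime samples (in seconds) are synthetic, for illustration only.
import statistics

scaled = [280, 290, 285, 295, 300, 1200]    # one extreme outlier
unscaled = [500, 520, 480, 510, 490, 505]

for name, times in [("scaled", scaled), ("unscaled", unscaled)]:
    mean = statistics.mean(times)
    var = statistics.variance(times)
    # empirical tail: fraction of runs slower than twice the median
    tail = sum(t > 2 * statistics.median(times) for t in times) / len(times)
    print(f"{name}: mean={mean:.0f}s variance={var:.0f} tail={tail:.2f}")
```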

Discussion: guidelines for modelling
We find that barrier is generally superior to simplex for the kinds of models investigated here, and barrier alone is faster than barrier+crossover, as the latter performs strictly additional work. The difference in solution time, however, is unpredictable and can be large or barely noticeable depending on the model instance. Barrier also sometimes fails to converge. In particular, barrier encounters numerical difficulties in models with unbounded (large) optimal surfaces. Such cases can potentially be remedied if crossover is run after barrier. However, we find it difficult to answer the question of whether or not running crossover is worth the additional effort of finding a basic feasible solution. Interior solutions inherently have a larger number of non-zero values in their decision variables compared to a basic feasible solution. This may negatively affect the interpretability of results, and it can imply very different real-world system designs than those implied by a basic feasible solution, especially in models in which very different system designs are close to optimal. For instance, the interior solution of one Bangalore model produced more electricity in a centralized location and distributed it via transmission links compared to the basic feasible solution. Whether to invest the additional effort of crossover must likely be decided on a case-by-case and model-by-model basis, which underscores once again the general problem with relying on a single, "optimal" result [12,13,42].
To enable the use of crossover without excessive solution times, we find that model scaling can be helpful; our autoscaling approach often reduces the average solution time of barrier+crossover significantly. This reduction generally happens in the crossover and simplex phases. From this, we extrapolate that autoscaling has a negligible effect on the solution time of barrier alone but that it improves the average solution time of the simplex method. We also find that autoscaling increases the probability of solution time outliers: we see more exceptionally long solution times when using autoscaling. This problem requires further investigation in future work. However, as autoscaling almost always yields shorter solution times and higher convergence rates than our unscaled base cases, it seems to be a no-regret solution in terms of solution time for barrier+crossover.
A separate issue which is not easily addressed with scaling is the problem of cost-free technologies. The model Bangalore 2 181d consistently failed to converge on a solution using the barrier method (Table 4). For this model, we found that two small modifications to the original model formulation each lead to successful optimization by barrier: increasing the cost of installed capacity of electricity transmission technologies by 2 orders of magnitude, and upper-bounding the energy capacity of electricity transmission. Both measures counteract the numerical problems caused by unbounded optimal faces, each in a different way, supporting our hypothesis that these problems are due to unbounded optimal faces. It is thus advisable to avoid model formulations with free and unbounded technologies. There are, however, cases in the real world where some technologies are virtually free to operate or are free to install given the system scope. Electricity transmission is one example: its operation cost can be considered negligible compared to the cost of installation, and sizing it might be outside the scope of the problem altogether. In these cases it seems unavoidable to resort to dummy costs, but the modeller should be aware that it is non-trivial to set these dummy costs in a way that represents the real world whilst also being high enough to avoid unbounded optimal faces. If updates to the algorithm, model scaling, and removal of cost-free technologies do not make a model tractable, adjusting solver tolerances may still help. Since solver authors made certain assumptions about the input models they will need to solve when setting default tolerances, these may not be optimally set for the numerical range or required accuracy of energy system models. Choosing the right value for tolerances is hard, and the underlying problem is usually not solved by modifying tolerances, so doing so is only advisable in rare cases. Three key tolerances are Feasibility, Optimality and Barrier-Convergence. They all control how tightly some inequality must be fulfilled. Tightening the Optimality and Barrier-Convergence tolerances will lead to a better objective value in the solution returned by the simplex/barrier+crossover methods and barrier methods, respectively. However, tightening these tolerances too much may lead to longer solution times and (in the case of barrier) to non-convergence. The Feasibility tolerance controls how strictly constraints need to be satisfied for a solution to be feasible. Loosening these tolerances may lead to faster convergence of simplex and barrier+crossover methods. However, this usually does not magically resolve all difficulties with numerically challenging problems. It also requires careful experimentation to ensure results do not violate physical properties of the system being described (e.g., negative stored energy).
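As an illustration, these tolerances map onto named solver parameters; the sketch below uses Gurobi's parameter names, assuming an already-built gurobipy model `m` (not shown). The values are illustrative only, not recommendations.

```python
# Sketch: adjusting the key tolerances via Gurobi parameters.
# Assumes an existing gurobipy model `m`; values are illustrative only.
m.setParam("Method", 2)             # 2 = use the barrier algorithm
m.setParam("Crossover", 0)          # 0 = disable crossover (interior solution)
m.setParam("BarConvTol", 1e-9)      # tighter barrier convergence tolerance
m.setParam("OptimalityTol", 1e-6)   # simplex optimality (dual feasibility)
m.setParam("FeasibilityTol", 1e-5)  # loosened constraint satisfaction
m.optimize()
```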
In the context of these results, we can formulate the following guidelines to improve the computational performance of typical energy system optimisation models:
• On large and difficult models, manually select the barrier method or barrier+crossover method. Prefer the latter if you suspect your model formulation to be numerically unstable or if you want a minimal solution in terms of technology allocations.
• Use appropriate units that minimize the model's numerical range, or apply an automatic scaling procedure like the one we introduce here to derive such units automatically.
• Be wary of model formulations with cost-free technologies and dummy costs, as those can dramatically worsen the numerical properties of the model and thus increase solution time.
• Know the basic solver tolerance settings for your chosen solver and adjust them if necessary; however, this should usually be the very last resort.
Ultimately, more systematic work to understand the properties of energy system models could help them provide better decision support, for example by making it possible to rapidly explore large numbers of scenarios or alternative solutions even in models that depict the system with high spatial and temporal detail. Promising avenues are custom solvers that exploit these model properties [43], or even solvers that exploit the fact that a single optimal solution is not necessarily useful; a range of near-optimal solutions, for example extracted from an interior-point algorithm, could be just as relevant for real-world applications [12,13,42]. To complement this, more systematic work is needed on the difference between basic feasible and interior solutions and the implications of these differences for the resulting energy system designs. While black-box solvers like the ones examined here continue to be developed and become more powerful, the energy modelling community could reap many practical benefits if such work can improve solution times by multiples or even orders of magnitude, as some of our explorations here suggest may be possible.

B Condition number and numerical range

Consider the diagonal matrix

diag(1, ε)    (7)

which has arbitrarily large condition number and numerical range 1/ε. Scaling the second row of (7) with 1/ε yields the identity matrix, which has both condition number and numerical range of 1.
At first sight, it may seem as though minimizing the numerical range will always also reduce the condition number. The next example illustrates that this relationship does not hold in general. Consider, for example, the matrix with rows (1, 1) and (1, ε). Note that if ε = 1 the numerical range of the matrix is 1, as small as it can get. At the same time, the matrix is then singular, i.e. its condition number is ∞. Letting ε go to 0 will continually increase the numerical range while decreasing the condition number of the matrix. It follows that we cannot in general optimize both the condition number and the numerical range simultaneously.
A second problem with improving the condition number of LPs is that it is not possible to improve the condition number of all bases simultaneously: improving the condition number of one basis may adversely affect the condition number of another basis. Since it is unknown a priori which bases of A will be visited in the course of the simplex algorithm, it is possible that a scaling method has a negative effect on the bases actually encountered during simplex. [27] investigates the effects of many mainstream scaling methods on the condition number in a large set of practical linear programs.
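The toy examples above can be checked numerically. The sketch below (pure Python; the helper names are our own) computes the numerical range of a small matrix and reproduces the trade-off: row scaling can remove a large numerical range, while a numerical range of 1 does not rule out a singular, infinitely ill-conditioned matrix.

```python
# Sketch: numerical range vs. condition number on 2x2 toy matrices.
# Matrices are stored as nested lists; helper names are illustrative.

def numerical_range(A):
    """Ratio of the largest to the smallest non-zero absolute entry."""
    vals = [abs(a) for row in A for a in row if a != 0]
    return max(vals) / min(vals)

def det2(A):
    """Determinant of a 2x2 matrix (zero means singular, condition = inf)."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

eps = 2.0 ** -20                           # a power of two, exact in floats
A = [[1.0, 0.0], [0.0, eps]]               # numerical range and condition 1/eps
A_scaled = [[1.0, 0.0], [0.0, eps / eps]]  # second row scaled by 1/eps
print(numerical_range(A))                  # 1048576.0 (= 2**20)
print(numerical_range(A_scaled))           # 1.0 -- the identity matrix

B = [[1.0, 1.0], [1.0, 1.0]]               # range 1, yet singular
print(numerical_range(B), det2(B))         # 1.0 0.0
```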

C Scaling Method
In the following we describe the automatic scaling method we developed for energy system models formulated with Calliope, which we refer to as autoscaling. As discussed in Appendix B, minimizing the condition number of the constraint matrix A, apart from being hard, may not be desirable. [28] distinguishes between:
• Optimal scaling methods that minimize a certain metric on the constraint matrix A.
• Empirical scaling methods, such as equilibration, that are sometimes found to work well in practice but don't provide any guarantees about their result.
For optimal scaling methods, [28] lists two possible objectives: minimizing the numerical range of the scaled constraint matrix (9), and minimizing the sum of squared logarithms of the absolute values of the scaled entries (10). There are algorithms that approximately solve both (9) [44] and (10) [45]. We devise our own approximation for (9) that incorporates additional domain knowledge about our models.
Most variables in Calliope are associated with some physical quantity such as energy or cost. We call this the type of the quantity. Values of different types can be of vastly different orders of magnitude. This is often a side effect of representing certain types of quantities (cost, energy, etc.) with standard units (dollars, kWh, etc.), regardless of the range of their values. Informally, autoscaling automates the choice of suitable units for all types of quantities. It can therefore decrease the numerical range between different types of quantities but not within the same type. Note that changing the unit of a type of quantity is a special case of the scaling considered in Section 4.2 with R = I and S = diag(f_{u_i}), where u_i for i ∈ [n] is the unit of the i-th decision variable and f_u is the scaling factor associated with unit u.
In the context of Calliope, it is necessary to distinguish between base units and derived units. The sets U of base units and V of derived units are related as follows: every derived unit v ∈ V can be written as a fraction v = u_i/u_j of base units u_i, u_j ∈ U. This distinction is necessary because Calliope models often feature combinations of base units such as cost per energy. Scaling energy by a factor s_e and cost by a factor s_c, we necessarily need to scale cost per energy by the factor s_c/s_e in order to retain consistency. Associating a special base unit 1 ∈ U with all quantities that do not correspond to a physical quantity allows us to represent all derived units v ∈ V of Calliope models as such fractions of base units. Let A_u be the set of values in the model that have unit u ∈ V. Our goal is to find scaling factors f_u for each unit u ∈ V that minimize the numerical range of the scaled model. Next, we discuss some considerations when choosing scaling factors and then explain how we actually compute them.
In particular, scaling factors should not push any value far below a threshold L, so that every value keeps a sensible absolute size.
Optimal scaling factors We now formulate an auxiliary optimization problem for finding scaling factors f_u, taking into account the results of the previous discussion: we wish to find scaling factors S = {f_u | u ∈ U} ⊆ {2^x | x ∈ Z} that are powers of two and that minimize the numerical range between different types of variables while avoiding scaling any variable below some threshold L. Note that all values in some column A_i of the constraint matrix A have the unit of the i-th decision variable. Thus denote by u_i, v_i ∈ U for each i ∈ [n] the base units satisfying that all values in column i of A have unit u_i/v_i. We thus wish to solve

min_{S ⊆ {2^x | x ∈ Z}}  max_{0≤i,j≤m, 0≤k,l≤n}  ((f_{u_k}/f_{v_k}) |A_{i,k}|) / ((f_{u_l}/f_{v_l}) |A_{j,l}|)

s.t.  (f_{u_k}/f_{v_k}) |A_{i,k}| ≥ L  for all non-zero entries A_{i,k}.
We can rephrase this as an integer LP by taking logarithms of the entries of A and of the scaling factors f ∈ S. That is, we define f_u := 2^{g_u} for each u ∈ U, and constrain g_u ∈ Z to ensure that all scaling factors are powers of two:

min r
s.t.  g_{u_k} − g_{v_k} + log(A_{i,k}) − g_{u_l} + g_{v_l} − log(A_{j,l}) ≤ r   ∀ i, j, k, l
      g_{u_k} − g_{v_k} ≥ log(L / A_{i,k})   ∀ k ∈ {0, …, n}, ∀ i ∈ {0, …, m}

Note that we do not actually need to consider all entries of A: it suffices to include the minimum and the maximum value of each unit. This simplification considerably reduces the number of constraints from n²m² to |U|², which is usually very small (below 100). Moreover, instead of solving the actual integer LP, we can relax the integrality constraint on g_u and round the resulting variables to the closest integer. This gives a 4-approximation of the optimal scaling factors. In practice we find that the integer LP is solved quite rapidly, so we keep the integrality constraint in place.
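The search for power-of-two scaling factors can be illustrated with a brute-force stand-in for the integer LP above (a sketch only; our actual implementation solves the LP). The unit names, value ranges, and threshold below are invented for illustration, and entries are grouped per derived unit using only their minimum and maximum, as described in the text.

```python
# Sketch: brute-force search for power-of-two scaling exponents g[u] that
# minimize the numerical range while keeping all scaled values >= L.
# Toy data with two base units plus the dimensionless unit "1".
from itertools import product

# (min_value, max_value) observed per derived unit, written as (num, den)
# of base units; all numbers are illustrative.
entries = {
    ("energy", "1"): (1e3, 1e5),      # e.g. energy caps in kWh
    ("cost", "energy"): (0.05, 0.2),  # e.g. cost per kWh
    ("1", "1"): (1.0, 1.0),           # dimensionless coefficients
}
L = 1e-2  # no scaled value may drop below this threshold

def scaled_range(g):
    """Numerical range of all entries after scaling by 2**g[u] per unit."""
    vals = []
    for (num, den), (lo, hi) in entries.items():
        factor = 2.0 ** (g[num] - g[den])
        vals += [lo * factor, hi * factor]
    if min(vals) < L:
        return float("inf")  # violates the lower-bound constraint
    return max(vals) / min(vals)

best = min(
    ({"1": 0, "energy": ge, "cost": gc}
     for ge, gc in product(range(-20, 21), repeat=2)),
    key=scaled_range,
)
print(best, scaled_range(best))
```

Since the energy entries alone span two orders of magnitude and share one factor, no scaling can push the overall range below 100 here; the search finds exponents achieving exactly that bound.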

D Markowitz tolerance
In computing the LU factorization of a matrix, the Markowitz tolerance controls which elements are sufficiently large to be used as pivots. An LU factorization of the constraint matrix A is computed at several different stages of the algorithm: during the simplex algorithm, in order to recompute the current basis [46][47][48], and during crossover, in order to construct a valid basis from scratch [34]. Choosing a large Markowitz tolerance means that many potential pivots are disregarded in order to ensure a numerically stable basis. When solving certain numerically challenging models, the crossover method will often need to restart with an updated value of the Markowitz tolerance. One might expect that setting a more conservative value for the Markowitz tolerance in the first place would prevent this, but we could not consistently verify this in our benchmarks.

E Model Variations in Experiments
In this section we describe the model variations used in Section 6.2.

Table 8
Battery cost controls the cost per energy capacity and the cost per storage capacity of batteries.
Zones varies the number of countries in the model.While the baseline model consists of 34 European countries, each making up one zone of the model, the variations consist of 20 countries and 1 country (Germany), respectively.
CO2 caps limits the total amount of CO2 produced in each location. The bound ranges from 1.2 Mt in Cyprus to 184 Mt in Germany.
Renewable shares requires the renewable share of total electricity consumption in each country to exceed a minimum value. This value varies per country and ranges from 11% in Luxembourg to 78% in Portugal.
Hydro reservoir controls whether the hydro reservoir technology can be used to store energy (yes) or not (no).
Data source describes what source was used in the model to control the resource constraints of renewables. The original model contains CSV time series for each of wind, PV and hydro. In the Average variation, these values were simply averaged over the whole time range.

E.2 UK Model
Imports sets the high-voltage energy import per zone.The value shown in the table is aggregated over all zones.
A Renewables share of X% forces the combined electricity production of wind onshore, wind offshore, pv rooftop, pv utility scale and hydro to make up at least X% of the total electricity.
New Nuclear allows for 17.3 GW of additional nuclear generation capacity.

E.3 Bangalore Model
Cost of carbon is the factor of the cost of emitted carbon in the objective function.

Figure 1 :
Figure 1: A · x = b in block-matrix notation, showing the typical sparsity pattern of the constraint matrix A of an energy system model. Variables x_i are specific to timestep i and are constrained using parameters τ_i and b_i. C x = b_c are constraints containing dependencies across time steps. Parameters v and variables z model time-independent quantities.

Figure 3 :
Figure 3: Average absolute solution time for the Euro model for time frames between 1 and 12 months, both with and without autoscaling, using barrier+crossover. The average was taken over 5 runs; the vertical bars show the maximum and minimum solution times. Runs taking more than 20h were aborted and not considered in either the average or the error bars. The number of aborted runs per instance is indicated in brackets next to the average solution time.

Figure 4 :
Figure 4: Normalized solution time of barrier+crossover for three models with autoscaling (left bar) and without autoscaling (right bar). The solution time is broken down into the four main phases of the algorithm. Each bar is the average of the solution times of 10 experiments. Absolute solution time in seconds is indicated above the bars of the unscaled models. Error bars were omitted to improve readability.

Figure 5 :
Figure 5: Histogram with 30 bins showing the runtimes of 100 runs each of the scaled and unscaled Euro 6m model.

Table 3 :
Solution times in seconds of different solvers on different model instances.

Table 4 :
Comparison of objective value, solution time and fraction of non-zeros of basic feasible (basic) and interior (inter) solutions of various Calliope models, using barrier+crossover (basic) and barrier only (inter), respectively. ε is the relative error of the interior solution compared with the basic feasible solution. Solution time is in seconds. Non-zeros denotes the fraction of decision variables with absolute value > 10^-10.

Table 6 :
Numerical range of the three model instances before and after scaling.