Introduction to Constrained Optimization in the Wolfram Language

Optimization problems.

solving constrained optimization problems

The following describes constrained optimization problems more precisely, restricting the discussion to minimization problems for brevity.

Global Optimization

solving constrained optimization problems

Local Optimization

solving constrained optimization problems

A local minimum may not be a global minimum. A global minimum is always a local minimum.

Solving Optimization Problems

solving constrained optimization problems

Wolfram Language functions for constrained optimization include Minimize , Maximize , NMinimize , and NMaximize for global constrained optimization, FindMinimum for local constrained optimization, and LinearOptimization for efficient and direct access to linear optimization methods. The following table briefly summarizes each of the functions.

Summary of constrained optimization functions.

Here is a decision tree to help in deciding which optimization function to use.

Related Tech Notes

  • Constrained Optimization
  • Linear Optimization
  • Numerical Nonlinear Local Optimization
  • Numerical Nonlinear Global Optimization
  • Exact Global Optimization

Enable JavaScript to interact with content and submit forms on Wolfram websites. Learn how

Help Center Help Center

  • Help Center
  • Trial Software
  • Product Updates
  • Documentation

Constrained Nonlinear Optimization Algorithms

Constrained optimization definition.

Constrained minimization is the problem of finding a vector x that is a local minimum to a scalar function f ( x ) subject to constraints on the allowable x :

min x f ( x )

such that one or more of the following holds: c ( x ) ≤ 0, ceq ( x ) = 0, A·x  ≤  b , Aeq·x  =  beq , l  ≤  x  ≤  u . There are even more constraints used in semi-infinite programming; see fseminf Problem Formulation and Algorithm .

fmincon Trust Region Reflective Algorithm

Trust-region methods for nonlinear minimization.

Many of the methods used in Optimization Toolbox™ solvers are based on trust regions, a simple yet powerful concept in optimization.

To understand the trust-region approach to optimization, consider the unconstrained minimization problem, minimize f ( x ), where the function takes vector arguments and returns scalars. Suppose you are at a point x in n -space and you want to improve, i.e., move to a point with a lower function value. The basic idea is to approximate f with a simpler function q , which reasonably reflects the behavior of function f in a neighborhood N around the point x . This neighborhood is the trust region. A trial step s is computed by minimizing (or approximately minimizing) over N . This is the trust-region subproblem,

The current point is updated to be x  +  s if f ( x  +  s ) <  f ( x ) ; otherwise, the current point remains unchanged and N , the region of trust, is shrunk and the trial step computation is repeated.

The key questions in defining a specific trust-region approach to minimizing f ( x ) are how to choose and compute the approximation q (defined at the current point x ), how to choose and modify the trust region N , and how accurately to solve the trust-region subproblem. This section focuses on the unconstrained problem. Later sections discuss additional complications due to the presence of constraints on the variables.

In the standard trust-region method ( [48] ), the quadratic approximation q is defined by the first two terms of the Taylor approximation to F at x ; the neighborhood N is usually spherical or ellipsoidal in shape. Mathematically the trust-region subproblem is typically stated

where g is the gradient of f at the current point x , H is the Hessian matrix (the symmetric matrix of second derivatives), D is a diagonal scaling matrix, Δ is a positive scalar, and ‖ . ‖ is the 2-norm. Good algorithms exist for solving Equation 2 (see [48] ); such algorithms typically involve the computation of all eigenvalues of H and a Newton process applied to the secular equation

1 Δ − 1 ‖ s ‖ = 0.

Such algorithms provide an accurate solution to Equation 2 . However, they require time proportional to several factorizations of H . Therefore, for large-scale problems a different approach is needed. Several approximation and heuristic strategies, based on Equation 2 , have been proposed in the literature ( [42] and [50] ). The approximation approach followed in Optimization Toolbox solvers is to restrict the trust-region subproblem to a two-dimensional subspace S ( [39] and [42] ). Once the subspace S has been computed, the work to solve Equation 2 is trivial even if full eigenvalue/eigenvector information is needed (since in the subspace, the problem is only two-dimensional). The dominant work has now shifted to the determination of the subspace.

The two-dimensional subspace S is determined with the aid of a preconditioned conjugate gradient process described below. The solver defines S as the linear space spanned by s 1 and s 2 , where s 1 is in the direction of the gradient g , and s 2 is either an approximate Newton direction, i.e., a solution to

or a direction of negative curvature,

The philosophy behind this choice of S is to force global convergence (via the steepest descent direction or negative curvature direction) and achieve fast local convergence (via the Newton step, when it exists).

A sketch of unconstrained minimization using trust-region ideas is now easy to give:

Formulate the two-dimensional trust-region subproblem.

Solve Equation 2 to determine the trial step s .

If f ( x + s ) < f ( x ) , then x = x + s .

These four steps are repeated until convergence. The trust-region dimension Δ is adjusted according to standard rules. In particular, it is decreased if the trial step is not accepted, i.e., f ( x + s ) ≥ f ( x ) . See [46] and [49] for a discussion of this aspect.

Optimization Toolbox solvers treat a few important special cases of f with specialized functions: nonlinear least-squares, quadratic functions, and linear least-squares. However, the underlying algorithmic ideas are the same as for the general case. These special cases are discussed in later sections.

Preconditioned Conjugate Gradient Method

A popular way to solve large, symmetric, positive definite systems of linear equations Hp  = – g is the method of Preconditioned Conjugate Gradients (PCG). This iterative approach requires the ability to calculate matrix-vector products of the form H·v where v is an arbitrary vector. The symmetric positive definite matrix M is a preconditioner for H . That is, M  =  C 2 , where C –1 HC –1 is a well-conditioned matrix or a matrix with clustered eigenvalues.

In a minimization context, you can assume that the Hessian matrix H is symmetric. However, H is guaranteed to be positive definite only in the neighborhood of a strong minimizer. Algorithm PCG exits when it encounters a direction of negative (or zero) curvature, that is, d T Hd  ≤ 0 . The PCG output direction p is either a direction of negative curvature or an approximate solution to the Newton system Hp  = – g . In either case, p helps to define the two-dimensional subspace used in the trust-region approach discussed in Trust-Region Methods for Nonlinear Minimization .

Linear Equality Constraints

Linear constraints complicate the situation described for unconstrained minimization. However, the underlying ideas described previously can be carried through in a clean and efficient way. The trust-region methods in Optimization Toolbox solvers generate strictly feasible iterates.

The general linear equality constrained minimization problem can be written

where A is an m -by- n matrix ( m  ≤  n ). Some Optimization Toolbox solvers preprocess A to remove strict linear dependencies using a technique based on the LU factorization of A T [46] . Here A is assumed to be of rank m .

The method used to solve Equation 5 differs from the unconstrained approach in two significant ways. First, an initial feasible point x 0 is computed, using a sparse least-squares step, so that Ax 0  =  b . Second, Algorithm PCG is replaced with Reduced Preconditioned Conjugate Gradients (RPCG), see [46] , in order to compute an approximate reduced Newton step (or a direction of negative curvature in the null space of A ). The key linear algebra step involves solving systems of the form

where A ˜ approximates A (small nonzeros of A are set to zero provided rank is not lost) and C is a sparse symmetric positive-definite approximation to H , i.e., C  =  H . See [46] for more details.

Box Constraints

The box constrained problem is of the form

where l is a vector of lower bounds, and u is a vector of upper bounds. Some (or all) of the components of l can be equal to –∞ and some (or all) of the components of u can be equal to ∞. The method generates a sequence of strictly feasible points. Two techniques are used to maintain feasibility while achieving robust convergence behavior. First, a scaled modified Newton step replaces the unconstrained Newton step (to define the two-dimensional subspace S ). Second, reflections are used to increase the step size.

The scaled modified Newton step arises from examining the Kuhn-Tucker necessary conditions for Equation 7 ,

D ( x ) = diag ( | v k | − 1 / 2 ) ,

and the vector v ( x ) is defined below, for each 1 ≤  i  ≤  n :

If g i  < 0 and u i  < ∞ then v i  =  x i  –  u i

If g i  ≥ 0 and l i  > –∞ then v i  =  x i  –  l i

If g i  < 0 and u i  = ∞ then v i  = –1

If g i  ≥ 0 and l i  = –∞ then v i  = 1

The nonlinear system Equation 8 is not differentiable everywhere. Nondifferentiability occurs when v i  = 0 . You can avoid such points by maintaining strict feasibility, i.e., restricting l  <  x  <  u .

The scaled modified Newton step s k for the nonlinear system of equations given by Equation 8 is defined as the solution to the linear system

at the k th iteration, where

Here J v plays the role of the Jacobian of | v |. Each diagonal component of the diagonal matrix J v equals 0, –1, or 1. If all the components of l and u are finite, J v  = diag(sign( g )) . At a point where g i  = 0 , v i might not be differentiable. J i i v = 0 is defined at such a point. Nondifferentiability of this type is not a cause for concern because, for such a component, it is not significant which value v i takes. Further, | v i | will still be discontinuous at this point, but the function | v i |· g i is continuous.

Second, reflections are used to increase the step size. A (single) reflection step is defined as follows. Given a step p that intersects a bound constraint, consider the first bound constraint crossed by p ; assume it is the i th bound constraint (either the i th upper or i th lower bound). Then the reflection step p R  =  p except in the i th component, where p R i  = – p i .

fmincon Active Set Algorithm


In constrained optimization, the general aim is to transform the problem into an easier subproblem that can then be solved and used as the basis of an iterative process. A characteristic of a large class of early methods is the translation of the constrained problem to a basic unconstrained problem by using a penalty function for constraints that are near or beyond the constraint boundary. In this way the constrained problem is solved using a sequence of parametrized unconstrained optimizations, which in the limit (of the sequence) converge to the constrained problem. These methods are now considered relatively inefficient and have been replaced by methods that have focused on the solution of the Karush-Kuhn-Tucker (KKT) equations. The KKT equations are necessary conditions for optimality for a constrained optimization problem. If the problem is a so-called convex programming problem, that is, f ( x ) and G i ( x ), i = 1,..., m , are convex functions, then the KKT equations are both necessary and sufficient for a global solution point.

Referring to GP ( Equation 1 ), the Kuhn-Tucker equations can be stated as

in addition to the original constraints in Equation 1 .

The first equation describes a canceling of the gradients between the objective function and the active constraints at the solution point. For the gradients to be canceled, Lagrange multipliers ( λ i , i = 1,..., m ) are necessary to balance the deviations in magnitude of the objective function and constraint gradients. Because only active constraints are included in this canceling operation, constraints that are not active must not be included in this operation and so are given Lagrange multipliers equal to 0. This is stated implicitly in the last two Kuhn-Tucker equations.

The solution of the KKT equations forms the basis to many nonlinear programming algorithms. These algorithms attempt to compute the Lagrange multipliers directly. Constrained quasi-Newton methods guarantee superlinear convergence by accumulating second-order information regarding the KKT equations using a quasi-Newton updating procedure. These methods are commonly referred to as Sequential Quadratic Programming (SQP) methods, since a QP subproblem is solved at each major iteration (also known as Iterative Quadratic Programming, Recursive Quadratic Programming, and Constrained Variable Metric methods).

The 'active-set' algorithm is not a large-scale algorithm; see Large-Scale vs. Medium-Scale Algorithms .

Sequential Quadratic Programming (SQP)

SQP methods represent the state of the art in nonlinear programming methods. Schittkowski  [36] , for example, has implemented and tested a version that outperforms every other tested method in terms of efficiency, accuracy, and percentage of successful solutions, over a large number of test problems.

Based on the work of Biggs  [1] , Han  [22] , and Powell ( [32] and [33] ), the method allows you to closely mimic Newton's method for constrained optimization just as is done for unconstrained optimization. At each major iteration, an approximation is made of the Hessian of the Lagrangian function using a quasi-Newton updating method. This is then used to generate a QP subproblem whose solution is used to form a search direction for a line search procedure. An overview of SQP is found in Fletcher  [13] , Gill et al.  [19] , Powell  [35] , and Schittkowski  [23] . The general method, however, is stated here.

Given the problem description in GP ( Equation 1 ) the principal idea is the formulation of a QP subproblem based on a quadratic approximation of the Lagrangian function.

Here you simplify Equation 1 by assuming that bound constraints have been expressed as inequality constraints. You obtain the QP subproblem by linearizing the nonlinear constraints.

Quadratic Programming (QP) Subproblem

This subproblem can be solved using any QP algorithm (see, for instance, Quadratic Programming Solution ). The solution is used to form a new iterate

x k + 1 = x k + α k d k .

The step length parameter α k is determined by an appropriate line search procedure so that a sufficient decrease in a merit function is obtained (see Updating the Hessian Matrix ). The matrix H k is a positive definite approximation of the Hessian matrix of the Lagrangian function ( Equation 13 ). H k can be updated by any of the quasi-Newton methods, although the BFGS method (see Updating the Hessian Matrix ) appears to be the most popular.

A nonlinearly constrained problem can often be solved in fewer iterations than an unconstrained problem using SQP. One of the reasons for this is that, because of limits on the feasible area, the optimizer can make informed decisions regarding directions of search and step length.

Consider Rosenbrock's function with an additional nonlinear inequality constraint, g ( x ),

This was solved by an SQP implementation in 31 iterations compared to 37 for the unconstrained case. The figure shows the path to the solution point x  = [0.9072,0.8228] starting at x  = [–1.9,2.0] .

Level curves of the Rosenbrock function are close to the parabola y = x^2. The iterative steps follow the parabola from upper left, down around the origin, and end at upper right within the constraint boundary.

SQP Implementation

The SQP implementation consists of three main stages, which are discussed briefly in the following subsections:

Updating the Hessian Matrix

Quadratic Programming Solution


Line Search and Merit Function

Updating the Hessian Matrix.   At each major iteration a positive definite quasi-Newton approximation of the Hessian of the Lagrangian function, H , is calculated using the BFGS method, where λ i , i  = 1,..., m , is an estimate of the Lagrange multipliers.

s k = x k + 1 − x k q k = ( ∇ f ( x k + 1 ) + ∑ i = 1 m λ i ⋅ ∇ g i ( x k + 1 ) ) − ( ∇ f ( x k ) + ∑ i = 1 m λ i ⋅ ∇ g i ( x k ) ) .

Powell  [33] recommends keeping the Hessian positive definite even though it might be positive indefinite at the solution point. A positive definite Hessian is maintained providing q k T s k is positive at each update and that H is initialized with a positive definite matrix. When q k T s k is not positive, q k is modified on an element-by-element basis so that q k T s k > 0 . The general aim of this modification is to distort the elements of q k , which contribute to a positive definite update, as little as possible. Therefore, in the initial phase of the modification, the most negative element of q k * s k is repeatedly halved. This procedure is continued until q k T s k is greater than or equal to a small negative tolerance. If, after this procedure, q k T s k is still not positive, modify q k by adding a vector v multiplied by a constant scalar w , that is,

v i = ∇ g i ( x k + 1 ) ⋅ g i ( x k + 1 ) − ∇ g i ( x k ) ⋅ g i ( x k )            if  ( q k ) i ⋅ w < 0  and  ( q k ) i ⋅ ( s k ) i < 0 ,   i = 1 , ... , m v i = 0   otherwise,

and increase w systematically until q k T s k becomes positive.

The functions fmincon , fminimax , fgoalattain , and fseminf all use SQP. If Display is set to 'iter' in options , then various information is given such as function values and the maximum constraint violation. When the Hessian has to be modified using the first phase of the preceding procedure to keep it positive definite, then Hessian modified is displayed. If the Hessian has to be modified again using the second phase of the approach described above, then Hessian modified twice is displayed. When the QP subproblem is infeasible, then infeasible is displayed. Such displays are usually not a cause for concern but indicate that the problem is highly nonlinear and that convergence might take longer than usual. Sometimes the message no update is displayed, indicating that q k T s k is nearly zero. This can be an indication that the problem setup is wrong or you are trying to minimize a noncontinuous function.

Quadratic Programming Solution.   At each major iteration of the SQP method, a QP problem of the following form is solved, where A i refers to the i th row of the m -by- n matrix A .

The method used in Optimization Toolbox functions is an active set strategy (also known as a projection method) similar to that of Gill et al., described in [18] and [17] . It has been modified for both Linear Programming (LP) and Quadratic Programming (QP) problems.

The solution procedure involves two phases. The first phase involves the calculation of a feasible point (if one exists). The second phase involves the generation of an iterative sequence of feasible points that converge to the solution. In this method an active set, A ¯ k , is maintained that is an estimate of the active constraints (i.e., those that are on the constraint boundaries) at the solution point. Virtually all QP algorithms are active set methods. This point is emphasized because there exist many different methods that are very similar in structure but that are described in widely different terms.

A ¯ k is updated at each iteration k , and this is used to form a basis for a search direction d ^ k . Equality constraints always remain in the active set A ¯ k . The notation for the variable d ^ k is used here to distinguish it from d k in the major iterations of the SQP method. The search direction d ^ k is calculated and minimizes the objective function while remaining on any active constraint boundaries. The feasible subspace for d ^ k is formed from a basis Z k whose columns are orthogonal to the estimate of the active set A ¯ k (i.e., A ¯ k Z k = 0 ). Thus a search direction, which is formed from a linear summation of any combination of the columns of Z k , is guaranteed to remain on the boundaries of the active constraints.

The matrix Z k is formed from the last m  –  l columns of the QR decomposition of the matrix A ¯ k T , where l is the number of active constraints and l < m . That is, Z k is given by

Q T A ¯ k T = [ R 0 ] .

Once Z k is found, a new search direction d ^ k is sought that minimizes q ( d ) where d ^ k is in the null space of the active constraints. That is, d ^ k is a linear combination of the columns of Z k : d ^ k = Z k p for some vector p .

Then if you view the quadratic as a function of p , by substituting for d ^ k , you have

Differentiating this with respect to p yields

∇ q ( p ) is referred to as the projected gradient of the quadratic function because it is the gradient projected in the subspace defined by Z k . The term Z k T H Z k is called the projected Hessian. Assuming the Hessian matrix H is positive definite (which is the case in this implementation of SQP), then the minimum of the function q ( p ) in the subspace defined by Z k occurs when ∇ q ( p ) = 0 , which is the solution of the system of linear equations

A step is then taken of the form

At each iteration, because of the quadratic nature of the objective function, there are only two choices of step length α . A step of unity along d ^ k is the exact step to the minimum of the function restricted to the null space of A ¯ k . If such a step can be taken, without violation of the constraints, then this is the solution to QP ( Equation 18 ). Otherwise, the step along d ^ k to the nearest constraint is less than unity and a new constraint is included in the active set at the next iteration. The distance to the constraint boundaries in any direction d ^ k is given by

which is defined for constraints not in the active set, and where the direction d ^ k is towards the constraint boundary, i.e., A i d ^ k > 0 ,   i = 1 , ... , m .

When n independent constraints are included in the active set, without location of the minimum, Lagrange multipliers, λ k , are calculated that satisfy the nonsingular set of linear equations

If all elements of λ k are positive, x k is the optimal solution of QP ( Equation 18 ). However, if any component of λ k is negative, and the component does not correspond to an equality constraint, then the corresponding element is deleted from the active set and a new iterate is sought.

Initialization.   The algorithm requires a feasible point to start. If the current point from the SQP method is not feasible, then you can find a point by solving the linear programming problem

The notation A i indicates the i th row of the matrix A . You can find a feasible point (if one exists) to Equation 26 by setting x to a value that satisfies the equality constraints. You can determine this value by solving an under- or overdetermined set of linear equations formed from the set of equality constraints. If there is a solution to this problem, then the slack variable γ is set to the maximum inequality constraint at this point.

You can modify the preceding QP algorithm for LP problems by setting the search direction to the steepest descent direction at each iteration, where g k is the gradient of the objective function (equal to the coefficients of the linear objective function).

If a feasible point is found using the preceding LP method, the main QP phase is entered. The search direction d ^ k is initialized with a search direction d ^ 1 found from solving the set of linear equations

where g k is the gradient of the objective function at the current iterate x k (i.e., Hx k  +  c ).

If a feasible solution is not found for the QP problem, the direction of search for the main SQP routine d ^ k is taken as one that minimizes γ .

Line Search and Merit Function.   The solution to the QP subproblem produces a vector d k , which is used to form a new iterate

The step length parameter α k is determined in order to produce a sufficient decrease in a merit function. The merit function used by Han  [22] and Powell  [33] of the following form is used in this implementation.

Powell recommends setting the penalty parameter

This allows positive contribution from constraints that are inactive in the QP solution but were recently active. In this implementation, the penalty parameter r i is initially set to

where ‖   ‖ represents the Euclidean norm.

This ensures larger contributions to the penalty parameter from constraints with smaller gradients, which would be the case for active constraints at the solution point.

fmincon SQP Algorithm

The sqp algorithm (and nearly identical sqp-legacy algorithm) is similar to the active-set algorithm (for a description, see fmincon Active Set Algorithm ). The basic sqp algorithm is described in Chapter 18 of Nocedal and Wright [31] .

The sqp algorithm is essentially the same as the sqp-legacy algorithm, but has a different implementation. Usually, sqp has faster execution time and less memory usage than sqp-legacy .

The most important differences between the sqp and the active-set algorithms are:

Strict Feasibility With Respect to Bounds

The sqp algorithm takes every iterative step in the region constrained by bounds. Furthermore, finite difference steps also respect bounds. Bounds are not strict; a step can be exactly on a boundary. This strict feasibility can be beneficial when your objective function or nonlinear constraint functions are undefined or are complex outside the region constrained by bounds.

Robustness to Non-Double Results

During its iterations, the sqp algorithm can attempt to take a step that fails. This means an objective function or nonlinear constraint function you supply returns a value of Inf , NaN , or a complex value. In this case, the algorithm attempts to take a smaller step.

Refactored Linear Algebra Routines

The sqp algorithm uses a different set of linear algebra routines to solve the quadratic programming subproblem, Equation 14 . These routines are more efficient in both memory usage and speed than the active-set routines.

Reformulated Feasibility Routines

The sqp algorithm has two new approaches to the solution of Equation 14 when constraints are not satisfied.

The sqp algorithm combines the objective and constraint functions into a merit function. The algorithm attempts to minimize the merit function subject to relaxed constraints. This modified problem can lead to a feasible solution. However, this approach has more variables than the original problem, so the problem size in Equation 14 increases. The increased size can slow the solution of the subproblem. These routines are based on the articles by Spellucci [60] and Tone [61] . The sqp algorithm sets the penalty parameter for the merit function Equation 30 according to the suggestion in [41] .

Suppose nonlinear constraints are not satisfied, and an attempted step causes the constraint violation to grow. The sqp algorithm attempts to obtain feasibility using a second-order approximation to the constraints. The second-order technique can lead to a feasible solution. However, this technique can slow the solution by requiring more evaluations of the nonlinear constraint functions.

fmincon Interior Point Algorithm

Barrier function.

The approximate problem Equation 34 is a sequence of equality constrained problems. These are easier to solve than the original inequality-constrained problem Equation 33 .

To solve the approximate problem, the algorithm uses one of two main types of steps at each iteration:

A direct step in ( x , s ). This step attempts to solve the KKT equations, Equation 2 and Equation 3 , for the approximate problem via a linear approximation. This is also called a Newton step .

A CG (conjugate gradient) step, using a trust region.

By default, the algorithm first attempts to take a direct step. If it cannot, it attempts a CG step. One case where it does not take a direct step is when the approximate problem is not locally convex near the current iterate.

If either the objective function or a nonlinear constraint function returns a complex value, NaN, Inf, or an error at an iterate x j , the algorithm rejects x j . The rejection has the same effect as if the merit function did not decrease sufficiently: the algorithm then attempts a different, shorter step. Wrap any code that can error in try - catch :

The objective and constraints must yield proper ( double ) values at the initial point.

Direct Step

The following variables are used in defining the direct step:

J g denotes the Jacobian of the constraint function g .

J h denotes the Jacobian of the constraint function h .

S = diag( s ) .

λ denotes the Lagrange multiplier vector associated with constraints g

Λ = diag( λ ) .

y denotes the Lagrange multiplier vector associated with h .

e denote the vector of ones the same size as g .

Equation 38 defines the direct step (Δ x , Δ s ) :

This equation comes directly from attempting to solve Equation 2 and Equation 3 using a linearized Lagrangian.

You can symmetrize the equation by premultiplying the second variable Δs by S –1 :

In order to solve this equation for (Δ x , Δ s ) , the algorithm makes an LDL factorization of the matrix; see ldl . This is the most computationally expensive step. One result of this factorization is a determination of whether the projected Hessian is positive definite or not; if not, the algorithm uses a conjugate gradient step, described in Conjugate Gradient Step .

Update Barrier Parameter

For the approximate problem Equation 34 to approach the original problem, the barrier parameter μ needs to decrease toward 0 as the iterations proceed. The algorithm has two barrier parameter update options, which you specify using the BarrierParamUpdate option: 'monotone' (default) and 'predictor-corrector' .

The 'monotone' option decreases the parameter μ by a factor of 1/100 or 1/5 when the approximate problem is solved with sufficient accuracy in the previous iteration. The option uses a factor of 1/100 when the algorithm takes only one or two iterations to achieve sufficient accuracy, and uses 1/5 otherwise. The measure of accuracy is the following test, which determines if the size of all terms on the right side of Equation 38 is less than μ :

max ( ‖ ∇ f ( x ) + J h T y + J g T λ ‖ , ‖ S λ − μ e ‖ , ‖ h ‖ , ‖ g ( x ) + s ‖ ) < μ .

fmincon overrides the BarrierParamUpdate setting to 'monotone' in either of these cases:

The problem has no inequality constraints, including bound constraints.

The SubproblemAlgorithm option is 'cg' .

The 'predictor-corrector' algorithm for updating the barrier parameter μ is similar to the linear programming Predictor-Corrector algorithm.

Predictor-corrector steps can accelerate the existing Fiacco-McCormick (monotone) approach by adjusting for the linearization error in the Newton steps. The effects of the predictor-corrector algorithm are twofold: it often improves step directions and simultaneously updates the barrier parameter adaptively with the centering parameter σ to encourage iterates to follow the central path. See Nocedal and Wright’s [31] discussion of predictor-corrector steps for linear programs to understand why the central path allows larger step sizes and, consequently, faster convergence.

The predictor step uses the linearized step with μ = 0, meaning without a barrier function:

[ H 0 J h T J g T 0 Λ 0 S J h 0 0 0 J g I 0 0 ] [ Δ x Δ s Δ y Δ λ ] = − [ ∇ f + J h T y + J g T λ S λ h g + s ] .

Define ɑ s and ɑ λ to be the largest step sizes that do not violate the nonnegativity constraints.

α s = max ( α ∈ ( 0 , 1 ] : s + α Δ s ≥ 0 ) α λ = max ( α ∈ ( 0 , 1 ] : λ + α Δ λ ≥ 0 ) .

Now compute the complementarity from the predictor step.

where m is the number of constraints.

The first corrector step adjusts for the quadratic term neglected in the Newton root-finding linearization

( s + Δ s ) ( λ + ​ Δ λ ) = s λ + ​ s Δ λ + λ Δ s ︸ Linear term set to 0 + Δ s Δ λ ︸ Quadratic .

To correct the quadratic error, solve the linear system for the corrector step direction.

[ H 0 J h T J g T 0 Λ 0 S J h 0 0 0 J g I 0 0 ] [ Δ x cor Δ s cor Δ y cor Δ λ cor ] = − [ 0 Δ s Δ λ 0 0 ] .

The second corrector step is a centering step. The centering correction is based on the variable σ on the right side of the equation

[ H 0 J h T J g T 0 Λ 0 S J h 0 0 0 J g I 0 0 ] [ Δ x cen Δ s cen Δ y cen Δ λ cen ] = − [ ∇ f + J h T y + J g T λ S λ − μ e σ h g + s ] .

Here, σ is defined as

σ = ( μ P μ ) 3 ,

where μ P is defined in equation Equation 39 , and μ = s T λ / m .

To prevent the barrier parameter from decreasing too quickly, potentially destabilizing the algorithm, the algorithm keeps the centering parameter σ above 1/100. This action causes the barrier parameter μ to decrease by no more than a factor of 1/100.

Algorithmically, the first correction and centering steps are independent of each other, so they are computed together. Furthermore, the matrix on the left for the predictor and both corrector steps is the same. So, algorithmically, the matrix is factorized once, and this factorization is used for all these steps.

The algorithm can reject the proposed predictor-corrector step when the step increases the merit function value Equation 35 , increases the complementarity by at least a factor of two, or the computed inertia is incorrect (the problem looks nonconvex). In these cases, the algorithm attempts to take a different step or a conjugate gradient step.

Conjugate Gradient Step

The conjugate gradient approach to solving the approximate problem Equation 34 is similar to other conjugate gradient calculations. In this case, the algorithm adjusts both x and s , keeping the slacks s positive. The approach is to minimize a quadratic approximation to the approximate problem in a trust region, subject to linearized constraints.

Specifically, let R denote the radius of the trust region, and let other variables be defined as in Direct Step . The algorithm obtains Lagrange multipliers by approximately solving the KKT equations

∇ x L = ∇ x f ( x ) + ∑ i λ i ∇ g i ( x ) + ∑ j y j ∇ h j ( x ) = 0 ,

Feasibility Mode

When the EnableFeasibilityMode option is true and the iterations do not decrease the infeasibility quickly enough, the algorithm switches to feasibility mode. This switch happens after the algorithm fails to decrease the infeasibility in normal mode, and then fails again after switching to conjugate gradient mode. Therefore, for best performance when the solver fails to find a feasible solution without feasibility mode, set the SubproblemAlgorithm to 'cg' when using feasibility mode. Doing so avoids fruitless searching in normal mode.

The feasibility mode algorithm is based on Nocedal, Öztoprak, and Waltz [1] . The algorithm ignores the objective function and instead tries to minimize the infeasibility, defined as the sum of the positive parts of the inequality constraint functions and the absolute value of the equality constraint functions. In terms of the relaxation variables r = [ r I , r e + , r e − ] , which correspond to inequalities, positive parts of equalities, and negative parts of equalities, respectively, the problem is

min x ,   r 1 T r = min x ,   r ∑ r

subject to the constraints

r I ≥ c ( x ) r e + − r e − = c e q ( x ) r ≥ 0.

To solve the relaxed problem, the software uses an interior-point formulation with a logarithmic barrier function and the slacks s = [ s I , s R , s e + , s e − ] to minimize

min x , r , s 1 T r − μ ∑ ( log ( s I , i ) + log ( s R , i ) ) − ∑ ( log ( s e , i + ) + log ( s e , i − ) )

c e q ( x ) = r e + − r e − c ( x ) = r I − s I r e + = s e + r e − = s e − r I = s R s ≥ 0.

The solution process for the relaxed problem begins with μ initialized to the current barrier parameter value. The slack variable s I is initialized to the current inequality slack value, inherited from the main mode. The r variables are initialized to

r I = max ( s I + c ( x ) , μ ) r e + = max ( c e q ( x ) , μ ) r e − = − min ( c e q ( x ) , − μ ) .

The remaining slacks are initialized to

s R = r I s e + = r e + s e − = r e − .

Starting at this initial point, the feasibility mode algorithm reuses the code for the normal interior-point algorithm. This process requires special step computations because the r variables are linear and, therefore, their associated second derivatives are zero. In other words, the objective function Hessian for the feasibility problem is rank-deficient. Therefore, the algorithm cannot take a Newton step. Instead, the algorithm takes a steepest-descent direction step. The algorithm starts with the gradient of the objective with respect to the variables, projects the gradient onto the null space of the Jacobian of the constraints, and rescales the resulting vector so that it has an appropriate step length. This step can be effective at reducing the infeasibility.

The feasibility mode algorithm ends when it reduces the infeasibility by a factor of 10. When feasibility mode ends, the algorithm passes the variables x and s I to the main algorithm, and discards the other slack variables and relaxation variables r .

[1] Nocedal, Jorge, Figen Öztoprak, and Richard A. Waltz. An Interior Point Method for Nonlinear Programming with Infeasibility Detection Capabilities. Optimization Methods & Software 29(4), July 2014, pp. 837–854.

Interior-Point Algorithm Options

Here are the meanings and effects of several options in the interior-point algorithm.

HonorBounds — When set to true , every iterate satisfies the bound constraints you have set. When set to false , the algorithm may violate bounds during intermediate iterations.

HessianApproximation — When set to:

'bfgs' , fmincon calculates the Hessian by a dense quasi-Newton approximation.

'lbfgs' , fmincon calculates the Hessian by a limited-memory, large-scale quasi-Newton approximation.

'fin-diff-grads' , fmincon calculates a Hessian-times-vector product by finite differences of the gradient(s); other options need to be set appropriately.

HessianFcn — fmincon uses the function handle you specify in HessianFcn to compute the Hessian. See Including Hessians .

HessianMultiplyFcn — Give a separate function for Hessian-times-vector evaluation. For details, see Including Hessians and Hessian Multiply Function .

SubproblemAlgorithm — Determines whether or not to attempt the direct Newton step. The default setting 'factorization' allows this type of step to be attempted. The setting 'cg' allows only conjugate gradient steps.

For a complete list of options see Interior-Point Algorithm in fmincon options .

fminbnd Algorithm

fminbnd is a solver available in any MATLAB ® installation. It solves for a local minimum in one dimension within a bounded interval. It is not based on derivatives. Instead, it uses golden-section search and parabolic interpolation.

fseminf Problem Formulation and Algorithm

Fseminf problem formulation.

fseminf addresses optimization problems with additional types of constraints compared to those addressed by fmincon . The formulation of fmincon is

such that c ( x ) ≤ 0, ceq ( x ) = 0, A·x  ≤  b , Aeq·x  =  beq , and l  ≤  x  ≤  u .

fseminf adds the following set of semi-infinite constraints to those already given. For w j in a one- or two-dimensional bounded interval or rectangle I j , for a vector of continuous functions K ( x , w ) , the constraints are

K j ( x , w j ) ≤ 0 for all w j ∈ I j .

The term “dimension” of an fseminf problem means the maximal dimension of the constraint set I : 1 if all I j are intervals, and 2 if at least one I j is a rectangle. The size of the vector of K does not enter into this concept of dimension.

The reason this is called semi-infinite programming is that there are a finite number of variables ( x and w j ), but an infinite number of constraints. This is because the constraints on x are over a set of continuous intervals or rectangles I j , which contains an infinite number of points, so there are an infinite number of constraints: K j ( x , w j ) ≤ 0 for an infinite number of points w j .

You might think a problem with an infinite number of constraints is impossible to solve. fseminf addresses this by reformulating the problem to an equivalent one that has two stages: a maximization and a minimization. The semi-infinite constraints are reformulated as

where | K | is the number of components of the vector K ; i.e., the number of semi-infinite constraint functions. For fixed x , this is an ordinary maximization over bounded intervals or rectangles.

fseminf further simplifies the problem by making piecewise quadratic or cubic approximations κ j ( x , w j ) to the functions K j ( x ,  w j ) , for each x that the solver visits. fseminf considers only the maxima of the interpolation function κ j ( x , w j ) , instead of K j ( x ,  w j ) , in Equation 42 . This reduces the original problem, minimizing a semi-infinitely constrained function, to a problem with a finite number of constraints.

Sampling Points.   Your semi-infinite constraint function must provide a set of sampling points, points used in making the quadratic or cubic approximations. To accomplish this, it should contain:

The initial spacing s between sampling points w

A way of generating the set of sampling points w from s

The initial spacing s is a | K |-by-2 matrix. The j th row of s represents the spacing for neighboring sampling points for the constraint function K j . If K j depends on a one-dimensional w j , set s(j,2) = 0 . fseminf updates the matrix s in subsequent iterations.

fseminf uses the matrix s to generate the sampling points w , which it then uses to create the approximation κ j ( x , w j ) . Your procedure for generating w from s should keep the same intervals or rectangles I j during the optimization.

Example of Creating Sampling Points.   Consider a problem with two semi-infinite constraints, K 1 and K 2 . K 1 has one-dimensional w 1 , and K 2 has two-dimensional w 2 . The following code generates a sampling set from w 1  = 2 to 12:

fseminf specifies s as NaN when it first calls your constraint function. Checking for this allows you to set the initial sampling interval.

The following code generates a sampling set from w 2 in a square, with each component going from 1 to 100, initially sampled more often in the first component than the second:

The preceding code snippets can be simplified as follows:

fseminf Algorithm

fseminf essentially reduces the problem of semi-infinite programming to a problem addressed by fmincon . fseminf takes the following steps to solve semi-infinite programming problems:

At the current value of x , fseminf identifies all the w j,i such that the interpolation κ j ( x , w j,i ) is a local maximum. (The maximum refers to varying w for fixed x .)

fseminf takes one iteration step in the solution of the fmincon problem:

such that c ( x ) ≤ 0, ceq ( x ) = 0, A·x  ≤  b , Aeq·x  =  beq , and l  ≤  x  ≤  u , where c ( x ) is augmented with all the maxima of κ j ( x ,  w j ) taken over all w j ∈ I j , which is equal to the maxima over j and i of κ j ( x ,  w j,i ) .

fseminf checks if any stopping criterion is met at the new point x (to halt the iterations); if not, it continues to step 4.

fseminf checks if the discretization of the semi-infinite constraints needs updating, and updates the sampling points appropriately. This provides an updated approximation κ j ( x , w j ) . Then it continues at step 1.

MATLAB Command

You clicked a link that corresponds to this MATLAB command:

Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.

Select a Web Site

Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .

  • Switzerland (English)
  • Switzerland (Deutsch)
  • Switzerland (Français)
  • 中国 (English)

You can also select a web site from the following list:

How to Get Best Site Performance

Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.

  • América Latina (Español)
  • Canada (English)
  • United States (English)
  • Belgium (English)
  • Denmark (English)
  • Deutschland (Deutsch)
  • España (Español)
  • Finland (English)
  • France (Français)
  • Ireland (English)
  • Italia (Italiano)
  • Luxembourg (English)
  • Netherlands (English)
  • Norway (English)
  • Österreich (Deutsch)
  • Portugal (English)
  • Sweden (English)
  • United Kingdom (English)

Asia Pacific

  • Australia (English)
  • India (English)
  • New Zealand (English)

Contact your local office

Note: This work is under development and has not yet been professionally edited. If you catch a typo or error, or just have a suggestion, please submit a note here . Thanks!

B.3 Constrained Optimization and the Lagrange Method

One of the core problems of economics is constrained optimization : that is, maximizing a function subject to some constraint.

We previously saw that the function \(y = f(x_1,x_2) = 8x_1 - 2x_1^2 + 8x_2 - x_2^2\) has an unconstrained maximum at the point $(2,4)$. But what if we wanted to find the highest point along the path $2x_1 + x_2 = 5$? In this case, we would be looking for the highest point along the green curve below. In this case, the maximum occurs at the point $(1,3)$:

In a problem like this, we call the function we’re trying to maximize (or minimize) the objective function , and we call the equation describing the set of points along the path under consideration the constraint . We can then write this problem formally as \(\begin{aligned} \max_{x_1,x_2}\ & f(x_1,x_2) & \text{Objective function}\\ \text{s.t. } & g(x_1,x_2) = k & \text{Constraint} \end{aligned}\) where in this case \(\begin{aligned} f(x_1,x_2) &= 8x_1 - 2x_1^2 + 8x_2 - x_2^2\\ g(x_1,x_2) &= 2x_1 + x_2\\ k &= 5 \end{aligned}\) For a simple problem like this, we could just substitute the constraint into the objective function and solve. However, with multivariable functions of more than two variables, that won’t work. Instead, we’ll take a slightly different approach, and employ the method of Lagrange multipliers. This method effectively converts a constrained maximization problem into an unconstrained optimization problem, by creating a new functions that combines the objective function and the constraint. Specifically, we construct the “Lagrangian” as follows: \(\mathcal{L}(x_1,x_2,\lambda) = f(x_1,x_2) + \lambda (k - g(x_1,x_2))\) Note that this second term “punishes” the function for exceeding the constraint: if $g(x_1,x_2) > k$, this function is lower than it would be if $g(x_1,x_2) = k$. However, as long as $g(x_1,x_2) = k$, the two functions are identical.

The plot below illustrates how the Lagrangian affects the height of the function. It shows a horizontal plane at the maximum of the Lagrangian, and shows the gradient at the point (1,3). It starts with $\lambda = 0$; use the slider to increase $\lambda$, and notice what happens:

What’s going on here? It’s a little easier to see if we look at the gradient of the Lagrangian: \(\nabla \mathcal{L}(x_1,x_2,\lambda) = \left[\begin{matrix} {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_1}\\ \\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_2}\\ \\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial \lambda} \end{matrix}\right] = \left[\begin{matrix} {\partial f(x_1,x_2) \over \partial x_1} - \lambda {\partial g(x_1,x_2) \over \partial x_1} \\ \\ {\partial f(x_1,x_2) \over \partial x_2} - \lambda {\partial g(x_1,x_2) \over \partial x_2} \\ \\ k - g(x_1,x_2) \end{matrix}\right]\) If we set this gradient equal to a vector of zeroes, we get the first-order conditions (FOCs) : \(\begin{aligned} {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_1} = 0 & \Rightarrow & {\partial f(x_1,x_2) \over \partial x_1} - \lambda {\partial g(x_1,x_2) \over \partial x_1} = 0\\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_2} = 0 & \Rightarrow & {\partial f(x_1,x_2) \over \partial x_2} - \lambda {\partial g(x_1,x_2) \over \partial x_2} = 0\\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial \lambda} = 0 & \Rightarrow & k - g(x_1,x_2) = 0 \end{aligned}\) When the first two conditions hold, it means we’re at the maximum of the Lagrangian function in $x_1 - x_2$ space. When the third condition holds, it means we’re also along the constraint.

The FOCs are three equations in three unknowns; so to find the optimal $(x_1,x_2)$, we solve this system of equations. For example, if we plug the Lagrangian for this case into the FOCs above, we get \(\begin{aligned} {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_1} = 0 & \Rightarrow & 8 - 4x_1 - 2\lambda = 0\\ \\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial x_2} = 0 & \Rightarrow & 8 - 2x_2 - \lambda = 0\\ \\ {\partial \mathcal{L}(x_1,x_2, \lambda) \over \partial \lambda} = 0 & \Rightarrow & 2x_1 + x_2 = 5 \end{aligned}\) To solve this, we can solve the first two conditions for $\lambda$ and set those two values of $\lambda$ equal to each other: \(\lambda = 4 - 2x_1 = 8 - 2x_2 \Rightarrow x_2 = 2 + x_1\) We can then plug that into the third condition to get \(2x_1 + [1 + x_1] = 5 \Rightarrow x_1^\star = 1\) therefore \(x_2 = 5 - 2x_1 = 3\) and \(\lambda = 4 - 2x_1 = 2\) Put another way, if we have $\lambda = 2$, the Lagrangian becomes \(\mathcal{L}(x_1,x_2) = 8x_1 - 2x_1^2 + 8x_2 - x_2^2 + 2(5 - 2x_1 - x_2) = 10 + 4x_1 - 2x_1^2 +6x_2 - x_2^2\) which has the gradient \(\nabla \mathcal{L}(x_1,x_2) = \left[\begin{matrix}4 - 4x_1 \\ 6 - 2x_2 \end{matrix}\right]\) which is flat (a vector of zeroes) at $(1,3)$.

The Lagrangian, Level Sets, and the Tangency Condition

If we look at this problem in two dimensions, we can notice that the optimum occurs at a point of tangency between the constraint and the level sets of the objective function. The following graph shows the constraint, as well as a few level sets of the objective function. The level set for $f(x_1,x_2) = 21$ is just tangent to the constraint at $(1,3)$:

In fact, we can see that this is another way of looking at the first two FOCs: if we solve each of the first FOCs for $\lambda$, as we did above, we get \(\begin{aligned} {\partial f(x_1,x_2) \over \partial x_1} - \lambda {\partial g(x_1,x_2) \over \partial x_1} = 0 & \Rightarrow \lambda = {\partial f(x_1,x_2)/\partial x_1 \over \partial g(x_1,x_2)/\partial x_1}\\ \\ {\partial f(x_1,x_2) \over \partial x_2} - \lambda {\partial g(x_1,x_2) \over \partial x_2} = 0 & \Rightarrow \lambda = {\partial f(x_1,x_2)/\partial x_2 \over \partial g(x_1,x_2)/\partial x_2} \end{aligned}\) Setting these equal to one another and cross multiplying gives us \({\partial f(x_1,x_2)/\partial x_1 \over \partial f(x_1,x_2)/\partial x_2} = {\partial g(x_1,x_2)/\partial x_1 \over \partial g(x_1,x_2)/\partial x_2}\) The left-hand side of this equation is the slope of the level set $f(x_1,x_2) = y$ for some $y$. The right-hand side of this equation is the slope of the level set $g(x_1,x_2) = k$, which is just the constraint. If this equation holds for some value $(x_1,x_2)$, therefore, the functions’ level sets are tangent at that point.

Summing up: for a constrained optimization problem with two choice variables, the method of Lagrange multipliers finds the point along the constraint where the level set of the objective function is tangent to the constraint.

Example: The Fence Problem

Consider the following problem: you are given 40 linear feet of fence and need to make a rectangular enclosure. What is the length and width that maximize the area of the enclosure?

Here, our objective function is the area $A(L,W) = LW$ and the constraint is that the perimeter be 40 feet: $2L + 2W \le 40$. More formally, we can write this problem as \(\max_{(L,W)} A(L,W) = LW\) \(\text{s.t. } 2L + 2W \le 40\) The following graph shows the constraint (the green line) as well as several levels sets of the objective function. If you drag the point along the constraint, you can see that the largest area occurs at a point where the level set is tangent to the constraint:

The Lagrangian for this problem is \(\mathcal{L}(L,W,\lambda) = LW + \lambda(40 - 2L - 2W)\) To find the optimal choice of $L$ and $W$, we take the partial derivatives with respect to the three arguments ($L$, $W$, and $\lambda$) and set them equal to zero to get our three first order conditions (FOCs): \(\begin{aligned} \frac{\partial \mathcal{L}(L,W,\lambda)}{\partial L} &= W - 2\lambda = 0 &\Rightarrow& W = 2\lambda \\ \frac{\partial \mathcal{L}(L,W,\lambda)}{\partial W} &= L - 2\lambda = 0 &\Rightarrow& L = 2\lambda \\ \frac{\partial \mathcal{L}(L,W,\lambda)}{\partial \lambda} &= 40 - 2L - 2W =0 &\Rightarrow& 2L + 2W = 40 \end{aligned}\) Solving the first two FOC’s for $\lambda$ and setting them equal to one another we get $W/2 = L/2$, or more simply $W = L$: that is, the optimal shape will be a square. Plugging $W = L$ into the constraint (i.e., the third FOC) and solving gives us $L = W = 10$, and $\lambda = W/2 = L/2 = 5$.

  • {{}}

Journal of Applied Analysis & Computation

Share the QR code with wechat scanning code to friends and circle of friends.


  • Jiahui Tang 1 ,  ,  , 
  • Yifan Xu 1 , 

School of Management, Fudan University, Shanghai, 200433, China

School of Mathematics, East China University of Science and Technology, Shanghai, 200237, China

  • Corresponding author: Email address: [email protected] (J. Tang) 
  • Fund Project: The authors were supported by National Natural Science Foundation of China(72072036)

An important method to solve constrained optimization problem is to approach the optimal solution of constrained optimization problem gradually by sequential unconstrained optimization method, namely penalty function method. And the filling function method is one of the effective methods to solve the global optimal problem. In this paper, a class of augmented Lagrangian objective filled penalty functions are defined to solve non-convex constraint optimization problems, the authors call it filled penalty function method. The theoretical properties of these functions, such as exactness, smoothness, global convergence, are discussed. On this basis, a local optimization algorithm and an approximate global optimization algorithm with corresponding examples are given for solving constrained optimization problems.

  • Filled penalty function method /
  • non-convex constrained optimization problems /
  • globally optimal point /
  • convergence

Tables( 5 )

Article Metrics

Article views( 1405 ) PDF downloads( 192 ) Cited by( 0 )

Access History

Other articles by authors.

  • Jiahui Tang

[ - ]On This Site

[ - ]On Google

Export File

Library homepage

  • school Campus Bookshelves
  • menu_book Bookshelves
  • perm_media Learning Objects
  • login Login
  • how_to_reg Request Instructor Account
  • hub Instructor Commons
  • Download Page (PDF)
  • Download Full Book (PDF)
  • Periodic Table
  • Physics Constants
  • Scientific Calculator
  • Reference & Cite
  • Tools expand_more
  • Readability

selected template will load here

This action is not available.

Mathematics LibreTexts

4.7: Optimization Problems

  • Last updated
  • Save as PDF
  • Page ID 4467

Learning Objectives

  • Set up and solve optimization problems in several applied fields.

One common application of calculus is calculating the minimum or maximum value of a function. For example, companies often want to minimize production costs or maximize revenue. In manufacturing, it is often desirable to minimize the amount of material used to package a product with a certain volume. In this section, we show how to set up these types of minimization and maximization problems and solve them by using the tools developed in this chapter.

Solving Optimization Problems over a Closed, Bounded Interval

The basic idea of the optimization problems that follow is the same. We have a particular quantity that we are interested in maximizing or minimizing. However, we also have some auxiliary condition that needs to be satisfied. For example, in Example \(\PageIndex{1}\), we are interested in maximizing the area of a rectangular garden. Certainly, if we keep making the side lengths of the garden larger, the area will continue to become larger. However, what if we have some restriction on how much fencing we can use for the perimeter? In this case, we cannot make the garden as large as we like. Let’s look at how we can maximize the area of a rectangle subject to some constraint on the perimeter.

Example \(\PageIndex{1}\): Maximizing the Area of a Garden

A rectangular garden is to be constructed using a rock wall as one side of the garden and wire fencing for the other three sides (Figure \(\PageIndex{1}\)). Given \(100\,\text{ft}\) of wire fencing, determine the dimensions that would create a garden of maximum area. What is the maximum area?

A drawing of a garden has x and y written on the vertical and horizontal sides, respectively. There is a rock wall running along the entire bottom horizontal length of the drawing.

Let \(x\) denote the length of the side of the garden perpendicular to the rock wall and \(y\) denote the length of the side parallel to the rock wall. Then the area of the garden is


We want to find the maximum possible area subject to the constraint that the total fencing is \(100\,\text{ft}\). From Figure \(\PageIndex{1}\), the total amount of fencing used will be \(2x+y.\) Therefore, the constraint equation is


Solving this equation for \(y\), we have \(y=100−2x.\) Thus, we can write the area as


Before trying to maximize the area function \(A(x)=100x−2x^2,\) we need to determine the domain under consideration. To construct a rectangular garden, we certainly need the lengths of both sides to be positive. Therefore, we need \(x>0\) and \(y>0\). Since \(y=100−2x\), if \(y>0\), then \(x<50\). Therefore, we are trying to determine the maximum value of \(A(x)\) for \(x\) over the open interval \((0,50)\). We do not know that a function necessarily has a maximum value over an open interval. However, we do know that a continuous function has an absolute maximum (and absolute minimum) over a closed interval. Therefore, let’s consider the function \(A(x)=100x−2x^2\) over the closed interval \([0,50]\). If the maximum value occurs at an interior point, then we have found the value \(x\) in the open interval \((0,50)\) that maximizes the area of the garden.

Therefore, we consider the following problem:

Maximize \(A(x)=100x−2x^2\) over the interval \([0,50].\)

As mentioned earlier, since \(A\) is a continuous function on a closed, bounded interval, by the extreme value theorem, it has a maximum and a minimum. These extreme values occur either at endpoints or critical points. At the endpoints, \(A(x)=0\). Since the area is positive for all \(x\) in the open interval \((0,50)\), the maximum must occur at a critical point. Differentiating the function \(A(x)\), we obtain


Therefore, the only critical point is \(x=25\) (Figure \(\PageIndex{2}\)). We conclude that the maximum area must occur when \(x=25\).

The function A(x) = 100x – 2x is graphed. At its maximum there is an intersection of two dashed lines and text that reads “Maximum area is 1250 square feet when x = 25 feet.”

Then we have \(y=100−2x=100−2(25)=50.\) To maximize the area of the garden, let \(x=25\,\text{ft}\) and \(y=50\,\text{ft}\). The area of this garden is \(1250\, \text{ft}^2\).

Exercise \(\PageIndex{1}\)

Determine the maximum area if we want to make the same rectangular garden as in Figure \(\PageIndex{2}\), but we have \(200\,\text{ft}\) of fencing.

We need to maximize the function \(A(x)=200x−2x^2\) over the interval \([0,100].\)

The maximum area is \(5000\, \text{ft}^2\).

Now let’s look at a general strategy for solving optimization problems similar to Example \(\PageIndex{1}\).

Problem-Solving Strategy: Solving Optimization Problems

  • Introduce all variables. If applicable, draw a figure and label all variables.
  • Determine which quantity is to be maximized or minimized, and for what range of values of the other variables (if this can be determined at this time).
  • Write a formula for the quantity to be maximized or minimized in terms of the variables. This formula may involve more than one variable.
  • Write any equations relating the independent variables in the formula from step \(3\). Use these equations to write the quantity to be maximized or minimized as a function of one variable.
  • Identify the domain of consideration for the function in step \(4\) based on the physical problem to be solved.
  • Locate the maximum or minimum value of the function from step \(4.\) This step typically involves looking for critical points and evaluating a function at endpoints.

Now let’s apply this strategy to maximize the volume of an open-top box given a constraint on the amount of material to be used.

Example \(\PageIndex{2}\): Maximizing the Volume of a Box

An open-top box is to be made from a \(24\,\text{in.}\) by \(36\,\text{in.}\) piece of cardboard by removing a square from each corner of the box and folding up the flaps on each side. What size square should be cut out of each corner to get a box with the maximum volume?

Step 1: Let \(x\) be the side length of the square to be removed from each corner (Figure \(\PageIndex{3}\)). Then, the remaining four flaps can be folded up to form an open-top box. Let \(V\) be the volume of the resulting box.

There are two figures for this figure. The first one is a rectangle with sides 24 in and 36 in, with each corner having a square of side length x taken out of it. In the second picture, there is a box with side lengths x in, 24 – 2x in, and 36 – 2x in.

Step 2: We are trying to maximize the volume of a box. Therefore, the problem is to maximize \(V\).

Step 3: As mentioned in step 2, are trying to maximize the volume of a box. The volume of a box is

\[V=L⋅W⋅H \nonumber, \nonumber \]

where \(L,\,W,\)and \(H\) are the length, width, and height, respectively.

Step 4: From Figure \(\PageIndex{3}\), we see that the height of the box is \(x\) inches, the length is \(36−2x\) inches, and the width is \(24−2x\) inches. Therefore, the volume of the box is

\[ \begin{align*} V(x) &=(36−2x)(24−2x)x \\[4pt] &=4x^3−120x^2+864x \end{align*}. \nonumber \]

Step 5: To determine the domain of consideration, let’s examine Figure \(\PageIndex{3}\). Certainly, we need \(x>0.\) Furthermore, the side length of the square cannot be greater than or equal to half the length of the shorter side, \(24\,\text{in.}\); otherwise, one of the flaps would be completely cut off. Therefore, we are trying to determine whether there is a maximum volume of the box for \(x\) over the open interval \((0,12).\) Since \(V\) is a continuous function over the closed interval \([0,12]\), we know \(V\) will have an absolute maximum over the closed interval. Therefore, we consider \(V\) over the closed interval \([0,12]\) and check whether the absolute maximum occurs at an interior point.

Step 6: Since \(V(x)\) is a continuous function over the closed, bounded interval \([0,12]\), \(V\) must have an absolute maximum (and an absolute minimum). Since \(V(x)=0\) at the endpoints and \(V(x)>0\) for \(0<x<12,\) the maximum must occur at a critical point. The derivative is


To find the critical points, we need to solve the equation


Dividing both sides of this equation by \(12\), the problem simplifies to solving the equation


Using the quadratic formula, we find that the critical points are

\[\begin{align*} x &=\dfrac{20±\sqrt{(−20)^2−4(1)(72)}}{2} \\[4pt] &=\dfrac{20±\sqrt{112}}{2} \\[4pt] &=\dfrac{20±4\sqrt{7}}{2} \\[4pt] &=10±2\sqrt{7} \end{align*}. \nonumber \]

Since \(10+2\sqrt{7}\) is not in the domain of consideration, the only critical point we need to consider is \(10−2\sqrt{7}\). Therefore, the volume is maximized if we let \(x=10−2\sqrt{7}\,\text{in.}\) The maximum volume is

\[V(10−2\sqrt{7})=640+448\sqrt{7}≈1825\,\text{in}^3. \nonumber \]

as shown in the following graph.

The function V(x) = 4x3 – 120x2 + 864x is graphed. At its maximum there is an intersection of two dashed lines and text that reads “Maximum volume is approximately 1825 cubic inches when x ≈ 4.7 inches.”

Exercise \(\PageIndex{2}\)

Suppose the dimensions of the cardboard in Example \(\PageIndex{2}\) are \(20\,\text{in.}\) by \(30\,\text{in.}\) Let \(x\) be the side length of each square and write the volume of the open-top box as a function of \(x\). Determine the domain of consideration for \(x\).

The volume of the box is \(L⋅W⋅H.\)

\(V(x)=x(20−2x)(30−2x).\) The domain is \([0,10]\).

Example \(\PageIndex{3}\): Minimizing Travel Time

An island is \(2\) mi due north of its closest point along a straight shoreline. A visitor is staying at a cabin on the shore that is \(6\) mi west of that point. The visitor is planning to go from the cabin to the island. Suppose the visitor runs at a rate of \(8\) mph and swims at a rate of \(3\) mph. How far should the visitor run before swimming to minimize the time it takes to reach the island?

Step 1: Let \(x\) be the distance running and let \(y\) be the distance swimming (Figure \(\PageIndex{5}\)). Let \(T\) be the time it takes to get from the cabin to the island.

The cabin is x miles from the shore. From that point on the shore, the island is y miles away. If you were to continue the line from the cabin to the shore (the x miles one) and if you were to draw a line from the island parallel to the shore, then the lines would extend 2 miles from the island and 6 miles from the cabin before intersecting.

Step 2: The problem is to minimize \(T\).

Step 3: To find the time spent traveling from the cabin to the island, add the time spent running and the time spent swimming. Since Distance = Rate × Time \((D=R×T),\) the time spent running is


and the time spent swimming is


Therefore, the total time spent traveling is


Step 4: From Figure \(\PageIndex{5}\), the line segment of \(y\) miles forms the hypotenuse of a right triangle with legs of length \(2\) mi and \(6−x\) mi. Therefore, by the Pythagorean theorem, \(2^2+(6−x)^2=y^2\), and we obtain \(y=\sqrt{(6−x)^2+4}\). Thus, the total time spent traveling is given by the function


Step 5: From Figure \(\PageIndex{5}\), we see that \(0≤x≤6\). Therefore, \([0,6]\) is the domain of consideration.

Step 6: Since \(T(x)\) is a continuous function over a closed, bounded interval, it has a maximum and a minimum. Let’s begin by looking for any critical points of \(T\) over the interval \([0,6].\) The derivative is

\[\begin{align*} T′(x) &=\dfrac{1}{8}−\dfrac{1}{2}\dfrac{[(6−x)^2+4]^{−1/2}}{3}⋅2(6−x) \\[4pt] &=\dfrac{1}{8}−\dfrac{(6−x)}{3\sqrt{(6−x)^2+4}} \end{align*}\]

If \(T′(x)=0,\), then

\[\dfrac{1}{8}=\dfrac{6−x}{3\sqrt{(6−x)^2+4}} \label{ex3eq1} \]

\[3\sqrt{(6−x)^2+4}=8(6−x). \label{ex3eq2} \]

Squaring both sides of this equation, we see that if \(x\) satisfies this equation, then \(x\) must satisfy

\[9[(6−x)^2+4]=64(6−x)^2,\nonumber \]

which implies

\[55(6−x)^2=36. \nonumber \]

We conclude that if \(x\) is a critical point, then \(x\) satisfies

\[(x−6)^2=\dfrac{36}{55}. \nonumber \]

[Note that since we are squaring, \( (x-6)^2 = (6-x)^2.\)]

Therefore, the possibilities for critical points are

\[x=6±\dfrac{6}{\sqrt{55}}.\nonumber \]

Since \(x=6+6/\sqrt{55}\) is not in the domain, it is not a possibility for a critical point. On the other hand, \(x=6−6/\sqrt{55}\) is in the domain. Since we squared both sides of Equation \ref{ex3eq2} to arrive at the possible critical points, it remains to verify that \(x=6−6/\sqrt{55}\) satisfies Equation \ref{ex3eq1}. Since \(x=6−6/\sqrt{55}\) does satisfy that equation, we conclude that \(x=6−6/\sqrt{55}\) is a critical point, and it is the only one. To justify that the time is minimized for this value of \(x\), we just need to check the values of \(T(x)\) at the endpoints \(x=0\) and \(x=6\), and compare them with the value of \(T(x)\) at the critical point \(x=6−6/\sqrt{55}\). We find that \(T(0)≈2.108\,\text{h}\) and \(T(6)≈1.417\,\text{h}\), whereas

\[T(6−6/\sqrt{55})≈1.368\,\text{h}. \nonumber \]

Therefore, we conclude that \(T\) has a local minimum at \(x≈5.19\) mi.

Exercise \(\PageIndex{3}\)

Suppose the island is \(1\) mi from shore, and the distance from the cabin to the point on the shore closest to the island is \(15\) mi. Suppose a visitor swims at the rate of \(2.5\) mph and runs at a rate of \(6\) mph. Let \(x\) denote the distance the visitor will run before swimming, and find a function for the time it takes the visitor to get from the cabin to the island.

The time \(T=T_{running}+T_{swimming}.\)

\(T(x)=\dfrac{x}{6}+\dfrac{\sqrt{(15−x)^2+1}}{2.5} \)

In business, companies are interested in maximizing revenue. In the following example, we consider a scenario in which a company has collected data on how many cars it is able to lease, depending on the price it charges its customers to rent a car. Let’s use these data to determine the price the company should charge to maximize the amount of money it brings in.

Example \(\PageIndex{4}\): Maximizing Revenue

Owners of a car rental company have determined that if they charge customers \(p\) dollars per day to rent a car, where \(50≤p≤200\), the number of cars \(n\) they rent per day can be modeled by the linear function \(n(p)=1000−5p\). If they charge \($50\) per day or less, they will rent all their cars. If they charge \($200\) per day or more, they will not rent any cars. Assuming the owners plan to charge customers between \($50\) per day and \($200\) per day to rent a car, how much should they charge to maximize their revenue?

Step 1: Let \(p\) be the price charged per car per day and let \(n\) be the number of cars rented per day. Let \(R\) be the revenue per day.

Step 2: The problem is to maximize \(R.\)

Step 3: The revenue (per day) is equal to the number of cars rented per day times the price charged per car per day—that is, \(R=n×p.\)

Step 4: Since the number of cars rented per day is modeled by the linear function \(n(p)=1000−5p,\) the revenue \(R\) can be represented by the function

\[ \begin{align*} R(p) &=n×p \\[4pt] &=(1000−5p)p \\[4pt] &=−5p^2+1000p.\end{align*}\]

Step 5: Since the owners plan to charge between \($50\) per car per day and \($200\) per car per day, the problem is to find the maximum revenue \(R(p)\) for \(p\) in the closed interval \([50,200]\).

Step 6: Since \(R\) is a continuous function over the closed, bounded interval \([50,200]\), it has an absolute maximum (and an absolute minimum) in that interval. To find the maximum value, look for critical points. The derivative is \(R′(p)=−10p+1000.\) Therefore, the critical point is \(p=100\). When \(p=100, R(100)=$50,000.\) When \(p=50, R(p)=$37,500\). When \(p=200, R(p)=$0\).

Therefore, the absolute maximum occurs at \(p=$100\). The car rental company should charge \($100\) per day per car to maximize revenue as shown in the following figure.

The function R(p) is graphed. At its maximum there is an intersection of two dashed lines and text that reads “Maximum revenue is $50,000 per day when the price charged per car is $100 per day.”

Exercise \(\PageIndex{4}\)

A car rental company charges its customers \(p\) dollars per day, where \(60≤p≤150\). It has found that the number of cars rented per day can be modeled by the linear function \(n(p)=750−5p.\) How much should the company charge each customer to maximize revenue?

\(R(p)=n×p,\) where \(n\) is the number of cars rented and \(p\) is the price charged per car.

The company should charge \($75\) per car per day.

Example \(\PageIndex{5}\): Maximizing the Area of an Inscribed Rectangle

A rectangle is to be inscribed in the ellipse

\[\dfrac{x^2}{4}+y^2=1. \nonumber \]

What should the dimensions of the rectangle be to maximize its area? What is the maximum area?

Step 1: For a rectangle to be inscribed in the ellipse, the sides of the rectangle must be parallel to the axes. Let \(L\) be the length of the rectangle and \(W\) be its width. Let \(A\) be the area of the rectangle.

The ellipse x2/4 + y2 = 1 is drawn with its x intercepts being ±2 and its y intercepts being ±1. There is a rectangle inscribed in the ellipse with length L (in the x-direction) and width W.

Step 2: The problem is to maximize \(A\).

Step 3: The area of the rectangle is \(A=LW.\)

Step 4: Let \((x,y)\) be the corner of the rectangle that lies in the first quadrant, as shown in Figure \(\PageIndex{7}\). We can write length \(L=2x\) and width \(W=2y\). Since \(\dfrac{x^2}{4}+y^2=1\) and \(y>0\), we have \(y=\sqrt{1-\dfrac{x^2}{4}}\). Therefore, the area is


Step 5: From Figure \(\PageIndex{7}\), we see that to inscribe a rectangle in the ellipse, the \(x\)-coordinate of the corner in the first quadrant must satisfy \(0<x<2\). Therefore, the problem reduces to looking for the maximum value of \(A(x)\) over the open interval \((0,2)\). Since \(A(x)\) will have an absolute maximum (and absolute minimum) over the closed interval \([0,2]\), we consider \(A(x)=2x\sqrt{4−x^2}\) over the interval \([0,2]\). If the absolute maximum occurs at an interior point, then we have found an absolute maximum in the open interval.

Step 6: As mentioned earlier, \(A(x)\) is a continuous function over the closed, bounded interval \([0,2]\). Therefore, it has an absolute maximum (and absolute minimum). At the endpoints \(x=0\) and \(x=2\), \(A(x)=0.\) For \(0<x<2\), \(A(x)>0\).

Therefore, the maximum must occur at a critical point. Taking the derivative of \(A(x)\), we obtain

\[ \begin{align*} A'(x) &=2\sqrt{4−x^2}+2x⋅\dfrac{1}{2\sqrt{4−x^2}}(−2x) \\[4pt] &=2\sqrt{4−x^2}−\dfrac{2x^2}{\sqrt{4−x^2}} \\[4pt] &=\dfrac{8−4x^2}{\sqrt{4−x^2}} . \end{align*}\]

To find critical points, we need to find where \(A'(x)=0.\) We can see that if \(x\) is a solution of

\[\dfrac{8−4x^2}{\sqrt{4−x^2}}=0, \label{ex5eq1} \]

then \(x\) must satisfy

\[8−4x^2=0. \nonumber \]

Therefore, \(x^2=2.\) Thus, \(x=±\sqrt{2}\) are the possible solutions of Equation \ref{ex5eq1}. Since we are considering \(x\) over the interval \([0,2]\), \(x=\sqrt{2}\) is a possibility for a critical point, but \(x=−\sqrt{2}\) is not. Therefore, we check whether \(\sqrt{2}\) is a solution of Equation \ref{ex5eq1}. Since \(x=\sqrt{2}\) is a solution of Equation \ref{ex5eq1}, we conclude that \(\sqrt{2}\) is the only critical point of \(A(x)\) in the interval \([0,2]\).

Therefore, \(A(x)\) must have an absolute maximum at the critical point \(x=\sqrt{2}\). To determine the dimensions of the rectangle, we need to find the length \(L\) and the width \(W\). If \(x=\sqrt{2}\) then

\[y=\sqrt{1−\dfrac{(\sqrt{2})^2}{4}}=\sqrt{1−\dfrac{1}{2}}=\dfrac{1}{\sqrt{2}}.\nonumber \]

Therefore, the dimensions of the rectangle are \(L=2x=2\sqrt{2}\) and \(W=2y=\dfrac{2}{\sqrt{2}}=\sqrt{2}\). The area of this rectangle is \( A=LW=(2\sqrt{2})(\sqrt{2})=4.\)

Exercise \(\PageIndex{5}\)

Modify the area function \(A\) if the rectangle is to be inscribed in the unit circle \(x^2+y^2=1\). What is the domain of consideration?

If \((x,y)\) is the vertex of the square that lies in the first quadrant, then the area of the square is \(A=(2x)(2y)=4xy.\)

\(A(x)=4x\sqrt{1−x^2}.\) The domain of consideration is \([0,1]\).

Solving Optimization Problems when the Interval Is Not Closed or Is Unbounded

In the previous examples, we considered functions on closed, bounded domains. Consequently, by the extreme value theorem, we were guaranteed that the functions had absolute extrema. Let’s now consider functions for which the domain is neither closed nor bounded.

Many functions still have at least one absolute extrema, even if the domain is not closed or the domain is unbounded. For example, the function \(f(x)=x^2+4\) over \((−∞,∞)\) has an absolute minimum of \(4\) at \(x=0\). Therefore, we can still consider functions over unbounded domains or open intervals and determine whether they have any absolute extrema. In the next example, we try to minimize a function over an unbounded domain. We will see that, although the domain of consideration is \((0,∞),\) the function has an absolute minimum.

In the following example, we look at constructing a box of least surface area with a prescribed volume. It is not difficult to show that for a closed-top box, by symmetry, among all boxes with a specified volume, a cube will have the smallest surface area. Consequently, we consider the modified problem of determining which open-topped box with a specified volume has the smallest surface area.

Example \(\PageIndex{6}\): Minimizing Surface Area

A rectangular box with a square base, an open top, and a volume of \(216 \,\text{in}^3\) is to be constructed. What should the dimensions of the box be to minimize the surface area of the box? What is the minimum surface area?

Step 1: Draw a rectangular box and introduce the variable \(x\) to represent the length of each side of the square base; let \(y\) represent the height of the box. Let \(S\) denote the surface area of the open-top box.

A box with square base is shown. The base has side length x, and the height is y.

Step 2: We need to minimize the surface area. Therefore, we need to minimize \(S\).

Step 3: Since the box has an open top, we need only determine the area of the four vertical sides and the base. The area of each of the four vertical sides is \(x⋅y.\) The area of the base is \(x^2\). Therefore, the surface area of the box is


Step 4: Since the volume of this box is \(x^2y\) and the volume is given as \(216\,\text{in}^3\), the constraint equation is


Solving the constraint equation for \(y\), we have \(y=\dfrac{216}{x^2}\). Therefore, we can write the surface area as a function of \(x\) only:

\[S(x)=4x\left(\dfrac{216}{x^2}\right)+x^2.\nonumber \]

Therefore, \(S(x)=\dfrac{864}{x}+x^2\).

Step 5: Since we are requiring that \(x^2y=216\), we cannot have \(x=0\). Therefore, we need \(x>0\). On the other hand, \(x\) is allowed to have any positive value. Note that as \(x\) becomes large, the height of the box \(y\) becomes correspondingly small so that \(x^2y=216\). Similarly, as \(x\) becomes small, the height of the box becomes correspondingly large. We conclude that the domain is the open, unbounded interval \((0,∞)\). Note that, unlike the previous examples, we cannot reduce our problem to looking for an absolute maximum or absolute minimum over a closed, bounded interval. However, in the next step, we discover why this function must have an absolute minimum over the interval \((0,∞).\)

Step 6: Note that as \(x→0^+,\, S(x)→∞.\) Also, as \(x→∞, \,S(x)→∞\). Since \(S\) is a continuous function that approaches infinity at the ends, it must have an absolute minimum at some \(x∈(0,∞)\). This minimum must occur at a critical point of \(S\). The derivative is

\[S′(x)=−\dfrac{864}{x^2}+2x.\nonumber \]

Therefore, \(S′(x)=0\) when \(2x=\dfrac{864}{x^2}\). Solving this equation for \(x\), we obtain \(x^3=432\), so \(x=\sqrt[3]{432}=6\sqrt[3]{2}.\) Since this is the only critical point of \(S\), the absolute minimum must occur at \(x=6\sqrt[3]{2}\) (see Figure \(\PageIndex{9}\)).

When \(x=6\sqrt[3]{2}\), \(y=\dfrac{216}{(6\sqrt[3]{2})^2}=3\sqrt[3]{2}\,\text{in.}\) Therefore, the dimensions of the box should be \(x=6\sqrt[3]{2}\,\text{in.}\) and \(y=3\sqrt[3]{2}\,\text{in.}\) With these dimensions, the surface area is

\[S(6\sqrt[3]{2})=\dfrac{864}{6\sqrt[3]{2}}+(6\sqrt[3]{2})^2=108\sqrt[3]{4}\,\text{in}^2\nonumber \]

The function S(x) = 864/x + x2 is graphed. At its minimum there is a dashed line and text that reads “Minimum surface area is 108 times the cube root of 4 square inches when x = 6 times the cube root of 2 inches.”

Exercise \(\PageIndex{6}\)

Consider the same open-top box, which is to have volume \(216\,\text{in}^3\). Suppose the cost of the material for the base is \(20¢/\text{in}^2\) and the cost of the material for the sides is \(30¢/\text{in}^2\) and we are trying to minimize the cost of this box. Write the cost as a function of the side lengths of the base. (Let \(x\) be the side length of the base and \(y\) be the height of the box.)

If the cost of one of the sides is \(30¢/\text{in}^2,\) the cost of that side is \(0.30xy\) dollars.

\(c(x)=\dfrac{259.2}{x}+0.2x^2\) dollars

Key Concepts

  • To solve an optimization problem, begin by drawing a picture and introducing variables.
  • Find an equation relating the variables.
  • Find a function of one variable to describe the quantity that is to be minimized or maximized.
  • Look for critical points to locate local extrema.


Wolfram|Alpha Widgets Overview Tour Gallery Sign In

Share this page.

  • StumbleUpon
  • Google Buzz

Output Type

Output width, output height.

To embed this widget in a post, install the Wolfram|Alpha Widget Shortcode Plugin and copy and paste the shortcode above into the HTML source.

To embed a widget in your blog's sidebar, install the Wolfram|Alpha Widget Sidebar Plugin , and copy and paste the Widget ID below into the "id" field:

Save to My Widgets

Build a new widget.

We appreciate your interest in Wolfram|Alpha and will be in touch soon.

If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains * and * are unblocked.

To log in and use all the features of Khan Academy, please enable JavaScript in your browser.

Multivariable calculus

Course: multivariable calculus   >   unit 3.

  • Constrained optimization introduction
  • Lagrange multipliers, using tangency to solve constrained optimization
  • Finishing the intro lagrange multiplier example
  • Lagrange multiplier example, part 1
  • Lagrange multiplier example, part 2

The Lagrangian

  • Meaning of the Lagrange multiplier
  • Proof for the meaning of Lagrange multipliers

Want to join the conversation?

  • Upvote Button navigates to signup page
  • Downvote Button navigates to signup page
  • Flag Button navigates to signup page

Video transcript

Book cover

International Fuzzy Systems Association World Congress

IFSA 2007: Foundations of Fuzzy Logic and Soft Computing pp 789–798 Cite as

Artificial Bee Colony (ABC) Optimization Algorithm for Solving Constrained Optimization Problems

  • Dervis Karaboga 1 &
  • Bahriye Basturk 1  
  • Conference paper

4954 Accesses

552 Citations

10 Altmetric

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4529))

This paper presents the comparison results on the performance of the Artificial Bee Colony (ABC) algorithm for constrained optimization problems. The ABC algorithm has been firstly proposed for unconstrained optimization problems and showed that it has superior performance on these kind of problems. In this paper, the ABC algorithm has been extended for solving constrained optimization problems and applied to a set of constrained problems .

  • Particle Swarm Optimization
  • Constrain Optimization Problem
  • Objective Function Evaluation
  • Nectar Amount
  • Optimal Particle Swarm Optimization

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

This is a preview of subscription content, log in via an institution .

Buying options

  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Unable to display preview.  Download preview PDF.

Parsopoulos, K.E., Vrahatis, M.N.: Particle Swarm Optimization Method for Constrained Optimization Problems. In: Intelligent Technologies - Theory and Applications: New Trends in Intelligent Technologies, pp. 214–220. IOS Press, Amsterdam (2002)

Google Scholar  

Hedar, A.R., Fukushima, M.: Derivative-Free Filter Simulated Annealing Method for Constrained Continuous Global Optimization

Michalewicz, Z., Schoenauer, M.: Evolutionary Algorithms for Constrained Parameter Optimization Problems. Evolutionary Computation 4(1), 1–32 (1995)

Article   Google Scholar  

Floudas, C.A., Pardalos, P.M. (eds.): A Collection of Test Problems for Constrained Global Optimization Algorithms. LNCS, vol. 455. Springer, Heidelberg (1990)

MATH   Google Scholar  

Himmelblau, D.M.: Applied Nonlinear Programming. McGraw-Hill, New York (1972)

Joines, J.A., Houck, C.R.: On the use of non-stationary penalty functions to solve nonlinear constrained optimization problems with gas. In: Proc. IEEE Int. Conf. Evol. Comp., pp. 579–585 (1994)

Hu, X., Eberhart, R.C.: Solving constrained nonlinear optimization problems with particle swarm optimization. In: Proceedings of the Sixth World Multiconference on Systemics, Cybernetics and Informatics (2002)

Hu, X., Eberhart, R.C., Shi, Y.H.: Engineering optimization with particle swarm. In: IEEE Swarm Intelligence Symposium, pp. 53–57 (2003)

Parsopoulos, K.E., Vrahatis, M.N.: Unified Particle Swarm Optimization for Solving Constrained Engineering Optimization Problems. In: Wang, L., Chen, K., Ong, Y.S. (eds.) ICNC 2005. LNCS, vol. 3612, pp. 582–591. Springer, Heidelberg (2005)

Zavala, A.E.M., Aguirre, A.H., Diharce, E.R.V.: Constrained optimization via particle evolutionary swarm optimization algorithm (PESO). In: Proceedings of the 2005 conference on Genetic and evolutionary computation (GECCO’05), pp. 209–216 (2005)

Karaboga, D.: An Idea Based On Honey Bee Swarm For Numerical Optimization. Technical Report-TR06, Erciyes University, Engineering Faculty, Computer Engineering Department (2005)

Basturk, B., Karaboga, D.: An Artificial Bee Colony (ABC) Algorithm for Numeric function Optimization. In: IEEE Swarm Intelligence Symposium 2006, Indianapolis, Indiana, USA, May 12-14 (2006)

Goldberg, D.E., Deb, K.: A comparison of selection schemes used in genetic algorithms. In: Rawlins, G.J.E. (ed.) Foundations of Genetic Algorithms, pp. 69–93 (1991)

Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11, 341–359 (1997)

Article   MATH   MathSciNet   Google Scholar  

Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: 1995 IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948 (1995)

Corne, D., Dorigo, M., Glover, F. (eds.): New Ideas in Optimization. McGraw-Hill, New York (1999)

Download references

Author information

Authors and affiliations.

Erciyes University, Engineering Faculty, The Department of Computer Engineering,  

Dervis Karaboga & Bahriye Basturk

You can also search for this author in PubMed   Google Scholar

Editor information

Rights and permissions.

Reprints and permissions

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper.

Karaboga, D., Basturk, B. (2007). Artificial Bee Colony (ABC) Optimization Algorithm for Solving Constrained Optimization Problems. In: Melin, P., Castillo, O., Aguilar, L.T., Kacprzyk, J., Pedrycz, W. (eds) Foundations of Fuzzy Logic and Soft Computing. IFSA 2007. Lecture Notes in Computer Science(), vol 4529. Springer, Berlin, Heidelberg.

Download citation


Publisher Name : Springer, Berlin, Heidelberg

Print ISBN : 978-3-540-72917-4

Online ISBN : 978-3-540-72950-1

eBook Packages : Computer Science Computer Science (R0)

Share this paper

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Publish with us

Policies and ethics

  • Find a journal
  • Track your research

Help | Advanced Search

Mathematics > Optimization and Control

Title: a data-driven approach to pde-constrained optimization in inverse problems.

Abstract: Inverse problems are ubiquitous in science and engineering. Many of these are naturally formulated as a PDE-constrained optimization problem. These non-linear, large-scale, constrained optimization problems know many challenges, of which the inherent non-linearity of the problem is an important one. As an alternative to this physics-driven approach, data-driven methods have been proposed. These methods come with their own set of challenges, and it appears that, ideally, one would devise hybrid methods that combine the best of both worlds. In this paper, we propose one way of combining PDE-constrained optimization with recently proposed data-driven reduced-order models. Starting from an infinite-dimensional formulation of the inverse problem with discrete data, we propose a general framework for the analysis and discretisation of such problems. The proposed approach is based on a relaxed formulation of the PDE-constrained optimization problem, which reduces to a weighted non-linear least-squares problem. The weight matrix turns out to be the Gram matrix of solutions of the PDE, and it can be estimated directly from the measurements. We provide a number of representative case studies and numerical examples.

Submission history

Access paper:.

  • HTML (experimental)
  • Other Formats

References & Citations

  • Google Scholar
  • Semantic Scholar

BibTeX formatted citation

BibSonomy logo

Bibliographic and Citation Tools

Code, data and media associated with this article, recommenders and search tools.

  • Institution

arXivLabs: experimental projects with community collaborators

arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.

Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.

Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs .

Deep Attentive Belief Propagation: Integrating Reasoning and Learning for Solving Constraint Optimization Problems

Part of Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Main Conference Track

Yanchen Deng, Shufeng Kong, Caihua Liu, Bo An

Belief Propagation (BP) is an important message-passing algorithm for various reasoning tasks over graphical models, including solving the Constraint Optimization Problems (COPs). It has been shown that BP can achieve state-of-the-art performance on various benchmarks by mixing old and new messages before sending the new one, i.e., damping. However, existing methods on tuning a static damping factor for BP not only is laborious but also harms their performance. Moreover, existing BP algorithms treat each variable node's neighbors equally when composing a new message, which also limits their exploration ability. To address these issues, we seamlessly integrate BP, Gated Recurrent Units (GRUs), and Graph Attention Networks (GATs) within the massage-passing framework to reason about dynamic weights and damping factors for composing new BP messages. Our model, Deep Attentive Belief Propagation (DABP), takes the factor graph and the BP messages in each iteration as the input and infers the optimal weights and damping factors through GRUs and GATs, followed by a multi-head attention layer. Furthermore, unlike existing neural-based BP variants, we propose a novel self-supervised learning algorithm for DABP with a smoothed solution cost, which does not require expensive training labels and also avoids the common out-of-distribution issue through efficient online learning. Extensive experiments show that our model significantly outperforms state-of-the-art baselines.

Name Change Policy

Requests for name changes in the electronic proceedings will be accepted with no questions asked. However name changes may cause bibliographic tracking issues. Authors are asked to consider this carefully and discuss it with their co-authors prior to requesting a name change in the electronic proceedings.

Use the "Report an Issue" link to request a name change.

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • My Account Login
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
  • Open access
  • Published: 29 March 2024

Enhancing combinatorial optimization with classical and quantum generative models

  • Javier Alcazar 1 , 2 ,
  • Mohammad Ghazi Vakili   ORCID: 1 , 3 , 4 ,
  • Can B. Kalayci   ORCID: 1 , 5 &
  • Alejandro Perdomo-Ortiz   ORCID: 1  

Nature Communications volume  15 , Article number:  2761 ( 2024 ) Cite this article

Metrics details

  • Computer science
  • Quantum information

Devising an efficient exploration of the search space is one of the key challenges in the design of combinatorial optimization algorithms. Here, we introduce the Generator-Enhanced Optimization (GEO) strategy: a framework that leverages any generative model (classical, quantum, or quantum-inspired) to solve optimization problems. We focus on a quantum-inspired version of GEO relying on tensor-network Born machines, and referred to hereafter as TN-GEO. To illustrate our results, we run these benchmarks in the context of the canonical cardinality-constrained portfolio optimization problem by constructing instances from the S&P 500 and several other financial stock indexes, and demonstrate how the generalization capabilities of these quantum-inspired generative models can provide real value in the context of an industrial application. We also comprehensively compare state-of-the-art algorithms and show that TN-GEO is among the best; a remarkable outcome given the solvers used in the comparison have been fine-tuned for decades in this real-world industrial application. Also, a promising step toward a practical advantage with quantum-inspired models and, subsequently, with quantum generative models

Similar content being viewed by others

solving constrained optimization problems

Hybrid quantum investment optimization with minimal holding period

Samuel Mugel, Mario Abad, … Román Orús

solving constrained optimization problems

Quantum computing for finance

Dylan Herman, Cody Googin, … Yuri Alexeev

solving constrained optimization problems

Best practices for portfolio optimization by quantum computing, experimented on real quantum devices

Giuseppe Buonaiuto, Francesco Gargiulo, … Marco Pota


Along with machine learning and the simulation of materials, combinatorial optimization is one of top candidates for practical quantum advantage. That is, the moment where a quantum-assisted algorithm outperforms the best classical algorithms in the context of a real-world application with a commercial or scientific value. There is an ongoing portfolio of techniques to tackle optimization problems with quantum subroutines, ranging from algorithms tailored for quantum annealers (e.g., refs. 1 , 2 ), gate-based quantum computers (e.g., refs. 3 , 4 ) and quantum-inspired (QI) models based on tensor networks (e.g., ref. 5 ).

Regardless of the quantum optimization approach proposed to date, there is a need to translate the real-world problem into a polynomial unconstrained binary optimization (PUBO) expression – a task which is not necessarily straightforward and that usually results in an overhead in terms of the number of variables. Specific real-world use cases illustrating these PUBO mappings are depicted in refs. 6 and 7 . Therefore, to achieve practical quantum advantage in the near-term, it would be ideal to find a quantum optimization strategy that can work on arbitrary objective functions, bypassing the translation and overhead limitations raised here.

In our work, we offer a solution to these challenges by proposing a generator-enhanced optimization (GEO) framework which leverages the power of (quantum or classical) generative models. This family of solvers can scale to large problems where combinatorial problems become intractable in real-world settings. We present the main results where we highlight the different features of GEO by performing a comparison with alternative solvers, such as Bayesian optimizers, and generic solvers, like simulated annealing. In the case of the specific real-world, large-scale application of portfolio optimization, we compare against the state-of-the-art (SOTA) optimizers and show the competitiveness of our approach.


Here are more salient highlights that make the proposed approach advantageous over other available solvers:

It leverages the power of generative models: The essence of the solver is that it is aiming to unveil non-obvious structure in the data, and once it has captured those correlations, it suggests new candidates with features similar to the top ones seen until that iteration phase (see Fig.  1 ).

figure 1

The GEO framework leverages generative models to utilize previous samples coming from any quantum or classical solver. The trained quantum or classical generator is responsible for proposing candidate solutions which might be out of reach for conventional solvers. This seed data set (step 0) consists of observation bitstrings \({\{{{{{{{{{\boldsymbol{x}}}}}}}}}^{(i)}\}}_{{{{{{{{\rm{seed}}}}}}}}}\) and their respective costs \({\{{\sigma }^{(i)}\}}_{{{{{{{{\rm{seed}}}}}}}}}\) . To give more weight to samples with low cost, the seed samples and their costs are used to construct a softmax function which serves as a surrogate to the cost function but in probabilistic domain. This softmax surrogate also serves as a prior distribution from which the training set samples are withdrawn to train the generative model (steps 1–3). As shown in the figure between steps 1 and 2, training samples from the softmax surrogate are biased favoring those with low cost value. For the work presented here, we implemented a tensor-network (TN)-based generative model. Therefore, we refer to this quantum-inspired instantiation of GEO as TN-GEO. Other families of generative models from classical, quantum, or hybrid quantum-classical can be explored as expounded in the main text. The quantum-inspired generator corresponds to a tensor-network Born machine (TNBM) model which is used to capture the main features in the training data, and to propose new solution candidates which are subsequently post-selected before their costs \({\{{\sigma }^{(i)}\}}_{{{{{{{{\rm{new}}}}}}}}}\) are evaluated (steps 4-6). The new set is merged with the seed data set (step 7) to form an updated seed data set (step 8) which is to be used in the next iteration of the algorithm. More algorithmic details for the two TN-GEO strategies proposed here, as a booster or as a stand-alone solver, can be found in the main text and in Supplementary Note  1 .F and 1.G.

The entire approach is data-driven: This means that the availability of more data, whether from previous attempts to solve the problem or from other state-of-the-art solvers, is expected to enhance performance. In the example of GEO as a booster, we used data explored by Simulated Annealing (SA) but if we had previous observations from any or many other solvers, we could combine it and give it as a starting point to GEO.

The model is cost function agnostic, i.e., it is a black box solver . This is paramount since any cost function can be solved with our approach. Most of the proposals for quantum or quantum inspired optimization require the cost function of the problem to be mapped to a quadratic or polynomial expression. This opens the possibility to tackle any discrete optimization problem, regardless of how complicated or expensive it is to compute the cost function. This is possible since the only information passed to the generative model are the bitstrings who have been explored and their respective cost value.

Versatility and Strategic Focus on Portfolio Optimization: This follows from the item above. The main motivation for selecting the cardinality-constrained portfolio optimization as the NP-hard problem for our study was the availability of concrete benchmarks and an extensive literature of solvers which have been fine-tuned over the past decades. Every time a new metaheuristic is proposed, chances are portfolio optimization is used to benchmark. Other recent independent works have considered other real-world applications of GEO 8 , 9 . For example, the authors in ref. 8 considered an industrial case related to a floor planning NP-hard problem. This black-box feature is one of the most prominent ones that render our approach advantageous compared to other quantum heuristics, such as the quantum approximate optimization algorithm (QAOA) 3 , 4 , which relies on the cost function to be a polynomial in terms of the binary variables.

Although other proposals leveraging generative models as a subroutine within the optimizer have appeared recently since the publication of our manuscript (e.g., see GFlowNets 10 and the variational neural annealing 11 algorithms), our framework offers the capability for both: handling arbitrary cost functions (i.e., a blackbox solver) and the possibility to swap the generator for a quantum or quantum-inspired implementation. GEO also has the enhanced feature that the more data is available, the more information can be passed and used to train the (quantum) generator.

As shown in Fig.  1 , depending on the GEO specifics we can construct an entire family of solvers whose generative modeling core range from classical, QI or quantum circuit (QC) enhanced, or hybrid quantum-classical model. These options can be realized by utilizing, for example, Boltzmann machines 12 or Generative Adversarial Networks (GAN) 13 , Tensor-Network Born Machines (TNBM) 14 , Quantum Circuit Born Machines (QCBM) 15 or Quantum-Circuit Associative Adversarial Networks (QC-AAN) 16 respectively, to name just a few of the many options for this probabilistic component.

QI algorithms come as an interesting alternative since these allow one to simulate larger scale quantum systems with the help of efficient tensor-network (TN) representations. Depending on the complexity of the TN used to build the quantum generative model, one can simulate from thousands of problem variables to a few tens, the latter being the limit of simulating an universal gate-based quantum computing model. This is, one can control the amount of quantum resources available in the quantum generative model by choosing the QI model.

Therefore, from all quantum generative model options, we chose to use a QI generative model based on TNs to test and scale our GEO strategy to instances with a number of variables commensurate with those found in industrial-scale scenarios. We refer to our solver hereafter as TN-GEO . For the training of our TN-GEO models we followed the work of Han et al. 17 where they proposed to use Matrix Product States (MPS) to build the unsupervised generative model. The latter extends the scope from early successes of quantum-inspired models in the context of supervised ML 18 , 19 , 20 , 21 .

In this work, we will discuss two modes of operation for our family of quantum-enhanced solvers:

In TN-GEO as a “booster" we leverage past observations from classical (or quantum) solvers. To illustrate this mode we use observations from simulated annealing (SA) runs. Results are presented in this section and simulation details are provided in Supplementary Note  1 .F.

In TN-GEO as a stand-alone solver all initial cost function evaluations are decided entirely by the quantum-inspired generative model, and a random prior is constructed just to give support to the target probability distribution the MPS model is aiming to capture. Results are presented in this section and Simulation details are provided in Supplementary Note  1 .G.

Both of these strategies are captured in the algorithm workflow diagram in Fig.  1 and described in more detail in Supplementary Note  1 .

To illustrate the implementation for both of these settings, we tested their performance on an NP-hard version of the portfolio optimization problem with cardinality constraints. The selection of optimal investment on a specific set of assets, or portfolios , is a problem of great interest in the area of quantitative finance. This problem is of practical importance for investors, whose objective is to allocate capital optimally among assets while respecting some investment restrictions. The goal of this optimization task, introduced by Markowitz 22 , is to generate a set of portfolios that offers either the highest expected return (profit) for a defined level of risk or the lowest risk for a given level of expected return. In this work, we focus in two variants of this cardinality constrained optimization problem. The first scenario aims to choose portfolios which minimize the volatility or risk given a specific target return (more details are provided in Supplementary Note  1 .A.) To compare with the reported results from the best performing SOTA algorithms, we ran TN-GEO in a second scenario where the goal is to choose the best portfolio given a fixed level of risk aversion . This is the most commonly used version of this optimization problem when it comes to comparison among SOTA solvers in the literature (more details are provided in Supplementary Note  1 .B).

The following results are broken into three subsections, each highlighting different features from GEO. First, we focus on GEO as a booster and how it can build from results obtained with other solvers. Second, we focus on GEO as a standalone and compare its performance to SA and the Bayesian optimization library GPyOpt 23 . In the final subsection, we focus on a bencmark comparison of GEO with state-of-the-art solvers. While in TN-GEO as a booster and as a stand-alone solver we implemented and fine-tuned each solver, and in the final benchmark, we leveraged the state-of-the-art results from nine other solvers reported in the last two decades. In the latter case, each non-GEO solver was thoroughly fine-tuned by the researchers of each reference. This portfolio optimization problem is so canonical that when a new solver is proposed, researchers can compare their results by taking the results from the new proposed solver, as long as the benchmark problems are run in identical conditions. This was one of the main motivations for us to choose this well-established benchmark problem. The “rules of the game" for reporting each market index and performance indicator are reported in Supplementary Note  1 .B. In contrast, for the other two subsections, the criteria of evaluation are different, and it emphasizes the performance of GEO when one imposes a limit on the total wall-clock time (e.g., as in the booster mode subsection) and when there is a limited number of calls to the cost function (e.g., as in the stand-alone subsection). The latter is a potential scenario when the bottleneck or expensive step is the cost function evaluation itself (e.g., as it is the case of drug discovery where each evaluation (each candidate molecule) might require synthesis in the lab and an expensive and long process towards its Food and Drug Administration (FDA) approval). Note the desirable condition of the cost function being expensive is only ideal to offset the typical longer time incurred in the steps training the generative model. As shown in teh following subsection, this condition can be significantly relaxed when we use GEO as a booster, since a hybrid strategy where GEO is initialized with previous solutions from other solvers can yield an advantage as well for GEO, even in scenarios where the evaluation of the cost function is inexpensive.

TN-GEO as a booster for any other combinatorial optimization solver

In Fig.  2 we present the experimental design and the results obtained from using TN-GEO as a booster. In these experiments we illustrate how using intermediate results from simulated annealing (SA) can be used as seed data for our TN-GEO algorithm. As described in Fig.  2 A, there are two strategies we explored (strategies 1 and 2) to compare with our TN-GEO strategy (strategy 4). To fairly compare each strategy, we provide each with approximately the same computational wall-clock time. For strategy 2, this translates into performing additional restarts of SA with the time allotted for TN-GEO. In the case of strategy 1, where we explored different settings for SA from the start compared to those used in strategy 2, this amounts to using the same total number of number of cost functions evaluations as those allocated to SA in strategy 2. For our experiments this number was set to 20,000 cost function evaluations for strategies 1 and 2. In strategy 4, the TN-GEO was initialized with a prior consisting of the best 1,000 observations out of the first 10,000 coming from strategy 2 (see Supplementary Note  1 .F for details). To evaluate the performance enhancement obtained from the TN-GEO strategy we compute the relative TN-GEO enhancement ,  η , which we define as

figure 2

A Shows the schematic representation, strategies 1–3 correspond to the current options a user might explore when solving a combinatorial optimization problem with a suite of classical optimizers such as simulated annealing (SA), parallel tempering (PT), generic algorithms (GA), among others. In strategy 1, the user would use its computational budget with a preferred solver. In strategy 2-4 the user would inspect intermediate results and decide whether to keep trying with the same solver (strategy 2), try a new solver or a new setting of the same solver used to obtain the intermediate results (strategy 3), or, as proposed here, to use the acquired data to train a quantum or quantum-inspired generative model within a GEO framework such as TN-GEO (strategy 4). B The results show the relative TN-GEO enhancement from TN-GEO over either strategy 1 or strategy 2. Positive values indicate runs where TN-GEO outperformed the respective classical strategies (see Eq. ( 1 )). The data represents bootstrapped medians from 20 independent runs of the experiments and error bars correspond to the 95% confidence intervals. The two instances presented here correspond to portfolio optimization instances where all the assets in the S& P 500 market index where included ( N  = 500), under two different cardinality constraints ( κ = 30 and κ = 50). This cardinality constraint indicate the number of assets that can be included at a time in valid portfolios, yielding a search space of \(M=\left(\begin{array}{c}N\\ \kappa \end{array}\right)\) , with M  ~ 10 69 portfolios candidates for κ  = 50.

Here, \({C}_{\min }^{{{{{{{{\rm{cl}}}}}}}}}\) is the lowest minimum value found by the classical strategy (e.g., strategies 1–3) while \({C}_{\min }^{{{{{{{{\rm{TN}}}}}}}}-{{{{{{{\rm{GEO}}}}}}}}}\) corresponds to the lowest value found with the quantum-enhanced approach (e.g., with TN-GEO). Therefore, positive values reflect an improvement over the classical-only approaches, while negative values indicate cases where the classical solvers outperform the quantum-enhanced proposal.

As shown in the Fig.  2 B, we observe that TN-GEO outperforms on average both of the classical-only strategies implemented. The quantum-inspired enhancement observed here, as well as the trend for a larger enhancement as the number of variables (assets) becomes larger, is confirmed in many other investment universes with a number of variables ranging from N  = 30 to N  = 100 (see Supplementary Note  3 for more details). Although we show an enhancement compared to SA, similar results could be expected when other solvers are used, since our approach builds on solutions found by the solver and does not compete with it from the start of the search. Furthermore, the more data available, the better the expected performance of TN-GEO is. An important highlight of TN-GEO as a booster is that these previous observations can come from a combination of solvers, as different as purely quantum or classical, or hybrid.

The observed performance enhancement compared with the classical-only strategy must be coming from a better exploration of the relevant search space, i.e., the space of those bitstring configurations x representing portfolios which could yield a low risk value for a specified expected investment return. That is the intuition behind the construction of TN-GEO. The goal of the generative model is to capture the important correlations in the previously observed data, and to use its generative capabilities to propose similar new candidates.

Generating new candidates is by no means a trivial task in ML and it determines the usefulness and power of the model since it measure its generalization capabilities. In this setting of QI generative models, one expects that the MPS-based generative model at the core of TN-GEO is not simply memorizing the observations given as part of the training set, but that it will provide new unseen candidates. This is an idea which has been recently tested and demonstrated to some extent on synthetic data sets (see e.g., refs. 24 , 25 , 26 ). In Fig.  3 we demonstrate that our quantum-inspired generative model is generalizing to new samples and that these add real value to the optimization search. This demonstrates the generalization capabilities of quantum generative models in the context of a real-world application in an industrial scale setting, and corresponds to one of the main findings in our paper.

figure 3

The blue histogram represents the number of observations or portfolios obtained from the classical solver ( seed data set ), corresponding to an investment universe with N  = 50 and N  = 100 assets. In orange we represent samples coming from our quantum generative model at the core of TN-GEO. The green dash line is positioned at the best risk value found in the seed data. This mark emphasizes all the new samples obtained with the quantum generative model and which correspond to lower portfolio risk value (better minima) than those available from the classical solver by itself. The number of samples in the case of N  = 50 is equal to 31, while 349 samples were obtained from the MPS generative model in the case of N  = 100.

Note that our TN-based generative model not only produces better minima than the classical seed data, but it also generates a rich amount of samples in the low cost spectrum. This bias is imprinted in the design of our TN-GEO and it is the purpose of the softmax surrogate prior distribution shown in Fig.  1 . This richness of new samples could be useful not only for the next iteration of the algorithm, but they may also be readily of value to the user solving the application. In some applications there is value as well in having information about the runners-up. Ultimately, the cost function is just a model of the system guiding the search, and the lowest cost does not translate to the best performance in the real-life investment strategy.

Generator-enhanced optimization as a stand-alone solver

Next, we explore the performance of our TN-GEO framework as a stand-alone solver. The focus is in combinatorial problems whose cost functions are expensive to evaluate and where finding the best minimum within the least number of calls to this function is desired. In Fig.  4 we present the comparison against four different classical optimization strategies. As the first solver, we use the random solver, which corresponds to a fully random search strategy over the 2 N bitstrings of all possible portfolios, where N is the number of assets in our investment universe. As second solver, we use the conditioned random solver, which is a more sophisticated random strategy compared to the fully random search. The conditioned random strategy uses the a priori information that the search is restricted to bitstrings containing a fixed number of κ assets. Therefore the number of combinatorial possibilities is \(M=\left(\begin{array}{c}N\\ \kappa \end{array}\right)\) , which is significantly less than 2 N . As expected, when this information is not used the performance of the random solver over the entire 2 N search space is worse. The other two competing strategies considered here are SA and the Bayesian optimization library GPyOpt 23 . In both of these classical solvers, we adapted their search strategy to impose this cardinality constraint with fixed κ as well (details in Supplementary Note  1 .E). This raises the bar even higher for TN-GEO which is not using that a priori information to boost its performance. Specific adaptions of the MPS generative model could be implemented to conserve the number of assets by construction, borrowing ideas from condensed matter physics where one can impose MPS conservation in the number of particles in the quantum state. As explained in Supplementary Note  1 .G, we only use this information indirectly during the construction of the artificial seed data set which initializes the algorithm (step 0, Fig.  1 ), but it is not a strong constraint during the construction of the QI generative model (step 3, Fig.  1) or imposed to generate the new candidate samples coming from it (step 4, Fig.  1 ). Post selection can be applied a posteriori such that only samples with the right cardinality are considered as valid candidates towards the selected set (step 5, Fig.  1 ).

figure 4

In this comparison of TN-GEO against four classical competing strategies, investment universes are constructed from subsets of the S& P 500 with a diversity in the number of assets (problem variables) ranging from N  = 30 to N  = 100. The goal is to minimize the risk given an expected return which is one of the specifications in the combinatorial problem addressed here. Error bars and their 95% confidence intervals are calculated from bootstrapping over 100 independent random initializations for each solver on each problem. The main line for each solver corresponds to the bootstrapped median over these 100 repetitions, demonstrating the superior performance of TN-GEO over the classical solvers considered here. As specified in the text, with the exception of TN-GEO, the classical solvers use to their advantage the a priori information coming from the cardinality constraint imposed in the selection of valid portfolios.

In Fig.  4 we demonstrate the advantage of our TN-GEO stand-alone strategy compared to any of these widely-used solvers. In particular, it is interesting to note that the gap between TN-GEO and the other solvers seems to be larger for larger number of variables.

Comparison with state-of-the-art algorithms

Finally, we compare TN-GEO with nine different leading SOTA optimizers covering a broad spectrum of algorithmic strategies for this specific combinatorial problem, based on and referred hereafter as: (1) GTS 27 , the genetic algorithms, tabu search, and simulated annealing; (2) IPSO 28 , an improved particle swarm optimization algorithm 28 ; (3) IPSO-SA 29 , a hybrid algorithm combining particle swarm optimization and simulated annealing; (4) PBILD 30 , a population-based incremental learning and differential evolution algorithm; (5) GRASP 31 , a greedy randomized adaptive solution procedure; (6) ABCFEIT 32 , an artificial bee colony algorithm with feasibility enforcement and infeasibility toleration procedures; (7) AAG 33 , a hybrid algorithm integrating ant colony optimization, artificial bee colony and genetic algorithms; (8) VNSQP 34 , a variable neighborhood search algorithm combined with quadratic programming; and, (9) ABC-HP 35 , a rapidly converging artificial bee colony algorithm. (10) Additionally, we included a classical version of GEO, based on the Neural Autoregressive Density Estimation (NADE) 36 model as the generator; We refer to this implementation as NADE-GEO.

The test data used by the vast majority of researchers in the literature who have addressed the problem of cardinality-constrained portfolio optimization come from OR-Library 37 , which correspond to the weekly prices between March 1992 and September 1997 of the following indexes: Hang Seng in Hong Kong (31 assets); DAX 100 in Germany (85 assets); FTSE 100 in the United Kingdom (89 assets); S&P 100 in the United States (98 assets); and Nikkei 225 in Japan (225 assets). It is important to note that with the exception of NADE-GEO and TN-GEO, each of these nine solvers has been fine-tuned by the authors in the respective reference. In each, the authors have reported their best results to succeed in this canonical benchmark problem, and those are the values that we report and compare against the two versions of GEO implemented here. Details for the hyperparameter fine-tuning of TN-GEO and NADE-GEO can be found in Supplementary Note  1 .H.

Although the full comparison involves the ten algorithms stated above, in this section, we will concentrate on presenting a reduced version of the full results focusing only the current state-of-the-art optimizers, i.e., excluding the metaheuristics from the early 2000’s, and including NADE-GEO. In particular, the selected algorithms whose results are presented in this section are GRASP, ABCFEIT, AAG, VNSQP, ABC-HP and NADE-GEO. Full results are presented in Supplementary Note  3 .

Therefore, here we present the results obtained with TN-GEO and its comparison with NADE-GEO and five of the different SOTA metaheuristic algorithms mentioned above whose results are publicly available in the literature. Table  1 shows the results of those algorithms and all performance metrics for each of the five index data sets (for more details on the evaluation metrics, see Supplementary Note  1 .B). Each algorithm corresponds to a different column, with TN-GEO in the rightmost column. The values are shown in italic entities if the TN-GEO algorithm performed better or equally well compared to the other algorithms on the corresponding performance metric. The numbers in bold mean that the algorithm found the best (lowest) value across all algorithms.

From all the entries in this table, 67% of them correspond to italic entries, where TN-GEO either wins or draws, which is a significant percentage giving that these optimizers are among the best reported in the last decades.

In Table  2 we show a pairwise comparison of TN-GEO against each of the six selected SOTA optimizers. This table reports the number of times TN-GEO wins, loses, or draws compared to results reported for the other optimizer, across all the performance metrics and for all the 5 different market indexes. Therefore, we report in the same table the overall percentage of wins plus draws in each case. We see that this percentage is greater than 50% in all the cases.

Furthermore, in Table  2 , we use the Wilcoxon signed-rank test 38 , which is a widely used nonparametric statistical test used to evaluate and compare the performance of different algorithms in different benchmarks 39 . Therefore, to statistically validate the results, a Wilcoxon signed-rank test is performed to provide a meaningful comparison between the results from TN-GEO algorithm and the selected SOTA metaheuristic algorithms. The Wilcoxon signed-rank test tests the null hypothesis that the median of the differences between the results of the algorithms is equal to 0. Thus, it tests whether there is no significant difference between the performance of the algorithms. The null hypothesis is rejected if the significance value ( p ) is less than the significance level ( α ), which means that one of the algorithms performs better than the other. Otherwise, the hypothesis is retained.

As can be seen from the table, the null hypotheses are accepted at α  = 0.05 for the TN-GEO algorithm over these recent SOTA algorithms except for NADE-GEO, that is rejected, meaning that TN-GEO performs significantly better than NADE-GEO. Thus, in terms of performance on all metrics combined, the results show that there is no significant difference between TN-GEO and the five selected SOTA optimizers (GRASP, ABCFEIT, AAG, VNSQP, and ABC-HP).

In particular, TN-GEO is on par with the most competitive of all the solvers, referred to here as ABC-HP. The authors of ref. 35 attribute the success of this recent ant bee colony solver to a good balance of diversification (good exploration of the search space) and intensification (search around regions in the neighborhood of local minima). Since GEO generates its candidates from the correlations learned from the data, it is not restricted to local search but can be considered a global search solver, which is a difficult property to include in most of the solvers, which usually only exploit the local neighborhood of the best intermediate solutions. Overall, the results confirm the competitiveness of our quantum-inspired proposed approach against SOTA metaheuristic algorithms. This is remarkable, considering that these metaheuristics have been explored and fine-tuned for decades.

Compared to other quantum optimization strategies, an important feature of TN-GEO is its algorithmic flexibility. As shown here, unlike other proposals, our GEO framework can be applied to arbitrary cost functions, which opens the possibility of new applications that cannot be easily addressed by an explicit mapping to a polynomial unconstrained binary optimization (PUBO) problem. Our approach is also flexible with respect to the source of the seed samples, as they can come from any solver, possibly more efficient or even application-specific optimizers. The demonstrated generalization capabilities of the generative model that forms its core, helps TN-GEO build on the progress of previous experiments with other state-of-the-art solvers, and it provides new candidates that the classical optimizer may not be able to achieve on its own. We are optimistic that this flexible approach will open up the broad applicability of quantum and quantum-inspired generative models to real-world combinatorial optimization problems at the industrial scale.

Although we have limited the scope of this work to tensor network-based generative quantum models, it would be a natural extension to consider other generative quantum models as well. For example, hybrid classical-quantum models such as quantum circuit associative adversarial networks (QC-AAN) 16 can be readily explored to harness the power of generative quantum models with so-called noisy intermediate-scale quantum (NISQ) devices 40 . In particular, the QC-AAN framework opens up the possibility of working with a larger number of variables and going beyond discrete values e.g., variables with continuous values as those found in Mixed Integer Programming (MIP) optimization problems 41 or 42 . Both quantum-inspired and hybrid quantum-classical algorithms can be tested in this GEO framework in even larger problem sizes of this NP-hard version of the portfolio optimization problem or any other combinatorial optimization problem. As the number of qubits in NISQ devices increases, it would be interesting to explore generative models that can utilize more quantum resources, such as Quantum Circuit Born Machines (QCBM) 15 : a general framework to model arbitrary probability distributions and perform generative modeling tasks with gate-based quantum computers.

The question of whether a significant advantage can be obtained from GEO by using quantum devices is an active research topic currently being explored. One proposal to reach a more systematic and incremental enhancement from the best quantum-inspired solution to an enhanced quantum-hardware realization was recently proposed in ref. 43 . There, one starts from the best available quantum-inspired tensor-network solution and maps it to a quantum circuit. This can be subsequently modified by adding gates beyond those from the decomposition to increase the plausible correlations beyond those accessible with the quantum-inspired tensor-network-based solution. The access to longer-range correlations enhances, in turn, the expressibility of the quantum generative model while taking it beyond the capabilities of classical simulation. In that work, the specific case of generative models was illustrated, and therefore, these recent decomposition techniques can be directly applied to extend the capabilities of TN-GEO explored here, and, as the technologies mature and the level of noise is reduced, explore these enhanced models directly on quantum devices. Additionally, in ref. 44 a comparison of quantum generative models with state-of-the-art classical generative models, such as Transformers and Recurrent Neural Networks, was presented, and the results were very encouraging in the data sets studied.

Increasing the expressive power of the quantum-inspired core of MPS to other more complex but still efficient QI approaches, such as tree-tensor networks 45 , is another interesting research direction. Although we have fully demonstrated the relevance and scalability of our algorithm for industrial applications by increasing the performance of classical solvers on industrial scale instances (all 500 assets in the S&P 500 market index), there is a need to explore the performance improvement that could be achieved by more complex TN representations or on other combinatorial problems.

Although the goal of GEO was to show good behavior as a general black-box algorithm without considering the specifics of the study application, it is a worthwhile avenue to exploit the specifics of the problem formulation to improve its performance and runtime. In particular, for the portfolio optimization problem with a cardinality constraint, it is useful to incorporate this constraint as a natural MPS symmetry, thereby reducing the effective search space of feasible solutions from the size of the universe to the cardinality size. While imposing such constraints is possible with tensor-networks constructions as recently demonstrated in ref. 46 , there does not seem to be a native way to add such common constraints in canonical deep learning models based on neural-network units.

Beyond the strategy of GEO as a booster, another way to use GEO in conjunction with classical solvers is to use it as the optimization subroutine for the smaller subproblems originating from decomposition or multilevel techniques [see for e.g., refs. 47 , 48 , 49 ] used to mitigate the limitation in the number of qubits in NISQ devices.

Usually, these subproblems are solved with gate-based quantum optimization heuristics such as the Quantum Approximate Optimization Algorithm (QAOA) 3 or D-wave annealing devices, but one could implement a quantum-circuit version of GEO, for example, using QCBM as the generative models, to solve these smaller subproblems and assist the solution of the larger problem via the hybrid quantum-classical decomposition approach. The general question of the hardware requirements needed to prove quantum advantage is a challenging one, and it is beyond the scope of this work, but we think GEO opens the possibility to start exploring this question in more realistic scenarios and in more general cost functions than those that can be natively considered with other approaches such as QAOA.

Finally, our thorough comparison with SOTA algorithms, which have been fine-tuned for decades on this specific application, shows that our TN-GEO strategy manages to outperform a couple of these and is on par with the other seven optimizers. This is a remarkable feat for this approach and hints at the possibility of finding commercial value in these quantum-inspired strategies in large-scale real-world problems, as the instances considered in this work. Also, it calls for more fundamental insights towards understanding when and where it would be beneficial to use this TN-GEO framework, which relies heavily on its quantum-inspired generative ML model. For example, understanding the intrinsic bias in these models, responsible for their remarkable performance, is another important milestone on the road to practical quantum advantage with quantum devices in the near future. The latter can be asserted given the tight connection of these quantum-inspired TN models to fully quantum models deployed on quantum hardware. And this question of when to go with quantum-inspired or fully quantum models is a challenging one that we are exploring in ongoing future work.

Data availability

The data generated in this study is available at:

Code availability

The code used to generate the data in this study has been deposited at: . Your access to and use of the downloadable code (the “Code”) contained in this Section is subject to a non-exclusive, revocable, non-transferable, and limited right to use the Code for the exclusive purpose of undertaking academic, governmental, or not-for-profit research. Use of the Code or any part thereof for commercial or clinical purposes is strictly prohibited in the absence of a Commercial License Agreement from Zapata AI .

Kadowaki, T. & Nishimori, H. Quantum annealing in the transverse ising model. Phys. Rev. E. 58 , 5355 (1998).

Article   ADS   CAS   Google Scholar  

Farhi, E. et al. A quantum adiabatic evolution algorithm applied to random instances of an NP-Complete problem. Science 292 , 472–475 (2001).

Article   ADS   MathSciNet   CAS   PubMed   Google Scholar  

Farhi E., Gutmann S. & Goldstone, J. A quantum approximate optimization algorithm. (2014).

Hadfield, S. et al. From the quantum approximate optimization algorithm to a quantum alternating operator ansatz. Algorithms 12 , 34 (2019).

Article   MathSciNet   Google Scholar  

Mugel, S. et al. Dynamic portfolio optimization with real datasets using quantum processors and quantum-inspired tensor networks. Phys. Rev. Res. 4 , 013006 (2022).

Article   CAS   Google Scholar  

Perdomo-Ortiz, A., Dickson, N., Drew-Brook, M., Rose, G. & Aspuru-Guzik, A. Finding low-energy conformations of lattice protein models by quantum annealing. Sci. Rep. 2 , 571 (2012).

Article   ADS   PubMed   PubMed Central   Google Scholar  

Perdomo-Ortiz, A. et al. Readiness of quantum optimization machines for industrial applications. Phys. Rev. Appl. 12 , 014004 (2019).

Banner, W. P. et al. Quantum inspired optimization for industrial scale problems, (2023).

Moussa, C., Wang, H., Araya-Polo, M., Bäck, T. & Dunjko, V. Application of quantum-inspired generative models to small molecular datasets, (2023).

Bengio, E., Jain, M., Korablyov, M., Precup, D. & Bengio, Y. Flow network based generative models for non-iterative diverse candidate generation. (2021).

Hibat-Allah, M., Inack, E. M., Wiersema, R., Melko, R. G. & Carrasquilla, J. Variational neural annealing. Nat. Mach. Intell. 3 , 952–961 (2021).

Article   Google Scholar  

Cheng, S., Chen, J. & Wang, L. Information perspective to probabilistic modeling: Boltzmann machines versus born machines. Entropy 20 , 583 (2018).

Goodfellow, I. et al. Generative adversarial netsAdvances in neural information processing systems 27 , (2014).

Cheng S., Chen J. & Wang L. Information perspective to probabilistic modeling: Boltzmann machines versus Born machines, Entropy 20 , 583 (2017).

Benedetti, M. et al. A generative modeling approach for benchmarking and training shallow quantum circuits. npj Quantum Inf. 5 , 45 (2019).

Article   ADS   Google Scholar  

Rudolph, M. S. et al. Generation of high-resolution handwritten digits with an ion-trap quantum computer. Phys. Rev. X 12 , 031010 (2022).

CAS   Google Scholar  

Han, Z.-Y., Wang, J., Fan, H., Wang, L. & Zhang, P. Unsupervised generative modeling using matrix product states. PRX 8 , 031012 (2018).

Stoudenmire, E. & Schwab, D. J. Supervised learning with tensor networks, Advances in Neural Information Processing Systems 29 (2016).

Efthymiou, S., Hidary, J. & Leichenauer, S. TensorNetwork for machine learning. (2019).

Roberts, C. et al. TensorNetwork: A library for physics and machine learning. (2019).

Fishman, M., White, S. R. & Stoudenmire, E. M. The ITensor software library for tensor network calculations. (2020).

Markowitz, H. Portfolio selection. J. Finance 7 , 77–91 (1952).

Google Scholar  

The GPyOpt Gpyopt: A bayesian optimization framework in python, (2016).

Bradley, T.-D., Stoudenmire, E. M. & Terilla, J. Modeling sequences with quantum states: a look under the hood. Mach. Learn.: Sci. Technol. 1 , 035008 (2020).

Stokes, J. & Terilla, J. Probabilistic modeling with matrix product states. Entropy 21 , 1236 (2019).

Miller, J., Rabusseau, G. & Terilla, J. Tensor networks for probabilistic sequence modeling, in International Conference on Artificial Intelligence and Statistics. pp. 3079–3087 (PMLR, 2021).

Chang, T.-J., Meade, N., Beasley, J. E. & Sharaiha, Y. M. Heuristics for cardinality constrained portfolio optimisation. Comput. Oper. Res. 27 , 1271–1302 (2000).

Deng, G.-F., Lin, W.-T. & Lo, C.-C. Markowitz-based portfolio selection with cardinality constraints using improved particle swarm optimization. Expert Syst. Appl. 39 , 4558–4566 (2012).

Mozafari, M., Jolai, F. & Tafazzoli, S. A new ipso-sa approach for cardinality constrained portfolio optimization. Int. J. Ind. Eng. Comput. 2 , 249–262 (2011).

Lwin, K. & Qu, R. A hybrid algorithm for constrained portfolio selection problems. Appl. Intell. 39 , 251–266 (2013).

Baykasoğlu, A., Yunusoglu, M. G. & Özsoydan, F. B. A grasp based solution approach to solve cardinality constrained portfolio optimization problems. Comput. Ind. Eng. 90 , 339–351 (2015).

Kalayci, C. B., Ertenlice, O., Akyer, H. & Aygoren, H. An artificial bee colony algorithm with feasibility enforcement and infeasibility toleration procedures for cardinality constrained portfolio optimization. Expert Syst. Appl. 85 , 61–75 (2017).

Kalayci, C. B., Polat, O. & Akbay, M. A. An efficient hybrid metaheuristic algorithm for cardinality constrained portfolio optimization. Swarm Evolut. Comput. 54 , 100662 (2020).

Akbay, M. A., Kalayci, C. B. & Polat, O. A parallel variable neighborhood search algorithm with quadratic programming for cardinality constrained portfolio optimization. Knowl.-Based Syst. 198 , 105944 (2020).

Cura, T. A rapidly converging artificial bee colony algorithm for portfolio optimization. Knowl.-Based Syst. 233 , 107505 (2021).

Uria, B., Côté, M.-A., Gregor, K., Murray, I., & Larochelle, H., Neural autoregressive distribution estimation, (2016).

Beasley, J. E. Or-library: distributing test problems by electronic mail. J. Oper. Res. Soc. 41 , 1069–1072 (1990).

Wilcoxon, F. Individual comparisons by ranking methods, in Breakthroughs in statistics . pp. 196–202 (Springer, 1992).

Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7 , 1–30 (2006).

MathSciNet   Google Scholar  

Preskill, J. Quantum computing in the NISQ era and beyond. Quantum 2 , 79 (2018).

Lee, J. & Leyffer, S. Mixed integer nonlinear programming . (Springer New York, 2011)

Nowak, I. Relaxation and decomposition methods for mixed integer nonlinear programming . (Birkhäuser Basel, 2006)

Rudolph, M. S. et al. Synergistic pretraining of parametrized quantum circuits via tensor networks. Nat. Commun. 14 , 8367 (2023).

Article   ADS   CAS   PubMed   PubMed Central   Google Scholar  

Hibat-Allah, M., Mauri, M., Carrasquilla, J. & Perdomo-Ortiz, A. A framework for demonstrating practical quantum advantage: Racing quantum against classical generative models. Commun. Phys. 7 , 68 (2024).

Cheng, S., Wang, L., Xiang, T. & Zhang, P. Tree tensor networks for generative modeling. Phys. Rev. B 99 , 155131 (2019).

Lopez-Piqueres, J., Chen, J. & Perdomo-Ortiz, A. Symmetric tensor networks for generative modeling and constrained combinatorial optimization, Mach. Learn.: Sci. Technol. 4 (2022).

Ponce, M. et al. Graph decomposition techniques for solving combinatorial optimization problems with variational quantum algorithms, (2023)

Ushijima-Mwesigwa, H. et al. Multilevel combinatorial optimization across quantum architectures. ACM Trans. Quantum Comput. 2 , 1–29 (2021).

Zhou, Z., Du, Y., Tian, X. & Tao, D. Qaoa-in-qaoa: solving large-scale maxcut problems on small quantum machines. Phys. Rev. Appl. 19 , 024027 (2023).

Download references


We would like to acknowledge Manuel S. Rudolph, Marta Mauri, Matthew J.S. Beach, Yudong Cao, Luis Serrano, Jhonathan Romero-Fontalvo, Brian Dellabetta, Matthew Kowalsky, Jacob Miller, John Realpe-Gomez, and Collin Farquhar for their feedback on an early version of this manuscript

Author information

Authors and affiliations.

Zapata Computing Canada Inc., 25 Adelaide St E, Suite 1500, Toronto, ON, M5C 3A1, Canada

Javier Alcazar, Mohammad Ghazi Vakili, Can B. Kalayci & Alejandro Perdomo-Ortiz

Acadian Asset Management LLC, 24 King William St, London, EC4R 9AT, England

Javier Alcazar

Department of Chemistry, University of Toronto, Toronto, ON, M5G 1Z8, Canada

Mohammad Ghazi Vakili

Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada

Department of Industrial Engineering, Pamukkale University, Kinikli Campus, 20160, Denizli, Turkey

Can B. Kalayci

You can also search for this author in PubMed   Google Scholar


A.P.-O. conceived the idea of the GEO framework. J.A., M.G.V., and A.P.-O. contributed to its final form. J.A. performed all the numerical experiments and results for GEO as a booster and as a standalone solver. M.G.V., J.A., and C.B.K. contributed to the results section comparing GEO with state-of-the-art solvers and the respective statistical analysis. M.G.V. implemented the NADE-GEO solver. A.P.-O. helped supervise and coordinate the efforts in this work. All authors regularly analyzed the numerical results and contributed to the final version of the manuscript.

Corresponding author

Correspondence to Alejandro Perdomo-Ortiz .

Ethics declarations

Competing interests.

The authors declare the following competing interests: J.A., M.G.V., C.B.K., and A.P.-O. were employed by Zapata Computing Canada Inc. during the development of this work. There was no collaboration between Acadian Asset Management LLC. and Zapata AI during the development of this work.

Peer review

Peer review information.

: Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information, peer review file, rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit .

Reprints and permissions

About this article

Cite this article.

Alcazar, J., Ghazi Vakili, M., Kalayci, C.B. et al. Enhancing combinatorial optimization with classical and quantum generative models. Nat Commun 15 , 2761 (2024).

Download citation

Received : 14 February 2021

Accepted : 15 March 2024

Published : 29 March 2024


Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

By submitting a comment you agree to abide by our Terms and Community Guidelines . If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

solving constrained optimization problems

Preference-Aware Constrained Multi-Objective Bayesian Optimization (Student Abstract)

  • Alaleh Ahmadianshalchi Washington State University
  • Syrine Belakaria Stanford university
  • Janardhan Rao Doppa Washington State University

AAAI-24 / IAAI-24 / EAAI-24 Proceedings Cover

  • Video/Poster

How to Cite

  • Endnote/Zotero/Mendeley (RIS)


  • For Readers
  • For Authors
  • For Librarians

Developed By


Login to access subscriber-only resources.

Part of the PKP Publishing Services Network

Copyright © 2024, Association for the Advancement of Artificial Intelligence

More information about the publishing system, Platform and Workflow by OJS/PKP.

A GPU-Based Ising Machine With a Multi-Spin-Flip Capability for Constrained Combinatorial Optimization

Ieee account.

  • Change Username/Password
  • Update Address

Purchase Details

  • Payment Options
  • Order History
  • View Purchased Documents

Profile Information

  • Communications Preferences
  • Profession and Education
  • Technical Interests
  • US & Canada: +1 800 678 4333
  • Worldwide: +1 732 981 0060
  • Contact & Support
  • About IEEE Xplore
  • Accessibility
  • Terms of Use
  • Nondiscrimination Policy
  • Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity. © Copyright 2024 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.


  1. Solving Nonlinear Constrained Optimization Problems with Matlab

    solving constrained optimization problems

  2. how to solve constrained optimization problems in matlab

    solving constrained optimization problems

  3. how to solve constrained optimization problems in matlab

    solving constrained optimization problems

  4. How To Solve Constrained Optimization Problems Using Genetic Algorithm (GA) Solver In Matlab

    solving constrained optimization problems

  5. Solving Constrained Optimization Problem Using Penalty Function Method

    solving constrained optimization problems

  6. Solving Constrained Optimization Problems Using Particle Swarm Optimization Algorithm (Matlab Code)

    solving constrained optimization problems


  1. constrained optimization| Returns to Scale| optimal input combinations

  2. Constrained Optimization (Part 2 of 3)

  3. Lagrange Multipliers

  4. Optimality conditions: unconstrained problems, quadratic functions

  5. 253.072.1 Constrained Optimization 2: Lagrange Multipliers

  6. Lec 25: Constrained Optimization (penalty function method- part 2, method of multipliers- part1)


  1. Constrained optimization

    The constrained-optimization problem (COP) is a significant generalization of the classic constraint-satisfaction problem (CSP) ... Virtually, this corresponds on ignoring the evaluated variables and solving the problem on the unassigned ones, except that the latter problem has already been solved. More precisely, the cost of soft constraints ...

  2. 2.7: Constrained Optimization

    In this section we will use a general method, called the Lagrange multiplier method, for solving constrained optimization problems: Maximize (or minimize) : f(x, y) (or f(x, y, z)) given : g(x, y) = c (or g(x, y, z) = c) for some constant c. The equation g(x, y) = c is called the constraint equation, and we say that x and y are constrained by g ...

  3. Lagrange multipliers intro

    The "Lagrange multipliers" technique is a way to solve constrained optimization problems. Super useful! Background. Contour maps; ... However, if you (or more realistically a computer) were solving a given constrained optimization problem, it's not like you would first find the unconstrained maximum, check if it fits the constraint, then turn ...

  4. Constrained optimization introduction (video)

    AboutTranscript. The Lagrange multiplier technique is how we take advantage of the observation made in the last video, that the solution to a constrained optimization problem occurs when the contour lines of the function being maximized are tangent to the constraint curve. Created by Grant Sanderson.

  5. 13.9: Constrained Optimization

    However, solving constrained optimization problems is a very important topic in applied mathematics. The techniques developed here are the basis for solving larger problems, where more than two variables are involved. We illustrate the technique again with a classic problem.

  6. PDF MATH 4211/6211

    an unconstrained problem of x2. Another way to solving this is using 1 = x2 1 + (2x2)2 4x1x2 where the equality holds when x1 = 2x2.So x1 = p 2=2 and x2 = p 2=4. However, not all equality-constrained problems can be easily converted into unconstrained ones. Xiaojing Ye, Math & Stat, Georgia State University 5

  7. PDF Introduction to Constrained Optimization

    Practice Problem 1 1. Write constraints for each of the following: a) A batch of cookies requires 3 cups of flour, and a cake requires 4. Write a constraint limiting the amount of cookies and cakes that can be made with 24 cups of flour. b) Box type 1 can hold 20 books and box type 2 can hold 12. Write a constraint for the number of boxes

  8. What Is Constrained Optimization?

    Constrained optimization, also known as constraint optimization, is the process of optimizing an objective function with respect to a set of decision variables while imposing constraints on those variables.. In this tutorial, we'll provide a brief introduction to constrained optimization, explore some examples, and introduce some methods to solve constrained optimization problems.

  9. Lagrange multipliers, using tangency to solve constrained optimization

    Lagrange multipliers, using tangency to solve constrained optimization. The Lagrange multiplier technique is how we take advantage of the observation made in the last video, that the solution to a constrained optimization problem occurs when the contour lines of the function being maximized are tangent to the constraint curve.

  10. 10.8: Constrained Optimization

    Constrained Optimization and Lagrange Multipliers. In Preview Activity 10.8.1 10.8. 1, we considered an optimization problem where there is an external constraint on the variables, namely that the girth plus the length of the package cannot exceed 108 inches. We saw that we can create a function g g from the constraint, specifically g(x, y ...

  11. Solving constrained optimization problems via multifactorial evolution

    It attempts to solve multiple optimization problems simultaneously using a single evolving population. Due to the implicit knowledge transfer, multifactorial evolution exhibits the potential to solve complex optimization problems. This paper tries to take advantage of multifactorial evolution to solve constrained optimization problems (COPs).

  12. Calculus I

    This problem is a little different from the previous problems. Both the constraint and the function we are going to optimize are areas. The constraint is that the overall area of the poster must be 200 in 2 while we want to optimize the printed area (i.e. the area of the poster with the margins taken out).

  13. Introduction to Constrained Optimization in the Wolfram Language

    Constrained optimization problems are problems for which a function f(x) is to be minimized or maximized subject to constraints \[CapitalPhi] (x). Here f:\[DoubleStruckCapitalR]^n-> \[DoubleStruckCapitalR] is called the objective function and \[CapitalPhi](x) is a Boolean-valued formula. In the Wolfram Language the constraints \[CapitalPhi](x) can be an arbitrary Boolean combination of ...

  14. Constrained Nonlinear Optimization Algorithms

    Constrained Optimization Definition. Constrained minimization is the problem of finding a vector x that is a local minimum to a scalar function f ( x ) subject to constraints on the allowable x: min x f ( x) such that one or more of the following holds: c(x) ≤ 0, ceq(x) = 0, A·x ≤ b, Aeq·x = beq, l ≤ x ≤ u. There are even more ...

  15. Constrained Optimization and the Lagrange Method

    B.3 Constrained Optimization and the Lagrange Method. One of the core problems of economics is constrained optimization: that is, maximizing a function subject to some constraint. We previously saw that the function y = f (x_1,x_2) = 8x_1 - 2x_1^2 + 8x_2 - x_2^2 y = f (x1,x2) = 8x1 − 2x12 + 8x2 − x22 has an unconstrained maximum at the ...

  16. Solving Constrained Multi-objective Optimization Problems with

    Most optimization problems in real-life have multiple constraints. Constrained optimization problems with more than one objective, with at least two objectives in conflict with one another, are referred to as constrained multi-objective optimization problems (CMOPs).

  17. Push and pull search for solving constrained multi-objective

    This paper proposes a push and pull search (PPS) framework for solving constrained multi-objective optimization problems (CMOPs). To be more specific, the proposed PPS divides the search process into two different stages: push and pull search stages. In the push stage, a multi-objective evolutionary algorithm (MOEA) is used to explore the ...

  18. A Filled Penalty Function Method for Solving Constrained Optimization

    An important method to solve constrained optimization problem is to approach the optimal solution of constrained optimization problem gradually by sequential unconstrained optimization method, namely penalty function method. And the filling function method is one of the effective methods to solve the global optimal problem. In this paper, a class of augmented Lagrangian objective filled ...

  19. 4.7: Optimization Problems

    Solving Optimization Problems over a Closed, Bounded Interval. The basic idea of the optimization problems that follow is the same. We have a particular quantity that we are interested in maximizing or minimizing. ... Solving the constraint equation for \(y\), we have \(y=\dfrac{216}{x^2}\). Therefore, we can write the surface area as a ...

  20. Constrained Optimization

    Constrained Optimization. Added Mar 16, 2017 by vik_31415 in Mathematics. Constrained Optimization. Send feedback | Visit Wolfram|Alpha. Get the free "Constrained Optimization" widget for your website, blog, Wordpress, Blogger, or iGoogle. Find more Mathematics widgets in Wolfram|Alpha.

  21. The Lagrangian (video)

    So kind of the whole point of this Lagrangian is that it turns our constrained optimization problem involving R and B and this new made-up variable lambda into an unconstrained optimization problem where we're just setting the gradient of some function equal to zero so computers can often do that really quickly so if you just hand the computer ...

  22. On solving constrained optimization problems with neural networks: a

    Deals with the use of neural networks to solve linear and nonlinear programming problems. The dynamics of these networks are analyzed. In particular, the dynamics of the canonical nonlinear programming circuit are analyzed. The circuit is shown to be a gradient system that seeks to minimize an unconstrained energy function that can be viewed as a penalty method approximation of the original ...

  23. Artificial Bee Colony (ABC) Optimization Algorithm for Solving

    The ABC algorithm has been firstly proposed for unconstrained optimization problems and showed that it has superior performance on these kind of problems. In this paper, the ABC algorithm has been extended for solving constrained optimization problems and applied to a set of constrained problems . Keywords. Particle Swarm Optimization

  24. A data-driven approach to PDE-constrained optimization in inverse problems

    Inverse problems are ubiquitous in science and engineering. Many of these are naturally formulated as a PDE-constrained optimization problem. These non-linear, large-scale, constrained optimization problems know many challenges, of which the inherent non-linearity of the problem is an important one. As an alternative to this physics-driven approach, data-driven methods have been proposed ...

  25. Solving a Min-Max Optimization Problem with Constraint

    I'm trying to solve the following optimization problem: min max{x + 2y − 1, 2x + 0.5y + 0.75} s.t. x + y = 1, so far i've tried to solve it separately in terms of max and min but im not sure when i can use the constrain, can i solve it as : max{x + 2y − 1, 2x + 0.5y + 0.75} s.t x+y=1 and then taking the min value

  26. Deep Attentive Belief Propagation: Integrating Reasoning and ...

    Belief Propagation (BP) is an important message-passing algorithm for various reasoning tasks over graphical models, including solving the Constraint Optimization Problems (COPs). It has been shown that BP can achieve state-of-the-art performance on various benchmarks by mixing old and new messages before sending the new one, i.e., damping.

  27. Enhancing combinatorial optimization with classical and ...

    Solving combinatorial optimization problems using quantum or quantum-inspired machine learning models would benefit from strategies able to work with arbitrary objective functions. Here, the ...

  28. Preference-Aware Constrained Multi-Objective Bayesian Optimization

    We consider the problem of constrained multi-objective optimization over black-box objectives, with user-defined preferences, with a largely infeasible input space. Our goal is to approximate the optimal Pareto set from the small fraction of feasible inputs. The main challenges include huge design space, multiple objectives, numerous constraints, and rare feasible inputs identified only ...

  29. A GPU-based Ising Machine with a Multi-spin-flip Capability for

    Abstract: Ising machines are domain-specific computers that solve combinatorial optimization problems (COPs). They utilize an Ising model to represent a COP and search for the optimal spin configuration of the Ising model to solve the COP. Most Ising machines are based on simulated annealing (SA) and update the spin configuration according to single-spin-flip Markov Chain Monte Carlo methods.

  30. A novel regional traffic control strategy for mixed traffic system with

    The Frank-Wolfe algorithm is used to solve the lower-level model, with a penalty function introduced to transform the constrained traffic assignment problem (TAP) into an unconstrained TAP. The proposed method is applied using the data of Beijing urban road network and a sensitivity analysis is conducted to examine the impacts of critical ...