Due 8:00am, Thursday, August 28, 2025.
To ensure you are all up and running right away with computing, here is a small assignment due at the beginning of class Thursday.
Install Anaconda as described in the class description, and replicate the implementation of Newton's method that I did in class today.
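If you want something to check against, here is a minimal sketch of Newton's method; it is not necessarily identical to the version from class, and the function, derivative, and starting guess below are only placeholders.

# minimal Newton's method sketch (placeholder function and starting guess)
def newton(f, fprime, x0, tol=1e-12, maxit=50):
    x = x0
    for k in range(maxit):
        dx = f(x)/fprime(x)      # Newton step
        x = x - dx
        if abs(dx) < tol:        # stop when the step is tiny
            break
    return x

# example: the square root of 2 as the root of x^2 - 2
print(newton(lambda x: x**2 - 2, lambda x: 2*x, 1.0))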
Bring your laptop to class and show me that you got it to work. If you run into any difficulties with this, please let me know by email right away.
Due 11:59pm, Friday, September 5, 2025.
[4 points]
Take the last 4 digits of your UB person number, interpreted as a decimal integer, and divide by 100. By hand, determine the binary representation of the resulting number. (For example, if your person number were 87654719, you'd find the binary representation of 47.19.) Use the repeated multiplication and division by 2 method shown in class, and show all your work.
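After you have done it by hand, you can check your answer with a small script; here is one possible sketch of the same multiply/divide-by-2 idea (47.19 is just the example value from above).

# check a hand conversion: integer part by repeated division by 2,
# fractional part by repeated multiplication by 2 (47.19 is just the example above)
x = 47.19
ipart, fpart = int(x), x - int(x)
ibits = ''
while ipart > 0:
    ibits = str(ipart % 2) + ibits   # remainders give the bits, least significant first
    ipart //= 2
ibits = ibits or '0'                 # handle a zero integer part
fbits = ''
for k in range(20):                  # 20 fractional bits is plenty for checking
    fpart *= 2
    bit = int(fpart)
    fbits += str(bit)
    fpart -= bit
print(ibits + '.' + fbits)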
A floating point system similar to the IEEE 754 system has: a sign bit, then a 2-bit exponent biased by 2, then a 3-bit mantissa. Assuming all bit-codes are used for numbers, and that the all-zeros code is used for 0 ...
a [4 points] List all the non-negative machine numbers in this system if denormalization is NOT used. For each, give the bit-code in both binary and hex form (pad with zeros on the left for the hex conversion), then the number it represents in binary fixed-point form (e.g. 110.0101), and finally in decimal notation (e.g. 1.875).
b What is the smallest positive machine number, with and without denormalization? For each, give all the forms requested in part a.
c [1 point] The "machine epsilon" or εmach, defined to be the difference between 1 and the next larger machine number, is a very important characteristic of a floating point number system like this. What is εmach for this system?
d [1 point] How big is the "hole at zero" for this system in part (a), i.e. the gap between the smallest positive number and 0?
e [1 point] How big is the hole at zero in part (b)?
f [1 point] What would be the largest number in the system if the top exponent code were reserved for "Inf"s and "NaN"?
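If it helps to check entries in your tables for parts a and b, here is a small sketch that decodes a bit pattern in this toy format (sign bit, 2-bit exponent biased by 2, 3-bit mantissa); the denormalized flag just switches the interpretation of the all-zeros exponent code.

# decode a 6-bit pattern: sign | 2-bit exponent (bias 2) | 3-bit mantissa
def toy_value(bits, denormalized=False):
    s, e, m = int(bits[0]), int(bits[1:3], 2), int(bits[3:6], 2)
    if e == 0 and m == 0:
        return 0.0                                # the all-zeros code represents 0
    if denormalized and e == 0:
        return (-1)**s * (m/8) * 2**(1 - 2)       # no hidden 1, exponent fixed at 1 - bias
    return (-1)**s * (1 + m/8) * 2**(e - 2)       # hidden leading 1, exponent e - bias

print(toy_value('000001'), toy_value('000001', denormalized=True))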
[5 points] Derive by hand the hexadecimal machine IEEE 754 64-bit code for the number 9.3. Show all the steps clearly.
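Afterwards you can check your hand-derived code against the machine; the standard-library struct module is one way to get at the raw bytes of a double.

import struct
# big-endian 64-bit double -> 16 hex digits; compare with your hand derivation
print(struct.pack('>d', 9.3).hex())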
Due 11:59pm, Friday, September 12, 2025.
Consider the difficulty of accurately evaluating sin x - x when x is near 0.
a [5 points] Find an approximate bound for the relative rounding error in evaluating the formula directly (in terms of machine epsilon). The desired answer is a small multiple of machine epsilon divided by a power of x. You may use low-order Taylor polynomial approximations and discard higher order terms when appropriate in order to get this answer.
b [3 points] Find a bound on the relative truncation error incurred by using the approximation sin x - x ~ -x^3/3! + x^5/5! - x^7/7!.
c [5 points] If we use the approximation sin x - x ~ -x^3/3! + x^5/5! - x^7/7!, for approximately which values of x (near 0) will the approximation be evaluated more accurately than the formula sin x - x itself if we are using arithmetic with machine epsilon = 2^-52? How many decimal digits are reliable in the worst case if we use the approximation and direct evaluation in the appropriate ranges? It is essential that you show all your work and make your reasoning clear to the grader!
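The analysis above is to be done on paper, but if you would like to see the effect numerically, here is a sketch comparing the direct formula with the three-term approximation against a high-precision reference (mpmath, which ships with Anaconda, is used only to get that reference).

import math
import mpmath                       # only for a high-precision reference value; ships with Anaconda
mpmath.mp.dps = 40                  # 40-digit reference values

for x in [1e-1, 1e-2, 1e-3, 1e-4]:
    direct = math.sin(x) - x                       # direct formula: severe cancellation near 0
    series = -x**3/6 + x**5/120 - x**7/5040        # -x^3/3! + x^5/5! - x^7/7!
    exact = mpmath.sin(x) - mpmath.mpf(x)          # reference value
    err_d = float(abs((mpmath.mpf(direct) - exact)/exact))
    err_s = float(abs((mpmath.mpf(series) - exact)/exact))
    print(f'x={x:.0e}  direct rel err {err_d:.1e}  series rel err {err_s:.1e}')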
a Ackleh 2.10.1(a) [3 points]
b Ackleh 2.10.3 using Python. [(a) 3 points, (b) 3 points]
c Ackleh 2.10.4 (4th root of 1000) [1 point]
Python formatting hint (uses an "f-string"):
print(f'{1/300:+.8f} {1/300:.15e} and {777:4d}')
+0.00333333 3.333333333333334e-03 and 777
Due 11:59pm, Friday, September 19, 2025.
For each of the 6 functions, g, whose graphs are sketched below:
Consider the example g(x) = (x^2 + 6)/5 on G = [1, 2.3]. Show that g maps G into G and show that g is a contraction, either by brute force or using appropriate Propositions from Section 2.3.
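The question asks for a proof, but a quick numerical sanity check of the two claims could look like this (sampling g and g' on G):

import numpy as np
# sanity check (not a proof): does g map G = [1, 2.3] into G, and is |g'| < 1 on G?
g  = lambda x: (x**2 + 6)/5
gp = lambda x: 2*x/5                       # derivative of g
xs = np.linspace(1, 2.3, 1001)
print(g(xs).min(), g(xs).max())            # should lie inside [1, 2.3]
print(np.abs(gp(xs)).max())                # should be < 1 (a contraction constant)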
Transcribe the statement and proof of Proposition 2.3, p 42, explain the hypothesis that's in addition to g being a contraction, and explain and/or justify each step of the proof.
Due 11:59pm, Friday, September 26, 2025.
Prove that for a real n×n matrix, the matrix norm induced by the vector infinity-norm is the maximum row sum of absolute values.
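A numerical spot check (not a substitute for the proof) is to compare the maximum row sum of |A| with ||Ax||/||x|| in the infinity norm; the sign pattern of the worst row is the vector that attains the maximum.

import numpy as np
# spot check: max row sum of |A| vs. the induced infinity norm
np.random.seed(0)
A = np.random.randn(5, 5)
row_sum = np.abs(A).sum(axis=1).max()
i = np.abs(A).sum(axis=1).argmax()
x_star = np.sign(A[i])                     # sign pattern of the worst row; its infinity norm is 1
print(row_sum, np.abs(A @ x_star).max())   # these two should agree
X = np.random.randn(5, 100000)             # random vectors never exceed the row-sum bound
print((np.abs(A @ X).max(axis=0) / np.abs(X).max(axis=0)).max())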
The following code generates 2000 random linear systems and a random perturbation of each, and makes a picture. Run it and describe/show how the results are consistent or not with Theorem 3.10.
import numpy as np
import matplotlib.pyplot as plt

def vnorm(x):
    return np.abs(x).max()               # alternatively np.linalg.norm(x, np.inf)

def mnorm(a):
    return np.abs(a).sum(axis=1).max()

n = 5
plt.subplot(111, aspect=1)
for k in range(2000):
    a0 = np.random.randn(n, n)           # random base matrix
    a0inv = np.linalg.inv(a0)
    ka = mnorm(a0)*mnorm(a0inv)
    x0 = np.random.randn(n)              # solution of base system
    b0 = a0@x0                           # right hand side of base system
    delta = 1.e-6*np.random.rand()       # random magnitude of perturbation
    da = delta*np.random.randn(n, n)     # random perturbation of matrix
    db = delta*np.random.randn(n)        # random perturbation of right hand side
    a = a0 + da
    b = b0 + db
    x = np.linalg.solve(a, b)            # solve perturbed system using GEPP
    dx = x - x0
    lhs = vnorm(dx)/vnorm(x0)            # relative change in solution
    rhs = float(ka/(1 - mnorm(da)*mnorm(a0inv))*(vnorm(db)/vnorm(b) + mnorm(da)/mnorm(a)))  # bound on relative change from Theorem 3.10
    plt.loglog(rhs, lhs, 'o', ms=1, alpha=0.5)
plt.xlabel('RHS')
plt.ylabel('LHS');
Perform GE(PP) by hand on the matrix
[ 1  0  0  0  1 ]
[-1  1  0  0  1 ]
[-1 -1  1  0  1 ]
[-1 -1 -1  1  1 ]
[-1 -1 -1 -1  1 ]
demonstrating that the bound 2^(n−1) on the growth rate in GEPP can in fact occur. Show the intermediate result just for each "k", not for every single row operation.
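If you want to check your hand computation afterwards, scipy's LU with partial pivoting is one option (the growth shows up in the last column of U):

import numpy as np
from scipy.linalg import lu
# the classic worst-case matrix for GEPP growth; watch the last column of U double
A = np.array([[ 1,  0,  0,  0, 1],
              [-1,  1,  0,  0, 1],
              [-1, -1,  1,  0, 1],
              [-1, -1, -1,  1, 1],
              [-1, -1, -1, -1, 1]], dtype=float)
P, L, U = lu(A)
print(U)
print(np.abs(U).max() / np.abs(A).max())   # growth factor; compare with 2^(n-1) for n = 5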
The Hilbert matrices are a classic example of ill-conditioned matrices. As you know, the condition number of a matrix in any norm is the product of the norms of the matrix and of its inverse.
Although accurately computing the inverse of an ill-conditioned matrix is problematic in general, the Hilbert matrices have rational elements, hence so do their inverses, and since sympy can do rational arithmetic exactly, we can compute the inverses of the Hilbert matrices exactly. Here is code to construct the Hilbert matrices and their inverses with sympy:
# sympy for exact inverse of Hilbert matrix
import sympy as sp

# construct the Hilbert matrix of order n
a = sp.ones(n)
for i in range(n):
    for j in range(n):
        a[i,j] /= (1+i+j)
display(a)

ainv = a.inv()
ainv
and here is a way to compute the infinity norm (maximum row sum of absolute values) of a matrix using numpy (which has a nicer operation set than sympy):
np.abs(m).sum(axis=1).max()
(Numpy can handle m being a sympy Matrix.) Make sure you understand how the above line of code works.
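In case it helps, here is the same computation with each step spelled out on a small example (converting to a float numpy array first is one safe route when the input is a sympy Matrix):

import numpy as np
import sympy as sp
m = sp.Matrix([[1, -2], [3, 4]])        # a small example matrix (could also be a numpy array)
m_np = np.array(m, dtype=float)         # convert to a float numpy array
row_sums = np.abs(m_np).sum(axis=1)     # absolute values, then sum across each row (axis=1)
print(row_sums.max())                   # the largest row sum is the infinity norm (here 7.0)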
(c) For each value of n from 1 to 10, construct Hn and use numpy.linalg.solve() (GEPP) to solve 4000 randomly chosen linear systems Hn x = b; compute the largest relative error in your solutions and the error bound provided by Wilkinson's theorem assuming the worst possible growth rate, ρ, and plot both (n on the horizontal axis, log error on the vertical; a dot for the worst error in your 4000 trials, and a dot for the Wilkinson error bound). Comment on the results.
Due 11:59pm, Friday, October 10, 2025.
In solving a linear system Ax=b with finite-precision arithmetic, we have a (Wilkinson) bound on the norm of the error relative to the solution: ||error||/||x||. For example, in Homework 4, Question 4, I found that the relative error bound with εmach = 2^-52 for systems with the Hilbert matrix H3 is about 2 × 10^-11.
This does not mean that each component of the solution has this relative accuracy.
Provide an example of a pair of 3-vectors x and an approximation x̃, for which the norm of the difference relative to the norm of x is 1% while there is a 100% error in one of the components.
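Here is a sketch you could use to check a candidate pair; the two vectors below are placeholders, not an answer.

import numpy as np
# placeholders: replace x and xt with your own pair of 3-vectors
x  = np.array([1.0, 2.0, 3.0])
xt = np.array([1.0, 2.0, 3.1])
vnorm = lambda v: np.abs(v).max()          # infinity norm, as used elsewhere in the course
print(vnorm(xt - x) / vnorm(x))            # norm-relative error
print(np.abs(xt - x) / np.abs(x))          # componentwise relative errors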
(b) Below I have written a version of Householder QR that is more economical than the one I wrote in class by avoiding matrix-matrix multiplication to form each Hi in HQRstep. Obtain an exact count of the number of arithmetic operations used by this implementation of HQR applied to an m×n matrix with m ≥ n. Begin by obtaining a formula (you can use sympy to do the sum) for the operation count for HQRstep applied to an M×N matrix. (M and N will start as m and n and be successively smaller by 1 at each stage.)
Include in your answer the code annotated with the operation cost for each line.
Note that this is still not the optimal implementation of Householder QR, which would avoid ever forming Q, and simply provide the list of v's that determine Q.
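Once you have a per-stage count, sympy can do the summation over the stages for you; here is a sketch with a made-up placeholder cost (the real per-stage count is what you need to derive).

import sympy as sp
# placeholder per-stage cost: replace with your operation count for HQRstep on an M x N block
m, n, k = sp.symbols('m n k', positive=True, integer=True)
M, N = m - k, n - k                        # sizes shrink by 1 at each stage k = 0, 1, ..., n-1
stage_cost = 4*M*N + 3*M                   # <-- made-up example, NOT the real count
total = sp.expand(sp.summation(stage_cost, (k, 0, n - 1)))   # adjust the range to match the code
print(total)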
Due 11:59pm, Friday, October 24, 2025.
a Write down in Lagrange form, without simplifying, a polynomial that passes through the points (0,3), (2,3), (3,5), (6,4).
b Simplify by cleaning up the numerical coefficients, but do not multiply out the (x-x_i) terms.
c Re-express the polynomial in the monomial basis by expanding products and collecting terms.
d Plot the polynomial on the interval [0,6] and add dots at the data points. (One possible way to do this is sketched below, after part e.)
e Modify the code we wrote in class on Day 13 to use the Newton basis instead of the monomial basis, and graphically verify that the generated polynomial is the same. Suggestion: make the first curve dark-colored and thick (say linewidth=3) and the second one light-colored and thin.
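For part d, here is a sketch of one possible plotting approach (a degree-3 polyfit through exactly 4 points reproduces the interpolant, though you can equally well evaluate your Lagrange form directly):

import numpy as np
import matplotlib.pyplot as plt
# part d: plot the interpolating cubic through the four data points
xd = np.array([0, 2, 3, 6], dtype=float)
yd = np.array([3, 3, 5, 4], dtype=float)
c = np.polyfit(xd, yd, 3)                  # degree-3 fit through 4 points = the interpolant
xs = np.linspace(0, 6, 400)
plt.plot(xs, np.polyval(c, xs))
plt.plot(xd, yd, 'o')                      # dots at the data points
plt.show()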
Prove Thm 4.6. Please take only one logical step per line, and justify every step to the extent needed to make it clear to yourself. At the end, state which step required the most ingenuity, in your opinion.
a Find the interpolating quadratic polynomial to the data (-1,1/e), (0,1), (1,e), and use it to approximate √(e). Hint: we are interpolating f(x) = e^x, and √(e) = f(0.5).
b Use the interpolation error theorem to obtain a bound on the error in your estimate in part a.
c Using floating point arithmetic and np.exp() in Python, find the error in your estimate in part a, and comment on how it compares with the bound you obtained in part b.
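For part c, a check along these lines might be useful (polyfit through the three points reproduces the interpolating quadratic, but you can also evaluate the polynomial you found in part a directly):

import numpy as np
# part c: evaluate the interpolating quadratic at 0.5 and compare with sqrt(e)
xd = np.array([-1.0, 0.0, 1.0])
yd = np.exp(xd)                            # 1/e, 1, e
c = np.polyfit(xd, yd, 2)                  # the interpolating quadratic
approx = np.polyval(c, 0.5)
exact = np.exp(0.5)
print(approx, exact, abs(approx - exact))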
a If points x0 = −1, x1, ..., xn = 1 are uniformly spaced in the interval [-1,1], what is the maximum absolute value of the product (x − x0)(x − x1)...(x − xn) that appears in the polynomial interpolation error theorem, for n = 1,2,3,4,...,40? Analytically would be great, but it's ok to approximate it numerically by evaluating the max of the product at, say, linspace(-1,1,2000).
b For Chebyshev nodes x0, x1, ..., xn on the interval [-1,1], the maximum absolute value of the product (x − x0)(x − x1)...(x − xn) is 1/2^n. Augment your code in part a to compute the ratio of the max for uniform nodes to the max for Chebyshev nodes, and comment on how much better the Chebyshev node choice is than the uniform choice (as a function of n).
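A sketch of the numerical approach suggested in part a, extended with the ratio asked for in part b:

import numpy as np
# max of |(x - x0)(x - x1)...(x - xn)| on [-1,1] for uniform nodes, estimated on a
# fine grid as suggested, and the ratio to the Chebyshev value 1/2^n
xs = np.linspace(-1, 1, 2000)
for n in range(1, 41):
    nodes = np.linspace(-1, 1, n + 1)                      # n+1 uniformly spaced nodes
    prod = np.prod(xs[:, None] - nodes[None, :], axis=1)   # node polynomial at each grid point
    max_uniform = np.abs(prod).max()
    print(n, max_uniform, max_uniform / (1/2**n))          # ratio: uniform / Chebyshev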
a. Show that the error bound of Thm 4.13 (p241) for a piecewise linear interpolant is attained by any quadratic function on every subinterval of the interpolation partition, where the norm in the error bound is taken over just the subinterval.
How good is cubic spline interpolation (with "natural" end conditions) for Runge's function using 19 uniformly spaced points on [-1,1] (including the endpoints)? For comparison, the degree-19 polynomial interpolant using 19 uniformly spaced nodes is REALLY bad, as measured by the L∞-norm of the discrepancy, but using Chebyshev nodes is much better: see pictures below. Support your answer with numbers and/or plots generated using the code I demonstrated in class on Day 15.
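If you do not have the Day 15 code handy, scipy's CubicSpline with natural end conditions is one way to reproduce the experiment (a sketch, not necessarily the same as the class code):

import numpy as np
from scipy.interpolate import CubicSpline
# natural cubic spline through 19 uniformly spaced samples of Runge's function
f = lambda x: 1/(1 + 25*x**2)
xd = np.linspace(-1, 1, 19)
s = CubicSpline(xd, f(xd), bc_type='natural')
xs = np.linspace(-1, 1, 5000)
print(np.abs(s(xs) - f(xs)).max())         # estimate of the L-infinity discrepancy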
If we wanted to draw an approximate circle with Bézier cubics, we might try doing it in 4 pieces, approximating the quarter-circle {(cosθ, sinθ) | θ ∈ [0, π/2]} with a Bézier segment. In that case it seems that we'd have to have P0 = (1, 0) and P3 = (0, 1), and for the approximate circle not to have a corner at the segment ends, we'd need a vertical tangent at P0, and a horizontal tangent at P3. Thus P1 should be directly above P0: P1 = (1, something) and P2 directly to the right of P3: P2 = (something, 1). And for symmetry we'd also need those two somethings to be the same. Thus it seems like we have only one degree of freedom: the value of "something". By trial-and-error, or other means, find a good value. Explain how you came up with your answer, make a plot of your approximate circle, and give some measure of its deviation from a perfect circle.
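Here is a sketch of how one might test a candidate value of "something"; the value below is only a placeholder for you to replace.

import numpy as np
# test one candidate value of "something" for the quarter-circle Bezier segment
a = 0.5                                              # placeholder -- try your own values
P = np.array([[1, 0], [1, a], [a, 1], [0, 1]], dtype=float)   # P0, P1, P2, P3
t = np.linspace(0, 1, 1000)[:, None]
B = ((1-t)**3*P[0] + 3*(1-t)**2*t*P[1]               # cubic Bezier curve B(t)
     + 3*(1-t)*t**2*P[2] + t**3*P[3])
r = np.sqrt((B**2).sum(axis=1))                      # distance from the origin
print(np.abs(r - 1).max())                           # worst deviation from the unit circle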