Mock exam: solutions

U. Vaes

March 2020

Question 1. 1. We first calculate the normalization constant:

Z = ∫_{−∞}^{∞} e^{−λx} 1_{[0,α]}(x) dx = (1 − e^{−λα})/λ.

The cumulative distribution is given by

Gα,λ(x) = ∫_{−∞}^{x} gα,λ(y) dy = (1/Z) ∫_{−∞}^{x} e^{−λy} 1_{[0,α]}(y) dy = max(0, min(1, (1 − e^{−λx})/(1 − e^{−λα}))),

which is a compact way of writing

Gα,λ(x) =
    0,                             if x < 0;
    (1 − e^{−λx})/(1 − e^{−λα}),   if 0 ≤ x ≤ α;
    1,                             if x > α.

The generalized inverse of Gα,λ is given by

Fα,λ(u) = inf{x : Gα,λ(x) ≥ u} = −(1/λ) log(1 − u(1 − e^{−λα})).

To check our result, we can verify that Fα,λ(0) = 0 and Fα,λ(1) = α. Notice that Fα,λ coincides with Gα,λ^{−1} on (0, 1), which is how we obtained the expression for Fα,λ: for u ∈ (0, 1),

Gα,λ(x) = u ⇔ (1 − e^{−λx})/(1 − e^{−λα}) = u ⇔ x = −(1/λ) log(1 − u(1 − e^{−λα})).

If {Ui}_{i=1,2,...} is a stream of independent U(0, 1) random variables, then {Fα,λ(Ui)}_{i=1,2,...} is a stream of IID samples from gα,λ.
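As an illustration, the following is a minimal numpy sketch of this inverse transform sampler; the function name sample_g and the parameter values in the check are illustrative.

```python
import numpy as np

def sample_g(alpha, lam, n, seed=0):
    """IID samples from g_{alpha,lambda} via inverse transform sampling:
    x = F_{alpha,lambda}(u) = -(1/lambda) * log(1 - u * (1 - exp(-lambda * alpha)))."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)                            # stream of U(0, 1) variables
    return -np.log(1.0 - u * (1.0 - np.exp(-lam * alpha))) / lam

# Quick sanity check: all samples must lie in [0, alpha].
x = sample_g(alpha=1.0, lam=4.0, n=10_000)
assert x.min() >= 0.0 and x.max() <= 1.0
```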

2. Let us rewrite the expression of f(x) for convenience:

f(x) = 32 x(1 − x) e^{−4(x−1)}/(3 + e^4) · 1_{[0,1]}(x).

The derivative of f(x) in (0, 1) is given by

f′(x) = 32 [(1 − 2x) e^{−4(x−1)} − 4x(1 − x) e^{−4(x−1)}]/(3 + e^4),

which vanishes when

(1 − 2x) − 4x(1 − x) = 0 ⇔ 4x² − 6x + 1 = 0 ⇔ (2x − 3/2)² = 5/4,

i.e. when x = (3 ± √5)/4.

Of the two roots, only x₁ := (3 − √5)/4 lies in [0, 1]. To check that this root corresponds to a maximum, we can calculate the second derivative

f″(x) = −64 (8x² − 16x + 5) e^{−4(x−1)}/(3 + e^4) = −512 [(x − 1)² − 3/8] e^{−4(x−1)}/(3 + e^4),

which is negative at x₁, so we conclude

arg max_{x∈[0,1]} f(x)/h(x) = arg max_{x∈[0,1]} f(x) = x₁.

This implies that the best (i.e. the smallest, since the acceptance probability/rate is given by 1/M and we want to maximize this rate) constant for rejection sampling is given by

M₁ := inf{M : f(x) ≤ M h(x) ∀x ∈ [0, 1]} = inf{M : f(x)/h(x) ≤ M ∀x ∈ [0, 1]} = inf{M : f(x₁)/h(x₁) ≤ M} = f(x₁)/h(x₁) = 4(√5 − 1) e^{1+√5}/(3 + e^4).

The rejection sampler works as follows:

• Generate X ∼ h, i.e. here X ∼ U(0, 1), and U ∼ U(0, 1).
• If U ≤ f(X)/(M₁ h(X)), accept X and stop (or return to the first step to generate other samples).
• Else, reject X and return to the first step.

Suppose now that we use g₁,₄ instead of h. We calculate

f(x)/g₁,₄(x) = [32 x(1 − x) e^{−4(x−1)}/(3 + e^4)] / [4 e^{−4x}/(1 − e^{−4})] = 8 (1 − e^{−4})/(3 + e^4) · x(1 − x) e^{−4(x−1)} e^{4x} = 8 (e^4 − 1)/(e^4 + 3) · x(1 − x),

which is maximized at x₂ = 1/2. Given that

M₁ = f(x₁)/h(x₁) > f(x₂)/g₁,₄(x₂) =: M₂,

rejection sampling is more efficient using g₁,₄: if the constant M in the rejection sampling algorithm is chosen optimally in both cases, using g₁,₄ leads to a higher acceptance probability.
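As an illustration, here is a minimal numpy sketch of both rejection samplers, computing the optimal constants M₁ and M₂ numerically; the function names and the sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Target density f on [0, 1]."""
    return 32.0 * x * (1.0 - x) * np.exp(-4.0 * (x - 1.0)) / (3.0 + np.exp(4.0))

def g14(x):
    """Proposal density g_{1,4}(x) = 4 exp(-4x) / (1 - exp(-4)) on [0, 1]."""
    return 4.0 * np.exp(-4.0 * x) / (1.0 - np.exp(-4.0))

def sample_g14():
    """Inverse transform sampling for g_{1,4} (Question 1.1 with alpha = 1, lambda = 4)."""
    u = rng.uniform()
    return -np.log(1.0 - u * (1.0 - np.exp(-4.0))) / 4.0

x1 = (3.0 - np.sqrt(5.0)) / 4.0     # maximizer of f/h, where h = 1 on [0, 1]
M1 = f(x1)                          # optimal constant with proposal h = U(0, 1)
M2 = f(0.5) / g14(0.5)              # optimal constant with proposal g_{1,4}, maximizer x2 = 1/2

def rejection_sample(n, propose, density, M):
    """Accept X ~ proposal whenever U <= f(X) / (M * density(X)), U ~ U(0, 1)."""
    out = []
    while len(out) < n:
        x = propose()
        if rng.uniform() <= f(x) / (M * density(x)):
            out.append(x)
    return np.array(out)

samples_h = rejection_sample(1_000, lambda: rng.uniform(), lambda x: 1.0, M1)
samples_g = rejection_sample(1_000, sample_g14, g14, M2)
# Since f and both proposals are normalized, the acceptance rate is 1/M.
print(f"M1 = {M1:.3f}, acceptance rate 1/M1 = {1/M1:.3f}")
print(f"M2 = {M2:.3f}, acceptance rate 1/M2 = {1/M2:.3f}")
```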

Question 2. 1. The MH algorithm is given in the lecture notes. The proposal density associated with the proposal y = √(1 − β²) x + β w, w ∼ N(0, 1), is given by

q(y|x) = (1/√(2πβ²)) exp(−(y − √(1 − β²) x)²/(2β²)).
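As an illustration of part 1, here is a minimal numpy sketch of a Metropolis-Hastings sampler with this proposal, using the standard MH acceptance ratio min(1, π(y)q(x|y)/(π(x)q(y|x))); the target density π and the parameter values below are illustrative assumptions, not taken from the exam.

```python
import numpy as np

def mh_sampler(pi, beta, n_steps, x0=0.0, seed=0):
    """Metropolis-Hastings with proposal y = sqrt(1 - beta^2) * x + beta * w, w ~ N(0, 1),
    i.e. q(y|x) = N(y; sqrt(1 - beta^2) * x, beta^2), and the standard MH acceptance ratio."""
    rng = np.random.default_rng(seed)
    a = np.sqrt(1.0 - beta**2)
    log_q = lambda y, x: -(y - a * x) ** 2 / (2.0 * beta**2)   # log q(y|x) up to a constant
    x = x0
    chain = np.empty(n_steps)
    for n in range(n_steps):
        y = a * x + beta * rng.standard_normal()               # propose
        log_ratio = np.log(pi(y)) - np.log(pi(x)) + log_q(x, y) - log_q(y, x)
        if np.log(rng.uniform()) < log_ratio:                  # accept with probability min(1, ratio)
            x = y
        chain[n] = x                                           # on rejection, keep the current state
    return chain

# Illustrative (unnormalized) target: pi(x) proportional to exp(-x^4 / 4).
pi = lambda x: np.exp(-x**4 / 4.0)
samples = mh_sampler(pi, beta=0.5, n_steps=50_000)
print(samples.mean(), samples.var())   # the sample mean should be close to 0 by symmetry
```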

We first calculate the transition function p(x, y) = “P[X_{n+1} = y | X_n = x]”. We add quotation marks here because, in fact, the transition probability measure is not absolutely continuous with respect to the Lebesgue measure, so it does not really make sense to consider its density. Strictly speaking, we should view p(x, ·) as a probability measure on R. Employing the same reasoning as in the lecture notes, we obtain that the probability that a proposal from x is accepted is given by

∫_R q(y|x) α(x, y) dy.

Therefore, we find that for any set B ∈ B(R), where B(R) denotes the Borel σ-algebra on R,

p(x, B) = ∫_B q(y|x) α(x, y) dy + ( 1 − ∫_R q(y|x) α(x, y) dy ) δx(B),    (1)

where δx is a Dirac measure. Another suitable notation to write this is

p(x, ·) = q(·|x) α(x, ·) + ( 1 − ∫_R q(y|x) α(x, y) dy ) δx.

For this equation to make sense as an equality of measures, we interpret the first term on the right-hand side as the measure induced by the function q(·|x) α(x, ·). Indeed, remember that any function f ∈ L¹(R) (or even L¹_loc, but don't worry if you haven't seen this notation before) induces a measure μ_f by

μ_f(B) = ∫_B f(x) dx.

Let us emphasize that these comments are mostly for your information; as mentioned in the revision class, the MCMC question at the exam will focus on discrete state spaces, as do the lecture notes for the most part. A distribution π is reversible for the Markov chain if

∫_A π(x) p(x, B) dx = ∫_B π(y) p(y, A) dy   ∀A, B ∈ B(R).

Employing the expression of p(x, ·) that we found in (1), the left-hand side is

LHS = ∫_A ∫_B π(x) q(y|x) α(x, y) dy dx + ∫_A π(x) ( 1 − ∫_R q(y|x) α(x, y) dy ) δx(B) dx

= ∫_A ∫_B π(x) q(y|x) π(y)/(π(x) + π(y)) dy dx + ∫_{A∩B} ( 1 − ∫_R q(y|x) π(y)/(π(x) + π(y)) dy ) π(x) dx

= ∫_A ∫_B q(y|x) π(x)π(y)/(π(x) + π(y)) dy dx + ∫_{A∩B} π(x) dx − ∫_{A∩B} ∫_R q(y|x) π(x)π(y)/(π(x) + π(y)) dy dx.

Since q(x|y) = q(y|x), this expression is invariant upon swapping A and B, and so (easy to check) the RHS can be developed similarly to obtain the same expression.

Question 3. 1. A continuous time Gaussian process is defined in Definition 4.8 of the lecture notes.

2. Strict and weak stationarity are defined in Definitions 4.9 and 4.10, respectively.
3. If {Xt} is a Gaussian process with mean function m(t) and covariance function C(s, t), then

(X_{t_1}, ..., X_{t_N})^T ∼ N(m, Σ), where m = (m(t_1), ..., m(t_N))^T and Σ = (C(t_i, t_j))_{i,j=1,...,N} can be calculated explicitly. We can generate a N(m, Σ) random vector from a vector Y = (Y_1, ..., Y_N)^T of IID N(0, 1) random variables by using Lemma 2.4 in the lecture notes. If C denotes a solution of CC^T = Σ, which can be obtained e.g. by Cholesky decomposition, it holds that X = m + CY ∼ N(m, Σ).
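A minimal numpy sketch of this construction; the function names, the diagonal jitter, and the Brownian-motion example (mean 0, C(s, t) = min(s, t)) are illustrative assumptions.

```python
import numpy as np

def sample_gaussian_process(mean_fn, cov_fn, t, n_paths=1, seed=0):
    """Sample X = m + L Y with L L^T = Sigma and Y a vector of IID N(0, 1) variables,
    where m_i = mean_fn(t_i) and Sigma_{ij} = cov_fn(t_i, t_j)."""
    rng = np.random.default_rng(seed)
    t = np.asarray(t)
    m = mean_fn(t)                                           # mean vector
    Sigma = cov_fn(t[:, None], t[None, :])                   # covariance matrix
    L = np.linalg.cholesky(Sigma + 1e-12 * np.eye(len(t)))   # jitter helps if Sigma is only PSD
    Y = rng.standard_normal((len(t), n_paths))               # IID N(0, 1) entries
    return (m[:, None] + L @ Y).T                            # one sampled path per row

# Illustrative example: Brownian motion, mean 0 and covariance C(s, t) = min(s, t).
t = np.linspace(0.01, 1.0, 200)
paths = sample_gaussian_process(lambda s: np.zeros_like(s), np.minimum, t, n_paths=3)
```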

4. Suppose that

(Z₁, Z₂)^T ∼ N( (m₁, m₂)^T, ( σ₁₁ σ₁₂ ; σ₂₁ σ₂₂ ) ) =: N(m, Σ).

The distribution of Z₂ conditional on Z₁ is given by

f_{Z₂|Z₁}(z₂|z₁) = f_{Z₁,Z₂}(z₁, z₂) / ∫_R f_{Z₁,Z₂}(z₁, z₂) dz₂ = (1/Z(z₁)) exp( −(1/2) (z₁ − m₁, z₂ − m₂) Σ⁻¹ (z₁ − m₁, z₂ − m₂)^T ),
where Z(z₁) is the normalization constant. Using the Schur complement or Cramer's formula, we calculate

Σ⁻¹ = ( σ₁₁ σ₁₂ ; σ₁₂ σ₂₂ )⁻¹ = 1/(σ₁₁σ₂₂ − σ₁₂²) ( σ₂₂ −σ₁₂ ; −σ₁₂ σ₁₁ ),
so

(z − m)^T Σ⁻¹ (z − m) = 1/(σ₁₁σ₂₂ − σ₁₂²) (z₁ − m₁, z₂ − m₂) ( σ₂₂ −σ₁₂ ; −σ₁₂ σ₁₁ ) (z₁ − m₁, z₂ − m₂)^T

= 1/(σ₁₁σ₂₂ − σ₁₂²) (σ₁₁ z₂² − 2σ₁₁ m₂ z₂ − 2σ₁₂ z₂ (z₁ − m₁)) + C(z₁)

= σ₁₁/(σ₁₁σ₂₂ − σ₁₂²) ( z₂ − (m₂ + (σ₁₂/σ₁₁)(z₁ − m₁)) )² + C(z₁),

where C(z₁) is a (changing) constant independent of z₂. This shows that

f_{Z₂|Z₁}(z₂|z₁) ∝ exp( −(z₂ − μ(z₁))² / (2γ(z₁)) ),
where μ(z₁) = m₂ + (σ₁₂/σ₁₁)(z₁ − m₁) and γ(z₁) = (σ₁₁σ₂₂ − σ₁₂²)/σ₁₁. Suppose now that we have already generated X_{t_1}, X_{t_2}, ..., X_{t_n} and that {Xt} is a Markov process. Since {Xt} is a Markov process,

E[X_{t_{n+1}} | X_{t_1}, ..., X_{t_n}] = E[X_{t_{n+1}} | X_{t_n}],

and more generally the law of X_{t_{n+1}} conditional on (X_{t_1}, ..., X_{t_n}) depends only on X_{t_n}, so X_{t_{n+1}} can be drawn using the bivariate conditional formula above with Z₁ = X_{t_n} and Z₂ = X_{t_{n+1}}.
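As an illustration, here is a minimal numpy sketch that uses these conditional formulas to sample a Markov Gaussian process sequentially; the function name and the Brownian-motion example (mean 0, C(s, t) = min(s, t)) are illustrative assumptions, not part of the exam.

```python
import numpy as np

def sample_markov_gp(mean_fn, cov_fn, t, seed=0):
    """Sequentially sample a Markov Gaussian process at times t using the conditional
    N(mu(z1), gamma) distribution derived above, with Z1 = X_{t_n} and Z2 = X_{t_{n+1}}."""
    rng = np.random.default_rng(seed)
    t = np.asarray(t)
    x = np.empty(len(t))
    # First value drawn from its marginal N(m(t_0), C(t_0, t_0)).
    x[0] = mean_fn(t[0]) + np.sqrt(cov_fn(t[0], t[0])) * rng.standard_normal()
    for n in range(len(t) - 1):
        m1, m2 = mean_fn(t[n]), mean_fn(t[n + 1])
        s11 = cov_fn(t[n], t[n])
        s12 = cov_fn(t[n], t[n + 1])
        s22 = cov_fn(t[n + 1], t[n + 1])
        mu = m2 + (s12 / s11) * (x[n] - m1)        # mu(z1) = m2 + (s12/s11)(z1 - m1)
        gamma = (s11 * s22 - s12**2) / s11         # gamma = (s11 s22 - s12^2) / s11
        x[n + 1] = mu + np.sqrt(gamma) * rng.standard_normal()
    return x

# Illustrative example: Brownian motion (a Markov Gaussian process), C(s, t) = min(s, t).
t = np.linspace(0.01, 1.0, 200)
path = sample_markov_gp(lambda s: 0.0, lambda s, u: min(s, u), t)
```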

the one used in class. Letting Yt = |Xt|² and using Itô's formula, we have

dYt = “2Xt dXt” + |σXt|² dt = 2λ|Xt|² dt + 2σ|Xt|² dWt + σ²|Xt|² dt = (2λ + σ²)Yt dt + 2σ Yt dWt.

In integral form, this is

Yt − Y₀ = ∫₀ᵗ (2λ + σ²) Ys ds + ∫₀ᵗ 2σ Ys dWs.

Letting f(t) = E[Yt], taking expectations (the stochastic integral has zero mean) and differentiating the previous equation, we obtain a differential equation for f:

f′(t) = (2λ + σ²) f(t) ⇒ f(t) = f(0) e^{(2λ+σ²)t}.

In order for this function to converge to 0 as t → ∞, it is necessary and sufficient that

2λ + σ² < 0.
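A minimal Euler-Maruyama check of this moment formula for the scalar equation dXt = λXt dt + σXt dWt, as in the calculation above (a sketch; the parameter values are illustrative):

```python
import numpy as np

# Monte Carlo check of E|X_t|^2 = E|X_0|^2 * exp((2*lam + sigma^2) * t) for
# dX_t = lam * X_t dt + sigma * X_t dW_t, using the Euler-Maruyama scheme.
rng = np.random.default_rng(0)
lam, sigma = -1.5, 1.0                 # 2*lam + sigma^2 = -2 < 0, so E|X_t|^2 -> 0
dt, n_steps, n_paths = 1e-3, 1000, 100_000
x = np.ones(n_paths)                   # X_0 = 1
for _ in range(n_steps):
    dw = np.sqrt(dt) * rng.standard_normal(n_paths)
    x += lam * x * dt + sigma * x * dw
t = n_steps * dt
# The two numbers below should agree up to Monte Carlo and discretization error.
print(np.mean(x**2), np.exp((2 * lam + sigma**2) * t))
```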

The update formula for the θ Euler method reads, in the case of scalar gBM,

(1 − λθ∆t) X_{n+1} = X_n (1 + λ(1 − θ)∆t + σ√∆t ξ),

that is, assuming 1 − λθ∆t ≠ 0,

X_{n+1} = X_n (1 + λ(1 − θ)∆t + σ√∆t ξ) / (1 − λθ∆t).

Therefore

E|X_{n+1}|² = E|X_n|² E|(1 + λ(1 − θ)∆t + σ√∆t ξ)/(1 − λθ∆t)|² = E|X_n|² (|1 + λ(1 − θ)∆t|² + σ²∆t) / |1 − λθ∆t|².

The discrete time approximation is mean-square stable if and only if

(|1 + λ(1 − θ)∆t|² + σ²∆t) / |1 − λθ∆t|² < 1
⇔ |1 + λ(1 − θ)∆t|² − |1 − λθ∆t|² + σ²∆t < 0
⇔ ∆t (2λ + σ² + ∆t(1 − 2θ)λ²) < 0
⇔ 2λ + σ² + ∆t(1 − 2θ)λ² < 0.

When θ = 1/2, this condition becomes 2λ + σ² < 0, which is the mean-square stability condition for the underlying equation: the stability region of the numerical solution coincides with that of the continuous solution.
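A small numerical check of this stability condition for the θ method applied to scalar gBM (a sketch; λ, σ, θ, and the step sizes below are illustrative):

```python
import numpy as np

def ms_growth_factor(lam, sigma, dt, theta):
    """Per-step growth factor of E|X_n|^2 for the theta method applied to gBM:
    (|1 + lam*(1-theta)*dt|^2 + sigma^2*dt) / |1 - lam*theta*dt|^2."""
    return (abs(1.0 + lam * (1.0 - theta) * dt) ** 2 + sigma**2 * dt) / abs(1.0 - lam * theta * dt) ** 2

lam, sigma = -3.0, 1.0      # 2*lam + sigma^2 = -5 < 0: the SDE itself is mean-square stable
for theta in (0.0, 0.5, 1.0):
    for dt in (0.1, 0.5, 1.0):
        stable = ms_growth_factor(lam, sigma, dt, theta) < 1.0
        condition = 2.0 * lam + sigma**2 + dt * (1.0 - 2.0 * theta) * lam**2 < 0.0
        # The two booleans should always agree, in line with the equivalence above.
        print(f"theta={theta}, dt={dt}: factor<1 is {stable}, condition holds is {condition}")
```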