A question about Exercise 3.8

Exercise 3.8 of Reinforcement Learning: An Introduction (2nd Edition) reads as follows:

Suppose $\gamma=0.5$ and the following sequence of rewards is received: $R_1=-1, R_2=2, R_3=6, R_4=3,$ and $R_5=2$, with $T=5$. What are $G_0, G_1, \dots, G_5$? Hint: Work backwards.

An answer circulating online, said to come from Sutton himself (unverified), reads:

$G_0=2,G_1=3,G_2=2,G_3=\frac{1}{2},G_4=\frac{1}{8},G_5=0$

There is no doubt that $G_5=0$. But from the recursive relation between the return and the reward:

$G_t=R_{t+1}+\gamma G_{t+1}$

it is easy to get:

$G_4=R_5+\gamma G_5=2+0=2$

Similarly, $G_3=4, G_2=8, G_1=6, G_0=2$, which is very different from the answer attributed to Sutton!
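As a sanity check on the recursion, here is a minimal Python sketch that works backwards from $G_T=0$. The function name `returns_backward` is my own, and the reward list assumes $R_1=-1$, which is the value that reproduces $G_0=2$ above.

```python
def returns_backward(rewards, gamma):
    """Return [G_0, ..., G_T] for rewards = [R_1, ..., R_T]."""
    T = len(rewards)
    G = [0.0] * (T + 1)  # G[T] stays 0: no reward follows the terminal step
    for t in range(T - 1, -1, -1):
        # rewards[t] holds R_{t+1} because the list is 0-indexed
        G[t] = rewards[t] + gamma * G[t + 1]
    return G


# Assumption: R_1 = -1 (the value consistent with G_0 = 2)
rewards = [-1, 2, 6, 3, 2]
print(returns_backward(rewards, 0.5))
# prints [2.0, 6.0, 8.0, 4.0, 2.0, 0.0]
```

The printed list is ordered $G_0$ through $G_5$, so it matches the values I worked out by hand.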

Using the forward formula for the return:

$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \gamma^3 R_{t+4} + \dots + \gamma^{T-t-1} R_T$

gives the same result.
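The forward computation can be checked the same way. This is a rough sketch, assuming the sum runs up to the final reward $R_T$ and that $R_1=-1$; `return_forward` is a name I made up for illustration.

```python
def return_forward(rewards, gamma, t):
    """G_t = sum over k = t+1..T of gamma^(k-t-1) * R_k, rewards = [R_1, ..., R_T]."""
    T = len(rewards)
    # rewards[k - 1] holds R_k; the sum is empty (0.0) when t == T
    return sum(
        (gamma ** (k - t - 1) * rewards[k - 1] for k in range(t + 1, T + 1)),
        0.0,
    )


# Assumption: R_1 = -1 (the value consistent with G_0 = 2)
rewards = [-1, 2, 6, 3, 2]
print([return_forward(rewards, 0.5, t) for t in range(len(rewards) + 1)])
# prints [2.0, 6.0, 8.0, 4.0, 2.0, 0.0]
```

For example, $G_3 = R_4 + \gamma R_5 = 3 + 0.5 \times 2 = 4$, which is the fourth entry of the printed list.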

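Have I misunderstood something, or is the answer attributed to Sutton wrong? I would appreciate any pointers from fellow learners who come across this. Thanks!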