Consider the following gridworld MDP. The states are grid squares, identified by their row and column number (row first). The agent always starts in state (1,1), marked with the letter S. There are two terminal goal states: (2,3) with reward +5 and (1,3) with reward -5. Rewards are 0 in non-terminal states. (The reward for a state is received as the agent moves into the state.)

The transition function is such that the intended agent movement (Up, Down, Left, or Right) happens with probability 0.8. With probability 0.1 each, the agent ends up in one of the states perpendicular to the intended direction. If a collision with a wall happens, the agent stays in the same state.

Row 2:   .   .   +5
Row 1:   S   .   -5

Which of the following is the optimal policy for this grid? Each option lists the action for row 2 on the first line and for row 1 on the second.
A.  Right  Right  +5
    Up     Left   -5

B.  Down   Left   +5
    Right  Up     -5

C.  Right  Down   +5
    Up     Right  -5

D.  Right  Right  +5
    Right  Right  -5
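
One way to check the options is to run value iteration on the grid. The sketch below is a minimal illustration, not an official solution. It makes two assumptions the question does not state: a discount factor of 0.9 (the name `GAMMA` and all function names are made up for this example), and that row 2 sits above row 1, so Up increases the row number.

```python
# Value iteration for the 2x3 gridworld in the question.
# ASSUMPTIONS (not given in the question): discount GAMMA = 0.9,
# and "Up" means moving from row 1 to row 2.
GAMMA = 0.9
ROWS, COLS = 2, 3
TERMINAL = {(2, 3): 5.0, (1, 3): -5.0}  # reward received on entering the state
MOVES = {"Up": (1, 0), "Down": (-1, 0), "Left": (0, -1), "Right": (0, 1)}
PERPENDICULAR = {
    "Up": ("Left", "Right"), "Down": ("Left", "Right"),
    "Left": ("Up", "Down"), "Right": ("Up", "Down"),
}


def step(state, direction):
    """Move one square; bumping into a wall leaves the agent in place."""
    row, col = state
    d_row, d_col = MOVES[direction]
    new_row, new_col = row + d_row, col + d_col
    if 1 <= new_row <= ROWS and 1 <= new_col <= COLS:
        return (new_row, new_col)
    return state


def q_value(state, action, values):
    """Expected return of taking `action` in `state` under `values`."""
    outcomes = [(0.8, action)] + [(0.1, d) for d in PERPENDICULAR[action]]
    total = 0.0
    for prob, direction in outcomes:
        nxt = step(state, direction)
        reward = TERMINAL.get(nxt, 0.0)
        total += prob * (reward + GAMMA * values[nxt])
    return total


def value_iteration(tol=1e-9):
    """Return (values, greedy policy) for the non-terminal states."""
    states = [(r, c) for r in range(1, ROWS + 1) for c in range(1, COLS + 1)]
    values = {s: 0.0 for s in states}  # terminal states keep value 0
    while True:
        delta = 0.0
        for s in states:
            if s in TERMINAL:
                continue
            best = max(q_value(s, a, values) for a in MOVES)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {
        s: max(MOVES, key=lambda a: q_value(s, a, values))
        for s in states if s not in TERMINAL
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    for state in sorted(policy):
        print(state, policy[state], round(values[state], 3))
```

Under these assumptions the sketch selects Up at (1,1), Left at (1,2), and Right at both (2,1) and (2,2), which corresponds to option A. Moving Left from (1,2) avoids the 0.1 chance of slipping into the -5 state that moving Up would carry. A different discount factor could change the values, so treat the result as tied to the assumed `GAMMA`.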
