Consider the following gridworld MDP. The states are grid squares, identified by their row and column number (row first). The agent always starts in state (1,1), marked with the letter S. There are two terminal goal states, (2,3) with reward 5 and (1,3) with reward -5. Rewards are 0 in non-terminal states. (The reward for a state is received as the agent moves into the state.) The transition function is such that the intended agent movement (Up, Down, Left, or Right) happens with probability 0.8. With probability 0.1 each, the agent ends up in one of the states perpendicular to the intended direction. If a collision with a wall happens, the agent stays in the same state.

Grid layout (row 2 on top, row 1 on the bottom; a dot marks an empty square):
.      .      +5
S      .      -5
Which of the following is the optimal policy for this grid?
A. Right  Right  +5
   Up     Left   -5
B. Down   Left   +5
   Right  Up     -5
C. Right  Down   +5
   Up     Right  -5
D. Right  Right  +5
   Right  Right  -5
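
One way to check the options is to run value iteration on the MDP described in the question. The following is a minimal sketch, not a definitive solution. It makes two assumptions that the question does not state: a discount factor `gamma` (0.9 here, chosen only for illustration), and that "Up" means moving from row 1 to row 2, as the grid drawing suggests. The function names `step`, `transitions`, `q_value`, and `value_iteration` are hypothetical helpers introduced for this example.

```python
# Value iteration for the 2x3 gridworld in the question.
# ASSUMPTIONS (not given in the question): discount gamma, and "Up" = row + 1.
ROWS, COLS = 2, 3
TERMINALS = {(2, 3): 5.0, (1, 3): -5.0}  # reward received on entering the state
MOVES = {"Up": (1, 0), "Down": (-1, 0), "Left": (0, -1), "Right": (0, 1)}
PERPENDICULAR = {
    "Up": ("Left", "Right"), "Down": ("Left", "Right"),
    "Left": ("Up", "Down"), "Right": ("Up", "Down"),
}


def step(state, direction):
    """Deterministic move; a collision with a wall leaves the agent in place."""
    r, c = state
    dr, dc = MOVES[direction]
    nr, nc = r + dr, c + dc
    if 1 <= nr <= ROWS and 1 <= nc <= COLS:
        return (nr, nc)
    return state


def transitions(state, action):
    """Return {next_state: probability}: 0.8 intended, 0.1 per perpendicular side."""
    side_a, side_b = PERPENDICULAR[action]
    probs = {}
    for direction, p in ((action, 0.8), (side_a, 0.1), (side_b, 0.1)):
        nxt = step(state, direction)
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs


def q_value(state, action, values, gamma):
    """Expected return of taking `action` in `state`, then following `values`."""
    total = 0.0
    for nxt, p in transitions(state, action).items():
        if nxt in TERMINALS:
            total += p * TERMINALS[nxt]      # episode ends, collect the reward
        else:
            total += p * gamma * values[nxt]  # reward 0, discounted future value
    return total


def value_iteration(gamma=0.9, tol=1e-10, max_iters=100_000):
    states = [(r, c) for r in range(1, ROWS + 1) for c in range(1, COLS + 1)
              if (r, c) not in TERMINALS]
    values = {s: 0.0 for s in states}
    for _ in range(max_iters):
        new_values = {s: max(q_value(s, a, values, gamma) for a in MOVES)
                      for s in states}
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(MOVES, key=lambda a: q_value(s, a, values, gamma))
              for s in states}
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration(gamma=0.9)
    for row in (2, 1):  # print row 2 first so the output matches the drawing
        print([policy.get((row, col), TERMINALS.get((row, col)))
               for col in range(1, COLS + 1)])
```

The printed grid can be compared against options A to D directly. Because the question gives no discount factor, it is worth rerunning with other `gamma` values to see whether the greedy action in any square changes.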
