In [2]:
import gym
import numpy as np

class SailingEnv(gym.Env):
    def __init__(self):
        self.action_space = gym.spaces.Discrete(3)
        self.observation_space = gym.spaces.Box(low=0, high=10, shape=(2,), dtype=np.int64)
        self.wind_direction = np.array([0, 1])
        self.position = np.array([0, 0])
        self.rudder_angle = 0
        self.target = np.array([8.0, 9.0])

    def reset(self):
        self.wind_direction = np.array([0, 1])
        self.position = np.array([0, 0])
        self.rudder_angle = 0
        return self.position

    def step(self, action):
        # Define a vector of rudder angles for each possible action
        rudder_angles = np.array([-1, 0, 1])
    
        # Set the rudder angle based on the action
        self.rudder_angle = rudder_angles[action]
        
        wind_angle = np.arctan2(self.wind_direction[1], self.wind_direction[0])
        boat_velocity = self.wind_direction - 5 * np.array([np.sin(wind_angle + np.deg2rad(self.rudder_angle)), np.cos(wind_angle + np.deg2rad(self.rudder_angle))])
        relative_velocity = boat_velocity - self.wind_direction
        boat_direction = np.arctan2(relative_velocity[1], relative_velocity[0])
        heading_angle = boat_direction + np.deg2rad(self.rudder_angle)
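        # Rebind rather than update in place, so observations returned earlier are not mutated
        self.position = self.position + boat_velocity.astype(int)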
        self.wind_direction = np.array([0, 1], dtype=float)

        reward = np.cos(heading_angle - wind_angle)
        done = self.position[0] >= 10 or self.position[1] >= 10 or self.position[0] < 0 or self.position[1] < 0
        return self.position, reward, done, {}

env = SailingEnv()

# Q-learning algorithm
num_episodes = 100000
alpha = 0.5
gamma = 0.9
epsilon = 0.1
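# q_table has 10 x 10 x 3 cells: one per grid position and action
q_table = np.random.uniform(low=-2, high=0, size=([10,10] + [env.action_space.n]))
# NOTE: the three constants below are never read; the loop uses alpha, gamma, and epsilon above
LEARNING_RATE = 0.1
DISCOUNT = 0.95
EPSILON=0.9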

for i in range(num_episodes):
    done = False
    episode_reward = 0
    observation = env.reset()
    
    while not done:
        if np.random.random() < epsilon:
            # take a random action with probability epsilon
            action = env.action_space.sample()
        else:
            # choose the action with the highest Q-value
            action = np.argmax(q_table[observation[0], observation[1], :])
        
        next_observation, reward, done, _ = env.step(action)

        # update Q-value using Q-learning update rule
        if done:
            # terminal positions lie outside the 10x10 table, so do not bootstrap from them
            target = reward
        else:
            target = reward + gamma * np.max(q_table[next_observation[0], next_observation[1], :])
        q_table[observation[0], observation[1], action] += alpha * (target - q_table[observation[0], observation[1], action])
        observation = next_observation
        episode_reward += reward
        
    # Output the information for the final step of the episode
    print(f" New Pos={next_observation}, RWD={reward}, EP_REW={episode_reward}, Done={done}")

    # Every 100 episodes, print a progress summary
    if (i + 1) % 100 == 0:
        print(f"Episode {i+1}/{num_episodes}, Reward: {episode_reward}")
    


 New Pos=[-4  1], RWD=6.123233995736766e-17, EP_REW=6.123233995736766e-17, Done=True
 New Pos=[-4  1], RWD=6.123233995736766e-17, EP_REW=6.123233995736766e-17, Done=True
 New Pos=[-4  1], RWD=6.123233995736766e-17, EP_REW=6.123233995736766e-17, Done=True
 New Pos=[-4  1], RWD=6.123233995736766e-17, EP_REW=6.123233995736766e-17, Done=True
 New Pos=[-4  1], RWD=6.123233995736766e-17, EP_REW=6.123233995736766e-17, Done=True
 New Pos=[-4  1], RWD=6.123233995736766e-17, EP_REW=6.123233995736766e-17, Done=True
 New Pos=[-4  1], RWD=6.123233995736766e-17, EP_REW=6.123233995736766e-17, Done=True
 New Pos=[-4  1], RWD=6.123233995736766e-17, EP_REW=6.123233995736766e-17, Done=True
 New Pos=[-4  1], RWD=6.123233995736766e-17, EP_REW=6.123233995736766e-17, Done=True
 New Pos=[-4  1], RWD=6.123233995736766e-17, EP_REW=6.123233995736766e-17, Done=True
 New Pos=[-4  1], RWD=6.123233995736766e-17, EP_REW=6.123233995736766e-17, Done=True
 New Pos=[-4  1], RWD=6.123233995736766e-17, EP_REW=6.12323399573

In the line of code wind_angle = np.arctan2(self.wind_direction[1], self.wind_direction[0]), the arctan2 function is used to compute the angle between the self.wind_direction vector and the x-axis, which represents the angle of the wind direction relative to the horizontal axis.

The arctan2 function takes two arguments, y first and then x, and returns the angle between the positive x-axis and the ray from the origin to the point (x, y). Note that the y coordinate is the first argument and the x coordinate is the second, which is easy to get backwards. Unlike np.arctan, which takes a single value (the ratio y/x) and so cannot tell opposite quadrants apart, arctan2 uses the signs of both arguments to pick the correct quadrant.

In this case, self.wind_direction[1] represents the vertical component of the wind direction vector, and self.wind_direction[0] represents the horizontal component of the wind direction vector. By passing these components as arguments to arctan2, we can compute the angle between the wind direction and the x-axis.

The resulting wind_angle variable is the angle between the wind direction and the x-axis, in radians. The arctan2 function returns values in the range [-π, π], which is equivalent to the range [-180°, 180°] when converted to degrees. For the wind vector [0, 1] used in this environment, wind_angle is π/2 (90°).
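To make the argument order and the quadrant handling concrete, here is a small sketch. The vectors other than [0, 1] are illustrative and do not come from the notebook.

```python
import numpy as np

# Wind vector from the environment: no x component, unit y component.
wind_direction = np.array([0, 1])

# arctan2 takes y first, then x.
wind_angle = np.arctan2(wind_direction[1], wind_direction[0])
print(wind_angle, np.rad2deg(wind_angle))  # pi/2 radians, i.e. 90 degrees

# Illustrative vectors (not from the notebook) in opposite quadrants.
up_right = np.array([1.0, 1.0])
down_left = np.array([-1.0, -1.0])

# arctan2 distinguishes the two directions.
angle_up_right = np.arctan2(up_right[1], up_right[0])     # 45 degrees
angle_down_left = np.arctan2(down_left[1], down_left[0])  # -135 degrees

# np.arctan only sees the ratio y/x, which is 1 in both cases.
ratio_up_right = np.arctan(up_right[1] / up_right[0])
ratio_down_left = np.arctan(down_left[1] / down_left[0])
```

The last two values are equal, which is why arctan2 is the safer choice whenever the vector can point in any direction.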

In summary, the arctan2 function is used to compute the angle of the wind direction relative to the horizontal axis, which is useful for determining how the wind will affect the motion of a ship or boat in the simulation.






Here's a step-by-step explanation of what's happening in the boat_velocity line:

wind_angle is the angle of the wind direction relative to the horizontal axis, computed using np.arctan2 as explained in my previous response.

np.deg2rad(self.rudder_angle) converts the rudder_angle value from degrees to radians. Since rudder_angle is -1, 0, or 1, the offset is at most one degree. This value is added to wind_angle to account for the effect of the rudder on the boat's motion.

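np.array([np.sin(wind_angle + np.deg2rad(self.rudder_angle)), np.cos(wind_angle + np.deg2rad(self.rudder_angle))]) creates a 2-dimensional unit vector that the code uses as the direction of the boat relative to the wind. The sin function gives the x component and the cos function gives the y component. That ordering treats the angle as measured from the y-axis, whereas wind_angle was measured from the x-axis by arctan2, so the two conventions do not match. With the wind at [0, 1] and a rudder angle of 0, the vector is approximately [1, 0], perpendicular to the wind rather than along it.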
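A quick check of what this vector evaluates to for the wind used in the notebook. The comparison with the [cos, sin] ordering is added here for illustration only and is not part of the original code.

```python
import numpy as np

wind_angle = np.arctan2(1, 0)  # wind vector [0, 1] gives pi/2
rudder_angle = 0               # degrees, the middle action

theta = wind_angle + np.deg2rad(rudder_angle)

# Ordering used in the notebook: sin for x, cos for y.
as_written = np.array([np.sin(theta), np.cos(theta)])

# Usual ordering for an angle measured from the x-axis (illustrative).
conventional = np.array([np.cos(theta), np.sin(theta)])

print(np.round(as_written, 3))    # approximately [1, 0]: along +x
print(np.round(conventional, 3))  # approximately [0, 1]: along the wind
```

Whether the notebook's ordering is intentional is not stated in the source; the sketch only shows that the two orderings point in different directions.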

5 * np.array([np.sin(wind_angle + np.deg2rad(self.rudder_angle)), np.cos(wind_angle + np.deg2rad(self.rudder_angle))]) multiplies the boat direction vector by a scalar value of 5, which represents the speed of the boat relative to the wind direction. This gives us the velocity vector of the boat relative to the wind direction.

self.wind_direction - 5 * np.array([np.sin(wind_angle + np.deg2rad(self.rudder_angle)), np.cos(wind_angle + np.deg2rad(self.rudder_angle))]) subtracts the velocity vector of the boat relative to the wind direction from the wind direction vector to get the velocity vector of the boat in the absolute reference frame.
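The steps above can be traced for each of the three actions. The sketch below recomputes boat_velocity outside the environment and applies the same astype(int) truncation that the position update uses; the `steps` dictionary is a helper introduced here, not part of the notebook.

```python
import numpy as np

wind_direction = np.array([0, 1], dtype=float)
wind_angle = np.arctan2(wind_direction[1], wind_direction[0])

steps = {}  # hypothetical helper: action index -> integer position change
for action, rudder_angle in enumerate([-1, 0, 1]):
    theta = wind_angle + np.deg2rad(rudder_angle)
    relative = 5 * np.array([np.sin(theta), np.cos(theta)])
    boat_velocity = wind_direction - relative
    # astype(int) truncates toward zero, so about -4.999 becomes -4
    steps[action] = boat_velocity.astype(int)
    print(action, np.round(boat_velocity, 3), steps[action])
```

For the third action the step is [-4, 1], which matches the position printed in the output above. The x component is negative for every action, so starting from [0, 0] the boat leaves the grid on its first move and each episode ends after a single step.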

The value of 5 is arbitrary, chosen by the author of the code to represent the speed of the boat relative to the wind direction. In a more realistic simulation the speed would depend on many factors, such as the size and shape of the boat, the force of the wind, and any other forces or constraints in the environment.

The "speed of the boat relative to the wind direction" refers to how fast the boat is moving relative to the wind, that is, the magnitude of the boat's velocity once the wind's own motion is subtracted out.

When a boat moves on water, its speed and direction of motion are affected by the surrounding environment, such as the water current, waves, and wind. In sailing, the speed and direction of the boat relative to the wind direction are particularly important, because they determine the boat's ability to sail upwind or downwind, and how fast it can reach a given destination.

The vector calculated in the line of code 5 * np.array([np.sin(wind_angle + np.deg2rad(self.rudder_angle)), np.cos(wind_angle + np.deg2rad(self.rudder_angle))]) is what the code treats as the boat's velocity relative to the wind. Its magnitude is always 5, and its direction is set by wind_angle plus the rudder angle. This vector is subtracted from the wind direction vector to obtain the boat's velocity vector in the absolute reference frame.
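As a sanity check on the paragraph above, the sketch below reproduces the relative_velocity and reward lines from step() for a rudder angle of 1 degree, the action consistent with the printed position [-4, 1]. The names relative_term and speed are introduced here for illustration.

```python
import numpy as np

wind_direction = np.array([0.0, 1.0])
wind_angle = np.arctan2(wind_direction[1], wind_direction[0])
rudder_angle = 1  # degrees

theta = wind_angle + np.deg2rad(rudder_angle)
relative_term = 5 * np.array([np.sin(theta), np.cos(theta)])
boat_velocity = wind_direction - relative_term

# Same expression as in step(): it undoes the subtraction above,
# leaving the negated relative term.
relative_velocity = boat_velocity - wind_direction
speed = np.linalg.norm(relative_velocity)  # magnitude of the relative term

# Reward as computed in step().
boat_direction = np.arctan2(relative_velocity[1], relative_velocity[0])
heading_angle = boat_direction + np.deg2rad(rudder_angle)
reward = np.cos(heading_angle - wind_angle)
print(speed, reward)
```

The speed comes out as 5 regardless of the rudder angle, and the reward is zero up to floating-point error, in line with the RWD value of roughly 6e-17 in the printed output.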

In summary, the "speed of the boat relative to the wind direction" is the magnitude of the boat's velocity relative to the wind, fixed at 5 in this code, and it is an important factor in sailing and boat navigation.