
20.500.12592/5qvhxx

Congestion management in the GTHA : Congestion management in the Greater Toronto and Hamilton Area

2 Apr 2013

In each iteration k, the agent observes the current state s, chooses and executes an action a from the set of available actions A, and then updates the Q-factor according to the obtained reward r(s,a) and the transition to state s'. [...] The simplest way to extend RL to MARL is to consider the local state and local action for each agent, assuming a stationary environment and that the agent's policy is the prime factor affecting the environment. [...] The agent is the learner and decision-maker that interacts with the environment by first receiving the system's state and the reward and then selecting an action accordingly. [...] The state space and the action space are distributed such that the agent learns the joint policy with one of its neighbours at a time, following the principle of modular Q-learning. [...] Reward Definition: The Reduction in the Total Cumulative Delay. The immediate reward for a given agent is defined as the reduction (saving) in the total cumulative delay associated with that agent, i.e., the difference between the total cumulative delays of two successive decision points.
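
The excerpt above describes a single-agent Q-factor update driven by a delay-reduction reward. It does not state the update rule itself, so the sketch below assumes the standard tabular Q-learning rule. The learning rate, the discount factor, the state and action labels, and the function names are all hypothetical and chosen only for illustration. The reward is taken as the previous cumulative delay minus the current one, which is one reading of "reduction" between two successive decision points.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value, not from the report)
GAMMA = 0.9  # discount factor (assumed value, not from the report)


def delay_reward(prev_cumulative_delay, curr_cumulative_delay):
    """Reward as the saving in total cumulative delay between two
    successive decision points (positive when delay went down)."""
    return prev_cumulative_delay - curr_cumulative_delay


def q_update(q, state, action, reward, next_state, actions,
             alpha=ALPHA, gamma=GAMMA):
    """Standard tabular Q-learning update (assumed form):
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: two signal actions, Q-table initialised to zero.
q = defaultdict(float)
actions = ["extend_green", "switch_phase"]

# Cumulative delay fell from 120.0 to 95.0 (arbitrary units).
r = delay_reward(120.0, 95.0)
new_q = q_update(q, "s0", "switch_phase", r, "s1", actions)
```

In the multi-agent setting mentioned in the excerpt, each agent would keep its own table of this kind over its local state and action. The modular scheme with neighbours is not reproduced here.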
education economics technology public transport science and technology research commuting intelligent transportation systems mathematical optimization mathematics transportation road transport transit transport state congestion pricing road pricing traffic teaching and learning ttc toronto transit commission rapid transit optimization reinforcement learning shuttle bus multi-objective optimization q-learning

Authors

Abdulhai, Baher

Pages
120
Published in
Ottawa, Ontario
