AGV Scheduling System Technical Documentation

Version  Editor    Date        Changes
1.0.0    Kang Hao  30/06/2023  Initial Framework
1.1.0    Kang Hao  09/05/2024  Blog Modified

Design Background and Objectives

Two Navigation Methods and Usage Scenarios for AGV/AMR

Automated Guided Vehicles (AGV) and Autonomous Mobile Robots (AMR) are widely used in modern industrial and logistics scenarios. Navigation technology is a core component of AGVs and AMRs, with SLAM (Simultaneous Localization and Mapping) and QR code navigation being two common methods. These different navigation methods largely determine the usage scenarios of the vehicles:

SLAM Navigation

Applicable scenarios:

  1. Complex environments with enough distinguishable features to localize against, such as factories, warehouses, and logistics centers;
  2. Scenarios requiring rapid adaptation to environmental changes, such as frequent changes in warehouse goods layout and factory production line adjustments.


Characteristics:

  • Performs real-time localization and mapping without the need for pre-marking the environment;
  • Can automatically recognize and avoid obstacles, adapting to dynamic environments;
  • High accuracy and real-time capability, supporting collaboration among multiple devices;
  • Requires high computational power and sensor precision, and is relatively costly.

QR Code Navigation

Applicable scenarios:

  1. Relatively fixed environments, such as production lines, sorting areas, etc.;
  2. Scenarios requiring low cost, easy deployment, and maintenance.


Characteristics:

  • QR codes must be laid out in advance, and navigation routes are relatively fixed;
  • High precision and stability in QR code-based navigation;
  • Simple system deployment, relatively low cost;
  • Minimally affected by environmental disturbances (like lighting, stains), but struggles with dynamic environmental changes.

Design Objectives for the Scheduling System

The designed scheduling system needs to:

  • Be compatible with the above navigation methods and scenarios, supporting mixed scheduling (multiple navigation methods on the same map, compatible with different heights of LiDAR for unified positioning);
  • Meet industry standards and requirements, supporting functionalities like generation, editing, and saving of two types of maps; setting of waypoints, task editing, user permissions, and other basic features;
  • Ensure system security and stability.

For ease of discussion, AGV/AMR will be collectively referred to as robots.

System Architecture Design and Requirements Implementation

National Standards for Scheduling Systems

According to GB/T 41402-2022, "Logistics robots—General technical specification for information systems", the information systems for logistics robots can be divided based on roles into:

  • Management subsystem
  • Scheduling subsystem

In simple terms, the management subsystem is responsible for user interaction and data input, including permission editing for different accounts and corresponding statistical data presentation. Its basic functions include:

  • Map management: Generating two types of maps, editing map files (like removing noise from SLAM maps), linking maps with vehicles, deleting maps, etc.;
  • Robot management: Adding robots, linking robots with maps, deleting robots, grouping robots, etc.;
  • Peripheral management: Managing peripherals compatible with the scheduling system communication protocols, such as curtain door controllers, elevator controllers, etc.;
  • Task management: Creating new tasks, editing tasks, starting tasks, pausing tasks, canceling tasks, deleting tasks, etc.;
  • Exception management: Features an exception dashboard capable of clearing soft errors and resetting vehicle statuses;
  • Statistics management: Features a statistical data dashboard, which allows for daily/weekly/monthly/yearly views of vehicle mileage, operation time, completed tasks, or viewing all statistics for a particular vehicle;
  • Permissions management: Ability to set master and sub-accounts with different permissions, including the ability to add, delete maps, vehicles, tasks, etc.;
  • Operation monitoring: Features a real-time dashboard to monitor the location and status of robots on different maps.

The scheduling subsystem is the main control system, responsible for direct communication with robots and peripherals, controlling robot movements, handling task execution, and various exception processes. Its basic functions include:

  • Path planning: Planning the shortest route with fewest turns based on the robot’s load status and the network of roads;
  • Task allocation: Choosing the most suitable robot for a task based on task type and the status of different robots on the map, while monitoring task execution and handling task breakpoint recovery;
  • Traffic control: Responsible for congestion management when multiple robots encounter each other, with capabilities to preemptively stop robots to avoid collisions and deadlock issues through replanning routes;
  • Space management: Capable of setting robot physical dimensions to calculate their occupied space on the network, aiming to support scheduling of different types, models of robots on the same map;
  • Robot maintenance: Capable of automatic/manual recharging of robots under different charging thresholds, instructing idle robots to move to "rest areas", and terminating charging when a set battery level is reached;
  • Exception handling: Capable of managing task interruptions/pauses/resumptions, various robot avoidance, emergency stops, recoverable fault handling, and escalating exception information.

According to the national standard, the designed system architecture is as follows:

Scheduling System Design.png

The diagram shows the logistics robot information system within the solid lines, while the dashed lines represent the device layer and business layer interacting with the system. Moreover, the system meets the following conditions:

  • Supports continuous fault-free operation (24/7), capable of exception handling, breakpoint recovery, and space reclamation;
  • Capable of persistently storing critical data, restoring to the previous operational state upon restart;
  • Supports expansion of AGV numbers compliant with communication protocols;
  • Supports clients in creating and editing navigation maps;
  • Supports expanded storage capabilities without affecting business operations;
  • Supports business enhancements through software upgrades, system expansions, and adding external connected devices;
  • Possesses some disaster recovery capabilities, supporting distributed deployment;
  • The system communication protocol, including communication with peripherals, supports encryption with mainstream algorithms such as SM4.

Design Principles and Concepts

As illustrated above, the complete robot information system is divided into a management subsystem and a scheduling subsystem. The management subsystem can largely be seen as the interface directly interacting with operators, while the scheduling subsystem is responsible for logical implementation.

Drawing on existing scheduling software and previous industry experience, I would like to introduce two concepts followed during the design of my scheduling system: product thinking and a microservices style. Applying them has improved the user experience, decoupled developers from deployment scenarios, sped up software iteration, made it easier to experiment with more algorithms, and strengthened the overall competitiveness of the project.

Product Thinking is a way of thinking in software development that focuses on the entire lifecycle of a product. It starts from the perspectives of users and business values, focusing on solving user problems, enhancing user experiences, achieving business goals, and continuously innovating and optimizing during this process.

Microservices is a software architecture style that divides a large application into multiple small, loosely coupled services. Each service has its own business functions, data storage, and operational environments. Each microservice can be independently developed, deployed, and scaled, making the application easier to maintain and update. Microservices interact through lightweight communication protocols (such as HTTP RESTful API). This architectural style helps to enhance the scalability, resilience, and fault tolerance of the system while reducing the complexity of system development and operation.

Combining these two concepts results in:

  1. User Experience: Microservices enable development teams to respond more quickly to user demands and update product features, thus enhancing user experience.
  2. Business Agility: Adopting a microservices architecture allows teams to focus on their respective business domains, improving business agility. When product requirements change, teams can quickly adjust their business directions and focus their efforts on developing key features.
  3. Innovation Capacity: The microservices architecture reduces the risks of experimenting with new technologies and features. Development teams can try new technologies and methods without affecting the entire system. This helps to enhance the product’s innovation capacity and competitiveness.
  4. Scalability: The microservices architecture makes scaling the application easier. Products can flexibly expand or reduce services based on business needs to adapt to the ever-changing market environment.

In summary, the microservices architecture helps products remain competitive in a rapidly changing market environment, achieving agile development, rapid iteration, and high-quality user experiences. Additionally, product thinking requires development teams to pay more attention to user needs, aiming to provide better products and services.

Examples illustrating each point:

  • User Experience: Before developing the scheduling system, we researched the usage scenarios and user groups thoroughly. The scenarios fall into two major categories. The first is material transportation along workshop production lines, served by SLAM navigation AGVs. The users are production line workers, typically with middle to high school education. Workers generate tasks that send AGVs between locations either through call devices that communicate with the scheduling system or through the "call" and "picked up" buttons on Android tablets. The interface offered to these users must therefore be simple and easy to operate, and because the control panel can be moved anywhere, the interface has to be decoupled from the system and kept lightweight. The second category is warehousing logistics, which primarily uses QR code navigation. Here the important map features are generating array maps with consistent spacing between waypoints and deleting the waypoints of a region in a batch.

  • Business Agility: This is reflected in team division and cooperation. Excluding the product manager, the team consists of four people: a front-end engineer, a UI engineer, a backend engineer, and a scheduling system development engineer. When establishing major version development plans, it is convenient to set different goals based on each person’s division of labor, such as more user-friendly operation logic and display effects, richer backend functions, and a more robust scheduling system. As long as established communication protocols are followed and respective release conditions are met, a reliable solution can be provided to customers.

  • Innovation Capacity: Reasonable module division allows each domain to iterate on its technology independently. For example, the front-end colleague used Google Blockly to provide a task arrangement interface that is simple to use. Meanwhile, the scheduling side is experimenting with different solvers to improve efficiency and reduce resource consumption.

  • Scalability: Scalability is reflected both in the functional iterations under different divisions of labor and in the implementation and optimization of different modules within the scheduling subsystem. For example, the task control module later added AGV grouping and vehicle selection, adapting to multi-floor, multi-vehicle scenarios. The newly added AGV avoidance module uses sub-task indexing from the task control module to accelerate feature iteration speed.

Communication Protocol Selection and Analysis

When choosing communication protocols, we need to consider connection duration, framework support, and business requirements. According to the designed system architecture, appropriate protocol selection is needed between each business layer.

As shown in the diagram, two communication protocols are used between the frontend and backend:

  • http: Used for API interface communication, complying with RESTful standards, completing functions such as user registration, authentication, map modification, task generation, etc.;
  • websocket: Used for establishing a long connection between the backend and frontend, pushing alert information from the scheduling subsystem, such as AGV collision prevention bar activation, task failure, task completion, etc.

As shown in the diagram, two communication mechanisms are used between the backend and the scheduling subsystem:

  • http: Used to obtain edited map waypoints, task information, task status changes to assign AGVs, etc.;
  • Message queue: A message queue based on Redis is used for the backend to asynchronously obtain status information sent by the scheduling, including task status, AGV body status, etc.
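
The asynchronous hand-off of status information can be sketched as follows. This is only a minimal illustration under stated assumptions, not the project's code: an in-process asyncio.Queue stands in for the Redis-based queue, and the message fields (topic, task_id, progress) are hypothetical.

```python
import asyncio
import json


async def scheduler(queue: asyncio.Queue) -> None:
    """Producer: the scheduling subsystem publishes status updates as JSON."""
    for progress in (0.25, 1.0):
        # Hypothetical message layout, for illustration only.
        message = {"topic": "task_status", "task_id": "t-1", "progress": progress}
        await queue.put(json.dumps(message))
    await queue.put(None)  # sentinel: no more messages in this demo


async def backend(queue: asyncio.Queue) -> list:
    """Consumer: the backend drains the queue at its own pace."""
    received = []
    while True:
        raw = await queue.get()
        if raw is None:
            break
        received.append(json.loads(raw))
    return received


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()  # stand-in for the Redis-based queue
    _, received = await asyncio.gather(scheduler(queue), backend(queue))
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The point of the queue is that the scheduling side never waits for the backend to finish processing a status update; with Redis the same shape would apply across processes.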

As shown in the diagram, one communication protocol is used between the scheduling subsystem and the AGV body:

  • MQTT: A common subscribe-publish communication protocol for the Internet of Things, using agreed-upon topics to transfer status, tasks, control, alerts, and other data.

Functional Modules and Component Division

To enhance the system’s maintainability, reusability, flexibility, and stability, the division of system functional modules and components follows the design principle of "high cohesion, loose coupling".

AGV Manager.png

AGV Dispatcher.png

Management Subsystem

Since the final solution has not yet been determined, other methods might be used to implement interfaces, etc. Here, we briefly describe the implemented architecture of the management subsystem.

Django is used as the backend framework for reasons including:

  • Security: Django provides built-in protection against many common web security vulnerabilities, such as cross-site request forgery, cross-site scripting attacks, and SQL injection, reducing operational pressure.
  • Scalability: The Model-View-Template design pattern allows separation of database models from business logic, designing different apps according to different needs, such as:
    • map: Includes creating, modifying, saving, retrieving maps, setting AGV positioning confidence;
    • agv: Includes linking maps, adding, deleting, grouping vehicles, etc.;
    • task: Includes creating, modifying, deleting, activating, pausing, canceling operations, etc.;
    • user: Includes creating, deleting, setting permissions for master and sub-accounts.
  • High performance: Integrating memcached and Redis.

The management subsystem and scheduling system both use the Python technology stack, simplifying the variety of technologies used by the team.

By configuring Nginx to route requests according to an agreed API version number, multiple versions can be deployed side by side and switched between.

The management subsystem diagram is as follows:


Scheduling Subsystem

The scheduling subsystem is the core part of the logistics robot information management system. Unlike the various implementation methods of the management subsystem, the basic modules of the scheduling subsystem have clear functional logic and performance indicators. Below, we detail the architecture and functional logic of the scheduling subsystem.

Dispatch Subsystem.png

Communication Module

The communication module is responsible for the communication between the scheduling system and the AGV state machine as well as the management subsystem backend, making it one of the core modules. Requirements include:

  • Using the http protocol for two-way communication with the backend and other systems (such as elevator group controllers), with encryption capabilities;

  • Using the MQTT protocol for communication with the body, with a single service required to support communication with at least 100 vehicles, reducing latency as much as possible, reducing database IO pressure, supporting specific data writing to Redis, and persisting critical data storage;
  • Supporting message queue asynchronous communication with other functional modules.

The MQTT bus module is the scheduling subsystem module that implements communication with the AGV state machine. Main functions include:

  • Detecting AGV heartbeats to determine if they are offline;

  • Receiving subtopics sent by the AGV state machine at 5 Hz, including:

    • agv_state: Real-time coordinates, confidence, status, real-time battery level, etc.;
    • task_status: Task status, progress, etc.;
    • traffic_sign: Traffic control communication, receiving stop commands, etc.

  • Issuing task information and control commands to the AGV body, including:

    • map_info: Map information edited by the frontend, etc.;
    • posi_info: Position verification;
    • task_info: Generated task information, including waypoint tasks;
    • traffic_ctrl: Traffic control information responsible for stopping at points;
    • ctrl_command: Task cancellation, body status clearing, etc.
  • Writing key data to Redis, passing specific topics to other modules through the message queue;

  • Using asyncio architecture to reduce resource consumption during the IO process.
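
The heartbeat check in the list above could look roughly like the sketch below. It assumes that every agv_state message doubles as a heartbeat and that an AGV counts as offline after 1 s of silence (five missed 5 Hz messages); the class name, the timeout, and the AGV identifiers are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HeartbeatMonitor:
    """Remembers when each AGV was last heard from and flags silent ones."""

    timeout_s: float = 1.0  # assumed threshold: five missed 5 Hz messages
    last_seen: Dict[str, float] = field(default_factory=dict)

    def on_message(self, agv_id: str, now: Optional[float] = None) -> None:
        # Called for every agv_state message received over MQTT.
        self.last_seen[agv_id] = time.monotonic() if now is None else now

    def offline(self, now: Optional[float] = None) -> List[str]:
        # An AGV is offline when its last message is older than the timeout.
        now = time.monotonic() if now is None else now
        return sorted(a for a, t in self.last_seen.items() if now - t > self.timeout_s)


monitor = HeartbeatMonitor(timeout_s=1.0)
monitor.on_message("agv-01", now=0.0)
monitor.on_message("agv-02", now=0.9)
print(monitor.offline(now=1.5))  # ['agv-01']
```

Passing the timestamp explicitly keeps the check deterministic in tests; in production the monotonic clock would be used.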

MQTT Bus.png

http Bus

The http bus is the scheduling subsystem module responsible for communication with the backend and other peripherals; major domestic peripheral manufacturers, such as Wanglong with its elevator controllers, use the http protocol for communication. Main functions include:


  • Obtaining map information from the backend;
  • Obtaining task information from the backend;
  • Reporting alert information;
  • Communicating with peripherals.

HTTP Bus.png

Task Scheduling Module

The task scheduling module is responsible for task execution, vehicle selection, subtask completion, task interruption/breakpoint recovery, etc.

The business process of the task scheduling module is roughly as follows:


The core logic includes:

  • The AGV selection logic evaluates based on task type, AGV distance cost to the task’s starting point, status, AGV health information (battery level, total mileage, etc.), and selects the most suitable vehicle for the task;
  • Monitoring AGV task progress and subtask completion status, persistently storing data to facilitate breakpoint recovery or reporting failure reasons;
  • Supporting scheduled execution, either at set times or for a set number of repetitions.
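
The vehicle selection logic can be illustrated with a simple weighted cost. The sketch below is an assumption-laden example rather than the system's actual scoring: the weights, the minimum battery level, and the Agv fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Hypothetical weights and threshold, chosen only for illustration.
W_DISTANCE, W_BATTERY, W_MILEAGE = 1.0, 20.0, 0.01
MIN_BATTERY = 0.2


@dataclass
class Agv:
    agv_id: str
    idle: bool
    battery: float               # 0.0 .. 1.0
    distance_m: float            # path distance to the task's starting point
    mileage_km: float            # total mileage, used as a wear indicator
    supported: FrozenSet[str]    # task types this vehicle can carry out


def selection_cost(agv: Agv) -> float:
    """Lower is better: near, well charged, lightly used vehicles win."""
    return (W_DISTANCE * agv.distance_m
            + W_BATTERY * (1.0 - agv.battery)
            + W_MILEAGE * agv.mileage_km)


def select_agv(task_type: str, fleet: List[Agv]) -> Optional[Agv]:
    """Filter out unsuitable vehicles, then pick the cheapest remaining one."""
    candidates = [a for a in fleet
                  if a.idle and a.battery >= MIN_BATTERY and task_type in a.supported]
    return min(candidates, key=selection_cost, default=None)


fleet = [
    Agv("agv-01", True, 0.9, 30.0, 100.0, frozenset({"transport"})),
    Agv("agv-02", True, 0.5, 12.0, 400.0, frozenset({"transport"})),
    Agv("agv-03", True, 0.1, 2.0, 50.0, frozenset({"transport"})),   # battery too low
    Agv("agv-04", False, 0.9, 1.0, 10.0, frozenset({"transport"})),  # busy
]
print(select_agv("transport", fleet).agv_id)  # agv-02
```

Separating the hard filters (status, task type, minimum battery) from the soft cost keeps the tuning of weights from ever selecting a vehicle that cannot do the job.
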

Path Planning Module

The path planning module, a core component of the scheduling subsystem, exposes an API that takes the start point, the end point, and spatial information and returns a planned path. To plan efficiently, we use an improved A* algorithm that dynamically evaluates the following function to find the shortest path:

$$f(n) = g(n) + h'(n)$$

where $g(n)$ is the cost accumulated from the start node to node $n$, and $h'(n)$ represents the heuristic cost, including aspects such as:

  • Estimated distance from the current node to the endpoint;
  • Rotation cost from the current node to adjacent nodes;
  • Adjacent node’s free status.

The key to balancing distance cost against the other costs, whether the two points are very close or very far apart, lies in tuning the relative magnitudes of the individual heuristic terms.

Moreover, the path planning module uses Python standard data structures and access methods to ensure the shortest path search time and low resource usage.
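
As a rough illustration of how turn and occupancy penalties can be combined with a distance estimate, here is a small grid-based A* built on heapq. It is a common textbook variant, not necessarily the improved algorithm used in the system: the penalties are charged as step costs on a (node, heading) search state, Manhattan distance serves as the estimate, and TURN_COST, OCCUPIED_COST, and the grid model are assumptions.

```python
import heapq
from typing import List, Optional, Set, Tuple

Node = Tuple[int, int]
TURN_COST = 2.0       # assumed penalty for each change of heading
OCCUPIED_COST = 5.0   # assumed penalty for entering a node held by another robot


def plan(start: Node, goal: Node, width: int, height: int,
         blocked: Set[Node], occupied: Set[Node]) -> Optional[List[Node]]:
    """Return a list of nodes from start to goal, or None if unreachable."""

    def estimate(n: Node) -> float:
        return abs(n[0] - goal[0]) + abs(n[1] - goal[1])  # Manhattan distance

    start_state = (start, (0, 0))            # (node, heading); (0, 0) = no heading yet
    open_heap = [(estimate(start), 0.0, start_state)]
    best_g = {start_state: 0.0}
    parent = {}
    while open_heap:
        _, g, state = heapq.heappop(open_heap)
        if g > best_g.get(state, float("inf")):
            continue                          # stale heap entry
        node, heading = state
        if node == goal:
            path = [node]
            while state in parent:            # walk back to the start
                state = parent[state]
                path.append(state[0])
            return path[::-1]
        for step in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + step[0], node[1] + step[1])
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height) or nxt in blocked:
                continue
            cost = 1.0
            if heading != (0, 0) and step != heading:
                cost += TURN_COST             # rotation cost
            if nxt in occupied:
                cost += OCCUPIED_COST         # prefer free nodes
            new_state, new_g = (nxt, step), g + cost
            if new_g < best_g.get(new_state, float("inf")):
                best_g[new_state] = new_g
                parent[new_state] = state
                heapq.heappush(open_heap, (new_g + estimate(nxt), new_g, new_state))
    return None


print(plan((0, 0), (3, 3), 4, 4, blocked=set(), occupied=set()))
```

On an empty 4×4 grid the result is an L-shaped route with a single turn, because every extra change of heading costs more than it saves. Raising or lowering TURN_COST relative to the unit step cost is one concrete form of the magnitude tuning mentioned above.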

The process diagram is as follows:

Path Finding.png

Traffic Control Module

The traffic control module ensures that multiple vehicles run smoothly on the same map and prevents AGVs from colliding or deadlocking. AGVs move through the network using a reserve-occupy strategy similar to that of existing systems, with some modifications to avoid delays while long paths are being cleared. In addition, to reduce server load, communication follows a request-response pattern, so that the module does work only when an AGV reports in and is handled asynchronously, saving unnecessary resource consumption.

The process diagram is as follows:

Scheduling System Design - Traffic Control.jpg

This traffic control logic is widely used in the industry. First, an initial path is issued so that the AGV can start moving from the starting point. Whenever the AGV passes a waypoint, it reports its arrival via MQTT. The traffic control module then checks whether the next N waypoints are free. If they are, it responds that the AGV may pass; if a waypoint within the next N is occupied by another vehicle, it instructs the AGV to stop at the Nth point and wait for the occupation to be lifted. Once the occupation is lifted, a continue command is issued and the AGV proceeds.
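
A minimal sketch of this look-ahead reservation is given below. Several details are assumptions made for the example: the look-ahead N is 3, the AGV is told to stop at the last waypoint it could reserve before the occupied one, paths do not revisit a waypoint, and the class and method names are hypothetical.

```python
from typing import Dict, List, Optional

LOOKAHEAD = 3  # N in the text; assumed value


class TrafficController:
    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}  # waypoint -> id of the AGV holding it

    def on_arrival(self, agv_id: str, path: List[str], index: int) -> Optional[str]:
        """Handle an arrival report for path[index].

        Returns the waypoint at which the AGV must stop, or None if the
        next LOOKAHEAD waypoints were reserved and the AGV may pass.
        """
        # Release the waypoints the AGV has already left behind.
        for wp in path[:index]:
            if self.owner.get(wp) == agv_id:
                del self.owner[wp]
        self.owner[path[index]] = agv_id

        last_reserved = path[index]
        for wp in path[index + 1: index + 1 + LOOKAHEAD]:
            holder = self.owner.get(wp)
            if holder is not None and holder != agv_id:
                return last_reserved      # blocked: wait just before the occupied waypoint
            self.owner[wp] = agv_id       # reserve the free waypoint
            last_reserved = wp
        return None


tc = TrafficController()
print(tc.on_arrival("agv-2", ["D", "X"], 0))                 # None: agv-2 holds D and X
print(tc.on_arrival("agv-1", ["A", "B", "C", "D", "E"], 0))  # 'C': D is held by agv-2
print(tc.on_arrival("agv-2", ["D", "X"], 1))                 # None: agv-2 releases D
print(tc.on_arrival("agv-1", ["A", "B", "C", "D", "E"], 1))  # None: path ahead is free
```

Because the controller only acts when an arrival report comes in, no polling loop per vehicle is needed, which matches the request-response pattern described for this module.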

When the paths of multiple vehicles overlap so that each vehicle waits for a waypoint held by another, the result is the typical "deadlock" scenario of the scheduling field. In this case the scheduling module weighs each AGV’s battery level, load, and other factors and chooses the AGV with the smallest movement cost for replanning. Replanning takes the occupation status of the waypoints into account. If a feasible path exists, the scheduling module generates a new path and issues it to the AGV, thereby breaking the deadlock.
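
One way to make the deadlock case concrete is a wait-for graph: each blocked AGV points at the AGV holding the waypoint it needs, and a cycle in that graph is a deadlock. The sketch below uses this representation as an illustrative assumption, since the text does not state how deadlocks are detected, and the per-vehicle replan_cost values are hypothetical stand-ins for the battery and load evaluation.

```python
from typing import Dict, List


def find_deadlock(waiting_on: Dict[str, str]) -> List[str]:
    """Return the AGVs that form a wait cycle, or [] if there is none.

    waiting_on maps a blocked AGV to the AGV that holds the waypoint it needs.
    """
    for start in waiting_on:
        seen: List[str] = []
        cur = start
        # Follow the chain of "waits for" links until it ends or repeats.
        while cur in waiting_on and cur not in seen:
            seen.append(cur)
            cur = waiting_on[cur]
        if cur in seen:
            return seen[seen.index(cur):]  # only the members of the cycle
    return []


def pick_for_replanning(cycle: List[str], replan_cost: Dict[str, float]) -> str:
    """Choose the AGV in the cycle that is cheapest to reroute."""
    return min(cycle, key=lambda agv_id: replan_cost[agv_id])


waiting_on = {"agv-1": "agv-2", "agv-2": "agv-1", "agv-3": "agv-1"}
cycle = find_deadlock(waiting_on)
print(cycle)                                                           # ['agv-1', 'agv-2']
print(pick_for_replanning(cycle, {"agv-1": 12.0, "agv-2": 7.5, "agv-3": 1.0}))  # agv-2
```

Note that agv-3 is blocked but not part of the cycle, so it is not a candidate: rerouting it would not break the deadlock.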

The process diagram is as follows:

Traffic Lock.png

Robot Maintenance Module

The robot maintenance module controls AGV operations when AGVs are not in working status. It is responsible for the following functions:

  • Guiding AGVs to rest points when idle;
  • Automatically ending tasks and returning to charge based on set battery thresholds;
  • Automatically leaving the charging station after charging to the specified threshold;
  • Balancing the deployment of AGVs throughout the entire area.

For the return-to-charge function, handling exceptional scenarios is particularly important. Charging stations typically support both automatic docking by AGVs and manual charging. A typical exceptional scenario is the following: an AGV that has been triggered to return and charge automatically heads to the nearest charging station, but the station it reserved is then manually occupied by another vehicle. The robot maintenance module detects this situation automatically, interrupts the AGV’s move to the original charging station, and replans a route to the nearest free charging station. Similar logic applies when heading to rest areas.
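
The re-targeting step can be sketched as below. Straight-line distance is used here as a stand-in for the planned path cost, and the station names, coordinates, and function names are hypothetical.

```python
import math
from typing import Dict, Optional, Set, Tuple

Point = Tuple[float, float]


def nearest_free_station(agv_pos: Point, stations: Dict[str, Point],
                         occupied: Set[str]) -> Optional[str]:
    """Return the closest station that is not occupied, or None if all are taken."""
    free = {name: pos for name, pos in stations.items() if name not in occupied}
    if not free:
        return None
    # Assumption: Euclidean distance approximates the real path cost.
    return min(free, key=lambda name: math.dist(agv_pos, free[name]))


def recheck_target(agv_pos: Point, target: str, stations: Dict[str, Point],
                   occupied: Set[str]) -> Optional[str]:
    """Keep the reserved station unless it has been taken in the meantime."""
    if target not in occupied:
        return target
    return nearest_free_station(agv_pos, stations, occupied)


stations = {"c1": (0.0, 0.0), "c2": (10.0, 0.0), "c3": (4.0, 3.0)}
print(recheck_target((1.0, 0.0), "c1", stations, occupied=set()))   # c1
print(recheck_target((1.0, 0.0), "c1", stations, occupied={"c1"}))  # c3
```

The same function works for rest areas by passing the rest points instead of the charging stations.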

AGV Auto Recharge.png

Single-Step Operation Module

Single-step operations control AGVs directly, without task commands, and are often used for single-machine testing, demonstrations, site trial runs, and similar scenarios. They are not elaborated on here.

Other Functional Modules

AGV Avoidance Module

In specific scenarios, when a trigger condition requires the AGV to give way, the space ahead of the incoming vehicle must be cleared: the AGV moves to a preset avoidance point, waits there until the incoming vehicle has passed, and then continues its current task.

This scenario occurs because the incoming vehicle is not controlled by this scheduling system, making it impossible to predict the incoming vehicle’s movement pattern and control it, or because it has higher traffic authority, such as certain workshop-operated small trains, etc.

Elevator Control Module

The elevator control module interfaces with mainstream elevator controller manufacturers on the market. One example is a previous collaboration with Wanglong Intelligence, the largest elevator controller technology company in China, whose products are widely used in intelligent elevator scenarios in office buildings, hospitals, and warehouses.

The elevator control module provides the following basic functions, which give the scheduling subsystem cross-floor scheduling capabilities:

  • Calling the elevator;
  • Position maintenance;
  • AGV entering and exiting the elevator.

Encryption/Decryption Module

The encryption/decryption module is responsible for the system’s encryption requirements.

Breakpoint Recovery and Exception Handling Mechanism

In the AGV scheduling system, the breakpoint recovery and exception handling mechanisms are crucial for system stability and reliability. To handle the various issues and exceptions effectively, we need to consider system design, task exception handling, and vehicle exception handling together.

System Design

During the system design phase, we need to ensure the system has good modularity and scalability. Each module should have clear responsibilities and interfaces to quickly locate and resolve issues when they arise. Additionally, we need to implement a robust communication mechanism to ensure normal information exchange between systems in unstable network environments.

Communication interruption handling logic:

Communication Interruption.png

Task Exception Handling

Task exceptions may include task allocation failures, task execution timeouts, task interruptions, etc. To address these issues, we need to implement the following functionalities in the system:

  • Persistently storing task information: The system needs to be able to persistently store all task information (including task status, execution progress, etc.) to facilitate task execution recovery in case of exceptions;
  • Task retry mechanism: When task execution fails, the system should have the capability to reallocate tasks or retry execution;
  • Task timeout detection: The system needs to monitor the execution time of tasks. When a task execution times out, the corresponding exception handling process should be triggered;
  • Task interruption handling: When a task is unexpectedly interrupted, the system needs to promptly detect it and take measures, such as reallocating tasks or notifying relevant personnel for intervention.
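
Timeout detection combined with a bounded retry can be sketched as follows. The retry limit, the record fields, and the action names "retry" and "escalate" are assumptions for the example; in the real system the record would be persisted rather than held in memory.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_RETRIES = 2  # assumed retry budget per task


@dataclass
class TaskRecord:
    task_id: str
    started_at: float      # seconds, same clock as `now` below
    timeout_s: float
    retries: int = 0
    status: str = "running"  # "running" or "failed"


def check_timeouts(tasks: List[TaskRecord], now: float) -> List[Tuple[str, str]]:
    """Return (task_id, action) for every task that has just timed out."""
    actions = []
    for task in tasks:
        if task.status != "running" or now - task.started_at <= task.timeout_s:
            continue
        if task.retries < MAX_RETRIES:
            task.retries += 1
            task.started_at = now          # restart the timer for the new attempt
            actions.append((task.task_id, "retry"))
        else:
            task.status = "failed"         # budget exhausted: hand over to a person
            actions.append((task.task_id, "escalate"))
    return actions


tasks = [TaskRecord("t1", started_at=0.0, timeout_s=60.0),
         TaskRecord("t2", started_at=0.0, timeout_s=600.0)]
print(check_timeouts(tasks, now=100.0))  # [('t1', 'retry')]
print(check_timeouts(tasks, now=170.0))  # [('t1', 'retry')]
print(check_timeouts(tasks, now=240.0))  # [('t1', 'escalate')]
```

Bounding the retries matters: without a limit, a task that can never succeed would be reallocated forever instead of reaching the personnel who can intervene.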

Task execution timeout handling logic:

Task Timeout.png

Vehicle Exception Handling

Vehicle exceptions may include vehicle malfunctions, insufficient battery, communication interruptions, etc. To address these issues, we need to implement the following functionalities in the system:

  • Vehicle status monitoring: The system needs to continuously monitor the status of vehicles, such as battery level, fault information, etc. When a vehicle experiences an exception, the system should promptly issue an alert and trigger the corresponding handling process;
  • Vehicle fault handling: In case of vehicle malfunction, the system needs to be able to automatically or manually diagnose the fault. Once a problem is diagnosed, the system should notify maintenance personnel to repair it and exclude the faulty vehicle from task allocation;
  • Battery management: The system needs to ensure vehicles have sufficient battery to perform tasks. When a vehicle’s battery falls below a preset threshold, the system should automatically direct the vehicle to a charging station for charging, and reinclude the vehicle in task allocation after charging is completed;
  • Communication exception handling: When communication between a vehicle and the scheduling system is interrupted, the system needs to be able to promptly detect and take measures. For example, attempting to reestablish communication or notifying relevant personnel for intervention.
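
The four handling rules above can be condensed into a single decision function, shown here as a sketch. The thresholds (20% to start charging, 80% to rejoin task allocation), the status fields, and the returned action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

LOW_BATTERY = 0.2     # assumed threshold for sending a vehicle to charge
RESUME_BATTERY = 0.8  # assumed level at which a charging vehicle is available again


@dataclass
class VehicleStatus:
    agv_id: str
    online: bool
    fault_code: Optional[str]
    battery: float          # 0.0 .. 1.0
    charging: bool = False


def decide(status: VehicleStatus) -> str:
    """Map a status report to the handling action, most severe case first."""
    if not status.online:
        return "reconnect_or_notify"    # communication exception
    if status.fault_code is not None:
        return "exclude_and_notify"     # vehicle fault: remove from allocation
    if status.charging:
        return "available" if status.battery >= RESUME_BATTERY else "keep_charging"
    if status.battery < LOW_BATTERY:
        return "send_to_charger"        # battery management
    return "available"


print(decide(VehicleStatus("agv-01", True, None, 0.15)))                 # send_to_charger
print(decide(VehicleStatus("agv-02", True, "E42", 0.90)))                # exclude_and_notify
print(decide(VehicleStatus("agv-03", True, None, 0.85, charging=True)))  # available
```

Checking connectivity and faults before the battery level ensures that a faulty vehicle is never sent driving to a charger on its own.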

In summary, the breakpoint recovery and exception handling mechanism plays a vital role in the AGV scheduling system. To ensure system stability and reliability, we need to comprehensively consider and design in aspects such as system design, task exception handling, and vehicle exception handling.

Vehicle fault handling logic:

Vehicle Fault.png

Insufficient battery handling logic:

Insufficient Battery.png

System Metrics

Performance Requirements

The performance of the scheduling system complies with GB/T 41402-2022, which requires:

  1. Third-party interface call response time not exceeding 3 s;
  2. System critical data storage time not less than three months;
  3. Support for a single management space not smaller than 10,000 m².

Based on subsequent tests, some performance metrics exceed national standards, reflected in:

  • Pathfinding time in a 100 m × 100 m scenario on the order of 0.01 s;
  • Overall time from path generation through task issuance to AGV action on the order of 0.1 s.

Communication Requirements

The system network topology diagram is as follows:


According to the national standard, wireless transmission quality should meet:

  • Maximum network latency not exceeding 200 ms;
  • Maximum latency jitter not exceeding 100 ms;
  • Wireless roaming switch time less than 200 ms;
  • Packet loss rate not exceeding $1 \times 10^{-2}$;
  • Packet error rate not exceeding $1 \times 10^{-4}$.
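
During site acceptance, the limits above can be checked mechanically against measured values. The sketch below encodes exactly the five thresholds listed; the LinkSample structure and the metric names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkSample:
    latency_ms: float
    jitter_ms: float
    roaming_ms: float
    packet_loss: float    # fraction of packets lost
    packet_error: float   # fraction of packets received with errors


def violations(sample: LinkSample) -> List[str]:
    """Return the names of the metrics that break the limits listed above."""
    failed = []
    if sample.latency_ms > 200:
        failed.append("latency")
    if sample.jitter_ms > 100:
        failed.append("jitter")
    if sample.roaming_ms >= 200:      # roaming must be strictly less than 200 ms
        failed.append("roaming")
    if sample.packet_loss > 1e-2:
        failed.append("packet_loss")
    if sample.packet_error > 1e-4:
        failed.append("packet_error")
    return failed


print(violations(LinkSample(150, 40, 200, 0.005, 0.00005)))  # ['roaming']
```

Only the roaming limit is strict ("less than"), so a measured switch time of exactly 200 ms fails while a latency of exactly 200 ms passes.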





