Scheduling methods for astronomical satellite Target of Opportunity tasks with high-frequency dynamic arrivals
Keywords:
- Target of Opportunity (ToO)
- follow-up observation
- task planning
- deep reinforcement learning (DRL)
- Local Attention Pointer Network (LA-PN)
Abstract: Sky survey facilities detect tens of thousands of variable celestial objects every day, and the demand for observing them keeps growing. Together these produce a long-sequence task planning problem made up of Target of Opportunity (ToO) events that arrive dynamically at high frequency and their follow-up observation tasks. The problem is characterized by random observation events, a need for timely data acquisition, many selectable options, and complex constraints. It is often regarded as NP-hard, so labeled data for supervised learning is difficult to obtain. When deep reinforcement learning (DRL) is applied as an unsupervised alternative to long-sequence task planning, the satellite acting as the agent has difficulty converging quickly to a globally optimal policy. To address this, this paper borrows the idea of the local attention (LA) mechanism to improve the pointer network (PN) and proposes the Local Attention Pointer Network (LA-PN) algorithm. By introducing a time window, the algorithm focuses the model on the parts of the sequence that matter most for the current decision, which reduces ineffective exploration. Comparative analysis of simulation results verifies the algorithm's profit, real-time performance, and generalization ability.
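
The window idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the window is defined or how pointer scores are computed, so everything below is an assumption made for illustration: the function name `local_pointer_probs`, the `center` and `half_width` parameters, and the use of a window over sequence indices rather than over actual task times. It is not the paper's LA-PN implementation; it only shows how restricting the softmax to a window concentrates the pointer distribution on nearby candidates.

```python
import math


def local_pointer_probs(scores, center, half_width):
    """Softmax over pointer scores, restricted to a window around `center`.

    scores:     raw attention scores, one per candidate task (hypothetical input)
    center:     index the window is centred on (illustrative assumption)
    half_width: positions kept on each side of `center` (illustrative assumption)
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    lo = max(0, center - half_width)
    hi = min(n - 1, center + half_width)
    # Only positions inside [lo, hi] compete for probability mass.
    # Subtracting the window maximum keeps exp() numerically stable.
    window_max = max(scores[lo:hi + 1])
    exps = [
        math.exp(s - window_max) if lo <= i <= hi else 0.0
        for i, s in enumerate(scores)
    ]
    total = sum(exps)
    return [e / total for e in exps]


# Six candidate tasks; only indices 1..3 fall inside the window.
probs = local_pointer_probs([0.2, 1.5, 0.3, 2.0, 0.1, 0.7], center=2, half_width=1)
chosen = max(range(len(probs)), key=probs.__getitem__)
```

Candidates outside the window receive zero probability, so a sampling or greedy decoder never explores them at this step. In this sketch, that is the sense in which a window reduces ineffective exploration.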