Scheduling methods for astronomical satellite Target of Opportunity tasks with high-frequency dynamic arrivals

doi: 10.11728/cjss2024-0125 cstr: 32142.14.cjss2024-0125

WANG Xuhang^{1, 2}, WU Haiyan^{1}

1. National Space Science Center, Chinese Academy of Sciences
2. University of Chinese Academy of Sciences

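Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (Category A): Space Science (Phase II), Ground Support System, Mission Operation and Control Technology for Science Satellites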

Publication history
- Received: 2024-10-08
- Revised: 2025-01-08
- Accepted: 2025-01-13
- Published online: 2025-03-10

Abstract: Sky-survey facilities are expected to detect tens of thousands of variable celestial sources every day, and the demand for observing these sources keeps growing. Together, the high-frequency, dynamically arriving Target of Opportunity (ToO) events and their follow-up observations form a long-sequence task scheduling problem characterized by random observation events, strict timeliness of data acquisition, many selectable options, and complex constraints. Because such problems are generally regarded as NP-hard, labeled data for supervised learning are difficult to obtain. When deep reinforcement learning (DRL), which does not require labeled data, is applied to this long-sequence problem instead, the satellite acting as the agent struggles to converge quickly to a globally optimal policy. To address this, this paper draws on the idea of local attention (LA) to improve the pointer network (PN) and proposes the Local Attention Pointer Network (LA-PN) algorithm. By introducing a time window, LA-PN focuses the model on the parts of the sequence that matter most for the current decision, which reduces ineffective exploration. A comparative analysis of simulation results verifies the algorithm's profitability, real-time performance, and generalization ability.
- Target of Opportunity (ToO) /
- follow-up observation /
- task scheduling /
- deep reinforcement learning (DRL) /
- local attention pointer network (LA-PN)
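The abstract describes LA-PN only at a high level. As a rough illustration of the idea, the following is a minimal sketch, under stated assumptions, of how a time window can restrict pointer-network attention. Standard additive pointer attention, u_j = v^T tanh(W1 e_j + W2 d), scores every candidate task. Candidates that are already scheduled, or whose start time falls outside [now, now + window], are then masked out before the softmax. The function name `local_pointer_probs`, the parameters `start_times`, `now`, and `window`, and this exact masking rule are hypothetical and are not the authors' formulation. In a trained model, W1, W2, and v would be learned rather than set to the identity matrices used in the usage example.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    """Plain matrix-vector product (kept dependency-free on purpose)."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def local_pointer_probs(
    enc: List[Vector],          # embedding e_j of each candidate task
    dec: Vector,                # current decoder state d
    start_times: List[float],   # hypothetical: earliest start time of each task
    now: float,                 # hypothetical: current scheduling time
    window: float,              # hypothetical: length of the local time window
    W1: Matrix,
    W2: Matrix,
    v: Vector,
    visited: List[bool],        # True if the task is already scheduled
) -> Vector:
    """Pointer attention restricted to a time window (illustrative sketch).

    Scores follow the additive pointer-network form
    u_j = v^T tanh(W1 e_j + W2 d). Tasks that are visited or that start
    outside [now, now + window] receive probability 0.
    """
    w2d = matvec(W2, dec)
    scores: Vector = []
    for e, t, done in zip(enc, start_times, visited):
        if done or not (now <= t <= now + window):
            scores.append(-math.inf)  # masked: outside the local window
            continue
        h = [math.tanh(a + b) for a, b in zip(matvec(W1, e), w2d)]
        scores.append(sum(vi * hi for vi, hi in zip(v, h)))

    finite = [s for s in scores if s != -math.inf]
    if not finite:
        return [0.0] * len(scores)  # nothing schedulable in this window
    peak = max(finite)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) if s != -math.inf else 0.0 for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tasks = [[0.5, 0.1], [0.2, 0.9], [0.7, 0.3], [0.0, 0.4]]
    probs = local_pointer_probs(
        tasks, [0.1, 0.2], [10.0, 35.0, 80.0, 20.0],
        now=10.0, window=30.0, W1=identity, W2=identity, v=[1.0, 1.0],
        visited=[True, False, False, False],
    )
    print(probs)  # tasks 0 (visited) and 2 (starts at 80) get probability 0
```

The hard mask means the agent never samples tasks outside the window, which is one simple way to shrink the action space a DRL policy must explore. Whether LA-PN uses a hard mask, a soft positional weighting, or a different window definition is not stated in the abstract.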