A cognitive communication jamming strategy based on Transformer and Deep Reinforcement Learning

Abstract

The advent of sophisticated communication technologies, such as cognitive radio and anti-jamming techniques, has significantly elevated the challenge of disrupting enemy communications. Nevertheless, the inherent openness of wireless communications remains a vulnerability that can be exploited to interfere with them. Some contemporary Reinforcement Learning (RL)-based jamming strategies examine methods for rapidly identifying the optimal jamming strategy for a specific modulated signal. However, such algorithms lack the flexibility and responsiveness required to effectively counter the enemy’s evolving communication strategies. To address this issue, we propose a Transformer and Deep Reinforcement Learning (DRL)-based jamming strategy that can be trained to identify jamming methods for multiple digital and analog signals. In particular, the Transformer Encoder is employed as a network for DRL to process the state information pertaining to the enemy communication. Subsequently, the decision module of the Double Deep Q Network (DDQN) is utilized to select the jamming action based on the processed information. Furthermore, we have devised a reward function and constructed an invalid jamming list, with the objective of selecting an action that requires low power consumption and enhances the convergence speed of the algorithm. The experimental results demonstrate that the algorithm proposed in this paper exhibits notable performance advantages in comparison to other networks and DRL algorithms.

Introduction

In the military domain, wireless communication plays a key role in areas such as intelligence transmission and battlefield command. The ability to disrupt enemy communication during combat significantly affects the likelihood of battlefield victory [1]. However, the advent of frequency hopping communication [2], direct-sequence spread spectrum communication [3], adaptive technology, and anti-jamming communication technology [4], and particularly the use of cognitive radio technology [5], [6], has made disrupting enemy communication progressively more challenging. In such a scenario, a straightforward approach is to use high-power noise to suppress communication within the designated frequency band [7]. However, this jamming method not only consumes a considerable amount of energy but may also disrupt friendly communication. In practice, friendly communication must be maintained in real time and at high quality, and the energy available for jamming is limited. Therefore, it is imperative to develop low-power, precise, intelligent, and adaptive jamming strategies against enemy communications.

The study of communication jamming strategies has recently attracted considerable research attention and produced a number of significant findings. The conventional approach to jamming employs game theory, optimization theory, and other theoretical techniques to identify the optimal jamming parameters [8]. However, these methods depend on a priori information about the communicating parties and the environment, and they are not applicable when such information is unavailable. Reinforcement learning [9] does not require a priori information: it interacts with the environment in real time and uses the feedback it obtains to guide its actions. As a result, it can be effectively applied to the study of jamming strategies. In the field of radar countermeasures, numerous RL-based jamming strategies have been proposed. Gong put forth a radar jamming algorithm based on the Discriminative Deep Dyna-Q algorithm [10]. Xia selected the optimal jamming frequency against frequency-agile radar using the Q-learning algorithm [11]. Zhang put forth a collaborative jamming method based on PER-DDQN to address the intelligent decision-making challenge in radar countermeasures, and devised a system of multiple jammers to counteract multifunctional grouping radars [12]. RL has also been employed in the domain of communication countermeasures. Amuru [13] put forth a jamming bandit algorithm founded upon the multi-armed bandit framework, which is capable of identifying the optimal physical layer parameters for an attack. Subsequently, Zhuan Sun enhanced the ϵ-greedy algorithm and proposed the greedy bandit algorithm [14], which can find the optimal jamming strategy without any a priori information about the communication signals and has a shorter learning time than the jamming bandit algorithm. However, the jamming waveforms employed in these studies are limited to AWGN, BPSK, and QPSK, which may result in the learned jamming strategy being suboptimal. To address this issue, Zhuan Sun proposed the use of orthogonal decomposition to achieve diverse jamming styles [15], and subsequently employed the orthogonal decomposition algorithm to develop an unconventional jamming approach for constellation diagram distortion signals [16]. Later, Zhou [17] demonstrated the continuity of the reward function over the strategy space constructed in [15], and on that basis proposed an algorithm that learns the optimal jamming strategy more quickly.

Nevertheless, previous RL-based jamming methods can only be trained for a single state. Consequently, when the enemy changes the modulation of its signal, the optimal jamming parameters must be identified again, which is not viable in rapidly changing environments. To address this issue, this paper proposes a novel jamming strategy based on DRL. Once trained, the system can jam multiple communication signals rapidly and effectively. In order to cover as many modulated signals as possible, this paper considers a variety of digital and analog signals. The key to identifying an effective jamming action is how the state information of these communication signals is processed. The Transformer can extract effective features from state information through its multi-head attention mechanism, and has demonstrated considerable success in the domain of signal processing. Yao employed the Transformer to achieve signal equalization in underwater visible light communication channels [18]. Chang and Chen utilized the Transformer for signal classification and recognition tasks [19], [20], respectively. Additionally, the Transformer has been utilized for wireless interference recognition [21]. The integration of the Transformer with DRL has also been applied to web firewalls [22] and communication anti-jamming [23]. Accordingly, this paper employs the Transformer as the network for DRL to process state information.

This paper proposes an interference strategy based on Transformer and DRL. The main work can be summarized as follows.

• A communication interference model is established to simulate the real environment, considering various digital and analog signals.
• Based on this model, the interference action is designed in a three-dimensional space spanning modulation type, the ratio of interference power to communication power, and duty cycle. DRL's advantage in handling multi-state and multi-action spaces is used to select the interference action. A reward function is devised to guide the algorithm towards selecting low-power actions.
• We propose a communication interference strategy based on the Transformer and DRL, in which the Transformer serves as the network component of DRL and its multi-head attention mechanism extracts state features. Additionally, we establish an invalid jamming list to reduce interactions with the environment, thereby improving the convergence speed of the algorithm.
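
The three-dimensional action space, the low-power reward, and the invalid jamming list described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration, not taken from the paper: the grids `MODULATIONS`, `JSR_DB`, and `DUTY_CYCLES`, the reward shape and its `power_weight`, and the reading of the invalid jamming list as a per-state set of actions that have already failed and are therefore skipped.

```python
import itertools
import random

# Hypothetical discretization; the paper's actual grids are not given in this excerpt.
MODULATIONS = ["noise", "BPSK", "QPSK"]
JSR_DB = [-5, 0, 5, 10]               # interference-to-communication power ratio, in dB
DUTY_CYCLES = [0.25, 0.5, 0.75, 1.0]

# Every action is one point in the three-dimensional space.
ACTIONS = list(itertools.product(MODULATIONS, JSR_DB, DUTY_CYCLES))


def average_power(action):
    """Relative average jamming power: linear power ratio scaled by duty cycle."""
    _, jsr_db, duty = action
    return (10 ** (jsr_db / 10)) * duty


def reward(action, jam_success, power_weight=0.1):
    """Assumed reward shape: success bonus minus a power penalty, -1 on failure."""
    if not jam_success:
        return -1.0
    return 1.0 - power_weight * average_power(action)


def record_invalid(invalid, state, action_index):
    """Remember that this action failed against this state."""
    invalid.setdefault(state, set()).add(action_index)


def select_action(state, q_values, invalid, epsilon, rng=random):
    """Epsilon-greedy choice restricted to actions not listed as invalid for the state."""
    banned = invalid.get(state, set())
    candidates = [i for i in range(len(ACTIONS)) if i not in banned]
    if not candidates:                 # fall back if every action has been banned
        candidates = list(range(len(ACTIONS)))
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda i: q_values[i])


# Usage: action 3 has the highest Q-value but is listed as invalid, so it is skipped.
invalid = {}
q = [0.0] * len(ACTIONS)
q[3] = 1.0
record_invalid(invalid, "QPSK", 3)
chosen = select_action("QPSK", q, invalid, epsilon=0.0)
```

Skipping listed actions means the agent never spends an environment interaction on a choice it already knows to be ineffective, which is one plausible way such a list could speed up convergence.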

The remainder of the paper is structured as follows: Section 2 describes the system model constructed in this paper, Section 3 gives a detailed description of the algorithms used in this paper and the structure of the network, Section 4 presents the experimental setup and the results, and finally, the work is summarized in Section 5.

Process of communication interference

The objective of this paper is to investigate cognitive interference strategies for multiple modulated signals based on DRL. Firstly, a communication interference model is established, as illustrated in Fig. 1. The model comprises four components: the sender, the receiver, the interferer, and the channel. The channel component is responsible for modeling the impact of environmental noise and other factors on the signal.

The overall flow of the model is as follows. Transmitter sends communication

Jamming strategy based on Transformer and Double Deep Q Network

This paper examines the development of precise, adaptable, and low-power interference strategies for multiple modulated signals, which necessitate decision-making for a multitude of states. RL is a process that enables an agent to learn from its interactions with the environment, making it an ideal approach for decision-making tasks. DRL extends this concept by allowing agents to process data, transform it into a hidden state, and extract information from it. Accordingly, this paper employs DRL
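
As a hedged illustration of the Double Deep Q Network named in this section's title, the sketch below shows the standard Double DQN target, in which the online network selects the next action and the target network evaluates it. Plain Python lists stand in for the Q-values that a network such as the Transformer Encoder would output; the function names, the discount factor `gamma=0.9`, and the sample numbers are assumptions for illustration only.

```python
def ddqn_target(reward, next_q_online, next_q_target, gamma=0.9, done=False):
    """Double DQN target: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def dqn_target(reward, next_q_target, gamma=0.9, done=False):
    """Vanilla DQN target for comparison: the target network both picks and scores."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


# Hypothetical Q-values for the next state, one entry per jamming action.
next_q_online = [0.2, 0.8, 0.5]
next_q_target = [0.9, 0.4, 0.3]

y_double = ddqn_target(1.0, next_q_online, next_q_target)   # about 1.36: uses target Q of action 1
y_vanilla = dqn_target(1.0, next_q_target)                  # about 1.81: uses the max target Q
```

Decoupling action selection from action evaluation is what reduces the overestimation of Q-values that the vanilla target is prone to, as the gap between the two sample targets suggests.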

Experimental setup

The algorithms in this paper are implemented with the PyTorch library in PyCharm 2023.2.3 on a computer equipped with an Intel Core i7 CPU, 16 GB of RAM, and an NVIDIA GeForce RTX 4060 Ti GPU, and are optimized with the Adam optimizer. The experiments are simulation-based, with the channel noise set to AWGN with mean 0 and variance 1 and an SNR of 20 dB. The formula for SNR is shown in Eq. (11).

$$\mathrm{SNR} = 10 \log_{10} \frac{P_T}{P_N} \tag{11}$$

$P_T$ represents the power of the communication signal, $P_N$ denotes the
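
To make the setting concrete, the following sketch works Eq. (11) through for the stated configuration. It assumes that the second power in the ratio is the noise power and that zero-mean noise with variance 1 has an average power of 1; the function and variable names are hypothetical.

```python
import math


def snr_db(signal_power, noise_power):
    """Eq. (11): SNR in dB from the ratio of the two linear powers."""
    return 10 * math.log10(signal_power / noise_power)


def signal_power_for_snr(snr_db_value, noise_power):
    """Invert Eq. (11) to get the signal power needed for a given SNR."""
    return noise_power * 10 ** (snr_db_value / 10)


noise_power = 1.0                                   # assumed: zero-mean noise, variance 1
p_t = signal_power_for_snr(20.0, noise_power)       # signal power required for 20 dB
check = snr_db(p_t, noise_power)                    # recovers the configured SNR
```

Under these assumptions, an SNR of 20 dB corresponds to a communication signal power 100 times the noise power.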

Discussion of research limitations

This paper proposes a jamming strategy based on Transformer and DRL, which is capable of effectively disrupting a range of communication signals. The experimental results demonstrate that the method proposed in this paper can reduce the number of interactions, accelerate the convergence rate, and enhance the stability of the performance.

Nevertheless, the current experiments are solely based on simulations, which cannot fully replicate the impact of the real environment on the signal.