Ball screw pairs are core components of mechanical transmission systems, and diagnosing their faults is essential for ensuring equipment reliability. Two main problems currently limit such diagnosis: nonlinear features are difficult to extract from vibration signals, and multimodal information is insufficiently fused. To address these problems, this paper proposes a vibration signal image encoding method based on the Gramian Angular Difference Field (GADF) and combines it with a Convolutional Neural Network-Gated Recurrent Unit-Attention Mechanism (CNN-GRU-Attention) model for fault classification. Specifically, GADF first converts the one-dimensional vibration signal into a two-dimensional angular image, which retains the temporal features of the signal while enhancing its spatial expressiveness. The CNN then extracts spatial features from the image, the GRU captures temporal dependencies, and the attention mechanism optimizes the weight distribution over key features. Experimental results show that the model reaches an average diagnostic accuracy of 100% on the ball screw pair fault dataset, significantly outperforming traditional machine learning models and single neural network models, which verifies the effectiveness and robustness of the method.
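
The GADF encoding step can be illustrated with a minimal sketch that follows the standard definition: the signal is rescaled to [-1, 1], each sample is mapped to an angle phi = arccos(x), and pixel (i, j) of the image is sin(phi_i - phi_j). The function name `gadf`, the min-max rescaling, and the synthetic sine segment are assumptions made for illustration only; they are not taken from the paper's implementation or dataset.

```python
import math


def gadf(signal):
    """Encode a 1-D signal as a Gramian Angular Difference Field (list of rows)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("signal must not be constant")
    # Min-max rescale to [-1, 1] so that arccos is defined for every sample.
    x = [(2.0 * v - hi - lo) / (hi - lo) for v in signal]
    # Clamp to guard against floating-point overshoot at the interval ends.
    x = [max(-1.0, min(1.0, v)) for v in x]
    # Polar encoding: each sample becomes an angle in [0, pi].
    phi = [math.acos(v) for v in x]
    # Pixel (i, j) is the sine of the angular difference between samples i and j.
    return [[math.sin(a - b) for b in phi] for a in phi]


# Illustrative input: a short synthetic sine segment, not data from the paper.
segment = [math.sin(2 * math.pi * k / 16) for k in range(64)]
image = gadf(segment)
```

A segment of length n yields an n x n image whose diagonal is zero and which is antisymmetric about that diagonal. The segment length therefore sets the input size seen by the CNN, so long signals would normally be windowed or downsampled before encoding.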