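Article ID: 1672-6987(2024)01-0146-13; DOI: 10.16351/j.1672-6987.2024.01.020
SU Guichang, ZHANG Ruikun, LIU Xiangpeng* (College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China)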
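Abstract: Scanned images of ionospheric vertical sounding data printed in pin printer font suffer from low resolution, disconnected glyphs, and adhering text lines that cannot be detected. To address these problems, an automatic data extraction technique based on the CRNN deep learning framework is proposed, consisting of four modules: image preprocessing, text detection, sequence text recognition, and layout processing of the recognition results. First, scanned images of pin-printed vertical sounding data with three different line-spacing types are preprocessed by image template matching, noise reduction, and tilt correction. Next, the preprocessed images are segmented for text detection by the projection method; the projection segmentation algorithm incorporates vertical projection, horizontal projection, and correction of candidate detection boxes, which handles adhering text regions effectively and improves detection accuracy. Finally, because the segmented image arrays vary in length and character-level splitting is to be avoided, recognition of the segmented text is recast as a sequence learning problem and performed with the CRNN deep learning algorithm; a coordinate fusion algorithm then saves the recognition results in a standardized Excel format, achieving automatic data extraction and storage. Experimental results show that the proposed algorithm achieves a text detection recall of 97.7% and a text recognition F-measure of 97.49% at the single-character level and 94.78% at the whole-group level. Comparison with other algorithms verifies its effectiveness; the proposed algorithm is therefore highly practical and can meet the needs of real engineering applications.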
Keywords: ionosphere; pin printer font; projection segmentation; text detection; CRNN; text recognition
CLC number: TP 301.6; Document code: A
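The projection-based detection step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything here is an assumption made for illustration: the function names `projection_runs` and `segment_by_projection`, the `min_count`, `row_gap`, and `col_gap` parameters, and the use of a small gap tolerance to keep disconnected dot-matrix strokes in one box. The candidate-box correction described in the abstract is not reproduced.

```python
import numpy as np


def projection_runs(profile, min_count=1, min_gap=1):
    """Return (start, end) runs where profile >= min_count; end is exclusive.

    Runs separated by fewer than min_gap empty positions are merged, so
    that small gaps inside a dot-matrix glyph do not split a text region.
    """
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i                      # a run of ink begins
        elif value < min_count and start is not None:
            runs.append([start, i])        # the run ends before position i
            start = None
    if start is not None:
        runs.append([start, len(profile)])

    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1][1] = run[1]         # gap too small: extend previous run
        else:
            merged.append(run)
    return [tuple(r) for r in merged]


def segment_by_projection(binary, min_count=1, row_gap=1, col_gap=1):
    """Split a binarized page (1 = ink, 0 = background) into text boxes.

    Returns a list of (top, bottom, left, right) boxes with exclusive
    bottom/right bounds, ordered top-to-bottom and then left-to-right.
    """
    boxes = []
    row_profile = binary.sum(axis=1)       # horizontal projection: ink per row
    for top, bottom in projection_runs(row_profile, min_count, row_gap):
        band = binary[top:bottom]
        col_profile = band.sum(axis=0)     # vertical projection inside the line
        for left, right in projection_runs(col_profile, min_count, col_gap):
            boxes.append((top, bottom, left, right))
    return boxes
```

Raising `col_gap` trades finer splitting for robustness to broken strokes; a suitable value would depend on the scan resolution and the column spacing of the printout.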
Citation: SU Guichang, ZHANG Ruikun, LIU Xiangpeng. Automatic extraction technology of ionospheric vertical data with pin printer font[J]. Journal of Qingdao University of Science and Technology (Natural Science Edition), 2024, 45(1): 146-158.
Automatic Extraction Technology of Ionospheric Vertical Data with Pin Printer Font
SU Guichang, ZHANG Ruikun, LIU Xiangpeng
(College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China)
Abstract: Scanned images of ionospheric vertical sounding data printed in pin printer font suffer from low resolution, disconnected glyphs, and adhering text lines that cannot be detected. To address these problems, an automatic data extraction technique based on the CRNN deep learning framework is proposed. It comprises four modules: image preprocessing, text detection, sequence text recognition, and layout processing of the results. First, image template matching, noise reduction, and tilt correction are used to preprocess scanned images of pin-printed vertical sounding data with three different line-spacing types. Then, text detection and segmentation are performed on the preprocessed images by the projection method. The projection segmentation algorithm adds vertical projection, horizontal projection, and correction of candidate detection boxes, which handles adhering text regions effectively and improves detection accuracy. Finally, because the segmented image arrays vary in length and splitting individual characters is to be avoided, recognition of the segmented text is recast as a sequence learning problem and performed with the CRNN deep learning algorithm, composed of CNN+RNN+CTC. A coordinate fusion algorithm then saves the recognition results in a standardized Excel format, achieving automatic data extraction and storage. The experimental results show that the proposed algorithm achieves a text detection recall of 97.7% and a text recognition F-measure of 97.49% at the single-character level and 94.78% at the whole-group level. Comparison with other algorithms verifies its effectiveness. The proposed algorithm is therefore highly practical and can meet the needs of real engineering applications.
Key words: ionosphere; pin printer font; projection segmentation; text detection; CRNN; text recognition
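The abstract names CTC as the last stage of the CRNN recognizer, which is what allows a whole text region to be read as a sequence without cutting it into characters. As a rough illustration only, the sketch below shows greedy (best-path) CTC decoding of per-frame class scores. The function name `ctc_greedy_decode`, the digit alphabet, and the convention that class 0 is the blank symbol are assumptions made for this example; the paper's actual network, alphabet, and decoding strategy are not given in the abstract.

```python
import numpy as np

BLANK = 0  # assumed convention: class index 0 is the CTC blank symbol


def ctc_greedy_decode(scores, alphabet):
    """Decode a (T, C) array of per-frame class scores into a string.

    Class k (for k >= 1) maps to alphabet[k - 1]. The best class is taken
    at every frame, consecutive repeats are collapsed, and blanks are
    dropped, so a repeated character survives only if a blank separates
    its two occurrences.
    """
    best_path = np.argmax(scores, axis=1)  # most likely class per frame
    chars = []
    previous = BLANK
    for k in best_path:
        k = int(k)
        if k != BLANK and k != previous:
            chars.append(alphabet[k - 1])
        previous = k
    return "".join(chars)
```

In a full CRNN the `scores` array would come from the recurrent layers applied to convolutional feature columns; here it is simply an input, so the decoding rule can be checked in isolation.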
Received: 2023-08-31
Funding: National Natural Science Foundation of China (62103215, 12001308).
Biography: SU Guichang (1999-), male, master's student. *Corresponding author.