Research on fast storage methods for large-scale unstructured data resources
Unstructured data resources have high research value. As information technology and internet applications expand, the volume of unstructured data resources grows, which poses a considerable challenge for storage technology. This paper therefore proposes a fast storage method for large-scale unstructured data resources. A hierarchical clustering algorithm first partitions the unstructured data resources into groups. Taking one group as the object, the method combines factors such as transmission distance, node energy, and transmission direction to determine the forwarding path of the data, describes the storage process, and formulates a hierarchically extensible storage mechanism, thereby achieving fast storage of large-scale unstructured data resources. Experimental data show that, across the different experimental conditions tested, the proposed method reaches a maximum storage rate of 1 920 MB/s and a maximum storage-location accuracy of 98%, which supports the application performance of the proposed method.
data resources / unstructured / secure storage / storage mechanism / fast storage
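
The abstract names two steps without giving formulas: grouping the data by hierarchical clustering, and choosing a forwarding path from transmission distance, node energy, and transmission direction. The sketch below illustrates one way such steps could look; it is not the authors' implementation. The average-linkage rule, the feature vectors, the weighted-sum score, the weights, and the names `agglomerative_groups` and `pick_next_hop` are all assumptions made for illustration.

```python
import math
from itertools import combinations


def agglomerative_groups(points, k):
    """Bottom-up hierarchical clustering (average linkage, an assumed choice).

    points: list of numeric feature tuples describing data items (hypothetical).
    Returns k clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # combinations() yields index pairs with a < b, so deleting b is safe.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def pick_next_hop(current, target, neighbors, w_dist=0.4, w_energy=0.3, w_dir=0.3):
    """Choose a forwarding node by a weighted score (weights are illustrative).

    neighbors: list of dicts with "name", "pos" (x, y) and "energy" in [0, 1].
    """
    def score(n):
        to_target = (target[0] - current[0], target[1] - current[1])
        to_node = (n["pos"][0] - current[0], n["pos"][1] - current[1])
        norm = math.hypot(*to_target) * math.hypot(*to_node)
        # Cosine of the angle between "towards target" and "towards neighbor".
        direction = (
            (to_target[0] * to_node[0] + to_target[1] * to_node[1]) / norm
            if norm else 0.0
        )
        # Neighbors closer to the target get a higher closeness term.
        closeness = 1.0 / (1.0 + math.dist(n["pos"], target))
        return w_dist * closeness + w_energy * n["energy"] + w_dir * direction

    return max(neighbors, key=score)


groups = agglomerative_groups([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
hop = pick_next_hop(
    current=(0, 0),
    target=(10, 0),
    neighbors=[
        {"name": "A", "pos": (5, 0), "energy": 0.9},
        {"name": "B", "pos": (-5, 0), "energy": 1.0},
        {"name": "C", "pos": (5, 5), "energy": 0.2},
    ],
)
print(groups, hop["name"])
```

In this toy run the two nearby pairs of points end up in separate groups, and node A is chosen because it lies in the direction of the target with high residual energy. Node B has more energy but points away from the target, so the direction term penalizes it.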