As cloud storage becomes cheaper and cloud processing more powerful, the cloud has become the natural choice for storing and analyzing the data collected by enterprises. The innate characteristics of cloud computing, namely inexpensive storage and high-performance computing, make it better suited to serving the industry than traditional technologies.
Low Client and Server Configuration Requirements and Low Cost
A Hadoop cluster is relatively cheap, for two main reasons. First, the required software is open source, which reduces cost; the Apache Hadoop distribution can be downloaded free of charge. Second, a Hadoop cluster controls cost by running on commodity hardware, so there is no need to buy specialized server hardware to build a powerful Hadoop cluster [9]. One of the core ideas of cloud computing is to reduce the processing load on the user terminal by continually improving the processing capacity of the “cloud”. Clients only handle input and output, while all other functions such as computing, storage and processing are managed by the “cloud”; users simply order the relevant services of the “cloud” according to their own needs. In addition, the storage equipment of the “cloud” can consist of cheap PCs, even old computers. Compared with a single large professional storage device, the “cloud” offers greater storage capacity at lower cost, and can be dynamically upgraded and extended on demand.
Cloud Computing Offers Massive Computing and Storage Capacity with Great Extensibility
As with any other type of data, an important problem facing big data analysis is the ever-increasing volume of data, while the biggest advantage of big data is the ability to analyze and process it in real time or near real time. The parallel processing capability of a Hadoop cluster can significantly improve analysis speed, but as the amount of data to be analyzed grows, the cluster’s capacity may become a bottleneck. Fortunately, the cluster can be extended effectively simply by adding nodes. Cloud computing gathers the memory, hard drives and CPUs of all nodes into one giant virtual pool of cooperating resources that jointly provides storage and computing services to the outside. As nodes are added, storage and computing capacity can grow almost without limit.
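As a rough illustration of this elasticity, the sketch below (assuming a client with the standard Hadoop configuration files on its classpath) queries the aggregate capacity that HDFS reports; re-running it after new nodes join the cluster shows the reported capacity growing, with no change to client code.

```java
// Minimal sketch: ask HDFS for the cluster's aggregate capacity.
// Assumes core-site.xml / hdfs-site.xml are available on the classpath.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;

public class ClusterCapacityCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FsStatus status = fs.getStatus();  // totals across all DataNodes
        System.out.printf("capacity: %d bytes, used: %d bytes, remaining: %d bytes%n",
                status.getCapacity(), status.getUsed(), status.getRemaining());
        fs.close();
    }
}
```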
Cloud Computing Provides Highly Reliable and Secure Storage
Data collected in all parts of the industry, such as production and sales, are stored on multiple service nodes of the cloud in multiple copies. Data stored in the cloud therefore remain available even if a copy is accidentally deleted, and there is no need to fear data loss caused by virus infection or hardware damage.
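The replication behind this reliability can be made explicit when writing data. The following sketch, in which the path and the replication factor of three are only illustrative, uses the HDFS FileSystem API to create a file whose blocks are kept in several copies.

```java
// Hedged sketch: write a file to HDFS with an explicit replication factor,
// so each block is kept on several DataNodes. Path and contents are examples.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicatedWrite {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/industry/sales/2024-06-01.csv"); // example path
        // Keep three copies of every block; losing one node does not lose data.
        try (FSDataOutputStream out = fs.create(file, (short) 3)) {
            out.writeBytes("order_id,item,quantity\n1001,valve,250\n");
        }
        fs.close();
    }
}
```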
Industry Cloud Storage Model Construction Based on Hadoop Technology
Hadoop is a tool that expands data storage using standard hardware and distributes data across many low-cost computers. Once the data have been distributed, the difficulties of locating and processing them are addressed by MapReduce. MapReduce provides a framework in which data in a cluster are processed in parallel across many nodes: processing is mapped to wherever the data reside, and similar data elements are then reduced to a single result.
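As a concrete illustration of this map-and-reduce flow, the sketch below aggregates records in a hypothetical line format “sensorId,timestamp,value” using the standard Hadoop MapReduce API; the class and field names are ours and serve only to show how processing is mapped to the data and similar elements are reduced to a single result.

```java
// Minimal MapReduce sketch: find the maximum reading per sensor.
// Input format "sensorId,timestamp,value" is an assumption for illustration.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxReadingJob {

    // Map: each input split is parsed on the node that holds it;
    // the mapper emits (sensorId, value) pairs.
    public static class ReadingMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");
            if (fields.length == 3) {
                context.write(new Text(fields[0]),
                              new DoubleWritable(Double.parseDouble(fields[2])));
            }
        }
    }

    // Reduce: values sharing a sensorId are cut down to a single result (the max).
    public static class MaxReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text sensorId, Iterable<DoubleWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            double max = Double.NEGATIVE_INFINITY;
            for (DoubleWritable v : values) {
                max = Math.max(max, v.get());
            }
            context.write(sensorId, new DoubleWritable(max));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "max sensor reading");
        job.setJarByClass(MaxReadingJob.class);
        job.setMapperClass(ReadingMapper.class);
        job.setReducerClass(MaxReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```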
To address the challenges currently facing the construction of big data storage systems, and building on the advantages of Hadoop technology, we propose an industrial cloud storage model based on the Hadoop platform, which can effectively solve problems such as limited processing capacity, limited storage capacity and single points of failure. In this cloud computing architecture, data collected by sensing equipment, bar codes, two-dimensional codes, RFID and so on are provided in the cloud in the form of SaaS (software-as-a-service); terminals are responsible for collecting data and sending them to cloud applications, which present the massive raw data as well as statistical results from all parts of industrial production. These data are generally widely distributed and unstructured, but Hadoop is well suited to this kind of data because it works by splitting the data into slices, each of which is analyzed by assigning it to a particular cluster node. The data distribution does not have to be uniform, since each shard is processed separately on an independent cluster node.
This arrangement not only provides platform independence, but also reduces the load on a central server and the possibility of problems caused by a single point of failure. Meanwhile, the MapReduce model also avoids single points of failure when computing over large volumes of high-dimensional data. The framework, shown in Fig. 1, is divided from top to bottom into a front end and a back end.
The Front End
The front end consists of sensors, bar codes, QR codes, RFID and other mobile devices; it collects data and presents the optimized information. Signals collected from all parts of industrial production are uploaded through this application to the back end for further processing, and are then presented in an appropriate form. This information provides a basis for decision making, bringing deep knowledge of the industry and competitive advantages.
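As one purely illustrative realization of this upload path, a front-end device could push a reading to a back-end collection endpoint over HTTP; the URL and JSON field names below are hypothetical and not part of the model described here.

```java
// Hedged sketch: a front-end client posting one sensor reading to a
// hypothetical back-end endpoint. URL and JSON fields are illustrative only.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SensorUploader {
    public static void main(String[] args) throws Exception {
        String json = "{\"sensorId\":\"line3-temp\",\"value\":71.5,\"ts\":1718000000}";
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://backend.example.com/api/readings").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}
```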
The Back End
The back end is the core of this model and mainly comprises three modules: the WEB server, the MapReduce algorithm and HDFS cloud storage. The WEB server is responsible for communication between the Hadoop cluster and the WEB interface, receiving sensor signals from the front end and displaying all kinds of optimized data.
A Hadoop cluster with multiple nodes is responsible for the parallel processing tasks. Each node holds a copy of the MapReduce Java program, and MapReduce carries out the large-scale parallel computation over the industrial big data according to a given algorithm. As a distributed file system, HDFS stores the big data generated in all parts of production and provides high-throughput access to application data, ensuring seamless transfer of data among servers and improving the reliability of the whole system.
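For illustration, the back end might read aggregated results back out of HDFS for display through the WEB server, along the lines of the following sketch; the path shown is only an example of a typical reducer output file.

```java
// Minimal sketch: read an aggregated result file out of HDFS for display.
// The path is an example of a typical reducer output file name.
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsResultReader {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path results = new Path("/industry/output/part-r-00000"); // example path
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(results)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // hand each record to the WEB layer
            }
        }
        fs.close();
    }
}
```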