Distributed Processing of Big Data for Military Applications

Vojislav Stojkovic, Morgan State University

Distributed computing (Fig 1) is a technique that divides a problem into small subproblems, each of which is solved by one or more processors. Independent processors are brought together into a cluster to solve the problem, and the data are processed in a distributed way across the cluster. The nodes communicate with each other over the network by message passing, and each node has its own memory. Machine failure is one of the main problems in distributed computing because clusters are most often built from inexpensive processors and memory, which may fail at any time. Distributed computing becomes fault tolerant when data are replicated across nodes, so that a copy survives the failure of any single machine.
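The divide, solve, and combine pattern described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: threads in a pool stand in for cluster nodes, a private copy of each chunk stands in for a node's own memory, and the returned partial sums stand in for messages. The class name `PartitionedSum` and the method `distributedSum` are hypothetical and do not come from the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedSum {
    // Splits the input into chunks, sums each chunk in its own task,
    // and combines the partial sums. Threads stand in for cluster nodes.
    static long distributedSum(int[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int chunk = (data.length + workers - 1) / workers; // ceiling division
            for (int start = 0; start < data.length; start += chunk) {
                // Each task works on a private copy, mimicking a node's own memory.
                int[] local = Arrays.copyOfRange(
                        data, start, Math.min(start + chunk, data.length));
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int v : local) sum += v;
                    return sum;
                };
                partials.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // wait for and collect each partial result
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(distributedSum(data, 4));
    }
}
```

A real cluster would also have to handle the machine failures mentioned above, for example by re-running a failed task on a replica of its chunk. The sketch leaves that out and simply reports the failure.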

Big Data is an extremely large, complex, and exponentially growing data set that is difficult to process and analyze using traditional data processing software tools and techniques.

Military data grow daily, and military databases grow larger and larger as a result. Military big data include data sets such as:

  • images taken from satellites, drones, surveillance cameras, radar, etc.
  • data recorded by sensors
  • positions of mobile phones
  • e-mail messages
  • equipment and spare-part records for ships, airplanes, etc.

Data Science is the extraction of knowledge from data. It employs techniques and theories drawn from many fields within the broad areas of computer science, information technology, mathematics, and statistics, including computer programming, high-performance computing, machine learning, visualization, pattern recognition and learning, data mining, data engineering, data warehousing, data compression, predictive analytics, uncertainty modeling, probability models, statistical learning, signal processing, etc.

Data Science applied to Big Data may help the military to process data in a rational, faster, and more flexible way.

Data analytics may help military experts:

  • to create a common operating picture (COP), a single unified image
  • to analyze and correctly interpret the COP

Processing large databases with serial algorithms and operations is a daunting task. With distributed technology such as MapReduce, speed, time, and memory use are optimized, with a remarkable increase in floating-point operations per second (FLOPS).

MapReduce is a programming model for processing big data. It consists of the Map function, which breaks the data into processable pieces, and the Reduce function, which aggregates the resulting intermediate results.
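To make the two functions concrete, here is a minimal single-machine sketch in plain Java that counts words in a few text records. It does not use the Hadoop API; the class `WordCountSketch`, its method names, and the sample input are assumptions chosen for illustration. The grouping step between Map and Reduce (often called the shuffle) is done here with an in-memory map.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // Map: break one input record into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce: aggregate all values that were emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Driver: map every record, group the pairs by key, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample records.
        List<String> lines = Arrays.asList("radar sensor radar", "sensor radar");
        System.out.println(run(lines));
    }
}
```

Because each call to `map` depends on only one record and each call to `reduce` on only one key, both steps can be spread across the nodes of a cluster without changing the logic.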

Hadoop (Fig 2) is a reliable and scalable tool for distributed computing, data storage, and processing. It is open source software for writing and executing applications in a cluster. Hadoop is an implementation of the MapReduce model and is written in Java. It was created by Doug Cutting and Mike Cafarella and grew out of work on the Nutch project that began around 2004.

The goal of this research is to help the US Army process big data in a rational, faster, and more flexible way.

Sub goals:

1. Build, configure, test, and use a fully-distributed Hadoop cluster of many nodes

2. Develop distributed MapReduce (Fig 3) algorithms for fast database processing: sort, search, append, split, cut, copy, generate, insert, etc.

3. Implement the distributed MapReduce database processing algorithms in Java on Hadoop

4. Apply the distributed MapReduce database processing algorithms implemented in Java on Hadoop to solving military problems
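As one illustration of what a distributed sort could look like, the sketch below uses range partitioning: the map step routes each key to the partition that owns its value range, each partition is sorted independently (the part that could run on separate nodes), and the partitions are concatenated in order. This is a common textbook approach and not necessarily the algorithm developed in this project. It is plain single-machine Java, and the class `RangePartitionSort` and its evenly spaced partition boundaries are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RangePartitionSort {
    // Map step: choose the partition that owns this key's value range.
    // Assumes min <= key <= max; boundaries are evenly spaced over [min, max].
    static int partitionOf(int key, int min, int max, int partitions) {
        long span = (long) max - min + 1;
        return (int) (((long) key - min) * partitions / span);
    }

    static List<Integer> sort(List<Integer> keys, int partitions) {
        if (keys.isEmpty()) return new ArrayList<>();
        int min = Collections.min(keys);
        int max = Collections.max(keys);

        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < partitions; i++) buckets.add(new ArrayList<>());
        for (int key : keys) {
            buckets.get(partitionOf(key, min, max, partitions)).add(key);
        }

        // Reduce step: sort each partition on its own, then concatenate.
        // Partition i holds only keys smaller than those in partition i + 1,
        // so the concatenation is globally sorted.
        List<Integer> sorted = new ArrayList<>();
        for (List<Integer> bucket : buckets) {
            Collections.sort(bucket);
            sorted.addAll(bucket);
        }
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical sample keys.
        System.out.println(sort(Arrays.asList(42, 7, 19, 3, 88, 61, 7), 3));
    }
}
```

Evenly spaced boundaries only balance the load when keys are roughly uniform. For skewed data, a practical implementation would more likely pick boundaries by sampling the keys first.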

Expected Results:

MapReduce is

  • a highly flexible, cost-effective technology that can be used easily at any level of computing infrastructure
  • a very efficient and effective high-performance infrastructure with automated data handling and fault tolerance.
