Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy.
The buzzword “big data” encompasses a range of technologies and techniques that allow you to extract real, useful, and previously hidden information from the often very large quantities of data that may otherwise have been left dormant and, ultimately, thrown away because storage was too costly.
The term “big data” is also being used to describe an increasing range of technologies and techniques. In essence, big data is data that is valuable but, traditionally, it was not practical to store or analyze it due to limitations of cost or the absence of suitable mechanisms.
Big data typically refers to collections of datasets that, due to size and complexity, are difficult to store, query, and manage using existing data management tools or data processing applications.
Big data solutions aim to provide data storage and querying functionality for situations such as this. They offer a mechanism for organizations to extract meaningful, useful, and often vital information from the vast stores of data they are collecting.
In most big data circles, these defining characteristics are called the four V’s: volume, variety, velocity, and veracity (some add a fifth V, value). Many vendors describe their products as solutions to the three- or four-V’s problem:
The Microsoft MSDN website uses the following definitions:
Volume. Big data solutions typically store and query hundreds of terabytes of data, and the total volume is probably growing by roughly ten times every five years. Storage must be able to manage this volume, be easily expandable, and work efficiently across distributed systems. Processing systems must be scalable to handle increasing volumes of data, typically by scaling out across multiple machines.
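Scaling out can be pictured as splitting the data across machines and merging partial results. The sketch below is a minimal, single-process illustration of hash partitioning; the record layout and function names are hypothetical, not part of any particular product.

```python
from collections import defaultdict

def partition(records, num_nodes):
    """Hash-partition records so each (hypothetical) node owns one shard."""
    shards = defaultdict(list)
    for record in records:
        shards[hash(record["key"]) % num_nodes].append(record)
    return shards

def process_shard(shard):
    """Each node processes only its own shard; partial results merge cheaply."""
    return sum(r["value"] for r in shard)

records = [{"key": f"user{i}", "value": i} for i in range(1000)]
shards = partition(records, num_nodes=4)
total = sum(process_shard(s) for s in shards.values())
# total matches the sum a single machine would compute over all records
```

Adding more nodes only changes `num_nodes`; neither the per-shard logic nor the merge step needs to change, which is what makes scale-out attractive for growing volumes.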
Variety. It is not uncommon for new data to match no existing schema, and it may be semi-structured or unstructured. This means that applying a schema to the data before or during storage is no longer a practical proposition.
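This is often called schema-on-read: the raw records are stored as-is and interpreted only at query time. A minimal sketch, with an invented feed of mixed-shape JSON records:

```python
import json

# Hypothetical feed in which records do not share a fixed schema.
raw_lines = [
    '{"user": "alice", "clicks": 3}',
    '{"user": "bob", "clicks": 5, "referrer": "search"}',
    '{"device": "sensor-7", "temp_c": 21.4}',
]

# Schema-on-read: no schema is enforced at ingest; each query decides
# which fields it cares about and ignores records that lack them.
def total_clicks(lines):
    return sum(json.loads(line).get("clicks", 0) for line in lines)

print(total_clicks(raw_lines))  # 8
```

A later query could read the same stored lines for `temp_c` instead, without any migration of the data already written.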
Velocity. Data is being collected at an increasing rate from many new types of devices, from a fast-growing number of users, and from an increasing number of applications per user. Storage must be designed and implemented to manage this efficiently, and processing systems must be able to return results within an acceptable timeframe.
The Oracle website uses the definitions below:
Big data has also been defined by the four V’s:
Volume. The amount of data. While volume indicates more data, it is the granular nature of the data that is unique. Big data means processing high volumes of low-density, unstructured data, that is, data of unknown value, such as Twitter data feeds, clickstreams on a web page or mobile app, network traffic, readings from sensor-enabled equipment, and much more. The task of big data is to convert such raw data into valuable information. For some organizations this might be tens of terabytes; for others, hundreds of petabytes.
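Converting low-density data into information usually means aggregation: individual events are nearly worthless, but their totals are not. A toy sketch over an invented clickstream:

```python
from collections import Counter

# Hypothetical low-density clickstream: each event says little on its own.
clickstream = [
    {"user": "u1", "page": "/home"},
    {"user": "u2", "page": "/pricing"},
    {"user": "u1", "page": "/pricing"},
    {"user": "u3", "page": "/home"},
    {"user": "u2", "page": "/checkout"},
]

# Distil raw events into information: which pages attract the most visits?
page_counts = Counter(event["page"] for event in clickstream)
print(page_counts.most_common(2))  # [('/home', 2), ('/pricing', 2)]
```

At real scale the same counting would be distributed across machines, but the principle — value emerges only from the aggregate — is unchanged.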
Velocity. The fast rate at which data is received and, perhaps, acted upon. The highest-velocity data normally streams directly into memory rather than being written to disk. Some Internet of Things (IoT) applications have health and safety ramifications that require real-time evaluation and action. Other internet-enabled smart products operate in real time or near real time; for example, consumer e-commerce applications seek to combine mobile device location with personal preferences to make time-sensitive marketing offers. Operationally, mobile applications have large user populations, increased network traffic, and an expectation of immediate response.
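Evaluating data in memory as it arrives, rather than after it lands on disk, is often done with a sliding window. A minimal sketch, with an invented sensor feed and threshold:

```python
from collections import deque
import statistics

class SlidingWindow:
    """Keep only the most recent N readings in memory; evaluate on arrival."""
    def __init__(self, size, threshold):
        self.window = deque(maxlen=size)  # old readings fall off automatically
        self.threshold = threshold

    def ingest(self, reading):
        self.window.append(reading)
        # React in (near) real time instead of waiting for a batch job.
        if statistics.mean(self.window) > self.threshold:
            return "ALERT"
        return "OK"

monitor = SlidingWindow(size=3, threshold=50.0)
for value in [40, 45, 48, 70, 90]:
    status = monitor.ingest(value)
# The last two readings push the 3-reading average over the threshold.
```

Only the window (here, three readings) ever sits in memory, so the approach keeps working no matter how long the stream runs.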
Variety. New unstructured data types. Unstructured and semi-structured data types, such as text, audio, and video, require additional processing both to derive meaning and to build the supporting metadata. Once understood, unstructured data has many of the same requirements as structured data, such as summarization, lineage, auditability, and privacy. Further complexity arises when data from a known source changes without notice; frequent or real-time schema changes are an enormous burden for both transactional and analytical environments.
Value. Data has intrinsic value, but it must be discovered. A range of quantitative and investigative techniques can derive value from data, from discovering a consumer preference or sentiment, to making a relevant offer by location, to identifying a piece of equipment that is about to fail. The technological breakthrough is that the cost of data storage and compute has decreased exponentially, making it practical to run statistical analysis on an entire data set rather than on a sample, which in turn enables more accurate and precise decisions. However, finding value also requires new discovery processes involving clever and insightful analysts, business users, and executives. The real big data challenge is a human one: learning to ask the right questions, recognize patterns, make informed assumptions, and predict behavior.
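The shift from sampling to full-population analysis can be illustrated in a few lines. The data below is synthetic (drawn from a normal distribution for the sake of the sketch), not from any real workload:

```python
import random
import statistics

random.seed(42)
# Synthetic stand-in for a full population of records that cheap storage
# now lets us keep in its entirety (true mean 100, std dev 20).
population = [random.gauss(100, 20) for _ in range(100_000)]

# Full-data analysis versus the traditional small sample of it.
full_mean = statistics.mean(population)
sample_mean = statistics.mean(random.sample(population, 100))

# The full-data estimate carries no sampling error; the sample estimate does.
print(abs(full_mean - sample_mean))
```

With the whole data set available, the estimate's uncertainty comes only from the data itself, not from which slice of it happened to be sampled.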
Whatever big data challenges your organization faces, we can provide the strategic guidance you need to succeed.