What is Big Data?

Many people believe Big Data is simply a large amount of data, but it is defined by more than just size.
Gartner defines Big Data as follows: "Big Data are high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."

The Gartner definition describes Big Data in terms of the three Vs:

  •  Volume: Size of data (how big it is)
  •  Velocity: How fast data is being generated
  •  Variety: The range of data types, including differences in source, format, and structure

In terms of the three Vs, the Gartner definition effectively says that:
"There is a lot of data, it is coming into the system rapidly, and it comes from many different sources in many different formats."

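To make the three Vs concrete, here is a minimal Python sketch that computes a rough figure for each one over a small batch of records. The `Record` fields, the `three_vs` function, and the sample data are all hypothetical and invented for illustration; they are not part of the Gartner definition, and real measurements would depend on the system in question.

```python
from dataclasses import dataclass


@dataclass
class Record:
    source: str        # hypothetical label for where the record came from
    fmt: str           # hypothetical format label, e.g. "csv", "json", "text"
    payload: bytes     # raw content of the record
    arrived_at: float  # arrival time in seconds since the start of the batch


def three_vs(records):
    """Return rough volume, velocity and variety figures for a batch."""
    # Volume: total size of the data, here in bytes.
    volume_bytes = sum(len(r.payload) for r in records)

    # Velocity: records per second over the observed time span.
    times = [r.arrived_at for r in records]
    span = max(times) - min(times) if times else 0.0
    velocity = len(records) / span if span > 0 else None

    # Variety: number of distinct (source, format) combinations.
    variety = len({(r.source, r.fmt) for r in records})

    return {
        "volume_bytes": volume_bytes,
        "records_per_second": velocity,
        "variety": variety,
    }


# Made-up sample batch: two log lines, one database row, one app event.
batch = [
    Record("web_logs", "text", b"GET /index.html 200", 0.0),
    Record("orders_db", "csv", b"1001,widget,3", 1.0),
    Record("mobile_app", "json", b'{"event": "tap"}', 2.0),
    Record("web_logs", "text", b"GET /about.html 404", 4.0),
]

summary = three_vs(batch)
print(summary)
```

In this toy batch the numbers are tiny; the definition applies when all three grow large enough that traditional processing no longer keeps up.
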
IT companies are investing billions of dollars into research and development for Big Data, Business Intelligence (BI), data mining, and analytic processing technologies. This fact underscores
the importance of accessing and making sense of Big Data in a fast, agile manner.
Those who can harness Big Data will have the edge in critical decision making. Companies that use advanced analytics platforms to gain real value from Big Data are positioned to grow faster than their competitors and seize new opportunities.

To support Big Data, modern analytic processing tools must:

  • Shift away from traditional, rearward-looking BI tools and platforms to more forward-thinking analytic platforms.
  • Support a data environment that is less focused on integrating with only traditional, corporate data warehouses and more focused on easy integration with external sources.
  • Support a mix of structured, semi-structured, and unstructured data without complex, time-consuming IT engineering efforts.
  • Process data quickly and efficiently to return answers before the business opportunity is lost.
  • Present the business user with an interface that doesn't require extensive IT knowledge to operate.
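
As an illustration of the requirement to handle structured, semi-structured, and unstructured data together, the following Python sketch loads one small sample of each kind into a single list of records using only the standard library. The function names, the `kind` field, and the sample inputs are assumptions made up for this example; an actual analytics platform would do this at far larger scale and with its own tooling.

```python
import csv
import io
import json


def from_csv(text):
    """Structured input: each CSV row becomes a dict keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row, kind="structured") for row in reader]


def from_json_lines(text):
    """Semi-structured input: one JSON object per line, fields may vary."""
    records = []
    for line in text.splitlines():
        if line.strip():
            record = json.loads(line)
            record["kind"] = "semi-structured"
            records.append(record)
    return records


def from_free_text(text):
    """Unstructured input: keep each non-empty line as raw text."""
    return [
        {"text": line.strip(), "kind": "unstructured"}
        for line in text.splitlines()
        if line.strip()
    ]


# Made-up sample inputs, one per kind of data.
orders_csv = "order_id,item\n1001,widget\n1002,gadget\n"
events_jsonl = '{"event": "tap", "screen": "home"}\n{"event": "scroll"}\n'
feedback_text = "Checkout was slow today.\n"

combined = (
    from_csv(orders_csv)
    + from_json_lines(events_jsonl)
    + from_free_text(feedback_text)
)
kinds = sorted({record["kind"] for record in combined})

for record in combined:
    print(record)
```

Each record keeps whatever fields its source provided, so downstream analysis can filter on `kind` rather than forcing every source into one rigid schema up front.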
