Data is being gathered everywhere on the web nowadays. For example, when you submit your personal information on a website while buying any product online, that website has already collected data from you in the form of your name, email, phone number and address.
If that website is so popular that it sells a product almost every minute, or even every second, then the data (customers' personal information) it collects is high in volume, velocity and variety. Such data is often termed Big Data.
Other examples are railway and airline booking systems, where tickets are booked online almost every second. As a result, these systems collect data digitally at a very fast pace and in enormous quantities; that is what makes 'Big Data' different from normal data. The major difference is usually summed up as the Five Vs: volume, velocity, variety, veracity and value.
Normal data is collected at a very slow pace over a long period of time, and so is easy to manage in different formats like spreadsheets, MySQL databases, etc. This is not usually the case with Big Data, though, as it is often terabytes in size and so difficult to handle and process using traditional applications/tools.
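Hadoop is a widely used open-source framework for storing and processing Big Data; it is not a traditional database management system. A vast amount of raw data is stored in HDFS (the Hadoop Distributed File System), a core component of Hadoop, while aggregated/summarised data is often sent to a relational database such as MySQL for analysis.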
But does your business really need Hadoop for data analytics?
Many businesses don't really 'need' Hadoop unless they are actually dealing with Big Data. If your inflow of data is slow then a MySQL database can easily do the job. When you buy a web hosting package for your business website, whether shared or a dedicated server, a database is usually included - you can typically access it using phpMyAdmin in the control panel of your web hosting.
You can hire a programmer who works in PHP (a programming language) to develop scripts that store your data in a MySQL database and then perform data analytics on it, as per your company's requirements. Data analytics simply means examining the data, for example by filtering, sorting and summarising it, to answer questions that benefit the business.
Suppose you are running an e-commerce website to sell your products online. Let us assume that you receive four to five orders every day, on average. Since you are receiving a very low number of daily orders, your inflow of data, in the form of customers' information, will also be at a slow pace, thus not requiring Hadoop; a simple MySQL database will do the job.
Now, if you have been collecting data at this pace for the past two years and you need to perform analytics on it, then it will still be data analytics and not Big Data analytics, as MySQL queries can comfortably handle such a small amount of data.
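To put a rough number on this, here is a back-of-the-envelope sketch in Python. The figure of five orders a day comes from the example above; the 500 bytes per order row is purely an assumption for illustration, and the function names are hypothetical.

```python
def estimated_rows(orders_per_day, years):
    """Approximate number of order rows collected, ignoring leap days."""
    return orders_per_day * 365 * years


def estimated_size_mb(rows, bytes_per_row):
    """Approximate raw size in megabytes (1 MB taken as 1,000,000 bytes)."""
    return rows * bytes_per_row / 1_000_000


# Five orders a day for two years, as in the example above.
rows = estimated_rows(orders_per_day=5, years=2)

# ASSUMPTION: about 500 bytes per row (name, email, phone, address, product).
size_mb = estimated_size_mb(rows, bytes_per_row=500)

print(f"{rows} rows, roughly {size_mb:.1f} MB")
```

Under these assumptions the whole two-year history is a few thousand rows and a couple of megabytes, which is many orders of magnitude below the terabyte scale at which Big Data tools start to matter.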
Hadoop becomes necessary only when MySQL queries can no longer cope with the sheer amount of data collected over the years (say five to 10 years), or when data arrives at a very fast pace. In that case, you need to switch to Big Data analytics.
Suppose the query is to find the product that receives the maximum number of orders from a particular city. If you need to run this query on terabytes of data, MySQL may not be able to complete it in a reasonable time; in that case you will need the help of a system such as Hadoop, which is built for Big Data analytics.
If your business is just starting out, you should first consider a MySQL database for your data analytics needs; as you grow, and once MySQL can no longer handle your queries or your inflow of data, you can decide to switch to a Big Data system.
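For a small dataset, that kind of query is a single SQL statement. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the `orders` table, its columns and the sample rows are all hypothetical. The SELECT statement itself is plain SQL that should also work on MySQL, although MySQL client libraries typically use a different parameter placeholder than `?`.

```python
import sqlite3


def top_product_for_city(conn, city):
    """Return (product, order_count) for the most-ordered product in a city.

    Returns None if the city has no orders. Ties are broken alphabetically.
    """
    return conn.execute(
        """
        SELECT product, COUNT(*) AS order_count
        FROM orders
        WHERE city = ?
        GROUP BY product
        ORDER BY order_count DESC, product ASC
        LIMIT 1
        """,
        (city,),
    ).fetchone()


# In-memory database with a hypothetical `orders` table and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO orders (product, city) VALUES (?, ?)",
    [
        ("kettle", "Leeds"),
        ("toaster", "Leeds"),
        ("kettle", "Leeds"),
        ("toaster", "York"),
    ],
)

print(top_product_for_city(conn, "Leeds"))  # ('kettle', 2)
```

On a few thousand rows this returns instantly. The same grouping and counting over terabytes of orders is where a single database server may struggle and a distributed system becomes worth considering.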