Big Data-Survey

March 2016 | PSG Aruna Sri, Anusha M
Big data refers to data sets so large and complex that traditional data-handling applications struggle to process them. It encompasses challenges in analysis, capture, storage, sharing, transfer, visualization, and privacy. Big data is commonly characterized by the 5 Vs: volume, velocity, variety, veracity, and value. The term has become common in IT businesses, where it denotes massive volumes of structured and unstructured data that are too large, or arrive too fast, to process with conventional methods; handled well, such data can help organizations improve operations and make faster, more informed decisions.

Hadoop is an open-source framework for distributed processing of large data sets across clusters of commodity machines. It is scalable and designed to tolerate hardware and software failures. Its processing model, MapReduce, is inspired by Google's MapReduce, and its storage layer, HDFS, is a distributed file system that spreads data across multiple machines. Hadoop is used in applications such as web search, email spam detection, recommendation systems, and genome sequencing.

Several tools build on Hadoop. Apache Pig is a platform for processing large data sets using a high-level language called Pig Latin. Apache Hive is a data warehouse system on top of Hadoop for data analysis and querying. Apache HBase is a distributed database that runs on top of HDFS.

Despite these advances, challenges remain in security, data processing, and data integration. Researchers continue to work on improving the security and reliability of big data systems and on developing new methods for data analysis and processing. The future of big data is expected to involve more complex data sets, faster processing, and more advanced analytics.
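The MapReduce model described above can be sketched in plain Python. This is a conceptual illustration only, not the Hadoop API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the in-memory word-count example are illustrative assumptions, whereas a real Hadoop job distributes these steps across cluster nodes.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key (word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data is big", "hadoop processes big data"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
# counts["big"] == 3, counts["data"] == 2
```

The appeal of the model is that the map and reduce steps are independent per key, so the framework can run them in parallel on many machines and rerun only the failed pieces after a crash.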
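HDFS's fault tolerance rests on two ideas mentioned above: files are split into fixed-size blocks, and each block is replicated on several machines. The sketch below illustrates both ideas in miniature; the round-robin placement and the node names are simplifying assumptions (real HDFS uses rack-aware placement and a default block size of 128 MB with a replication factor of 3).

```python
def split_into_blocks(data: bytes, block_size: int):
    """Split a file's bytes into fixed-size blocks, as HDFS does."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simplified round-robin,
    standing in for HDFS's rack-aware placement policy)."""
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

file_bytes = b"x" * 1000
blocks = split_into_blocks(file_bytes, block_size=256)      # 4 blocks
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
```

Because every block lives on multiple nodes, the loss of any single machine costs no data: the system simply re-replicates the affected blocks from the surviving copies.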