The paper "Big Data-Survey" by PSG Aruna Sri and Anusha M from K L University, Andhra Pradesh, India, provides an overview of big data, its challenges, and solutions. Big data refers to large, complex datasets that are difficult to process using traditional methods. The authors highlight the need for larger datasets to address issues such as business trends, disease prevention, and crime detection. They discuss the five Vs of big data (Volume, Velocity, Variety, Veracity, and Value) and emphasize the importance of handling each of these aspects effectively.
The paper delves into the Hadoop architecture, a key tool for managing big data. Hadoop, developed by Doug Cutting and Mike Cafarella, is an open-source system designed for reliable, scalable, and distributed computing. It uses the MapReduce framework, which divides tasks into smaller, manageable parts and processes them in parallel across multiple servers. Hadoop is particularly useful for handling complex data and has applications in web search, email spam filtering, recommendation systems, and genomics.
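The MapReduce idea described above can be illustrated with a minimal, single-machine sketch. It is not Hadoop's API and is not taken from the paper: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a thread pool stands in for the cluster of servers. It only shows how a job is split into independent map tasks whose outputs are grouped by key and then reduced.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Each chunk is mapped independently, so the map calls can run in parallel.
    # A thread pool is an assumption standing in for multiple servers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    chunks = ["big data needs big tools", "hadoop handles big data"]
    print(word_count(chunks))
```

In a real Hadoop job the chunks would be blocks of a file stored in HDFS and the map and reduce tasks would be scheduled across the nodes of the cluster; the structure of the computation, however, follows the same three steps.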
The authors also cover other tools like PIG, HIVE, and HBase, which are built on Hadoop and enhance its capabilities. PIG simplifies data processing by providing a high-level language, HIVE offers a data warehouse structure for analysis, and HBase is a column-family database service that runs on HDFS.
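To make the term "column-family database" more concrete, the following toy sketch models a table as row key, then column family, then column qualifier, then value. It is an in-memory illustration only, not HBase's client API; the class name ColumnFamilyTable and its methods are hypothetical, and features such as cell versioning and storage on HDFS are deliberately left out.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy table: row key -> column family -> qualifier -> value."""

    def __init__(self, families):
        # Column families are fixed when the table is created;
        # qualifiers inside a family can be added freely per row.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family, qualifier, default=None):
        # Use .get so that reading a missing row does not create it.
        row = self.rows.get(row_key)
        if row is None:
            return default
        return row.get(family, {}).get(qualifier, default)


if __name__ == "__main__":
    table = ColumnFamilyTable(["info", "stats"])
    table.put("user1", "info", "name", "Asha")
    table.put("user1", "stats", "logins", 12)
    print(table.get("user1", "info", "name"))
```

The point of the layout is that rows in the same table may carry different qualifiers within a family, which suits sparse and loosely structured data.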
Despite significant advancements, the paper acknowledges ongoing challenges, including security issues and the need for more advanced techniques to handle advanced threats. The authors conclude by emphasizing the ongoing growth of big data and the importance of continued research and development to meet future demands.