
Hadoop

By Nezha EL GOURII, published 31/08/2016 at 21:47:14.
Favorable review from the editorial committee.

Introduction to Hadoop

So what is Hadoop?

Only recently has this previously obscure platform become immensely popular.

In a nutshell, Hadoop is an open source platform for processing and storing Big Data. Written in Java, the core distribution of Hadoop is fairly simple and not hard to learn.

Hadoop has achieved major buzzword status and to many people, Hadoop means Big Data. Since Hadoop is at the center of many Big Data applications, there are many jobs out there available to developers who understand it.

Apache Hadoop is a software framework consisting of components and tools for analytical and large-scale data warehousing applications. It is based on the principle of divide and conquer, distributing both computation and storage across the nodes of a cluster.

Hadoop is heavily used by Yahoo, Facebook, and many other Fortune 500 companies. Originally created in 2005, it has become immensely popular and synonymous with Big Data.

Hadoop is an open source top-level Apache project. Like other major open source projects such as the Linux kernel, Hadoop is ever evolving, as a global community of developers and contributors constantly adds features and improvements.

So what does all this mean for the software developer?

Components of Hadoop

There are two central components to the Hadoop framework.

First, the MapReduce framework is designed to process large volumes of data in parallel, which requires distributing the data across multiple machines in a cluster.

A MapReduce job involves two steps: mapping and reducing. The algorithm behind MapReduce is not new, but it has become popular because of Hadoop.

First, a map program reads the input and performs filtering and sorting, such as sorting employee records by last name into queues. A reduce program then acts as an aggregation step, for example grouping identical names and counting them. MapReduce operates in parallel across many nodes.
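The map, shuffle, and reduce steps described above can be sketched in plain Python. This is a toy in-memory simulation of the flow, not Hadoop's actual Java API, using the classic word-count example:

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle/sort: group values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: aggregate each group, here by summing the counts."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

In real Hadoop, each mapper and reducer runs as an independent task on a different node, and the shuffle happens over the network; the logic of each phase, however, is exactly this.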

Mappers and reducers are unaware of each other and operate independently. The second component is the Hadoop Distributed File System, or HDFS, which is designed for large files; the default size of an HDFS block is 64 megabytes.

HDFS is optimized for streaming reads of around 100 megabytes per second on commodity disks with seek times of about 10 milliseconds. It uses two kinds of daemons: a master, the NameNode, which manages files and maintains their metadata, and workers, the DataNodes, which store the file blocks across the cluster.
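Given the 64 MB default block size mentioned above, it is easy to estimate how HDFS splits a file into blocks. This is a back-of-the-envelope sketch; the 3x replication factor assumed here is the usual HDFS default, and real clusters may configure both values differently:

```python
import math

BLOCK_SIZE_MB = 64   # default HDFS block size in early Hadoop versions
REPLICATION = 3      # typical default replication factor (assumption)

def hdfs_blocks(file_size_mb):
    """Number of HDFS blocks needed to store a file of the given size."""
    return math.ceil(file_size_mb / BLOCK_SIZE_MB)

def raw_storage_mb(file_size_mb):
    """Total raw cluster storage consumed, counting all replicas."""
    return file_size_mb * REPLICATION

print(hdfs_blocks(1000))     # a ~1 GB file -> 16 blocks
print(raw_storage_mb(1000))  # 3000 MB of raw storage with 3x replication
```

The NameNode only has to track metadata for those 16 blocks, while the DataNodes hold the 48 replicated block copies; this is why HDFS favors a small number of large files over many small ones.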

Hadoop was originally developed from Google's papers on the Google File System and Google MapReduce. Being open source, Hadoop is free, and its usage is covered under the Apache 2.0 license.

One of the unique design features of Hadoop is that it moves computation near the data, rather than the traditional approach of transferring data to the programs.

Hadoop users

So who is using Hadoop?

Organizations use Hadoop for many purposes. To name a few:

  • Mobile communication and data management: mobile data can be stored, processed, and analyzed to identify customer preferences. Hadoop can be used for predictive analytics, such as finding airline tickets or the cheapest hotels near a location. Search engines analyze an individual's browsing history to produce results customized to the user.

  • Google Earth uses Hadoop to obtain, store, and analyze satellite pictures, which are categorized by street for Google Earth's panoramic views. Google also uses Hadoop to store, analyze, and display user reviews gathered from other web sites. In addition, Google leverages Hadoop to display travel suggestions and directions for user-entered addresses, complete with traffic conditions.

  • E-commerce companies analyze customer purchasing history to recommend products. Retailers and brands also use these records to identify current buying trends, which can drive new product offerings, forecasting, pattern recognition, and deep historical analytics; similar techniques can be used to forecast the weather, identify patterns of fraudulent activity, and perform deep historical analysis of customer purchasing trends. Hadoop is also used in image processing, where satellite images can be stored and analyzed.

  • Amazon uses Hadoop to store the purchase history of all of its users in order to suggest new purchases based on how frequently items are bought, driving new sales. Amazon also stores and analyzes the page history of its users, powering the feature "what other items do customers buy after viewing this item?" to suggest further products to customers.

  • PayPal uses Hadoop to help secure customer data. Purchasing history is saved and analyzed for behavioral patterns specific to each customer, which helps identify unusual patterns that indicate fraud or theft. The result is improved security of customer data and a high-quality data security service.

In IT infrastructure and security, Hadoop is used to gather, analyze, identify, and understand cyber malware attack patterns in order to improve security.

Hadoop has created whole new disciplines in the Big Data field, and new job categories are appearing in data analytics, predictive modeling, data mining, and cloud database development.


Conclusion

We will soon live in a world where everything we do, from eating breakfast to choosing which cellphone plan to buy, will be predicted based on the enormous amounts of data stored and processed by Hadoop.
