Thursday, November 24, 2016

Why Big Data?


  • Web and cloud applications created the need to store and process huge amounts of data.
  • Traditional RDBMSs do not fit the role.
      - Only good for numeric, structured, and clean data.
      - Scaling required very expensive hardware.
      - Fault tolerance was also expensive.
  • Existing processing techniques cannot scale without extensive code development.

What is Big Data?

Big Data is a broad, general term for data sets so large and complex that traditional data processing and storage techniques are inadequate.

  • Traditional RDBMS and Business Application.

- Volume (TB, PB)
- Variety (Web, Photo, Video, Audio, Unstructured Data, Mobile, Social)
- Velocity (Batch, Periodic, Real Time)
- Veracity (Quality of Data, which may be dirty)

Which programming languages do you need to know for Big Data or Data Science?


  1. First of all, you need to know the R programming language for querying data from different data sources.
  2. Python is a must for Data Science and Big Data. You can use it for data analysis and with Apache Spark.
  3. Scala is also useful as a prerequisite for Apache Spark, which is itself written in Scala. The language is similar to Java, so if you know Java you will pick it up easily. Both languages (Scala and Python) are therefore worth knowing for Apache Spark.
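
To give a feel for the kind of data analysis Python is used for, here is a minimal sketch of a word count written in the map-then-reduce style that frameworks such as Apache Spark distribute across many machines. It uses only the Python standard library and runs on a single machine. The sample records and the function names (map_record, merge_counts) are made up for illustration and are not part of any Spark API.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample records; a real job would read these from files or a stream.
records = [
    "big data needs storage",
    "big data needs processing",
]


def map_record(line):
    """Map step: count the words in a single record."""
    return Counter(line.split())


def merge_counts(left, right):
    """Reduce step: combine two partial word counts into one."""
    return left + right


# Each record is counted independently, then the partial counts are merged.
partial_counts = map(map_record, records)
word_counts = reduce(merge_counts, partial_counts, Counter())

print(word_counts.most_common(3))
```

Because each record is counted on its own before the partial results are merged, the map step could in principle run on different machines. That independence is the property a distributed framework relies on.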