Data Science – The Definition


Charles Minard’s 1869 chart showing the number of men in Napoleon’s 1812 Russian campaign army, their movements, as well as the temperature they encountered on the return path. Lithograph, 62 x 30 cm

What is ‘Data Science’? The term ‘Data Science’ was first coined by Peter Naur in 1960, but in the present context its significance has increased manyfold. Here are some recent definitions of Data Science.

  • “Hot new Gig in Tech” – Fortune
  • “The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it: that’s going to be a hugely important skill” – Hal Varian, Google’s Chief Economist, NYT, 2009
  • “Data Science is the civil engineering of data. Its acolytes possess a practical knowledge of tools & materials, coupled with a theoretical understanding of what’s possible” – Mike Driscoll, CEO of Metamarkets
  • “Data Science refers to an emerging area of work concerned with the collection, preparation, analysis, visualization, management and preservation of large collections of information.” – Jeffrey Stanton, Syracuse University School of Information Studies
  • “Data Science is about Data Products, not just answering questions. Data Products empower others to use the data, may help communicate your results, and may empower others to do their own analysis” – Bill Howe, University of Washington

My perspective on the term ‘Data Science’ is as follows: the term comprises two words, ‘Data’ & ‘Science’. Let’s define each word separately.

  1. Data:
    1. ‘Facts and statistics collected together for reference or analysis.’
    2. ‘Raw information’
  2. Science:
    1. The intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment.
    2. A systematically organized body of knowledge on a particular subject.

So combining these definitions, we can define Data Science as:

‘A systematic study of the structure and behavior of the facts and statistics, collected together for reference and analysis, through observation and experiment.’

The Beginning – A Bad Start!

Wrong Way

Once I decided to take this journey, the immediate question was where to start. Which vehicle should I catch? Who could drive me through the course of the journey? I started looking for options. There were many institutes and courses to choose from, but none of them let me study in my free time, and the fees were very high. Then I found the MOOC ships called ‘Coursera.org’ and ‘EdX.org’. They suited my constraints of time and money best.

The next question was which course to start with. I had a little programming knowledge, as I am a Computer Science engineer. I found a course called ‘Mining of Massive Datasets’, which had just started. The course title alone drew me in completely. The buzz of ‘Big Data’ was still in the air, and the title echoed it. The syllabus, the crew and the content were all excellent. But before starting, I forgot to ask myself one question: ‘Am I ready for this course?’ That was the mistake. I started the course, and everything went over my head.

So I can say that ‘My Journey with Data Science’ began with a bad start. There was nothing wrong with the course. It was I who was not ready. I plan to take the same course again when I am ready.

But the next step was perfect. I will tell you about that in the next post.

Before the Beginning


My First Graph

Before I start this long journey, it would be better to answer the question: why am I taking this journey with Data Science? The reason is the above picture! Yes, the same picture.
I work for a bank as an Information Technology officer, and I have access to a huge amount of data. Out of interest, I tried to analyse some of it. I thought data interpretation was very simple, since we had studied concepts like mean, mode, median & standard deviation, and charts like the bar chart, line chart and pie chart, in our high school statistics. I collected <some> data for a period of 45 days and plotted it on a line chart. The outcome was the above picture. What can one interpret from this? It is very confusing and cumbersome.
Then I understood that today’s real-world data analysis challenge is not as simple as I thought. Further study shed more light on the area of data analysis and interpretation, and I came to know that data interpretation is no longer a high school statistics subject. It has evolved into an area known as ‘Data Science’, and a lot of interesting research is going on under it. Data is considered by many to be the fuel for the growth of the 21st century, and HBR has called data scientist the ‘Sexiest Job’ of this century. All these facts pushed me to take this area seriously, and my intuition suggested that this might be the area that can fulfil my desire to do some quality research. Hence started ‘My Journey with Data Science’.
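
As a small aside, the high school statistics mentioned above (mean, median, mode and standard deviation) can be computed in a few lines of Python. This is only a minimal sketch: the numbers in `daily_values` are invented for illustration and are not the bank data, and `summarize` is a hypothetical helper name of my own.

```python
import statistics

# Hypothetical daily values, invented purely for illustration
# (these are NOT the confidential data behind the graph).
daily_values = [12, 15, 11, 15, 18, 14, 15, 20, 13, 17]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # stdev() is the sample standard deviation (divides by n - 1)
        "std_dev": statistics.stdev(values),
    }


summary = summarize(daily_values)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Numbers like these summarise a series, but as the graph shows, they do not by themselves explain what is going on in 45 days of real data.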

Note: I have used the word <some> purposefully to protect the confidentiality of the data. You can imagine any common statistic there. It won’t make any difference.

Journey Tools, Guides & Road Map


In this journey I am using the following MOOCs as my guides and road map.

  1. Foundations of Data Analysis – UT Austin – EdX.org
  2. Machine Learning – Stanford University – Coursera.org
  3. Data Science Specialization – JHU – Coursera.org (Series of 9 Courses)
  4. StatLearning – Stanford University
  5. Pattern Discovery in Data Mining – Illinois – Coursera.org
  6. Artificial Intelligence – BerkeleyX – EdX.org
  7. Mining of Massive Datasets (MMDS) – Stanford University – Coursera.org

So, naturally, most of the content you will see here comes from these courses.

The tools I might be using are:

  1. R
  2. Octave
  3. Python – where required
  4. MySQL – where required

That’s it for now!