DATA SCIENCE
INTRODUCTION
Data Science is the discipline of solving problems using data. A typical problem involves decision making, for example identifying which emails are spam and which are not. The core job of a Data Scientist is therefore to understand the data, extract the useful information it contains, and apply that information to solve the problem.
To do this, they first need to collect all the data that can help solve the problem. Data collection is a systematic approach to gathering relevant information from a variety of sources. Depending on the problem statement, data collection methods are broadly classified into two categories.
The first applies when your problem is unique and no related research has been done on the subject. In that case no public data is available, so you need to collect new data yourself; this method is called primary data collection. You can collect such data through methods such as surveys, interviews with employees, and monitoring how employees spend their time.
The second method is to use data that is readily available or has already been collected by someone else. Such data can be found on the internet, in news articles, in government census records and so on. This method is called secondary data collection, and it is less time-consuming than primary data collection.
DATA QUALITY CHECK AND REMEDIATION
Data cleansing typically involves detecting and correcting corrupt or inaccurate records by replacing, modifying or deleting the “dirty” data. It can be performed manually, with cleansing tools, as a batch process, as part of data migration, or through a combination of these methods.
After collecting the data, many people start the analysis straight away and forget to perform a sanity check on it. This is a mistake: if the data is of poor quality, the analysis can produce misleading results.
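As a minimal sketch of such a sanity check, the plain-Python functions below flag missing, out-of-range, duplicate and malformed values in a handful of made-up records, then remediate them: duplicates are deleted, rows with an unusable email are deleted, and bad ages are replaced with the median of the valid ones. The field names, the assumed plausible age range of 0 to 120, the crude `"@"` email test and the median-replacement rule are all illustrative assumptions, not a fixed procedure.

```python
import statistics

# Hypothetical records (illustrative data only).
records = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": None, "email": "b@example.com"},  # missing value
    {"id": 3, "age": -5, "email": "c@example.com"},    # impossible value
    {"id": 1, "age": 34, "email": "a@example.com"},    # exact duplicate
    {"id": 4, "age": 29, "email": "not-an-email"},     # malformed value
    {"id": 5, "age": 41, "email": "e@example.com"},
]

AGE_RANGE = (0, 120)  # assumed plausible range for this example


def valid_age(age, lo=AGE_RANGE[0], hi=AGE_RANGE[1]):
    """True if the age is present and inside the assumed plausible range."""
    return age is not None and lo <= age <= hi


def find_issues(rows):
    """Sanity check: return (row index, problem) pairs for dirty records."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key in seen:
            issues.append((i, "duplicate"))
            continue
        seen.add(key)
        if row["age"] is None:
            issues.append((i, "missing age"))
        elif not valid_age(row["age"]):
            issues.append((i, "age out of range"))
        if "@" not in row["email"]:  # deliberately crude format check
            issues.append((i, "invalid email"))
    return issues


def clean(rows):
    """Remediate: delete duplicates and unrepairable rows, replace bad ages."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input is not modified
    # Delete rows whose email cannot be repaired.
    kept = [r for r in unique if "@" in r["email"]]
    # Replace missing or impossible ages with the median of the valid ones.
    fill = statistics.median([r["age"] for r in kept if valid_age(r["age"])])
    for r in kept:
        if not valid_age(r["age"]):
            r["age"] = fill
    return kept


print(find_issues(records))
print(clean(records))
```

Deleting the unrepairable rows before computing the median keeps discarded records from influencing the replacement value. In practice a library such as pandas offers the same operations (`drop_duplicates`, `fillna`, `dropna`), but the steps are the same.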

EXPLORATORY DATA ANALYSIS
Exploratory Data Analysis (EDA) refers to the critical process of performing initial investigations on data in order to discover patterns, spot anomalies, test hypotheses and check assumptions with the help of summary statistics and graphical representations.
You can use descriptive statistics such as measures of central tendency (mean, median, mode) and measures of variability (range, variance, standard deviation). Visualisation methods such as histograms, box plots and scatter plots can also be used for analysis.
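A short sketch of these descriptive statistics, using only Python's standard `statistics` module on a made-up sample (the "emails per day" data is hypothetical):

```python
import statistics

# Hypothetical sample: emails received per day by one user over a week.
emails_per_day = [12, 15, 11, 15, 40, 13, 14]


def describe(data):
    """Return common measures of central tendency and variability."""
    return {
        # central tendency
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # variability
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance, divides by n - 1
        "stdev": statistics.stdev(data),        # square root of the sample variance
    }


summary = describe(emails_per_day)
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

Here the single busy day (40 emails) pulls the mean up to about 17.1 while the median stays at 14; comparing the two is a quick way to spot skew or outliers, which is exactly the kind of anomaly EDA is meant to surface.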
EDA is primarily used to see what the data can reveal beyond formal modelling or hypothesis testing, and it provides a better understanding of the data set's variables and the relationships between them. It can also help determine whether the statistical techniques you are considering for data analysis are appropriate.
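One common way to quantify the relationship between two numeric variables during EDA is the Pearson correlation coefficient. The sketch below computes it from its definition (covariance divided by the product of the standard deviations) on hypothetical data; the variable names and values are assumptions for illustration. On Python 3.10 and later, `statistics.correlation` computes the same quantity.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sum of co-deviations (proportional to the covariance).
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Root sums of squared deviations (proportional to the standard deviations).
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical data: number of links in an email vs. a made-up spam score.
links = [0, 1, 2, 3, 4]
spam_score = [0.1, 0.3, 0.4, 0.7, 0.9]

print(round(pearson(links, spam_score), 3))
```

Values near +1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 no linear relationship (a non-linear relationship may still exist, which is one reason to also plot the data).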
DATA MODELLING
Data modelling means formulating every step needed to reach the required solution. We need to lay out the flow of calculations, that is, the modelling steps that lead to the solution. The main question is how to perform those calculations: there are various techniques in Statistics and Machine Learning, and you can choose among them based on the requirement.
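To make the idea of modelling steps concrete, here is a minimal sketch of one statistical technique, simple linear regression fitted by ordinary least squares, written as explicit steps: choose the model form y = a + b*x, estimate a and b from training data, then measure the mean squared error on held-out data. The data and the choice of technique are illustrative assumptions; a machine-learning method could fill the same steps.

```python
def fit_line(x, y):
    """Step 2: estimate a and b in y = a + b*x by ordinary least squares."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx       # slope
    a = my - b * mx     # intercept: the fitted line passes through (mx, my)
    return a, b


def mse(a, b, x, y):
    """Step 3: mean squared error of the fitted line on some data."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)


# Hypothetical data, split into a training part and a held-out test part.
train_x = [1, 2, 3, 4]
train_y = [3.1, 4.9, 7.2, 8.8]
test_x = [5, 6]
test_y = [11.1, 12.9]

# Step 1 is the choice of model form (a straight line); steps 2 and 3 follow.
a, b = fit_line(train_x, train_y)
error = mse(a, b, test_x, test_y)
print(f"y = {a:.2f} + {b:.2f}x, test MSE = {error:.4f}")
```

Evaluating on data the model did not see during fitting gives a more honest picture of how well the chosen technique will work on new data than the training error alone.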
CONCLUSION
Data science education is well into its formative stages of development; it is evolving into a self-supporting discipline and producing professionals with distinct and complementary skills relative to professionals in the computer, information, and statistical sciences.