1. Introduction to Data Science


Data Science

Data science is a field that combines techniques from mathematics, statistics, computer science, and domain-specific knowledge to extract insights and knowledge from data. The outcomes of data science are used in decision making, problem solving, innovation, and optimisation.

This represents the position of data science as a subject field

Data science lifecycle

The data science lifecycle is a framework that outlines the key stages involved in a data science project. It provides a structured approach to developing data-driven solutions. The lifecycle consists of five key steps: business understanding, data gathering, data preparation and exploratory data analysis (EDA), feature engineering, and model development and evaluation.

This illustrates the basic stages in a data science project



Business understanding: This is the process of identifying the business problem or research question that the project is intended to address. It is important to understand the objectives, constraints, and success criteria of the project in the context of the business. The outcome of this stage ensures that the data science solution aligns with the business goals and provides value to the stakeholders.

Data gathering: This is the stage of collecting the required data from various sources for use in the data science project. It consists of identifying the relevant data sources, determining the data collection method, and collecting the dataset. Common types of data sources are publicly available databases, private datasets, web data, transactional data, and sensor data. The required data can be collected through methods such as surveys, interviews, observations, experiments, sensors, web scraping, and transaction records. Moreover, collecting a high-quality dataset while following ethical and legal standards improves the validity of the project.

Data preparation and EDA: The collected dataset may contain missing values, outliers, or erroneous values. Therefore, the raw dataset needs to be cleaned, transformed, and formatted before analysis. Data cleaning, transformation, integration, reduction, and discretisation are the basic data pre-processing steps. EDA involves exploring and summarising the data to obtain insights by identifying existing patterns and relationships. EDA methods can be broadly categorised as graphical or non-graphical, and as univariate or multivariate.
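As a rough illustration of data cleaning followed by non-graphical, univariate EDA, here is a minimal sketch using only the Python standard library. The readings, the median imputation, and the 1.5 x IQR outlier rule are assumptions chosen for this example; a real project would choose them to suit its own data.

```python
import statistics

# Hypothetical raw readings; None marks a missing value.
raw = [12.0, 15.0, None, 14.0, 13.0, 120.0, 16.0, None, 15.0]


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def summarise(values):
    """Non-graphical, univariate EDA: basic descriptive statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


cleaned = impute_median(raw)                # data cleaning: fill missing values
outliers = iqr_outliers(cleaned)            # flag suspicious values
without_outliers = [v for v in cleaned if v not in outliers]
summary = summarise(without_outliers)       # summarise what remains

print("Outliers:", outliers)
print("Summary:", summary)
```

Whether a flagged value should be dropped, capped, or kept depends on the domain, so the outlier step here is best read as a prompt for investigation, not an automatic deletion rule.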

Feature engineering: This is the process of transforming the raw data into a set of features (covariates) that can be used in machine learning models. Feature engineering is a combination of feature extraction, transformation, selection, and creation. Since the performance of machine learning models depends significantly on the quality and relevance of the selected feature set, feature engineering is a critical step in the data science lifecycle.
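To make feature transformation and creation more concrete, the following sketch scales a numeric column, one-hot encodes a categorical column, and derives a new feature from two existing ones. The records, field names, and the choice of body mass index as the created feature are all hypothetical and only serve the example.

```python
# Hypothetical raw records; the field names are illustrative only.
records = [
    {"height_cm": 150.0, "weight_kg": 50.0, "city": "Colombo"},
    {"height_cm": 175.0, "weight_kg": 70.0, "city": "Kandy"},
    {"height_cm": 200.0, "weight_kg": 90.0, "city": "Colombo"},
]


def min_max_scale(values):
    """Feature transformation: rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Feature transformation: encode categories as 0/1 indicator columns."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Feature creation: derive body mass index from weight and height.
bmi = [r["weight_kg"] / (r["height_cm"] / 100) ** 2 for r in records]

height_scaled = min_max_scale([r["height_cm"] for r in records])
categories, city_encoded = one_hot([r["city"] for r in records])

# Final feature matrix: one row per record.
features = [[h, b, *c] for h, b, c in zip(height_scaled, bmi, city_encoded)]

for row in features:
    print(row)
```

Feature selection is not shown here; in practice it would follow this step, keeping only the columns that are relevant to the model.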

Model development and evaluation: Model development involves selecting an appropriate algorithm and optimising its parameters so that the model performs at its best. The algorithm is selected based on the nature of the problem, the available resources, and the type of data. The most common machine learning algorithms are linear regression, logistic regression, decision trees, random forest, support vector machines, and neural networks. The performance of the resulting model is assessed during model evaluation. Common evaluation metrics for classification models are accuracy, precision, recall, F1 score, and ROC-AUC score.
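The evaluation metrics mentioned above can be computed directly from the counts of correct and incorrect predictions. Below is a minimal sketch for a binary classifier, written in plain Python; the labels and predictions are made up for illustration, and ROC-AUC is left out because it needs predicted scores, not hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical true labels and model predictions for eight samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can be misleading when one class is much rarer than the other, which is why precision, recall, and F1 are usually reported alongside it.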

Data Science Tools

Many data science tools are available for performing the various tasks in the data science lifecycle. Some are open source, while others are commercial. The following chart summarises common data science tools.

The common tools used in the field of data science are listed here

References: 

Cover photo : https://www.dreamstime.com/photos-images/data-science.html


@Anu_data 






