Programme: B. Tech    Semester: SIX    Course: DATA SCIENCE

DATA SCIENCE

Course objectives:

● Know-how and expertise to become a data scientist.

● Essential concepts of statistics and machine learning that are vital for data science.

● Significance of exploratory data analysis (EDA) in data science.

● Critically explore and analyze data visualizations presented on dashboards.

● Suitability and limitations of tools and techniques related to the data science process.

 

Course Outcomes:

● Describe the steps involved in the Data Science process and the technologies needed by a data scientist.

● Identify suitable ML techniques for data modelling and apply them for decision support.

● Handle large datasets with distributed storage and processing systems.

● Use appropriate tools for data collection, EDA and model building for specific types of data.

● Build a prototype application of Data Science as a case study.

 

SYLLABUS

January 2025:

Introduction to Data Science: benefits and uses, facets of data, the data science process in brief, the big data ecosystem and data science

February 2025:

Data Science process: Overview, defining goals and creating project charter, retrieving data, cleansing, integrating and transforming data, exploratory analysis, model building, presenting findings and building applications on top of them

Applications of machine learning in Data Science: role of ML in DS, Python tools like sklearn, modelling process for feature engineering, model selection, validation and prediction, types of ML including semi-supervised learning
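
The four modelling steps named above (feature engineering, model selection, validation, prediction) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: it uses the Python standard library rather than sklearn, the two-class data is synthetic, and the helper names (fit_scaler, scale, knn_predict, accuracy) are hypothetical and not taken from the textbooks.

```python
import random

def fit_scaler(rows):
    """Feature engineering: learn per-column min and range from training rows."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return lows, spans

def scale(rows, lows, spans):
    """Apply the learned min-max scaling to any rows."""
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]

def knn_predict(train_x, train_y, point, k):
    """Predict by majority vote among the k nearest training points."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)),
    )[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)

def accuracy(train_x, train_y, test_x, test_y, k):
    """Validation metric: fraction of held-out points predicted correctly."""
    hits = sum(knn_predict(train_x, train_y, p, k) == y
               for p, y in zip(test_x, test_y))
    return hits / len(test_y)

# Synthetic data (an assumption): class 0 near (1, 100), class 1 near (3, 300).
rng = random.Random(0)
features, labels = [], []
for label, (cx, cy) in enumerate([(1.0, 100.0), (3.0, 300.0)]):
    for _ in range(50):
        features.append([rng.gauss(cx, 0.5), rng.gauss(cy, 50.0)])
        labels.append(label)

# Hold out 30% of the rows for validation.
order = list(range(len(features)))
rng.shuffle(order)
cut = int(0.7 * len(order))
train_idx, valid_idx = order[:cut], order[cut:]
train_y = [labels[i] for i in train_idx]
valid_y = [labels[i] for i in valid_idx]

# Fit the scaler on training rows only, then apply it to both splits.
lows, spans = fit_scaler([features[i] for i in train_idx])
train_x = scale([features[i] for i in train_idx], lows, spans)
valid_x = scale([features[i] for i in valid_idx], lows, spans)

# Model selection: choose k by validation accuracy.
scores = {k: accuracy(train_x, train_y, valid_x, valid_y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)

# Prediction for a new, unseen observation.
new_point = scale([[3.0, 300.0]], lows, spans)[0]
prediction = knn_predict(train_x, train_y, new_point, best_k)
print(best_k, scores[best_k], prediction)
```

In practice sklearn offers ready-made equivalents for each step; the point of the sketch is only the order of the steps and the separation of training and validation data.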

March 2025:

Handling large data: problems and general techniques for handling large data, programming tips for dealing with large data, case studies on DS projects for predicting malicious URLs and for building recommender systems
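
One general technique for data that does not fit in memory is to process it in chunks and keep only running summaries. The sketch below is a minimal illustration under assumptions: the stream is generated in memory rather than read from a file, the function names are hypothetical, and the running variance uses Welford's online update, which is one possible choice among several.

```python
from typing import Iterable, Iterator, List, Tuple

def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield fixed-size chunks so only one chunk is held in memory at a time."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter chunk
        yield chunk

def streaming_mean_variance(chunks: Iterable[List[float]]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) using Welford's online update."""
    count, mean, m2 = 0, 0.0, 0.0
    for chunk in chunks:
        for x in chunk:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance

# Stand-in for a large file: a generator that never materialises the full list.
stream = (float(i) for i in range(1, 100_001))
count, mean, variance = streaming_mean_variance(read_in_chunks(stream, 10_000))
print(count, mean, variance)
```

The same pattern applies when the generator is replaced by a file reader that yields parsed rows.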

NoSQL movement for handling Big Data: Distributing data storage and processing with the Hadoop framework, case study on risk assessment for loan sanctioning, ACID principles of relational databases, CAP theorem, BASE principles of NoSQL databases, types of NoSQL databases, case study on disease diagnosis and profiling
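
The processing model that Hadoop distributes across machines is MapReduce. The sketch below imitates its three phases (map, shuffle, reduce) in a single Python process with a word count. It is an illustration of the programming model only, not of the Hadoop API; the function names and the two sample documents are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

# Hypothetical input; in a cluster each document could live on a different node.
documents = ["big data needs big storage", "data science uses data"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts)
```

Because each map call looks at one document and each reduce call looks at one key, both phases can run in parallel on separate machines, which is what the framework automates.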

April 2025:

Tools and Applications of Data Science: Introducing Neo4j for dealing with graph databases, graph query language Cypher, applications of graph databases, Python libraries like NLTK and SQLite for handling text mining and analytics, case study on classifying Reddit posts
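
As a rough illustration of combining text mining with SQLite, the sketch below tokenises a few posts, stores per-post term counts in an in-memory SQLite database, and queries the most frequent term of one category. The posts and category names are made up for the example, the table layout is an assumption, and a simple regular expression stands in for the NLTK tokeniser so that only the standard library is needed.

```python
import re
import sqlite3
from collections import Counter

# Made-up posts as (category, text); a real project would load collected posts.
posts = [
    ("datascience", "Cleaning data takes most of the project time"),
    ("datascience", "Good data beats a clever model"),
    ("gameofthrones", "Winter is coming to the north"),
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.execute("CREATE TABLE term_counts (category TEXT, term TEXT, n INTEGER)")

for category, text in posts:
    # Lower-case and keep alphabetic tokens only (a stand-in for NLTK tokenising).
    tokens = re.findall(r"[a-z]+", text.lower())
    conn.executemany(
        "INSERT INTO term_counts VALUES (?, ?, ?)",
        [(category, term, n) for term, n in Counter(tokens).items()],
    )

# Most frequent term in one category; ties are broken alphabetically.
top = conn.execute(
    "SELECT term, SUM(n) AS total FROM term_counts "
    "WHERE category = ? GROUP BY term ORDER BY total DESC, term LIMIT 1",
    ("datascience",),
).fetchall()
print(top)
conn.close()
```

Term counts of this kind are a typical starting point for the features of a text classifier.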

Data Visualization and Prototype Application Development: Data visualization options, Crossfilter (the JavaScript MapReduce library), creating an interactive dashboard with dc.js, dashboard development tools, applying the DS process to engineering problem-solving scenarios as a detailed case study.

 

Textbooks:

1. Davy Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools”, Manning Publications Co., Dreamtech Press, 2016

2. Prateek Gupta, “Data Science with Jupyter”, BPB Publications, 2019 (for basics)

Reference Books:

1. Joel Grus, “Data Science from Scratch”, O’Reilly, 2019

2. Cathy O’Neil and Rachel Schutt, “Doing Data Science: Straight Talk from the Frontline”, 1st Edition, O’Reilly, 2013