Programme: B. Tech Semester: SIX Course: DATA SCIENCE
DATA SCIENCE
Course objectives:
● Know-how and expertise to become a data scientist.
● Essential concepts of statistics and machine learning that are vital for data science.
● Significance of exploratory data analysis (EDA) in data science.
● Critical exploration and analysis of data visualizations presented on dashboards.
● Suitability and limitations of tools and techniques related to the data science process.
Course Outcomes:
● Describe the steps involved in the Data Science process and the technologies needed for a data scientist.
● Identify suitable ML techniques for data modelling and apply them for decision support.
● Handle large datasets with distributed storage and processing systems.
● Use appropriate tools for data collection, EDA and model building for specific types of data.
● Build a prototype application of Data Science as a case study.
SYLLABUS
January 2025:
Introduction to data science: benefits and uses, facets of data, the data science process in brief, the big data ecosystem and data science
February 2025:
Data Science process: Overview, defining goals and creating project charter, retrieving data, cleansing, integrating and transforming data, exploratory analysis, model building, presenting findings and building applications on top of them
Applications of machine learning in data science: role of ML in DS, Python tools such as scikit-learn (sklearn), the modelling process (feature engineering, model selection, validation and prediction), types of ML including semi-supervised learning
March 2025:
Handling large data: problems and general techniques for handling large data, programming tips for dealing with large data, case studies on DS projects for predicting malicious URLs and for building recommender systems
NoSQL movement for handling big data: distributing data storage and processing with the Hadoop framework, case study on risk assessment for loan sanctioning, ACID principle of relational databases, CAP theorem, BASE principle of NoSQL databases, types of NoSQL databases, case study on disease diagnosis and profiling
April 2025:
Tools and Applications of Data Science: introducing Neo4j for dealing with graph databases, the graph query language Cypher, applications of graph databases, Python libraries like nltk and SQLite for handling text mining and analytics, case study on classifying Reddit posts
Data Visualization and Prototype Application Development: data visualization options, Crossfilter (the JavaScript MapReduce library), creating an interactive dashboard with dc.js, dashboard development tools, applying the DS process to respective engineering problem-solving scenarios as a detailed case study.
Textbooks:
1. Davy Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science: Big Data, Machine Learning, and More, Using Python Tools”, Manning Publications Co., Dreamtech Press, 2016
2. Prateek Gupta, “Data Science with Jupyter”, BPB Publications, 2019 (for basics)
Reference Books:
1. Joel Grus, “Data Science from Scratch”, O’Reilly, 2019
2. Cathy O’Neil and Rachel Schutt, “Doing Data Science: Straight Talk from the Frontline”, 1st Edition, O’Reilly, 2013