DATA SCIENCE MATERIAL
INTRODUCTION
Data science is also known as data-driven science.
• Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured.
• Data science is a concept that unifies statistics, data analysis, and their related methods in order to understand and analyze real-world phenomena using data.
• Data science is a multidisciplinary blend of data inference, algorithm development, and technology, used to solve analytically complex problems.
• We define Data science as managing the process that can transform hypotheses and data into actionable predictions. For example: who will win an election, which products will sell together, which loans will default, or which advertisements will be clicked on.
• The Data science field employs mathematics, statistics, and computer science disciplines, and incorporates techniques like Machine Learning and Artificial Intelligence.
• The main advantage of adopting Data science in an organization is that it empowers and facilitates decision-making.
• For any company that wishes to enhance its business by being more data-driven, Data science is the secret sauce.
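As a small illustration of turning data into an actionable prediction such as "which products will sell together", the Python sketch below counts how often each pair of items appears in the same shopping basket. The basket data and the function name `top_pairs` are hypothetical, and simple pair counting is only a minimal stand-in for the much larger and more sophisticated methods used in practice.

```python
from collections import Counter
from itertools import combinations


def top_pairs(baskets, n=3):
    """Return the n item pairs that most often appear in the same basket (hypothetical helper)."""
    pair_counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("bread", "milk") and ("milk", "bread") count as the same pair
        items = sorted(set(basket))
        pair_counts.update(combinations(items, 2))
    return pair_counts.most_common(n)


# Hypothetical transaction data: each inner list is one customer's basket
baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "butter", "milk"],
]

print(top_pairs(baskets, n=2))  # [(('bread', 'milk'), 3), (('eggs', 'milk'), 2)]
```

Pairs that co-occur most often are natural candidates for "frequently bought together" suggestions.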
Data Science – Development of Data Products:
A “Data Product” is a technical asset that uses data as input and processes that data to return algorithmically generated results.
The classic examples of a Data Product are:
a. Amazon’s product recommendation engine (system), which ingests user data and makes personalized recommendations based on that data: it suggests items for you to buy, as determined by its algorithms.
b. Gmail’s spam filter is a Data Product – an algorithm running behind the scenes that processes incoming mail and determines whether a message is junk.
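To show the "data in, algorithmic result out" shape of a data product like a spam filter, here is a minimal Python sketch of a naive Bayes text classifier with add-one smoothing. This is not Gmail's method; the training messages and the helper names `train` and `is_spam` are hypothetical, and real filters use far more data and signals.

```python
import math
from collections import Counter


def train(messages):
    """Count words per class from (text, label) pairs; label is "spam" or "ham"."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts


def is_spam(text, word_counts, class_counts):
    """Naive Bayes with add-one smoothing: compare the log-probability of each class."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(class_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)  # prior P(label)
        for word in text.lower().split():
            # Smoothed likelihood P(word | label); unseen words still get a nonzero probability
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return scores["spam"] > scores["ham"]


# Hypothetical labelled training messages
training = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch with the team on friday", "ham"),
]
word_counts, class_counts = train(training)

print(is_spam("free prize money", word_counts, class_counts))        # True
print(is_spam("team meeting on monday", word_counts, class_counts))  # False
```

Working in log-probabilities avoids numerical underflow when many small word probabilities are multiplied together.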
Some more examples:
Google’s advertisement valuation systems
LinkedIn’s contact recommendation system
Twitter’s trending topics
Walmart’s consumer demand projection systems
Banking institutions are mining data to enhance fraud detection.
Streaming services like YouTube and Netflix mine data to determine what their users are interested in, and use that data to decide which TV shows or films to produce. Data-based algorithms are also used at YouTube and Netflix to create personalized recommendations based on a user’s viewing history.
Shipping companies like DHL and FedEx use Data science to find the best delivery routes and times, as well as the best modes of transport for their shipments.
Popular restaurants and department stores use Data science to improve their businesses.
Fields such as medicine, health care, and insurance, along with various public- and private-sector organizations, are implementing Data science for analysis, decision-making, and prediction.
ROLES IN A DATA SCIENCE TEAM
Data science is not performed in a vacuum. It is a collaborative effort that draws on a number of roles, skills, and tools, and in a successful project each of these roles must be filled. The roles in a data science team can be shown in the following figure:
a. Data Engineer:
A Data Engineer is a person fully equipped with knowledge of hardware, databases, large-scale data processing, and computer engineering, who can build data infrastructure, manage data storage and use, and implement production tools.
b. Data Scientist:
A data scientist is responsible for pulling and cleaning data, designing experiments, analyzing data, and communicating results. A data scientist should have stronger statistics and presentation skills than a data analyst or a data engineer, with strong skills in inferential statistics, machine learning, data analysis, and data communication.
c. Data Science Manager:
A Data Science Manager is a person who builds a data team, manages the whole data science process, sets goals and priorities, and interacts with other groups and upper management. He should have strong knowledge of software and hardware, an understanding of each role on the team, and strong communication skills, and he knows what can and cannot be achieved. A Data Science Manager can come from various backgrounds, such as data science plus management skills, data engineering plus management skills, or management skills plus some training in data science.
d. Data Architect:
A Data Architect understands all the sources of data and is responsible for integrating, centralizing, and maintaining all the data. He has strong knowledge of how the data relates to current operations and of the effects that any future process changes will have on the use of data in the organization. The role may include designing relational databases, developing strategies for data acquisition, archive recovery, and database implementation, and cleaning and maintaining the database by removing old data.
e. Data Analyst:
Data analysts need a good understanding of programming, statistics, machine learning, data management, and data visualization. An analyst may not have the mathematical or research background to invent new algorithms, but they have a strong understanding of how to use existing tools to solve problems and gain useful new insights from data.
f. Business Analyst:
A Business Analyst is responsible for understanding business change needs, assessing the business impact of those changes, capturing, analyzing, and documenting requirements, and supporting the communication and delivery of those requirements to relevant stakeholders. The business analyst role is often seen as a communication bridge between IT and the business stakeholders. Business analysts must be excellent verbal and written communicators, tactful diplomats, problem solvers, thinkers, and analysts, with the ability to engage with stakeholders to understand and respond to their needs in rapidly changing business environments.
g. Software Engineer:
Software engineers are also needed in a data science team because software generalizes a specific aspect of a data analysis. If specific parts of a data analysis require implementing or applying a number of procedures or tools together, a piece of software can be built to reduce the repeated work.
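As a rough illustration of generalizing a repeated analysis step into software, the Python sketch below wraps a cleaning-and-summary routine, which an analyst might otherwise redo by hand for every dataset, into one reusable function. The function name `clean_and_summarize`, the field name, and the sample records are hypothetical.

```python
from statistics import mean, median


def clean_and_summarize(records, field):
    """Drop records whose `field` is missing or not numeric, then summarize the rest (hypothetical helper)."""
    values = []
    for record in records:
        value = record.get(field)
        # Skip missing entries (TypeError) and text that cannot be parsed as a number (ValueError)
        try:
            values.append(float(value))
        except (TypeError, ValueError):
            continue
    return {
        "count": len(values),
        "dropped": len(records) - len(values),
        "mean": mean(values) if values else None,
        "median": median(values) if values else None,
    }


# Hypothetical records; the same function can be reused on any dataset with a numeric field
sales = [{"amount": "10"}, {"amount": "20"}, {"amount": None}, {"amount": "n/a"}, {"amount": "30"}]
print(clean_and_summarize(sales, "amount"))
# {'count': 3, 'dropped': 2, 'mean': 20.0, 'median': 20.0}
```

Once the steps live in a function, they can be tested once and applied consistently, instead of being re-implemented for each new analysis.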