30 Sep Think of an SCM-related business process in an organization that interests you which contains at least one decision task that is not presently automated bu
Limit: 250 words
Estimated time: About 1.5 hours (after completing the readings)
Deliverable: Based on your reading of the assigned articles, especially “Embed Analytics in Decision Processes”, compose a response to the following questions.
Think of an SCM-related business process in an organization that interests you which contains at least one decision task that is not presently automated but that could potentially be. Let us call it the "focal decision task".
(For example, refer to the insurance claim process in the reading “Embed Analytics in Decision Processes”, illustrated in Figure 7-1.)
- Describe the business process and the focal decision task and provide a truncated diagram of the business process (e.g., labeled boxes and arrows).
- Suggest a possible automation for this decision task. Include what data will be used and whether the decision should be fully automated, automated with exception/overrides, or assisted after the change – and why.
- Describe what technology(ies) and/or algorithm(s) could be put to use to inform the decision process you suggested above.
- What improvements do you expect to see in the outcome of the business process?
Tip: Be direct in your writing. Avoid overly descriptive and/or redundant statements. Use the diagram to complement a relatively streamlined answer to point (1) of this assignment. Make sure your text and diagram symbols are legible and high-resolution. For now, your diagram should be simpler than Figure 7-1 in the case. Be specific in addressing the underlined portions of the prompt.
www.it-ebooks.info
Praise
“A must-read resource for anyone who is serious about embracing the opportunity of big data.”
— Craig Vaughan, Global Vice President at SAP
“This timely book says out loud what has finally become apparent: in the modern world, Data is Business, and you can no longer think business without thinking data. Read this book and you will understand the Science behind thinking data.”
— Ron Bekkerman, Chief Data Officer at Carmel Ventures
“A great book for business managers who lead or interact with data scientists, who wish to better understand the principles and algorithms available without the technical details of single-disciplinary books.”
— Ronny Kohavi, Partner Architect at Microsoft Online Services Division
“Provost and Fawcett have distilled their mastery of both the art and science of real-world data analysis into an unrivalled introduction to the field.”
— Geoff Webb, Editor-in-Chief of Data Mining and Knowledge Discovery Journal
“I would love it if everyone I had to work with had read this book.”
— Claudia Perlich, Chief Scientist of M6D (Media6Degrees) and Advertising Research Foundation Innovation Award Grand Winner (2013)
“A foundational piece in the fast developing world of Data Science. A must read for anyone interested in the Big Data revolution.”
— Justin Gapper, Business Unit Analytics Manager at Teledyne Scientific and Imaging
“The authors, both renowned experts in data science before it had a name, have taken a complex topic and made it accessible to all levels, but mostly helpful to the budding data scientist. As far as I know, this is the first book of its kind—with a focus on data science concepts as applied to practical business problems. It is liberally sprinkled with compelling real-world examples outlining familiar, accessible problems in the business world: customer churn, targeted marketing, even whiskey analytics! The book is unique in that it does not give a cookbook of algorithms, rather it helps the reader understand the underlying concepts behind data science, and most importantly how to approach and be successful at problem solving. Whether you are looking for a good comprehensive overview of data science or are a budding data scientist in need of the basics, this is a must-read.”
— Chris Volinsky, Director of Statistics Research at AT&T Labs and Winning Team Member for the $1 Million Netflix Challenge
“This book goes beyond data analytics 101. It’s the essential guide for those of us (all of us?) whose businesses are built on the ubiquity of data opportunities and the new mandate for data-driven decision-making.”
— Tom Phillips, CEO of Media6Degrees and Former Head of Google Search and Analytics
“Intelligent use of data has become a force powering business to new levels of competitiveness. To thrive in this data-driven ecosystem, engineers, analysts, and managers alike must understand the options, design choices, and tradeoffs before them. With motivating examples, clear exposition, and a breadth of details covering not only the “hows” but the “whys”, Data Science for Business is the perfect primer for those wishing to become involved in the development and application of data-driven systems.”
— Josh Attenberg, Data Science Lead at Etsy
“Data is the foundation of new waves of productivity growth, innovation, and richer customer insight. Only recently viewed broadly as a source of competitive advantage, dealing well with data is rapidly becoming table stakes to stay in the game. The authors’ deep applied experience makes this a must read—a window into your competitor’s strategy.”
— Alan Murray, Serial Entrepreneur; Partner at Coriolis Ventures
“One of the best data mining books, which helped me think through various ideas on liquidity analysis in the FX business. The examples are excellent and help you take a deep dive into the subject! This one is going to be on my shelf for lifetime!”
— Nidhi Kathuria, Vice President of FX at Royal Bank of Scotland
Foster Provost and Tom Fawcett
Data Science for Business
Data Science for Business by Foster Provost and Tom Fawcett
Copyright © 2013 Foster Provost and Tom Fawcett. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].
Editors: Mike Loukides and Meghan Blanchette
Production Editor: Christopher Hearse
Proofreader: Kiel Van Horn
Indexer: WordCo Indexing Services, Inc.
Cover Designer: Mark Paglietti
Interior Designer: David Futato
Illustrator: Rebecca Demarest
July 2013: First Edition
Revision History for the First Edition:
2013-07-25: First release
See http://oreilly.com/catalog/errata.csp?isbn=9781449361327 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps. Data Science for Business is a trademark of Foster Provost and Tom Fawcett.
While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
ISBN: 978-1-449-36132-7
[LSI]
Table of Contents
Preface
1. Introduction: Data-Analytic Thinking
  The Ubiquity of Data Opportunities
  Example: Hurricane Frances
  Example: Predicting Customer Churn
  Data Science, Engineering, and Data-Driven Decision Making
  Data Processing and “Big Data”
  From Big Data 1.0 to Big Data 2.0
  Data and Data Science Capability as a Strategic Asset
  Data-Analytic Thinking
  This Book
  Data Mining and Data Science, Revisited
  Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist
  Summary
2. Business Problems and Data Science Solutions
  Fundamental concepts: A set of canonical data mining tasks; The data mining process; Supervised versus unsupervised data mining.
  From Business Problems to Data Mining Tasks
  Supervised Versus Unsupervised Methods
  Data Mining and Its Results
  The Data Mining Process
  Business Understanding
  Data Understanding
  Data Preparation
  Modeling
  Evaluation
  Deployment
  Implications for Managing the Data Science Team
  Other Analytics Techniques and Technologies
  Statistics
  Database Querying
  Data Warehousing
  Regression Analysis
  Machine Learning and Data Mining
  Answering Business Questions with These Techniques
  Summary
3. Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
  Fundamental concepts: Identifying informative attributes; Segmenting data by progressive attribute selection.
  Exemplary techniques: Finding correlations; Attribute/variable selection; Tree induction.
  Models, Induction, and Prediction
  Supervised Segmentation
  Selecting Informative Attributes
  Example: Attribute Selection with Information Gain
  Supervised Segmentation with Tree-Structured Models
  Visualizing Segmentations
  Trees as Sets of Rules
  Probability Estimation
  Example: Addressing the Churn Problem with Tree Induction
  Summary
4. Fitting a Model to Data
  Fundamental concepts: Finding “optimal” model parameters based on data; Choosing the goal for data mining; Objective functions; Loss functions.
  Exemplary techniques: Linear regression; Logistic regression; Support-vector machines.
  Classification via Mathematical Functions
  Linear Discriminant Functions
  Optimizing an Objective Function
  An Example of Mining a Linear Discriminant from Data
  Linear Discriminant Functions for Scoring and Ranking Instances
  Support Vector Machines, Briefly
  Regression via Mathematical Functions
  Class Probability Estimation and Logistic “Regression”
  * Logistic Regression: Some Technical Details
  Example: Logistic Regression versus Tree Induction
  Nonlinear Functions, Support Vector Machines, and Neural Networks
  Summary
5. Overfitting and Its Avoidance
  Fundamental concepts: Generalization; Fitting and overfitting; Complexity control.
  Exemplary techniques: Cross-validation; Attribute selection; Tree pruning; Regularization.
  Generalization
  Overfitting
  Overfitting Examined
  Holdout Data and Fitting Graphs
  Overfitting in Tree Induction
  Overfitting in Mathematical Functions
  Example: Overfitting Linear Functions
  * Example: Why Is Overfitting Bad?
  From Holdout Evaluation to Cross-Validation
  The Churn Dataset Revisited
  Learning Curves
  Overfitting Avoidance and Complexity Control
  Avoiding Overfitting with Tree Induction
  A General Method for Avoiding Overfitting
  * Avoiding Overfitting for Parameter Optimization
  Summary
6. Similarity, Neighbors, and Clusters
  Fundamental concepts: Calculating similarity of objects described by data; Using similarity for prediction; Clustering as similarity-based segmentation.
  Exemplary techniques: Searching for similar entities; Nearest neighbor methods; Clustering methods; Distance metrics for calculating similarity.
  Similarity and Distance
  Nearest-Neighbor Reasoning
  Example: Whiskey Analytics
  Nearest Neighbors for Predictive Modeling
  How Many Neighbors and How Much Influence?
  Geometric Interpretation, Overfitting, and Complexity Control
  Issues with Nearest-Neighbor Methods
  Some Important Technical Details Relating to Similarities and Neighbors
  Heterogeneous Attributes
  * Other Distance Functions
  * Combining Functions: Calculating Scores from Neighbors
  Clustering
  Example: Whiskey Analytics Revisited
  Hierarchical Clustering
  Nearest Neighbors Revisited: Clustering Around Centroids
  Example: Clustering Business News Stories
  Understanding the Results of Clustering
  * Using Supervised Learning to Generate Cluster Descriptions
  Stepping Back: Solving a Business Problem Versus Data Exploration
  Summary
7. Decision Analytic Thinking I: What Is a Good Model?
  Fundamental concepts: Careful consideration of what is desired from data science results; Expected value as a key evaluation framework; Consideration of appropriate comparative baselines.
  Exemplary techniques: Various evaluation metrics; Estimating costs and benefits; Calculating expected profit; Creating baseline methods for comparison.
  Evaluating Classifiers
  Plain Accuracy and Its Problems
  The Confusion Matrix
  Problems with Unbalanced Classes
  Problems with Unequal Costs and Benefits
  Generalizing Beyond Classification
  A Key Analytical Framework: Expected Value
  Using Expected Value to Frame Classifier Use
  Using Expected Value to Frame Classifier Evaluation
  Evaluation, Baseline Performance, and Implications for Investments in Data
  Summary
8. Visualizing Model Performance
  Fundamental concepts: Visualization of model performance under various kinds of uncertainty; Further consideration of what is desired from data mining results.
  Exemplary techniques: Profit curves; Cumulative response curves; Lift curves; ROC curves.
  Ranking Instead of Classifying
  Profit Curves
  ROC Graphs and Curves
  The Area Under the ROC Curve (AUC)
  Cumulative Response and Lift Curves
  Example: Performance Analytics for Churn Modeling
  Summary
9. Evidence and Probabilities
  Fundamental concepts: Explicit evidence combination with Bayes’ Rule; Probabilistic reasoning via assumptions of conditional independence.
  Exemplary techniques: Naive Bayes classification; Evidence lift.
  Example: Targeting Online Consumers With Advertisements
  Combining Evidence Probabilistically
  Joint Probability and Independence
  Bayes’ Rule
  Applying Bayes’ Rule to Data Science
  Conditional Independence and Naive Bayes
  Advantages and Disadvantages of Naive Bayes
  A Model of Evidence “Lift”
  Example: Evidence Lifts from Facebook “Likes”
  Evidence in Action: Targeting Consumers with Ads
  Summary
10. Representing and Mining Text
  Fundamental concepts: The importance of constructing mining-friendly data representations; Representation of text for data mining.
  Exemplary techniques: Bag of words representation; TFIDF calculation; N-grams; Stemming; Named entity extraction; Topic models.
  Why Text Is Important
  Why Text Is Difficult
  Representation
  Bag of Words
  Term Frequency
  Measuring Sparseness: Inverse Document Frequency
  Combining Them: TFIDF
  Example: Jazz Musicians
  * The Relationship of IDF to Entropy
  Beyond Bag of Words
  N-gram Sequences
  Named Entity Extraction
  Topic Models
  Example: Mining News Stories to Predict Stock Price Movement
  The Task
  The Data
  Data Preprocessing
  Results
  Summary
11. Decision Analytic Thinking II: Toward Analytical Engineering
  Fundamental concept: Solving business problems with data science starts with analytical engineering: designing an analytical solution, based on the data, tools, and techniques available.
  Exemplary technique: Expected value as a framework for data science solution design.
  Targeting the Best Prospects for a Charity Mailing
  The Expected Value Framework: Decomposing the Business Problem and Recomposing the Solution Pieces
  A Brief Digression on Selection Bias
  Our Churn Example Revisited with Even More Sophistication
  The Expected Value Framework: Structuring a More