Advances in Biotech/Biopharma & the Boston Chapter of the ASA with Wenting Cheng & Weidong Zhang @Pod of Asclepius May 10, 2022

Wenting and Weidong discuss how the statistical challenges in the biopharma industry have proliferated with the unique demands of biotech and related life science industries.

34m

Ruda Zhang | Gaussian Process Subspace Regression May 10, 2022

Ruda Zhang | Gaussian Process Subspace Regression Ruda Zhang (Duke University) walks us through "Gaussian Process Subspace Regression for Model Reduction" by Zhang, Mak, and Dunson. To keep the topic interesting for both the early career & advanced audience we recap key points at a high level so that no one gets lost. This episode involves a presentation, so you may prefer to watch the YouTube version here: https://youtu.be/IPtqUUG4XcY Ruda's website: https://ruda.city/ The paper: https://arxiv.org/abs/2107.04668

1h 9m

Ruda Zhang | Math-Science Duality Apr 14, 2022

Ruda Zhang | Math-Science Duality Watch it on... YouTube: https://youtu.be/GoDwen-RGZg Podbean: https://dataandsciencepodcast.podbean.com/e/ruda-zhang-math-science-duality/ Statistics is thought to reside at the interface of science and mathematics. Ruda Zhang (Duke University) discusses the friction at this interface and the role that both mathematical formalism & observational/data-driven intuition play in scientific discovery. A great topic for anyone interested in statistics' role in scientific discovery. #datascience #ai #science #mathematics Topic List 00:00 COMING UP... 2:44 Ruda Zhang's compendium of cool ideas + a Gaussian process PSA 7:08 Is intuition undervalued in scientific research? 10:16 Mathematics vs observational science. Rigor vs intuition. 14:07 Intuition & discovery precede mathematical rigor 21:58 Mathematics vs empirical science & the complexity of induction 30:24 Abstract thinking & the cost/benefit of discovery 37:25 The efficient frontier / Pareto Front of knowledge 42:55 Pragmatism and competence 50:24 Math/science dualism 1:15:52 AI making scientific discoveries 1:19:15 Statistical & scientific debate

1h 22m

Simon Mak | Integrating Science into Stats Models Apr 06, 2022

Simon Mak | Integrating Science into Stats Models #statistics #science #ai It’s a common dictum that statisticians need to incorporate domain knowledge into their modeling and the interpretation of their results. But how deeply can scientific principles be embedded into statistical models? Prof. Simon Mak (Duke University) is pushing this idea to the limit by integrating fundamental physics, physiology, and biology into both the models and model inference. This includes Simon’s joint work with Profs. David Dunson and Ruda Zhang (also of Duke University). Scientific reasoning AND stats. What more could we ask for? Enjoy! Watch it on... YouTube: https://youtu.be/bUbZO7R4z40 Podbean: https://dataandsciencepodcast.podbean.com/e/simon-mak-integrating-science-into-stats-models/ 00:00 - COMING UP... Scientists & Statisticians 02:09 - Introduction - Integrating scientific knowledge into AI/ML 06:08 - How much domain knowledge is sufficient? 09:15 - Choosing which prior knowledge to integrate into a model 14:49 - Black box & gray box optimization 19:50 - Non-physics examples of integrating scientific theory into ML models 22:45 - Scientific principles & modeling at different scales 27:20 - Correlation is just one way of modeling linkage 36:37 - Conditional independence & different-fidelity experiments 39:40 - Innovation vs incorporation of known information in the model 42:52 - Aortic stenosis example 52:49 - Which mathematics can be used to represent scientific knowledge 57:09 - How to acquire scientific domain knowledge 1:02:45 - Complementary approaches to integrating science 1:06:48 - Gaussian process & integrating priors over functions 1:12:48 - A topic for statisticians and scientists to debate: science-based vs data-based learning. Simon Mak's Webpage: https://sites.google.com/view/simonmak/home

1h 19m

Martin Goodson | The UK’s AI Roadmap Mar 16, 2022

Martin Goodson | The UK's AI Roadmap #ai #datascience #startups Martin Goodson (Evolution AI) describes the key aspects of the UK's AI Roadmap & responses to the document by members of the Royal Statistical Society. In particular, Martin describes the disconnect between the priorities of AI startups and industry practitioners on one side, and government and academia on the other. Martin also outlines which skills early career data scientists should focus on while in school versus after entering the workforce. Also available on.... YouTube: https://youtu.be/T9qRl6Hclhg Topic List 0:00 COMING UP: Scientific culture & AI 1:25 The UK AI Roadmap 8:44 Who is a data science “practitioner”? 12:53 Data science in AI startups 20:36 Is there a disconnect between practitioners & academia? 25:09 Key skills for new data science graduates 32:03 Coding & production level data science 39:30 Learning the right data analysis skills at the course-level. 45:32 AI leadership 58:40 AI from academia & OpenSource initiatives 1:05:37 Large institutions' impact on the AI field 1:08:24 Back to the UK AI roadmap 1:12:16 Building an AI community 1:13:15 AI in our lifetime: Moonshots & realistic goals 1:14:31 Scientific debate

1h 16m

Jack Fitzsimons | Data Security, Privacy, & Artificial Intelligence Mar 01, 2022

Dr. Jack Fitzsimons (Oblivious AI) gives a high-level introduction to the technologies that can either exploit or protect your data privacy. If you'd like to survey the landscape of data privacy-preserving technologies (from someone who's building the tech) this is a good place to start! #datascience #privacy #ai 0:00 - Coming up... 3:24 - Introduction 6:20 - Data privacy and privacy enhancing technologies 13:00 - History of privacy enhancing technologies 19:54 - Differential privacy: Hiding the influence of a single data point 22:52 - Trading data utility for data privacy 38:32 - Tracking algorithms and how they decide user preferences 42:04 - Preserving privacy: Anonymizing data & VPNs 50:17 - Exploration vs Exploitation: Combining best of multiple domains to tackle problems 54:13 - Federated learning, input and output privacy of data 58:45 - Balancing data privacy vs data-driven personalization 1:05:50 - What should data scientists/statisticians debate?

1h 14m

Chris Tosh | The piranha problem in statistics Feb 22, 2022

The piranha problem (too many large, independent effect sizes influence the same outcome) has received some attention on Andrew Gelman’s blog. But now it’s a paper! Chris Tosh (Memorial Sloan Kettering) talks about multiple views of the piranha problem and detecting the implausible scientific claims that are published. The butterfly effect makes an appearance. If you enjoyed the science-vs-pseudoscience topics, you’ll enjoy this one. 0:00 - Coming up in the episode 2:35 - What is the Piranha Problem? 19:54 - Confusing effect sizes 23:11 - The "words & walking speed" study 26:22 - Declaration of independent variables 30:58 - Piranha theorems for correlations 37:07 - Piranha theorems for linear regression 40:37 - Piranha Theorems for mutual information 44:13 - Bounds on the independence of the covariates 46:12 - Applying the piranha theorem to real data 50:12 - Applying the piranha theorem across studies 54:05 - A Bayesian detour 1:00:12 - The butterfly effect & chaos 1:04:26 - Applying the piranha theorem to cancer research

1h 9m

Chris Holmes | AI, Digital Health, & The Alan Turing Institute Feb 09, 2022

Chris Holmes is Professor of Biostatistics at the University of Oxford and Programme Director for Health and Medical Sciences at The Alan Turing Institute. Chris’ research interests include Bayesian nonparametrics (which is the right kind of nonparametrics), statistical machine learning, genomics, and genetic epidemiology. 0:00 - Intro 1:38 - Chris Holmes, Professor of Biostatistics at Oxford University 3:28 - UK Biobank & designing a valuable dataset 8:42 - Healthcare charities in the UK 11:16 - Digital Health: prioritizing research questions 19:55 - Bayes, nonparametrics, and Bayesian nonparametrics 23:30 - Model prediction is at the heart of Bayesian inference 28:00 - Prioritization in model building for biology 33:09 - Model constraints to generate valid inference 37:34 - Hypothesis driven science in statistical learning versus deep learning 43:30 - Developing models in genomics & clinical informatics 48:37 - Building stable, generalizable and robust models 52:41 - Important questions to think about 54:05 - Causal reasoning and clinical risk prediction 57:50 - What topic should the statistical community debate?

1h 3m

Charlotte Deane | Bioinformatics, Deepmind’s AlphaFold 2, and Llamas Feb 01, 2022

Charlotte Deane | Bioinformatics, DeepMind's AlphaFold 2, and Llamas #datascience #ai Charlotte Deane (Oxford University) talks about statistical approaches to bioinformatics, the evolution of Google DeepMind's AlphaFold 2 & its place in the protein informatics deep learning landscape. She also describes humanizing antibodies, and the increasing role of software engineers in statistical research groups. The topic of llamas, camels, and alpacas (and their unique place in proteomics research) makes a surprise visit. [Note: This episode was originally published in January 2022, but the file contained a buffering error, which prevented the full interview from being played. This version, published Feb 1, 2022, contains the full interview.] Topics 0:00 Intro / An important topic to debate 3:50 What is a protein? Why are proteins foundational? 13:32 Immunotherapies, humanizing antibodies, & creating scientific databases 16:04 Translating in silico research into immunotherapies 21:03 Nanobodies, camels, alpacas, & llamas. 25:05 Databases and data knowledge bases 33:21 Targeted therapies 39:45 Statistical modeling in proteomics 45:40 DeepMind AlphaFold's evolution 55:28 Software engineers in academic research groups 1:03:21 The adventure of science 1:07:42 Oxford Blues hockey & scientific debate

1h 16m

Charlotte Deane | Proteomics, AlphaFold 2, and Llamas Jan 24, 2022

Charlotte Deane | Proteomics, AlphaFold 2, and Llamas #datascience #ai Charlotte Deane (Oxford University) talks about statistical approaches to proteomics, the evolution of Google DeepMind's AlphaFold 2 & its place in proteomics' deep learning landscape, humanizing antibodies, and the increasing role of software engineers in research groups. The topic of llamas, camels, and alpacas (and their unique place in proteomics research) makes a surprise visit. Topics 0:00 Intro / An important topic to debate 3:50 What is a protein? Why are proteins foundational? 13:32 Immunotherapies, humanizing antibodies, & creating scientific databases 16:04 Translating in silico research into immunotherapies 21:03 Nanobodies, camels, alpacas, & llamas. 25:05 Databases and data knowledge bases 33:21 Targeted therapies 39:45 Statistical modeling in proteomics 45:40 DeepMind AlphaFold's evolution 55:28 Software engineers in academic research groups 1:03:21 The adventure of science 1:07:42 Oxford Blues hockey & scientific debate

1h 16m

Eric Schwitzgebel | Consciousness, Zombies, & First Person Data | Philosophy of Data Science Dec 02, 2021

The philosophical community continuously aims to reconcile differing views on first person data and the consciousness of the mind. Is it possible to live without consciousness? Can one conceive thoughts without matching images to them? In this episode, Eric Schwitzgebel of the University of California tries to dissect such topics and questions to help us better understand the philosophical world. Keywords: philosophy, epistemic data, first person data, stimulus error, imageless thought, consciousness

1h 13m

Starting a Statistics Consultancy | Janet Wittes Nov 22, 2021

Starting a Statistics Consultancy | Janet Wittes The following interview was a keynote fireside chat with Janet Wittes (Statistics Collaborative, Inc.) titled "Statisticians as Entrepreneurs". It was recorded for the BBSW 2021 Conference (Nov 3 - 5 in Foster City, CA). References: BBSW 2021 Conference: https://www.bbsw.org/bbsw2021 Topics: 0:00 Janet's background prior to founding Statistics Collaborative, Inc. 3:00 Janet's initial research interest as a consultant 4:10 Why did Janet start her own business as opposed to joining a company or university? 5:45 Who were Janet's first clients? 8:00 What did Janet want to instill in her company? 15:50 Earning enough money to hire people 18:55 Initial ratio of clients to employees 22:42 Janet's company's statistical tech stack 25:00 Different challenges at different stages of the company 27:28 Growing a company but not taking on every possible client or project 28:13 Statisticians as entrepreneurs 37:00 Choosing the right people

38m

Philosophy of Data Science | Jingyi Jessica Li | Advancing Statistical Genomics Nov 16, 2021

Jingyi Jessica Li | Advancing Statistical Genomics Watch it on... YouTube https://www.youtube.com/channel/UCkEz2tDR5K6AjlKw-JrV57w/videos Podbean https://podofasclepius.podbean.com/ Jingyi Jessica Li (UCLA) describes common statistical pitfalls in genomic data analysis & the statistical reasoning required to correct these mistakes. Episode Topics 0:00 A major advancement in genomic data leads to new statistical techniques 2:15 Hypothesis-driven science & hypothesis-free data analysis 2:55 A ChIP Seq Example 8:00 Misformulation of sampling variability 16:55 A false analogy: the permutation test 19:03 Losing my p-value religion: the value of statistical packaging 24:30 The Clipper Framework for false discovery rate control 31:50 Non-parametric developments 37:55 Inferred covariates 46:00 PseudotimeDE: inferences of differential gene expression along cell pseudotime 47:10 Selective inference 49:25 What biological/physiological data will be incorporated in the future? 52:30 Statistics, computer science, data science, ML, biology 57:05 Machine learning and prediction 1:01:30 Sophisticated models vs sophisticated research 1:07:45 Peer review in science 1:13:05 Hypothesis-driven science vs cutting intellectual corners 1:18:12 What topic should the statistics community debate?

1h 22m

Mine Çetinkaya-Rundel | Advancing Open Access Data Science Education Nov 09, 2021

Mine Çetinkaya-Rundel | Advancing Open Access Data Science Education #datascience #statistics #education Mine Çetinkaya-Rundel (Duke University) describes the current and future states of statistics and data science education. Then she discusses the process of building open access learning material. 0:00 - Introduction 1:40 - Prioritizing topics in curricula 9:07 - Teaching with intent to test 11:22 - Statistics without computing 17:52 - What should be taught? How do we teach it? 19:07 - Computational thinking is valuable (to 31:45) 23:47 - Self reinforcing academics / positive feedback (to 31:45) 31:08 - Data science vs statistics (the computing angle) 37:55 - Statistical collaboration / technical collaboration 39:45 - Common language / imputation under ignorance 41:12 - Are some topics better for hands on or computational learning? 45:32 - Learning computation through visualization 52:42 - Let them eat cake first. 56:08 - What is open source education? Open source vs open access. 59:36 - Advancing open source text books 1:03:55 - Economics of open source 1:07:55 - The open education ecosystem 1:12:17 - Modularizing & parallelizing learning topics 1:16:52 - Favorite dataset on OpenIntro.Org? 1:18:14 - What topic should the statistics community debate?

1h 20m

Jingyi Jessica Li | Statistical Hypothesis Testing vs Machine Learning Binary Classification Sep 20, 2021

Jingyi Jessica Li | Statistical Hypothesis Testing versus Machine Learning Binary Classification Jingyi Jessica Li (UCLA) discusses her paper "Statistical Hypothesis Testing versus Machine Learning Binary Classification". Jingyi noticed several high-impact cancer research papers using multiple hypothesis testing for binary classification problems. Concerned that these papers had no guarantee on their claimed false discovery rates, Jingyi wrote a perspective article about clarifying hypothesis testing and binary classification to scientists. #datascience #science #statistics 0:00 – Intro 1:50 – Motivation for Jingyi's article 3:22 – Jingyi's four concepts under hypothesis testing and binary classification 8:15 – Restatement of concepts 12:25 – Emulating methods from other publications 13:10 – Classification vs hypothesis test: features vs instances 21:55 - Single vs multiple instances 23:55 - Correlations vs causation 24:30 - Jingyi’s Second and Third Guidelines 30:35 - Jingyi’s Fourth Guideline 36:15 - Jingyi’s Fifth Guideline 39:15 – Logistic regression: An inference method & a classification method 42:15 – Utility for students 44:25 – Navigating the multiple comparisons problem (again!) 51:25 – Right side, show bio-arxiv paper

55m

Gualtiero Piccinini | What Are First-Person Data? | Philosophy of Data Science Aug 30, 2021

Gualtiero Piccinini | What Are First-Person Data? First-person methods (and their associated data) have been scientifically and philosophically contentious. Are they pseudoscientific? Or simply pushing the bounds of scientific methodology? Obviously, I have no idea… so Prof. Gualtiero Piccinini (University of Missouri – St. Louis) provides a helpful introduction to the topic covering the key points of its history and the philosophical/scientific debate. 0:00 Why cover first-person methods & data? 2:26 First-person methods vs first-person data? 7:10 Are first-person data legitimate at all? 11:50 Phenomenology 13:26 First-person data is extracted from human behavior 18:25 Skepticism & arguments against first-person data 25:40 Psychophysics, introspectionists, behaviorists, cognitivists, and the origins of first-person data 35:20 Using new instruments & methods in science 46:00 Is this where the philosophers roam? #datascience #statistics #science

51m

David Dunson | Advancing Statistical Science | Philosophy of Data Science Aug 17, 2021

David Dunson | Advancing Statistical Science | Philosophy of Data Science Series A fundamental question in the philosophy of science is "what does it mean to make scientific progress?" We will have a series of episodes centered around this question for statistics and data science. In our first episode in the series, David Dunson (Duke University) discusses important advances in Bayesian analysis, big data, uncertainty, and scientific discovery. Topic Timestamps 0:00 Intro to David Dunson 1:54 What does it mean to advance data science and statistics? 6:14 Industry & Optimization, Science & Uncertainty 8:14 Prediction & Discovery / Bayesian Modeling 14:13 What is “complex” data? 22:49 Big Data, Bayes, and Nonparametrics 33:50 Ad hoc approaches vs principled methods 37:08 Should Machine Learning Publications Refocus on Scientific Discovery? 39:50 Mathematically principled data science & statistics 51:40 Do Bayesians just use priors as regularizers? 55:16 Bayesian Priors and Tuning Inference Methods 1:00:00 Prioritize the Most Important Work in Data Science 1:07:07 Good Practices of Star Grad Students 1:13:17 The Science in Statistical *Science* #datascience #science #statistics

1h 17m

Martin Kulldorff | Spatiotemporal Models of Disease Outbreaks Aug 03, 2021

Note: This conversation was recorded June 25, 2021. Martin Kulldorff | Spatiotemporal Models of Outbreaks Martin Kulldorff (Harvard Medical School) talks about the integration of biological & demographic information (and general reality) in the spatiotemporal models used to detect disease outbreaks. He also discusses how these methods can be applied to non-infectious diseases like cancer. 0:00 - Spatio-temporal modeling of outbreaks 6:02 - Important features of spatio-temporal outbreak models 12:20 - Which diseases wouldn't you track for modeling? 19:02 - Multiple comparison adjustments of alarms 25:15 - Domain knowledge of outbreak features 29:30 - Competing hazards & risks 34:30 - Comparing hemispheres 37:00 - Bridging the gap for infectious diseases to cancer 45:10 - Retrospective data correction / changing monitoring 57:00 - Competing risks & statistics 1:01:30 - Deducing risks & effects through knowledge of immunological mechanisms 1:09:00 - Future scientific convos #datascience #science

1h 8m

Jason Costello | Data Science vs Software, Academia vs Industry Jul 19, 2021

Interested in Data Science? Learn Data Science and Statistics from experts as they cover key topics in the field. The Data & Science podcast focusses on teaching data scientists how to think critically in order to solve data analysis problems across various scientific domains. Jason Costello | Data Science vs Software, Academia vs Industry Jason Costello (Hypervector) describes his (non-trivial) transition from academic research into big tech and then the healthcare industry. He outlines a strategy to find the cool research problems that you get in academia while still delivering value to your company. We then talk about the interface of data science / machine learning and software. 0:00 Deploying Data Science into the Real World 8:24 Transitioning from Academic to Industrial Data Science 16:56 First step to delivering value to industry 21:38 Toy example of high value data science 25:28 Deep technical challenges are real and useful too! 29:59 Formalized logic in machine learning solutions 32:54 Data Science & Machine Learning Projects can fail. 38:50 Getting to the cool data science projects 47:21 Putting Machine Learning Models into Software 56:21 Software and Deduction, Machine Learning and Induction 1:06:06 Is Software A Deductive Complex System?

1h 8m

Eric Daza | N-of-1 Science & Causal Inference | Philosophy of Data Science Jun 14, 2021

Interested in Data Science? Learn Data Science and Statistics from experts as they cover key topics in the field. The Data & Science podcast focusses on teaching data scientists how to think critically in order to solve data analysis problems across various scientific domains. Eric Daza | N-of-1 Science & Causal Inference | Philosophy of Data Science Much of our scientific inference revolves around the identification and replication of patterns in data. So what can be done when N=1? Eric Daza gives us a statistician's perspective on the ideas behind N-of-1 studies, their best examples, and strongest critiques. 0:00 - The purpose of N-of-1 & generalizability 3:30 - Successes and challenges in N-of-1 9:30 - A lightbulb moment 18:00 – Anomalies, Compliance, & Recurring Patterns 23:00 – Best Critiques of N-of-1, Safety, Efficacy 41:20 - Causal Inference 54:30 – Increasing the number of data scientists 1:03:30 – Biostatistics’ changing place in data science / statistical thinking

1h 12m

Edward McFowland III | Anomalous Pattern Detection & Model Building Jun 01, 2021

#datascience #statistics Edward McFowland III | Anomalous Pattern Detection & Model Building Edward McFowland III (Harvard Business School) describes the differences between "anomalies" and "anomalous patterns". Edward describes how this informs modeling strategies, in particular, when to use an off-the-shelf model versus building a bespoke model from scratch. He then covers how to draw inspiration from different scientific and technical fields. 0:00 Edward: Live in Conference 2:00 Outliers vs Anomalies vs Anomalous Patterns 9:30 Strategy to Identify Anomalous Data Patterns 19:15 Adding Complexity to Models 25:00 Building Blocks vs Comprehensive Models 39:05 New Pieces of Evidence 40:40 Deciding Data Science Strategies 52:30 Connecting the Technical Dots 58:40 Interdisciplinary Interests

1h 2m

Data Science Job Search | Advice + Q&A May 26, 2021

#datascience #jobs #career #jobsearch #statistics The Statistical Consulting Section of the ASA invited me to give a presentation on the data science job search followed by a Q&A. They were kind enough to let me post it here (with minor edits). My drawing of "cumulative cost" is wrong. It should intercept the "current cost" line at time = 0. 0:00 – Humility, Goals, & Human Data Points 5:00 – Play the Numbers Game 12:40 – Job vs Career 18:18 – Nonsensical Data Science Job Descriptions 25:40 – Technical Review & Presentation 30:00 – The Advantages of Early Career 37:25 – Save Job Descriptions / Industry vs Academia 46:10 – Career vs Job Clarification 53:10 – Bachelor’s vs Master’s vs Doctorate? 56:10 – Delivering Value Over Time 1:08:10 – Product vs Service 1:11:10 – Comments From an Academic Perspective 1:16:43 – Get Your Foot in the Door / Doing What You Love 1:25:50 – Future Q&A’s

1h 30m

Mike Evans | Statistical Reasoning & Evidence | Philosophy of Data Science Series May 19, 2021

Mike Evans | Statistical Reasoning & Evidence | Philosophy of Data Science Series Mike Evans (University of Toronto) describes his approach to statistical reasoning. Mike outlines how to recognize and address problems that are statistical in nature and why these approaches should be grounded in our ability to measure statistical evidence. Watch it on YouTube at: https://youtu.be/Q7JpGZxHxXU 0:00 Statistical Reasoning 2:30 The Basic Problem: Reasoning on Statistical Problems 13:00 Rules of Statistical Inference 19:30 Bias (The Controversial Bit?!?!) 24:10 Steps of Statistical Reasoning 25:50 Connection to Philosophy of Science 27:35 Measuring Evidence (Frequentist vs Bayesian vs Loss Function) 29:49 Problems with the p-values 32:00 Choosing & Checking Priors 49:25 Idealism, Good Plans, Bad Plans 54:45 Describing Your Reasoning 59:20 Critiques of the Principle of Evidence 1:04:00 Data-Driven Science vs Hypothesis Driven Science

1h 9m

Deborah Mayo | Statistics & Severe Testing vs Pseudoscience May 13, 2021

Deborah Mayo | Statistics & Severe Testing vs Pseudoscience Watch it on YouTube: https://youtu.be/MVHoE9V_X5g In our fourth episode of the “science vs pseudoscience” mini-series, Deborah Mayo (Virginia Tech) specifies several necessary criteria to be scientifically rigorous. She gives several examples of how statistical thinking is essential to scientific thinking and why she believes that the “I’ll know it when I see it” approach to delineating science from pseudoscience is not a good approach.

1h 35m

Kristin Morgan | The Data Science of Sports Injury May 10, 2021

Description: In the world of biomechanics, engineers continuously aim to innovate and create new models to better understand their research. In this episode, Kristin Morgan (University of Connecticut) returns to the show to explain how her group uses gait as a diagnostic tool for maximizing human performance. Drawing on her own experience in sports, Morgan presents how they use gait to measure recovery from physical impairment, specifically for ACL-related injuries. She also explains how they use the same tool to measure recovery from cognitive impairment. An insightful episode for all! Keywords: biomechanics, models, metrics, gait, engineering, statistics, cognitive impairment, physical impairment 0:00 - Intro 03:01 - Creating models for performance optimization 07:23 - Why gait is an effective diagnostic tool 11:38 - Maximizing gait in creating models for post-ACLR 17:35 - Manifestation of different injuries & models 22:01 - Modeling motor control 26:28 - Applying other models in biomechanics 30:50 - Using asymmetric walking for recovery 39:30 - Understanding cognitive impairment recovery 44:19 - Moving forward with gait as diagnostic tool 45:40 - Taking inspiration from other fields / Statistics in Engineering 47:45 - Engineering and statistics hand in hand 52:50 - Limitations of modeling in biomechanics 54:20 - Starting a career in biomechanics 58:20 - Including cognitive impairment 1:00:20 - Tailoring models to specific cases 1:05:33 - Applying the models to injuries other than ACL

1h 10m

Michael McRoberts | Football Analytics and Data-Driven Decisions May 05, 2021

Michael McRoberts | Football Analytics and Data-Driven Decisions Michael McRoberts (Championship Analytics Inc.) uses Monte Carlo simulations to provide strategy analytics to college and NFL football teams. Topics include communicating data-driven recommendations, the need to create counterfactual data, and asymmetric decision rewards. 0:00 The challenge of sports analytics 5:00 Analytics recommendations 16:00 Communicating data-driven recommendations 24:35 Vegas Odds & Ancillary Data 30:00 Football is way behind / Data science projects with a "runway" 41:25 Creating experiments and counterfactuals 49:30 Implementing data science insights 56:15 Asymmetric decision rewards 58:50 How to start in sports analytics 1:10:00 Data science vs analytics vs statistics

1h 14m

Andrew Gelman & Megan Higgs | Statistics' Role in Science and Pseudoscience Apr 30, 2021

Andrew Gelman & Megan Higgs | Statistics' Role in Science and Pseudoscience #datascience #statistics #science #pseudoscience Our science vs pseudoscience discussion continues with Andrew Gelman (Columbia) and Megan Higgs (Critical Inference LLC). Andrew and Megan describe two critical roles that statistics plays in science.... but also how statistics can add the air of scientific rigor to bad research or help statisticians fool themselves. From there the conversation goes on in a way that only a conversation with Andrew and Megan can! A very fun episode. 0:00 - Two roles of statistics in science 4:50 - Many models were intended for designed experiments 10:30 - The biggest scientific error of the past 20 years 15:00 - Feedback loop of over-confidence / Armstrong Principle 21:00 - Science is personal 25:00 - The value of different approaches / Don Rubin Story 34:40 - Statistics is the science of defaults / engineering new methods 45:00 - The value of writing what you did 52:27 - Math vs science backgrounds + a thought experiment 1:01:20 - Fooling ourselves

1h 11m

Irina Gaynanova | Replicability, Reproducibility, Responsibility, and Optimism for the Future of Science Apr 27, 2021

Irina Gaynanova (Texas A&M) describes why she thinks that replicability is a prerequisite for reproducibility in science and how scientists can (personally) start improving the replicability of research. We also discuss how the concepts of replicability/reproducibility can differ according to the domain-specific context and the methods used. Please forward to any students or colleagues who would find this of interest!

1h 2m

Science vs Pseudoscience | Neil Manson | Philosophy of Data Science Apr 19, 2021

#datascience #science #pseudoscience #criticalthinking #reasoning We each like to think of ourselves as scientific. I'm yet to meet someone who would embrace being called "pseudoscientific". But what makes the difference? In this episode, Neil Manson talks about the fallout from Thomas Kuhn's 1962 book "The Structure of Scientific Revolutions" and how this created a playbook for many modern critiques/attacks on scientific activity. We have a new series that centers on the discussion of science vs. pseudoscience. Guests of different backgrounds share their insights on what really constitutes science and the highly-contested pseudoscience. The implications for data scientists and statisticians are very interesting, since many of the examples around this debate involved the conflicts between hypothesis-driven science vs data-driven science. 0:00 - Intro 0:43 - Science vs Pseudo/Bad/No Science 05:52 - Demarcation problem of science 12:07 - Incentives in science 13:00 - Glen forgets the word for "book" 13:40 - Luminiferous aether & lunch tables 18:19 - Keeping “good science” out of the science category 22:49 - Aiming to define science in relation to Kuhn’s theory 29:06 - Kuhn’s theory in action in various scenarios 32:53 - Logical fallacies in the world of science 46:17 - Intelligent design theory as science 51:14 - Distinction of different sciences

1h 15m

Science vs Pseudoscience | Dien Ho | Philosophy of Data Science Apr 08, 2021

We have a new series that centers on the discussion of science vs. pseudoscience. Guests of different backgrounds share their insights on what really constitutes science and the highly-contested pseudoscience. In today’s episode, we talk to Professor Dien Ho, PhD, a Professor of Philosophy and Healthcare Ethics, of the Massachusetts College of Pharmacy & Health Science University. Discover how philosophical ideas and theories are applied in hopes of understanding what really counts as science and what pseudoscience really is. 00:03 - Introductions 5:33 - What is pseudoscience? 08:53 - Legitimacy of other sciences 12:11 - What qualifies as science? 19:00 - Inductivism and empirical falsifiability 26:22 - Positivism and the importance of assumptions 31:36 - Assumptions and observations for data scientists 42:34 - The pursuit of science 49:17 - Scientism and revolutionary scientists 54:43 - Pinning down what science is

58m