1. An interview with David Purdy

In our first edition of the Behind Data Science podcast, we talked to David Purdy, Chief Scientist of Ghost and former Sr. Data Science Manager for Safety at Uber.

Kit Feber: Hello, listeners. I’m very pleased to introduce the guest on this episode, David Purdy. Hello, David.

David Purdy: Hello Kit! Thank you very much for having me today.

Kit Feber: It’s a pleasure, David. Where are you based, for the benefit of the listeners?

David Purdy: So I’m in the San Francisco Bay area and have been working both in Silicon Valley and San Francisco for a number of years.

Kit Feber: And could you give people an overview of your background, David?

David Purdy: Sure. So I started a little over 20 years ago in what is now called data science. I began my career at the National Security Agency, but I’ve worked in everything from natural language processing and web search to high-frequency trading and medicine. I led and started a variety of teams at Uber and have also worked in autonomous vehicles. More recently, I’ve been looking to advise a variety of companies and founders on how to optimise and develop their data science strategy, using a lot of the experiences and insights I’ve gained over this time. How do you form the team? How do you set the goals? And then how do you accelerate and reach the highest possible velocity in generating ideas that may work, validating them and deploying them? So really, it’s how do you achieve the speed of light in your technical development.

Kit Feber: And we’re going to talk about getting the strategy behind data science right as well on the podcast, David. For the benefit of the people listening, could you give us some insight into your inspiration and the people who have inspired you over the years?

David Purdy: Sure. So first off, in beginning my career at the NSA, one thing that is very notable is the history of work in urgent contexts to develop the talents and tools to achieve research breakthroughs. This is notable at Bletchley Park and, looking back even further, with Thomas Edison, one of the original founders of an industrial research laboratory. In each of these cases, there was something unprecedented that had to be achieved, a real sense of urgency, and very fascinating problems and talented teams. So it’s a real joy when you can work on something that is unprecedented and hard, and also have to think very carefully about how we get to the end result as quickly as possible.

Kit Feber: I think a lot of people may have seen The Imitation Game, David, and I think Bletchley Park is dear to your heart and to many people’s hearts in the UK. Could you give people some insight as to why Turing and Bletchley Park have interested you?

David Purdy: Sure. So for those who’ve seen The Imitation Game, it’s spelled out fairly early on: the idea of approaching something as monumental as breaking the Enigma code using just human means is impossible. Something that Turing and others really recognized is that you have to create a platform for being able to test ideas and to deploy those algorithms. In this case, they were testing various guesses for code breaking and then iterating on that. But you also have to align the teams around that, and as a result, they were able to do something that had never been done before. It’s a great example, if you will, of how the platform can enable substantial breakthroughs, and I highly recommend the movie.

Kit Feber: It’s quite a good point at which to make a distinction, David, between the different kinds of problems you’re trying to solve in data science. Could you explain to the listeners the difference between a platform, a singular model and analysis?

David Purdy: So the thing about data science is it’s often presented as if data science is ambiguous. The reality is, you have different kinds of products or deliverables. So a lot of data science is about delivering insights and analytics to leadership on corporate strategy, industrial strategy, organizational strategy, this kind of thing. And for that you may leverage models and you may leverage a variety of statistical techniques.

Another type of deliverable is the actual model that may go into production somewhere, deployed for all sorts of purposes. It could be for spam detection. It could be for driving a car. But in order to support these efforts, deployments of models and deployments of experimental frameworks, it’s very, very empowering to have a platform. So you have to think very carefully: what are we developing? Are we developing something where the data scientists are the users, where the consumer is a computer, or where the consumer of the product is someone who’s charged with making strategic decisions? Once you identify those goals, then you can work backwards to how we develop the practices and tools to move quickly.

Kit Feber: I think it’s fair to say, knowing you, David, that you’ve built platforms and also trained and developed thousands of models across your career. Is that correct?

David Purdy: That’s right. In fact, I’ve worked on all three of these avenues. I have developed reporting systems and processes for COOs, CFOs and so on. Of course, I have deployed thousands of models in different contexts. And in leveraging those experiences, I have designed tools and frameworks first for myself, and then frameworks and systems for use by hundreds of data scientists so that they could also deploy thousands of models. In the end, these have led to billions of dollars of incremental revenue, as well as greatly accelerating the process of bringing things to market.

Kit Feber: And what would be a good example within your career, David, of where you’ve been working on trying to achieve speed-of-light research?

David Purdy: So I think a key example was when I worked in high-frequency trading. There, the market evolves very, very rapidly. There’s really no other environment where information comes in, and your goal is to decide whether to act and how to act on it as quickly as possible. This is for the model that is in production. In addition, though, markets evolve quickly. So there’s this research challenge of how we set up an environment where we’re generating ideas that have a high chance of succeeding, testing those ideas very quickly, and then having virtually no friction from the development of the idea to the deployment of the idea, and then ensuring that the deployment is highly reliable. In high-frequency trading, things can break very, very quickly, as a number of companies have unfortunately experienced. And so when you think about that, you want to have simple, reliable, scalable tools. You also want to have a research agenda that allows you to examine what has happened with the models, algorithms and strategies you’ve deployed, and then use those to generate hypotheses for subsequent iterations. For example, when I worked at Goldman Sachs, in the effort that I led on high-frequency trading and interest rate products, I was able to deploy 30 model bundles in 30 weeks. We were developing and testing many models and strategies more frequently than once a week and choosing the very best to deploy. And when we did so, we were able to show increasing returns on a daily basis. So this was an example that I was very enthusiastic about. And really, as I’ve gone on in my career, I’ve wanted to be able to create those opportunities for others.

It’s exciting to have your work have an impact, to have real clarity on what you’re researching and what you’re developing, and to dig into the deeper points of the system that you’re trying to produce, rather than the more painful points of managing deployments or systems in production that may have unresolved issues.

Kit Feber: Did the environment, with the stakes being so high within trading, have any impact on the research agenda or the culture of research, David?

David Purdy: Actually, I’ve been fortunate. Not only does trading have its own relatively high stakes, but I’ve also worked in national security, medicine and safety. Everything is important, really. In these environments, you should expect people will review your work. Really, science is about reproducibility. So if you have a system that is clunky, that is not well engineered, not well architected, then you’re not actually going to spend as much of your time on developing high-quality research. If you have a research process that is not well organized, it’s maybe not going to be as easy to reproduce. And so it’s actually the totality of these experiences, realizing that I wanted to be able to create systems that were inspectable, that could be shared with others and reviewed by others, to have more eyes on the problem. That’s something that was very motivating.

Kit Feber: Are there red flags, or things that you would highlight, that you see as inhibitors to achieving efficient, fast research and iteration, David?

David Purdy: So at a high level, I tend to think about two goals that everybody should really pursue. One is clarity, the other is velocity. The thing about clarity is, are you working backwards from the goal? Are you working backwards from the strategic objective, the business objective? And then what do you need in terms of the technical stack, and what do you need in terms of the team to build that technical stack? This technical stack is: what data feeds do we need, what algorithms do we need, what infrastructure do we need, what kinds of models or loss functions do we need, all these kinds of things. So are we building the team to build a technical stack to solve the business problem?

So one inhibitor is just really lacking clarity and working in the wrong direction. If you start off by building a team, then figure out the technical deliverables, and then try to find a buyer for this thing that you’ve made, you’re going to have a really slow go of it. The other is velocity, and velocity, as we learned in high school, is really speed and direction. Everybody has the same number of hours in the day, and you can work very hard and you can have high speed, but if you keep changing direction, or if you’re not clear on the direction that you’re ultimately going to go, you’re just going to bounce around. And so if you think of the speed of light, it’s really the fastest speed from point A to point B. If you look back over the course of a project or an effort, you think, well, how often were we moving in this direction? Given what we know, at what points did we ask the right questions? Did we make the right investments, and what would have guided us earlier? So it’s how do you set up the process, that’s important, and then also how do you equip the process with the industrial research that will move you forward. In terms of inhibitors, one is, like I mentioned, what are you working towards? What are you working towards on business goals? What are you working towards on the technical stack? That’s very important.

Another is really alignment across the organization. So you have data scientists, product managers, engineers, various executives. How are they all empowered and aligned? When you look at a company like, say, Goldman Sachs, it’s an environment where there is a strong cultural push towards what I call lateral awareness, so that people can move very, very quickly and are very empowered to do so, and I think that’s critical. It’s not possible for one person to know everything, and it’s not possible for one person to solve every potential hitch along the way. So you have to think about it as a culture; you have to think about its practices.

Kit Feber: Moving from Goldman’s into what was a much smaller Uber than it is today, David, how was the culture different for you, and how did the approaches differ between the two companies?

David Purdy: Well, so that’s very interesting. I joined Uber when it was about 1700 employees, and it grew to roughly 20,000 to 25,000 by the time that I left last year. In that environment, first off, Uber was legitimately data first. Data is the basis of everything in terms of matching riders and drivers, pricing and routing of trips, etcetera, and then also managing the entire user experience, whether that’s rider or driver, and subsequently everything from Uber Eats to Uber Freight.

One similarity between the two is that Uber is a broker, of transportation services. You know, Goldman has been at brokering for more than a century. That was important for being able to set up the right data frameworks within Uber, to recognize that you’re dealing with the life cycle of a customer and the life cycle of a transaction.

In terms of the differences, Uber, being much smaller, necessarily had to cobble together and leverage the best resources outside of Uber. That’s very important. And of course it was in a massive growth phase, and in that environment you need a very high degree of reliability for things that have never been done before. So that was certainly a very interesting situation to be in. There’s tolerance for things breaking, and, just as I said, that comes with things having never been done before. But we got there, and that is a great environment.

Kit Feber: David, one of the flagship projects that you worked on at Uber was Michelangelo. Could you talk about that for us, and maybe about the strategy behind it and getting that right as a key part of the process?

David Purdy: Sure. So when I arrived, I formed the first team of data scientists leveraging machine learning within Uber and quickly saw that there was not a single common tool. There was no effort yet invested in making machine learning available and integrated into the products, and it was very clear that if you’re going to go this route, you need this for everything from customer churn, to fraud, to ETA estimation, and machine learning would be critical for the company. So looking back on the fact that I had developed machine learning toolkits and frameworks for myself and my teams in previous employment, I pushed for having a machine learning platform. And there is one difference here, which is that this is an enterprise machine learning platform. The idea is, it’s not just how do we get a single model developed, or how does a single user develop models and deploy them. It is about how do we create value across the entire company? There are entities, riders, drivers and trips, for instance, that many different teams have an interest in. What you want to do is have an environment where the underlying infrastructure and, to the greatest extent, the underlying data is as reliable as possible, and where data scientists can focus on developing ideas on predictors, on models, on response variables, and on investigating those as they look across Uber’s hundreds of markets and many different products, and can create that value in the most efficient possible way.

So rather than here’s a library that a person could download, and they could build their own models and deploy them, there’s a resource that they could leverage and that others could leverage. Then as people develop something of value, they’re putting it into this platform. An example would be predictors of customer churn. Maybe those were useful for some form of marketing; it could be that they’re used for some form of product selection. So if one person has developed this, then they put it into a common feature store, and others could then leverage it. So with Michelangelo, the idea was to capture that which is complex, that which needs to be reliable, and that which is really not within the expected responsibility set of a federated group of users, and put them into one environment and take out all of those frictions, all those impediments and all those things that, if you look at the course of a year, you really don’t want your data scientists to have to spend time on.

And for that there is a very slight friction of routing your data and routing your models through Michelangelo. But the payoff of being able to have this reliability, and the payoff of being able to use others’ contributions, is very, very significant.
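To make the shared feature store idea concrete, here is a minimal sketch in Python. It is an illustration only and not Michelangelo's actual interface: the `FeatureStore` class, its methods, and the `trips_last_30d` feature are all hypothetical names chosen for this example. The point it shows is that one team registers a predictor once and another team reuses it by name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FeatureStore:
    """Toy in-memory registry of named features (hypothetical, for illustration)."""

    _features: Dict[str, Callable[[dict], float]] = field(default_factory=dict)
    _owners: Dict[str, str] = field(default_factory=dict)

    def register(self, name: str, owner: str, fn: Callable[[dict], float]) -> None:
        # Refuse silent overwrites so a shared definition stays stable for its users.
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = fn
        self._owners[name] = owner

    def owner_of(self, name: str) -> str:
        return self._owners[name]

    def compute(self, names: List[str], entity: dict) -> Dict[str, float]:
        # Any team can request features by name without knowing how they are built.
        return {name: self._features[name](entity) for name in names}


store = FeatureStore()

# A marketing team contributes a churn-related predictor once...
store.register(
    "trips_last_30d",
    owner="marketing",
    fn=lambda rider: float(len(rider["recent_trips"])),
)

# ...and a product team reuses it for its own model inputs.
rider = {"recent_trips": ["t1", "t2", "t3"]}
features = store.compute(["trips_last_30d"], rider)
```

A production system would add storage, versioning and access control; the sketch only captures the register-once, reuse-everywhere pattern described above.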

Kit Feber: Would it be fair to classify it as a sort of machine-learning-as-a-service style platform, David?

David Purdy: Well, it is, except I would put the word enterprise at the top, because there are machine-learning-as-a-service paradigms. And again, their value proposition is that they can help a user develop a model and then deploy that model, right? So Michelangelo does all of that, but it also is an environment where people who are not responsible for models can also inspect the models. If there is a model whose behaviour is changing or breaking, an engineer or a product manager can go and look at various charts and diagnostics about the models, about the outputs of the models, the inputs of the models, this kind of thing. There are abilities to have alerts and warnings and so on. So the enterprise needs this, and the goal is really, yes, of course, you’re delivering machine learning, but it is integrated into the output of the enterprise. It’s integrated into products. It’s integrated into analytics, this kind of thing. And so I think that’s very, very important, just to realise it’s not an activity by itself. It’s an activity that drives the value for the company.
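The alerts on model behaviour mentioned here can be illustrated with a very small check. The sketch below is an assumption made for illustration, not a description of how Michelangelo works: the function name `drift_alert`, the z-score rule and the threshold of 3.0 are all hypothetical choices. It flags a model when the mean of its recent outputs moves far from the mean seen during a baseline period.

```python
from statistics import mean, stdev
from typing import Sequence


def drift_alert(baseline: Sequence[float], recent: Sequence[float],
                z_threshold: float = 3.0) -> bool:
    """Return True when the recent mean output is far from the baseline mean.

    "Far" means more than z_threshold standard errors, using the baseline's
    sample standard deviation. This is a deliberately simple rule.
    """
    baseline_mean = mean(baseline)
    baseline_sd = stdev(baseline)  # requires at least two baseline values
    recent_mean = mean(recent)
    if baseline_sd == 0:
        # A constant baseline: any change at all counts as drift.
        return recent_mean != baseline_mean
    standard_error = baseline_sd / len(recent) ** 0.5
    z_score = abs(recent_mean - baseline_mean) / standard_error
    return z_score > z_threshold


# Scores a model produced last month versus scores from the last hour.
baseline_scores = [0.10, 0.20, 0.15, 0.18, 0.12]
steady_scores = [0.14, 0.16, 0.15]
shifted_scores = [0.90, 0.95, 0.85]

steady_flag = drift_alert(baseline_scores, steady_scores)
shifted_flag = drift_alert(baseline_scores, shifted_scores)
```

With these illustrative numbers the steady window does not trigger an alert and the shifted window does. The same kind of check can be applied to model inputs as well as outputs, which is what lets an engineer or product manager who did not build the model notice that something has changed.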

Kit Feber: And I know, because I know you, that Michelangelo has been incredibly successful. Is it still functioning at Uber, David, and are people still using it on a daily basis?

David Purdy: Yes, there’s actually a lot of usage. There are hundreds of users, and thousands of models have been developed and deployed with it. And, you know, it’s very exciting. They’ve had significant results. They’ve really been able to have all sorts of results all over the world that have been very, very beneficial. I made a lot of use of Michelangelo and its results, and the Michelangelo team was an extraordinary partner in the development of these really complex applications.

Kit Feber: David, just because people might not immediately think about safety when they think about Uber or a ride sharing company, why is the safety element so vital to Uber and can you speak to what your team did?

David Purdy: Sure. So at a high level, safety is in the interests of everybody: riders, drivers, regulators, the public, employees, investors. And these are about rare events that are not anticipated in general. What you want to do is try to make an environment that is both measurably safer and, from the experience and perception of users, one where you can commit to or facilitate a stronger sense of safety. There have been incidents around the world and, in fact, from London to the United States to other countries and municipalities, requests of Uber to report on and address safety issues. Uber had an amazing safety report released at the end of last year. To my knowledge, no other company has ever been as transparent about the nature and volume of incidents that have occurred in their ecosystem or on their platform.

Kit Feber: So how big is Uber’s global safety team or is that an impossible question?

David Purdy: I guess I would put it as, there are hundreds of people who consider it a primary responsibility, but really thousands of people who are thinking about it and contributing to it, and many, many partners around the world. Lots of nonprofit organizations and government organizations are involved.

It’s truly hard to say, and that’s actually one of the things that was really amazing. I began my career at Uber thinking I’m going to develop a team that develops machine learning. On my fourth, what we call, Uberversary, I was with a wonderful team of people from around the world involved in safety operations. And that’s truly remarkable in that, as I said, there are these things that happen as Uber has more than five billion trips, rides, per year.

You have these interpersonal moments where, the vast majority of the time, everything is just as expected. Very occasionally there are issues that arise, and working with these folks from many different backgrounds, whether it’s electrical engineering or consulting or operations research or psychology, and developing systems and processes in data science that could support these goals, was very, very exciting. I’d like to say what we were doing was making rare events rarer. And as I mentioned earlier about the product goals, we were delivering on platforms, algorithms and analytics, and all of these are absolutely crucial.

Kit Feber: David, I have to ask, because we’ve spoken about this off air: you interviewed a huge number of aspiring Uber data scientists in your tenure at Uber. Obviously, not all of them were successfully hired. But could you explain for listeners who are maybe, you know, interested in securing a job at a top tech company within data science, how you interviewed people at Uber, and provide any insights or tips for people on how to get a job there?

David Purdy: So there’s a number of things that are important. First off, all of this is a growth process. I was fortunate throughout my career to have a number of mentors, so having a mentor and getting feedback is very valuable. Another is to just continue to understand the problem space that you’re working on. The technical stack is around methods, tools, infrastructure, code, all of these things. But really you’re working towards, how do you have an impact? And so that means learning the business as well as the technology.

Find those opportunities to work with engineering, with product, with folks working on marketing or even design, user experience, finance. Get inside their heads. What questions are they asking? What problems are they seeking to gain insight on? Work on strategy, and get involved in the nuts and bolts: reliability, scaling. At first, it’s no fun, but the reality is that when things break, it’s often in these foundational things. So instead of just sitting apart from folks, dive in and learn from your partners in engineering and elsewhere.

Working on writing and speaking is very important. You have to get used to explaining things simply. So listen to the people who can, and try to explain things to people who aren’t like you. If you make the assumption that a person has a strong understanding of the topic that you’re presenting, then you’re not stretching yourself, you’re not trying to get into where they’re coming from, and in some sense you’re not necessarily mastering the material that you’re trying to communicate, because you’re already assuming people understand it. It really goes from there, depending on the path that a person wants to pursue. There are a lot of different technical and organizational considerations, but I’d start with realizing that you’re trying to learn and communicate with others.

Kit Feber: That’s really useful, David, and I know that we’ve spoken in the past about a kind of gap emerging between people who are able to design and build models and people who can actually deploy them and make them work in production. Are you seeing that as a trend in the market, that employers are wanting data scientists who have almost software-engineering-level coding skills?

David Purdy: Yes. And that’s actually the wrong mindset. Really, if you take a data scientist who hasn’t done much software engineering and you say, look, we’re going to put you on the critical path for a key release, and so what you’re going to do is you’re going to learn a bunch of things you’ve never done before. You’re going to try and make them work. It’s gonna be lightly reviewed, and then it’s gonna be pushed out. You have no experience of the whole life cycle of deploying code, of monitoring and debugging, but that’s on you, even though it’s your first time. That’s not a good place to start. That’s sort of the first iteration, which starts with, well, you have somebody developing models, and then they throw things over the wall and somebody else re-implements them and deploys them, right? So you’re moving some of the work back to the data scientist.

Instead, really, and Google, Uber and others have noticed this, use engineers for what they’re really good at: building reliable, scalable systems. Think about what those things are that need to be done when you have somebody engaged for thousands of hours, and many people engaged for tens or hundreds of thousands of hours, on those algorithmic deliverables. A lot of the work around the release process can be simplified, instrumented and automated. Just as that’s done for software engineering and software releases, it can be done for model releases.
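As a rough illustration of what automating part of a model release might look like, here is a minimal Python sketch of a release gate. Everything in it is an assumption made for the example: the `Candidate` fields, the check names, the limits and the model name are hypothetical and are not taken from any system described in the interview. The idea is that engineers own a set of reusable checks, and a data scientist's candidate model is promoted only if all of them pass.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """Summary metrics for a model proposed for release (hypothetical fields)."""
    name: str
    holdout_error: float
    p99_latency_ms: float


# A check returns (passed, message); the message is reported only on failure.
Check = Callable[[Candidate], Tuple[bool, str]]


def max_holdout_error(limit: float) -> Check:
    def check(c: Candidate) -> Tuple[bool, str]:
        return c.holdout_error <= limit, (
            f"{c.name}: holdout error {c.holdout_error} exceeds limit {limit}"
        )
    return check


def max_latency_ms(limit: float) -> Check:
    def check(c: Candidate) -> Tuple[bool, str]:
        return c.p99_latency_ms <= limit, (
            f"{c.name}: p99 latency {c.p99_latency_ms} ms exceeds limit {limit} ms"
        )
    return check


def release_decision(candidate: Candidate,
                     checks: List[Check]) -> Tuple[bool, List[str]]:
    """Run every check and approve the release only if none fail."""
    failures = []
    for check in checks:
        passed, message = check(candidate)
        if not passed:
            failures.append(message)
    return (not failures, failures)


checks = [max_holdout_error(0.10), max_latency_ms(20.0)]
approved, failures = release_decision(Candidate("eta-model-v2", 0.08, 12.0), checks)
```

Because the checks are plain functions, the gate can be instrumented and extended without asking each data scientist to learn the full deployment life cycle.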

Kit Feber: That’s really interesting. We’re seeing more of a demand for “full stack” data scientists, in inverted commas, or requests for people who have written production C++ or can code within a data science position. So it’s interesting to hear you talk about that, David. Do you have any other words of advice or input for people within the data science field who are looking for a job at a bigger tech company?

David Purdy: Well, so regarding production Python and C++, it doesn’t mean that you can remove programming as a requirement. It is the requirements around the deployment process that really need attention. In terms of the work of data scientists, it’s a very reasonable expectation that a data scientist can implement mathematical code in any given language. It is not hard to pick up.

Once you’ve worked in one language, you’ll see a lot of patterns that can be translated to others, and that’s very reasonable. It is the relationship of different services, of different infrastructure pieces, that, if not well architected, will just lead to a lot of misery for everybody, frankly.

In terms of other advice, really try different things. During the downtime, try to think about what you could do more quickly. Try to think about how you set up a research agenda looking out weeks, months and quarters to solve different problems. Then work with stakeholders to ask what we need to do, work backwards from those goals to investigate different models and different sources of data, and then ask how we develop the tools so that you can generate ideas, test them and then deploy them. Then you’re on your way to solving problems.

Kit Feber: Thank you, David. I was going to ask you about deep learning specifically, because it’s all anyone talks about at the moment in this field. What’s your view on it, David? Are you seeing neural networks actually being fit for purpose and solving lots of problems, or do you think it may be a little bit overhyped? Could you shed some light on it for us?

David Purdy: Sure, absolutely. It is an area that I would highly recommend, but it’s important to think of how you ask questions of data, period. If you’re thinking, I have ideas, I throw them against the wall to see what sticks, you can do that with deep learning and you could do that with traditional machine learning. That’s not asking questions. That’s not really developing a research agenda. With deep learning, in teams that I’ve led both at Uber and in autonomous vehicles, this has been a key component of vision-based systems.

There is a lot of traditional work in computer vision. These have cut into different measures of utility and quality and so forth. Sometimes you can develop a product fairly quickly using these standard libraries, drop predictors extracted from them into different machine learning models and sort of be off to the races.

The thing is that these tools in traditional computer vision aren’t necessarily optimized for your prediction goal or your inferential goal. By incorporating deep learning, you’re working more closely with the source material, images, videos, text, this kind of thing, and working directly towards the prediction or inference problem that you’re trying to tackle. It does take time. It’s also very useful if you have somebody that you can consult. So at Uber, for instance, the Uber AI Labs team is an outstanding partner to program and product teams throughout the company as they’re adopting deep learning. They have a number of experts, and they provide consulting and support capabilities. It does take time to develop that awareness. But overall, if somebody’s already committed to data science and machine learning, then I would definitely say seek to go further.

Kit Feber: You mentioned consulting, David. Obviously, that’s a focus of yours at the moment. How are you finding life as a consultant? Have you encountered any interesting companies? Do you think it’s the right kind of path for you moving forward?

David Purdy: This is something where I’ve been approached by a number of friends and companies over the years for advice on X or Y or Z. If they’re starting with machine learning platforms, such as the service at Uber, that was something that was very natural. And again, when starting the Michelangelo team, I filled in as the product manager, speaking as the voice of the customer, i.e. data scientists.

Here’s what we need to develop, right? And so that’s an environment that is exciting, because it’s about not just solving the problem within one company, within one enterprise. They’re developing systems for the benefit of, obviously, as many companies as they can attract, and that’s exciting. I’ve worked in a number of different industries, and so I would love to help folks benefit from that.

There are various companies that are further along in their development of data science organizations and strategy and are reaching that point of, how do we improve the cadence? How do we improve the velocity? How do we improve the career growth for data scientists? All these kinds of things. And I’m excited to help them out on that. We go through everything from very technical to very strategic and leadership interests. Along the way, there are also some fascinating applications and startups, in everything from consumer-oriented to SaaS-focused ventures, that I’m advising, and I’m very excited about what these will be able to help people do.

Kit Feber: I was going to ask you about interesting applications of machine learning. You’ve obviously got your finger on the pulse, David. Are there any problem spaces or challenges or applications of machine learning that you think are particularly exciting over the next couple of years?

David Purdy: I can’t quite get into some of these areas, given that I want to help some of these individuals and organisations get ahead. But there are verticals, that is, industries, if you will, and there are activities. So there is something that may be coming to many different companies, regardless of their industry, that are not leveraging data, much less machine learning, much less various advances in interfaces, at all. It’s not even that we’re talking about going from, like, version seven to version eight. It is that there are just key holes in a lot of different spaces where companies exist to either help other companies or help consumers.

This is very exciting. From the standpoint of taking a large space and then decomposing it, instead of thinking, OK, I want to make a known thing better, just finding massive holes is very exciting. That was something that happened for me at Uber: there was no tooling for data scientists. It wasn’t that I wanted to build a better mousetrap on machine learning platforms. There was none, there was no tooling. Similarly, there had been no system for predicting in real time the demand and supply around the world. After the Michelangelo team was up and mature, I moved over to start the real-time forecasting. And there are some exciting things where, just as Uber was filling a gap in transportation, there are entire industries where they’re really not leveraging data and machine learning. Then there are a lot of activities done by people that really are fairly primitive. I think we can do a lot to help them do either work or fight or engage with other companies or other people much more efficiently. I can’t get into the specifics, but I think that there’s really a lot of very foundational stuff that has yet to be touched.

Kit Feber: We are increasingly talking to candidates that are interested in working on machine learning or AI for good or for social impact, David. Are you seeing a trend for that in your network, or is it becoming a hot topic?

David Purdy: Honestly speaking, this is something that has been around for a very long time. This has motivated a lot of statistics over the course of more than a century. How do you improve everything from agriculture to education, medicine, to social services? How do you develop the right quantitative awareness? How do you develop the right forecasting?

In these environments there are real consequences for individuals and there’s also a real shortage of high quality data, of actionable data and of talent, and so I think having more minds on this is very, very valuable. There’s a lot of first world problems that people want to solve through clever advances in machine learning, but there’s a lot of social issues that really are unaddressed across the board. It’s not just by data science and machine learning, but I think this is one of the more stimulating and rewarding areas that one could go into.

Kit Feber: David, I wanted to ask you about conversion courses or online machine learning courses. Being someone who’s been in machine learning for over 20 years, what’s your perception of them, and are they a good avenue for people to go down that are looking to move into machine learning, would you say?

David Purdy: Certainly they give you a lot of ability to work at your own pace and autonomously, so there’s no competition for having that flexibility, that flexible path for learning something. But at the end of the day, you’re trying to develop a talent for solving problems, right? So focusing too much on what people in the past would call book knowledge is not necessarily understanding how to solve a problem that you can’t just look up in a book. That’s what I would say it is. Think about projects. Think about opportunities to collaborate. Think about hackathons and this kind of thing.

Even if you’re starting with a project that has a very humble but time-limited goal, whether it’s a hackathon that’s just a couple of days or a project that runs for a couple of weeks or a month, I think that’s important. You’ll quickly reveal all of the things that you realize you didn’t know. As you look at that, it goes back to what is a technical stack? What do we need to deliver that technical stack? Just getting more fluency with your stack. So instead of just watching videos about, say, SQL databases, you can then leverage them in some sort of an application. I think that’s very, very important, and you can tell quickly if a person has a lot of experience and the mindset for solving problems or if they’re gonna need a lot of guidance on how to use the tools they already know in order to solve new problems.

Kit Feber: Interesting. So I guess the message there is applying the learning and getting the leads and breaking stuff is the key part.

David Purdy: Absolutely, so use the resources, but they shouldn’t be your only investment of time.

Kit Feber: Have you read any interesting books recently, David?

David Purdy: I’ve actually been reading some work on renaissance architecture lately. I try to keep up with a lot of different aspects of data science and machine learning, but these days I’m thinking a lot about how people should communicate their learnings in the past over the years and that’s very dear to my heart.

Kit Feber: Interesting, David. Are there any data science and machine learning books that you hold in particularly high regard?

David Purdy: I’ve always loved The Elements of Statistical Learning. Honestly, a lot of what really interests me comes down to practice and what folks have learnt from their practices, and I think I’ve just collected readings from Google to decades or centuries ago.

A couple of books that I also like: the first is Advice for a Young Investigator. It’s quite dated. It was written in the late 1800s, so its comments on social behaviours and norms and self worth are a little more reflective of its time than of the current year. However, its advice on best practices for junior scientists is outstanding, and there’s not a lot of books in this domain.

Another book that I’ve often shared with colleagues is called The Unwritten Laws of Engineering, and it starts with the premise that when you begin your career, you master the technical expectations of your role, but nobody teaches you how to work with others. You have to work with others, and it has a lot of precise and useful advice on how to manage projects, work with others, execute, communicate, etc.

Kit Feber: They all sound very interesting. I think we’ll link those books with the podcast so that listeners can get their hands on them. I just wanted to ask if you wanted to summarise or go back to any key points, David, that you think listeners should pay particular attention to before we wrap up?

David Purdy: Well, in general, you’re going to try to solve a lot of problems in this world, and so as you’re doing this, think not only about the issue that you’re tackling but also reflect on it over time: what could you have done better, both in the process, in the algorithms, in the data used? Be humble. It’s better to make many small iterations that, overall, are marching towards some goal than to take forever, especially earlier in your career, on something where you’re not going to see the outcome for a long time. You’re going to learn a lot along the way. Try to find the people that can give you the feedback and try to take humble, incremental steps, but always reflecting on what you could do better.

Kit Feber: David, it’s been a pleasure. I know how much you know and how much you have to talk about in this field, so I’m gonna drag you back onto the pod again in the near future and we should cover some other interesting areas, but for now, thank you very much.

David Purdy: Thank you Kit. It’s always a pleasure to talk to you.
