Deep Learning ‘Spoken’ with Dr. Jon Krohn
Nov 04, 2019
We’ve done our best as imposters to talk about machine learning and in this episode we have an expert join us. Dr. Jon Krohn is the Chief Data Scientist at untapt and is the author of Deep Learning Illustrated.
Jon provided us with a ton of insights about the space and also about what deep learning is not. We think you’ll learn, laugh, and sometimes be surprised in this show.
Want to support our podcast? Two things anyone can do: subscribe and give us a review on Apple iTunes.
Mentoring with Reshama Shaikh
Oct 21, 2019
In this episode, we talk about mentoring with Reshama Shaikh. She explains some of the do’s and don’ts of mentoring.
Reshama is an independent data scientist/statistician and MBA with skills in Python, R and SAS. She is an organizer of the meetup groups NYC Women in Machine Learning and Data Science and NYC PyLadies.
If you’d like to learn more about Reshama or her work, including the post about mentoring, check out her blog.
Did you get my message? Messaging and Queues
Oct 07, 2019
Antonio asserts that this is one of those topics that is rarely talked about and even more rarely taught in colleges and universities but is a staple of any enterprise system.
In this episode, Antonio tries to explain the concept to Jordy. Don’t worry, he’ll quiz him later.
Our sponsor is the University of San Francisco. To learn more about their Master’s in Applied Economics, go to usfca.edu/dsi
How does Manufacturing evolve? With software and data, of course.
Sep 30, 2019
We know nothing about manufacturing. Okay, maybe Jordy knows a bit more than Antonio but we, collectively, still know nothing.
In this episode, we talk with Adam Montoya and Christian Gonzalez who give us a lesson in manufacturing and also tell us about Bright Machines. Bright Machines’ tagline is Software-Defined Manufacturing and we want to know what that is.
If you’re down here, we know you’re still listening. We are still actively looking for a venue for a one-day data science conference. Email us if you’d like to partner with us.
Starting an Online Company: Talk with Moonllight.com Founders
Sep 23, 2019
In this episode, we talk to Stefan Woort-Menker and Kaitlyn Wojtaszek, founders of Moonllight.com, about their journey to start an online company.
This is an episode about the challenges of getting your product to market, how the idea was formed, the pitfalls to avoid, and the hype versus reality.
I would say more but then you’d never listen to the episode.
We are working hard to create these episodes and provide appropriate content. We’d love to hear from you about how we are doing and would appreciate any leads for potential advertisers. Email us at Antonio or Jordy @ datascienceimposters.com – Thanks for the support.
NVIDIA takes the driver’s seat
Sep 16, 2019
Rules first. Here we go …
What does the manufacturer of video cards have to do with autonomous or driver-less cars? Find out from our next guest, Nico Koumchatzky.
Nico, Director of AI Infrastructure, helps us navigate through our understanding of this technology. He’s even got advice for those of you starting your career in data science, artificial intelligence, or a related field.
Looking forward to hearing from you on this question: What happens to the passenger side front seat when cars are completely autonomous? Do they become a relic of the past?
Complex Systems Do Not Add Up
Aug 26, 2019
The Complex Systems Society has this on their website ‘The most famous quote about Complex Systems comes from Aristotle who said that “The whole is more than the sum of its parts”‘
In this episode, we invite Dr. Bruno Gonçalves to explain it to us, the data science imposters. Bruno gives us a nice mix of examples blended with the academic references associated with them.
If you deal with data that has relationships, is interconnected, or networked in some way you need to understand the concepts of Complex Systems.
The Art of Digital War: Enter the Blockchain
Jul 30, 2019
Would you give up Control for Convenience? Blockchain will protect you.
Vikal Kapoor joins us on this conversation. Vikal is an experienced entrepreneur whose latest venture as CEO of Dapps is solving customer experience problems using Blockchain technology. Specifically, they are building customer relationship management (CRM) software on top of an Enterprise Blockchain computing platform.
Vikal reminds us that for every convenience that we are afforded by technology, we may be losing a bit of control – control of ourselves and our data. Our Twitter poll taken by 12 people shows that more would give up control for convenience. We think that percentage is much higher (our poll was just very limited).
Vikal paints Blockchain as the hero we didn’t know we needed. Is he right? Only time will be able to tell.
Democratize your data. Do this first to get real results…. (w/ Leigha Jarett)
Jul 15, 2019
Your company wants to catch up with the buzz of data science, machine learning, or artificial intelligence. What do you do first? Hire a PhD in Physics to build artificial intelligence algorithms? Hire a team of consultants?
Leigha Jarett joins us to give us her view. As an employee of Looker, she’s had this conversation with her clients many times. Democratize your data.
As always, we get to know our interviewee and hopefully Leigha teaches Antonio how to perform a hockey stop because we’re tired of him crashing to stop.
In this data science podcast, we mix technology, data science, AI, and a bunch of other data in a casual conversation.
How do I become a serial tech entrepreneur? (with Elissa Shevinsky @ElissaBeth)
Jul 01, 2019
Elissa Shevinsky, CEO of Faster Than Light, gives us some insight about entrepreneurship in technology. As a successful entrepreneur, she’s seen a few things and has gained much knowledge on entrepreneurship which she shares on this episode.
Join us as she walks us through the journey of starting multiple technology companies.
Give us 1 star.
Jun 17, 2019
Antonio and Jordy talk about the company in California that’s so fed up with Yelp that they have requested 1 star reviews – they’ve almost made it mandatory given that they give 50% off their otherwise very expensive pizza pies.
Their conversation then devolves into other tangents.
What’s a p-value and can you trust it?
Jun 10, 2019
The notes below are an inside look at how we structured this week’s episode …
If your p-value is higher than .05, you can’t publish research.
Antonio: What is a p-value?
Jordy: The P value, or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true
Antonio: Okay, so if you’re like me … not a statistician … you want to have simpler language even if the explanation is longer. I reread that and think man … what is this null hypothesis? That’s really what got me hung up. How about you Jordy?
Jordy: ….
Antonio: So, the null hypothesis is that whatever you are trying to test has no significant difference from the population. Your hypothesis, whatever you are trying to prove, is called the alternative hypothesis.
Okay, so let’s make up an example. I’ll use Penn State’s List of 7 steps.
1. Define null hypothesis
So, let’s make up a null hypothesis – College freshmen students gain an average of 10 lbs during their first year of college. Let’s say we have the standard deviation of 3 lbs. Our null hypothesis is that there will be no significant difference between this population and our sample.
2. Define alternative hypothesis
Our alternative hypothesis is that giving students an electronic scale at the beginning of their college year will change how much weight they gain by the end of the year.
3. Set probability / alpha
.05 – 5% ?
Why would you make this number lower than 5% ? What if you get it wrong?
4. Collect Data
Experimental or Observational
5. Calculate the test statistic
Okay, I think this is where the statistician magic happens the most. Essentially, the test statistic compares the data that we collected or observed against our overall population.
I say that the most magic happens here because you need to know a bit about the distribution of the data and the pluses and minuses of each statistic. You then need to plug in the data to the formula – even I can do that part.
6 / 7 – Based on that measure – which is sometimes just how many standard deviations our data is from the population – we then need to measure the likelihood of that
Now you have the likelihood on it and then you compare if that’s lower than your alpha value. If it is, you can now reject the Null Hypothesis.
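To make steps 5 through 7 concrete, here is a minimal sketch of a one-sample z-test using the numbers from the notes (a 10 lb average gain and a 3 lb standard deviation). The sample of 36 students averaging 9 lbs is made up for illustration, and the function name is our own. It also assumes the population standard deviation is known, which is what lets us use a z-test here.

```python
import math

def z_test_p_value(sample_mean, pop_mean, pop_sd, n):
    """Return the z statistic and two-sided p-value for a one-sample z-test."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # erfc(|z| / sqrt(2)) is the area in both tails of the standard normal curve.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# From the notes: the null hypothesis says freshmen gain 10 lbs on average, sd 3 lbs.
POP_MEAN, POP_SD, ALPHA = 10.0, 3.0, 0.05

# Hypothetical sample: 36 students who were given a scale gained 9 lbs on average.
z, p = z_test_p_value(sample_mean=9.0, pop_mean=POP_MEAN, pop_sd=POP_SD, n=36)
reject_null = p < ALPHA

print(f"z = {z:.2f}, p = {p:.4f}, reject null: {reject_null}")
```

With these made-up numbers the sample sits two standard errors below the null mean, which gives a p-value just under the 5% alpha, so we would reject the null hypothesis.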
….
It sounds decent to me so why is there such an issue with p-values? Well, I think when people are given one measure for success, they’ll figure out how to beat it, fudge it, or bend the rules a bit.
Inflation bias, also known as “p-hacking” or “selective reporting,” is the misreporting of true effect sizes in published studies (Box 1). It occurs when researchers try out several statistical analyses and/or data eligibility specifications and then selectively report those that produce significant results [12–15].
Common practices that lead to p-hacking include:
conducting analyses midway through experiments to decide whether to continue collecting data [15,16];
recording many response variables and deciding which to report postanalysis [16,17],
deciding whether to include or drop outliers postanalyses [16],
excluding, combining, or splitting treatment groups postanalysis [2],
including or excluding covariates postanalysis [14],
and stopping data exploration if an analysis yields a significant p-value [18,19].
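A quick way to see why "try many things, report the winner" is a problem: if there is no real effect and each test uses an alpha of .05, the chance that at least one test comes up significant grows quickly with the number of tests. The sketch below assumes the tests are independent, which is a simplification, and the function name is ours.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    is 'significant' when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 20):
    print(f"{k:>2} tests -> {chance_of_false_positive(k):.3f}")
```

Under these assumptions, recording 20 response variables and reporting whichever one crosses the line gives you roughly a 64% chance of a "significant" result from pure noise.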
According to one paper we found,
The Extent and Consequences of P-Hacking in Science, while p-hacking is probably common, its effect seems to be weak relative to the real effect sizes being measured.
270 authors work to repeat 100 experiments. ‘Even with all the extra steps taken to ensure the same conditions of the original 97 studies only 35 of the studies replicated (36.1%), and if they did replicate their effects were smaller than the initial studies effects.’
Climate Change – how can we use data?
Jun 03, 2019
Two days after recording this, we saw the article ‘Trump Administration Hardens Its Attack on Climate Science’ – https://www.nytimes.com/2019/05/27/us/politics/trump-climate-science.html
Climate Change is a topic that requires arguments based on data not rhetoric. Science not politics. Facts not opinions.
During our conversation, we referenced the following:
Here’s our simple, straightforward guide to starting a podcast. Yes, you can do it.
For our hardcore data science listeners, we’ll be back next week.
How does New York City use data?
May 13, 2019
Dr. Alaa Moussawi joins us to talk about how the NYC Council is leveraging data and how this data is helping New Yorkers along the way.
NYC makes their data available so you too can explore it and answer the questions that you care about – https://opendata.cityofnewyork.us/
To learn more about Alaa’s Data Science Team, visit https://council.nyc.gov/data/
Predicting crimes without bias and prejudice – is it possible?
May 06, 2019
Predictive policing companies have received some unwanted attention. Police departments are minimizing their use, if not eliminating these programs altogether.
Predictive policing aims at leveraging more data when determining crime areas, identifying perpetrators, and allocating resources.
What do you think? Tell us on Twitter @dsimposters or email us at jordy@datascienceimposters.com
DSI Week in Data Science-y News with Miguel Garcia
Apr 29, 2019
Our friend, Miguel Garcia, pops in on this episode. We always love having Miguel on the show. This is especially true today when we get his take on the latest news events we are covering in this episode.
While we have Miguel on he talks to us about his role as a Senior Solutions Engineer at Looker. Miguel also tells us about his journey from a risk analyst to his new role and what helped him make that transition.
We discuss the Professional Engineer’s Creed and whether the Data Science industry needs a similar oath.
Interview with Kenneth Reitz
Apr 22, 2019
Our agenda with Kenneth was to have an honest, unscripted conversation and we succeeded.
If you’ve never heard of Kenneth, you may have heard of the Python tools that he has helped author: requests and pipenv. (According to https://pypi.org/project/requests/2.9.1/, Requests gets 43 million downloads per month.) Or you’ve read his book The Hitchhiker’s Guide to Python. Or you’ve listened to his song – don’t worry, we have a sample of one of his tracks – ‘Push and Pull’ – at the end of the episode.
Listen to the episode then check out his site: kennethreitz.org
Containers & Japan with Brian de Heus
Apr 15, 2019
Brian de Heus is the Chief Technology Officer at Adgorithmics in Tokyo, Japan. We leveraged social media to reach out to him and ask him to join our show remotely. The first challenge was getting over the 14 hour time difference. The second hurdle was making sure that we stayed on topic since Antonio has always wanted to visit Japan and here was Brian telling him how great it was.
Brian explains to us what his company does, the problem that existed for his company, and how he solved it with containers. That’s really how you should approach technology. You have a problem, you look for a solution – new or old technology, and you apply it. Don’t find a technology and then look for a problem – we’ve said it before and now we’re writing it down.
Listen to this episode to learn more about Container technology, Brian, Adgorithmics, and living in Japan. We hope you enjoy and please send us a note, give us a review, or send us a tweet. Until next time …
Economists do Data Science Differently w/ Dr. Peter Lorentzen
Apr 08, 2019
If all you know about economics is ‘supply and demand’, you may be in for a good awakening with our next interviewee. Dr. Peter Lorentzen is a professor in the Economics Department at the University of San Francisco and he is starting a new graduate program in Applied Economics.
With the recent teams of economists that firms like Uber, Apple, and other Silicon Valley employers are hiring, we wanted to get more details. After listening to Peter talk to us about the vision for his program, we fully understand the motivation of these companies. These companies are not the first to hire economists; Bell Labs has done this previously. According to Peter, the requirements to obtain an economics degree have not changed on the way in and USF is trying to change the skill sets economists get on the way out.
If you’re interested in reading more, you should also read these articles:
1. Why Tech Companies Hire So Many Economists – 2019 https://hbr.org/2019/02/why-tech-companies-hire-so-many-economists
In 2011, Marc Andreessen wrote a fantastic article, Why Software Is Eating The World. This motivated our title for this podcast. In this podcast we talk about software bugs that have fatal or major financial consequences, and explore why these bugs happen so frequently.
Which software are you worried about in your life? Your car? Your plane? Your money stored in digital form? Were you worried about Y2K or thinking about the Epoch Bug?
Public Service Announcement: If you haven’t read Marc’s article, please do.
GDPR, huh, what is it good for? Absolutely something?
Mar 25, 2019
If you’re outside of the EU, you may not know about GDPR. In this episode, we invite our friend Maria to discuss the General Data Protection Regulation. We discuss the What? Who? When? Where? We also talk about the business impact of these rules, what it means to implement them, and what is still to come. If you’re in the USA, you should know that some rules are coming to California, so we should all think about the importance of these rules – as individuals and as business owners.
Natural Language Processing with Prem Ganeshkumar
Mar 11, 2019
How do you make a computer program understand a language, a natural language, like English? Can we create a program that can create sentences, paragraphs, articles, or stories?
In this episode, we explore the basics of Natural Language Processing from Prem Ganeshkumar who is a Lead Natural Language Processing Research Engineer at Agolo in New York City.
After this episode, you should be able to pass our Data Science Imposters Natural Language Processing Quiz – https://docs.google.com/forms/d/e/1FAIpQLSdmgFmnoLfyxfu0tc_xjk4HS4u3ZxaAi4X-XxuoYfABLPkWmg/viewform
Fight or Flight?
Mar 04, 2019
At Antonio’s latest visit to the Motor Vehicle Commission, he finds out two things. First, sometimes it’s hard to change data once a mistake occurs. Second, he will take flight before a fight.
We then talk about Amazon and the second headquarters that never was. Apparently, Amazon feels like Antonio – they also took flight.
With or without Amazon, NYC’s tech scene continues to flourish.
An interview with Dr. Noemi Derzsy
Feb 19, 2019
Dr. Noemi Derzsy is currently a Senior Inventive Scientist at AT&T Labs within the Data Science and AI Research organization. In addition to getting to know what her position entailed, we wanted to know how she got there (especially given her Physics background), how she thought about the data science discipline, and where she saw the future of the field.
Dr. Derzsy gives us her take on data engineering vs. data science teams. Antonio and Jordy seemed to agree. She gives advice to those looking for opportunities in the field. She uses her words more carefully than we do and used the term ‘geographically isolated’ for one of the places that we three had in common. We could not agree more with her assessment.
We are in awe with how much Dr. Derzsy has done in the field thus far and will certainly keep tuned for newer developments.
Ep. 49 – Interview with Dr. Haftan Eckholdt – Chief Data Officer & Chief Science Officer at Understood.org
Feb 01, 2019
In this episode, we interview Dr. Haftan Eckholdt – Chief Data Officer & Chief Science Officer at Understood.org. Understood’s goal is to help the millions of parents whose children, ages 3–20, are struggling with learning and attention issues.
Haftan takes us through his journey from rigging computers for distributed computing to generating models associated with the insurance industry and then onto technology companies (including Plated and Audible).
Haftan has plenty of ideas about how to integrate data and technology in his latest endeavor. He discusses direct and indirect data and examples of how data can be leveraged in positive ways. We certainly look forward to hearing about his success.
Cheap Noodles, Beers, and Bayesian Econometrics with Jim Savage
Jan 17, 2019
Jim Savage is the Head of Data Science at @lendableinc. He joins us on the show to take us to school about statistics, his company, the work he’s been doing, and cheap noodles.
Jordy and Antonio have Jim make sense of this blog post for them: https://khakieconomics.github.io/2017/01/01/Building-useful-models-for-industry.html
Along the way, we learn how Jim’s life really revolves around pizza. (Please note that he had pizza for dinner). He also talks about commitment and how that translates to short term goals (30 days) and long term goals (6 years).
We made a new friend and we hope you enjoy the conversation as much as we did.
S.O.S. – 5G is coming
Jan 08, 2019
As data science imposters, we get to think about the history of things that came before 5G or whatever new whizzbang technology. In this episode, we talk about the origins of Morse code and related items.
Did you know that you could play songs using an old phone? Check out this site for some ideas: http://www.yak.net/carmen/phone_songs.html
We are excited about some of our future interviewees, we’d love to hear from you – rate us, like us, and keep listening.
Happy Holidays!
Dec 25, 2018
Holiday wishes from the DSI crew.
Logic, data, beers, and free talk
Dec 17, 2018
Fueled by a few cold IPA beers, a microphone, and a quiet room provided by a friend of the show – the Data Science Imposters grab the mics and have a good conversation. Join the conversation on Twitter @dsimposters
How much would you pay for the first auctioned AI-generated portrait? Make your own with GANs
Nov 26, 2018
Can a computer be artistic? If so, how valuable can their pieces of art become? The Obvious Group certainly thinks so and they have made their first sale of a computer generated portrait for $432,000. They’ve done this by doing some great promotion but also leveraging code published on the internet built on GANs (generative adversarial networks).
No.43 – Prisoner’s Dilemma and the Nash Equilibrium
Oct 29, 2018
Have you seen the movie A Beautiful Mind? Have you ever faced jail time unless you cooperated against an accomplice?
Well, let’s focus on the movie first – there’s a scene where five guys, one of them being John Nash, are at a bar when a group of women walks in …
In the movie, the focus of the men is on the blonde – one of the women.
They then go on to quote Adam Smith – “In competition, individual action serves the common good” – we will return to that in a second. Here’s a bit about Adam Smith from Adamsmith.org if you have never heard of him – ‘Adam Smith (1723-1790) was a Scottish philosopher and economist who is best known as the author of An Inquiry into the Nature and Causes of the Wealth Of Nations (1776)…’
In the movie, John Nash claims that Adam Smith needs revision. He says it’s what’s best for the individual and the group. He then lays out that if they all went for the blonde, they’d all get rejected and having a chance with her friends would then be impossible since no one wants to be second choice.
So, is the movie trying to depict the Nash equilibrium and if so does it do so correctly?
Okay, that’s not fair – read the definition from Wikipedia, ‘In game theory, the Nash equilibrium, named after the late mathematician John Forbes Nash Jr., is a proposed solution of a non-cooperative game involving two or more players in which each player is assumed to know the equilibrium strategies of the other players, and no player has anything to gain by changing only their own strategy.’
What does that mean to you?
Okay, let’s see if this helps. The prisoner’s dilemma –
There are two people arrested for a crime – let’s say Antonio and Jordy. You’ll normally see this example in a table format so imagine this in your head.
Jordy has two choices and Antonio has two choices. Keep quiet or betray their partner. The person that confesses against their partner gets out scot-free. If they both confess, they both get five years. If one confesses and the other doesn’t, then the one who kept quiet gets 10 years and the one who confessed gets nothing.
So, what do you do?
Nash showed for the first time in his dissertation, Non-cooperative games (1950), that Nash equilibria must exist for all finite games with any number of players. Only 27 pages!!!
I like to think of it as the point when all players have played their best strategy knowing what other players can do.
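Here is the table from your head written out as a small sketch that checks every pair of choices for a Nash equilibrium, meaning neither prisoner can shorten their own sentence by changing only their own choice. The 5, 10, and 0 year sentences come from the notes above. The notes never say what happens if both keep quiet, so the 1 year each used here is purely our assumption, as are the names in the code.

```python
from itertools import product

QUIET, BETRAY = "quiet", "betray"
CHOICES = (QUIET, BETRAY)

# YEARS[(antonio_choice, jordy_choice)] = (antonio_years, jordy_years)
YEARS = {
    (QUIET, QUIET): (1, 1),     # ASSUMPTION: not stated in the notes
    (QUIET, BETRAY): (10, 0),   # Antonio keeps quiet, Jordy confesses
    (BETRAY, QUIET): (0, 10),   # Antonio confesses, Jordy keeps quiet
    (BETRAY, BETRAY): (5, 5),   # both confess
}

def is_nash_equilibrium(antonio, jordy):
    """True if neither player gets fewer years by switching alone."""
    a_years, j_years = YEARS[(antonio, jordy)]
    antonio_stuck = all(YEARS[(alt, jordy)][0] >= a_years for alt in CHOICES)
    jordy_stuck = all(YEARS[(antonio, alt)][1] >= j_years for alt in CHOICES)
    return antonio_stuck and jordy_stuck

equilibria = [pair for pair in product(CHOICES, repeat=2)
              if is_nash_equilibrium(*pair)]
print(equilibria)
```

With these sentences the only equilibrium is both betraying, even though both keeping quiet would leave each of them better off. That gap is the dilemma.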
Adam Smith does write ‘It is not from the benevolence of the butcher, the brewer, or the baker that we expect our dinner, but from their regard to their own interest.’ – Wealth of Nations
Cooperative vs. non cooperative games – in cooperative games there is some sort of binding agreement.
So, let’s return to the Beautiful Mind movie. What scenario would be a Nash Equilibrium?
#gametheory
Entrepreneurship and Deep Space with Jake Kramer of Fed Tech
Oct 22, 2018
Jake Kramer joins Jordy and Antonio to display his love for deep space, give background and context of his company, share his thoughts on entrepreneurship, and wow them with some of the technologies he’s been able to explore. Jake Kramer is a managing partner at Fed Tech, a unique private venture program, funded by federal agencies and corporate partners, that connects entrepreneurs to technologies developed across DoD, NASA, DOE and other laboratories.
Jake received his MBA in Entrepreneurial Management from the Wharton School of the University of Pennsylvania and his BBA in Computer Information Systems from Hofstra University.
Most importantly, Jake listens to Data Science Imposters and you should too.
Check out Fed Tech’s website – https://www.fedtech.io
Summer Break is Over! We’re Back!
Sep 24, 2018
Jordy and Antonio catch up after a long hiatus. Come catch up with us.
Jordy even commits to reading, nay listening to, a book for our next episode. Antonio discusses the latest book that he finished, ‘When’ by Daniel Pink.
Welcome back, we’ve missed you.
Reading Rainbowish Episode.
Apr 22, 2018
The data science imposters have been doing these book reviews for a while now and Jordy was finally able to find time to read and discuss a book. We are all starting to doubt he was ever on the show Reading Rainbow.
Jordy tells us about:
Guns, Germs, and Steel: The Fates of Human Societies by Jared Diamond
Antonio and Jordy talk commuting in the NYC metro area; that’s where all the reading time comes in.
Antonio discusses:
Black Edge – Inside Information, Dirty Money, and the Quest to Bring Down the Most Wanted Man on Wall Street.
Web of data, companies, private information, Facebook, and those friends of yours that signed up for that questionnaire
Apr 09, 2018
In this episode, we discuss what we have found out about the Facebook ‘data breach’, Cambridge Analytica, and the connections from companies to people to political parties to Facebook and ultimately to you (or one of your friends).
Education, Technology, and Analytics with Josh Powe
Mar 27, 2018
Josh Powe, CEO and co-founder of LinkIt!, joins us this week and he takes us back to school (I know, bad dad joke). He gives us an understanding of his industry, how technology has evolved in K-12 education, and he tells us what we may expect to see in the coming years.
LinkIt! is a robust K-12 assessment platform with powerful reporting and longitudinal data analysis tools.
The name of the IBM machine that beat Ken Jennings on live TV
Mar 19, 2018
‘Who is Watson? ‘ … That’s correct for 200. In this episode, we talk about Watson and Jeopardy as we review “Final Jeopardy: Man vs. Machine and the Quest to Know Everything” by Stephen Baker. We also have some fun by throwing in a few Jeopardy answers throughout the show.
Before we get into that, we talk about March Madness!!! It may be too late for this year, but next year we need to get those data science models in place to make our bracket picks. Is the stock market easier to predict than the NCAA championship?
Antonio finds this article which explains information gain and entropy but has a hard time explaining it to Jordy. Check out the article and let us know if you could do better: saedsayad.com/decision_tree.htm
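Since Antonio had a hard time explaining it, here is our attempt as a minimal sketch. Entropy measures how mixed a set of yes/no labels is, and information gain is how much that entropy drops after you split the data on some attribute. The counts below (9 yes and 5 no, split three ways) are illustrative numbers we picked, and the function names are our own.

```python
import math

def entropy(counts):
    """Shannon entropy in bits for a list of class counts, e.g. [yes, no]."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child)
                   for child in child_counts_list)
    return entropy(parent_counts) - weighted

# Illustrative data: 14 rows, 9 labeled yes and 5 labeled no.
parent = [9, 5]
# Splitting on a three-valued attribute yields three groups of [yes, no] counts.
children = [[2, 3], [4, 0], [3, 2]]

gain = information_gain(parent, children)
print(f"parent entropy = {entropy(parent):.3f} bits, gain = {gain:.3f} bits")
```

A decision tree builder would compute this gain for every candidate attribute and split on the one with the largest value. The middle group here is all yes, so it contributes zero entropy, which is a big part of why this split helps.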
Entrepreneurship has no roadmap – Interview with Jorge Nuñez
Feb 25, 2018
Jorge Nuñez is an entrepreneur, investor and as Forbes puts it, an “idea man”. He shares his story of starting his company Remote Reactivation, from early days in event promotions to becoming a player in the world of dentistry. Entrepreneurship has no roadmap and Jorge combines technology, data, and his perseverance to succeed in an otherwise neglected field.
He leverages relationships, and the benefits of networking. He describes keys to his success, such as understanding who the decision maker is, his belief in the 80-20 rule, and leveraging data to improve his client’s work-life balance.
Story time: ATMs Spitting Cash, Crypto Paradise in Puerto Rico and Uber Making Deals
Feb 12, 2018
Stories about ATMs Spitting Cash, Crypto Paradise in Puerto Rico and Uber Making Deals
Antonio reminisces about his high school days and attending 2600 meetings at the Citi Corp building while discussing the ATM hacks or jackpotting.
Did you know the “whistle” provided in Cap’n Crunch cereal boxes could be used to make phone calls?
Did Antonio find a glitch in the Starbucks Card method?
…
The last couple of days have been tough on Crypto Currency, but that didn’t stop the NYTimes from publishing some information on folks trying to make Puerto Rico a Crypto Utopia.
Are these Crypto Currency folks going to be the Rockefellers of our time?
…
Did Uber make the right call? They made a deal with a Hacker, and are paying the price for it. Maybe they should have just done a “bug bounty”
Can you read and understand this better than Alibaba or Microsoft’s AI? If you get sick will you call 311 or Yelp first?
Jan 28, 2018
Microsoft outscores Alibaba which outscores humans in reading and answering questions! Well, the truth is that humans also outscore Microsoft and Alibaba. It depends if you are using the Exact Match (EM) or F1-score scoring methodology. While we discuss some of the technical aspects of this, we do not lose sight of the socio-economic impact that these technologies can have on society.
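To show how the two scoring methods can disagree, here is a simplified sketch of Exact Match and token-level F1 for a single answer. It only lowercases and splits on whitespace, which is our simplification; real evaluation scripts for reading comprehension benchmarks typically normalize answers further. The function names and the example answer are made up.

```python
from collections import Counter

def tokens(text):
    """Very simple normalization: lowercase and split on whitespace."""
    return text.lower().split()

def exact_match(prediction, truth):
    """1.0 if the normalized answers are identical, otherwise 0.0."""
    return float(tokens(prediction) == tokens(truth))

def f1_score(prediction, truth):
    """Harmonic mean of token precision and recall against the true answer."""
    pred, gold = tokens(prediction), tokens(truth)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

prediction, truth = "the Eiffel Tower", "Eiffel Tower"
em = exact_match(prediction, truth)
f1 = f1_score(prediction, truth)
print(f"EM = {em}, F1 = {f1:.2f}")
```

The prediction contains the whole right answer plus one extra word, so Exact Match scores it as a miss while F1 gives it 0.8. A ranking of systems (or humans) can flip depending on which number you look at.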
In data science, we often try to use other data to gain more insight into a particular problem or situation. In our second segment, we spend some time exploring an article where they use Yelp as a proxy for identifying food borne illnesses in NYC.
Rebooting … Same purpose, different format
Jan 15, 2018
We start the new year with a book review of Peter Thiel’s Zero to One. As always, we add some levity by reading the 1-star reviews on Goodreads. We ask ourselves if there’s a Zero to One idea within Data Science. We come up short. Antonio loves libraries and defends them as one of the best institutions in our country. Do you agree? We finish the episode with a discussion of David Robinson’s blog post shared by one of our listeners – thanks Diane!
Episode has been archived here: http://traffic.libsyn.com/datascienceimposters/DSI_Episode_33.mp3
Are you misbehaving?
Dec 10, 2017
Imagine that you go into a sportswear store to buy a snowboard to join your friends for a trip in Vermont. You see a used snowboard for $100 and you know the salesperson from high school and they tell you that you can find the exact same thing 15 minutes down the road for $30 less. What would you do? Imagine if the snowboard is $500 instead of $100; would you drive 15 minutes to save $30? Antonio spent one flight from Dublin to New York and the following week finishing ‘Misbehaving’ by Richard Thaler. Antonio shares what he gained with Jordy.
Building a Machine Learning Platform – Interview with Dr. David Purdy
Nov 26, 2017
We start this episode addressing the ‘disease’ (our words, not his) of Impatient Data Science and how to cure it with a platform. We are excited to have David share his thoughts and practical experience building machine learning platforms.
Dr. David Purdy is currently a Senior Data Science Manager at Uber. Most recently, David architected Uber’s Machine Learning Platform and its real-time spatiotemporal forecasting platform which are the basis for driving Uber’s competitive advantage. Throughout his career, David has led the architecture of five such platforms. David holds a PhD in Statistics from UC Berkeley, and his career in data science and machine learning spans multiple industries including: finance, personalized medicine, transportation, and web search.
One idea that resonates well with us is the thought that you go from zero to something and iterate between something and the nth somethings until you get it right.
If you are interested in more details about Uber’s Michelangelo Machine Learning Platform, you can visit here: https://eng.uber.com/michelangelo/ In addition, you can send us any questions that you may have.
We invite a friend to join us this week (actually two weeks ago) to relive some of our favorite episodes. We invite you to join this casual conversation.
Next week we are excited to have a special guest on the show to discuss his experience building Machine Learning Platforms.
We got an IDEA, actually we got lots of ideas – Part II @RPI
Oct 26, 2017
In this episode, Dr. Bennett takes us back to school and teaches us a few things about machine learning, artificial intelligence, data analytics, and visualization. Along the way, we discuss how to incorporate teaching of these topics in colleges and high schools and some of the moral issues that may arise with artificial intelligence.
‘Dr. Kristin Bennett is the Associate Director of the Institute for Data Exploration and Application and a Professor in the Mathematical Sciences and Computer Science Departments at Rensselaer Polytechnic Institute. Her research focuses on extracting information from data using novel predictive or descriptive mathematical models and data visualizations … to support decision making … in science, engineering, public health and business. She has 25 years of experience and over 100 publications in these areas.’ Read more about Dr. Bennett here.
On the road: @RPI Homecoming 2017 (Part I)
Oct 22, 2017
We return to Rensselaer Polytechnic Institute (RPI) for the 2017 Homecoming Weekend. We share our experience and reminisce about good ol’ RPI. This episode is less structured and less ‘data-sciencey’ than most of our other episodes. We hope you enjoy this casual episode and tune back to Part II when we jump back into the depths of data science …
Here’s to old R.P.I. Her fame may never die, Here’s to old Rensselaer, she stands today without a peer, Here’s to those olden days, Here’s to those golden days, Here’s to the friends, we made at dear old R.P.I.
Alice and Bob have a secret
Oct 16, 2017
Alice and Bob have a secret that they want to share but they don’t want you or anyone else to know what it is. In this episode we talk about cryptography at a high-level. We touch on symmetry, hashing, and even steganography (not to be confused with calligraphy). How would you hide your secret? Is there a hidden message in this message?
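For anyone who wants to try the hashing idea at home, here is a minimal sketch using Python’s standard `hashlib` module. The helper name `fingerprint` and the sample messages are our own illustrations, not something from the episode. It shows that the same message always gives the same digest, while a one-character change gives a different one.

```python
import hashlib

def fingerprint(message: str) -> str:
    """Return the SHA-256 digest of a message as 64 hex characters."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

# Hypothetical messages, used only for illustration.
alice = fingerprint("Meet me at noon")
bob = fingerprint("Meet me at noon")
eve = fingerprint("Meet me at noon!")

print(alice == bob)  # same message, same fingerprint
print(alice == eve)  # one extra character changes the digest
```

Note that a hash is one-way: it lets Bob check that a message was not altered, but it does not hide the message the way encryption does.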
The 7 things we had to discuss this week with @MickeyGarciaD
Oct 09, 2017
This week we are joined by Miguel Garcia. Miguel is a friend of the show and an avid listener. It’s good that he joined us since we had lots of topics to discuss including:
The Google Machine Learning tool – https://teachablemachine.withgoogle.com/
Language processing and Forensic Linguistics – Have your cake and eat it too – https://en.m.wikipedia.org/wiki/Manhunt:_Unabomber
The Trolley Problem and Driverless Cars – Inspired by Radiolab episode – check it out – http://www.radiolab.org/story/driverless-dilemma/
Tattoos that show dehydration and Knockout Bands – https://news.harvard.edu/gazette/story/2017/09/harvard-researchers-help-develop-smart-tattoos/
Pixel Head Phones – Live Action Translation – Here’s Google’s page: https://store.google.com/us/product/google_pixel_buds?hl=en-US
A little bit about our guest this week …
Miguel is a data analytics young professional from Puerto Rico currently working with Looker in NYC. Prior to this he worked at Etsy where he helped start the finance data analytics team and as an analyst at Goldman Sachs. He enjoys learning about data pipelines and data science workflows. He likes to spend his spare time cycling, hiking, and cooking. Find him on twitter as @MickeyGarciaD.
Programmer’s Paradise (in OOP)
Oct 01, 2017
As I walk through the valley of the shadow of code
I take a look at my programming style and realize there’s none of it ‘Cause I’ve been copyin’ and pastin’ so long That even my momma thinks that my mind is gone But I ain’t never crossed a problem that didn’t deserve it Me be treated like a newbie, you know that’s unheard of
This week, I get as nerdy as I can without losing half of my audience. Jordy seemed exhausted after this conversation. Let us know how you think we did.
Your credit score and identity may be in danger.
Sep 25, 2017
Jordy and Antonio meet to discuss the massive data breach at Equifax. In the United States and abroad, Equifax houses some of your most personal information allowing you to be recognized as a financially credible person to banks, companies, and others. With a data breach of this magnitude, what should you do? What happens next? We leave a question for you – is there a better way?
Catching a Fraud
Sep 14, 2017
In this episode, we learn how insurance companies are using a topic we talked about before – graphs – to determine which car insurance claims appear to be fraudulent. We dive into the paper, ‘An expert system for detecting automobile insurance fraud using social network analysis’, published by Lovro Šubelj, Štefan Furlan, and Marko Bajec, available at https://arxiv.org/pdf/1104.3904.pdf
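To make the graph idea concrete, here is a minimal sketch of how collisions could be turned into a network of people. This is our own toy illustration, not the model from the paper: the collision IDs, the names, and the choice to connect every pair of people who shared a collision are all assumptions. Each person is a single node no matter how many collisions they appear in, so repeat participants and repeat pairs stand out.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each collision lists the people involved.
collisions = {
    "c1": ["ana", "ben", "cara"],
    "c2": ["ana", "dan"],
    "c3": ["ana", "ben", "eli"],
    "c4": ["fay", "gus"],
}

def build_network(collisions):
    """Link every pair of people who shared a collision.

    Returns (edges, counts): edges maps a sorted pair of people to the
    number of collisions they shared; counts maps a person to the number
    of collisions they were in.
    """
    edge_weight = defaultdict(int)
    collision_count = defaultdict(int)
    for people in collisions.values():
        for person in people:
            collision_count[person] += 1
        for a, b in combinations(sorted(people), 2):
            edge_weight[(a, b)] += 1
    return dict(edge_weight), dict(collision_count)

edges, counts = build_network(collisions)

# People in three or more collisions, and pairs who collided together twice or more.
repeat_people = sorted(p for p, n in counts.items() if n >= 3)
repeat_pairs = sorted(pair for pair, w in edges.items() if w >= 2)
print(repeat_people)
print(repeat_pairs)
```

The thresholds (three collisions, two shared collisions) are arbitrary; in practice an expert would decide what counts as suspicious.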
Episode transcript below
Note: This is an automated transcript and still needs a human touch.
It’s orchestrated fraud. It’s all about it. And the more people you have in a collusion scheme, you know, in a scheme like that, the less likely it is to happen, right? If 100,000 people are in this scheme then you say, well, is that really a scheme or part of a, part of it? Yeah. I’m Antonio, and I’m Jordy, and we’re the Data Science Imposters. Jordy, one of the things we told our listeners we would do is take difficult topics, take these complex areas, and break them down to a point where they are understandable to people: not only, you know, people who may not have the technical details, but maybe people who could be data scientists who work in this field but don’t deal with certain technologies or don’t deal with certain algorithms. Oh great, that’s one of the beauties of this podcast. We try to break it down to a point where you can have a conversation over it. So one of the sites that I recently stumbled upon, and maybe not so recently, was arXiv, so arxiv.org. Okay, okay. And what it is, it seems like it’s a repository of a number of research papers on all different types of topics. And one of the things that I look for is, let’s see what they have in machine learning. Sure. And I remember in Episode 8 we talked a little bit about graphs and networks, so that’s going to come up.
And a few new concepts come up, but the title of this article that I found is ‘An expert system for detecting automobile insurance fraud using social network analysis’. That sounds cool. Right. And they start off with the fact that, look, insurance is not like a sexy topic, right? No, they’re not. It’s, you know, one of the things with the content and then the probabilities and statistics. But the fact is that the insurance business is like a trillion-dollar industry. Everyone needs insurance, even if you aren’t getting the lot you need insurance. The house needs insurance. Yeah. And I think the biggest problem with something like fraud is that it affects all of us. Yes, yes. If there’s fraud then, you know, your particular claim or what you’ve been paying on a monthly basis may go up because somewhere down the line they had to pay a claim of X millions of dollars. And this paper is by three people in Slovenia: Lovro, Štefan, and Marko. Again, you can go to arxiv.org. The paper number is 1104.3904.pdf.
Just in case, I’m just giving a full reference here. So the first thing I think about this is when they say an expert system. When I think about an expert system I’m thinking about a couple of things, but I want to put it on you, Jordy, to maybe give me an idea: when you hear expert system, what do you think about? Expert system or expert individuals? So expert system, I don’t know, because it’s so vague. When I hear expert system I don’t hear in what field you are an expert system. So I’m not hearing that, right. So I would assume they’re just, let’s say if there was a field that they, you know what, I don’t know. Right. So before the machine learning and AI craze everything was either an expert system or a decision support system. People used those terms interchangeably. And also the idea of a rules-based system. So if you think about a rules-based system, what do you think? Well, you’re setting some parameters, and it can only make decisions based on what the parameters are set to, and those parameters are usually set by experts in whatever field it is. Got it. And that’s why the idea became an expert system. The people who are setting these parameters are known experts, so expert system just means that this has been written by a person from the field who knows what they’re talking about, and calibrated and configured by them. It’s that idea that you have maybe a human element, someone tweaking these knobs and getting it right. So, ‘detecting automobile insurance fraud using social network analysis’, that’s what I want to dive into. So, social network analysis. You remember Episode 8, I think we talked a little bit about Dijkstra’s algorithm. A graph, or a social network, or a network, is basically when you have components that you somehow relate to each other. Gotcha. Let’s say, Jordy, we work on the Data Science Imposters podcast together.
Jordy, you also work at a, you know, a certain, you went to high school, and so you know certain people from that high school. I’m connected to those people by the fact that I’m connected to you. And then, you know, we can build relationships between different things using what we call networks or graphs. You don’t have to only do it by a relationship between people, like something Facebook would do. And that is what’s called a social network. You could also do it by relationships between web pages. And that’s what Google is famous for. Yeah, we did think of the number, so it was like a phone book example, like an address book kind of thing. Right. Right. OK. So social network analysis being the fact that, look, people are connected. And when you have an automobile claim, right, let’s say you do have an accident, you have a legitimate accident. What happens, right? What’s happening in that accident? Well, we’ve all been there, right? So there’s an accident. And hopefully everyone’s OK. Everyone, you know, the drivers get out of the vehicle, they assess the damage. They probably pull over and they exchange insurance information, and if they deem it necessary they’ll call the police so that you can file a police report. Everyone gets a copy. Everyone goes on their merry way. The next step, though, is you call your insurance company and let them know what’s happened. You give them your information and the information of the person you were with. They’re probably just going to ask who carries insurance, and you give them a report. Right. And so if you were taken out.
Let’s not even talk about the social network aspect of it, or the fact that people are connected in some way here. I’ll talk about that in a second. But you have some data associated to this accident. Yes. That data being things like time, location. OK. Were you hit in the back? Were you hit in the front, right, or from the side? The number of passengers, the cars you were in, the weather. Yeah, the weather is important. The city or state you were in. Were there witnesses? Did the police show up? Did the police issue a summons or a traffic violation? Maybe someone did an illegal move. Right. Yeah. So all of these attributes, and one of the things the paper discusses is that you do have a lot of systems that use all of these attributes to make predictions about whether something looks anomalous, right, something is out of the ordinary. Right. But what you don’t have, and you know it’s interesting in the paper how they label it, is the social network that you get by connecting the fact that you have two drivers. Let’s say it’s a two-car accident. You have two drivers that are now connected in some way. Right. They were in an accident together. But not only do you have the drivers. You also have to connect the passengers. OK. Right. The passengers were also in this accident together. All right. So if I now connect, and this is actually an interesting problem, because let’s say I have one car with three people.
My first car has three people and my second car has more people. All of these people were in an accident together. Right. So technically I can connect them all to each other, or try to. Yeah. Well, no, we can say they were in one event together. They are connected by this to that following. Right. So for the people themselves, the connection is the event. Got it. Right. Or I can say the drivers should be connected directly, and then the passengers of each car should be connected to the drivers of those cars. Right. Or I could say, really, I don’t care about the passengers. So even when we’re thinking about this graph, we can model this graph, we can draw it, in so many different ways. And so in this paper, one of the things that you see, if you do get a chance to read it, is that they talk a little bit about how you would model this scenario. Would you actually put the drivers together? Would you put the passengers together? I’d be curious to know, and I don’t know if they talk about it in the paper, what would instigate this type of investigation. Was there a time where insurance companies were figuring out that, you know, there is fraud, like the driver of car A and the passenger of car B are maybe cousins or high school classmates or some sort of. And they go, listen, these two drivers aren’t related, so, all right, pay out the claim. But they found out through back channels that there was some collusion that may have occurred.
Was that happening? Yeah, I think so. So the fact is that what they were looking for were rings of insurance fraud, because insurance fraud doesn’t only happen in isolation. It has to happen with a number of people. And I think one of the things we’re seeing is that you would have a scenario where it wouldn’t be, you know, one driver in each car, because then you wouldn’t have a witness. Got it. And so what I mean, that’s right, so you have these witnesses who are quote unquote independent, you know. So the fact that collusion may happen, right, when we think about collusion, the fact that someone else is saying something that is not true, you know, we think, all right, the likelihood of that is lower in a normal environment. Right. Collusion is one of those things where it’s sort of orchestrated fraud. It’s all about it. And the more people you have in a collusion scheme, you know, in a scheme like that, the less likely it is to happen, right? If 100,000 people are in the scheme then you say, well, is that really a scheme or part of a, part of the. Yeah, yeah. You know, all you need is one whistleblower to say hey, hey. Right. Which is why, you know, this is funny and way off track, it’s like Ocean’s 11. It’s like there’s 11 people involved. No one’s going to snitch on anyone else. It’s rare, right? Right.
If you’re going to commit a crime you’ve got to keep your circle of trust as small as you can. All right, well, this is not How to Commit a Crime 101. No no no no no. But it’s funny, interesting. So in their paper they show the representation of the network. They show how it’s represented, and that’s pretty interesting. And they talk a little bit about the intention. So when you represent something in these graphs you have to sort of show the intention, you direct the graph in a certain way. Now, when they talk about intention, one of the things they did is that they showed who was at fault. Yeah. So if one driver was at fault or the other driver was at fault they try to represent that in the graph. Yeah. So that part of it, you can think about it, it’s not an expert system. This is a computer algorithm developing these graphs, helping to get these things set up. How did they get this information? I’ve been in a couple of car accidents, nothing too major, and correct me if I’m wrong, I just don’t remember handing over the information of my passengers that were with me at the time. Yeah. So that, well, it’s a number of things, right. There might be the police report, you may have a witness list. They did get this information, and they talk about the source of this information at the bottom of this paper, so let me just go to that. While you’re scrolling, I mean, think back. We’re not off base, right?
I mean, you’ve never had to give that information. I’m lucky that I have not been in an accident in a very long time, and so I’m looking for that. I don’t recall having to. I don’t recall the last time that I had an accident where it went to insurance; you kind of handle it on the spot. So the data that they use is data from between 1999 and 2008 collected in Slovenia. Wow. Yeah. And it consisted of three thousand four hundred fifty-one participants involved in fifteen hundred collisions. And so it’s not only one data set, it’s two data sets where they combine the data, and basically they had labeled data. So they used some, you know, some data that they knew was fraudulent. They were using some supervised learning techniques here, right. For some data, some people, it was determined through investigation that it was fraudulent, and they were trying to determine, OK, these are the characteristics of the fraudulent data. Let’s see how much of the non-fraudulent data is in fact fraudulent. Right. So that’s usually what you do in these scenarios. So what they talk about here, how I’ve read this, is that they formed these networks, right.
And then the next thing that they do with these networks is, all right, you form these networks that already existed out of all of your data, and then based on characteristics of what happened during that day you have an expert who will go in and say, yes, this seems like, let’s say the fact that it was at three o’clock in the morning, this seems maybe to have some impact on being more fraudulent or less fraudulent. So they would decide yes or no, this could be associated to fraudulent activity, and they probably scrutinize it a little more. Right. And so they would take all of these characteristics around each claim, right, around each accident. Right. And then they would basically apply that and see, well, where else did we see these types of scenarios? And that’s usually what you do using these algorithms. You have some use case, you have some ideas where you know that something was fraud, you know that something was a bad actor, you know it was anything like this, and then you go ahead and you say, well, let me extrapolate that information to my bigger subset and see opportunities where I have not decided that this is fraud, but maybe it was, and we should look back into it. Gotcha, gotcha. And that’s a normal use case, that’s a normal pattern that you will see when dealing with data. And what programming strategy does one use to go about a study like this? By programming strategy you mean, like, we’ve spoken about so many different ones, you know, like what was the one that was one of my favorites? I’m terrible with the names and I should have written this down when I walked in here. Was that the genetic algorithms, right? Yeah, that’s just a strategy, and then obviously there’s machine learning, which is the encompassing one. And then there’s a couple, right. Right. Yeah.
So for this one they, you know, they used, I think, and again I skimmed through this article because the title is great, and the descriptions. But once you get to the mathematics, and this is where we hopefully bridge that gap for people. Yeah, like I skimmed through some of it because sometimes it’s just more than I need. It’s not theoretical, but sometimes mathematics draws us away because it’s just like all these random symbols and all of these random things going on. But essentially they use something like Google’s PageRank, which gives them some indication of which of these things is most likely to be insurance fraud. Gotcha, gotcha. And really just giving things weights, and with the probability of those other factors all weighed into this network. How do you get to this model? Like, obviously this is their own model, right? When they do research like this, usually what they’re doing is they’re expanding the existing model which is accessing Mahto, but they wanted to take it a step further and use this graph analysis to caution. Interesting stuff. So, you know, I think it’s pretty interesting because it combines what we had talked about before with the graphs. It combines the fact that we had talked a little bit about, that there are these systems that don’t work alone just by using machine learning. You know, it’s a combination. It’s humans deciding whether something looks fraudulent or not. It’s using historical data that we’ve labeled as fraudulent or not fraudulent.
It is using this social network analysis, and it’s combining these things together, and I think when you think about the iterations of this technology and the approach, you can only imagine that the next step is, as you get more data, that some of the factors that the humans were sort of indicating now become more and more robust and you have a machine learning algorithm helping to guide that decision. Yeah. And to your point, you always need that human element to make this work, and you need that data and the historical experience to just make these algorithms work and do their job, which is pretty awesome. In particular I’m curious if this report has caused any reaction in the insurance industry to change their ways of how they look at things. I think, because, you know, it’s a research paper, and I think there’s a lot of research like this being done, I think that some of the insurance companies around the world have their own similar models and they’re all looking for a different way to approach the problem. Sure. And so sometimes this approach may be better and sometimes it might not be. So with the insurance companies, I think one of the things that they do have is a lot of data, they have a lot of cases where this can be applicable to them, and most of the time they’re not looking for that, you know, that final solution. They’re just looking for a smaller subset to look at. Yeah.
Because if you have an algorithm, if you have a technique that just says, hey, look at 100 insurance claims instead of the ten thousand that are on your desk to look for fraudulent activity, and do a better job looking at these 100, then that’s a significant improvement overall for everyone. Yeah, but there are still those nine thousand nine hundred that don’t get the attention they may warrant, right? And that might be fine or it might not be fine. Right. Because, you know, one of the things that I’ve had to deal with is getting people to recognize that, OK, if you gave me a thousand things to do and I skim, you know. Yeah. All the 1000 won’t. Why don’t you just give me a hundred things to correct and I’ll do a better job at those? And what I can do going forward is, let’s say we learned something from those 100 because we paid closer attention. We can build that back into the model, we build that back into the chain, so that next time we don’t miss those and next time we do a better job. And when I advocate using machine learning, that’s how I advocate. I advocate in the loop, I advocate iterating the process to make it better. That makes sense. All right, well, look, that’s really what we wanted to talk about today. It’s a paper on insurance fraud using social network analysis. I found that interesting. I will not lie to you, I had to read it a number of times and have a few cups of coffee to get through some of the symbology associated to the mathematics behind it.
But it is interesting to think about how they had to do some modeling, how they used some humans, and how they embedded expert systems into this paper. Yep, and I’m sure, you know, doing this research they’ve discovered ways to further look into it or give advice to insurance companies to look at fraud. I’m sure they look at things like how many times this person has been in an accident, and now that gets added on to other things to look at. Right. Right. So that becomes part of it. And that’s a good point. That’s just social network analysis. When you see a thousand, when you, when you don’t. So I mentioned something, and let me just clarify this. Every accident, right, I would label my nodes, right. I would label each person as a node and each driver as a node. But if there was another accident with that node, I’m not going to replicate that node. That driver still is that driver. Yeah, so I can see that if one driver was in 50 accidents out of 1300 then I’d say something like, yeah, right. Or that, you know, the one driver that was in one accident, all of his passengers were in an accident as well. That’s a huge flag, and it’s a lot easier to tell once you draw this into a social network. 5 percent and these folks should do this stuff by hand, and there were the good insurance claims agents and then there were the ones who actually passed to pass it along.
But if you put this all in a tree, that may not be the right term, but in some sort of system or graph, you can pick up on these things, which is very cool. All right guys, thanks again for listening to another episode of Data Science Imposters. You can always check us out at datascienceimposters.com. Check our Twitter account at @dsimposters and send e-mail to Jordy or Antonio at datascienceimposters.com. Thanks.
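The transcript mentions that the paper uses ‘something like Google’s PageRank’ to score the network. Below is a minimal sketch of plain PageRank by power iteration, to show the flavor of the idea. It is not the paper’s actual algorithm, and the graph, the names, and the `damping` and `iterations` defaults are our own assumptions. People who are connected to many other participants end up with higher scores.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Plain PageRank by power iteration.

    graph maps each node to a list of neighbors; every neighbor must
    also appear as a key. Returns a dict of scores that sum to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the small "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, neighbors in graph.items():
            if neighbors:
                # Split this node's score evenly among its neighbors.
                share = damping * rank[node] / len(neighbors)
                for neighbor in neighbors:
                    new_rank[neighbor] += share
            else:
                # A node with no neighbors spreads its score over everyone.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank

# Hypothetical network: an edge means two people shared a collision.
graph = {
    "ana": ["ben", "cara", "dan", "eli"],
    "ben": ["ana", "cara", "eli"],
    "cara": ["ana", "ben"],
    "dan": ["ana"],
    "eli": ["ana", "ben"],
}

ranks = pagerank(graph)
most_connected = max(ranks, key=ranks.get)
print(most_connected)
```

A high score here only says that someone is central in the network; as the hosts point out, a human expert still has to decide whether that is suspicious.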
What is a terraformer? Interview with Ezinne Uzo-Okoro
Sep 10, 2017
This week we learn about ‘terraformers’ and the three biggest ways that this company leverages data science. Ezinne Uzo-Okoro is the founder of Terraformers.com, an online marketplace that empowers anyone to grow affordable fresh organic food anywhere and donate the excess to local food banks. Terraformers provides an entrepreneurial approach to improved nutrition, and feeding communities. Customers – anyone with at least 100 sq ft of yard or rooftop space – will democratize food access, and improve the economic trajectories of their own communities.
During her 13-year NASA career, Ezinne contributed to multifarious missions including a constellation of micro-satellites, and six launched spacecraft missions – Cassini, the Saturn orbiter; ExPRESS Logistics Carrier (ELC) sitting atop the International Space Station; Global Precipitation Measurement (GPM) observing hurricanes and precipitation on Earth; EFT-1 test mission for human exploration; Neutron star Interior Composition Explorer studying supernovae and neutron stars; and the Transiting Exoplanet Survey Satellite continuing the search for exoplanets. She is a technical leader in flight software development, and spacecraft systems engineering, and has made significant contributions in all phases of the NASA mission life-cycle from concept development and design to deployment and operations. She earned Computer Science and Aerospace Systems Engineering degrees and has received numerous NASA awards.
Listen closely, let me tell you a story #storytelling
Sep 07, 2017
I stepped closer but I didn’t want to make a sound… Storytelling is a powerful tool that we use to explain, warn, and motivate. In this episode, we talk about storytelling and invite our listeners to share their stories. Listen to Jordy’s story. Let us know what you think. Email us if you have a story to share for our story feature. If you need more inspiration for storytelling, check out The Moth (https://themoth.org/), which has been hosting story slams for some time now.
Real Data. Fantasy Football. w/ @TheCoogene
Sep 04, 2017
Football season is right around the corner with preseason games well underway. This week we spend time with @TheCoogene who teaches us all about fantasy football strategies, leagues, and choosing the right picks. He even gives us three ‘sleeper’ picks for those of you that participate in fantasy football. For those of you that care about the #DataScience, listen to get ideas on how to get your winning model. Fantasy football is a big business and using data over the competition will be a big advantage. We also talk a bit about gambling and its relation to fantasy football.
Hacking your (wireless) data – Interview w/ @s0lst1c3
Aug 28, 2017
Gabriel (@s0lst1c3), a security engineer, joins us this week to discuss wireless security and his talk at this year’s #Defcon. ‘DEF CON is one of the oldest continuously running hacker conventions around, and one of the largest’. Some great paraphrased snippets from the episode include ‘lawyers are like [law] hackers’, ‘getting domain admin access is like getting a star in Mario Brothers’, and ‘grandma gets her own wifi’.
Average Lies (Basic Statistics and Not Truths)
Aug 24, 2017
We explore basic statistics and how skeptical you should be about them. 100% of the people that have heard this podcast agree. (Hint: you should ask about our sample size) We look forward to hearing your thoughts on how Statistics have betrayed you.
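Here is a small sketch of how an average can mislead, using Python’s standard `statistics` module. The salary figures are made up for illustration. One very large value drags the mean far above what a typical person in the group earns, while the median barely notices.

```python
from statistics import mean, median

# Hypothetical salaries: five typical earners and one outlier.
salaries = [40_000, 42_000, 45_000, 48_000, 50_000, 1_000_000]

avg = mean(salaries)    # pulled upward by the single outlier
mid = median(salaries)  # the middle of the sorted values

print(f"mean:   {avg:,.0f}")
print(f"median: {mid:,.0f}")
```

Both numbers are ‘the average’ in everyday speech, which is one reason to ask which one you are being shown and how many data points are behind it.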
Are you with me? The journey of a startup
Aug 21, 2017
This week we interview a technology and data company startup entrepreneur. Dwaine explains how a company is born out of an idea, evolves to meet the demands of customers, and survives through the hard work, sacrifice, and dedication of its team. We discuss how the financial crisis of 2007 and 2008, when the world seemed close to collapse, led to some people questioning their futures in the industry and others seeing opportunities. One of our listeners said : “….[I] thought I’d be a deer in headlights listening to you guys. However, I thought it was not only informative, but really engaging & reliable even to someone like me, who isn’t familiar with the lingo and industry.”
Using Evolution and Genomics to Solve Problems
Aug 16, 2017
Take a look at NASA’s evolved antenna picture on our site. This is not your grandfather’s antenna. This design was created by a computer.
If you can agree that in nature, the strongest survive, then you could have developed the theory behind this next topic. In the early 1970s, John Holland used what he knew about evolution and genetics to propose a new way to solve optimization problems. These became known as evolutionary algorithms.
The algorithms go through the following stages:
1. Natural selection – deciding which solutions live and which go away
2. Reproduction or cross over – mixing and combining solutions
3. Mutations – randomly changing components or ordering of solutions
Examples include:
1. Timetabling – scheduling resources across multiple constraints
2. Traveling salesman – example: what’s the best route for a UPS truck to take during any given day?
3. Playing games – can a computer evolve a strategy only knowing the rules of the game?
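The three stages above can be sketched in a few lines of Python. This is a toy of our own, not code from the episode: the problem (evolve a list of bits toward all ones), the function names, and every parameter value are assumptions chosen to keep the example short.

```python
import random

def fitness(bits):
    """Score a candidate solution: the number of 1s it contains."""
    return sum(bits)

def evolve(length=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Toy genetic algorithm; returns (starting best score, final best score, best solution)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    start_best = max(fitness(p) for p in population)
    for _ in range(generations):
        # 1. Natural selection: the fitter half lives on unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            # 2. Crossover: splice two parents at a random cut point.
            mom, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)
            child = mom[:cut] + dad[cut:]
            # 3. Mutation: occasionally flip a bit.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    return start_best, fitness(best), best

start_best, end_best, best = evolve()
print(start_best, end_best)
```

Because the survivors are carried over unchanged, the best score can never get worse from one generation to the next. Real problems such as timetabling need a fitness function that encodes their own constraints.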
Becoming a Data Scientist with Renee Teate (@BecomingDataSci)
Aug 14, 2017
Renee Teate – an accomplished data scientist, creator of the ‘Becoming A Data Scientist’ podcast and website, and the voice behind @BecomingDataSci twitter name – joins us to discuss her own journey to become a data scientist, her quest to get others to join the field of Data Science, and what excites her about the future. Renee was an exceptional guest with great stories, good insight, and the stamina to keep Jordy and Antonio focused for an hour.
Here are a few hashtags to describe moments in this episode: #HarrisonburgNotHarrison #JMU #RosettaStone #Skynet #AI
Check out these resources if you’re interested in more …
If you tell me who your closest friends are, I can tell you who you are
Aug 09, 2017
We’ve heard sayings like ‘if you tell me who your closest friends are, I can tell you who you are’ – the idea is that you and your friends share enough similarities to form a group, or cluster, that looks different from a group of strangers.
In this episode, we explore how we would group similar items together (clustering) using similarities (or distances). The unsupervised clustering algorithms (i.e. we do not provide labeled data indicating an existing relationship) that we discuss are k-Means and DBSCAN.
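To make the idea of grouping by distance concrete, here is a minimal k-Means sketch using only the Python standard library. The sample points, the function name kmeans, and its parameters are made up for illustration; in practice a library implementation would be the usual choice, and DBSCAN works differently (it grows clusters from dense neighborhoods rather than from k centers).

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    # Start from k randomly chosen points as the initial cluster centers.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two obvious groups of "friends" on a 2-D plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Note that no labels are passed in anywhere: the only inputs are the points and the number of groups we want, which is what makes this unsupervised.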
If Only I Had A Brain – Artificial Neural Networks
Aug 07, 2017
Antonio and Jordy talk about artificial neural networks, algorithms that today seem synonymous with Artificial Intelligence. These algorithms, which mimic how the brain works and were first introduced in the late 1950s, are now being used extensively. We casually explore the history of ANNs, how these algorithms work, and what they can do. We expect that this will be one of many conversations about artificial neural networks.
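For a feel of how the simplest of these algorithms works, here is a sketch of a single artificial neuron in the style of the perceptron from the late 1950s, trained to reproduce a logical AND. The function names, learning rate, and number of epochs are assumptions for illustration; modern networks stack many such units and train them differently.

```python
def train_perceptron(samples, epochs=50, learning_rate=0.1):
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # The neuron "fires" (outputs 1) when the weighted sum is positive.
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1 if activation > 0 else 0
            # Nudge the weights toward the target whenever the guess is wrong.
            error = target - prediction
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


# Truth table for logical AND, used as the training data.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
for inputs, _ in and_samples:
    print(inputs, predict(weights, bias, inputs))
```

A single neuron like this can only separate data with a straight line, which is one reason real networks use many layers of them.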
Keep the conversation going on Twitter @dsimposters.
Ep. 0 – Who? What? Why? The Data Science Imposters Introduction
Aug 02, 2017
We have danced around an introduction of ourselves and the show. Now that we have published 10 episodes, have a few more being edited, and have some great guests lined up we want to introduce ourselves and our thoughts about the show.
@BecomingDataSci said it best. If you haven’t already, check out her Twitter page and site.
Ep. 10 – 1 way to lose $1,000 in a week: Cryptocurrencies
Jul 30, 2017
Antonio plunges into the world of cryptocurrencies and buys enough Ether and Litecoin to lose $1,000 in a week. Bitcoin, Ether, and Litecoin cryptocurrencies are built on blockchain technology. Blockchain technology is like a decentralized ledger. While this is an exciting new frontier for the technology and these cryptocurrencies, there is still room for improvement. Hacks and thefts have threatened these cryptocurrencies and have created some skepticism about the underlying technology. Financial companies are optimistic and betting on blockchain technology’s success to improve their own processes.
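The ledger idea can be sketched as a chain of records where each block stores a hash of the one before it, so changing an old entry breaks every link after it. This is a deliberately tiny illustration, not how Bitcoin, Ether, or Litecoin are actually implemented: it leaves out the decentralized network, mining, and consensus entirely, and the function names and transactions are made up.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, transactions):
    # Hash the block's contents in a stable (sorted-key) JSON form.
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, transactions):
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    })


def is_valid(chain):
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_HASH
        if block["prev_hash"] != expected_prev:
            return False  # the link to the previous block is broken
        recomputed = block_hash(block["index"], block["prev_hash"],
                                block["transactions"])
        if block["hash"] != recomputed:
            return False  # the block's contents were altered
    return True


ledger = []
add_block(ledger, ["Antonio buys 1 ETH"])
add_block(ledger, ["Antonio buys 2 LTC"])
print(is_valid(ledger))
```

Editing a transaction in an earlier block without recomputing every later hash makes is_valid return False, which is the tamper-evidence that the ledger comparison is getting at.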
Listen to our podcast, read these articles, and let us know what you think about the technology.
Credit: The screenshot for Litecoin pricing is from https://coinmarketcap.com/currencies/litecoin/ and the screenshot for Ether pricing is from https://www.coindesk.com/ethereum-price/
Ep. 9 – Web scraping, APIs, and Programming
Jul 24, 2017
Jordy and Antonio discuss web scraping, APIs (application programming interfaces), and the benefits of programming solutions. They also talk about the challenges of combining data across multiple data sources (even if those data sets are open).
Beautiful Soup is a popular library for web scraping in Python. The following is an article that you will find helpful if you are interested in starting with the library: Web Scraping with Beautiful Soup.
The list goes on and on and only continues to grow.
Episode is archived here: http://traffic.libsyn.com/datascienceimposters/DSI_Episode_09.mp3
Ep. 8 – Getting bad directions from Mr. Dijkstra
Jul 20, 2017
In this episode, Antonio begins by discussing the seven-hour car ride to Cape Cod, MA and the eight hours spent on the return trip. This leads into a discussion about how Google and other companies use graphs to give us directions. Here’s a simplified graph created with Graph Online.
You can also use a free software package, GraphViz, to create these graphs. It requires a little bit more work but allows a lot more customization.
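As a rough illustration of how a graph turns into directions, here is a short sketch of Dijkstra’s shortest-path algorithm, the one the episode title nods to. The road map, place names, and travel times below are invented, and real routing services are far more elaborate than this.

```python
import heapq


def dijkstra(graph, start):
    # graph maps each node to {neighbor: edge_weight}; weights must be >= 0.
    distances = {start: 0}
    queue = [(0, start)]  # priority queue of (distance so far, node)
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances.get(node, float("inf")):
            continue  # stale entry: a shorter path to this node was found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return distances


# Hypothetical road map: edge weights are travel times in hours.
roads = {
    "Home": {"A": 2, "B": 5},
    "A": {"B": 1, "Cape": 7},
    "B": {"Cape": 3},
    "Cape": {},
}
print(dijkstra(roads, "Home"))
# {'Home': 0, 'A': 2, 'B': 3, 'Cape': 6}
```

The direct-looking options (Home to B, or A straight to the Cape) lose to the route through A and then B, which is the kind of non-obvious answer a shortest-path search is meant to find.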
Book Recommendations
I highly recommend this book. It is technical but accessible to a wider audience than most of these books. (You can skip the coding sections and still get a good amount from this book.)
Jason, a friend of the show, joins us in this episode as we discuss the ‘Money Ball’ of Health Care. We also dive into Jordy’s BMI, learn that there are now three categories of obesity within the BMI range, reveal that Antonio is 52 years old according to his ‘Vitality’ age, and find out what Jason gets from City MD that he does not get from his regular doctor. We also wonder how we can contribute to the Project Baseline Study without being one of the 10,000 selected volunteers.