Technical interviews about software topics.
Here's the latest episode from Software Engineering Daily:
FindCollabs Hackathon #1 has ended! Congrats to ARhythm, Kitspace, and Rivaly for winning 1st, 2nd, and 3rd place ($4,000, $1,000, and a set of SE Daily hoodies, respectively). The most valuable feedback award and the most helpful community member award both go to Vynce Montgomery, who will receive both the SE Daily Towel and the SE Daily Old School Bucket Hat.
The Linux operating system includes user space and kernel space. In user space, the user can create and interact with a variety of applications directly. In kernel space, the Linux kernel provides a stable environment in which device drivers interact with hardware and manage low level resources.
A Linux container is a virtualized environment that runs within user space. To perform an operation, a process in a container in user space makes a syscall (system call) into kernel space. This allows the container to have access to resources like memory and disk.
Kernel space must be kept secure to ensure operating system integrity–but Linux includes hundreds of syscalls. Each syscall represents an interface between the user space and kernel space. Security vulnerabilities can emerge from this wide attack surface of different syscalls, and most applications only need a small number of syscalls to perform their required functionality.
gVisor is a project that restricts the set of syscalls through which user space and the kernel communicate. gVisor is a runtime layer between the user space container and kernel space, and it reduces the number of syscalls that can be made into kernel space.
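The core idea, an allowlist sitting between application code and the kernel, can be sketched in a few lines (a toy model with invented names; the real gVisor implements this at the syscall ABI level, in Go):

```python
# Toy sketch of syscall filtering: a sandbox layer that forwards only an
# allowlisted subset of operations to the "kernel". All names here are
# illustrative, not gVisor's actual interface.

ALLOWED_SYSCALLS = {"read", "write", "mmap", "exit"}

def kernel_handler(name, *args):
    # Stand-in for the real kernel: just acknowledge the call.
    return f"kernel handled {name}"

def sandboxed_syscall(name, *args):
    """Forward the call only if it is on the allowlist."""
    if name not in ALLOWED_SYSCALLS:
        raise PermissionError(f"syscall {name!r} blocked by sandbox")
    return kernel_handler(name, *args)
```

A container process calling `sandboxed_syscall("read")` proceeds normally, while something like `ptrace` never reaches the kernel at all, which is the attack-surface reduction described above.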
The security properties of gVisor make it an exciting project today–but it is the portability features of gVisor that hint at a huge future opportunity. By inserting an interpreter interface between containers and the Linux kernel, gVisor presents the container world with the opportunity to run on operating systems other than Linux.
There are many reasons why it might be appealing to run containers on an operating system other than Linux.
Linux was built many years ago, before the explosion of small devices, smartphones, IoT hubs, voice assistants and smart cars. To be more speculative, Google is working on a secretive new operating system called Fuchsia. gVisor could be a layer that allows workloads to be ported from Linux servers to Fuchsia servers.
Yoshi Tamura is a product manager at Google with a background in containers and virtualization. He joins the show to talk about gVisor and the different kinds of virtualization.
The post gVisor: Secure Container Sandbox with Yoshi Tamura appeared first on Software Engineering Daily.
Twilio is a communications infrastructure company with thousands of internal services and thousands of requests per second. Each request generates logs, metrics, and distributed traces which can be used to troubleshoot failures and improve latency.
Since Twilio is used for 2-factor authentication and text message relaying, Twilio is critical infrastructure for most applications that implement it. The service must remain highly available even in times of peak application traffic, or outages at a particular cloud provider.
When he was at Twilio, James Burns worked on platform infrastructure and observability. James was at Twilio from 2014 to 2017, a period of rapid growth for the company. His work encompassed site reliability, monitoring, cost management and incident response. He also led chaos engineering exercises called “game days”, in which the company deliberately caused infrastructure to fail in order to ensure the reliability of failover systems and to discover problematic dependencies.
James joins the show to talk about his time at Twilio and his perspectives on how to instrument and observe complex applications. Full disclosure: James now works at LightStep, which is a sponsor of Software Engineering Daily.
Google’s options for running serverless workloads started with App Engine. App Engine is a way to deploy an application in a fully managed environment. Since the early days of App Engine, managed infrastructure has matured and become more granular.
We now have serverless databases, queueing systems, machine learning tools, and functions as a service. Developers can create fully managed, event-driven, highly scalable systems with less code and less operations.
Different cloud providers are taking different approaches to offering serverless runtimes. Google’s approach involves the open source Knative project and a hosted platform for running Knative workloads called Cloud Run.
Steren Giannini is a product manager at Google working on serverless tools. He joins the show to discuss Google’s serverless projects and the implementation details in building them.
Products are an art form.
As with any art, the world of products includes creators, patrons, fans, business people, and investors. Product Hunt is a place where those different people connect to build and discuss products.
Products are different from other art forms in that they are measured not only through the lens of design and beauty, but also through utility. From software to books to couches to toiletries, we all have products that have improved our lives so much that we feel a deep sense of connection and hope for that product and the people behind it.
Ryan Hoover is the founder of Product Hunt, a product I have found tremendous value and satisfaction from over the years. He is also a host of Product Hunt Radio, a weekly podcast with the people creating and exploring the future.
Ryan joins the show to discuss products, the process of creating something useful, and his investing strategy. Ryan runs the Weekend Fund, an early stage investment fund.
Open source policy has become a business issue as well as a political one.
Businesses like Elastic, MongoDB (the company), and Redis Labs have started to view the open source licenses of the projects they work on as a means for business defensibility against cloud providers offering similar services. It remains to be seen how viable this strategy will be for the commercial open source vendors.
Companies that do not directly sell commercial open source are also grappling with questions around open source licensing. Facebook has become a force in the open source world through projects like React and GraphQL. Facebook leads these projects, but Facebook is not monetizing them other than to the extent that they use the projects to build Facebook.com.
Facebook’s incentives are aligned with the rest of the industry on the quality of the GraphQL and React projects. Proper licensing can help Facebook keep those incentives in alignment.
Joel Marcey, Michael Cheng, and Kathy Kam from Facebook join me for a discussion of the state of open source licensing, and how that impacts Facebook.
The post Facebook OSS License Policy with Joel Marcey, Michael Cheng, and Kathy Kam appeared first on Software Engineering Daily.
Drishti is a company focused on improving manufacturing workflows using computer vision.
A manufacturing environment consists of assembly lines. A line is composed of sequential stations along that manufacturing line. At each station on the assembly line, a worker performs an operation on the item that is being manufactured. This type of workflow is used for the manufacturing of cars, laptops, stereo equipment, and many other technology products.
With Drishti, the manufacturing process is augmented by adding a camera at each station. Camera footage is used to train a machine learning model for each station on the assembly line. That machine learning model is used to ensure the accuracy and performance of each task that is being conducted on the assembly line.
Krish Chaudhury is the CTO at Drishti. From 2005 to 2015 he led image processing and computer vision projects at Google before joining Flipkart, where he worked on image science and deep learning for another four years. Krish had spent more than twenty years working on image and vision related problems when he co-founded Drishti.
In today’s episode, we discuss the science and application of computer vision, as well as the future of manufacturing technology and the business strategy of Drishti.
The post Drishti: Deep Learning for Manufacturing with Krish Chaudhury appeared first on Software Engineering Daily.
Lyft is a ridesharing company with petabytes of data. Within Lyft, many different employees can use those data sets to build useful applications.
A business analyst creates a dashboard to see how driver satisfaction is changing over time. An economist studies the pricing data to ensure that Lyft’s prices are competitive. A data scientist creates a report of how the speed of a ride correlates with 5 star ratings. A machine learning engineer trains a model to detect fraud on the platform.
All of these use cases make sense–and in each of them, the employee at Lyft needs to find the necessary data sets within the company to build their application. Amundsen is a tool for finding and discovering data sets within the company.
Tao Feng and Mark Grover are engineers at Lyft and join the show to talk about the problem of data discovery and the tools they have built at Lyft.
Until Google DeepMind came into the field, protein structure prediction was dominated by academics.
Protein structure prediction is the process of predicting how a protein will fold by looking at genetic code. Protein structure prediction is a perfect field to approach through the application of deep learning, because the inputs are high-dimensional and labeled data sets are plentiful. Protein structure deep learning is a field in which many different approaches are taken, often involving supervised learning and reinforcement learning.
Mohammed Al Quraishi is a systems biologist at Harvard. His background spans computer engineering, statistics, and genetics. In his work, Mohammed explores the interplay between biology and computer systems.
One area of Mohammed’s focus is protein structure prediction. In a blog post last year, Mohammed gave a brief history of protein structure prediction and described the significance of DeepMind entering the field. DeepMind’s AlphaFold technology surpassed all other competitors in the most recent CASP protein structure competition.
Mohammed joins the show to discuss biology, academia, deep learning, and DeepMind.
The post Protein Structure Deep Learning with Mohammed Al Quraishi appeared first on Software Engineering Daily.
Podsheets is a set of open source tools for podcast hosting, publishing, ad management, community engagement, and more.
Podsheets is influenced by our experience managing Software Engineering Daily, a full-time podcast business.
Software Engineering Daily is a podcast that airs 5 times per week. With 4 ads per show and 50 business weeks per year, we have 1000 podcast ads that we manage per year. Podcast ad sales is an inefficient market, and the tooling around managing ads for the individual podcaster is poor.
Difficulties in the podcast ecosystem also exist for new podcasters. It is still very confusing to get started as a podcaster. Podcasting should be as easy as blogging but remains far from it.
Podsheets serves the use cases of all podcasters — from beginners to full-time professionals.
In this document, we outline the landscape of podcasting. We describe the major corporate players, the workflow of a podcaster, and the mechanics of running a podcast business.
Next, we present Podsheets, a set of open source tools for podcasters.
Finally, we give our predictions for how the media landscape will shift in the coming years, creating a need for more open source media publishing tools.
The State of Podcasting
Podcasting has been having a renaissance over the last five years.
In 2014, the launch of Serial coincided with improvements to cellular bandwidth in the US. It was now possible to stream podcasts from your phone easily and cheaply. There was great content to listen to, and the improved infrastructure made it listenable.
Still, podcasting remained in a fractured stalemate for five years after the launch of Serial, largely because Apple had first-mover advantage as the dominant podcast index but opted not to capitalize on it. This is understandable, as Apple’s core competency is futuristic hardware, and its resources are allocated to business lines such as self-driving cars and augmented reality glasses.
In 2019, Spotify acquired Gimlet (a source of premium podcast content) and Anchor (the most popular, simplest podcast host). Spotify is making a play at consolidating the market of podcasting. Spotify has a clear path to becoming the “YouTube of podcasts”, which we should all celebrate as a long time coming.
At the same time, we recognize the hunger for decentralized media platforms. The centralization of video content within YouTube has driven conspiracy theories, ad fraud, and other problems that are difficult for a centralized player to solve. These problems manifest for text as well, and Twitter CEO Jack Dorsey has spoken enthusiastically about the potential for blockchain technology to improve trust for applications such as Twitter.
Podcasts have suffered and benefited from the lack of centralization. As the podcast world swings towards a centralized marketplace, there is a need for a countermovement towards open, decentralized infrastructure.
Below is an overview of the types of players in the podcasting ecosystem.
Today’s ecosystem is fragmented and messy. The line between podcast networks, ad agencies, and production studios is blurry.
Individual podcasters range from small, unmonetized podcasts to individuals with millions of listeners.
The staff for an individual podcaster may include ad sales, editing, scheduling, and technical operations such as website management. The individual podcaster might hire these people on a full- or part-time basis.
To sell ads, a podcaster often works with a podcast ad agency. The podcaster might join a podcast network to consolidate inventory with other podcasters.
Some podcasts, such as Software Engineering Daily, manage all their ad sales in-house. This requires demanding sales activities such as prospecting, account management, close management, and client services.
Podcast Production Studios
NPR, Gimlet, and Vox Media are examples of podcast production studios. These companies own physical studio space to produce high quality audio content on a regular basis.
Podcast Networks
Earwolf, Nerdist, and Podcast One are examples of podcast networks. A podcast network combines the inventory available across multiple podcasts in order to sell more efficiently to advertisers.
Spotify
Spotify is perfectly positioned to become the YouTube of podcasts.
With its acquisition of Anchor, Spotify has a real chance at capturing the creation, distribution, hosting, and automated monetization of podcasts.
With its acquisition of Gimlet Media, Spotify gains high quality content, established production processes, and sales teams. Gimlet was co-founded by Alex Blumberg, whose roots in NPR give Spotify knowledge of the entire history of the modern audio space.
Apple
Apple maintains the central index of podcasts.
Other podcast indexes are usually built by scraping Apple. Compared to its available resources, Apple has invested almost nothing in podcasting over the last five years. Its biggest milestone is an analytics tool that shows podcasters detailed data on when listeners drop off from episodes.
Apple Podcast Analytics is a seemingly simple tool, but not trivially easy to build and deploy. It could be a bellwether for more vigorous technical investments.
In any case, Apple could have acted sooner. Apple could have poured money into podcasting at any point in the last 5 years. Why hasn’t Apple become the Spotify of podcasts?
- Apple could be apathetic about podcasts. When you are managing the iPhone supply chain and building self-driving cars, it’s hard to care about mp3 files and RSS feeds.
- Apple could be enthusiastic about the fractured state of the ecosystem. Apple is a subtly rebellious company, and podcasts are perhaps the most rebellious content medium today.
- Apple could be marshalling the troops for its own grab at the YouTube of podcasts. Apple is easing into high-quality original content production for TV. If Apple has a desire to expand into a multimedia network, it has that option.
Apple has been historically willing to pursue last-mover advantage. Perhaps we will see more investment in podcasting from Apple in the near future.
YouTube
YouTube is relevant because search for individual podcast episodes is currently immature.
As an example, try searching Google for podcasts where Seth Godin talks about his childhood. Seth Godin has been on many podcasts, and has written many blogs. It is easy to find the blogs where he talks about his childhood. But the podcasts are much harder.
Many podcasts are not transcribed, so there is not a good index of what is said on that podcast episode — unless someone proactively transcribes the audio. YouTube automatically transcribes the videos that are uploaded. This allows YouTube to offer contextual ads as well as closed captioning.
In order to get your podcast on YouTube, you need to convert each audio file into a video and upload it to YouTube. Most podcasters don’t want to bother with this.
To summarize: YouTube has the best search engine for individual podcast episodes — except that many podcasts are not on YouTube.
If you are not a YouTube Red subscriber, your video will stop playing when you lock the screen on your phone. In order to exclusively listen to a YouTube video, you need to be a YouTube Red subscriber.
Podcast Ad Agencies
Individual podcasters who do not want to sell ads but do want to have advertisements can enlist the help of a podcast ad agency. The biggest advertisers in the podcast space (Squarespace, Audible, ZipRecruiter, etc) purchase inventory through these agencies.
Agencies have a huge informational advantage because they see both sides of the market in real time. They see the volume of inventory in different podcast segments, as well as CPMs (cost per thousand listens), CPCs (cost per click), and CLTVs (customer lifetime values).
The big ad agencies are viciously exploitative of their leverage over podcasters. The perversions of podcast ad sales are beyond the scope of this document. But it is important to note that podcast agency ad sales is an inefficient market.
Regarding ad sales, consolidation towards Spotify is a coin flip — it could make things worse for podcasters or could make things better.
Podcasters should have a viable option for managing their own ad inventory.
This explains the eponymous “sheet” in Podsheets. Software Engineering Daily (and many other podcasts) maintains a detailed spreadsheet for managing inventory.
We’ll discuss the importance of the podcast spreadsheet, and our vision for podcast production and ads management later in the document.
Google Podcasts, Overcast, Pandora, Libsyn, and other products in the space also represent important facets of this ecosystem. The above list of players is not comprehensive.
Influencer Marketing Predicts The Future Of Podcast Marketing
The best podcast ads are personal testimonials that are spliced into the episodes.
Podcast ads are a form of influencer marketing. This is in stark contrast to the world of TV commercials, which are produced wholly by the product companies themselves.
Influencer marketing has not consolidated into automated auctions in the way that advertising platforms like Facebook and Google ads have consolidated. Influencer marketing has matured into a fractured, inefficient ecosystem that bears many similarities to the podcast ecosystem.
We can look at the social media influencer marketing space as a leading indicator of where the podcast market is headed, and a few points are worth considering:
- Influencer marketing agencies take an enormous cut from both the advertisers and the influencers.
- Influencers who have been in business for a few years realize they can make more money (and better sponsored content) if they deal directly with brands.
- As influencers have become more savvy, the market for influencer SaaS tools has expanded to serve the needs of influencers. These SaaS tools let an influencer operate like a personalized multimedia platform without the need for agency middlemen.
As of March 2019, the influencer ecosystem is evolving even further, with adoption of technologies such as Patreon and Discord.
How To Manage A Podcast With Ads
Software Engineering Daily has 5 shows per week, 4 ads per show, 50 weeks per year, for a total of 1000 podcast ads. We manage our ad inventory and show production process in a spreadsheet, which is shown below.
As we sell podcast ads, we schedule them onto episodes in our upcoming show calendar. Our inventory (on the right) is updated as we schedule ads (on the left).
Here is a list of complexities we deal with related to advertising:
- Prospecting: how do we find potential companies who want to buy our ads?
- Lead management: how do we work a prospect through the sales process and eventually close a deal?
- Close management: once a deal is closing, how do we issue the contract?
- Accounting: how do we issue invoices and get paid?
- Transparency: how does an advertiser know that we are airing their ads when we say we will?
- Reporting: how do we communicate statistics around our ads and episodes?
- Account management: how do we retain an advertiser and keep them happy?
- Ad scheduling: how does an advertiser pick which episodes to air their ads on?
- Ad insertion: how do we communicate to our podcast editor which ads to insert and at what timestamps?
We address these complexities through a combination of spreadsheets, emails, CRMs, accounting tools, Slack workflows, and contractors. With Podsheets, we hope to consolidate much of this work into a single technology.
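To make "consolidating this work" concrete, the core record behind such a tool would pair an episode with its slot accounting; here is a minimal sketch (the field names are hypothetical, not Podsheets' actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    """One row of the show calendar: an episode and its ad slots."""
    title: str
    air_date: str            # ISO date, e.g. "2019-03-20"
    ad_slots: int = 4        # slots available per episode
    ads: list = field(default_factory=list)

    def schedule_ad(self, sponsor: str) -> None:
        """Fill the next open slot; refuse once the episode is sold out."""
        if len(self.ads) >= self.ad_slots:
            raise ValueError(f"no open slots on {self.title!r}")
        self.ads.append(sponsor)

    @property
    def open_slots(self) -> int:
        return self.ad_slots - len(self.ads)
```

With rows like this, "inventory" is simply the sum of `open_slots` across upcoming episodes, which is exactly what the spreadsheet's right-hand columns track by hand.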
We do not expect to replace everything — but we have a clear vision for a set of tools that are custom built for podcasters to manage their media businesses.
At Software Engineering Daily, we work hard to sell our podcast ad inventory. But we are sacrificing a lot of value because we are struggling with our tooling, and struggling with an undeveloped advertising ecosystem.
When we splice in podcast ads, that podcast ad is usually going to be in the podcast audio for the lifetime of the podcast. At Software Engineering Daily, we do not use “dynamic insertion”. But we would like to.
Dynamic insertion allows podcasters to easily change the ads that are spliced into their podcast episodes. Dynamic insertion can mean two things:
- A podcaster manually changes the ads that are spliced into an episode. Example: podcaster releases episode X on January 3rd. 6 months later, podcaster changes the ads which are spliced into that episode. This can be useful for podcasts with “evergreen” content.
- An ad server automatically selects the currently scheduled ad and stitches it into the audio each time a listener downloads the episode.
Dynamic insertion is an important consideration for the future of podcasting, as we explain later in this post.
Before we discuss Podsheets, one more point around the downsides of podcast market consolidation towards platforms like Spotify and YouTube.
When you listen to a podcast on a naive podcast player, there is usually not a system monitoring what words you are hearing. In its most basic form, a podcast player is a dumb mp3 player reading from dumb RSS infrastructure.
Many podcast consumers like things this way. Podcasting feels distinctly like radio or old-school television. When you listen to a podcast about topic X on a dumb podcast player, you do not immediately start seeing display ads around the Internet for X.
Let’s understand why this is the case.
There is no transcript that has been aligned with timestamps in the audio. Because of this, there can be no natural language processing over this stream of audio. You cannot see improved search results, ad targeting, or content recommendations based on the content you listen to.
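To make the point concrete: a timestamp-aligned transcript is just a list of (start, end, text) segments, and once it exists, searching speech becomes as easy as searching text. A minimal sketch (not any real player's implementation):

```python
def find_word(segments, word):
    """Return start timestamps (seconds) of segments whose text contains the word."""
    word = word.lower()
    return [start for start, _end, text in segments
            if word in text.lower().split()]

# A hypothetical two-segment transcript:
transcript = [
    (0.0, 4.2, "Welcome to the show"),
    (4.2, 9.8, "Today we discuss container security"),
]
```

The same index that powers search also powers ad targeting and recommendations, which is why transcription is the hinge between private and surveilled listening.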
Depending on who you are, this is a benefit or a disadvantage. As blockchain technology improves, the Internet will offer a more granular range of surveillant media outlets. Some people will choose surveillance + convenience, other people will choose privacy.
Until we have blockchain-powered media, podcasting remains one of the few bastions of semi-private media consumption.
Today, podcasting is not surveillant because the infrastructure is immature. Transcription and audio alignment powered by deep learning remains expensive. As these costs drop, podcasting will become increasingly surveilled.
Today’s podcast infrastructure is decentralized, and not heavily surveilled. Market consolidation into companies like Spotify and YouTube will increase that surveillance.
Podsheets is an open source platform for podcasters.
Podsheets will allow a podcaster to accomplish the following:
- Host their podcast’s audio files, website content, and RSS feed on a backing storage medium of their choice
- Update the ads and other dynamically inserted segments within their audio files
- Manage ad inventory and communicate that inventory to potential advertising customers
- Allow for a podcaster to offer subscriptions or donations
- Give every podcast its own social network for its listeners
We believe the following about our users:
- Podsheets should feel like a simple, vertical solution for a nontechnical podcaster
- Podsheets should feel like a modular toolset for a technical podcaster
- Podsheets should feel like decomposable media platform software for a developer
Current State of Podsheets
As of 3/20/19, Podsheets is a simple tool for hosting a podcast. It fulfills the minimum needs for a podcaster.
Here is the interface for publishing and editing an episode.
Here is the public website view.
Our design is minimalistic because podcasting is a minimalistic system. A podcast hosting tool is simply an interface for publishing and editing an RSS feed of text and audio files.
Podsheets is built in React and Node. The audio files are stored in Google Cloud Storage. The live version of the site at podsheets.com runs on Heroku.
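That framing, hosting as RSS publishing, is nearly literal. A minimal podcast feed can be rendered with nothing but a standard library (a sketch using Python's xml.etree; Podsheets itself is Node, and real feeds carry more metadata such as descriptions and iTunes tags):

```python
import xml.etree.ElementTree as ET

def build_feed(podcast_title, episodes):
    """Render a minimal RSS 2.0 feed: one <item> with an <enclosure> per episode."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = podcast_title
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        # The <enclosure> tag is what podcast players actually download.
        ET.SubElement(item, "enclosure",
                      url=ep["audio_url"], type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")
```

Everything else a hosting platform does, the website, the analytics, the ad tooling, is layered on top of this one document.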
The rest of this section describes a specification for Podsheets. We will explain the name “Podsheets” and describe the features that an open source podcast hosting platform needs to compete with closed source companies.
We argue that podcasting can be improved by the introduction of open source tools that empower a podcaster to the same extent as closed source platforms.
There are many parallels between the world of open source software and podcasts. Podcasting feels free and open. Podcast content is useful and rebellious.
We hope to build a community of people who are excited about open source podcasting.
To build Podsheets, we need help from many types of people: designers, writers, engineers, podcasters, podcast listeners, and QA testers. If you want to get involved, follow our FindCollabs project and see our roles for contribution.
Spreadsheets: The Ultimate Tool For Media Planning
Advertising is how most podcasters make their living.
Paid subscriptions and donations are also important to the podcasting ecosystem, and we will explore those monetization options later in the post. This section is about the tooling that Podsheets will offer for advertising-based podcasters.
Advertising allows podcasters to run their own media company. One beauty of podcast advertising is that it can connect small audiences with products that those audiences have a genuine need for. Podcasters can find advertisers that do not compromise their views.
Podcasters need a way to manage their own ad inventory. Without the option to self-manage, podcasters will turn to centralized solutions such as Spotify and media agencies.
Why the name “Podsheets”?
We believe that managing the ad inventory of a podcast is best handled by a dynamic spreadsheet view, which would let podcasters change a scheduled ad by simply editing a row.
Advertising is central to the life of a podcaster. A podcaster needs to find ad deals and manage the production of their advertising content. The advertising needs to be harmonious with the podcast content.
Many podcasters are solo entrepreneurs. They do not have employees. They are doing everything themselves. A dynamic spreadsheet allows a podcaster to manage the podcast along with their advertising.
Our perspective on the importance of the podcast spreadsheet comes from personal experience of running Software Engineering Daily. We have scheduled roughly 3000 ads across all of our episodes.
Let’s revisit the spreadsheet we use to plan our podcast episode calendar.
This spreadsheet is used for:
- Planning our show schedule
- Budgeting and scheduling ads
- Communicating our upcoming shows to advertisers who may want to buy targeted ads on those shows
- Notating edits that we need to communicate to our audio editor
Experienced podcasters who I have spoken to all have a similar spreadsheet, Trello board, or show management tool.
Podcasters Need Dynamic Insertion
Podsheets will allow the spreadsheet view to be tied to the current state of a given podcast episode.
This will enable easy dynamic podcast ad management. It also gives a place for podcast editors, hosts, producers, and operations members to collaborate.
Self-managed, dynamic ad insertion allows a podcaster to control their own media platform and earn more money.
It’s important to emphasize how much money a podcaster gives up without dynamic insertion. This is because podcasts are “evergreen content”: listeners often seek out episodes that are 1–10 years old, and those listeners hear old ads that are no longer paying the podcaster anything.
Let’s illustrate with an example.
Below is a row from our Software Engineering Daily management spreadsheet. This row is an episode from 2/1/2018, which was an interview with Zhamak Deghani. We had 4 ads scheduled on that show: OutSystems, Datadog, Microsoft, and GoCD.
Below is a chart showing the last 3 months of listens to our interview with Zhamak.
Most podcast ads are sold on a 6–36 week performance basis. When we sell Software Engineering Daily ads that are permanently embedded in the audio file, the advertiser is only judging us based on the performance over the next 6–36 weeks. This means that the downloads that we get after 36 weeks are not monetized.
Over the last 3 months, we have had 206 listens to this episode. At a $25 CPM (lower bound pricing for a podcast with a technical audience), that’s $5 we left on the table. We have roughly 1000 episodes in our back catalog, which means that we could have earned $5000 in additional ad sales if we could have easily sold and dynamically inserted this new ad inventory.
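The arithmetic here is plain CPM math (revenue per thousand listens), and it is worth writing down because it compounds across a back catalog:

```python
def cpm_revenue(listens: int, cpm_dollars: float) -> float:
    """Ad revenue for a block of listens priced at a given CPM (per 1,000 listens)."""
    return listens / 1000 * cpm_dollars

# 206 back-catalog listens in 3 months at a $25 CPM:
per_episode = cpm_revenue(206, 25.0)   # about $5.15 per episode
# The Audible example later in this post: 2,000 listens at a $25 CPM
deal = cpm_revenue(2000, 25.0)         # $50.00
```

Roughly $5 per episode per quarter looks negligible until it is multiplied by a thousand-episode catalog.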
Podcasters with a large back catalog need a way to dynamically change the ad inventory.
The only available solutions are closed source, expensive products such as Megaphone. Megaphone hosts the shows for some of the biggest podcasters in the world, including Malcolm Gladwell.
The relationship between a podcaster and a closed source hosting and dynamic insertion platform is similar to the relationship between YouTubers and YouTube.
We are lucky to live in a time when the centralized options such as Megaphone and YouTube allow so many small media platforms to thrive. But we need options that are more portable, and are not overseen by monolithic institutions.
Podcasters make much more money from dynamic insertion.
To the extent that dynamic insertion technology is centralized, podcasting will be centralized.
Podsheets Dynamic Insertion
How would dynamic insertion work in Podsheets?
Let’s revisit the Software Engineering Daily spreadsheet from earlier, zooming in on a week in March 2018.
As a reminder, these ads are no longer generating any revenue for Software Engineering Daily. But the episodes are still getting listens.
Those ad deals were made in January 2018, and we were paid by the end of March. By the time the month of June started, we were in talks with some of these sponsors to renew their ads, and other sponsors chose not to renew.
Now let’s imagine we enter into a negotiation with Audible. Audible wants to buy 2,000 podcast episode listens at a cost of $25 CPM, coming out to a total cost of $50. In an ideal world, we would be able to serve those podcast ads on any episodes that get listens — including old episodes.
Editing old episodes requires splicing out the ads in the old audio file and splicing in the new ads to the file. This is a time-consuming process to do in an audio editing tool like Audacity or GarageBand.
For each audio file you would want to edit, you would need to do the following:
- Find the start and end timestamp of the old ad you want to replace.
- Remove the audio of the old ad.
- Compare the duration of the old ad to the duration of the new ad. If the new ad is longer, shift the post-ad portion of the audio file later by the difference; if it is shorter, shift it earlier by the same amount.
- Insert the new ad in the space between the two portions of the podcast audio file.
Closed-source tools like Megaphone include a script that lets you dynamically insert ads easily. Such a script can be created with FFmpeg, an open source tool for editing media files.
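As a rough sketch of how such a script might work, the following Python builds one possible FFmpeg invocation that trims out the old ad and concatenates the new one in its place. The filenames and timestamps are hypothetical, and this is one way to do it with FFmpeg's filter graph, not the Megaphone implementation:

```python
def build_splice_command(episode, new_ad, ad_start, ad_end, output):
    """Return an ffmpeg invocation that replaces the audio between
    ad_start and ad_end (in seconds) with the contents of new_ad.
    The post-ad audio shifts automatically: concat joins the three
    segments back to back, so the length difference is absorbed."""
    filter_graph = (
        f"[0:a]atrim=0:{ad_start},asetpts=PTS-STARTPTS[pre];"
        f"[0:a]atrim={ad_end},asetpts=PTS-STARTPTS[post];"
        "[pre][1:a][post]concat=n=3:v=0:a=1[out]"
    )
    return ["ffmpeg", "-i", episode, "-i", new_ad,
            "-filter_complex", filter_graph, "-map", "[out]", output]

# Hypothetical episode with an old ad running from 2:00 to 2:30:
cmd = build_splice_command("episode.mp3", "audible_ad.mp3", 120, 150,
                           "episode_new.mp3")
# Run with subprocess.run(cmd, check=True) once ffmpeg is installed.
```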
Podsheets will require a similar script for doing this ad splicing.
In Podsheets, we should be able to change the ads inserted in a particular episode as simply as changing the entry in our spreadsheet, illustrated below.
Under the covers, this change to the spreadsheet interface should trigger our FFmpeg script to run and replace the ads from these podcasts.
In addition, our system should register that these episodes now have an Audible ad. Our stats tracking system should aggregate the new listens that these episodes get, so that we can easily evaluate the number of listens we have received across the Audible campaign.
One downside of this system is that it requires podcasters to record the timestamps of the original ads in their episodes. For podcasters who can afford it, this annotation work can be outsourced to digital knowledge workers, or to podcast audio editors.
The example of a podcaster easily changing 5 ads is simple but powerful.
On top of this spreadsheet-plus-podcast-editing interface, a podcaster could write macros to run their own creative campaigns.
- Serve ads to users in specific geographies based on IP address
- Serve ads on all episodes about a specific topic
- Make all the ad schedules across your entire inventory the same, so that you could quickly run bulk campaigns on short notice
- Make all your episodes ad-free, or run your own personal calls-to-action when you do not have ad inventory to sell
- Dynamically serve super cheap “remnant” advertising when you do not have any other ad inventory to sell
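As one hypothetical example of such a macro, a bulk campaign is a few lines over the spreadsheet rows. The column names here are invented for illustration:

```python
def run_bulk_campaign(rows, sponsor, topic=None):
    """Assign the same sponsor to every episode row, optionally
    filtered by topic. Each row is a dict mirroring a spreadsheet row."""
    for row in rows:
        if topic is None or topic in row.get("topics", []):
            row["ad"] = sponsor
    return rows

episodes = [
    {"title": "Consul",   "topics": ["infrastructure"], "ad": "OutSystems"},
    {"title": "MakerDAO", "topics": ["crypto"],         "ad": "Datadog"},
]
# Run an Audible campaign across every crypto episode:
run_bulk_campaign(episodes, "Audible", topic="crypto")
```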
Additionally, such a spreadsheet creates a flexible shared interface. A podcaster could make this spreadsheet public, allowing advertisers to buy ads on episodes in a “self-service” fashion. The Podsheets ecosystem could develop other rich interfaces to allow a podcaster to broadcast their advertising options and explain their process for working with advertisers.
If you are not a podcaster, the ability to dynamically manage your ad inventory may not sound like a killer feature.
Understand that advertising is how podcasters make most of their money. That is not likely to change in the near future. Podcasters need advertisers. Podcasters need technology to manage their advertisements.
Why is there so much friction in podcast advertising? Because podcast advertising is decentralized.
Podcasters can and should manage their own ad inventory.
The world of podcasts is in a different universe than the rest of the Internet. Podcast advertising has moved us beyond the bureaucracy of advertising agencies, the menacing opacity of corporate advertising duopolies, and the ineffectiveness of one-size-fits-all advertising units.
With Podsheets, we can build a tool for empowering podcasters — and paint a picture for how other media formats can decentralize as well.
The Future of Media
Podcasting represents the future of media.
Consider the following three aspects of the podcast ecosystem:
- Consumers are extremely happy with the state of podcasting. There is no shortage of content. Listeners are finding comfort and joy in their podcast listening habits.
- Content producers have control of their content. In contrast to Facebook, Twitter, and YouTube, it is almost impossible to get de-platformed or de-monetized as a podcaster.
- Advertisers are seeing excellent results. Most podcast advertising is highly measurable direct response campaigns with ads read by the host. This format resonates with consumers.
Every constituent in podcasting is happy. Podcasting’s decentralized nature allows the actors to operate independently. The podcast ecosystem has self-organized into a beautiful, chaotic, positive sum environment.
Centralized platforms like YouTube, Twitter, and Facebook have enabled all of us to connect with each other. Centralization allows for economies of scale that would not exist in a fragmented, decentralized ecosystem. But that centralization also comes at a cost.
We need centralized platforms.
We need Twitter to create a town square for us all to connect within. We need Facebook to onboard new Internet users with a system that is more intuitive than the open web. We need Google to unify our intellect.
But we also need decentralization.
Decentralized technologies give us fresh, fertile ground to play with on the Internet. To see this in action, look no further than podcasts. The decentralized model of podcasting will spread across other mediums, notably video and social networking.
In the long term, Podsheets should allow podcasters to move up the stack, and own their own entire platforms, all built on open source software.
A Social Network For Every Podcast
Software Engineering Daily is built on WordPress. We use WordPress to publish our podcast episodes and share basic text and audio information with our listeners.
WordPress was built as a publishing tool, not a social network. Software Engineering Daily has a community of 30,000 frequent listeners and 150,000+ occasional listeners, and they want to connect with each other. Two years ago, our community started working on an open source platform called Software Daily.
Software Daily gives us the following tools that we cannot offer through WordPress:
- Comments, forums, and profiles that allow users to connect with each other
- Advanced indexing and search functionality
- Subscription offerings, so that users can pay us to opt out of advertising
- “Related Links”, a wiki-like feature that allows users to add additional resources
Software Daily also has mobile apps. This allows us to offer a logged-in experience to podcast listeners so we can customize their experience based on what episodes they listen to. Our mobile apps can be used to listen to our episodes without ads using our paid subscription feature.
This functionality would be a useful option to any podcaster with a substantial community. Software Daily is completely open source and could be reconfigured to work for any podcast.
In the future, Podsheets could allow a quick and easy way to spin up your own social network for your podcast.
This standalone podcast social network could also serve as a tool for a podcaster to organize events, manage donations, build an email list, sell e-commerce, and any other functionality that requires a user login.
The Coming Disaggregation
Open source software has disaggregated the infrastructure layer. But open source does not replace the proprietary layer. It gives users another option. We have Linux in addition to Windows. We have Android in addition to iOS.
The same phenomenon is coming to the application layer. We will see open source, easily manageable alternatives to YouTube, Instagram, and Netflix. Text publishing has already been disaggregated by WordPress. Audio publishing has been disaggregated by podcasts.
With Podsheets, we hope to give podcasters an alternative to the centralized networks. But it is an adjunct alternative, not a complete replacement. A podcaster does not need to opt out of the centralized ecosystem completely to use Podsheets.
As the cost of computing and storage drops to zero, the media landscape will change in favor of the creators. Raw resources will not be scarce, but creativity and entrepreneurship will always be scarce.
Video has not been disaggregated yet, possibly due to the size of video files. Social networks have not been disaggregated yet, perhaps because of the strong recommendation systems, search algorithms, and other complexities required to make a decent social network.
Eventually, video creators and social media influencers will want tools to build and manage their own platforms. In the limit, every media company becomes a social media company. And every social media company wants its own platform.
The golden age of decentralized media starts with podcasting. Let’s hope it never ends.
Haseeb Qureshi is an entrepreneur and investor. As a teenager, Haseeb played poker professionally through the online poker bubble. His path from poker to software entrepreneurship has been explored in previous episodes.
In 2007, Haseeb and I met at an online poker table. As we battled each other for thousands of dollars, Haseeb and I realized we shared an affinity for obnoxious screen names, obnoxious online avatars, and the city of Austin, Texas. We were both living in the city, and met each other in the real world.
In our earliest days, Haseeb and I were not friends. It was a strange time–we were disembodied minds, drifting on the Internet, attached mostly to the fluctuating balances of our Full Tilt Poker and Pokerstars accounts. This was not a time for friendship–it was a time for ruthless, modern competition.
Throughout the history of poker, alliances have always been fickle. And online, backstabbing and deception was an art that had been barely explored. Any true friendship was a missed opportunity to exploit a competitor.
The duplicity of the online poker world knew no limits. Our sheltered, posh existence as teenagers with great parents, food on the table every evening, and no reason to worry was shattered by the daily tumult of complete financial instability.
Online poker was in a bubble. In the early days of a bubble, success comes easy. You have to be a fool to fail. When a bubble pops, the tide washes back out to sea, and we see who has been swimming without any clothes.
The poker bust wiped me out. Not just financially, but emotionally. In a month, I lost hundreds of thousands of dollars, and I lost my identity. After doing nothing but playing poker for years, what was I left with? What durable skills had I developed? What friends did I have to turn to? What was my ideology? What was my vision for my own future?
As I plummeted into despair, Haseeb rose like a meteor through the world of heads up poker, thriving on the rise in popularity of pot limit Omaha, a game whose theoretical complexity suited Haseeb better than the rudimentary game of no limit hold’em. The bigger the stacks, the bigger the decision trees, and the bigger the decision trees, the more edge Haseeb had over his opponents.
As a professional poker player, regardless of whether you succeed or fail, the banality of what you are actually doing eventually catches up to you. The best players are able to put an athletic framing to the game. Yes, you are competing on a zero sum basis, with a 52 card deck that was invented last century. Yes, your innovation is measured in the smallest increment of invention. But in some ways, that is the beauty of the game. We don’t need a revolution in the game of basketball, because to appreciate the dynamic of basketball is to appreciate the dynamic of humans. And the same can be said of poker.
Unfortunately, the successful online poker player must eventually have their own reality shattered. Because to be a successful poker player, you must be rigorous and critical. You will eventually be forced to step back and say: “what is this thing I’m doing every day? How have I become hooked to a screen? I don’t know how that screen works. What are these numbers? Are they fabricated? How do they control my emotions so thoroughly? Who is running this thing?”
Haseeb grew tired of poker. He wrote a book about the game to memorialize his thoughts, then abandoned it. He studied philosophy and literature, searching for something new in the historical musings of humanity. He traveled Europe, working as a farmer to reconnect with the physical world. He discovered the Effective Altruism movement.
Finding no solace in his poker spoils, Haseeb gave away most of his money and started from scratch. As he rebuilt himself, he found software engineering and charted a path to San Francisco, where we reconnected.
In this episode, Haseeb joins me for a discussion of software, philosophy, poker, and the nature of bubbles. Indeed, Haseeb and I have now lived through four major bubbles: dot coms, poker, the 2008 financial crisis, and the crypto bubble. Throughout these bubbles, the mediums change but never does the message: human beings are deeply irrational, tribalistic, and emotional.
Consul is a tool from HashiCorp that allows users to store and retrieve information from a highly available key/value data store. Consul is used for storage of critical cluster information, such as service IP locations and configuration data. A service interacts with Consul via a daemon process on the node of that service. The daemon process periodically shares information with the Consul server over a gossip UDP protocol and can share data on a more immediate basis using TCP.
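As a sketch of the key/value side, Consul exposes its KV store over an HTTP API under /v1/kv/. The agent address below is the default local address and the key is an assumption; actually executing these requests requires a running Consul agent:

```python
import urllib.request

CONSUL = "http://localhost:8500"  # default local agent address

def kv_url(key: str) -> str:
    """Build the KV endpoint URL for a given key."""
    return f"{CONSUL}/v1/kv/{key}"

def put_kv(key: str, value: bytes) -> bytes:
    """Store a value; Consul responds with b'true' on success."""
    req = urllib.request.Request(kv_url(key), data=value, method="PUT")
    return urllib.request.urlopen(req).read()

def get_kv(key: str) -> bytes:
    """Fetch a value. '?raw' returns the bytes directly instead of
    the default base64-encoded JSON list."""
    return urllib.request.urlopen(kv_url(key) + "?raw").read()

# Example (needs a running agent):
#   put_kv("service/web/ip", b"10.0.0.12")
#   get_kv("service/web/ip")
```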
Consul’s functionality has recently expanded to add secure service connectivity. Consul Connect allows services to establish mutual TLS encryption with each other. The addition of mutual TLS to the Consul feature set coincides closely with Consul gaining the title of “service mesh.”
Service mesh is an increasingly popular pattern that can encompass a variety of features: load balancing, security policy management, service discovery, and routing. Tools which offer self-described “service mesh” functionality include Linkerd, Kong, AWS App Mesh, Solo.io Gloo, and Google’s Istio open source project.
Paul Banks is the engineering lead of Consul at HashiCorp. He joins the show to talk about the service mesh category and the past, present, and future of Consul.
Data sets can be modeled in a row-wise, relational format. When two data sets share a common field, those data sets can be combined in a procedure called a join. A join combines the data of two data sets into one data set that is often larger than the two initial data sets combined. In fact, this new data set is often so much larger that it creates problems for machine learning engineers.
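The size blow-up is easy to see with a small hash join over plain Python dicts (the rows are invented for illustration):

```python
def hash_join(left, right, key):
    """Inner join two lists of dicts on a shared field. Every matching
    pair produces an output row, so the result can be much larger
    than either input."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

customers = [{"cid": 1, "city": "SD"}, {"cid": 2, "city": "SF"}]
orders = [{"cid": 1, "item": "a"}, {"cid": 1, "item": "b"},
          {"cid": 1, "item": "c"}, {"cid": 2, "item": "d"}]
joined = hash_join(customers, orders, "cid")
# 4 joined rows, each carrying the customer columns redundantly;
# with wide tables, this redundancy is what blows up the joined data set.
```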
Arun Kumar is an assistant professor at UC San Diego. He joins the show to discuss the modern lifecycle of machine learning models, and the gaps in the tooling.
Arun’s research into improving processing of joined data sets has been adopted by companies such as Google. Some of that research has been adapted into open source machine learning tools that improve the performance of machine learning jobs with minimal code required.
Distributed stream processing allows developers to build applications on top of large sets of data that are being rapidly created. Stream processing is often described as an alternative to batch processing. In batch processing, a single large computation is performed over a large, static data set. In stream processing, a computation is performed repeatedly and continuously over a data set that is being appended to.
A stream is often stored in a distributed queue such as Kafka, Kinesis, Pulsar, or Google PubSub. A stream is often processed with a stream processing tool such as Spark, Flink, Storm, or Google Cloud Dataflow.
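The two computation shapes can be sketched in plain Python, without any framework. The word-count-style example below is illustrative:

```python
from collections import Counter

def batch_count(events):
    """Batch: one computation performed over a complete, static data set."""
    return Counter(events)

def stream_count(event_source):
    """Stream: the same computation, applied incrementally as events
    arrive; state is updated and emitted continuously."""
    counts = Counter()
    for event in event_source:
        counts[event] += 1
        yield dict(counts)  # downstream consumers see each update

events = ["click", "view", "click"]
assert batch_count(events) == Counter({"click": 2, "view": 1})
updates = list(stream_count(iter(events)))
# updates[-1] equals the batch result, reached incrementally
```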
Holden Karau is an engineer who works on open source projects at Google. She returns to the show to describe the state of stream processing and discuss modern best practices.
A software application requires compute and storage.
Both compute and storage have been abstracted into cloud tools that can be used by developers to build highly available distributed systems. In our previous episode, we explored the compute side. In today’s episode we discuss storage.
Application developers store data in a variety of abstractions. In-memory caches allow for fast lookups. Relational databases allow for efficient retrieval of well-structured tables. NoSQL databases allow for retrieval of documents that may have a less defined schema. File storage systems allow the access pattern of nested file systems, like on your laptop. Distributed object storage systems allow for highly durable storage of any data type.
Amazon S3 is a distributed object storage system with a wide spectrum of use cases. S3 is used for media file storage, archiving of log files, and data lake applications. S3 functionality has increased over the years, developing different tiers of data retrieval latency and cost structure. AWS S3 Glacier allows for long-term storage of data at a large cost reduction, in exchange for increased latency of data access.
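The tier trade-off can be sketched numerically. The per-GB prices below are assumptions for illustration only, not current AWS pricing:

```python
# Assumed illustrative monthly prices per GB; check current AWS pricing.
TIERS = {
    "s3_standard": {"price_gb": 0.023, "retrieval": "milliseconds"},
    "glacier":     {"price_gb": 0.004, "retrieval": "minutes to hours"},
}

def monthly_cost(tier: str, gigabytes: float) -> float:
    """Monthly storage cost for a tier, ignoring request and retrieval fees."""
    return TIERS[tier]["price_gb"] * gigabytes

archive_gb = 50 * 1024  # 50 TB of old log files, in GB
standard = monthly_cost("s3_standard", archive_gb)
cold = monthly_cost("glacier", archive_gb)
# Moving rarely-read archives to the cold tier cuts the storage bill
# several-fold, in exchange for slower data access.
```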
Kevin Miller is the general manager of Amazon Glacier at Amazon Web Services. He joins the show to talk about the history of storage, the different options for storage in the cloud, and the design of S3 Glacier.
On Amazon Web Services, there are many ways to run an application on a single node.
The first compute option on AWS was the EC2 virtual server instance. But EC2 is a large abstraction compared to what many people need for their nodes–which is a container with a smaller set of resources to work with. Containers can be run within a managed cluster like ECS or EKS, or run on their own as AWS Fargate instances, or simply as Docker containers running without a container orchestration tool.
Beyond the option of explicit container instances, users can run their application as a “serverless” function-as-a-service such as AWS Lambda. Functions-as-a-service abstract away the container and let the developer operate at a higher level, while also providing some cost savings.
Developers use these different compute options for different reasons. Deepak Singh is the director of compute services at Amazon Web Services, and he joins the show to discuss the use cases and tradeoffs of these options.
Deepak also discusses how these tools are useful internally to AWS. ECS and Lambda are high-level APIs that are used to build even higher level services such as AWS Batch, which is a service for performing batch processing over large data sets.
Ben Lorica is the chief data scientist at O’Reilly Media and the program director of the Strata Data Conference. In his work, Ben spends time with people across the software industry, giving him broad perspective.
In the early days of the data engineering ecosystem, the Hadoop vendor wars were starting between Cloudera and Hortonworks. Strata was a neutral ground for practitioners and open source contributors to meet and share ideas about the Hadoop ecosystem. Since then, the conference has grown to encompass topics such as data science, distributed databases, streaming frameworks, and machine learning.
There are many open questions in the data world right now. What is the best path that an enterprise can take to build out a data platform? How should a software team be arranged to efficiently build machine learning models? Which distributed streaming frameworks should I use for what purpose?
Ben joins the show to discuss modern data engineering, data science, and infrastructure.
A currency can fulfill numerous financial use cases.
One use case is store of value: currency holders can reliably expect their currency to maintain some value, though that value may fluctuate over time. Another use case is speculation: currency holders own currency in the hope that its market price will increase over time.
Bitcoin is a useful store of value and an instrument for speculation. However, Bitcoin still does not fulfill the financial use case that most people need from a currency: price stability. The price of Bitcoin fluctuates rapidly, making it difficult to use Bitcoin for small purchases such as coffee.
Imagine you want to buy a cup of coffee with Bitcoin. The coffee shop owner needs to offer the option to sell you that cup of coffee using Bitcoin as the medium of exchange. This owner must denominate the price of that coffee as some number of Bitcoin. Since the price of Bitcoin fluctuates so rapidly, the coffee shop owner needs to adjust the price of that cup of coffee constantly in order to make sure that the coffee is cheap enough for the consumer to want to buy it, but expensive enough to make a profit.
It is hard to assign prices to market goods in terms of Bitcoin because the currency is in constant flux. Even though many of us would like to use Bitcoin in our everyday lives, most marketplaces are denominated in US dollars or other currencies because a marketplace needs a stable currency in order to operate.
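The coffee shop's repricing problem is simple arithmetic (the prices and exchange rates here are invented):

```python
def price_in_btc(usd_price: float, btc_usd: float) -> float:
    """Denominate a fixed-dollar price in BTC at the current exchange rate."""
    return usd_price / btc_usd

coffee_usd = 4.00
morning = price_in_btc(coffee_usd, 8000.0)    # 0.0005 BTC
afternoon = price_in_btc(coffee_usd, 6400.0)  # 0.000625 BTC
# A 20% drop in BTC/USD forces a 25% increase in the BTC-denominated
# price just to keep the same dollar margin. A price-stable currency
# removes this constant repricing.
```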
Rune Christensen is the CEO of MakerDAO, a system that provides a price-stable cryptocurrency. MakerDAO is an elegant set of currencies, collateralized debt, smart contracts, and other incentive tools that result in the creation of several transparent, decentralized financial instruments.
Rune joins the show to talk about the importance of stablecoins and how MakerDAO has engineered a decentralized currency that has maintained stability even through tumultuous market conditions.
Chris Yeh is an entrepreneur, investor, and author. He co-wrote Blitzscaling with LinkedIn founder Reid Hoffman.
Blitzscaling is a strategy for growing a company that has found product market fit. Blitzscaling prioritizes speed over efficiency, arguing that fast growth is necessary to achieve “first scaler advantage.” When a company is the first to scale successfully within a large market, that company gains access to a wealth of market opportunities that are not available to companies which are not at scale.
Examples of successful Blitzscalers include Airbnb, LinkedIn, Amazon, and Facebook. In the hypergrowth phases of these companies, there were deliberate strategic tradeoffs that caused the company to suffer in the short term in exchange for the chance at market dominance in the long term.
Blitzscaling is a broad strategic concept which manifests differently in different companies.
When Airbnb was in its early stages of growth in 2011, the company was faced with the existential threat of a European competitor called Wimdu. Wimdu offered to sell to Airbnb, but this would have required the merger of two companies with distinctly different cultures. Instead, Airbnb chose to raise more money and rapidly expand into Europe.
In contrast, Google’s rapid path to becoming a dominant information service involved acquisitions that we now see as key Google products, including Android, Google Maps, and Google Earth.
Through numerous examples in recent business history, Blitzscaling explores the fundamental tradeoff between speed and efficiency, usually biasing speed as the preferable element. But Blitzscaling does not work for every company.
In the food delivery sector, many companies who tried to blitzscale ended up going out of business because they had lowered their prices too much in order to try to earn customer loyalty. By lowering their prices too much, food delivery startups built businesses with fundamentally bad unit economics and a fickle customer base.
In other cases, aggressive blitzscaling can work for a short period of time, but can cause a company’s culture to suffer in ways that are very hard to repair. Blitzscaling can also cause problems in a core software product. Growing too quickly can cause a product to have a bloated user interface. If the backend infrastructure layer expands too quickly, sensitive data could be left exposed due to a lack of proper software security policies.
Chris Yeh joins the show to talk about the strategy of Blitzscaling and his wide-ranging career. Chris studied creative writing and product design at Stanford before joining DE Shaw, the famous quantitative hedge fund. Later, he became an investor and worked in several leadership roles in software companies.
His wide range of experiences make Chris an excellent author and conversationalist. We explored the ideas of both Blitzscaling and his previous book The Alliance, which lays out a modern vision for the dynamic between employers and employees. We also talked about investing, Dungeons and Dragons, and podcasting.
Uber’s infrastructure supports millions of riders and billions of dollars in transactions. Uber has high throughput and high availability requirements, because users depend on the service for their day-to-day transportation.
When Uber was going through hypergrowth in 2015, the number of services was growing rapidly, as was the load across those services. Using a cloud provider was a risky option, because the costs could potentially grow out of control. Uber made a decision early on to invest in physical hardware in order to keep costs at a reasonable level.
In the last 3 years, Uber’s infrastructure has stabilized. The platform engineering team has built systems for monitoring, deployment, and service proxying. Developing and maintaining microservices within Uber has become easier.
Prashant Varanasi and Akshay Shah are engineers who have been with Uber for more than three years. They work on Uber’s platform engineering team, and their current focus is on the service proxy layer, a sidecar that runs alongside Uber services providing features such as load balancing, service discovery, and rate limiting.
Prashant and Akshay join the show to talk about Uber infrastructure, microservices, and the architecture of a service proxy. We also talk in detail about the benefits of using Go for critical systems infrastructure, and some techniques for profiling and debugging in Go.
The post Uber Infrastructure with Prashant Varanasi and Akshay Shah appeared first on Software Engineering Daily.
Google has been building large-scale scheduling systems for more than fifteen years.
Google Borg was started around 2003, giving engineers at Google a unified platform to issue long-lived service workloads as well as short-lived batch workloads onto a pool of servers. Since the early days of Borg, the scheduler systems built by Google have matured through several iterations. Omega was an effort to improve the internal Borg system, and Kubernetes is an open source container orchestrator built with the learnings of Borg and Omega.
A scheduling system needs to be able to accept a wide variety of workload types and find compute resources within a cluster to schedule those workloads onto.
There is a wide variety of potential workloads that could be scheduled–batch jobs, stateful services, stateless services, and daemon services. Different workloads can have different priority levels. A high priority workload should be able to find compute resources quickly, and a low priority workload can wait longer to find resources.
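A toy version of priority-aware scheduling fits in a few lines with a heap. This is a sketch of the idea, not Borg's actual algorithm:

```python
import heapq
import itertools

class Scheduler:
    """Pop the highest-priority pending workload first; ties break FIFO."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # FIFO tiebreaker for equal priorities

    def submit(self, workload: str, priority: int):
        # heapq is a min-heap, so negate priority for max-first ordering
        heapq.heappush(self._heap, (-priority, next(self._seq), workload))

    def next_workload(self) -> str:
        return heapq.heappop(self._heap)[2]

s = Scheduler()
s.submit("nightly-batch", priority=1)
s.submit("frontend-service", priority=10)
s.submit("log-daemon", priority=5)
# frontend-service finds resources first despite being submitted later;
# the low-priority batch job waits until nothing else is pending.
```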
Brian Grant is a principal engineer at Google. He joins the show to talk about his experience building workload schedulers and designing APIs for engineers to interface with those schedulers.
Google’s Borg system is a cluster manager that powers the applications running across Google’s massive infrastructure. Borg provided inspiration for open source tools like Apache Mesos and Kubernetes.
Over the last decade, some of the largest new technology companies have built their own systems that fulfill the roles of cluster management and resource scheduling. Netflix, Twitter, and Facebook have all spoken about their internal projects to make distributed systems resource allocation more economical. These companies find themselves continually reinventing scheduling and orchestration, with inspiration from Google Borg and their own internal experiences running large numbers of containers and virtual machines.
Uber’s engineering team has built a cluster scheduler called Peloton. Peloton is based on Apache Mesos, and is architected to handle a wide range of workloads: data science jobs like Hadoop MapReduce; long running services such as a ridesharing marketplace service; monitoring daemons such as Uber’s M3 collector; and database services such as MySQL.
Min Cai and Mayank Bansal are engineers at Uber who work on Peloton. When they set out to create Peloton, they looked at the existing schedulers in the ecosystem, including Kubernetes, Mesos, Hadoop’s YARN system, and Borg itself.
Both Min and Mayank join the show today to give a brief history of distributed systems schedulers and discuss their work on Peloton. They have been working in the world of distributed systems schedulers for many years–including experiences building core Hadoop infrastructure and virtual machine schedulers at VMware.
The post Peloton: Uber’s Cluster Scheduler with Min Cai and Mayank Bansal appeared first on Software Engineering Daily.
Log management requires the processing and indexing of high volumes of semi-structured data. A log management service takes log data and puts it in a cloud-hosted application so that application operators can access those logs to troubleshoot issues.
A large tech company will produce terabytes of logs. Those logs are produced on the host where a service is running. A logging agent on that host will transfer the logs to the log management service in the cloud. Once the logs are in the cloud, they are parsed, indexed, and stored in a way that is easy to query.
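The parse-and-index step can be sketched with an inverted index over plain dicts. The log format here is invented:

```python
def parse(line: str) -> dict:
    """Parse a 'timestamp level message' line into a structured record."""
    ts, level, message = line.split(" ", 2)
    return {"ts": ts, "level": level, "message": message}

def build_index(records):
    """Inverted index: token -> set of record positions, for fast querying."""
    index = {}
    for pos, rec in enumerate(records):
        for token in rec["message"].lower().split():
            index.setdefault(token, set()).add(pos)
    return index

logs = [parse("2019-05-01T10:00:00Z ERROR payment timeout"),
        parse("2019-05-01T10:00:05Z INFO payment ok")]
idx = build_index(logs)
# idx["payment"] -> {0, 1}; idx["timeout"] -> {0}
```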
In 2014, Renaud Boutet co-founded Logmatic, a log management service that eventually became a leading provider. Logmatic was acquired by Datadog, and Renaud now works as a vice president at Datadog. In today’s episode, Renaud joins the show to talk about the architecture of a log management service. We talk about storage tiers, scalability requirements, failover strategies, and logging for serverless functions.
Full disclosure: Datadog is a sponsor of Software Engineering Daily.
Steve Herrod was the CTO at VMware and now works as a managing director at General Catalyst, where he focuses on investments relating to security.
Large enterprises are difficult to secure. An enterprise has sprawling infrastructure, with both on-prem and cloud infrastructure. Identity management systems, vulnerability scanning, secure network infrastructure, and policy management tools are just a few example areas where enterprises spend billions of dollars on security software.
Threats often make their way into an enterprise by way of social engineering. This can result in phishing attacks, corporate espionage, and ransomware. Protecting against social engineering is very difficult, as there are so many channels to communicate through–Facebook Messenger, Linkedin, email, and ad networks can all be used to perform social engineering attacks.
Enterprise security software is a very different business from other types of software companies. Unlike developer tools or cloud infrastructure, security software is usually not self-serve. Security solutions usually require a longer sales and integration process with a customer.
Steve Herrod joins the show to talk about the enterprise security world, the go-to-market strategy for successful security companies, and his perspective on what makes for a viable venture capital investment.
Coding in the browser has been attempted several times in the last decade. Building a development environment in the browser has numerous technical challenges. How does the code execute safely? How do you fit all of the requirements of a development environment into a browser window? How do you get users to switch from their normal IDE (integrated development environment)?
CodeSandbox is an online code editor created by Ives van Hoorne and Bas Buursma. CodeSandbox allows users to program and run applications in the browser. It is a full developer platform that allows users the ability to install npm modules, run their code, and share their applications with other users.
The engineering problems within CodeSandbox are not easy–building a web-based IDE is complicated. But CodeSandbox is also an exciting project because it lowers the barrier to entry for many newer programmers. The development experience for a new programmer is still a difficult onramp.
If you are an experienced developer, you have a workflow that you are comfortable with. It might involve vim, or emacs, or JetBrains IDEs, or Eclipse. But newer developers can find these environments confusing and hard to get started with. The development environments of today are integrated with build tools, Github repositories, and deployment platforms. This can be overwhelming for a newer developer.
CodeSandbox is a very visual tool, which makes it especially useful for new developers who learn through seeing examples running live in the browser. CodeSandbox is also used by web developers who want a modern, shareable form of developing software.
Ives and Bas join the show to talk about the motivation for CodeSandbox and the engineering challenges they have solved.
The post CodeSandbox: Online Code Editor with Bas Buursma and Ives van Hoorne appeared first on Software Engineering Daily.
Data engineering touches every area of an organization.
Engineers need a data platform to build search indexes and microservices. Data scientists need data pipelines to build machine learning models. Business analysts need flexible dashboards to understand the trends and customer use for a product.
Max Beauchemin is a data engineer who has worked at Airbnb, Lyft, and Facebook. He’s the creator of two successful open source projects: Apache Airflow and Apache Superset. In a previous show, Max discussed data engineering at Airbnb, and the usage of Airflow. In today’s show, Max discusses the engineering of Apache Superset.
Superset is an open source business intelligence web application. Superset allows users to create visualizations, slice and dice their data, and query it. Superset integrates with Druid, a database that supports exploratory, OLAP-style workloads.
One reason Superset is distinctive is that it is a full open source application. Many open source projects are tools like databases, command line tools, and web frameworks. Superset is an open source application that can be used by individuals who are not developers–so the audience is wider than the typical open source tool built for engineers.
Max joins the show to talk about his experience as a data engineer at Airbnb and Lyft, and the open source projects he has started.
Twitter’s early engineers faced scalability problems that caused infrastructure failures on a regular basis. The infamous “fail whale” could happen as a result of problems in the application servers, the network, or the database layer.
When Twitter was scaling in its early days, the cloud providers were still immature. Engineers did not have access to the autoscaling cloud infrastructure that is available today. The early Twitter architecture was a combination of open source tools and internally created infrastructure custom built for Twitter’s workloads.
Evan Weaver was an early engineer at Twitter, and he saw the deficiencies of the data tools that the company had access to. Twitter engineers wanted access to a truly reusable data platform that would fit Twitter’s requirements: high availability, globally replicated, and transactionally consistent.
By 2012, Evan had left Twitter and started consulting for other technology companies. He found that databases across the industry were lacking the same properties that Twitter wanted, and the ideas for FaunaDB began to percolate. Around this time, there were two relevant papers about distributed databases that had come out: the Spanner paper from Google and the Calvin paper, a distributed systems paper from Yale.
With inspiration from the literature, his time at Twitter, and his knowledge from consulting, Evan started FaunaDB. Seven years later, FaunaDB is a fully fledged company with a cloud service offering. Fauna is an OLTP database used by companies like Nvidia, Nextdoor, and Capital One.
Evan joins the show to talk about his time spent scaling Twitter and the architecture of FaunaDB.
Bol.com is the biggest e-commerce company in the Netherlands and Belgium. For 20 years, Bol has been developing its software architecture, which includes a variety of services and databases, and a mix of physical and cloud infrastructure.
For an ecommerce company, the search engine is critical for allowing customers to find the products they are looking for. But search also has many applications for internal systems. A search engine is a database with a query engine, and internal application developers want to build on top of that database.
Volkan Yazici is an engineer at Bol.com specializing in search and the author of the blog post Using ElasticSearch as the Primary Data Store. In his post, Volkan describes the process of scaling ElasticSearch to fit the use cases of both internal and external users at a large ecommerce company.
Volkan joins the show to discuss how search infrastructure at scale can require a carefully architected data pipeline in order to propagate changes to a large data set to a search index.
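The core of that propagation problem can be shown with a toy inverted index standing in for Elasticsearch: when a product document changes, stale postings must be removed before the new version is indexed. The class and field names here are illustrative only.

```python
from collections import defaultdict

class ToyIndex:
    """A minimal inverted index: term -> set of matching document ids."""

    def __init__(self):
        self.docs = {}                    # doc_id -> latest text
        self.postings = defaultdict(set)  # term -> doc_ids containing it

    def upsert(self, doc_id, text):
        # A change event re-indexes the document: first remove stale
        # postings for the old version, then index the new text.
        for term in self.docs.get(doc_id, "").lower().split():
            self.postings[term].discard(doc_id)
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return sorted(self.postings[term.lower()])

index = ToyIndex()
index.upsert(1, "red running shoes")
index.upsert(1, "blue running shoes")  # an update flows through the pipeline
print(index.search("blue"))            # the index reflects the change: [1]
print(index.search("red"))             # the stale term no longer matches: []
```

At Bol.com scale the same invariant holds, but the update arrives through a data pipeline (change capture, queueing, bulk indexing) rather than a direct method call.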
Modern web development tools have given frontend developers more power.
There is also a multitude of APIs that give developers rich business functionality out of the box, making it easy to build applications around SMS, payments, and computer vision. If you are building a new application today, you have the option to build it around a completely “serverless” architecture.
As the backend and frontend have changed, the middleware to communicate between those layers has also evolved. GraphQL is a modern way of fetching data from disparate data sources.
In previous episodes, we have talked about how GraphQL works, and some common patterns for using GraphQL in mature applications. In today’s episode, Tanmai Gopal joins the show to describe how to use GraphQL in newer applications. Tanmai is the CEO of Hasura, a company building tools around GraphQL. He discusses the advantages of using serverless functions together with GraphQL, and how to architect an event-based serverless application.
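The idea that one query can resolve fields from disparate data sources can be sketched with a hand-rolled resolver. This is a concept sketch only, not Hasura's implementation or a real GraphQL executor (which would parse a schema and the query language); all data and field names are made up.

```python
# Two "disparate data sources": imagine one is a SQL table and the
# other is a separate microservice.
USERS = {1: {"id": 1, "name": "Ada"}}
ORDERS = {1: [{"sku": "book", "qty": 2}]}

# Each field knows how to fetch itself from its own backing store.
RESOLVERS = {
    "name": lambda uid: USERS[uid]["name"],
    "orders": lambda uid: ORDERS.get(uid, []),
}

def resolve_user(user_id, fields):
    """Resolve only the requested fields, each from its own data source."""
    return {field: RESOLVERS[field](user_id) for field in fields}

# Analogous to the GraphQL query: { user(id: 1) { name orders { sku qty } } }
print(resolve_user(1, ["name", "orders"]))
```

The client asks for exactly the fields it needs, and the server fans out to whichever backends hold them, which is what makes GraphQL a good fit as middleware between serverless frontends and heterogeneous data layers.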
Red Hat was the first commercial open source software company. For years, investors and entrepreneurs assumed there would never be another Red Hat.
Red Hat’s business was built around enterprise operating system distribution and support. Since the operating system is at the core of how users within a company are doing their job, Red Hat had a lot of leverage and a strong business model. But how many enterprise software products could be so critical to a business that they could manage to offer their software as an open source option yet still make money?
As it turns out, there are many ways to make money in open source.
MySQL ate away at the dominance of Oracle’s database business for similar reasons to Red Hat’s success: much like an operating system, the database layer is critical infrastructure. Cloudera and Hortonworks were able to monetize the open source Hadoop project because Hadoop was hard to deploy and manage.
As cloud infrastructure matured, it became easier to start companies that offered open source software as-a-service. Elastic offers an easy way to use ElasticSearch. RedisLabs offers Redis as a service. MongoDB (the company) offers MongoDB as a service. As it turns out, engineers love to see the source code for databases, but they do not enjoy deploying and managing them, and they are happy to pay providers to save them time.
Still, there is continued skepticism of open source businesses.
Today’s debates center around whether individual providers like Elastic can offer a service that competes with an ElasticSearch service offered by AWS. We could just as easily be asking the inverse question– how can AWS compete with an entire company that is dedicated to the deeply technical problem of solving search?
The reality is that most of these open source product categories have an enormous total addressable market, and extremely good unit economics. This applies to both cloud providers and point solution providers. Investors often talk about how much they love subscription businesses. When a company starts purchasing infrastructure-as-a-service from you, it is like they are buying a subscription where the annuity increases over time!
When an investor says they are worried about a giant cloud provider offering the same service as an open source company, it is similar to the investor being worried that a new sales tool is going to be duplicated by Salesforce. The market always needs new sales tools–and the market needs those tools to be offered both by Salesforce and by smaller CRM companies.
In the world of commercial open source, there is plenty of room for both point solution providers and cloud providers. But they are competing for the same customers, and the competitive battlefield is expanding to the nuanced world of software licensing. By changing their licenses, open source projects like Kafka, MongoDB, and Redis can prohibit AWS from certain usage patterns. This might offer some protection for companies based around the point solutions–companies like Confluent and RedisLabs.
Beyond the fracas of the battle between cloud providers and point solutions, there are newer open source companies with models that do not fit tightly into any historical business models. HashiCorp makes a suite of differentiated open source tools that have not been seriously contested or offered as a service by cloud providers. GitLab makes an open source platform that is built with monitoring, logging, CI, and code hosting out of the box.
As the world of open source business models expands, more companies will find opportunity in open sourcing the code that runs their products. In many cases, they will find that it strengthens their advantage rather than weakens it. The defensibility of many businesses relies more on data and network effects than the contents of the codebase. We may see the default question gradually shift from “why should I open source my codebase?” to “why shouldn’t I open source my codebase?”
Mike Volpi is a partner at Index Ventures and has invested in many open source businesses over the last decade. He is on the board of Confluent, Cockroach Labs, Kong, and Elastic. Mike joins the show to share his perspective on open source business models of the past, present, and future.
This is a post written and narrated by Haseeb Qureshi, a cryptocurrency investor and entrepreneur. Haseeb is speaking at an upcoming Software Engineering Daily Meetup.
We can safely say the ICO bubble is over now.
When the bubble finally popped last year, the “market cap” of all crypto fell over $700B, an 85% drop from its peak in January — steeper than the dotcom bubble’s 78% crash. The media gawked at this collapse, and as usual, proclaimed this was the nail in the coffin for cryptocurrencies.
There’s already been enough hysterics and I-told-you-sos. In this essay, I just want to answer the simple question:
Why did the ICO bubble happen?
It’s easy to believe that the ICO bubble, having taken place on uncensorable public blockchains, was a fundamentally new phenomenon.
The technologies that enable bubbles are always new, but the underlying social dynamics are not. The open and permissionless nature of blockchains allows anyone to co-opt them. Thus, blockchains enabled multiple social forces, all interacting in the same network, all reified under the name “the ICO bubble.”
In this blog post, I’ll examine three major moments in history that illuminate three separate social dynamics that were at play in the ICO bubble.
The first is the peer-to-peer file sharing revolution in the late 2000s, which explains the ideology of decentralization, the proclamations of revolution, and companies trying to circumvent securities laws.
The second is the penny stock boom of the 90s, which explains the casino of shitcoin gambling, market manipulation, and fraudsters that comprised the long tail of ICOs.
And the third is the dotcom bubble, which explains the mass of speculators, the new paradigm of decentralized companies, the VC coins, and the redistribution of wealth.
By exploring these episodes, I hope to show you how the ICO bubble recapitulated well-known patterns of human behavior.
History does not repeat, but there are a few refrains it loves to come back to.
I. The file sharing revolution
Bitcoin shares deep roots with P2P networks. File sharing protocols became the world’s first global decentralized networks. Bitcoin’s gossip-based networking model was inspired by Gnutella, the protocol behind LimeWire. Many P2P barons were foundational to the crypto movement: Jed McCaleb of eDonkey2000, Zooko Wilcox of Mojo Nation, and Bram Cohen of BitTorrent, to name a few. They also share a philosophical lineage — Lawrence Lessig, the intellectual godfather of piracy culture, is the originator of the phrase “code is law.”
The P2P file sharing revolution began in 1999, with a little application called Napster. On its face, Napster was straightforward: log in, search for a song you want, double click, and it’s yours.
It sounds simple, but it’s hard to describe how large of a paradigm shift Napster was.
Remember what it was like to purchase music in 1999: standing in a CD aisle, surrounded by rows of disc jackets, debating in your head which album to spend your $20 on. Jay-Z? Smashmouth? Or maybe J-Lo? Every purchase was a careful tradeoff. Music was scarce and precious.
Napster changed all that. It was like a bank vault of music was propped open, free for anyone to plunder. Entirely via word of mouth, Napster spread across America like a riot, clogging up bandwidth on college campuses and dialup lines.
Soon, legal challenges from Metallica and Dr. Dre would thrust Napster into news headlines. Napster seized the national consciousness. At its peak in 2001, the service had more than 80M registered users. The RIAA took notice.
After a lengthy court battle with the RIAA, a judge ruled Napster liable for all of its users’ copyright infringement, despite the fact that Napster’s servers didn’t host any copyrighted content. This legal doctrine, known as vicarious infringement, was the death knell of Napster and of any file sharing-based business model. Napster was driven to bankruptcy and forced to clamp down on all illegal file sharing. But Napster’s surrender was only the beginning of this war.
To the digital revolutionaries, the lesson of Napster was obvious. Despite all downloads being peer-to-peer, Napster operated a central server, primarily used for search indexing and peer discovery. This was its downfall. If the file sharing revolution were to continue, it would have to decentralize and become resilient to legal injunctions.
A traditional war had to evolve into guerrilla warfare.
Decentralized alternatives to Napster gradually arose, intentionally designed around this legal constraint. Successors like Gnutella (LimeWire) and eDonkey2000 (eMule) would have decentralized architectures that would be much more difficult to take down.
As the children of Napster proliferated, a philosophy began to solidify around internet piracy. Slogans materialized: “information wants to be free,” “open culture,” “sharing is caring.” A new political party called The Pirate Party was formed, championing online freedom and copyright reform, winning multiple political appointments across Europe. Radical innovations in intellectual property were explored such as the Creative Commons and copyleft licensing. The revolution had gained an energy and identity of its own.
BitTorrent, founded by Bram Cohen in 2001, would arguably be the last stage in the evolution of P2P file sharing. The BitTorrent protocol overtook KaZaA, LimeWire, DC++, SoulSeek, and all the other P2P networks. By 2012, it was estimated to have peaked at a staggering 400M-500M monthly active users, almost half of the entire Facebook userbase at that time. During its peak, BitTorrent was by far the single largest source of Internet traffic in the world.
It’s worth asking: why did BitTorrent dominate file sharing while other networks fell into irrelevance?
Simon Morris, a former executive at the BitTorrent company, wrote an excellent four-part tour de force analyzing the parallels between BitTorrent and crypto (if you can’t be bothered to read the whole thing, I encourage you to read its final chapter). I’ll be building upon many of his insights here.
BitTorrent was intentionally structured differently from other P2P file sharing projects. The project’s original home page begins its explanation of BitTorrent like this:
BitTorrent is a free speech tool. BitTorrent gives you the same freedom to publish previously enjoyed by only a select few with special equipment and lots of money. (“Freedom of the press is limited to those who own one” — journalist A.J. Liebling.)
It is a surprisingly dry, intellectual manifesto.
Compare this to KaZaA’s credo:
Bram Cohen explicitly disavowed all illegal file sharing usage of BitTorrent. He never once acknowledged this as a legitimate use of the service. The core team and their messaging was unimpeachable. And this is precisely what allowed BitTorrent to flourish on the back of all of its legitimate uses: Linux distros, World of Warcraft updates, dataset sharing, and so on.
BitTorrent was never supposed to be a revolution in internet piracy; it was supposed to be a revolution in low-cost file distribution. This unobjectionable mission statement made BitTorrent safely beyond the reach of the RIAA or any other aggrieved copyright holder.
There’s a striking parallel with crypto: Vitalik and the Ethereum core team never endorsed the flood of ICOs — they often denounced them. This is precisely what allowed Ethereum to flourish, despite being subverted by ICOs for speculative and extralegal purposes. If Ethereum did not brand itself as a revolution in decentralized computing, as “the world computer,” it would have been labeled by regulators as an illegal ICO platform.
If you fast forward to today, the story of P2P file sharing is the story of BitTorrent. All other protocols have faded into obscurity. But BitTorrent is no longer used for downloading music in the western world.
I’ll give you three reasons: Spotify, Apple Music, Pandora. Newcomers in the music industry have adapted, and these services transformed the experience of discovering and listening to music.
P2P file sharing once competed against the experience of driving to Walmart and buying a $20 DRM-protected CD to listen to a single hit song. Between the two options, the decision was comically easy: just pirate it.
Daniel Ek, the former CEO of uTorrent, understood this firsthand. After witnessing the new decentralized file sharing networks replace Napster, Ek came up with the idea for a company he’d eventually name Spotify:
I realised that you can never legislate away from piracy. Laws can definitely help, but it doesn’t take away the problem. The only way to solve the problem was to create a service that was better than piracy and at the same time compensates the music industry — that gave us Spotify.
Today P2P file sharing has significantly declined. But make no mistake: the digital piracy revolution moved industries. It forced music and film to cater to a digital-first world and eventually incentivized the invention of web streaming. The companies that didn’t adapt are now footnotes in history. Those that did will get to build the media dynasties of the next decade.
The lesson of P2P file sharing
If you listen to Lawrence Lessig or Peter Sunde, it’s easy to assume the file sharing revolution was rooted in ideology. But few file sharing veterans are still committed to piracy today. Trying to explain file sharing by appealing to intellectual property reform is like trying to explain the Boston Tea Party by appealing to John Locke. Ideology, while important, is usually post hoc.
The real story is much simpler than that.
The file sharing revolution took off because people opposed a rule: you can only consume music how the record industry says you can.
People hated this rule. So they broke it. And P2P protocols enabled this great, overwhelming mob of rule breakers to demonstrate the way they thought music ought to work.
Today, music works that way. No matter where you are, by doing a search and double clicking, you can listen to almost any song ever created.
Simon Morris claims this was the raison d’être of BitTorrent, and of decentralized networks more generally. Decentralization allows rules to be broken. And when a rule is opposed widely and strongly enough, people will build the technology to break that rule, and that technology will spread.
Without this sort of rule-breaking (whether by accident or design) it’s quite hard to imagine why decentralized blockchain technology even matters. It’s a distributed data store with a complicated and slow update mechanism… The one value proposition that everyone seems to agree on for blockchain technologies is that they are ‘censor-proof’. And this matters only if you have something that someone wants to censor.
Every rebel wants a cause
So why did ICOs need to be decentralized? Why couldn’t the bubble have kicked off through equity-backed blockchain startups like in the dotcom bubble?
To Simon Morris, the answer is obvious: because ICOs were not just about the chance to invest in speculative blockchain projects. They were also about breaking rules: the rules around capital formation.
This raises the question: why do people want to break the rules around capital formation so badly?
The answer is complicated.
Consider the initial conditions of the ICO bubble. The bubble was primarily driven by countries in Asia with high savings rates and income inequality — China, Japan, and Korea. Over the last decade, we’ve seen income inequality rising, declines in wage growth for the global middle class, waning trust in governments, and a mass of overeducated young people with shrinking opportunities.
The ICO bubble was a loud rattling of this cage.
The most massive wealth creation events in the world took place over the last decade — but it all went to other people. You didn’t get any. This new wave of technology has subjugated your digital life and attention span, but its fruits belong to the capitalists in Silicon Valley, not you.
The ICO bubble let young people convince themselves: hey, maybe I can get my share. I see this Bitcoin thing, I see this Ethereum thing, they’re so novel and revolutionary, why couldn’t they change the world?
And wouldn’t they get just a little convinced that for once, they were a step ahead of their parents, ahead of the gatekeepers, ahead of Wall Street and Silicon Valley?
So they got in early. They started using VPNs. They got friends and family to create overseas accounts. They broke the rules. And what were those rules good for anyway, besides rigging the system for the rich? Why couldn’t anyone in the world invest in whatever they wanted? Who needs disclosures anyway, if the future is going to be open source?
The market kept proving them right. So they speculated, they joined Subreddits and debated ideas and convinced themselves that their investments would revolutionize the world’s infrastructure. A decentralized future was fast approaching, and they were going to be, for once, at the vanguard ushering it in.
And of course, when reality finally caught up, it showed them the consequences of breaking those rules. 2018 brought everything crashing down, laying bare all the scams, frauds, and widespread market manipulation. A once ecstatic market flattened out like a pancake.
And here we must also acknowledge the other side of the market: entrepreneurs. To them, ICOs represented a great equalizer. After all, ICOs were dispersed internationally, and venture capital is still hard to come by outside of Silicon Valley. In the age of the Internet, in the age of blockchain, why hadn’t technology already leveled the playing field? Why should it matter where an entrepreneur lived or what language they spoke, so long as they knew the language of programming?
With ICOs, you didn’t need an intro to Sequoia to get your company started. Now you simply needed a good idea and a white paper, and the world’s capital would beat a path to your door.
Remember all the stories of VCs quaking in their Patagonia vests, worried that they were being displaced?
And then there were the reverse ICOs, where established companies ICOed their own token, as in the case of Kik or Kakao. This is perfect if you want to fund an internal blockchain-related initiative, but skip all the burdensome overhead of shareholder protections or revenue generation.
Even among startups funded by Y Combinator, the hottest accelerator in the world, I heard reports that a startling number of them were pondering ICOs. Even the Silicon Valley elite wanted to break the rules! In this case, they wanted a way out of their illiquid startup ownership, and speculators were all too happy to provide it to them.
It was only in retrospect that they would each realize why these rules were there in the first place. Once again, crypto relearned the lessons that traditional finance had long ago internalized.
The revolution that cried wolf
Let’s grant that the ICO bubble was instigated by a desire to break the rules around capital formation. Today, with the dust having settled and the ICO bubble now an awkward memory, we can reexamine its battle cries with more clarity.
Did people really care about changing the rules around capital formation? Did they really care about democratizing investing access? Did they really care about reforming accredited investor laws, financial disclosures and AML/KYC requirements?
Judging by the file sharing revolution, probably not: copyright law is, for all intents and purposes, mostly intact 20 years later.
So what does the file sharing revolution tell you about the ICO bubble?
First, it tells you that you should not take ideology at its word. The underlying causes of revolutions are usually more pragmatic than they appear.
It also explains the supply side for ICOs, companies that wanted to circumvent traditional channels for capital formation, and the demand side, individuals who were desperate to get access to high-growth speculative investments. Once the incentive to break the rules has waned, the revolution is likely to stop. That is precisely what we’ve seen in both file sharing and in ICOs.
But there’s another, darker side to explaining the ICO bubble — as an enabler of fraud, manipulation, and gambling. For this, we turn to our second historical model: the penny stock boom of the 90s.
II. Penny stocks
Balaji Srinivasan once claimed that tokens would turn blockchains into the world’s biggest stock market. This may someday be true — but for now, it seems that blockchains have become the world’s biggest penny stock market instead.
The term “penny stock” evokes images of shady stockbrokers in boiler rooms, and for good reason. You’ll remember that Jordan Belfort, the protagonist of the 2013 film The Wolf of Wall Street, made his fortune hustling as a penny stock broker.
Penny stocks are defined differently in different countries, but in the US, a penny stock is a stock issued by a small company that trades below $5 a share (originally it was stocks that traded below $1, but inflation and all). They are generally quoted OTC and seldom trade on national exchanges. They have low liquidity, little public information, and are not required to make significant financial disclosures.
As such, they are plagued with fraud.
The history of penny stocks
The legal designation of penny stocks began after the 1929 stock market crash, which helped trigger the Great Depression. It was believed that the crash was partially caused by unbridled speculation on penny stocks, and this led to the Securities Exchange Act of 1934 designating legal restrictions on penny stock trading.
Throughout most of the 20th century, penny stock offerings could not legally be placed in newspapers. Orders could only be placed via telephone. The highest quality penny stocks would only provide financial reporting once a year, and the very worst penny stocks had no financial disclosures at all. Given these barriers, penny stocks tended not to receive much attention.
But starting in the mid-90s with the growth of the Internet, penny stock trading exploded. Discount brokers emerged, offering automated interfaces and much lower trading fees. As retail investors flooded in, the space grew faster than regulators could track it, and market manipulation became rampant.
Eventually, the SEC stepped in and brought a string of high-profile cases against Mafia crime families for penny stock manipulation schemes. All of these schemes were ultimately enabled by the Internet: it accelerated the velocity of fraud and allowed bad actors to connect directly with speculators.
Penny stock trading has been brought back under stricter regulatory oversight, but it’s still extremely speculative, and manipulation is common. In 1989, the heyday of penny stocks, a survey found that Americans had been cheated out of at least $2B a year by fraudulent penny stock schemes.
There’s an obvious parallel here with the ICO bubble.
The SATIS group estimated that 81% (!) of ICOs were scams, and it’s been widely reported that over $9M was stolen per day in 2018 (annualized, that would be upwards of $3B a year). By sheer number, the overwhelming majority of ICOs can be explained this way. But the parallel between ICOs and penny stocks runs deeper.
Let’s take a step back here and ask: why are people so drawn to penny stocks in the first place, given how fraught they are?
As a former professional poker player, I can tell you the answer is simple: people love to gamble. Humans will forever be drawn to the idea of turning around their fortunes, of outsmarting the establishment, of skipping steps on the social ladder. No matter how much gambling is stigmatized, regulated, or outlawed, it always survives one way or another.
For penny stocks, they tap into the same greedy credulity behind all get rich quick schemes. It hardly matters what the underlying company does.
But we must also acknowledge the other side of this market: the hustlers and fraudsters. To them, penny stocks are a gift. And they also don’t care what the stocks represent — they simply need a company with a ticker and story to manipulate. Greed takes care of the rest.
Reminding you of anything?
An embarrassment of riches
Here’s a telling article published in 2000 on common Internet penny stock pump and dump schemes. Let’s contrast it to ICOs.
Before the Net, these promoters hired squads of telemarketers to push their stocks on unsuspecting investors. Now, it’s as easy as blasting out e-mail, or if they’re industrious, maintaining a Web site.
The modern version: hiring an ICO marketing agency, which will manage your Telegram chat, Subreddit, Medium account, and BitcoinTalk thread.
Don’t have a real community? No problem, buy a fake one.
The current fashion is to announce the discovery of a new, inexpensive method of sending broadband signals over telephone lines at speeds far exceeding existing technologies… The promoters issue a constant stream of press releases chronicling development breakthroughs, marketing agreements and endorsements of the technology from qualified scientists.
How about a “hyper-scalable” “quantum-resistant protocol” for “supply chain management”? Getting some credentialed advisors to praise your protocol? Maybe rumored partnerships with a couple Fortune 100 companies?
Optimistic posts then begin appearing on Internet stock message boards, such as those maintained by Yahoo and Raging Bull. Because few have heard of the new company, the promoters plant messages on other heavily frequented boards.
If the promoters do a good job, and the market is strong, the stock price can soar from a few cents a share to $10, or in some cases, much more. Eventually, the share price collapses after the promoters sell out and quit pumping.
(many of these examples are drawn from Anatomy of a Pump & Dump Group)
You know the jig. Bounties, referral bonuses, airdrops, presales, advisor shares, purchased reviews, paid followers, social media bots, wash trading, painting TA signals, and so on. By the time the ICO boom had gone mainstream, this procedure was a well-oiled machine.
Most of the long tail of ICOs — and that tail was very long — were complete nonsense. According to most trackers, the total number of ICOs was well into the thousands, and that’s only counting those that were able to rise above the noise.
And like each penny stock boom before it, the ICO boom ended in multiple regulatory actions against the worst offenders. But regulators can’t reach everyone, and most of the pump and dumpers either moved on or continue to manipulate coins at smaller scales.
So who was the ICO boom for?
ICOs, just as penny stocks, are a two-sided market. Speculators don’t care about the technology, they just want tickers to bet on and get rich. Fraudsters don’t care either, they just want tickers to manipulate and get rich. Everyone gets what they need, the market bustles with activity, and everyone makes money — until they don’t.
If you think about it, global, uncensorable blockchains are basically the optimal platform for another penny stock boom. It’s no wonder that bad actors and speculators were quick to converge on it. But despite the technological accoutrements, it’s an old story.
So what do penny stocks explain about the ICO bubble? Penny stocks explain the shitcoins, the scams, the unregistered securities, the market manipulation, and most of the long tail of the ICO bubble. Again, by volume, this is most of what was happening in the bubble.
But I want to be careful here. Penny stock fraudsters were quick to co-opt ICOs, but the concept of an ICO didn’t begin that way. ICOs arose out of blockchain community crowdfunding, first pioneered by Mastercoin in 2013, with Augur being the first ICO on Ethereum in 2015. Most of their investors were nerdy cypherpunks, excited to support some new technology they could play with.
We should not conflate cryptocurrencies, the underlying technology, with the ICO bubble, which was a speculative phenomenon that converged atop it. The ICO bubble was something that happened to crypto, not something intrinsic to it.
Most of the technologists and cypherpunks who built this stuff were simply motivated by building a new financial system. And they’re still whittling away, even after the thundering herd of speculators and fraudsters have come and gone.
So we’ve looked at P2P file sharing and penny stocks in comparison to the ICO bubble. But there’s still one aspect of the bubble that I’ve failed to address so far — technological innovation.
After all, I don’t believe for a second that the crypto boom was principally about defrauding people, or that its underlying technology was irrelevant. The very opposite — the ICO bubble occurred atop the kindling of real technological innovation. To fully understand this, we have to turn to the last historical precedent: the dotcom bubble and its soothsayer, Carlota Perez.
III. Bubbles and technological revolutions
All bubbles are about greed… but some bubbles are also about the installation of technological revolutions. — Carlota Perez
The World Wide Web — the Internet as most people know it — was created by Tim Berners-Lee in 1989. Its invention was the spark that set off the Information Age, and alongside it, the greatest stock market bubble of this generation.
Compared to the technologies that came before it, the Web evolved rapidly. The Internet only had 2% penetration in the US when the Mosaic browser launched in 1993. Six years later, at the height of the bubble, a full 36% of the US was online. (Telephones took more than 30 years to reach the same level of penetration.)
The rapid rise of the Web, combined with low interest rates and the Clinton tax cuts of 1997, led to an incredible bullishness around the growth potential of the Internet. Venture capital became cheap and opportunistic. Entrepreneurs flocked to Silicon Valley.
Netscape, the company that built the Web’s most dominant web browser, kicked off the age. Netscape IPOed for $2.9B in 1995. It was somewhat unusual for an unprofitable company to IPO so successfully, but Netscape’s revenues were growing so rapidly that this would soon be forgotten.
The Netscape IPO would be quickly followed in 1996 by Yahoo!, Excite, and Lycos, all fantastically successful IPOs by companies that were also growing rapidly. And though, like Netscape, they were burning through cash, it didn’t seem to matter. Internet companies had become anointed.
Any company, so long as it had a “.com” in the name, attracted huge valuations. Investors pulled money out of slower-growth companies to plow more capital into dotcoms. Retail investors, having recently received tax rebates, piled in. The Internet itself became the interface for many investors, through platforms like E-Trade (which also IPOed in 1996). Many publications reported stories of white collar professionals quitting their jobs to daytrade tech stocks full-time.
In just five years, the NASDAQ had risen more than 400%. This fomented an all-out frenzy. In 1999 alone, Qualcomm 26X’d its stock price. Analysts stopped emphasizing P/E ratios and began citing Metcalfe’s Law. A WSJ article from 1999 posed the question: are profits just a “quaint concept” that doesn’t matter anymore? The Super Bowl in January of 2000 featured no fewer than 16 dotcom commercials.
Companies like Pets.com were going from incorporation to IPO in a single year. Almost every single IPO popped, with an average of 68% first-day gains. Investing in tech IPOs was widely agreed to be a foolproof way to multiply your money. A phenomenon of dotcom parties spread across the valley, and those close to founders often received “friends and family” shares as tokens of generosity.
It was a time of excess. The trend was baffling to Wall Street, to the East Coast elites, to the old money. Storied hedge funds like Tiger Management went under, unable to keep up with the shifting market structure. But the techies — they knew it all along, they told themselves.
The Internet would change everything.
On March 10 of 2000, the NASDAQ would hit its peak. The first tremor of weakness was on April 14, likely triggered by a tax selloff. By the end of that week, the NASDAQ tumbled a staggering 25%.
Soon, dotcoms realized that their burn rates were unsustainable. The Fed announced plans to aggressively raise interest rates, and the economy would see six such tightenings over the next several months. Capital wavered.
By May 18th, Boo.com went bust. In November, Pets.com followed. A few months later, Webvan shuttered operations. The show came crashing down faster than it had started, and funding had all but vanished. By the end of 2001, after the September 11 attacks, most publicly traded dotcoms went bankrupt. Trillions of dollars of investment capital had evaporated.
The ensuing recession would last several years. It wouldn’t be until 2004 when the first major post-crash dotcom company, Google, would IPO again.
Its first day pop was 18%.
With hindsight we can say that investing in the Internet was clearly right. It’s obvious how dramatically the Internet has changed the world. And yet the dotcom bubble seemed to have veered off somewhere terribly wrong.
What happened? Why couldn’t people at the time see it? And what can it teach us about the ICO bubble?
Technological Revolutions and Financial Capital
To understand the dotcom bubble, we have to start with Carlota Perez.
Carlota Perez is the patron saint of venture capitalists. Her seminal work, Technological Revolutions and Financial Capital, has been cited by Marc Andreessen and Fred Wilson as pivotal to their understanding of the tech industry.
I won’t do her book justice here, but I’ll attempt to summarize the key ideas that are relevant to both the dotcom crash and the ICO bubble. I’ll be quoting heavily from Carlota herself.
Innovation moves in cycles
Carlota Perez subscribes to the long wave economic cycle theory, known as Kondratiev Waves, in which technological innovation occurs in 45–60 year waves.
According to Perez, these innovation waves consist of three phases:
Installation is the period when a new technology is first explored, installed, and then speculated on. This speculation leads to an unsustainable asset bubble and a spectacular collapse. Then a more sober period of deployment takes place, during which the technology matures and sustainably alters many aspects of society. After a full deployment cycle has exhausted its economic growth, a new technology initiates a new innovation wave, and the cycle begins anew.
There have been five technological revolutions in 240 years… Each of these revolutions drives a great surge of development and shapes growth for half a century or more.
The same general shape can be observed in each cycle.
Each of these revolutions was kicked off by a seminal project that would catalyze the technology — the industrial revolution with Arkwright’s Cromford mill, the steam and railway age with the Liverpool-Manchester Railway, the steel and heavy engineering age with Carnegie’s Bessemer steel plant, the age of automobiles with Ford’s assembly line, and the computer age with Intel’s 4004 microprocessor.
By this model, the moment that kicked off the ICO bubble must be the launch of Ethereum. Ethereum was not the first cryptocurrency, but it was the first ICO to produce astronomical returns, and it would set the foundation for the frenzy that was to come. Ethereum launched in 2015, exactly 44 years after the Intel 4004 in 1971.
Why call them revolutions, though? Because they go far beyond the powerful set of new industries; they also transform the whole economy by providing a new techno-economic paradigm.
What does she mean by “techno-economic paradigm”?
Simply put, a techno-economic paradigm is a new accepted way of doing things. When a new technology takes form and begins driving innovation, it brings with it a new logic of how businesses should be structured. For example, with the advent of the automobile, the paradigm encouraged businesses to adopt mass production, economies of scale, and standardized products for mass marketing appeal — the logic of the factory. In this paradigm, every American should own not just a mass-produced automobile, but also a TV, a refrigerator, a washing machine, etc.
The emerging heuristic routines and approaches are gradually internalized by engineers and managers, investors and bankers, sales and advertising people, entrepreneurs and consumers. In time, a shared logic is established; a new “common sense” is accepted for investment decisions as well as for consumer choice. The old ideas are unlearned and the new ones become “normal.”
As a techno-economic paradigm becomes ascendant, any entrepreneur who does not subscribe to the new paradigm will be seen as low-status and behind the times.
We know the techno-economic paradigm of the late internet revolution: move fast and break things, launch MVPs, iterate in short cycles, pursue business models with zero marginal cost. Basically all the mantras consumed today by aspiring tech founders.
Cast in these terms, the techno-economic paradigm of crypto is almost embarrassing to say out loud. Here’s what we taught founders about how to build a business in the crypto age:
- Invent a token and hypothesize an economy that will use it
- Spin a story about how it will eventually become decentralized
- Write an academic-looking white paper with some math in it
- Create a Swiss foundation
- Open source your code
- Recruit advisors and put them on your website
- Do a public ICO
Basically, cargo culting the Ethereum ICO—the same way dotcoms cargo culted Netscape.
In the ICO bubble, founders who deviated from this paradigm were seen as low-status, opportunistic, “not getting crypto,” and were thus less rewarded in their fundraises. Yet in hindsight, almost none of this was predictive of an entrepreneur’s long-term success.
Keep the dotcom crash in the back of your mind. For now, I’m going to focus entirely on how Perez’s account of financial bubbles comports with the ICO bubble.
Frenzy is the tumultuous period when financial capital takes off on its own… All those benefitting from this flourishing of opportunities believe the world is going through a marvelous time.
In the frenzy, new millionaires are minted. They try to multiply their wealth in the same way they made it, redeploying their capital to generate more profits. The gap between paper values and real values widens, and the newly rich come to believe that their newfound wealth is due to superior insight and intuition.
Financial capital… breaks loose, backs the new entrepreneurs, dismantles as much as it can of the institutional framework, overinvests in the new infrastructure, and also uses the new technologies to innovate in instruments for financial speculation.
Dismantles institutional frameworks, check. Overinvests in new infrastructure, check. Invents new instruments for financial speculation (ICOs, SAFTs, SAFTEs), check.
As the various assets go up in price, confidence grows that they will continue to do so… Since the profits to be had are amazing, everybody — including widows and orphans — eventually become aware of the incredible possibilities. They gradually dare to enter what used to be alien territory, trying to get a piece of the action.
Do you remember being told by Uber drivers to invest in BAT or IOTA? Do you remember the New York Times article, Everyone is Getting Hilariously Rich and You’re Not?
Perez recalls a quote by Bruce Nussbaum about the dotcom bubble:
“So investors accepted sky-high P/Es, puffed-up bottom lines, and some strange business plans — because who really knew what was possible? It was a time of opportunity, a time to place bets. And they paid off…”
Do you remember the nonsensical projects? The teams that no one had heard of? The copy and pasted white papers? All the shameless rhetoric about 10 trillion dollar TAMs and 100K transactions per second?
The financiers (and the investors who trust their money to them) seem to be convinced that they have discovered the most profitable vein. They then indulge in the intense repetition of the same successful recipe, be it canals from any river to any river, as in the first revolution, or more dot coms and telecommunications.
Do you remember all the ICO investing syndicates? The Telegram groups? The newsletters?
During Canal Mania in the 1790s, canals were created from river to river with no regard for routing, believing that canals magically produced demand. In the 1840s, railway projects were built from town to town without regard for engineering practicality. In the 1920s, real estate values became untethered from the constraints of urban planning, believing that the automobile meant any territory could be valuable if connected by roads. And of course, in the late 90s, dotcoms were funded with no evidence of product-market fit.
None of this is new.
The whole frenzy phenomenon is, at bottom, a huge process of income redistribution in favor of those directly or indirectly involved in the casino, which funds the massive process of creative destruction in the economy. That regressive distribution generates a double vicious cycle: one is economic, expressed in the market; the other is social, expressed in political terms. Both get worse as the bubble increases.
The ICO bubble was simply a variation on the theme. The players and the tactics were different, but the human stories were the same. As in the dotcom bubble, there were, as always, stories of overnight millionaires, flagrant scams, manifestos declaring a new technological order, levered debts and second mortgages that ended in catastrophe — all the usual roil and ruin of speculative manias.
All this is to say, we’ve seen this before.
Crypto’s gilded age is probably now over. Most of those newly minted millionaires have unwittingly surrendered their riches. The hype has died away, ICO funding has dried up, SEC enforcement actions are trickling in, and the media’s crush on crypto has passed.
But, Perez reminds us, the frenzy phase and subsequent crash is not merely painful — it is necessary to any technological revolution. The financial casino attracts the funds necessary to install basic infrastructure and facilitate social learning.
Without the dotcom bubble, there would not have been all of the investments into optical fiber buildouts, ISPs and internet infrastructure, packet-switched networks for telecoms, and all the competition over consumers that would ultimately galvanize internet adoption. We needed that social and technological foundation in place for the Internet to flourish.
Perez’s book was written in 2003 during the nadir of the dotcom crash, and she presciently situates the Internet within her K-wave framework. History has proven her right.
But we should be careful not to invert her thesis: she claims that all technological revolutions induce a bubble, but that does not mean all bubbles are induced by technological revolutions. Indeed, most aren’t. It remains to be seen which camp crypto falls into.
So what does Perez explain about the ICO bubble?
She explains the logic of frenzy, the stoking of financial capital, and the rhetoric of paradigm shifts (“all companies will become decentralized”). She explains the influx of retail investors (“widows and orphans”), traditional financial capital piling in at dizzying prices (Telegram, Filecoin, Hashgraph), and the flood of traditional entrepreneurs contorting themselves to follow the new paradigm.
A bubble from up close
In the end, we should count ourselves lucky that the ICO bubble was not as destructive as the dotcom bubble. About $15B was raised by ICOs in 2017–2018, but that’s a drop in the bucket compared to all of venture capital, which deployed around $500B during the same period. And the ICO crash was not nearly as destabilizing as the dotcom bubble. When all is said and done, the dotcom bubble wiped out about $5 trillion of value and was much more concentrated in the United States. Losses in the ICO bubble were ~15% of that, absorbed across many more economies, and during a time of relative economic prosperity. (Also, we should be cautious when conflating the “market cap” of crypto with the NASDAQ.)
The ICO bubble had no single cause. Mono-causal explanations always fall short in explaining complex phenomena. But its effects are easier to pinpoint.
There are now many world class teams well-capitalized to build, scale, and evolve blockchain technology, and tens of millions of people in the world who now understand decentralization, proof of work, and private keys. Looking back, it’s really quite amazing! It comes at a high cost, but Perez hints: it’s likely that bubbles like these are the only way to overcome technological inertia.
At the same time, most people had their first interaction with crypto during its orgiastic adolescence. It’s not a great look. But this has been true for every technological revolution of the last 250 years. In that regard, crypto is in good company.
I was too young to appreciate the dotcom bubble when it happened. It’s strange to say, but I’m glad to have witnessed a speculative bubble from up close. I’ve now got war stories to share with future generations. It was a wild time, when anyone in the world could launch a coin and raise tens of millions of dollars to build a network that no one could control. I don’t think we’ll see anything like that again for a long time.
So what happens now?
If you believe that crypto has the stuff of a technological revolution, then as Perez puts it, the collapse will pave the way for a more fruitful deployment phase. At the end of the day, I’m an optimist about technology. So it won’t surprise you that I think this deployment phase is coming. But it will be slow, unglamorous, and probably won’t make for nearly as entertaining of headlines.
GitLab is an open source platform for software development.
GitLab started with the ability to manage git repositories and now has functionality for collaboration, issue tracking, continuous integration, logging, and tracing. GitLab’s core business is selling to enterprises who want a self-hosted git installation, such as banks or other companies who prefer not to use a git service in the cloud.
The vision for GitLab is to provide a platform for managing the full software development lifecycle, from code hosting to deployment–as well as tools for observability and project management.
Sid Sijbrandij is the CEO of GitLab and he joins the show to talk about the product, the business, and the company’s vision for the future. GitLab’s strategy is to offer a set of tools that work for developers out of the box, cutting down on time spent integrating each individual vendor.
An operating system kernel manages the system resources that are needed to run applications. The Linux kernel runs most of the smart devices that we interact with, and is the largest open source project in history.
Shuah Khan has worked on operating systems for two decades, including 13 years at HP and 5 years at Samsung. She has worked on proprietary operating systems and a variety of Linux operating system environments, including mobile devices. Shuah joins the show to discuss her work within Linux and her experience contributing to open source.
Shuah has made significant contributions to kselftest, a set of tests for Linux. Testing the Linux kernel is complicated. Because there is so much depth to the codebase, and such a variety of ways that Linux can be used, there is also a variety of ways that the operating system gets tested. There is smoke testing, performance testing, and regression testing. There are trees of tests, and as a developer you may only want to run a subset of the tests in that tree.
The conversation with Shuah ranged from the low level practices of testing the kernel to a high level discussion of how the Linux kernel can reveal dynamics of human nature.
Malware is malicious software that makes money for the creator of that software. Malware can appear on a user’s computer if that user visits a malicious website or installs malicious software by accident.
There are many types of malware. Spyware sits on your machine and logs your data in order to sell it. Ransomware can lock your computer and demand that you pay money to unlock it. Adware serves you popup ads that you don’t want to see.
Cryptojacking, in which malware hijacks a machine’s compute resources to mine cryptocurrency for the attacker, can occur anywhere that code runs–and there is a lot of code running on cloud providers.
Cloud providers themselves are very secure. But a cloud provider cannot force its customers to be secure. Users who host an insecure application on a cloud provider may get infected with a cryptojacker. If I host a large, complex website on a cloud provider, and I’m serving millions of users, I’m already paying a lot in cloud costs. But when my application gets infected with a cryptojacker, my costs could shoot up. And if I don’t know why my costs are increasing, I might leave the cloud provider.
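A common first line of defense here is simple cost anomaly detection: flag any day whose spend jumps far above the recent baseline. A minimal sketch (the window, threshold, and data are illustrative, not SafeTalpa's actual method):

```python
from statistics import mean, stdev

def flag_anomalies(daily_costs, window=7, threshold=3.0):
    """Flag days whose cost exceeds the trailing mean by `threshold` stdevs.

    A cryptojacker that spins up miners typically shows up as a sudden,
    sustained jump in compute spend relative to the recent baseline.
    """
    anomalies = []
    for i in range(window, len(daily_costs)):
        baseline = daily_costs[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and (daily_costs[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# A stable bill, then a sudden jump when miners start running.
costs = [100, 102, 98, 101, 99, 103, 100, 100, 250]
print(flag_anomalies(costs))  # → [8]
```

In practice a defender would look at CPU utilization and process behavior, not just the bill, but the cost signal is often what the customer notices first.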
Estaban Vargas is the co-founder of SafeTalpa, a company that provides defense against cryptojackers. Estaban joins the show to explain how cryptojackers work and why cloud providers have trouble defending against them.
The post Cryptojacking: Bitcoin Malware with Estaban Vargas appeared first on Software Engineering Daily.
Advertising fraud occurs when a brand pays for an advertisement online and that advertisement is shown to an automated bot account that has been created to view ads. Advertising fraud is rampant on the Internet. It’s not possible to know how much money is lost to ad fraud, but the costs are in the billions of dollars.
Praneet Sharma and Shailin Dhar are the founders of Method Media Intelligence, a company that builds solutions around improving advertising quality. In previous shows, Praneet and Shailin have described the online advertising ecosystem in detail. They have told stories of bot farms, replay attacks, and adtech companies.
In today’s episode, Praneet and Shailin return to the show to discuss how advertising fraud is getting worse–not better. Praneet and Shailin worked with BuzzFeed reporter Craig Silverman, who was a previous guest on the show to talk about his remarkable findings about mobile advertising fraud, which accounts for hundreds of millions of dollars in theft every year.
The demand for electricity is based on the consumption of the electrical grid at a given time. The supply of electricity is based on how much energy is being produced or stored on the grid at a given time. Because these sources of supply and demand fluctuate rapidly but predictably, energy markets present profit opportunities for traders.
Minh Dang and Corey Noone are engineers with Advanced Microgrid Solutions, a company that builds software to help traders capture better opportunities in the energy markets. Minh and Corey join the show to talk about how their company builds and deploys machine learning models for market prediction.
We discussed data infrastructure, machine learning model deployments, and the dynamics of the energy markets.
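As a toy illustration of the kind of model such a pipeline might start from (this is not AMS's actual approach, and the numbers are made up), here is an ordinary least-squares fit of clearing price against grid demand:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (pure Python, one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical hourly grid demand (GW) and clearing price ($/MWh).
demand = [20, 25, 30, 35, 40]
price = [18, 24, 29, 36, 41]

a, b = fit_line(demand, price)
predicted = a * 45 + b  # forecast the price at 45 GW of demand
```

Real market-prediction models use far richer features (weather, time of day, stored capacity), but the core idea is the same: learn the mapping from supply/demand conditions to price, then trade on the forecast.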
Cloud computing started to become popular in 2006 with the release of Amazon EC2, a system for deploying applications to virtual machines sitting on remote data center infrastructure. With cloud computing, application developers no longer needed to purchase expensive server hardware. Creating an application for the Internet became easier, cheaper, and simpler.
As the cloud has become popular, new ways of deploying applications have emerged. A developer with a web app has so many different options.
You can host your app on an Amazon EC2 server, which will require you to manage cloud infrastructure in case your server crashes. You can deploy your app to Heroku, which gives your cloud deployment better uptime guarantees for a higher price than Amazon EC2. Or you can use Linode, or Microsoft Azure, or Google Cloud.
There is such a large market for cloud computing that the world of cloud providers serves more niches every year. In past episodes we have explored a variety of different cloud providers, and the markets they target.
Pivotal Cloud Foundry is for managing complex distributed systems applications, typically with large teams. Firebase is a cloud provider that simplifies the developer experience for applications with small teams. Spotinst is a cloud provider that emphasizes low cost. Zeit is a cloud provider that is built to manage applications through serverless “functions-as-a-service” like AWS Lambda.
In today’s episode, Mathias Biilman Christensen, CEO of Netlify, joins the show. Netlify is a cloud provider that was built for modern web projects. Netlify represents the convergence of several trends in software development: static site deployment, serverless functions, a desire to have a “no-ops” deployment with minimal management, and the rise of newer tools like GraphQL and Gatsby.
One announcement before we begin: we are having a $5000 hackathon. The $5000 hackathon is for a new product we’ve been working on: FindCollabs. FindCollabs is a platform for finding collaborators and building projects. Whether you are an engineer, a musician, a designer, a videographer, or an artist, FindCollabs lets you find people and collaborate. To try out FindCollabs, just go to FindCollabs.com, where you can make a project or join someone else’s project. And it’s very easy to make these projects–you don’t need to have anything built yet, you just need a vision for what you want to build. And to find out about the hackathon, go to findcollabs.com/hackathon. We are giving away $5000 in cash to the coolest projects that get built before Sunday April 14th. So I recommend getting started early, finding some people to collaborate with, and building some cool stuff!
Monitoring tools are used by every area of an organization.
Business development teams use monitoring to understand the metrics for product performance. Finance teams need to understand how the costs of cloud computing resources are changing. Site reliability engineers use monitoring dashboards to verify that applications are up and running without problematic latency. Product managers evaluate the results of AB tests based on the monitoring data of how users are reacting to new features.
A monitoring system needs to be able to handle large volumes of data that are being generated at a high velocity. The data needs to be queryable in an aggregated format, which might require an ETL system for getting data into columnar format.
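“Columnar format” here simply means storing each field contiguously, so an aggregation scans one column rather than every full row. A minimal sketch of that transposition (the event fields are hypothetical):

```python
def rows_to_columns(rows):
    """Pivot row-oriented records into a column-oriented layout.

    Columnar storage (as in Pinot or Druid) lets an aggregation like
    sum(latency_ms) scan one contiguous array instead of every record.
    """
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns

events = [
    {"service": "api", "latency_ms": 12},
    {"service": "api", "latency_ms": 15},
    {"service": "web", "latency_ms": 40},
]
cols = rows_to_columns(events)
print(sum(cols["latency_ms"]))  # → 67
```

Production columnar stores add encoding, compression, and indexing on top of this basic layout, which is where most of the query-latency wins come from.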
Alexander Pucher is an engineer at LinkedIn, where he works on a monitoring platform called ThirdEye. ThirdEye is built on top of Apache Pinot, a distributed columnar storage engine that ingests data and serves analytical queries at low latency. Pinot is comparable to Apache Druid.
Alexander joins the show to discuss ThirdEye, and explain why Pinot is a useful building block for monitoring infrastructure.
Language interoperability is only one part of why WebAssembly is exciting. The execution environment for WebAssembly modules has benefits for security and software distribution and consumption as well.
In previous shows, we’ve given an overview of WebAssembly and explored its future applications as well as its relationship to the Rust programming language. In today’s episode, we explore the packaging and execution path of a WebAssembly module, and some other applications of the technology.
Syrus Akbary is the CEO and founder of Wasmer, a company focused on creating universal binaries powered by WebAssembly. Wasmer provides a way to execute WebAssembly files universally. He joins the show to talk about the state of WebAssembly, and what his company is building.
Cryptocurrencies enable a large number of applications: trustless reputation systems, decentralized identity tools, micropayments, non-fungible Internet items, and borderless currencies, just to name a few. But for most of us, cryptocurrencies have not yet impacted daily life. Why is that?
One reason is that it is still very hard for developers to build within the cryptocurrency ecosystem. The programming languages, such as Solidity, are not widely used by software engineers. Building and deploying smart contracts is not as easy as deploying a simple Ruby on Rails webapp. The open source tooling is immature, as are the paid developer tools.
Sean Li is the CEO of Fortmatic, a company that is building tools to improve the Ethereum developer experience. Fortmatic simplifies wallet creation, user identity management, security, and money transfer for Ethereum developers.
Before starting Fortmatic, Sean was the founder of Kitematic, a company that made the developer experience of Docker easier. Kitematic was acquired by Docker. Sean is one of the few people with significant experience in both the enterprise container ecosystem and the cryptocurrency ecosystem.
Sean joins the show to discuss his time in the Docker ecosystem, his new company Fortmatic, and his perspective on how to build tools for developers. Someday there will be hundreds of thousands of developers building applications around cryptocurrencies, just like people use cloud computing today. The road to getting there is unclear, and Sean provides useful insights and predictions for the future.
Computational integrity is a property that is required for financial transactions on the Internet. Computational integrity means that the output of a certain computation is correct.
If I deposit money into my bank, my bank sends me a number that represents the new balance in my account. I assume that the number they have sent me is correct. The bank could be lying to me–maybe this bank is not actually trustworthy. But I use a bank with a good reputation. If the bank stole money from its users, it would quickly go out of business. Therefore, I feel safe by trusting a bank with my money, because the bank needs to maintain its reputation.
The problem with reputation-based systems is that they are opaque. It’s not easy for us to audit the bank and prove the bank actually has the money that it claims to have. Most of the time, the reputation-based systems work fine. But occasionally, we have catastrophic events–think of the 2008 financial crisis, or the Bernie Madoff financial scandal.
These circumstances would have been avoided if the financial institutions could have been continuously audited for their solvency.
With blockchains and cryptocurrencies, we now have tools that allow us to maintain computational integrity without the opaque systems of reputation. We no longer have to trust a central authority–we can verify computational integrity with math.
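One simple building block behind "verifying with math" is a cryptographic commitment. The sketch below is a highly simplified illustration, not how zk-STARKs work: a bank commits to all balances with a Merkle root, and any user can check that their balance is included using only hashes, with no need to trust the bank's word. All names and balances here are made up for illustration.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash all leaves pairwise up to a single root the bank can publish."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect the sibling hashes needed to recompute the root from one leaf."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    """A user recomputes the root from their leaf; no trust in the bank needed."""
    node = h(leaf)
    for sibling, is_left in proof:
        node = h(sibling + node) if is_left else h(node + sibling)
    return node == root

balances = [b"alice:100", b"bob:250", b"carol:75", b"dave:300"]
root = merkle_root(balances)            # the bank publishes this commitment
proof = merkle_proof(balances, 1)       # bob requests his inclusion proof
print(verify(b"bob:250", proof, root))  # True: his balance is committed
print(verify(b"bob:9999", proof, root)) # False: the bank cannot lie about it
```

Real systems like zk-STARKs go much further, proving properties of entire computations without revealing their inputs, but the spirit of replacing reputation with verifiable math is the same.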
Eli Ben-Sasson is a co-founder and chief scientist at StarkWare Industries, a company that is bringing zero-knowledge proof technology to market. Implementations of zero-knowledge proof technology include zk-STARKs, zk-SNARKs, and bulletproofs. StarkWare is focused on the application of zk-STARKs, which can be used to improve scalability and privacy.
Eli joins the show to discuss the topic of computational integrity, and how STARKs can be used to provide scalable, secure infrastructure to blockchain applications.
The post StarkWare: Transparent Computational Integrity with Eli Ben Sasson appeared first on Software Engineering Daily.
Collaboration on the Internet creates innovation. New inventions, new art, and new products–built by people working together on the Internet.
FindCollabs is a product we have been working on to enable people to find and collaborate with each other. If you want to try it out, you can go to FindCollabs.com.
FindCollabs is for finding people to create your projects with, and getting those projects built. Whether you are a programmer, a writer, a musician, a game designer, an actor, a videographer, or a project manager–you can find people to collaborate with.
Everyone has ideas for new projects. Whether your project is a business, an open source tool, a cryptocurrency, or a video, you can use FindCollabs to define your projects.
Click “New Project”, and FindCollabs walks you through the project creation.
You can always change the project title and description later on.
Once you have made your project, you can define roles for that project. Say you need a script writer and three videographers. Just define each role and a short description for those roles.
Click “Done”, and now you are in the project page. You can use the chat interface to post more details about your project.
FindCollabs lets you find and invite people to your projects, so that you can put together a team to build your project.
FindCollabs also has a reputation system.
When you work on a project, your collaborators rate you. As you make contributions to projects, you show the FindCollabs community that you are reliable and productive–and other people will want to work with you because of that.
The only way to build huge, ambitious projects as a team is for people to trust each other. If you are unreliable, people will not want to work with you.
When you join a project on FindCollabs, you are committing to doing work that will add value to that project.
If you like to build projects and be creative, you might like FindCollabs. To get started, you can go to FindCollabs.com, log in, and post a project. Or join someone else’s project.
If you are ever confused about anything, you can always send me an email.
We are sponsoring a series of hackathons on FindCollabs.
These hackathons are for anyone with a creative project–whether you want to make a music video, a virtual reality game, an acoustic guitar song, a cryptocurrency whitepaper, a mobile app, a commercial–anything creative.
Our first hackathon starts today, March 3rd 2019, and ends at 11:59 PM PST on Sunday April 14th. On April 15th we will announce the winners of the first hackathon, and send them emails. We will also announce the details of the second hackathon.
In the first hackathon, the prizes are not very big. But they will get bigger over time. If you like the idea of FindCollabs, it might benefit you to get involved now, so that you can build your reputation and find better people to work with in the future.
1st place: $4000 divided evenly among the winning team; SE Daily hoodies for each member of the team; appearance on the SE Daily podcast
2nd place: $1000 divided evenly among the winning team; SE Daily hoodies for each member of the team; appearance on the SE Daily podcast
3rd place: SE Daily hoodies for each member of the team; appearance on the SE Daily podcast
Most valuable feedback on the product: SE Daily Towel
Most helpful community member award: SE Daily Old School Bucket Hat
The FindCollabs hackathons will be judged by a panel of investors, entrepreneurs, podcasters, artists, and technologists. We will announce the judges of the first hackathon in the next few days.
These judges will be voting based on which projects they like the most.
Every project on the FindCollabs site before 11:59 PM PST on Sunday April 14th will be entered into the contest.
To find our detailed terms and conditions, go to findcollabs.com/terms.
Thanks for taking the time to read through this post. If you get a chance, check out FindCollabs and feel free to send me feedback. I’d love to know what you think, and any suggestions you have.
The Internet has transformed humanity.
The Internet is the result of a long series of innovations from military, academia, business, and the open source community. In his book, How The Internet Happened: From Netscape to the iPhone, Brian McCullough tells the story of the last 25 years of Internet development through the lens of companies like ebay, Amazon, Google, and Apple.
Whereas other books have focused on the trajectory of these individual companies, Brian explains how innovations in one company often lead to success in another. Without the lessons from Napster, we might not have Spotify. Without the trust model pioneered by ebay, we would not have marketplaces like Airbnb.
Brian is also the host of The Internet History Podcast and the Techmeme Ride Home podcast. In The Internet History Podcast, Brian interviews entrepreneurs and engineers who were firsthand witnesses to the developments that led to our modern Internet, including early employees at Amazon, Tesla, and TheGlobe.com. In his other podcast, the Techmeme Ride Home, Brian gives a daily overview of the day’s Internet news.
Through his podcasts about the Internet’s past and present, Brian has also accumulated an intuition about the future. He joins the show to discuss his book, the art of podcasting, and the historical lessons of technology.
The post Internet History (and Future) with Brian McCullough appeared first on Software Engineering Daily.
Advertising fraud steals billions of dollars every year.
BuzzFeed reporter Craig Silverman reports on advertising fraud and its impact on the Internet. In one investigation, Craig uncovered a mobile advertising fraud scheme in which four people stole millions of dollars (perhaps as much as $75 million or even $750 million) by serving advertisements to automated users on mobile apps.
The scheme worked as follows:
- A shell company called “We Purchase Apps” would buy legitimate apps from app developers
- The new owners of the legitimate app would record the behavior of the users on those apps
- The recorded behavior was used to train models of fake users who could replicate that behavior
- The fake user models were spun up to use the apps, where they would view ads that would automatically be served to them
- The owners of the apps would earn the money generated by displaying ads in these apps
This scheme was easy to pull off. It did not require much sophistication in terms of engineering or business skills. If a group of four people can generate tens of millions of dollars, how much ill-gotten capital is being generated by large corporations that are deeply involved in the advertising market?
Craig’s article went viral, and he has followed it up with other pieces about ad networks, fraud investigations by Google, and the potential for mobile apps to be used for large scale surveillance of Americans by the Chinese.
Craig is the most dedicated reporter covering advertising fraud today. His work is invaluable because he is asking difficult questions about the economics of our Internet. As we discuss in the episode, there is currently no effective automated means of detecting a bot from a human on the internet.
Ad fraud is not the fault of any one party. It is an emergent result of the way that our Internet is set up. It is as hard to imagine a world without advertising fraud as it is to imagine a world without email spam.
Podcast listeners usually find out about a new podcast in one of two ways: either a friend recommends that podcast or the Apple podcast charts rank that new podcast highly.
The Apple podcast charts are created using an algorithm that is not public. Many people believe that the chart ranking of a podcast is based on the number of podcast subscribers, the number of podcast downloads, and the reviews that are written about the podcasts on iTunes.
Jack Rhysider is the host of Darknet Diaries, a podcast about the dark and strange elements of the Internet. Darknet Diaries is told in a high quality, narrative audio format. Jack is a security engineer with a deep understanding of technology, and has been blogging for a long time.
As Jack has built a following with his podcast, he has spent more time looking at the iTunes podcast charts. He has seen the rank of Darknet Diaries increase–but he has seen the rank of other podcasts increase much faster. Some of these podcasts have low quality content. The audio quality is poor, the host is unprepared–these are the kinds of podcasts you would listen to once, and never subscribe to.
And yet, numerous podcasts with low quality were somehow able to game the rankings and make it to the top of the charts.
In episode 27 of Darknet Diaries, Jack investigated the phenomenon of fraudulent podcast chart manipulation. It was one of my favorite podcast episodes ever (and this is coming from someone who has listened to a lot of podcasts). The investigation went to several unexpected places, but Jack did solve the riddle of how low quality podcasts climb the iTunes charts.
Jack joins the show to talk about fraudulent podcast chart manipulation–and the broader implications of the fake Internet. Today’s episode is a simple example of how easily Internet platforms can be gamed–for a deeper dive into the fake Internet, listen to our past episodes on advertising fraud, or tomorrow’s episode with ad fraud investigative journalist Craig Silverman, which I am very excited about.
Many devices in our world are not “smart.” Air conditioners, electric guitars, power outlets, and factory conveyor belts, just to name a few. There are exciting software applications that we could build around these devices, but we need to be able to interface with them programmatically.
We need to be able to know the state of these devices. We need to be able to save that state, and then we can use that state data to perform actions that change the state of those devices. To make these devices smart, we can use a microcontroller, a small device with a constrained amount of CPU, memory, and I/O.
Device data can be sent to the cloud or processed locally, and that data can be used to perform predictive maintenance, create machine learning models, or create simple dashboards so human operators can understand the state of their hardware.
Dirk Didascalou is the VP of Internet of Things with Amazon Web Services. Dirk joins today’s show to discuss the strategy and philosophy of the AWS Internet of Things set of tools. We talk about a wide-ranging set of topics–including IoT security, edge deployments, and machine learning.
Edge computing refers to computation performed close to where data is generated–on drones, connected cars, smart factories, or IoT sensors. Any software deployment that is not a large centralized server installation could qualify as an edge device–even a smartphone.
Today, much of our heavy computation takes place in the cloud–a set of remote data centers some distance away from our client devices. For many use cases, this works fine. But there are a growing number of use cases with lower latency and higher bandwidth requirements at the edge.
A simple example is video. Let’s say you want to record a video stream, and detect people in that video stream in real time. Based on who those people in the video stream are, you want to do different things–maybe you want to send them a text message, or report to the police that a dangerous person has entered the premises. This video stream could be captured by a drone, or by a smart car, or by a video camera mounted somewhere.
Where is the video stream getting stored? Where is the machine learning model running? How do you deploy new machine learning models to the device that is running them? This is a simple example, and there are many open questions as to how to best solve such a problem.
With the increased resource constraints at the edge, there is a need for new hardware and software to power these edge applications. This led to the creation of LF Edge, a new open source group under the Linux Foundation. The goal of LF Edge is to build an open source framework for the edge.
Arpit Joshipura is the general manager of networking, orchestration, edge computing, and IoT with the Linux Foundation. He joins the show to describe the state of edge computation, and the mission of LF Edge.
This episode was exciting for several reasons. After seeing the rise of Kubernetes for container orchestration, we know that a popular open source technology that solves a widespread problem can have dramatic influence on the software world. And when multiple large companies get involved in that open source project, it can gain traction quite quickly.
Edge computing has a large set of unanswered questions, but telecom providers like AT&T and large infrastructure companies like Dell EMC are getting heavily involved with the Linux Foundation Edge group. This represents a significant expansion of the open source model, and a suggestion of further investment into open source projects in the near future.
Chris Severns and Lee Johnson work at G2i, a group of React and React Native specialists. Chris and Lee join the show to discuss the React Native rearchitecture, including the engineering history of React, the technical debt within the React project, and the vision that the React team has for the future. We also discuss Google’s Flutter project, a cross-platform native framework with a different architectural model than React Native.
In the early days of YouTube, there were scalability problems with the MySQL database that hosted the data model for all of YouTube’s videos. The state of the art solution to scaling MySQL at the time was known as “application-level sharding.”
To scale a database using application-level sharding, you break up the database into shards–disjoint regions of data. When you want to query the database, you need to know which shard to query. In your application code, you have to issue the query to a specific shard.
The solution of application-level sharding does scale your database. But the downside is that every application that interfaces with the database now has to include code that is aware of the sharding schema.
If you are an application engineer, you don’t want to have to worry about the way that the database is sharded, because it adds significant complexity to your code. The engineers at YouTube decided to fix this problem with a project called Vitess. Vitess abstracts away the details of sharding by orchestrating reads and writes across the distributed database.
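The complexity that application-level sharding pushes into application code can be seen in a minimal sketch. The shard count, host names, and function names below are hypothetical, not YouTube's or Vitess's actual code; the point is that every query path must first resolve the sharding scheme itself.

```python
# Application-level sharding: the application, not the database,
# decides which shard holds a given row.

NUM_SHARDS = 4

# Map each shard index to a (hypothetical) MySQL host.
SHARD_HOSTS = {i: f"mysql-shard-{i}.internal" for i in range(NUM_SHARDS)}

def shard_for(user_id: int) -> int:
    """Pick a shard by hashing the sharding key."""
    return user_id % NUM_SHARDS

def host_for(user_id: int) -> str:
    """Every query in application code must first resolve its shard."""
    return SHARD_HOSTS[shard_for(user_id)]

print(host_for(42))   # user 42's data lives on shard 2
print(host_for(100))  # user 100's data lives on shard 0
```

A system like Vitess sits in front of the shards and performs this routing itself, so application code can issue queries as if there were a single database.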
In a previous episode, we covered the architecture, read and write path, and the story of Vitess in detail. In today’s episode, Jiten Vaidya and Dan Kozlowski of PlanetScale Data join the show to give their perspective on MySQL scalability, and their work taking Vitess to market as a solution to scaling relational databases.
The post PlanetScale: Sharded Database Management with Jiten Vaidya and Dan Kozlowski appeared first on Software Engineering Daily.
Zoox is a full-stack self-driving car company. Zoox engineers work on everything a self-driving car company needs, from the physical car itself to the algorithms running on the car to the ride hailing system which the company plans to use to drive around riders. Since starting in 2014, Zoox has grown to over 500 employees.
Ethan Dreyfuss is a software infrastructure engineer at Zoox. He joins the show to discuss scaling an engineering team for self-driving. Machine learning was a big part of our conversation, because there are so many different approaches that an engineering team can take when it comes to machine learning for cars.
Can you take computer vision algorithms from academic papers and apply them to cars? Can you use the computer vision APIs from the cloud providers for anything useful? What about physical world mapping companies like Mapillary? How do you do data labeling, and data management? And how do you manage the interactions across the stack, from mechanical engineering to user interface design?
We touched on some of these areas, but barely scratched the surface of the self-driving car domain.
DoorDash is a food delivery company where users find restaurants to order from. When a user opens the DoorDash app, they can search for types of food or specific restaurants from the search bar, or they can scroll through the feed section and look at recommendations within their local geographic area.
Recommendations is a classic computer science problem. Much like sorting, or mapping, or scheduling, we will probably never “solve” recommendations. We will adapt our recommendation systems based on discoveries in computer science and software engineering.
One pattern that has been utilized recently by software engineers in many different areas is the “word2vec”-style strategy of embedding entities in a vector space and then finding relationships between them. If you have never heard of the word2vec algorithm, you can listen to the episode we did with computer scientist and venture capitalist Adrian Colyer or listen to this episode in which we will describe the algorithm with a few brief examples.
Store2vec is a strategy used by DoorDash to model restaurants in vector space and find relationships between them in order to generate recommendations. Mitchell Koch is a senior data scientist with DoorDash, and he joins the show to discuss the application of store2vec, and the more general strategy of word2vec-like systems. This episode is also a great companion to our episode about data infrastructure at DoorDash.
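The core idea behind word2vec-style systems like store2vec can be shown in a toy sketch: once entities (here, restaurants) are embedded as vectors, similar vectors indicate related entities. The three-dimensional vectors and restaurant names below are made up for illustration; real embeddings are learned from user behavior and have many more dimensions.

```python
import math

# Hypothetical restaurant embeddings in a small vector space.
embeddings = {
    "thai_palace":   [0.9, 0.1, 0.2],
    "bangkok_bites": [0.8, 0.2, 0.1],
    "burger_barn":   [0.1, 0.9, 0.7],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# The two Thai restaurants should be closer to each other than to the burger spot.
sim_thai = cosine(embeddings["thai_palace"], embeddings["bangkok_bites"])
sim_cross = cosine(embeddings["thai_palace"], embeddings["burger_barn"])
print(sim_thai > sim_cross)  # True
```

A recommendation system can then suggest the nearest neighbors of restaurants a user has already ordered from.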
- Medium – Personalized Store Feed with Vector Embeddings
- Medium – DoorDash
- Skymind AI – A Beginner’s Guide to Word2Vec
The post Store2Vec: DoorDash Recommendations with Mitchell Koch appeared first on Software Engineering Daily.
The nature of software projects is changing. Projects are using a wider variety of cloud providers and SaaS tools. Projects are being broken up into more git repositories, and the code in those repositories is being deployed into small microservices.
With the increased number of tools, repositories, and deployment targets, it can become difficult to manage software policy. “Policy” defines how different parts of an application can behave. Which parts of your application can access an Amazon S3 bucket? Which parts of your application can communicate with the authentication microservice? Which developers are allowed to push a new build to production?
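Policy of this kind is often expressed as data: declarative rules that state which principals may perform which actions, checked mechanically. The sketch below is a hypothetical illustration of the idea, not Datree's actual rule format; the service names and resources are made up.

```python
# Hypothetical policy rules: each rule explicitly permits one
# (principal, action, resource) combination.
policies = [
    {"principal": "billing-service", "action": "read",  "resource": "s3://invoices"},
    {"principal": "billing-service", "action": "write", "resource": "s3://invoices"},
    {"principal": "web-frontend",    "action": "call",  "resource": "auth-service"},
]

def is_allowed(principal: str, action: str, resource: str) -> bool:
    """Default-deny: an action is allowed only if a rule explicitly permits it."""
    return any(
        p["principal"] == principal
        and p["action"] == action
        and p["resource"] == resource
        for p in policies
    )

print(is_allowed("billing-service", "read", "s3://invoices"))  # True
print(is_allowed("web-frontend", "write", "s3://invoices"))    # False
```

Because the rules are data rather than scattered application logic, they can be audited, versioned, and enforced in one place across many repositories and services.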
Shimon Tolts is the CTO and co-founder of Datree, a platform for policy enforcement and code compliance. He joins the show to talk about continuous delivery, configuration management, and policy enforcement. He also explains the motivation for his company Datree, which performs analysis across a user’s GitHub repo to map the committers, code components, and repositories.
Ethereum allows developers to run decentralized applications. But the tooling for building and managing those decentralized applications is immature. Experienced software engineers have difficulty getting started with writing Ethereum applications because the stack of tools is so unfamiliar and different than traditional software tools.
As we move towards Web3, many new tools will be built. Web2 was the result of Ruby on Rails, Amazon Web Services, the iPhone, and other software tools that made it easier to deploy web servers and consume Internet services. In the world of Web2, we saw the birth of Airbnb, Uber, and Netflix. In the world of Web3, we will see new types of gig economy apps, sharing economy platforms, and social networks. These new applications will arrive gradually as the tooling improves, and makes it easier for developers to hack together businesses and side projects built on cryptocurrencies.
Brian Soule is the founder of Ethsimple, a company that makes tools for Ethereum developers. Brian joins the show to talk about the state of cryptocurrencies, the tooling that developers have access to, and his company Ethsimple.
We cover high-level ideas, such as Bitcoin maximalism and also talk about some more technical areas of the Ethereum ecosystem, such as the Ethereum Name Service.
A Kubernetes cluster presents multiple potential attack surfaces: the cluster itself, a node running in the cluster, a pod running on that node, and a container running in that pod. If you are managing your own Kubernetes cluster, you need to be aware of the security settings on your etcd, your API server, and your container build pipeline.
Many of the security risks of a Kubernetes cluster can be avoided by using the default settings of Kubernetes, or by using a managed Kubernetes service from a cloud provider or an infrastructure company. But it is useful to know about the fundamentals of operating a secure cluster, so that you can hopefully avoid falling victim to the most common vulnerabilities.
Liz Rice wrote the book Kubernetes Security with co-author Michael Hausenblas. Liz works at Aqua Security, a company that develops security tools for containerized applications. In today’s show, Liz gives an overview of the security risks of a Kubernetes cluster, and provides some best practices including secret management, penetration testing, and container lifecycle management.
- Kubernetes Security by Michael Hausenblas, Liz Rice – O’Reilly Media
- Open Source Security Podcast – Talking about Kubernetes and container security with Liz Rice
- Keynote: Running with Scissors – Liz Rice, Technology Evangelist, Aqua Security
Cloud computing has been popular for less than twenty years. Large software companies have existed for much longer. If your company was started before the cloud became popular, you probably have a large data center on your company's premises. The shorthand term for this software environment is “on-prem”.
Deploying software to your own on-prem servers can be significantly different from deploying to remote servers in the cloud. In the cloud, servers and resources are more standardized. It is often easier to find documentation and best practices for how to use cloud services.
Many of the software vendors who got started in the last decade created their software in the cloud. For example, Readme.io makes it easy for companies to create hosted documentation. Their early customers were startups and other cloud-native companies. All of those companies were happy to consume the software in the cloud. As time went on, Readme found that other customers wanted to use the Readme product as a self-hosted, on-prem service. Readme needed to figure out how to deploy their software easily to the “on-prem” environment.
It turns out that this is a common problem. Software vendors who want to sell to on-prem enterprises must have a defined strategy for making those deployments to on-prem infrastructure–and those deployments are not always easy to configure.
Replicated is a company that allows cloud-based software to easily deploy to on-prem infrastructure. Grant Miller is the founder of Replicated and he joins the show to discuss on-prem, cloud, and the changing adoption patterns of enterprise software companies.
- Medium – Introducing Replicated, A better way to deploy SaaS on-premise
- How ReadMe Went From SaaS To On-Premises In Less Than One Week
Uber manages the car rides for millions of people. The Uber system must remain operational 24/7, and the app involves financial transactions and the safety of passengers.
Uber's infrastructure runs across thousands of server instances and produces terabytes of monitoring data. The monitoring data is used to understand the health of the software systems as well as relevant business metrics, such as driver efficiency, daily revenues, and user satisfaction.
Uber adopted the Prometheus monitoring system to manage their monitoring data. Prometheus regularly scrapes metrics across infrastructure to gather time series data about the state of everything across Uber. As the usage of Prometheus has grown within the company, Uber has had to figure out how to scale their monitoring platform.
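The pull model described above can be sketched with plain Python: each scrape of a target appends a timestamped sample, producing the time series that a backend like M3 must then store and serve at scale. The metric name and values here are hypothetical, and this is a simplification; real Prometheus scrapes targets over HTTP and evaluates queries in PromQL.

```python
# metric name -> list of (timestamp, value) samples
time_series = {}

def scrape(target_metrics, timestamp):
    """Record one scrape's worth of samples, as a Prometheus server would."""
    for name, value in target_metrics.items():
        time_series.setdefault(name, []).append((timestamp, value))

def rate(name):
    """Average per-second increase over the window, like PromQL's rate()."""
    samples = time_series[name]
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Simulate three scrapes of a request counter at 15-second intervals.
scrape({"http_requests_total": 100}, timestamp=0)
scrape({"http_requests_total": 250}, timestamp=15)
scrape({"http_requests_total": 430}, timestamp=30)

print(rate("http_requests_total"))  # 11.0 requests per second
```

The scaling challenge that M3 addresses is what happens when this model must handle millions of series being scraped continuously, far more than a single Prometheus server can store or query.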
M3 is a monitoring system built at Uber to scale Prometheus and provide a platform that can effectively scale the data storage as well as the query serving. Rob Skillington is a staff software engineer at Uber, and he joins the show to talk about monitoring at Uber–from the requirements of the system to the implementation of M3.
At Uber, M3 powers dashboards, ad-hoc queries, and alerting. M3 was open sourced to give other users access to a scalable Prometheus solution. In a previous episode with Brian Boreham, we discussed one strategy for scaling Prometheus. Today’s episode covers another scalability solution, with M3.
- Uber Engineering Blog – M3: Uber’s Open Source, Large-scale Metrics Platform for Prometheus
- M3 – The fully open source metrics platform built on M3DB, a distributed timeseries database
- GitHub – M3 Monorepo – Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Metrics Platform
- M3 Documentation
Data infrastructure is advancing beyond the days of Hadoop MapReduce, single-node databases, and nightly reporting.
Companies are adopting modern data warehouses, streaming data systems, and cloud-specific data tools like BigQuery. Every company with a large amount of data wants to aggregate that data into a data lake and make the data available to developers. All of this data can be used to power machine learning models which can potentially improve every area within a company where they have historical data.
“Data pipeline” is a term used to describe the process of preparing data, building machine learning models, deploying those models, and tracking the results of those models.
Pachyderm is a company and open source project that is focused on deployment, management, and scalability of data pipelines. Pachyderm allows developers to version data, track the state of data sets, backtest machine learning models, and collaborate on data. It also tackles the very hard problem of machine learning auditability.
Joe Doliner is the CEO of Pachyderm and joins the show to discuss his experience building Pachyderm over the last five years. Data infrastructure has changed a lot in five years, and the world has moved in a direction that has benefitted Pachyderm, with more infrastructure moving to containers and more data teams advancing beyond a world of just Hadoop MapReduce.
In today’s show, Joe talks about modern infrastructure, data provenance, and the long-term vision of Pachyderm.
Infrastructure software is having a renaissance.
Cloud providers offer a wide range of deployment tools, including virtual machines, managed Kubernetes clusters, standalone container instances, and serverless functions. Kubernetes has standardized the container orchestration layer and created a thriving community. The Kubernetes community gives the cloud providers a neutral ground to collaborate on projects that benefit everyone.
The two forces of cloud providers and Kubernetes have led to massive improvements in software quality and development practices over the last few years. But one downside of the current ecosystem is that many more developers learn how to operate a Kubernetes cluster than perhaps is necessary. “Serverless” tools are at a higher level than Kubernetes, and can improve developer productivity–but a risk of using a serverless tool is the potential for lock-in, and a lack of portability.
Knative is an open-source serverless platform from Google built on top of Kubernetes. Ville Aikas is a senior staff engineer at Google who has worked at the company for eleven years. With his experience, Ville brings a rare perspective to the subjects of Kubernetes, serverless, and the infrastructure lessons of Google. Ville joins the show to discuss Knative, the motivation for building it, and the future of “serverless” infrastructure.
Virtualization software allows companies to get better utilization from their physical servers. A single physical host can manage multiple virtual machines using a hypervisor. VMware brought virtualization software to market, creating popular tools for allowing enterprises to deploy virtual machines throughout their organization.
Containers provide another improvement to server utilization. A virtual machine can be broken up into containers, allowing multiple services to run within a single VM. Containers proliferated after the popularization of Docker, and the Kubernetes open source container orchestration system grew to be the most common way of managing the large numbers of containers that were running throughout an organization.
As Kubernetes has risen to prominence, software infrastructure companies have developed Kubernetes services to allow enterprises to use Kubernetes more easily. VMware’s PKS is one example of a managed Kubernetes service.
Brad Meiseles is a senior director of engineering at VMware with more than nine years of experience with the company. He joins the show to discuss virtualization, Kubernetes, containers, and the strategy of a large infrastructure provider like VMware.
Real estate is an asset that is not straightforward to invest in. Real estate can generate excellent returns for investors, but can require much more time and expertise than stocks. Cadre is a company that allows users to invest in real estate more easily and intelligently. Cadre provides users with lots of data about potential investments and enables investments in those opportunities within the platform.
Leonid Movsesyan is the head of engineering at Cadre and joins the show to talk about the problems being solved by the company in areas of product development, infrastructure engineering, hiring, and data science. To build a platform for evaluating real estate investments, Cadre ingests and merges lots of data sets–some public and some private. This gives investors a detailed picture of the value of investments.
Fintech Daily is a new podcast from Software Engineering Daily covering payments, cryptocurrencies, trading, and the intersection of finance and technology. We are looking for volunteer hosts for Fintech Daily, and if you are interested in working with us to conduct interviews, send an email to email@example.com. You can find the podcast on iTunes, Google, and everywhere else, and if you are interested in hosting, don’t hesitate to reach out.
RocksDB is a storage engine based on the log-structured merge tree data structure. RocksDB was developed at Facebook as a storage engine for embedded database use cases. The code for RocksDB is a fork of LevelDB, an embedded database built by Google for the Chrome browser.
Every database has a storage engine. The storage engine is the low-level data structure that manages the data in the database. RocksDB is widely used in database applications where a log-structured merge tree is preferable to a b-tree. These tend to be write-heavy workloads.
In past shows, we have explored applications of RocksDB in our coverage of databases like TiDB, data-intensive applications like Smyte, and data platforms like Rockset. In today’s episode, Dhruba Borthakur and Igor Canadi join for a deep dive into how RocksDB works. Dhruba was the original creator of RocksDB, and Igor is a former Facebook engineer who worked on RocksDB in its early days. Both Dhruba and Igor work at Rockset.
We talk about the log-structured merge tree, discuss why an LSM has higher write throughput than storage engines based on a b-tree, and evaluate some of the use cases for RocksDB.
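The write-path advantage discussed above can be sketched in a few lines of Python. This is an illustrative toy, not RocksDB's actual design: writes land in an in-memory "memtable", which is flushed to disk as an immutable sorted segment (an "SSTable") when it fills, so disk writes are sequential appends rather than in-place page updates as in a b-tree.

```python
# Toy LSM sketch (illustrative only, not RocksDB's implementation).
class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}             # in-memory write buffer (a dict, for brevity)
        self.sstables = []             # newest-first list of flushed, sorted segments
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value     # buffered write: no disk seek on the write path
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # One sequential write of a sorted segment to "disk".
        self.sstables.insert(0, dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        # Reads may touch several segments: check the memtable, then each
        # SSTable from newest to oldest -- the read penalty that compaction
        # exists to reduce.
        if key in self.memtable:
            return self.memtable[key]
        for segment in self.sstables:
            if key in segment:
                return segment[key]
        return None

db = TinyLSM()
for i in range(10):
    db.put(f"k{i}", i)
db.put("k3", 333)                      # an overwrite lands in a newer location
print(db.get("k3"))                    # the newest value wins
```

The tradeoff is visible even in the toy: puts never rewrite old data, but a get may have to consult multiple segments.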
HashiCorp was founded seven years ago with the goal of building infrastructure tools for automating cloud workflows such as provisioning, secret management, and service discovery. HashiCorp’s thesis was that operating cloud infrastructure was too hard: there was a need for new tools to serve application developers.
HashiCorp founders Mitchell Hashimoto and Armon Dadgar began releasing open source tools to fulfill their vision of better automation. Terraform, Vagrant, Consul, and other tools created by HashiCorp gained popularity, and HashiCorp began iterating on their business model. Today, HashiCorp makes money by offering enterprise features and support to customers such as Pinterest, Adobe, and Cruise Automation.
Over the last seven years, enterprise software infrastructure has changed rapidly. First, enterprises moved from script-based infrastructure automation to container orchestration frameworks. Then, the container orchestration world consolidated around Kubernetes.
Today, large enterprises are rapidly adopting Kubernetes with a mix of public cloud and on-prem vendors. At the same time, these enterprises are also becoming more willing to consume proprietary tools from the public cloud providers.
HashiCorp has benefited from all of this change. Their different tools fit into a variety of workflows and are not closely coupled with any particular cloud provider or platform solution.
Armon and Mitchell join today’s show to discuss the business model and the product philosophy of HashiCorp. We also touch on service mesh, zero trust networking, and their lessons from the container orchestration wars.
The post Scaling HashiCorp with Armon Dadgar and Mitchell Hashimoto appeared first on Software Engineering Daily.
Tyler Cowen’s book Stubborn Attachments outlines a framework that individuals can use to make decisions grounded in economic philosophy. In his previous books, Tyler examined recent economic history. Stubborn Attachments gives his perspective for navigating the future.
Tyler is a professor of economics at George Mason University. He is also the host of Conversations with Tyler, a podcast that includes guests such as Ethereum creator Vitalik Buterin, Stripe co-founder Patrick Collison, and Coinbase CTO Balaji Srinivasan. Tyler blogs frequently at Marginal Revolution.
Tyler’s previous appearance on Software Engineering Daily centered around his earlier books, including The Complacent Class. In this episode, Tyler describes the philosophy outlined in Stubborn Attachments, then we discuss how his philosophy relates to software engineering, podcasting, and economics.
To find all 900 of our old episodes, including past episodes with writers, entrepreneurs, and venture capitalists, check out the Software Engineering Daily app in the iOS and Android app stores. Whether or not you are a software engineer, we have lots of content about technology, business, and culture. In our app, you can also become a paid subscriber and get ad-free episodes–and you can have conversations with other members of the Software Engineering Daily community.
Artificial intelligence is reshaping every aspect of our lives, from transportation to agriculture to dating. Someday, we may even create a superintelligence–a computer system that is demonstrably smarter than humans. But there is widespread disagreement on how soon we could build a superintelligence. There is not even a broad consensus on how we can define the term “intelligence”.
Information technology is improving so rapidly we are losing the ability to forecast the near future. Even the most well-informed politicians and business people are constantly surprised by technological changes, and the downstream impact on society. Today, the most accurate guidance on the pace of technology comes from the scientists and the engineers who are building the tools of our future.
Martin Ford is a computer engineer and the author of Architects of Intelligence, a new book of interviews with the top researchers in artificial intelligence. His interviewees include Jeff Dean, Andrew Ng, Demis Hassabis, Ian Goodfellow, and Ray Kurzweil.
Architects of Intelligence is a privileged look at how AI is developing. Martin Ford surveys these different AI experts with similar questions. How will China’s adoption of AI differ from that of the US? What is the difference between the human brain and that of a computer? What are the low-hanging fruit applications of AI that we have yet to build?
Martin joins the show to talk about his new book. In our conversation, Martin synthesizes ideas from these different researchers, and describes the key areas of disagreement from across the field.
Mars is a cold, inhospitable planet far from Earth. It presents one of the most complex challenges faced by engineers: how can we create a new world?
To create a new world, first we have to get there. We can build new rockets with improved propulsion systems. We can build ships that allow us to survive the long, grueling trip from Earth to Mars. We can build robots that will help us construct our new home. And this is just the beginning. Mars could be warmed, and could develop a hydrologic cycle like the system of clouds and oceans on Earth. Mars could be a place for new ideas and new cultures, unfettered by the conventions of Earth.
Mike Solana is the host of Anatomy of Next, a podcast about technologies and philosophies of the future. He’s also a vice president at Founders Fund. In a previous episode, Mike joined the show to talk about artificial intelligence, genetics, and robotics. Today, we discuss Mars.
The latest season of Anatomy of Next explores the science that is bringing us closer to exploring other planets. On his podcast, Mike speaks with engineers, researchers, and entrepreneurs about the state of the art of space technology–as well as the challenges that remain unsolved.
Mike returns to the show to discuss this dream of a new world. Why should we go to Mars? And why should the software engineers listening to this podcast even care?
Social media has transformed our lives. It has also transformed how wars are fought. P.W. Singer’s new book “Likewar: The Weaponization of Social Media” describes the far-reaching impact of social media on the tactics and strategies used by military, business, and everyday citizens.
We have all read about stories such as Russian bots and Cambridge Analytica, but Likewar covers many more cases that are surprising and mildly frightening. From the Gaza Strip to the streets of Chicago to Taylor Swift’s Instagram feed, Likewar describes just how pervasive the effect of social media has been on warfare.
Likewar also provides historical context. For software engineers, the repurposing of social media as a weapon is disconcerting. Many of us are working on products with a social networking component. Does this make us complicit in building weapons?
We can find some reassurance in the fact that this has happened before: from the newspaper to the television, every new invention has been repurposed for war.
In a war, a new piece of technology always presents a new vector to gain an advantage in a conflict. Because the stakes are so high in a war, there is a large incentive to find creative ways to use technology to undermine your adversaries and to help your allies.
P.W. Singer has written about robotics, cybersecurity, and modern warfare for a decade. In a previous episode, we discussed subjects like Stuxnet, drones, and social media manipulation. In today’s show, P.W. returns to talk about his book Likewar: The Weaponization of Social Media.
The post Likewar: The Weaponization of Social Media with P.W. Singer appeared first on Software Engineering Daily.
Infrastructure software can be a great business.
An infrastructure software company sells core technology to a large enterprise such as a bank or insurance company. This software has near zero marginal cost and generates a large annuity for the infrastructure software company. Once a bank has purchased your infrastructure software, the bank is likely to renew every year and never remove the software.
Selling infrastructure software is like selling concrete or steel, except the software is cheaper to produce, easier to distribute, and generates an annuity rather than being a one-time sale.
The fundamental economics of enterprise infrastructure software are extremely appealing, and every year more businesses enter the space–but few businesses ever leave. If you are starting an infrastructure software company, you can expect a complex battle for market share. There is no easy trick for getting your product into the hands of your target customer.
Martin Casado studied computer science at Stanford before founding Nicira, a company that pioneered software-defined networking and virtualization technology. In 2012, Nicira sold to VMware for $1.26 billion. Martin now works as a general partner at Andreessen Horowitz.
Martin writes about the modern strategies of building a successful infrastructure software company. He describes two methods of selling into an enterprise: bottoms-up and top-down.
In a bottoms-up model, engineers within an enterprise start using your product to solve a well-defined problem, such as API management. As more and more employees within the organization start to use your product, you can begin to engage the enterprise about becoming a paying customer for your product. Since the enterprise is already using your product, the sales conversation is much easier.
In the top-down model, you engage the CIO, CEO, or CTO directly and try to convince them that your product is worth paying for. When the senior leadership of a bank buys into your product idea, you can count on that senior leadership to convince their developers to use your product within the bank.
It is a rare occurrence that your infrastructure software company will be able to fit cleanly into either of these models–bottoms-up or top-down. More often, there will be some bottoms-up usage, and some top-down buy-in for your product. But you will have to evangelize the product on all fronts. You will have to convince both the engineers and the senior leadership.
Your product probably won’t speak for itself. You will have to develop expertise in sales, marketing, and consultancy. And in many cases, you might end up in an unending chasm.
The unending chasm describes a mode in which an infrastructure company must function as both a product company and a consultancy. Your consultancy is necessary to integrate your product into the enterprise, and ensure that your software actually gets used. But it reduces the appealing economics of a pure software company.
The unending chasm does not prevent you from being successful. Companies that have had very successful IPOs remain in the unending chasm. But it’s useful to know whether you are heading for an unending chasm–or if you are already in one.
Martin Casado joins the show today for a discussion of product development, software engineering, and go-to-market strategy.
When TensorFlow came out of Google, the machine learning community converged around it. TensorFlow is a framework for building machine learning models, but the lifecycle of a machine learning model has a scope that is bigger than just creating a model. Machine learning developers also need to have a testing and deployment process for continuous delivery of models.
The continuous delivery process for machine learning models is like the continuous delivery process for microservices, but can be more complicated. A developer testing a model on their local machine is working with a smaller data set than what they will have access to when it is deployed. A machine learning engineer needs to be conscious of versioning and auditability.
Kubeflow is a machine learning toolkit for Kubernetes based on Google’s internal machine learning pipelines. Google open sourced Kubernetes and TensorFlow, and the projects have users including AWS and Microsoft. David Aronchick is the head of open source machine learning strategy at Microsoft, and he joins the show to talk about the problems that Kubeflow solves for developers, and the evolving strategies for cloud providers.
David was previously on the show when he worked at Google, and in this episode he provides some useful discussion about how open source software presents a great opportunity for the cloud providers to collaborate with each other in a positive sum relationship.
The post Kubeflow: TensorFlow on Kubernetes with David Aronchick appeared first on Software Engineering Daily.
When a user interacts with an application to order a ride with a ridesharing app, the data for that user interaction is written to a “transactional” database. A transactional database is a database where specific rows need to be written to and read from quickly and consistently.
Speed and consistency are important for applications like a user ordering a car, and riding around in that car, because the user’s client is frequently communicating with the database to update their session. Other applications of a transactional database would include a database that backs a messaging system, a banking application, or document editing software.
The data from a transactional database is often reused in “analytic” databases. An analytic database can be used for performing large scale analysis, aggregations, averages, and other data science queries.
The requirements for an analytic database are different from a transactional database because the data is not being used for an active user session. To fill the data in an analytic database, the transactional data gets copied out of the transactional database in a process called ETL (extract, transform, load).
The separation of the transaction data store from the analytic data store causes problems for data engineering. To address these problems, some newer databases combine transactional and analytic functionality in the same database. These databases are often called “NewSQL”.
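The transactional-to-analytic copy described above can be sketched with two sqlite3 stores. The table names and schema here are invented for illustration; real ETL pipelines run against separate systems on a schedule.

```python
# Minimal ETL sketch: rows written by the application to a transactional
# store are extracted, aggregated, and loaded into an analytic store.
import sqlite3

txn = sqlite3.connect(":memory:")      # stand-in for the transactional database
olap = sqlite3.connect(":memory:")     # stand-in for the analytic database

txn.execute("CREATE TABLE rides (user_id TEXT, city TEXT, fare REAL)")
txn.executemany("INSERT INTO rides VALUES (?, ?, ?)", [
    ("u1", "SF", 12.50), ("u2", "SF", 8.00), ("u3", "NYC", 20.00),
])

olap.execute("CREATE TABLE fares_by_city (city TEXT, total REAL, n INTEGER)")

# Extract + transform: aggregate per city. Load: write into the analytic store.
rows = txn.execute(
    "SELECT city, SUM(fare), COUNT(*) FROM rides GROUP BY city"
).fetchall()
olap.executemany("INSERT INTO fares_by_city VALUES (?, ?, ?)", rows)

print(olap.execute("SELECT * FROM fares_by_city ORDER BY city").fetchall())
# -> [('NYC', 20.0, 1), ('SF', 20.5, 2)]
```

A NewSQL database collapses the two connections into one system, so the copy step disappears.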
TiDB is an open source database built on RocksDB and Kubernetes. TiDB is widely used in China by high volume applications such as bike sharing and massively multiplayer online games. Kevin Xu works at PingCAP, a company built around TiDB. He joins the show to talk about modern databases, distributed systems, and the architecture for TiDB.
Frontend development has moved towards component-driven development. At a typical technology company, a designer will put together a design file of different user interface elements, and the frontend engineer will take those UI elements and write code that renders those designs as components. As organizations have started to reuse their components and share them across the organization, the efficiency of design and frontend engineering is improving.
User interface work is gaining more emphasis within organizations, and new tools are allowing frontend engineers and designers to work together more productively. One of these tools is Storybook, a system for sharing components and the code that renders those components.
Zoltan Olah joins the show to talk about Storybook, and his company Chroma. Chroma is building tools to allow design-driven teams to work more effectively. We talked about how the relationship of designers and frontend engineers has some resemblance to the relationship between “dev” and “ops” before the DevOps movement. There are some frictions in the process of moving between design and engineering implementation, and in talking to Zoltan, I got an understanding for how much the UI layer could improve through better tooling.
Netflix has thousands of service instances communicating with each other. When a Netflix client on a smartphone makes a request for a movie, that request hits Netflix’s backend, where the request is fulfilled by a chain of requests through different services.
Services and clients communicate using several different interaction patterns. A service might send a single request and expect a single response. Or it might fire and forget, not expecting a response. A service also might send a single request and expect a stream of messages to be sent back over the network. In a highly interactive application like Netflix, there is a frequent use of “streams” of data.
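The interaction patterns above can be sketched in plain Python. This is illustrative pseudocode for the patterns themselves, not the RSocket API: request/response returns one value, fire-and-forget returns nothing, and request/stream yields many values over time.

```python
# Sketches of the three interaction patterns (illustrative, not RSocket).

def request_response(request):
    # One request in, one response out.
    return f"response to {request}"

EVENTS = []

def fire_and_forget(event):
    # The caller sends and moves on; no response is expected.
    EVENTS.append(event)

def request_stream(movie_id, chunks=3):
    # A generator stands in for a stream of messages sent back over
    # the network in response to a single request.
    for i in range(chunks):
        yield f"{movie_id}:chunk-{i}"

print(request_response("ping"))
fire_and_forget("telemetry-event")
print(list(request_stream("movie-42")))
```

In a real reactive-streams setup, the stream case also carries backpressure: the consumer signals how many items it is ready to receive.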
RSocket is a protocol that makes reactive streams easier to work with. Ryland Degnan is the CTO of Netifi, and he joins the show to discuss reactive streams and service-to-service networking. Ryland worked at Netflix on the Edge Platform team for four years, and he shares his experience working at Netflix, the challenges of networking at scale, and the company he is building around RSocket.
The post RSocket: Reactive Streaming Service Networking with Ryland Degnan appeared first on Software Engineering Daily.
Prometheus is an open source monitoring system and time series database. Prometheus includes a multi-dimensional data model, a query language called PromQL, and a pull model for gathering metrics from your different services. Prometheus is widely used by large distributed systems deployments such as Kubernetes and Cloud Foundry.
Prometheus gathers metrics from your services by periodically scraping those services. Those metrics get gathered, compressed, and stored onto disk for querying. But Prometheus is designed to store all of its records on one host in one set of files–which limits the scalability and availability of those metrics.
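The pull model described above can be sketched as follows. This is a toy illustration, not Prometheus's implementation: a collector scrapes each target on an interval and appends timestamped samples to a per-series store.

```python
# Toy sketch of pull-based metrics collection (illustrative only).
import time

def scrape(target):
    # Stand-in for an HTTP GET of the target's /metrics endpoint.
    return target["metrics"]()

def collect(targets, store, now=None):
    ts = now if now is not None else time.time()
    for target in targets:
        for name, value in scrape(target).items():
            # One time series per (target, metric name) pair.
            store.setdefault((target["name"], name), []).append((ts, value))

requests_served = {"count": 0}
service = {"name": "web-1",
           "metrics": lambda: {"http_requests_total": requests_served["count"]}}

store = {}
collect([service], store, now=100)     # first scrape
requests_served["count"] = 7
collect([service], store, now=115)     # next scrape interval
print(store[("web-1", "http_requests_total")])
# -> [(100, 0), (115, 7)]
```

In the single-node design, `store` lives on one host's disk; Cortex's contribution is to shard exactly this kind of ingestion and storage across many machines.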
Cortex is an open source project built to scale Prometheus. Cortex effectively shards Prometheus by parallelizing the “ingestion” and storage of Prometheus metrics. Cortex can take metrics from multiple Prometheus instances and store them across a distributed NoSQL database like DynamoDB, Bigtable, or Cassandra.
Bryan Boreham is an engineer at Weaveworks, where he works on deployment, observability, and monitoring tools for containers and microservices. He wrote much of the code for Cortex, and we met up at KubeCon North America to talk about the motivation for creating Cortex, the broader landscape of Kubernetes monitoring, and other approaches to scaling Prometheus.
When a developer provisions a cloud server, that server is called an “instance”. These instances can be used for running whatever workload a developer has, whether it is a web application, a database, or a set of containers.
The cloud is cheap to get started on. New applications with few users can often be hosted on infrastructure that is less than $10 per month. But as an application grows in popularity, there is more demand for CPUs and storage. A company will start to buy more and more servers to scale up to the requirements of their growing user base. The costs of running infrastructure in the cloud will increase, and the company will start to look for ways to save money.
One common method of saving money is to buy “spot instances”. A spot instance is an instance that is cheaper than “reserved instances” or “on-demand” instances. These different instance types exist because the amount of work demanded from a giant cloud provider is highly variable.
If you are in charge of AWS, you have to make sure that at any given time, you can give server resources to anyone that asks for it. Your data centers need to have physical machines that are ready to go at any time. This means that much of the time, you have server resources that are going unused.
If you are a cloud provider, how can you get people to use your compute resources? You can make them cheaper. So a user can come along and buy your compute at the discounted “spot” price.
But this presents a problem for the cloud provider. If you start to give away your compute at cheaper prices, and then the overall demand for your cloud resources goes up once again, you are going to miss out on profits. As the cloud provider, you need to kick people off of your spot instances, so that you can take those same instances and sell them to people at the higher market prices.
And this presents a problem for the user. If you buy a cheap spot instance, that instance is only available until the cloud provider decides to kick you off. You have a tradeoff between cost and availability of your instances. Because of this, spot instances are typically used only for workloads that are not mission critical–workloads that can afford to fail.
Spotinst is a company that allows developers to deploy their workloads reliably onto spot instances. Spotinst works by detecting when a spot instance is going to be reclaimed by a cloud provider and rescheduling the workload from that instance onto a new spot instance.
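The rescheduling idea can be sketched roughly as follows. This is an invented toy, not Spotinst's actual engine, and it assumes a replacement spot instance is always available in a spare pool.

```python
# Toy sketch: on a reclaim notice, move every workload off the
# reclaimed spot instance onto a replacement (illustrative only).

def reschedule(workloads, reclaimed, spare_pool):
    """Reassign each workload on the reclaimed instance to a spare."""
    moved = {}
    for workload, instance in workloads.items():
        if instance == reclaimed:
            replacement = spare_pool.pop(0)   # assumes a spare exists
            moved[workload] = replacement
            workloads[workload] = replacement
    return moved

workloads = {"web": "spot-a", "batch": "spot-a", "cache": "spot-b"}
moved = reschedule(workloads, reclaimed="spot-a", spare_pool=["spot-c", "spot-d"])
print(moved)        # the workloads moved off the reclaimed instance
print(workloads)    # "cache" was untouched
```

The hard part in production is the part elided here: receiving the reclaim signal early enough, and acquiring replacement capacity before the original instance is terminated.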
Amiram Shachar is the CEO of Spotinst. He joins the show to talk about the different types of instances across cloud providers, the engineering behind Spotinst, and how the usage of containers and the rise of Kubernetes is changing the business landscape of the cloud.
If a business has been operating successfully for a few years, that business has accumulated a high volume of data. That data exists in spreadsheets, CSV files, log files, and balance sheets. Data might be spread across local files on a user’s laptop, databases in the cloud, or storage systems in an on-premise data center.
Older businesses have more data, in more places, in more formats. Legacy systems and old batch processing jobs that have been running for years are taking data from one place and porting it to another.
Every mature company needs to access and analyze the data in all of these different places–whether they are a publication with millions of readers like The Economist or a fast growing infrastructure provider like Twilio.
“Business intelligence” is a term often used to describe tools for analyzing data in the form of charts, graphs, and reports. Business intelligence applications are crucial to the success of a business because they are used by everyone in an organization–whether you are a business analyst forecasting sales for the next quarter, an engineer who is determining how many servers to provision, or a CEO trying to decide what the best area of your business to focus on is.
There have been several generations of business intelligence tools. Each generation of business intelligence is built for the trends and infrastructure of that generation.
Looker is a more recent business intelligence tool that was built in light of several trends in software: the growth in volume of data; the growth in the number of systems that users need; the changing types of users that need to access data; and the need to share business intelligence across social workplace tools like Slack, Asana, and email.
Daniel Mintz joins the show to describe his experience using business intelligence tooling and his work at Looker, as well as the landscape of business intelligence, ETL, and data engineering.
The post Looker: Business Intelligence Platform with Daniel Mintz appeared first on Software Engineering Daily.
Robots are making their way into every area of our lives. Security robots roll around industrial parks at night, monitoring the area for intruders. Amazon robots tirelessly move packages around in warehouses, reducing the time and cost of logistics. Self-driving cars have become a ubiquitous presence in cities like San Francisco.
For a hacker in a dorm room, or a researcher in a small lab, how do you get started with robotics? There are drones and other small options like AWS DeepRacer–but what is the equivalent of the Raspberry Pi for large, human-sized robots?
Zach Allen is the founder of Slate Robotics, a company that makes large, human-sized robots that are at a low enough cost to be accessible to tinkerers, researchers, and prototype builders. Zach joins the show to talk about the state of robotics and why he started a robot company.
What Zach is doing is quite hard–he is a solo founder who has bootstrapped a robotics company from scratch. He works out of a strip mall in Missouri, where he has set up a row of 3-D printers to create the parts for his robots. He programs and assembles these robots himself.
Whether you are interested in robots or are thinking about starting a hardware company, this episode could be useful to you.
Netflix has petabytes of data and thousands of workloads running across that data every day. These workloads generate movie recommendations for users, create dashboards for data analysts to study, and reshape data in ETL jobs, to make it more accessible across the organization.
Over the last ten years, data engineering has become a key component of what makes Netflix successful. There are many different engineering roles that interact with the data infrastructure–including data analysts, machine learning scientists, analytics engineers, and software engineers.
Data engineering at Netflix has come a long way from the days of Hadoop MapReduce jobs running nightly, and generating reports of the most popular movies.
As data engineering and data science have grown, the tooling has expanded. The people in different data roles at Netflix might use Apache Spark, Presto, Python, Scala, SQL, and many other applications to study data–but in recent years, one tool has stood out as distinctly useful: Jupyter Notebooks.
A Jupyter Notebook lets users create and share documents that contain live code, visualizations, documentation, and many other types of components. In some ways, it is like a shareable IDE that allows other people to see how you are working with your code and why you are making certain decisions. It is also a tool for building interactive, user-friendly applications–you can embed videos and images in a Jupyter notebook.
A Jupyter Notebook stores both the code and the results together in one place. By combining code with results in one document, you can have context around why a certain result came out the way it did.
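That pairing of code and results is visible in the file format itself: an .ipynb file is just JSON whose code cells carry both a source field and an outputs field. A simplified, hand-built example (real notebook files include more metadata than shown here):

```python
# A minimal, hand-built notebook document showing how code and its
# result live side by side in the same JSON structure (simplified).
import json

notebook = {
    "nbformat": 4,
    "cells": [{
        "cell_type": "code",
        "source": ["1 + 1"],
        "outputs": [{"output_type": "execute_result",
                     "data": {"text/plain": ["2"]}}],
    }],
}

# Round-trip through JSON, as saving and reopening a notebook would.
doc = json.dumps(notebook)
cell = json.loads(doc)["cells"][0]
print(cell["source"], cell["outputs"][0]["data"]["text/plain"])
```

Because the output is stored alongside the code that produced it, a reader sees the result without re-running anything.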
Matthew Seal is a senior software engineer at Netflix, where he builds infrastructure and internal tools around Jupyter Notebooks. He joins the show to explain what problems Jupyter Notebooks solve for Netflix, and why they have quickly grown in popularity within the company.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
Chinese Internet companies operate at a massive scale.
WeChat has over a billion users and is widely used as the primary means of payment by urban Chinese consumers. Alibaba ships 12 million packages per day, four times as many as Amazon. JD.com, a Chinese ecommerce company, has perhaps the largest production Kubernetes installation in the world.
China’s rapid adoption of Internet services, combined with a large population and a growing middle class has led to the creation of Internet giants on par with the social networks, ecommerce sites, and ridesharing startups of the United States.
Last November, I attended the first KubeCon China and saw firsthand how the Chinese Internet companies are using open source software to scale their infrastructure.
Despite the differences between the US and China, the culture of technologists at KubeCon felt familiar. In some ways, it was just like any other Kubernetes conference that I have attended: large numbers of engineers trying to find the cutting edge of technology, and learning how to solve the problems they are facing back at the office.
There were presentations on scaling databases, service meshes, and machine learning on Kubernetes. Outside of the presentation halls, there were tables where you could pick up a translation device so that Chinese-only and English-only presentations could be understood by speakers of the other language.
Dan Kohn joins the show to talk about Chinese Internet companies and how they are adopting Kubernetes. Dan is the executive director of the Cloud Native Computing Foundation, an organization within the Linux Foundation that organizes KubeCon. Before joining the CNCF, Dan worked as an entrepreneur, engineer, and executive at several technology companies.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily.
Amazon Web Services changed how software engineers work. Before AWS, it was common for startups to purchase their own physical servers. AWS made server resources as accessible as an API request, and has gone on to create higher-level abstractions for building applications.
For the first few years of AWS, the abstractions were familiar. S3 provided distributed, reliable object storage. Elastic MapReduce provided a managed Hadoop system. Kinesis provided a scalable queue. Amazon provided developers with managed alternatives to complicated open source software.
More recently, AWS has started to release products that are unlike anything else. A perfect example is AWS Lambda, the first function-as-a-service platform. Other newer AWS products include Ground Station, a service for processing satellite data and AWS DeepRacer, a miniature race car for developers to build and test machine learning algorithms on.
As AWS has grown into new categories, the blog announcements of new services and features have started coming so frequently that it is hard to keep track of it all. Corey Quinn is the author of “Last Week in AWS”, a popular newsletter about what is changing across Amazon Web Services.
Corey joins the show to give his perspective on the growing, shifting behemoth that is Amazon Web Services–as well as the other major cloud providers that have risen to prominence. He’s also the host of the Screaming in the Cloud podcast, which you should check out if you like this episode.
Serverless computing is a technique for deploying applications without an addressable server.
A serverless application is running on servers, but the developer does not have access to the server in the traditional sense. The developer is not dealing with IP addresses and configuring instances of their different services to be able to scale.
Just as higher-level languages like C abstracted away the need to write assembly code, serverless computing gives a developer more leverage by letting them focus on business logic while a serverless platform takes care of deployment, uptime, autoscaling, and other aspects of cloud computing that are fundamental to every application.
Zeit is a deployment platform built for serverless development. In Zeit, users model a GitHub repository in terms of the functions within their application. Zeit deploys the code from those functions onto functions-as-a-service and allows you to run your code across all the major cloud providers.
Guillermo Rauch is the founder of Zeit, and he joins the show to discuss his vision for the company and the platform as it looks today. Guillermo was previously on the show to discuss Socket.io, which he created.
Functions-as-a-service allow developers to run their code in a “serverless” environment. A developer can provide a function to a cloud provider and the code for that function will be scheduled onto a container and executed whenever an event triggers that function.
An “event” can mean many different things. It is a signal that something has changed within your application. When you save a file to an Amazon S3 bucket, that creates an event. When a user signs up for your app, that can create an event.
Functions-as-a-service are allowing people to build applications completely out of managed cloud infrastructure. Apps can be fully “serverless”, with managed databases, queueing systems, and APIs tied together by event-triggered functions.
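The event-triggered execution model can be sketched in a few lines: handlers register for an event type and run whenever a matching event fires. The dispatcher below is a toy, in-process illustration; the decorator and event names are invented for this sketch, not any cloud provider's actual API.

```python
# Minimal sketch of event-triggered functions: handlers register for an
# event type and run whenever a matching event fires. All names here
# are illustrative, not a real cloud provider's API.
from collections import defaultdict

_handlers = defaultdict(list)

def on(event_type):
    """Decorator: register a function to run when event_type fires."""
    def register(fn):
        _handlers[event_type].append(fn)
        return fn
    return register

def emit(event_type, payload):
    """Fire an event and return each handler's result."""
    return [fn(payload) for fn in _handlers[event_type]]

@on("s3:ObjectCreated")
def make_thumbnail(event):
    return f"thumbnailed {event['key']}"

@on("user:SignedUp")
def send_welcome(event):
    return f"welcomed {event['user']}"

print(emit("s3:ObjectCreated", {"key": "photos/cat.jpg"}))
```

In a real functions-as-a-service platform, `emit` is replaced by the provider's event bus, and each handler runs in its own short-lived container.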
Today, there is not a consistent format for events across different applications and cloud providers. This makes it more difficult to stitch together events across these different environments. Ideally, events would be lightweight, easy to deserialize, and easy to interoperate with.
The Cloud Events specification is a project within the Cloud Native Computing Foundation with the goal of creating a standard format for events. Doug Davis is the CTO for developer advocacy of containers at IBM. He joins the show to discuss how events and event-based programming works, and the need for a common format across cloud events.
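The shape of such a standard envelope can be sketched concretely. The four required attributes (id, source, specversion, type) come from the CloudEvents specification; the helper functions below are illustrative, not an official SDK.

```python
# Sketch of a CloudEvents-style envelope. The required attributes
# (id, source, specversion, type) come from the CloudEvents spec;
# the helpers themselves are illustrative, not an official SDK.
import uuid

REQUIRED = {"id", "source", "specversion", "type"}

def make_event(event_type, source, data):
    """Wrap payload data in a minimal CloudEvents 1.0 envelope."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "type": event_type,
        "source": source,
        "data": data,
    }

def is_valid(event):
    """Any consumer can check the same required attributes,
    regardless of which application or cloud produced the event."""
    return REQUIRED <= event.keys()

evt = make_event("com.example.object.created", "/storage/bucket-1",
                 {"key": "photos/cat.jpg"})
print(is_valid(evt))   # True
```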
Most applications today are either deployed to on-premise environments or deployed to a single cloud provider.
Developers who are deploying on-prem struggle to set up complicated open source tools like Kafka and Hadoop. Developers who are deploying to a cloud provider tend to stay within that specific cloud provider, because moving between different clouds and integrating services across clouds adds complexity.
Ben Hindman started the Apache Mesos project when he was working in the Berkeley AMPLab. Mesos is a scheduler for resources in a distributed system, allowing compute and storage to be scheduled onto jobs that can use those resources. In his time at the AMPLab, Ben collaborated with Matei Zaharia, creator of Apache Spark.
Ben founded Mesosphere based on his work on Apache Mesos, and since 2013 he has been building a company to bring it to market. In the meantime, several market forces have reshaped the enterprise landscape.
Enterprise businesses built on virtual machines and on-prem hardware are trying to migrate to containers, Kubernetes, and Spark. Cloud providers like Google and Microsoft have risen to prominence in addition to Amazon’s continued growth, and enterprises are increasingly willing to adopt multiple clouds.
I spoke with Ben Hindman at Kubecon North America. Today, the company that he co-founded works to provide tools for managing these changes in infrastructure. In our conversation, we talked about the necessary mindset shifts for taking a research project and turning it into a highly successful product. We also talked about the newer trends in infrastructure–why enterprises will want multicloud deployments and how serverless APIs and backends will make the lives of developers much easier.
In a cloud infrastructure environment, failures happen regularly. The servers can fail, the network can fail, and software bugs can crash your software unexpectedly.
The number of failures that can occur in cloud infrastructure is one reason why storage is often separated from application logic. A developer can launch multiple instances of their application, with each instance providing a “stateless” environment for serving API requests.
When the application needs to save state, it can make a call out to a managed cloud infrastructure product. Managed cloud databases provide a reliable place to manage application state. Managed object storage systems like Amazon S3 provide a reliable place to store files.
The pattern of relying on remote cloud services does not work so well for on-prem and hybrid cloud environments. In these environments, companies are managing their own data centers and their own storage devices. As companies with on-prem infrastructure adopt Kubernetes, there is a need for ways to manage on-prem storage through Kubernetes.
Saad Ali is a senior engineer at Google, where he works on Kubernetes. He is also a part of the Kubernetes Storage Special Interest Group. Saad joins the show to talk about how Kubernetes interacts with storage, and how to manage stateful workloads on Kubernetes. We discuss the basics of Kubernetes storage, including persistent volumes and the container storage interface.
When a user makes a request to a product like The New York Times, that request hits an API gateway. An API gateway is the entry point for an external request, and it serves several purposes: authentication, security, routing, load balancing, and logging.
API gateways have grown in popularity as applications have become more distributed, and companies offer a wider variety of services. If an API is public, and anyone can access it, you might need to apply rate limiting so that users cannot spam the API. If the API is private, the user needs to be authenticated before the request is fulfilled.
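Rate limiting at a gateway is commonly implemented with a token bucket: tokens refill at a steady rate, and each request spends one. The sketch below is a deterministic toy (it takes an injectable clock so the demo is reproducible); production gateways, Kong included, have to worry about shared state and real clocks.

```python
# Hedged sketch of token-bucket rate limiting as an API gateway might
# apply it per user. Illustrative only; real gateways handle shared
# state, distributed counters, and clock skew.
import time

class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A fake clock makes the demo deterministic: five requests arrive at once.
fake_now = [0.0]
bucket = TokenBucket(rate=1, capacity=3, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(5)]
print(burst)            # [True, True, True, False, False]

fake_now[0] = 2.0       # two seconds later, two tokens have refilled
print(bucket.allow())   # True
```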
Kong builds infrastructure for API management. The Kong API gateway is a widely used open source project, and the company is built around supporting and building on top of that gateway.
Marco Palladino is the co-founder and CTO of Kong. He joins the show to tell the story of starting Kong eight years ago, and how the API gateway product evolved out of an API marketplace. Marco also discusses the architecture of Kong and his vision for how the product will develop in the future–including the Kong service mesh.
Augmented reality glasses will let us walk through a world where the digital blends together with the physical. 3-D objects will be rendered and superimposed onto our field of vision, creating an environment for people to build applications we can hardly dream of today.
These augmented reality glasses are probably three to five years away from being ready for consumer use. But developers are already building augmented reality applications for smartphones using Apple ARKit and Android ARCore. These augmented reality toolkits use powerful smartphone processors and computer vision to give developers simple primitives for placing and manipulating 3-D objects.
Most of these AR applications are made for a single phone, and single-phone AR is useful–for example, you could hold your phone up in front of an empty room and see on your phone how it would look with an IKEA couch sitting in the middle of that room.
But shared augmented reality experiences are much more exciting.
Shared augmented reality could allow two of us to play a game of virtual basketball, each controlling a game that is synchronized between us. Shared AR would let me create a virtual billboard in front of a restaurant that you could see when you walked up to the restaurant and held your phone in front of you.
Ubiquity6 is a company with the goal of enabling shared AR experiences. Ankit Kumar is the co-founder and CTO of Ubiquity6, and he joins the show to explain why building shared AR is a challenging technical problem. It requires building a digital model of the real world, and mapping that model to coordinates in space, so that users can reliably persist augmented reality objects that other users can see.
We discuss computer vision, digital mapping, the increasing power of phone processors, and the potential of shared AR.
The post Ubiquity6: Augmented Reality Platform with Ankit Kumar appeared first on Software Engineering Daily.
Cloud providers created the ability for developers to easily deploy their applications to servers on data centers. In the early days of the cloud, most of the code that a developer wrote for their application could run on any cloud provider, whether it was Amazon, Google, or Microsoft. These cloud providers were giving developers the same Linux server that they would expect from an on-premise deployment.
Early cloud applications such as Netflix, Airbnb, and Uber took advantage of this cloud infrastructure to quickly scale their businesses. In the process, these companies had to figure out how to manage open source distributed systems tools such as Hadoop and Kafka. Cloud servers were easy to create, but orchestrating them together to build distributed systems was still very hard.
As the cloud providers matured, they developed higher-level systems that solved many of the painful infrastructure problems: managed databases, autoscaling queueing systems, machine learning APIs, and hundreds of other tools. Examples include Amazon Kinesis and Google BigQuery. These tools are invaluable because they allow a developer to quickly build applications on top of durable, resilient cloud infrastructure.
With all of these managed services, developers are spending less time on infrastructure and more time on business logic. But managed services also lead to a new infrastructure problem—how do you manage resources across multiple clouds?
A bucket storage system like Amazon S3 has different APIs than Google Cloud Storage. Google Cloud PubSub has different APIs than Amazon Kinesis. Since different clouds have different APIs, developers have trouble connecting cloud resources together, and it has become difficult to migrate your entire application from one cloud provider to another.
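The friction is easy to see in a sketch: the same "save an object" operation needs provider-specific code on each cloud unless an abstraction hides the difference. The classes below are invented stand-ins (the real SDK calls they replace are noted in comments); this illustrates the problem, not Crossplane's actual API.

```python
# Illustrative sketch of the problem a multicloud control plane solves:
# one bucket interface in front of provider-specific APIs. All class
# and method names are invented for illustration.
from abc import ABC, abstractmethod

class BucketStore(ABC):
    @abstractmethod
    def put(self, key, data): ...
    @abstractmethod
    def get(self, key): ...

class FakeS3(BucketStore):
    """Stands in for boto3's put_object/get_object calls."""
    def __init__(self):
        self._objects = {}
    def put(self, key, data):
        self._objects[key] = data   # real code: s3.put_object(Bucket=..., Key=key, Body=data)
    def get(self, key):
        return self._objects[key]   # real code: s3.get_object(Bucket=..., Key=key)

class FakeGCS(BucketStore):
    """Stands in for google-cloud-storage's blob upload/download."""
    def __init__(self):
        self._blobs = {}
    def put(self, key, data):
        self._blobs[key] = data     # real code: blob.upload_from_string(data)
    def get(self, key):
        return self._blobs[key]     # real code: blob.download_as_bytes()

def archive(store: BucketStore, key, data):
    """Application code written once against the common interface."""
    store.put(key, data)
    return store.get(key)

print(archive(FakeS3(), "report.json", b"{}"))
print(archive(FakeGCS(), "report.json", b"{}"))
```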
Crossplane is an open source control plane for managing resources across multiple clouds. Crossplane’s goal is to provide a single API surface for interfacing with all the parts of your application, regardless of what cloud they are on.
Crossplane is a project that was started by Upbound, a company with the goal of making multicloud software development easier. Bassam Tabbara is the CEO of Upbound, and he joins the show to talk about multicloud deployments, Kubernetes federation, and his strategy for building a multicloud API.
The post Crossplane: Multicloud Control Plane with Bassam Tabbara appeared first on Software Engineering Daily.
Originally posted on 13 September 2017.
Machines understand the world through mathematical representations. In order to train a machine learning model, we need to describe everything in terms of numbers. Images, words, and sounds are too abstract for a computer. But a series of numbers is a representation that we can all agree on, whether we are a computer or a human.
In recent shows, we have explored how to train machine learning models to understand images and video. Today, we explore words. You might be thinking–”isn’t a word easy to understand? Can’t you just take the dictionary definition?” A dictionary definition does not capture the richness of a word. Dictionaries do not give you a way to measure similarity between one word and all other words in a given language.
Word2vec is a system for defining words in terms of the words that appear close to that word. For example, the sentence “Howard is sitting in a Starbucks cafe drinking a cup of coffee” gives an obvious indication that the words “cafe,” “cup,” and “coffee” are all related. With enough sentences like that, we can start to understand the entire language.
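The raw material word2vec learns from can be sketched directly: in the skip-gram formulation, each word is paired with its neighbors inside a context window, and the model then learns vectors that predict these pairs. The function below only generates the pairs, a small illustrative slice of the full algorithm.

```python
# Sketch of the training pairs word2vec (skip-gram) derives from a
# sentence: each word is paired with the neighbors inside its context
# window. A real implementation then learns vectors from these pairs.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "howard is drinking a cup of coffee".split()
pairs = skipgram_pairs(sentence)
print(("cup", "coffee") in pairs)   # True: neighbors within the window
```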
Adrian Colyer is a venture capitalist with Accel, and blogs about technical topics such as word2vec. We talked about word2vec specifically, and the deep learning space more generally. We also explored how the rapidly improving tools around deep learning are changing the venture investment landscape.
Originally posted on 28 July 2017.
Self-driving cars are here. Fully autonomous systems like Waymo are being piloted in less complex circumstances. Human-in-the-loop systems like Tesla Autopilot drive for the human when it is safe to do so, and let the human take control in ambiguous circumstances.
Computers are great at memorization, but not yet great at reasoning. We cannot enumerate to a computer every single circumstance that a car might find itself in. The computer needs to perceive its surroundings, plan how to take action, execute control over the situation, and respond to changing circumstances inside and outside of the car.
Lex Fridman has worked on autonomous vehicles with companies like Google and Tesla. He recently taught a class on deep learning for semi-autonomous vehicles at MIT, which is freely available online. There was so much ground to cover in this conversation. Most of the conversation was higher level. How do you even approach the problem? What is the hardware and software architecture of a car?
I enjoyed talking to Lex, and if you want to hear more from him check out his podcast Take It Uneasy, which is about jiu jitsu, judo, wrestling, and learning.
The post Self-Driving Deep Learning with Lex Fridman Holiday Repeat appeared first on Software Engineering Daily.
Originally posted on 1 May 2018.
Technology is pushing us rapidly toward a future that is impossible to forecast. We try to imagine what that future might look like, and we can’t help having our predictions shaped by the media we have consumed.
1984, Terminator, Gattaca, Ex Machina, Black Mirror–all of these stories present a dystopian future. But if you look around the world, the most successful technologists are mostly guided by a sense of optimism. Technologists themselves are mostly idealistic–they see the future through a utopian lens. Popular media largely tells a different story: that we are headed for a dystopian world.
Why is there such a gulf in the level of idealism between technologists and the media?
Mike Solana found himself asking that question on a regular basis during his work at Founders Fund, where he is a vice president. Founders Fund has a bias toward funding difficult, cutting-edge technology like gene editing, robotics, and nuclear energy. The technology that Mike was seeing made him excited about the future–which led to his creation of the podcast “Anatomy of Next.”
“Anatomy of Next” has explored biology, robotics, nuclear energy, superintelligence, and the nature of reality. Soon the podcast will be exploring how our civilization will explore and settle the solar system–specifically Mars.
I’ve listened through the entire first season of the show twice and enjoyed it so much because Mike explores questions that are on the border of philosophy and technology–questions about the nature of reality, and what makes us human–and nobody can give perfect answers to these questions. But Mike interviews top experts on the show, which provides us with a framework. Guests on “Anatomy of Next” include Nick Bostrom (the author of Superintelligence), George Church (a pioneer in gene editing), and Palmer Luckey (the founder of VR company Oculus).
Mike joins the show to talk about why he started “Anatomy of Next,” and his own perspective on the future.
The post Technology Utopia with Michael Solana Holiday Repeat appeared first on Software Engineering Daily.
Originally posted on 16 June 2017.
John Looney spent more than 10 years at Google. He started with infrastructure, and was part of the team that migrated Google File System to Colossus, the successor to GFS. Imagine migrating every piece of data on Google from one distributed file system to another.
In this episode, John sheds light on the engineering culture that has made Google so successful. He has very entertaining stories about clusterops and site-reliability engineering.
Google’s success in engineering is due to extremely high standards and a culture of intellectual honesty. With the volume of data and throughput that Google handles, one-in-a-million events are likely to occur. There isn’t room for sloppy practices.
John now works at Intercom, where he is adjusting to the modern world of Google infrastructure for everyone. This conversation made me feel quite grateful to be an engineer in a time where everything is so much cheaper, so much easier, and so much more performant than it was in the days when Google first built everything from scratch.
I had a great time talking to John, and hope he comes back on the show again in the future because it felt like we were just scratching the surface of his experience.
Originally posted on 14 February 2017.
Most tech companies are moving toward a highly distributed microservices architecture. In this architecture, services are decoupled from each other and communicate with a common service language, often JSON over HTTP. This provides some standardization, but these companies are finding that more standardization would come in handy.
At the ridesharing company Lyft, every internal service runs a tool called Envoy. Envoy is a service proxy. Whenever a service sends or receives a request, that request goes through Envoy before meeting its destination.
Matt Klein started Envoy, and he joins the show to explain why it is useful to have this layer of standardization between services. He also gives some historical context for why Envoy was so helpful to Lyft.
At Facebook, Venkat Venkataramani saw how large volumes of data were changing software infrastructure. Applications such as logging servers and advertising were creating fast moving, semi-structured data. The user base was growing, the traffic was growing, and the volume of data was growing. And the popular methods for managing this data were insufficient for the applications that developers wanted to build on top.
In previous episodes about data platforms, we have covered similar difficulties as experienced by Uber and Doordash. Incoming data is often in JSON, which is hard to query. The data is transformed to a file format like Parquet, which requires an ETL job. Once it is in a Parquet file on disk in a data lake, the access time is slow. To query the data efficiently, it must be loaded into a data warehouse, which loads the data into memory, often in a columnar format that is easy to aggregate.
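The reason warehouses pay the cost of that last step can be shown in miniature: row-oriented JSON records are pivoted into columns, and an aggregate then scans one contiguous column instead of every record. This is a hedged toy sketch of the idea, not any particular system's implementation.

```python
# Hedged sketch of why data warehouses use columnar layouts:
# row-oriented JSON records are pivoted into columns, and an
# aggregate then scans one column instead of every record.
import json

raw = [
    '{"user": "a", "amount": 10}',
    '{"user": "b", "amount": 25}',
    '{"user": "a", "amount": 5}',
]
rows = [json.loads(line) for line in raw]

# Pivot rows into columns, the shape Parquet and warehouse engines use.
columns = {key: [row[key] for row in rows] for key in rows[0]}

total = sum(columns["amount"])   # touches only the "amount" column
print(total)   # 40
```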
Imagine being a developer at Facebook, Uber, or Doordash, and trying to build a simple dashboard, or a machine learning application on top of this data platform. Where do you find the right data? How do you know it is up to date? And what if you don’t know the shape of your queries ahead of time, and you haven’t defined indexes over your data? The access speed will be too slow to do exploratory analysis.
There are so many steps in this process, and each step creates friction for application developers who want to build on top of “big data”. Since even Facebook was having trouble managing this problem, Venkat figured there was an opportunity to build a company around solving the data platform for other software companies.
Venkat is the CEO of Rockset, a data system that is built to make it easy for developers to build data-driven apps. In Rockset, data can be ingested from data streams, data lakes, and databases. Rockset creates multiple indexes and schemas across the data. Because there are multiple models for querying, Rockset can analyze an incoming query and create an intelligent query plan for serving it.
Venkat joins the show to discuss his time working on data at Facebook, the untapped opportunities of using that data, and the architecture of Rockset.
Software products are distributed across more and more servers as they grow. With the proliferation of cloud providers like AWS, these large infrastructure deployments have become much easier to create. With the maturity of Kubernetes, these distributed applications are more reliable.
Developers and operators can use a service mesh to manage the interactions between services across this distributed application.
A service mesh is a layer across a distributed microservices application that consists of service proxy sidecars running alongside each service in a cluster, along with a central control plane for communicating with those sidecar proxies.
A service mesh has many uses. Every request and response within the application gets routed through a service proxy, which can improve observability, control traffic to different instances, and provide circuit breaking in case of an instance failure. The central control plane can be used to manage network policy throughout the whole system.
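Circuit breaking is one of those proxy behaviors, and its core logic is small: after enough consecutive failures the circuit "opens" and calls fail fast instead of piling onto a failing instance. The class below is an illustrative toy, not the implementation any real sidecar proxy uses.

```python
# Illustrative circuit breaker of the kind a service-mesh sidecar proxy
# applies: after enough consecutive failures the circuit "opens" and
# further calls are rejected immediately instead of hitting a failing
# instance. Toy sketch, not a real proxy's implementation.
class CircuitBreaker:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def call(self, fn):
        if self.open:
            raise RuntimeError("circuit open: request rejected")
        try:
            result = fn()
            self.failures = 0   # any success resets the count
            return result
        except Exception:
            self.failures += 1
            raise

breaker = CircuitBreaker(threshold=2)

def flaky():
    raise ConnectionError("instance down")

for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

print(breaker.open)   # True: subsequent calls now fail fast
```

Real proxies add a timeout after which the circuit "half-opens" and probes the instance again; that is omitted here for brevity.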
We have done shows about each of the different components of a service mesh system, including different types of service proxies, as well as the service meshes built on top of these proxies.
Linkerd, which is made by the startup Buoyant, was the first service mesh product to come to market, and it has the most production use, with customers like Expedia and the bank Monzo. Istio is a more recent service mesh which uses the Envoy service proxy. Istio came out of Google and is also supported by IBM—setting up a classic competition between a startup and large incumbents.
William Morgan is the CEO of Buoyant, and he joins the show to talk about the use cases and adoption of service mesh. He also talks about the business landscape of the service mesh category, and how to compete with giant cloud providers.
Market strategy defines how a company is positioning itself to be successful. This strategy encompasses engineering, sales, marketing, recruiting, and everything else within a company.
Herb Cunitz has led teams at Hortonworks, VMware, SpringSource, and several other companies over his 30-year career in software. After working as president of Hortonworks, Herb started AccelG2M, which works with software companies to define their go-to-market strategy.
Software companies require a great deal of long-term strategic thinking. Engineering, sales, marketing, and leadership must work together to build a plan that will allow the company to reach an exit: either an acquisition or an IPO.
Executives at a software company must create a clear strategy and communicate it to the employees throughout the organization. The strategy must be implemented, meeting deadlines and hitting milestones. New team members must be recruited, and unsuccessful workers must be let go.
In today’s show, Herb provides some invaluable strategic wisdom for anyone working in software–whether you are an engineer, salesperson, or investor.
Software companies today rely on group chat applications.
The world of startups and small businesses is dominated by Slack. But for some large enterprises, regulatory constraints prevent them from using Slack. Slack is a web application that is hosted in the cloud, and regulated industries such as banking often need to run their applications on their own on-prem infrastructure.
Mattermost is an open source alternative to Slack that can be self-hosted. This means that all of the networking complexities and scalability challenges that are controlled in the cloud by Slack need to be handled by open source code rather than managed services running in the cloud.
Because it is open source, Mattermost can also be redesigned and customized. Uber designed their own custom version of Mattermost called uChat.
Corey Hulen is a co-founder and the CTO of Mattermost. He joins the show to discuss the motivation for building Mattermost and the engineering challenges of building an open source chat system.
The post Mattermost: Self-Hosted Slack Alternative with Corey Hulen appeared first on Software Engineering Daily.
A bank account is a platform for apps to be built on top of.
If that sounds like a weird idea, think about the features of a bank account. Most users have only a single bank account, making it a tool for identity and authentication. The series of transactions in a bank account provides a data set that can be used for analyzing payment history and issuing loans or insurance.
But there are difficulties to building a platform on top of banking. There are thousands of different banks. If you want to build an application that integrates with a user’s bank, you need to be able to integrate with any bank that the user might use–whether it’s Bank of America, Wells Fargo, or Chase.
Plaid is a company that builds APIs for users to connect to banks. Applications such as Venmo, Betterment, and Coinbase use Plaid to connect with the bank accounts of their users. Jean-Denis Greze joins the show to explain how applications use Plaid, and how Plaid has scaled its infrastructure to handle a high volume of requests. He also discusses the potential of banking as a platform, and the strategy for expanding the APIs that Plaid can offer to developers.
Fintech Daily is a new podcast from Software Engineering Daily covering payments, cryptocurrencies, trading, and the intersection of finance and technology. We are looking for volunteer hosts for Fintech Daily, and if you are interested in working with us to conduct interviews, send an email to firstname.lastname@example.org. You can find the podcast on iTunes, Google, and everywhere else, and if you are interested in hosting, don’t hesitate to reach out.
When a startup finds product market fit, the adoption of that product can grow rapidly, turning a startup into a high growth company.
All of a sudden, a startup that was struggling to find its first customer is bombarded with new challenges. The startup has to hire dozens of new employees, which requires raising capital, so the startup has to meet with investors and lawyers. A rapid influx of new customers puts a strain on the engineering and customer service elements of the company.
There is too much to do, and there is only so much time in a day.
The CEO of the high-growth company is up late into the night, answering emails and losing sleep. But these are good problems to have, and the company is in a state of exuberance. The CEO must balance psychological health with the stressful task of scaling a company.
Elad Gil is an entrepreneur and author of “High Growth Handbook”, a book of lessons and guidelines about how to navigate a startup that has found product market fit, and is beginning to scale. High Growth Handbook includes interviews with experienced entrepreneurs such as Marc Andreessen and Patrick Collison, whom Elad met with as he wrote the book.
Elad joins the show to discuss his book, and his own personal lessons of working with companies such as Twitter, Google, Stripe, and Coinbase. Elad has worked at several high growth companies and invested in others, and he has gathered a lot of wisdom from these different experiences.
Releasing software has inherent risk. If your users don’t like your new feature, they might stop using your product immediately. If a software bug makes it into production, it can crash your entire application.
Releasing software gradually has many benefits. A slow rollout to an increasing population of users allows you to test your software in multiple real-world environments before it goes live to everyone. A system of AB testing different versions of your software lets you see how different flavors of your software perform against similar audiences.
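A common building block for such a rollout is deterministic bucketing: hashing the user id places each user in a stable bucket, so the same user always sees the same variant as the rollout percentage grows. The function below is an illustrative sketch of that idea, not LaunchDarkly's actual API.

```python
# Sketch of deterministic percentage rollout behind a feature flag:
# hashing the user id buckets each user stably, so the same user
# always sees the same variant. Illustrative only, not LaunchDarkly's
# actual implementation or API.
import hashlib

def flag_enabled(feature, user_id, rollout_percent):
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100     # stable bucket in [0, 100)
    return bucket < rollout_percent

# The same user always lands in the same bucket...
assert flag_enabled("new-checkout", "user-42", 50) == \
       flag_enabled("new-checkout", "user-42", 50)

# ...and roughly the stated fraction of users sees the feature.
enabled = sum(flag_enabled("new-checkout", f"user-{i}", 25)
              for i in range(1000))
print(enabled)   # close to 250
```

Hashing on `feature:user_id` rather than `user_id` alone keeps the buckets for different flags independent, so one rollout does not correlate with another.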
Edith Harbaugh is the CEO of LaunchDarkly, a system for feature management. LaunchDarkly allows developers to deploy new software releases in a controlled fashion. Edith joins the show to discuss how to implement feature flagging, and why an intelligent release process can lead to a more scientific, predictable environment for software development. Edith is also the host of To Be Continuous, a podcast about continuous delivery, software engineering, and DevOps.
- Homepage – LaunchDarkly
- AMA with Edith Harbaugh, Co-founder & CEO of LaunchDarkly
- Feature Management | LaunchDarkly Blog
- The Product Manager’s Guide to Feature Flags
- How LaunchDarkly improves your customer experience | LaunchDarkly Blog
- Feature Flag-Driven Development
- Buying vs Build Feature Flag Driven Development
- Benefits – LaunchDarkly
- Getting Started with Feature Flags – #1 LaunchDarkly Feature Flags – YouTube
- LaunchDarkly – Launch, control, and measure your features – YouTube
The Berkeley AMPLab was a research lab where Apache Spark and Apache Mesos were both created. In the last five years, the Mesos and Spark projects have changed the way infrastructure is managed and improved the tools for data science.
Because of its proximity to Silicon Valley, Berkeley has become a university where fundamental research is blended with a sense of industry applications. Students and professors move between business and academia, finding problems in industry and bringing them into the lab where they can be studied without the day-to-day pressures of a corporation.
This makes Berkeley the perfect place for research around “serverless”.
Serverless computing abstracts away the notion of a server, allowing developers to work at a higher level and be less concerned about the problems inherent in servers–such as failing instances and unpredictable network connections.
With serverless functions-as-a-service, such as AWS Lambda, the cloud provider makes guarantees around the execution of serverless code. With serverless backend services, the cloud provider makes guarantees around the reliability of a database or queueing system.
The cloud provider is operating servers to power this functionality. But the user is not exposed to those servers.
Today’s show centers on serverless functions-as-a-service. This is a new paradigm of computing, and there are many open questions. How can the servers for our functions be quickly provisioned? How can we parallelize batch jobs into functions-as-a-service? How can large numbers of serverless functions communicate with each other reliably to coordinate?
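The batch-parallelization question is the one pywren attacks: fan a job out so that each item is processed by its own function invocation, then gather the results. A minimal sketch of the pattern, using a local thread pool to stand in for remote invocations (the function names here are illustrative, not pywren's API):

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(chunk: str) -> int:
    """The per-item work: in a real deployment this body would run
    inside a single serverless function invocation."""
    return len(chunk.split())

def parallel_map(func, items, max_workers=8):
    """Fan a batch job out across many workers, one item per invocation,
    then gather the results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))

chunks = ["the quick brown fox", "jumps over", "the lazy dog"]
print(parallel_map(word_count, chunks))  # → [4, 2, 3]
```

Swapping the thread pool for thousands of concurrent Lambda invocations is what lets a laptop drive a cluster-sized computation without managing any servers.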
In production applications, functions-as-a-service are mostly used for “event-driven” applications. But the potential for functions-as-a-service is much larger.
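An event-driven function looks roughly like the following. The `handler(event, context)` signature follows the AWS Lambda Python convention; the event shape with a `Records` list is a simplified assumption modeled on queue- and storage-triggered events, not a specific service's schema.

```python
import json

def handler(event, context):
    """Lambda-style entry point: the platform invokes this once per event
    and manages the underlying servers entirely on the user's behalf."""
    # Hypothetical event shape: a batch of records from a queue or object store.
    records = event.get("Records", [])
    processed = [r.get("body", "").upper() for r in records]
    return {"statusCode": 200, "body": json.dumps({"processed": processed})}

# Locally we can invoke the handler directly with a synthetic event;
# in production the cloud provider constructs the event and context.
result = handler({"Records": [{"body": "hello"}, {"body": "world"}]}, None)
print(result["statusCode"])
```

The function holds no state between invocations, which is exactly what lets the provider provision, scale, and tear down its servers invisibly.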
Ion Stoica is a professor of computer science at Berkeley, where he leads the RISELab. He is the co-founder of Conviva Networks and Databricks. Databricks is the company that was born as a result of the research on Apache Spark. Ion now serves as executive chairman of Databricks. Ion joins the show to describe why serverless computing is exciting, the open research problems, and the solutions that researchers at the RISELab are exploring.
- Occupy the Cloud: Distributed Computing for the 99%
- RISELab at UC Berkeley – REAL-TIME INTELLIGENT SECURE EXECUTION
- pywren — run your python code on thousands of cores – pywren
- IEEE Cloud Serverless Workshop July 2018 Jonas
- GitHub – Vaishaal/numpywren: Serverless Scientific Computing
- Serverless for Data Scientists | Mike Lee Williams @ PyBay2018 – YouTube
- Serverless for data scientists
- Serverless Big Data Analytics at Traveloka (Cloud Next ’18) – YouTube
- With PyWren, AWS Lambda Finds an Unexpected Market in Scientific Computing – The New Stack
Robotics, genomics, and backend infrastructure: as an investor, it can be difficult to assess the viability of a startup that is on the cutting edge in any of these areas.
A robotics startup requires a team with an integrated understanding of hardware and software. A genomics company will not only have to develop a successful healthcare product, but will have to bring it to market through regulation. And in the world of backend infrastructure, building a business that will be differentiated from giant cloud providers gets harder every day.
Amplify Partners is a venture capital fund with an emphasis on technical investments. Their portfolio includes infrastructure companies like Datadog and Gremlin, as well as pharmaceutical and hardware companies.
Sunil Dhaliwal is the founder of Amplify Partners, and joins the show to discuss the thesis of Amplify. The investments that Amplify makes are in technical companies–which makes these financing decisions complex enough to require detailed, individualized research. But there are commonalities among the founding teams. Sunil lays out a useful rubric for anyone who is looking to learn about venture capital investing.