111: GreyBeards talk data analytics with Matthew Tyrer, Sr. Mgr. Solutions Mkt & Competitive Intelligence, Commvault
Dec 10, 2020
Sponsored by:
I’ve known Matthew Tyrer, Senior Manager Solutions Marketing and Competitive Intelligence, Commvault, for quite a while now and he’s always been knowledgeable about the problems the enterprise has in supporting and backing up large file data repositories. But lately he’s been focused on Commvault Activate, their data analytics solution.
We had a great talk with Matthew. He was easy to talk to and knew a lot about how data analytics can ease the operational burden of the enterprise’s growing file data environments. Remind me not to have two Matthews on the same program ever again. Listen to the podcast to learn more.
Matthew mentioned that their Activate was built on the Commvault platform software stack, which has had a rich and long history of development and customer deployments. It seems that Activate data analytics had been an early part of the platform but recently was split out as a separate solution.
One capability that Activate has that many other data analytics solutions do not is the ability to examine both online data as well as data in backups. Most analytics solutions can do one or the other; only a few do both. But if a solution only has access to online or backup data, it’s missing half the story.
In addition, Activate can operate across multiple data centers as well as across multiple public cloud environments to provide analytics for an enterprise’s file data where it may reside.
Given the proliferation of file data these days, data analytics has become a necessity for most large IT shops. In the past, an admin could track some data over time, but with the volumes of file data today this is no longer tenable. At a PB or more of file data, located in on prem data centers as well as across multiple clouds, there’s just too much file data to keep track of manually anymore.
Activate also indexes file content to provide more visibility and tracking of the different types of data under management in the enterprise. This is in addition to the extensive metadata that is collected and analyzed so it can better understand data access rights, copies and physical locations around the enterprise.
Activate can help organizations govern their data flows in support of industry as well as government data compliance requirements. Activate Data Governance, one of the three Activate solutions, is focused exclusively on providing enterprises the tools needed to manage any and all data that exists under compliance regulation environments.
Matt Leib had worked in eDiscovery before, and it had always been a pain to extract “legally relevant” data from online and backup repositories. With the Activate eDiscovery solution and Activate’s content indexing of all file data, legal can perform their own relevant-data searches to create eDiscovery data sets in support of litigation activities. Self-service legal extracts like this vastly reduce the admin time and cost needed for eDiscovery.
The Activate File Space Optimization solution was deployed in one environment that had ~20PB of data online. By using File Space Optimization, the customer was able to cut 20PB down to 10PB. Any customer could benefit from such a reduction but customers doing data migration would see even more benefit.
Matthew Tyrer, Senior Manager, Solutions Marketing and Competitive Intelligence
Having worked at Commvault for over twelve years, including 8 years as a Sales Engineer, Matt took that technical knowledge and transitioned to marketing, where he is currently serving as a Senior Manager on Commvault’s Solutions Marketing team. He is also heavily involved in Competitive Intelligence initiatives and actively participates in field enablement programs.
He brings over 20 years’ experience in the IT industry, including within the fields of data and information management, cloud, data governance, enterprise storage, disaster recovery, and ultimately both implementing and supporting those projects and endeavours for public and private sector clients across Canada and around the globe.
Matt’s passion, deep product knowledge, and broad field experiences have enabled him to translate Commvault technology and vision such that their value is easily understood in the market and amongst client and partner families.
A self-described geek-dad, Matt is an avid boardgame enthusiast, firmly believes that Han shot first, and enjoys tormenting his girls with bad dad jokes.
110: GreyBeards talk FMS2020 wrap up with Jim Handy, General Director of Objective Analysis
Dec 02, 2020
This month it’s back to storage and our annual wrap-up on the Flash Memory Summit Conference with Jim Handy, General Director of Objective Analysis. Jim’s been on our show 5 times before and is a well known expert on NAND and SSDs (as well as DRAM and memory systems). Jim also blogs at TheSSDGuy.com and TheMemoryGuy.com just in case you want to learn more.
FMS went virtual this year and had many interesting topics including how computational storage is making headway in the cloud, 3D QLC is hitting the enterprise with PLC on the way, and for a first at FMS, a talk on DNA storage (for more information on this, see our podcast with CatalogDNA). Jim’s always interesting to talk with to help us understand where the NAND-SSD industry is headed. Listen to the podcast to learn more.
Jim mentioned that the major NAND vendors are all increasing the number of layers for their 3D NAND, and it continues to scale well. Most vendors are currently shipping ~100 layer NAND, with Micron doing more than that. And vendor roadmaps are looking at the possibility of 200 layers or more. Jim doesn’t think anyone knows how high it can go.
Another advantage of 3D NAND is that it can be used to make bigger bit cells and thus provide better endurance. From Jim’s perspective, more electrons per cell means a better, more resilient bit cell.
Many vendors in the nascent persistent memory industry had been hoping that NAND would stop scaling at some point and they would be able to pick up the slack. But NAND manufacturers found 3D, and scaling hasn’t stopped at all. This has relegated most persistent memory vendors to a small niche market, with the exception of Intel (and Micron).
Jim said that Intel is losing money on Optane every year, ~$5B so far. But Intel knows that chip profitability is tied to economies of scale, volumes matter. With enough volume, Optane will become cheap enough to manufacture that they will make buckets of money from it.
Interestingly, Jim said that DRAM scaling is slowing down. That means there may be an even bigger market for something close to DRAM access speeds, but with increased density and lower cost. Optane seems to fit that description very well.
Jim also mentioned that computational storage is starting to see some traction with public cloud vendors. Computational storage adds generic compute power inside an SSD, which can be used to perform storage intensive functions out at the SSD rather than transferring data into the CPU for processing. This makes sense where a lot of data would need to be transferred back and forth to an SSD and where computational cycles are just as cheap out on the SSD as in the server. For example, for data compression, search, and video transcoding, computational storage can make a lot of sense. (See our podcast with NGD Systems for more information).
In contrast, Open-Channel SSDs are dumb SSDs, that is, SSDs without any flash translation layer or other smarts needed to make NAND work as persistent storage in the enterprise. There’s a small group of system providers that want to perform all this functionality at a global scale (across multiple SSDs) rather than at the local, SSD drive level.
Another topic that hit its stride this year at FMS2020 was Zoned Namespaces (ZNS). ZNS partitions an SSD into separately addressable segments, to allow higher performing sequential (write) access within those zones. As SSD capacity has increased, IO activity has skyrocketed and this has led to an “IO blender” effect. Within an IO blender, it’s impossible to tell which IO is following a sequential pattern and which is not. ZNS is intended to solve that problem.
With ZNS SSDs, IOs doing sequential access can have their own partition and that way the SSD can understand its sequential IO and act accordingly. It turns out that sequential writes to NAND can perform much, much faster than random writes.
ZNS was invented for SMR (shingled magnetic recording) disks, because these overwrite portions of adjacent tracks (like roof shingles, tracks on SMR disks overlap). We had heard about ZNS at FMS2019 but had thought it just a better way to share access to a single SSD, by carving it up into logical (mini-)volumes. Jim said that is also a benefit, but the major advantage is being able to recognize sequential IO and write to the SSD more effectively.
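To make the zone idea concrete, here’s a minimal sketch in Python of the write-pointer semantics a zoned device exposes. It’s an illustrative model only, not a real driver or the NVMe ZNS command set.

```python
# Illustrative model of ZNS write-pointer semantics -- not a real driver.
# Each zone only accepts writes at its current write pointer (sequential),
# which lets the SSD lay data down without a random-write penalty.

class Zone:
    def __init__(self, start_lba: int, size: int):
        self.start_lba = start_lba
        self.size = size
        self.write_pointer = start_lba  # next LBA that may be written

    def append(self, nblocks: int) -> int:
        """Sequential-only write: data always lands at the write pointer."""
        if self.write_pointer + nblocks > self.start_lba + self.size:
            raise IOError("zone full -- must reset before reuse")
        lba = self.write_pointer
        self.write_pointer += nblocks
        return lba  # the device tells the host where the data landed

    def reset(self):
        """Whole-zone reset is the only way to reclaim space."""
        self.write_pointer = self.start_lba

# Two workloads get their own zones, so each stream stays sequential even
# though, interleaved on one namespace, they'd look like an IO blender.
log_zone, db_zone = Zone(0, 4096), Zone(4096, 4096)
print(log_zone.append(8), db_zone.append(16), log_zone.append(8))
```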
We talked some on the economics of NAND flash, disk and tape as storage media. Jim and I see this continuing a trend that’s been going on for years, where NAND storage costs ~10X more in $/GB than disk, and disk storage costs ~10X more in $/GB than tape. All three technologies continue their relentless pursuit of increasing capacity, but it’s almost like train tracks, all three $/GB curves following one another into the future.
On the other hand, high RPM disk seems to have died, replaced with SSDs. Disk manufacturers have seen unit declines but the number of GB they are shipping continues to increase. Contrary to a number of AFA system providers, disk is not dead and is unlikely to die anytime soon.
Finally, we discussed DNA storage and its coming entry into the storage market. It’s all a question of the price of the drive and media technology, the size of the mechanism (drive?) and read and write access times. At the moment all of these are coming down but are not yet competitive with tape. But given DNA technology trends, there doesn’t appear to be any physical barrier that’s going to stop it from becoming yet another storage technology in the enterprise, most likely at a 10X $/GB cost advantage over tape…
Jim Handy, General Director, Objective Analysis
Jim Handy of Objective Analysis has over 35 years in the electronics industry including 20 years as a leading semiconductor and SSD industry analyst. Early in his career he held marketing and design positions at leading semiconductor suppliers including Intel, National Semiconductor, and Infineon.
A frequent presenter at trade shows, Mr. Handy is known for his technical depth, accurate forecasts, widespread industry presence and volume of publication.
He has written hundreds of market reports, articles for trade journals, and white papers, and is frequently interviewed and quoted in the electronics trade press and other media.
109: GreyBeards talk SmartNICs & DPUs with Kevin Deierling, Head of Marketing at NVIDIA Networking
Nov 18, 2020
We decided to take a short break (of sorts) from storage to talk about something equally important to the enterprise, networking. At (virtual) VMworld a month or so ago, Pat made mention of developing support for SmartNIC-DPUs and even porting vSphere to run on top of a DPU. So we thought it best to go to the source of this technology and talk with Kevin Deierling (TechSeerKD), Head of Marketing at NVIDIA Networking who are the ones supplying these SmartNICs to VMware and others in the industry.
Kevin is always a pleasure to talk with and comes with a wealth of expertise and understanding of the technology underlying data centers today. The GreyBeards found our discussion to be very educational on what a SmartNIC or DPU can do and why VMware and others would be driving to rapidly adopt the technology. Listen to the podcast to learn more.
NVIDIA’s recent acquisition of Mellanox brought them Mellanox’s NIC, switch and router technology. And while Mellanox, and now NVIDIA have some pretty impressive switches and routers, what interested the GreyBeards was their SmartNIC technology.
Essentially, SmartNICs provide acceleration and offload of the data handling needs required to move data around an enterprise network. These offload services include, at a minimum, encryption/decryption, packet pacing (delivering a gadzillion video streams at the right speed to ensure proper playback by all), compression, firewalls, NVMeoF/RoCE, TCP/IP, GPUDirect Storage (GDS) transfers, VLAN micro-segmentation, scaling, and anything else that requires real time processing to perform at line speeds.
For those who haven’t heard of it, GDS transfers data from storage directly into GPU memory and from GPU memory directly to storage without any CPU cycles or server memory involvement, other than to set up the transfer. This extends NVMeoF RDMA tech, which moves data to/from storage and server memory, out to GPUs. That is, GDS offers an RDMA-like path between storage and GPU memory. A GPU to/from server memory direct interface already exists over the PCIe bus.
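For a feel of what a GDS transfer looks like from application code, here’s a minimal sketch assuming NVIDIA’s kvikio Python bindings for cuFile and an installed CUDA/CuPy stack; the file path is hypothetical.

```python
# A minimal GDS sketch, assuming NVIDIA's kvikio bindings for cuFile and a
# working CUDA/CuPy install; /data/training.bin is a hypothetical file.
import cupy
import kvikio

gpu_buf = cupy.empty(1 << 20, dtype=cupy.uint8)  # destination buffer in GPU memory

# cuFile/GDS moves bytes from NVMe straight into GPU memory; the CPU only
# sets up the transfer -- no bounce buffer in server DRAM.
with kvikio.CuFile("/data/training.bin", "r") as f:
    nbytes = f.read(gpu_buf)

print(f"read {nbytes} bytes directly into GPU memory")
```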
But even with all the offloads and accelerators above, SmartNICs can also offer an additional secure enclave outside the TPM in the CPU, to better isolate security sensitive functionality for a data center. (See DPU below).
Kevin mentioned multiple times that the new unit of computation is no longer a server but rather is now a data center. When you have public cloud, private cloud and other systems that all serve up virtual CPUs, NICs, GPUs and storage, what’s really being supplied to a user is a virtual data center. Cloud providers can carve up their hardware and serve it to you any way you want or need it. Virtual data centers can provide a multitude of VMs and any infrastructure that customers need to use to run their workloads.
Kevin mentioned that by using SmartNICs, IT or cloud providers can return 30% of the processor cycles (that were being spent doing networking work on CPUs) back to workloads that run on CPUs. Any data center can effectively obtain 30% more CPU cycles and increased networking speed and performance just by deploying SmartNICs throughout all the servers in their environment.
SmartNICs are an outgrowth of Mellanox technology embedded in their HPC InfiniBand and high end Ethernet switches/routers. Mellanox had been well known for their support of NVMeoF/RoCE to supply high IOPS/low-latency IO activity for NVMe storage over Ethernet and, before that, their InfiniBand RDMA technologies.
As Mellanox came out with their 2nd Gen SmartNIC they began to call their solution a “DPU” (data processing unit), which they see forming part of a “holy trinity” underpinning the new data center which has CPUs, GPUs and now DPUs. But a DPU is more than just a SmartNIC.
All NVIDIA SmartNICs and DPUs are based on Mellanox’s BlueField cards and chip technology. Their DPU uses BlueField2 (gen 2 technology) chips, which have a multi-core ARM engine and memory inside that can be used to perform computational processing in addition to the onboard offload/acceleration capabilities.
Besides adding VMware support for SmartNICs, PatG also mentioned that they were porting vSphere (ESX) to run on top of NVIDIA Networking DPUs. This would move core VMware hypervisor functionality from running on CPUs to running on DPUs. This of course would free up most if not all VMware hypervisor CPU cycles for use by customer workloads.
During our discussion with Kevin, we talked a lot about the coming of AI-ML-DL workloads, which will require ever more bandwidth, ever lower latencies and ever more compute power. NVIDIA was a significant early enabler of AI-ML-DL with their CUDA API, which allowed a GPU to be used to perform DL network training and inferencing. As such, CUDA became an industry wide phenomenon, allowing GPUs everywhere to be used as DL compute engines.
NVIDIA plans to do the same with their SmartNICs and DPUs. NVIDIA Networking is releasing the DOCA (Data center On a Chip Architecture) SDK and API. DOCA provides the API to use the BlueField2 chips and cards which are the central technology behind their DPU. They have also announced a roadmap to continue enhancing DOCA, as they have done with CUDA, over the foreseeable future, to add more bandwidth, speed and functionality to DPUs.
It turns out the real problem which forced Mellanox and now NVIDIA to create SmartNics was the need to support the extremely low latencies required for NVMeoF and GDS IO.
It wasn’t clear that the public cloud providers were using SmartNICs, but Kevin said it’s been sort of a widely known secret that they have been using the tech. The public clouds (AWS, Azure, Alibaba) have been deploying SmartNICs in their environments for some time now. Always on the lookout for any technology that frees up compute resources to be deployed for cloud users, public cloud providers appear to have been early adopters of SmartNICs.
Kevin Deierling, Head of Marketing NVIDIA Networking
Kevin is an entrepreneur, innovator, and technology executive with a proven track record of creating profitable businesses in highly competitive markets. Kevin has been a founder or senior executive at five startups that have achieved positive outcomes (3 IPOs, 2 acquisitions). Combining both technical and business expertise, he has variously served as the chief officer of technology, architecture, and marketing of these companies where he led the development of strategy and products across a broad range of disciplines including: networking, security, cloud, Big Data, machine learning, virtualization, storage, smart energy, bio-sensors, and DNA sequencing.
Kevin has over 25 patents in the fields of networking, wireless, security, error correction, video compression, smart energy, bio-electronics, and DNA sequencing technologies.
When not driving new technology, he finds time for fly-fishing, cycling, beekeeping, & organic farming.
108: GreyBeards talk DNA storage with David Turek, CTO, Catalog DNA
Oct 17, 2020
The Greybeards get off the beaten (enterprise) path this month, to see what lies ahead with a discussion on DNA storage. David Turek, CTO, Catalog DNA (@CatalogDNA) is a long time IBMer who had been focused on HPC systems at IBM but left and went to Catalog DNA to pursue the commercialization of DNA storage, an “emerging” technology. CatalogDNA is a company out of Boston that had recently closed a round of funding and is focused on bringing DNA storage out into the world of IT.
David was a pleasure to talk with and has lots of knowledge on HPC and enterprise data center solutions. He also has a good grasp of what it will take to bring DNA storage to market. Keith has had some prior experience with DNA technologies in BioPharma so could talk in more detail about the technology and its ecosystem. [We’re trying out a new format, let us know what you think; The Eds.]
Ray has written about DNA storage in his RayOnStorage Blog, most recently in April of this year and May of last year. It’s been an ongoing blog topic of his for almost a decade now. When Ray was interviewed about the technology, he thought it interesting but saw serious obstacles with read and write latencies and throughput, as well as the size of the storage device.
Well, CatalogDNA seems to have gotten a good handle on write throughput and is seriously working on the rest.
However, DNA storage’s volumetric density has always been exceptional. Early on in the podcast, David mentioned that DNA storage is 6 orders of magnitude (1 million times) more dense in bytes/mm**3 than magnetic tape today. An LTO8 tape device stores 12TB (uncompressed) in a tape cartridge of 14.2 in**3 (230.3 cm**3), or roughly 845GB/in**3 (52GB/cm**3). One million times this would be 12EB in the same volume.
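Here’s the arithmetic behind those figures, using Python as a calculator (LTO8 cartridge numbers from the paragraph above).

```python
# Back-of-envelope check of the density figures quoted above.
lto8_gb = 12_000                    # 12TB uncompressed per LTO8 cartridge, in GB
cartridge_in3 = 14.2                # cartridge volume, cubic inches
cartridge_cm3 = 230.3               # same volume, cubic centimeters

print(round(lto8_gb / cartridge_in3))   # ~845 GB/in**3
print(round(lto8_gb / cartridge_cm3))   # ~52 GB/cm**3

# 6 orders of magnitude denser means a million times the capacity in the
# same volume: 12TB * 1,000,000 = 12EB.
print(lto8_gb * 1_000_000 / 1e9, "EB")  # 12.0 EB
```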
The challenge with LTO8, disk or SSD storage today is at some point you have to move the data from one device to a more modern device. This could be every 3-5 years (for disk or SSD) or 25-30 years for tape. In either case, at some point IT would need to incur the cost and time to move the data. Not much of a problem for 100TB or so but when you start talking PB or EB of data, it can be a never ending task.
DNA storage
David mentioned Catalog uses “synthetic DNA” in their storage. This means the DNA it uses is designed to be incompatible with natural DNA such that it wouldn’t work in a cell. It has stops or other biological mechanisms to inhibit its use in nature. Yes, it uses the same sugars, backbones, and other chemistry of biologically active DNA, but it has been specifically modified to inhibit its use by normal cellular machinery.
DNA storage has a number of unique capabilities:
It can be made to last forever, by being dried out (desiccated) and encased in a crystal, and takes 0 power/energy to be stored for eons.
It can be cheaply and easily replicated, almost an infinite number of times, for only the cost of chemical feedstock, chemical interactions and energy. Yes, this may take time but the process scales up nicely. One could make 2 copies in the first cycle, 4 in the 2nd, 8 in the 3rd, etc., and by doing this it would only take 20 cycles to create a million copies. If each cycle takes 10 minutes, in 3:20 you could have a million copies of 1EB of data (see the sketch after this list).
It can be easily searched for target information. This involves fabricating a DNA search molecule and inserting it into the storage solution. Once there it would match up with the DNA segment that held your key. And of course, the search molecule and the data could be replicated to speed up any search process.
We already mentioned the extreme density advantage above.
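The doubling arithmetic in the replication point above works out as follows; a quick Python check:

```python
# Each replication cycle doubles the copy count, so the cycles needed to
# reach a target number of copies grows with log2 of that target.
import math

target_copies = 1_000_000
cycles = math.ceil(math.log2(target_copies))   # 20 cycles (2**20 > 1,000,000)
minutes = cycles * 10                          # at 10 minutes per cycle

print(cycles, "cycles ->", f"{minutes // 60}:{minutes % 60:02d}")  # 20 cycles -> 3:20
```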
Speed of DNA storage access
David said they can already write Catalog DNA storage in MB/sec.
The process they use to write is like a conveyor belt, which starts off with a polyethylene sheet (web actually). Somewhere, the digital data comes in, is chunked and transformed into DNA strand (25-50 base pair) molecules or dots. The polyethylene sheet rolls into a machine that uses multiple 3D print heads to deposit dots (the DNA strand data chunks) at web points. This machine/process deposits 100K or more of these dots onto the web. The sheet then moves to the next stage where the DNA molecules are scraped off and drained into a solution. Then a wet process occurs which uses chemistry to make the DNA more readable and enables the separate DNA molecules to connect into a data strand. Then this data strand goes into another process where it gets reduced in volume so that it is more stable.
If needed, one can add another step that dries out or desiccates the data strand into even a smaller volume which can then be embedded into a crystalline structure which could last for centuries.
David compared the DNA molecules (data chunks) to Legos, only they are the same pieces in a million different colors. Each piece represents some segment of data bits/bytes. Using chemistry and proprietary IP, each separate DNA molecule self organizes (connects) into a data strand, representing the information you want to store.
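To illustrate the idea of data chunks mapping to DNA pieces, here’s a toy 2-bits-per-base encoder/decoder in Python. This is the textbook mapping, not Catalog’s proprietary encoding scheme.

```python
# A toy 2-bits-per-base encoding -- NOT Catalog's scheme, just the textbook
# mapping showing how bytes become base sequences and back.
BASE_FOR = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR = {base: bits for bits, base in BASE_FOR.items()}

def encode(data: bytes) -> str:
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):            # 4 bases per byte, MSB first
            bases.append(BASE_FOR[(byte >> shift) & 0b11])
    return "".join(bases)

def decode(strand: str) -> bytes:
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BITS_FOR[base]
        out.append(byte)
    return bytes(out)

chunk = encode(b"hi")          # 2 bytes -> 8 bases
assert decode(chunk) == b"hi"  # round-trips losslessly
print(chunk)                   # CGGACGGC
```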
Reading DNA involves off-the-shelf DNA sequencers. The one Catalog currently uses is the Oxford Nanopore device, but there are others. David didn’t say how fast they could read DNA data. But current DNA reading devices destroy the data, so making replicas of the data would be required to read it.
David said their current write device is L shaped with one leg about 14’ (4.3m) long and the other about 12’ (3.7m) long with each leg being about 3’ (0.9m) wide.
Searching EB of data in minutes?!
DNA strands can be searched (matched) using a search molecule and inserting this into the storage solution (that holds the data strands). Such a molecule will find a place in the data that has a matching (DNA) data element and I believe attach itself to the data strand.
For example, let’s say you had recorded all of a country’s emails for a month or so and you wanted to search them for the words “bomb”, “terrorist”, “kill”, etc. One could create a set of search molecules, replicate them any number of times (depending on how quickly you wanted to search the data and how many matches you expected), and insert them into a data pool with multiple data strands that stored the email traffic.
After some time, you’d come back and your search would be done. You’d need to then extract the search hits, and read out the portion of the data strands (emails) that matched. I’m guessing extraction would involve some sort of (wet) chemical process or filtration.
State of Catalog DNA storage
David mentioned that as a publicity stunt they wrote the whole Wikipedia onto Catalog DNA storage. The whole Wikipedia fit into a cylinder about the height of a big knuckle on your hand and in a width smaller than a finger. The size of the whole Wikipedia, with complete edit history is 10TB uncompressed and if they stored all the edit versions plus its media such as images, videos, audio and other graphics, that would add another 23TB (as of end of 2014), so ~33TB uncompressed.
David believes in 18 months they could have a WORM (write once, read many times) data storage solution that could be deployed in customer data centers which would supply immense data repositories in relatively small solution containers.
CatalogDNA is currently in a number of PoCs with major corporations (not labs or universities) to show how DNA storage technology can be used to solve problems.
David believes that at some point they will be able to make compute engines entirely of DNA. At that point, one could have a combined compute and storage (HCI-like) DNA server using the same technology in a solution. And as mentioned previously, one could replicate from one DNA server & storage to a million DNA servers & storage in just 20 cycles. How’s that for scale out?
David Turek, CTO Catalog DNA
Dave Turek is Catalog’s Chief Technology Officer. He comes to Catalog from IBM where he held numerous executive positions in High Performance Computing and emerging technologies.
He was the development executive for the IBM SP program which produced the first commercially successful massively parallel system; he started IBM’s Linux Cluster business; launched an early offering in Cloud computing called Deep Computing Capacity on Demand; produced the Roadrunner system, the world’s first petascale computer; and was responsible for IBM’s exascale strategy which led to the deployment of the Summit and Sierra systems at Oak Ridge and Lawrence Livermore National Laboratories respectively.
David has been invited to testify to Congress on numerous occasions regarding the future of computing in the US and has helped establish technical collaborations with universities, businesses, and government agencies around the world.
107: GreyBeards talk MinIO’s support of VMware’s new Data Persistence Platform with AB Periasamy, CEO MinIO
Sep 25, 2020
The podcast runs ~26 minutes. AB is very technically astute and always a delight to talk with. He’s extremely knowledgeable about the cloud, containerized applications and high performing S3 compatible object storage. And now with MinIO and vSAN Data Persistence under VCF Tanzu, very knowledgeable about the virtualized IT environment as well. Listen to the podcast to learn more. [We’re trying out a new format placing the podcast up front. Let us know what you think; The Eds.]
VMware VCF vSAN Data Persistence Platform with MinIO
Earlier this month VMware announced a new capability available with the next updates of vSAN, vSphere & VCF called the vSAN Data Persistence Platform. The Data Persistence Platform is a VMware framework designed to integrate stateful, independent vendor software defined storage services in vSphere. By doing so, VCF can provide API access to persistent storage services for containerized applications running under Tanzu Kubernetes (k8s) Grid service clusters.
At the announcement, VMware identified three object storage and one (Cassandra) database technical partners that had been integrated with the solution. MinIO was an open source, object storage partner.
VMware’s VCF vSAN Data Persistence framework allows vCenter administrators to use vSphere cluster infrastructure to configure and deploy these new stateful storage services, like MinIO, into namespaces and enables app developers direct k8s API access to these storage namespaces to provide persistent, stateful object storage for applications.
With VCF Tanzu and the vSAN Data Persistence Platform using MinIO, dev teams can have full support for their CI/CD pipeline, using native k8s tools to deploy and scale containerized apps on prem, in the public cloud and in hybrid cloud, all using VCF vSphere.
MinIO on the Data Persistence Platform
AB said MinIO with Data Persistence takes advantage of a new capability called vSAN Direct, which gives vSAN almost JBOF-like IO control and performance. With MinIO on vSAN Direct, storage and k8s cluster applications can co-reside on the same ESX node hardware so that IO activity doesn’t have to hop off host to be performed. In addition, admins can now populate ESX server nodes with lots (100s to 1000s?) of storage devices and be assured the storage will be used by applications running on that host.
As a result, MinIO’s object storage IO performance on VCF Tanzu is very good due to its use of vSAN Direct and MinIO’s inherent superior IO performance for S3 compatible object storage.
With MinIO on the VCF vSAN Data Persistence Platform, VMware takes over all the work of deploying MinIO software services on the VCF cluster. This way customers can take advantage of MinIO’s fully compatible S3 object storage system operating in their VCF cluster. App developers get the best of all worlds: infrastructure configured, deployed and managed by admins, but completely controllable, scalable and accessible through k8s API services.
If developers want to take advantage of MinIO specialized services such as data security or replication, they can do so directly using MinIO’s APIs, just like they would when operating bare metal or in the cloud.
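As a sketch of what that looks like in practice, here’s a minimal example using MinIO’s own Python SDK (pip install minio); the endpoint and credentials below are placeholders for a real deployment.

```python
# A minimal sketch using MinIO's Python SDK; endpoint and credentials are
# hypothetical placeholders for your own deployment.
from minio import Minio

client = Minio("minio.example.internal:9000",
               access_key="ACCESSKEY",
               secret_key="SECRETKEY",
               secure=True)

if not client.bucket_exists("training-data"):
    client.make_bucket("training-data")

# The same S3-compatible calls work whether MinIO runs bare metal, in the
# cloud, or under VCF Tanzu via the Data Persistence Platform.
client.fput_object("training-data", "model/weights.bin", "/tmp/weights.bin")
```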
AB said the VMware development team was very responsive during development of Data Persistence. AB was surprised to see such a big company, like VMware, operate with almost startup-like responsiveness. Keith mentioned he’s seen this in action, as vSAN has matured very rapidly to a point of almost feature parity with just about any storage system out there today.
With MinIO object storage, container applications that need PB of data now have a home on VCF Tanzu. And it’s as easily usable as any public cloud storage. And with VCF Tanzu configuring and deploying the storage over its own infrastructure, and then having it all managed and administered by vCenter admins, it’s simple to create and use PB of object storage.
MinIO is already the most popular S3 compatible object storage provider for applications running in the cloud and on prem. And VMware is easily the most popular virtualization platform on the planet. Now with the two together on VCF Tanzu, there seems to be nothing in the way of conquering containerized applications running in IT as well.
With that, MinIO is available everywhere containers want to run: natively in the cloud, on prem and in hybrid cloud, or running with VCF Tanzu as well.
AB Periasamy, CEO MinIO
AB Periasamy is the CEO and co-founder of MinIO. One of the leading thinkers and technologists in the open source software movement,
AB was a co-founder and CTO of GlusterFS which was acquired by RedHat in 2011. Following the acquisition, he served in the office of the CTO at RedHat prior to founding MinIO in late 2015.
AB is an active angel investor and serves on the board of H2O.ai and the Free Software Foundation of India.
He earned his BE in Computer Science and Engineering from Annamalai University.
106: Greybeards talk Intel’s new HPC file system with Kelsey Prantis, Senior Software Eng. Manager, Intel
Sep 17, 2020
We had talked with Intel at Storage Field Day 20 (SFD20), about a month ago. At the virtual event, Intel’s focus was on their Optane PMEM (persistent memory) technology. Kelsey Prantis (@kelseyprantis), Senior Software Engineering Manager, Intel, was on the show and gave an introduction to Intel’s DAOS (Distributed Asynchronous Object Storage, DAOS.io), a new HPC (high performance computing, super computers) file system they developed from scratch to use leading edge Intel technologies, Optane PMEM being one of them.
Kelsey has worked on Lustre and other HPC file systems for a long time now and came into the company with the acquisition of Whamcloud. Currently, she manages the development team working on DAOS. DAOS is a new HPC object storage file system which is completely open source (available on GitHub).
DAOS was designed from the start to take advantage of NVMe SSDs and Optane PMEM. With PMEM, current servers can support up to 20TB of memory. Besides the large memory sizes, Optane PMEM also offers non-volatile memory and byte addressability (just like DRAM). These two characteristics open up new functionality that allows DAOS to move beyond the legacy, block oriented storage architectures that have been the only storage solution for HPC (and the enterprise) for decades now.
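A minimal sketch of what byte addressability buys you from user space: mmap a file on a DAX-mounted PMEM filesystem (the path below is hypothetical) and update metadata at byte granularity, with no block IO or read-modify-write cycle.

```python
# Byte-addressable persistent memory from user space -- illustrative only.
# /mnt/pmem0 is assumed to be a DAX-mounted PMEM filesystem.
import mmap
import os

fd = os.open("/mnt/pmem0/metadata", os.O_CREAT | os.O_RDWR, 0o600)
os.ftruncate(fd, 4096)
buf = mmap.mmap(fd, 4096)

buf[128:136] = b"inode#42"   # an 8-byte metadata update, in place
buf.flush()                  # persist the mapping (msync under the hood)

buf.close()
os.close(fd)
```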
What’s different about DAOS
DAOS uses PMEM for all metadata and for storing small files. HPC IO has always been heavy bandwidth (IO using large blocks) oriented, but lately newer applications have emerged, such as AI/ML/DL, data analytics and others, that use smaller files/blocks. Indeed, most new HPC clusters and supercomputers are deploying almost as many GPUs as CPUs in their configurations to support AI activities.
The problem is that these newer applications typically consume much smaller files. Matt mentioned one HPC client he worked with was processing small batches of seismic data to predict, in real time, earthquakes happening around the world.
By using PMEM for metadata and small files, DAOS can be much more responsive to file requests (open, close, delete, status) as well as provide higher performing IO for small files. All this leads to a much better performing system for the new HPC workloads as well as great sustainable performance for the more traditional large file workloads.
DAOS storage
DAOS provides a cluster storage system that can be configured with from 1 node (no data protection), though more normally a minimum of 3 nodes (with data protection), up to 512 nodes (lab tested). Data protection in DAOS is currently based on mirroring data and can use from 0 to the number of nodes in a cluster as data mirrors.
DAOS system nodes are homogeneous. That is, they all come with the same amount of PMEM and NVMe SSDs. Note, DAOS doesn’t support disk drives. Kelsey mentioned DAOS node hardware can be tailored to suit any particular application environment. But they typically require an average of 6% of overall DAOS system capacity in PMEM for metadata and small file activity.
DAOS currently supports their own API, POSIX, HDF5, MPI-IO and Apache Spark storage protocols. Kelsey mentioned that standard POSIX uses a pessimistic conflict resolution mode, which leads to performance bottlenecks during parallel access. In contrast, DAOS’s version of POSIX uses optimistic conflict resolution, which means DAOS starts writes assuming there’s no conflict, but if one occurs it handles the conflict in real time. Of course, with all the metadata byte addressable and in PMEM, this doesn’t take up a lot of (IO) time.
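To see the difference, here’s a generic optimistic concurrency sketch in Python: a pessimistic scheme would take a lock before writing, while the optimistic one does the work first, validates at commit, and retries on the rare conflict. This illustrates the pattern only, it is not DAOS’s implementation.

```python
# Optimistic conflict resolution, generically: do the work assuming no
# conflict, validate at commit time, retry only if someone raced you.
import itertools

class Record:
    def __init__(self, value: bytes):
        self.version, self.value = 0, value

def optimistic_write(rec: Record, update) -> int:
    for attempt in itertools.count(1):
        seen = rec.version              # snapshot the version, no lock held
        new_value = update(rec.value)   # perform the write's work
        if rec.version == seen:         # commit check: did anyone conflict?
            rec.value, rec.version = new_value, seen + 1
            return attempt              # common case: first attempt wins
        # conflict detected: another writer committed; redo against new state

rec = Record(b"")
print(optimistic_write(rec, lambda v: v + b"block"))   # -> 1
```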
As mentioned earlier, DAOS data protection uses mirror-replicas. However, unlike most other major file systems, DAOS mirroring can be done at the object level. DAOS internally is an object store. Data organization in DAOS starts at the pool level; underneath that are data containers, and under those are objects. Any object in DAOS can have its own mirroring configuration. DAOS is working towards supporting erasure coding as another form of data protection in a future release.
DAOS performance
There’s a new storage benchmark that was developed specifically for HPC, called the IO500. The IO500 benchmark simulates a number of different HPC workloads, measures performance for each of them, and computes an (aggregate) performance score to rank HPC storage systems.
IO500 ranks system performance using two lists: one is for any sized configuration, typically ranging from 50 to 1000s of nodes, and the other list limits the configuration to 10 nodes. The first performance ranking can sometimes be gamed by throwing more hardware into a cluster. The 10 node rankings are much harder to game this way and, from our perspective, show a fairer comparison of system performance.
As presented (virtually) at ISC 2020, DAOS took the top spot on the IO500 any-size configuration list and performed more than 2X better than the next best solution. And on the IO500 10 node list, Intel’s DAOS configuration, the Texas Advanced Computing Center (TACC) DAOS configuration, and the Argonne National Labs DAOS configuration took the top 3 spots and had 3X better performance than the next best, non-DAOS storage system.
Argonne National Labs has already stated that they will be using DAOS in their new HPC system to be deployed in the near future. Early specifications for storage at the new Argonne lab required support for 230PB of data and 25TB/sec of bandwidth.
The podcast ran ~43 minutes. Kelsey was great to talk with and very knowledgeable about HPC systems and HPC IO in particular. Matt has worked at Argonne in the past so understood these systems better than I. Sadly, we lost Matt’s end of the conversation about 1/2 way into the recording. Both Matt and I thought that DAOS represents the birth of a new generation of HPC storage. Listen to the podcast to learn more.
Kelsey Prantis heads the Extreme Storage Architecture and Development division at Intel Corporation. She leads the development of Distributed Asynchronous Object Storage (DAOS), an open-source, low-latency and high IOPS object store designed from the ground up for massively distributed Non-Volatile Memory (NVM).
She joined Intel in 2012 with the acquisition of Whamcloud, where she led the development of the Intel Manager for Lustre* product.
Prior to Whamcloud, she was a software developer at personal genomics and biotechnology company 23andMe.
Prantis holds a Bachelor’s degree in Computer Science from Rochester Institute of Technology.
105: Greybeards talk new datacenter architecture with Pradeep Sindhu, CEO & Co-founder, Fungible
Aug 18, 2020
Neither Ray nor Keith had met Pradeep before, but Ray was very interested in Fungible’s technology. Turns out Pradeep Sindhu, CEO and Co-founder, Fungible, has had a long and varied career in the industry, starting at Xerox PARC, then co-founding and becoming chief scientist at Juniper, and now rearchitecting the data center with Fungible. Pradeep mentioned this at the end of the podcast: he has always been drawn to hard problems with the potential to open up immense possibilities. What he did at Juniper and what he is planning to accomplish with Fungible both fit that pattern.
Today, in a typical data center, we have servers, networking and storage equipment all connected through a fabric. But from Pradeep’s perspective, none of it works well in support of data centric computing. What we have today operates like turning a screw with pliers. But if there existed hardware that could execute data centric computing well (to follow the metaphor, a screwdriver), the data center would operate much more efficiently, with more performance and better resource use.
Fungible was founded in 2015 with the idea that the industry is moving to a data centric computing paradigm and today’s data center is ill equipped to take IT there.
What is data centric computing
The IT industry has been moving to a new type of computing that is focused on short bursts of CPU activity with relatively small packets of data coming off the network (from sensors/the outside world, from storage, from other servers, etc.). Those workloads are often transient and short lived, are intended to be performed quickly, and may not leave any persistent state.
We can see this in the emergence of micro-services architectures with Docker and k8s containers. But you don’t have to be using containers. It’s also present in machine learning, where the update cycle of the neural network (with accelerators) takes lots of small bursts of computation while it consumes lots of small data items (pictures, text documents, ticker/status logs, etc.).
Furthermore, the move to commodity hardware has taken the same x86/ARM core CPUs and used them to execute these small bursts of computation. And for some of these operations that may still make sense. But when the data center uses these same cores to perform data path packet processing, it bogs down the network. It consumes a lot of power, adds overhead (higher latencies), leads to packet loss, injects network jitter and causes a host of other problems.
So, in order to get the data packets to where they need to be without those problems, networking endpoints need to be changed out to something designed to support data path critical workloads. Pradeep calls these data path critical work items “run to complete” code.
The critical question is what proportion of IT workloads are “data centric” vs. not. While it might not be that high today, Pradeep and Fungible are betting that it’s going to get much higher over time. If we look at hyper-scalers today, they are at the forefront of this computing paradigm change and much of their workloads are moving to containerized execution.
The DPU enables data centric computing
Fungible plans to add a DPU that supports a power efficient, “run-to-complete” programming engine to the data center. By using DPUs, they can create a true fabric (using IPoE) that’s low latency, low jitter, lossless and provides full cross-sectional bandwidth.
The problem, as Pradeep sees it, is that x86 and ARM cores are just not made to execute run-to-complete workloads well, and this is required to provide a true fabric. Whereas Fungible has designed the DPU from the start to execute run-to-complete work.
Pradeep sees the data center of tomorrow utilizing JBoF(lash) & JBoD(isk) boxes with DPU(s) in front of them providing storage server services (block, file and object), JBoGP(Us) or JBoFP(GAs) boxes with DPU(s) in front of them providing accelerator/graphics server services, and compute boxes with DPU(s) and x86/ARM cores with DRAM-Optane PMEM in them providing CPU server and client services. All the DPUs together in a cluster would in total provide true fabric services.
Essentially, the DPUs would take over all data path operations and the storage, GPUs and CPUs would handle everything else. In effect, this segregates data path and control path services in the data center.
Greenfield, brownfield or both
Keith and I both assumed this would be great for green field deployments. But Pradeep said it’s designed to be incrementally added to servers, JBoFs, JBoDs, JBoGs/JBoFPs and to start providing data path services within current data center fabric environments, even as the rest of the data center remains unchanged.
At some point we talked about the programming model of the DPU. The DPU offers a bring-your-own Linux OS that can be programmed in any language you choose. But the critical, data-path functionality is coded in “C” to run as fast and as efficiently as possible.
Fungible has designed this hardware themselves. We didn’t get to talk about how they plan to market their product to the data center.
Pradeep also said to stay tuned; they were just about to announce their first product offering based on the DPU.
The podcast ran ~38 minutes. Pradeep, given his education and experience, is very knowledgeable about the data center environment today. He’s certainly one of the most interesting IT technologists we have talked with in a while on the GreyBeards podcast. To say what Fungible is trying to do is aggressive and bold is an understatement. But Pradeep feels this is the only way forward to liberate the data center from its data path chains today. Both Keith and I thought we needed at least another hour or so to truly understand what they are doing and where they are going with it. Listen to the podcast to learn more.
Pradeep Sindhu, CEO and Co-Founder, Fungible
Pradeep Sindhu is CEO and Co-Founder of Fungible, a Santa Clara-based startup providing at-scale, next-generation solutions for the data center, cloud and IT industries. He has been at the forefront of the network and processing industry for over three decades.
As the co-founder and CTO of Juniper Networks, he played a central role in the architecture, design and development of Juniper’s M40 router – the M series was the first of its kind, offering the industry true decoupling of the control plane and the forwarding plane.
Prior to Juniper, he was a Principal Scientist and Distinguished Engineer at the Computer Science Lab at Xerox’s Palo Alto Research Center (PARC) pushing the envelope on what silicon could do for networking and processing.
He is passionate about new ways to support our growing data-centric world with the right combination of hardware and software to build the infrastructure our future needs.
104: GreyBeards talk new cloud defined (shared) storage with Siamak Nazari, CEO Nebulon
Jul 07, 2020
Ray has known Siamak Nazari (@NebulonInc), CEO Nebulon, across three companies now but has rarely had a one (two) on one discussion with him. With Nebulon just emerging from stealth (a gutsy move during the pandemic), the GreyBeards felt it was a good time to get Siamak on the show to tell us what he’s been up to. Turns out he and Nebulon decided it was time to completely rethink/rearchitect shared storage for the new data center.
At his prior company, Siamak spent a lot of time with many customers discussing the problems they had dealing with the complexity of managing, provisioning and maintaining multiple shared storage arrays. Somewhere in all those discussions Siamak saw this as a problem that needed a radical solution. If we could just redo shared storage from the ground up, there might be a solution to all these problems.
Redefining shared storage
Nebulon’s new approach to shared storage starts with an SPU card which replaces SAS RAID cards in a server. But instead of creating SAS RAID groups, the SPU creates a shareable, enterprise class, pool of storage across a throng of servers.
They call a collection of servers with SPUs Cloud Defined Storage (CDS), and it creates a Nebulon nPod. An nPod essentially consists of multiple servers with SPU cards, with or without attached SSD storage, that are provisioned, managed and monitored via the cloud. Nebulon nPod servers are elements or nodes of a shared storage pool across all interconnected SPU servers in a data center.
In an SPU server with local (SAS, SATA, NVMe) SSD storage, the SPU creates an erasure coded pool of storage which can be used to serve (SAS) LUNs to this or any other SPU attached server in the nPod. In an SPU server without local SSD storage, the SPU provides access to any other SPU server’s shared storage in the nPod. Nebulon nPods only work with flash storage; they don’t support spinning media.
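For intuition on erasure coded pools, here’s a toy single-parity (RAID-5 style XOR) example in Python; any one lost fragment can be rebuilt from the survivors. It’s purely illustrative, not Nebulon’s actual layout or code.

```python
# Toy single-parity erasure code: split data into n fragments plus one XOR
# parity fragment; any single lost fragment is recoverable from the rest.
def shard(data: bytes, n: int):
    size = -(-len(data) // n)                       # ceiling division
    frags = [data[i*size:(i+1)*size].ljust(size, b"\0") for i in range(n)]
    parity = bytes(size)                            # all-zero parity to start
    for frag in frags:
        parity = bytes(p ^ b for p, b in zip(parity, frag))
    return frags, parity

def rebuild(frags, parity, lost: int) -> bytes:
    recovered = parity
    for i, frag in enumerate(frags):
        if i != lost:                               # XOR all survivors together
            recovered = bytes(r ^ b for r, b in zip(recovered, frag))
    return recovered

frags, parity = shard(b"customer LUN data spread across the nPod", 4)
assert rebuild(frags, parity, lost=2) == frags[2]   # lost fragment restored
```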
The SPU can supply boot storage for its server. There’s no need to have the CPU running OS code to use nPod shared storage. Yes, the SPU needs power and an active PCIe bus to work, but the functionality of an SPU doesn’t require an operational OS to work. The SPU provides a SAS LUN interface to server CPUs.
Each SPU has dual port access to an inter-cluster (25GbE) interconnect that connects all SPUs to the nPod. The nPod inter-cluster protocol is proprietary but takes advantage of standard TCP/IP services across the network with standard 25GbE switching.
The SPU firmware ensures that it stays connected as long as power is available to the server. Customers can have more than one SPU in a server, but additional SPUs would be used for more IO performance. Each SPU also has 32GB of NVRAM for caching purposes, which is also used for power fail fault tolerance.
In the unlikely case that the server and SPU are completely down (e.g., a power outage), clients can still access that SPU’s data storage, if it was mirrored (see below). When the SPU server comes back up, it will be resynched with any data that had been changed.
Other Nebulon storage features
Nebulon supports data-at-rest encryption, compression and deduplication for customer data. That way customer data is never in plain text as it travels across the nPod or even within the server from the SPU to SSD storage. Also any customer data written to an nPod can be optionally mirrored and as noted above, is protected via erasure coding.
The SPU also supports snapshotting of customer LUN data. So clients can take copies of LUNs and use these for backups, test, dev, etc. SPUs also support asynchronous or synchronous replication between nPods. For synchronous replication and mirrored data, the originating host only sees the IO complete after the data has been received at the target SPU or nPod.
Metadata for the nPod that defines LUN configurations and which server has LUN data is kept across the cluster in each SPU. But metadata on the location of user data within a server is only kept in that server’s SPU.
We asked Siamak whether nPods support SCM (storage class memory). He said not yet, but they’re looking at SCM NVMe storage for use as a potential metadata and data cache for SPUs.
Nebulon Application Centric storage
All the above storage features are present in most enterprise class storage systems. But what sets Nebulon apart from all other shared storage arrays is that their control plane is entirely in the cloud. That is, customers point their browser at Nebulon’s control plane and use it to configure, provision and manage the nPod storage pool. Nebulon supports application templates that can be used to configure nPod storage to support standardized applications, such as VMware VMs, MongoDB, persistent storage for K8S containers, bare metal Linux apps, etc.
With the nPod’s control plane in the cloud, provisioning, managing and monitoring storage services becomes much more agile. Nebulon can literally roll out new control plane updates to their install base on an almost daily basis, just like any other cloud based or SaaS application. Customers receive the updated nPod control plane functionality by simply refreshing their browser page.
Nebulon’s GoToMarket
Near the end of our podcast, we asked Siamak about how Nebulon was going to reach the market. Nebulon’s go-to-market is to use server OEMs. That is, they have signed agreements with two (and are working on a third) server vendors to sell SPU cards with Nebulon control plane access.
During server purchases, customers configure their servers, but now, along with SAS RAID card options, they will see a Nebulon SPU option. OEM server vendors will bundle SPU hardware and Nebulon control plane access along with all other server components such as CPUs, SSDs, NICs, etc. This way, the customer will receive a pre-installed SPU card in their server and will be ready to configure nPod LUNs as soon as the server powers on in their network.
Nebulon will go GA in the 3rd quarter.
The podcast ran ~43 minutes. Siamak has always been a pleasure to talk with and is very knowledgeable about the problems customers have in today’s data center environments. Nebulon has given him and his team the way to rethink storage and address these serious issues. Matt and I had a good time talking with Siamak. Listen to the podcast to learn more.
Siamak Nazari, CEO Nebulon
Siamak Nazari is the CEO and Co-founder of Nebulon. Siamak has over 25 years of experience working on distributed and highly available systems.
In his position as HPE Fellow and VP, he was responsible for setting technical direction for HPE 3PAR and its portfolio of software and hardware. He worked on HPE 3PAR technology from 2000 to 2018, responsible for designing and implementing distributed memory management and the high availability features of the system.
Prior to joining 3PAR, Siamak was the technical lead for distributed highly available Proxy Filesystem (pxfs) of Sun Cluster 3.0.
103: Greybeards talk scale-out file and cloud data with Molly Presley & Ben Gitenstein, Qumulo
Jun 23, 2020
Sponsored by:
Ray has known Molly Presley (@Molly_J_Presley), Head of Global Product Marketing for just about a decade now and we both just met Ben Gitenstein (@Qumulo_Product), VP of Products & Solutions, Qumulo on this podcast. Both Molly and Ben were very knowledgeable about the problems customers have with massive data troves.
Qumulo has a long history of dealing with customer issues around data center application access to data, usually large data repositories with billions of small or large files that have accumulated over time. But recently Qumulo has taken on similar problems in the cloud as well.
Qumulo’s secret has always been to allow researchers to run their applications wherever their data resides. This has led Qumulo’s software defined storage to offer multi-protocol access as well as completely native AWS and GCP cloud versions of their solution.
That way customers can run Qumulo in their data center or in the cloud and have the same great access to data. Molly mentioned one customer that creates and gathers data using SMB protocol on prem and then, after replication, processes it in the cloud.
Qumulo Shift
Ben mentioned that many competitive storage systems are business model focused. That is, they are all about keeping customer data within their solutions so they can charge for capacity. Although Qumulo also charges for capacity, with the new Qumulo Shift service, customers can easily move data off Qumulo and into native cloud storage. Using Shift, customers can free up Qumulo storage space (and cost) for any data that only needs to be accessed as objects.
With Shift, customers can replicate or move on prem or in the cloud Qumulo file data to AWS S3 objects. Once in S3, customers can access it with AWS native applications, other applications that make use of AWS S3 data, or can have that data be accessible around the world.
Qumulo customers can select directories to Shift to an AWS S3 bucket. The Qumulo directory name will be mapped to an S3 bucket name, and each file in that directory will be copied to an S3 object in that bucket with the same file name.
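The mapping is simple enough to sketch; here’s an illustrative Python version of the directory-to-bucket copy using boto3. This mimics the naming scheme described above, it is not Qumulo’s Shift implementation, and the paths and bucket name are hypothetical.

```python
# Illustrative directory -> S3 bucket copy mimicking the Shift naming scheme.
from pathlib import Path
import boto3

def shift_directory(local_dir: str, bucket: str):
    s3 = boto3.client("s3")
    root = Path(local_dir)
    for path in root.rglob("*"):
        if path.is_file():
            key = str(path.relative_to(root))   # file name becomes object key
            s3.upload_file(str(path), bucket, key)

# e.g. /qumulo/seismic-results -> s3://seismic-results/<file names>
shift_directory("/qumulo/seismic-results", "seismic-results")
```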
At the moment, Qumulo Shift only supports AWS S3. Over time, Qumulo plans to offer support for other public cloud storage targets for Shift.
Shift is based on Qumulo replication services. Qumulo has a number of patents on replication technology that provides for sophisticated monitoring, control and high performance for moving vast amounts of data.
How customers use Shift
One large customer uses Qumulo cloud file services to process seismic data but then makes the results of that analysis available to other clients as S3 objects.
Customers can also take advantage of AWS and other applications that support objects only. For example, AWS SageMaker Machine Learning (ML) processes S3 object data. Qumulo customers could gather training data as files and Shift it to S3 objects for ML training.
Moreover, customers can use Shift to create AWS S3 object backups, archives and DR repositories of Qumulo file data. Ben mentioned DevOps could also use Qumulo Shift via APIs to move file data to S3 objects as part of new application deployment.
Finally, using Shift to copy or move file data to AWS S3 makes it ideal for collaboration by researchers, analysts and just about any other entity that needs access to data.
The podcast ran ~26 minutes. Molly has always been easy to talk with and Ben turned out also to be easy to talk with and knew an awful lot about the product and how customers can use it. Keith and I enjoyed our time with Molly and Ben discussing Qumulo and their new Shift service. Listen to the podcast to learn more.
Ben Gitenstein, VP of Products and Solutions, Qumulo
Ben Gitenstein runs Product at Qumulo. He and his team of product managers and data scientists have conducted nearly 1,000 interviews with storage users and analyzed millions of data points to understand customer needs and the direction of the storage market.
Prior to working at Qumulo, Ben spent five years at Microsoft, where he split his time between Corporate Strategy and Product Planning.
Molly Presley, Head of Global Product Marketing, Qumulo
Molly Presley joined Qumulo in 2018 and leads worldwide product marketing. Molly brings over 15 years of file system and archive technology leadership experience to the role.
Prior to Qumulo, Molly held executive product and marketing leadership roles at Quantum, DataDirect Networks (DDN) and Spectra Logic.
Presley also created the term “Active Archive”, founded the Active Archive Alliance and has served on the Board of the Storage Networking Industry Association (SNIA).
0102 GreyBeards talk big memory data with Charles Fan, CEO & Co-founder, MemVerge
May 27, 2020
It’s been a couple of months since we last talked with a startup, so the GreyBeards thought it was time. We reached out to Charles Fan (@CharlesFan14), CEO and Co-Founder of MemVerge to find out about their big memory solution or, as Charles likes to call it, “software defined (big) memory”. Although neither Matt nor …
0101: Greybeards talk with Howard Marks, Technologist Extraordinary & Plenipotentiary at VAST
Apr 30, 2020
As most of you know, Howard Marks (@deepstoragenet), Technologist Extraordinary & Plenipotentiary at VAST Data used to be a Greybeards co-host and is still on our roster as a co-host emeritus. When I started to schedule this podcast, it was going to be our 100th podcast and we wanted to invite Howard and the rest …
0100: GreyBeards talk with Colin Gallagher, VP Dig. Infra. Prod. Mkt. @ Hitachi Vantara
Apr 21, 2020
Sponsored By: We have known Colin Gallagher (@worldc3), VP, Digital Infrastructure Product Marketing at Hitachi Vantara, for a long time and he has always been an all around smart storage guy. Colin’s team at Hitachi Vantara are bringing out a brand new, midrange storage system and we thought it would be a good time to …
099: GreyBeards talk Folding@Home with Mike Harsch, a longtime enthusiast
Apr 01, 2020
Mike Harsch (@harschness) is a personal friend, a computer enthusiast with a particular and enduring interest in distributed systems and GPU computing. Mike’s been a longtime user and proponent of Folding@Home, a distributed system focused on protein dynamics that anyone can download and run on their personal computer(s) or gaming devices. We started the discussion …
098: GreyBeards talk data protection & visualization for massive unstructured data repositories with Christian Smith, VP Product at Igneous
Mar 24, 2020
Sponsored By: Even before COVID-19 there was a lot of file data being created and mined, but with the advent of the pandemic, this has accelerated considerably. As such, it seemed an appropriate time to talk with Christian Smith, VP of Product at Igneous, (@IgneousIO) a company that targets the protection and visibility of massive …
097: GreyBeards talk open source S3 object store with AB Periasamy, CEO MinIO
Feb 07, 2020
Ray was at SFD19 a few weeks ago and the last session of the week (usually dead) was with MinIO and they just blew us away (see videos of MinIO’s session here). Ray thought Anand Babu (AB) Periasamy (@ABPeriasamy), CEO MinIO, who was the main presenter at the session, would be a great invite for …
096: GreyBeards YE2019 IT Industry Trends podcast
Jan 02, 2020
In this, our yearend industry wrap up episode, the GreyBeards discuss trends and technologies impacting the IT industry in 2019 and what’s ahead for 2020. This year we have Matt and Keith on the podcast along with Ray. Just like last year, we start off with NVMeoF. NVMeoF unleashed: This year just about every major …
095: GreyBeards talk file sync&share with S. Azam Ali, VP Customer Success at CentreStack
Dec 20, 2019
We haven’t talked with a file sync and share vendor in a while now and Matt was interested in the technology. He had been talking with CentreStack, and found that they had been making some inroads in the enterprise. So we contacted S. Azam Ali, VP of Customer Success at CentreStack and asked if he …
094: GreyBeards talk shedding light on data with Scott Baker, Dir. Content & Data Intelligence at Hitachi Vantara
Dec 05, 2019
Sponsored By: At Hitachi NEXT 2019 Conference, last month, there was a lot of talk about new data services from Hitachi. Keith and I thought it would be a good time to sit down and talk with Scott Baker (@Kraken-Scuba), Director of Content and Data Intelligence, at Hitachi Vantara about what’s going on with data …
93: GreyBeards talk HPC storage with Larry Jones, Dir. Storage Prod. Mngmt. and Mark Wiertalla, Dir. Storage Prod. Mkt., at Cray, an HPE Enterprise Company
Nov 12, 2019
Supercomputing Conference 2019 (SC19) is coming to Denver next week and, in anticipation of that show, we thought it would be good to talk with an HPC storage group. We contacted HPE and, given their recent acquisition of Cray, they offered up Larry and Mark to talk about their new ClusterStor E1000 storage system. …
92: Ray talks AI with Mike McNamara, Sr. Manager, AI Solution Mkt., NetApp
Nov 04, 2019
Sponsored By: NetApp. NetApp’s been working in the AI DL (deep learning) space for a long time now and announced their partnership with NVIDIA DGX systems back in August of 2018. At NetApp Insight this week, they were showing off their new NVIDIA DGX systems reference architectures. These architectures use NetApp AFF A800 storage (for …
91: Keith and Ray show at CommvaultGO 2019
Oct 23, 2019
There was a lot of news at CommvaultGO this year and it was our first chance to talk with their new CEO, Sanjay Mirchandani. Just prior to the show Commvault introduced new SaaS backup offering for the mid market, Metallic™ and about a month or so prior to the show Commvault had acquired Hedvig, a …
90: GreyBeards talk K8s containers storage with Michael Ferranti, VP Product Marketing, Portworx
Oct 02, 2019
At VMworld2019 USA there was a lot of talk about integrating Kubernetes (K8s) into vSphere’s execution stack and operational model. We had heard that Portworx was a leader in K8s storage services or persistent volume support and thought it might be instructive to hear from Michael Ferranti (@ferrantiM), VP of Product Marketing at Portworx about …
89: Keith & Ray show at Pure//Accelerate 2019
Sep 18, 2019
There were plenty of announcements at Pure//Accelerate in Austin this past week and we were given a preview of them at a StorageFieldDay Exclusive (SFDx), the day before the announcement. First up is Pure’s DirectMemory. They have added Optane SSDs to FlashArray//X to be used as a read cache for customer data. As you may …
88: A GreyBeard talks DataPlatform with Jon Hildebrand, Principal Technologist, Cohesity at VMworld 2019
Aug 30, 2019
Sponsored by: This is another sponsored GreyBeards on Storage podcast and it was recorded at VMworld 2019. I talked with Jon Hildebrand (@snoopJ123), Principal Technologist at Cohesity. Jon’s been a longtime friend from TechFieldDay days and has been working with Cohesity for ~14 months now. For such a short time, Jon’s seen a lot …
Matt and Ray were both at VMworld 2019 in San Francisco this past week, and we did an impromptu podcast on recent news at the show. VMware announced a number of new projects and just prior to the show they announced the intent to acquire Pivotal and Carbon Black. Pat’s keynote the first day was …
86: Greybeards talk FMS19 wrap up and flash trends with Jim Handy, General Director, Objective Analysis
Aug 22, 2019
This is our annual Flash Memory Summit podcast with Jim Handy, General Director, Objective Analysis. It’s the 5th time we have had Jim on our show. Jim is also an avid blogger writing about memory and SSD at TheMemoryGuy and TheSSDGuy, respectively. NAND market trends Jim started off our discussion on the significant price drop …
85: GreyBeards talk NVMe NAS with Howard Marks, Technologist Extraordinary and Plenipotentiary, VAST Data Inc.
Aug 01, 2019
As most of you know, Howard Marks was a founding co-host of the GreyBeards-On-Storage podcast and has since joined with VAST Data, an NVMe file and object storage vendor headquartered in NY with R&D out of Israel. We first met with VAST at StorageFieldDay18 (SFD18, video presentation). Howard announced his employment at that event. …
84: GreyBeards talk ultra-secure NAS with Eric Bednash, CEO & Co-founder, RackTop Systems
Jul 09, 2019
We were at a recent vendor conference where Steve Foskett (@SFoskett) introduced us to Eric Bednash (@ericbednash), CEO & Co-Founder, RackTop Systems. They have taken ZFS and made it run as an ultra-secure NAS system. Matt Leib, my co-host for this episode, has on-the-job experience with ZFS and was a great fit for this discussion. …
83: GreyBeards talk NVMeoF/TCP with Muli Ben-Yehuda, Co-founder & CTO and Kam Eshghi, VP Strategy & Bus. Dev., Lightbits Labs
Jun 12, 2019
This is the first time we’ve talked with Muli Ben-Yehuda (@Muliby), Co-founder & CTO and Kam Eshghi (@KamEshghi), VP of Strategy & Business Development, Lightbits Labs. Keith and I first saw them at Dell Tech World 2019, in Vegas as they are a Dell Ventures funded organization. The company has 70 (mostly engineering) employees and …
82: GreyBeards talk composable infrastructure with Sumit Puri, CEO & Co-founder, Liqid Inc.
May 07, 2019
This is the first time we’ve had Sumit Puri, CEO & GM Co-founder of Liqid on the show but both Greg and I have talked with Liqid in the past. Given that we talked with another composable infrastructure company (see our DriveScale podcast), we thought it would be nice to hear from their competition. We …
81: Greybeards talk cloud storage with David Friend, Co-founder & CEO, Wasabi Technologies
Mar 27, 2019
This is our first time talking with David Friend, (@Wasabi_Dave) Co-founder and CEO, Wasabi Technologies, but he certainly knows his way around storage. He has started a number of successful companies, the last one prior to Wasabi was Carbonite, a cloud backup company. Before we get to the podcast, Howard Marks has retired from active …
80: Greybeards talk composable infrastructure with Tom Lyon, Co-Founder/Chief Scientist and Brian Pawlowski, CTO, DriveScale
Feb 26, 2019
We haven’t talked with Tom Lyon (@aka_pugs) or Brian Pawlowski before on our show but both Howard and I know Brian from his prior employers. Tom and Brian work for DriveScale, a composable infrastructure software supplier. There’s been a lot of press lately on NVMeoF and the GreyBeards thought it would be good time to …
79: GreyBeards talk AI deep learning infrastructure with Frederic Van Haren, CTO & Founder, HighFens, Inc.
Jan 28, 2019
We’ve talked with Frederic before (see: Episode #33 on HPC storage) but since then, he has worked for an analyst firm and now he’s back on his own again, at HighFens. Given all the interest of late in AI, machine learning and deep learning, we thought it would be a great time to catch up …
78: GreyBeards YE2018 IT industry wrap-up podcast
Jan 09, 2019
In this, our yearend industry wrap up episode, we discuss trends and technology impacting the IT industry in 2018 and what we can see ahead for 2019, and first up is NVMeoF. NVMeoF has matured: In the prior years, NVMeoF was coming from startups, but last year it was major vendors like IBM FlashSystem, Dell EMC …
77: GreyBeards talk high performance databases with Brian Bulkowski, Founder & CTO, Aerospike
Dec 07, 2018
In this episode we discuss high performance databases and the storage needed to get there, with Brian Bulkowski, Founder and CTO of Aerospike. Howard met Brian at an Intel Optane event last summer and thought he’d be a good person to talk with. I couldn’t agree more. Howard and I both thought Aerospike was an …
76: GreyBeards talk backup content, GDPR and cyber security with Jim McGann, VP Mkt & Bus. Dev., Index Engines
Nov 14, 2018
In this episode we talk indexing old backups, GDPR and CyberSense, a new approach to cyber security, with Jim McGann, VP Marketing and Business Development, Index Engines. Jim’s an old industry hand that’s been around backups, e-discovery and security almost since the beginning. Index Engines’ solution to cyber security, CyberSense, is also offered by Dell EMC …
75: GreyBeards talk persistent memory IO with Andy Grimes, Principal Technologist, NetApp
Nov 06, 2018
Sponsored By: NetApp. In this episode we talk new persistent memory IO technology with Andy Grimes, Principal Technologist, NetApp. Andy presented at the NetApp Insight 2018 TechFieldDay Extra (TFDx) event (video available here). If you get a chance we encourage you to watch the videos, as Andy did a great job describing their new MAX …
74: Greybeards talk NVMe shared storage with Josh Goldenhar, VP Cust. Success, Excelero
Oct 30, 2018
Sponsored by: In this episode we talk NVMe shared storage with Josh Goldenhar (@eeschwa), VP, Customer Success at Excelero. Josh has been on our show before (please see our April 2017 podcast), the last time with Excelero’s CTO & Co-founder, Yaniv Romem. This is Excelero’s 1st sponsored GBoS podcast and we wish to welcome them …
73: GreyBeards talk HCI with Gabriel Chapman, Sr. Mgr. Cloud Infrastructure NetApp
Oct 26, 2018
Sponsored by: NetApp. In this episode we talk HCI with Gabriel Chapman (@Bacon_Is_King), Senior Manager, Cloud Infrastructure, NetApp. Gabriel presented at the NetApp Insight 2018 TechFieldDay Extra (TFDx) event (video available here). Gabriel also presented last year at the VMworld 2017 TFDx event (video available here). If you get a chance we encourage you to watch the …
72: GreyBeards talk Computational Storage with Scott Shadley, VP Marketing NGD Systems
Sep 25, 2018
For this episode the GreyBeards talked with another old friend, Scott Shadley, VP Marketing, NGD Systems. As we discussed on our FMS18 wrap up show with Jim Handy, computational storage had sort of a coming out party at the show. NGD Systems started in 2013 and has been working towards a solution that goes general …
Sponsored by: In this episode we talk data protection appliances with Sharad Rastogi (@sharadrastogi), Senior VP Product Management, and Ranga Rajagopalan, Senior Director, Data Protection Appliances Product Management, Dell EMC™ Data Protection Division (DPD). Howard attended Ranga’s TFDx session (see TFDx videos here) on their new Integrated Data Protection Appliance (IDPA) the DP4400 at VMworld last …
70: GreyBeards talk FMS18 wrap-up and flash trends with Jim Handy, General Dir. Objective Analysis
Aug 25, 2018
In this episode we talk about Flash Memory Summit 2018 (FMS18) and recent trends affecting the flash market with Jim Handy, General Director, Objective Analysis. This is the 4th time Jim’s been on our show and has been our go to guy on flash technology forever. NAND supply? Talking with Jim is always a far …
69: GreyBeards talk HCI with Lee Caswell, VP Products, Storage & Availability, VMware
Aug 17, 2018
Sponsored by: For this episode we preview VMworld by talking with Lee Caswell (@LeeCaswell), Vice President of Product, Storage and Availability, VMware. This is the third time Lee’s been on our show, the previous one was back in August of last year. Lee’s been at VMware for a couple of years now and, among other things, is …
68: GreyBeards talk NVMeoF/TCP with Ahmet Houssein, VP of Marketing & Strategy @ Solarflare Communications
Aug 08, 2018
In this episode we talk with Ahmet Houssein, VP of Marketing and Strategic Direction at Solarflare Communications (@solarflare_comm). Ahmet’s been in the industry forever and has a unique view on where NVMeoF needs to go. Howard had talked with Ahmet at last year’s FMS. Ahmet will also be speaking at this year’s FMS (this week …
67: GreyBeards talk infrastructure monitoring with James Holden, Sr. Prod. Mgr. NetApp
Jul 26, 2018
Sponsored by: Howard and I first talked with James Holden, NetApp Senior Product Manager for OnCommand Insight and Cloud Insights, last month, at Storage Field Day 16 (SFD16) in Waltham, MA. At the time, we thought it would be great to also have him on the show. James has been with the NetApp OnCommand Insight (OCI) …
66: GreyBeards talk Midrange storage part 2, with Sean Kinney, Sr. Dir. Midrange Storage Mkt, Dell EMC
Jul 24, 2018
Sponsored by: Dell EMC Midrange Storage. In this episode we talk with Sean Kinney (@SeanRKinney14), senior director, midrange storage marketing at Dell EMC. Howard and I have both known Sean for a number of years now. Sean has had multiple roles in the IT industry, doing various marketing and management duties at multiple vendors. He’s …
65: GreyBeards talk new FlashSystem storage with Eric Herzog, CMO and VP WW Channels IBM Storage
Jul 10, 2018
Sponsored by: In this episode, we talk with Eric Herzog, Chief Marketing Officer and VP of WorldWide Channels for IBM Storage about the FlashSystem 9100 storage series. This is the 2nd time we have had Eric on the show (see Violin podcast) and the 2nd time we have had a guest from IBM on our …
64: GreyBeards discuss cloud data protection with Chris Wahl, Chief Technologist, Rubrik
Jun 21, 2018
Sponsored by: In this episode we talk with Chris Wahl, Chief Technologist, Rubrik. This is our second time having Chris on our show. The last time was about three years ago (see our Chris on agentless backup podcast). Talking with Chris again was great and there’s been plenty of news since we last spoke with …
63: GreyBeards talk with NetApp A-Team members John Woodall & Paul Stringfellow
Jun 20, 2018
Sponsored by NetApp: In this episode, we talk with NetApp A-Team members John Woodall (@John_Woodall), VP Eng, Integrated Archive Systems and Paul Stringfellow (@techstringy), Technical Dir. Data Management Consultancy, Gardner Systems Plc. Both John and Paul have been NetApp partners for quite awhile (John since the beginning of NetApp). John and Paul work directly with …
62: GreyBeards talk NVMeoF storage with VR Satish, Founder & CTO Pavilion Data Systems
Jun 16, 2018
In this episode, we continue on our NVMeoF track by talking with VR Satish (@satish_vr), Founder and CTO of Pavilion Data Systems (@PavilionData). Howard had talked with Pavilion Data over the last year or so and I just had a briefing with them over the past week. Pavilion Data is taking a different tack to …
61: GreyBeards talk composable storage infrastructure with Taufik Ma, CEO, Attala Systems
May 22, 2018
In this episode, we talk with Taufik Ma, CEO, Attala Systems (@AttalaSystems). Howard had met Taufik at last year’s FlashMemorySummit (FMS17) and was intrigued by their architecture which he thought was a harbinger of future trends in storage. The fact that Attala Systems was innovating with new, proprietary hardware made an interesting discussion, in its own …
60: GreyBeards talk cloud data services with Eiki Hrafnsson, Technical Director, NetApp
May 15, 2018
Sponsored by: In this episode, we talk with Eiki Hrafnsson (@Eirikurh), Technical Director, NetApp Cloud Data Services. Eiki gave a great talk at Cloud Field Day 3 (CFD3), although neither Howard nor I were in attendance. I just met Eiki at a NetApp Spring Analyst event earlier this month and after that Howard and I had …
59: GreyBeards talk IT trends with Marc Farley, Sr. Product Manager, HPE
Apr 19, 2018
In Episode 59, we talk with Marc Farley, Senior Product Manager at HPE, discussing trends in the storage industry today. Marc’s been on our show before (GreyBeards talk Cloud Storage…, GreyBeards video discussing file analytics, Greybeards talk cars, storage and IT…) and has been a longtime friend and associate of both Howard and me. Marc’s …
58: GreyBeards talk HCI with Adam Carter, Chief Architect NetApp Solidfire #NetAppHCI
Mar 16, 2018
Sponsored by: NetApp. In this episode we talk with Adam Carter (@yoadamcarter), Chief Architect, NetApp Solidfire & HCI (Hyper Converged Infrastructure) solutions. Howard talked with Adam at TFD16 and I have known Adam since before the acquisition. Adam did a tour de force session on HCI architectures at TFD16 and we would encourage you to view the …
57: GreyBeards talk midrange storage with Pierluca Chiodelli, VP of Prod. Mgmt. & Cust. Ops., Dell EMC Midrange Storage
Feb 21, 2018
Sponsored by: Dell EMC Midrange Storage. In this episode we talk with Pierluca Chiodelli (@chiodp), Vice President of Product Management and Customer Experience at Dell EMC Midrange Storage. Howard talked with Pierluca at SFD14 and I talked with Pierluca at SFD13. He started working there as a customer engineer and has worked his way up to VP …
56: GreyBeards talk high performance file storage with Liran Zvibel, CEO & Co-Founder, WekaIO
Feb 15, 2018
This month we talk high-performance cluster file systems with Liran Zvibel (@liranzvibel), CEO and Co-Founder of WekaIO, a new software defined, scale-out file system. I first heard of WekaIO when it showed up on SPEC sfs2014 with a new SWBUILD benchmark submission. They had a 60 node EC2-AWS cluster running the benchmark and achieved, at …
55: GreyBeards storage and system yearend review with Ray & Howard
Dec 30, 2017
In this episode, the Greybeards discuss the year in systems and storage. This year we kick off the discussion with a long running IT trend which has taken off over the last couple of years. That is, recently the industry has taken to buying pre-built appliances rather than building them from the ground up. We can …
54: GreyBeards talk scale-out secondary storage with Jonathan Howard, Dir. Tech. Alliances at Commvault
Nov 30, 2017
This month we talk scale-out secondary storage with Jonathan Howard, Director of Technical Alliances at Commvault. Both Howard and I attended Commvault GO2017 for Tech Field Day, this past month in Washington DC. We had an interesting overview of their Hyperscale secondary storage solution and Jonathan was the one answering most of our questions, so …
53: GreyBeards talk MAMR and future disk with Lenny Sharp, Sr. Dir. Product Management, WDC
Oct 27, 2017
This month we talk new disk technology with Lenny Sharp, Senior Director of Product Management, responsible for enterprise disk with Western Digital Corp. (WDC). WDC recently announced their future disk offerings will be based on a new disk recording technology, called MAMR or microwave assisted magnetic recording. Over the last decade or so the disk …
52: GreyBeards talk software defined storage with Kiran Sreenivasamurthy, VP Product Management, Maxta
Sep 28, 2017
This month we talk with an old friend from Storage Field Day 7 (videos), Kiran Sreenivasamurthy, VP of Product Management for Maxta. Maxta has a software defined storage solution which currently works on VMware vSphere, Red Hat Virtualization and KVM to supply shared, scale out storage and HCI solutions for enterprises across the world. Maxta is similar …
51: GreyBeards talk hyper convergence with Lee Caswell, VP Product, Storage & Availability BU, VMware
Aug 25, 2017
Sponsored by: VMware. In this episode we talk with Lee Caswell (@LeeCaswell), Vice President of Product, Storage and Availability Business Unit, VMware. This is the second time Lee’s been on our show, the previous one back in April of last year when he was with his prior employer. Lee’s been at VMware for a little over …
50: Greybeards wrap up Flash Memory Summit with Jim Handy, Director at Objective Analysis
Aug 19, 2017
In this episode we talk with Jim Handy (@thessdguy), Director at Objective Analysis, a semiconductor market research organization. Jim is an old friend and was on last year to discuss Flash Memory Summit (FMS) 2016. Jim, Howard and I all attended FMS 2017 last week in Santa Clara and Jim and Howard were presenters at the …
49: Greybeards talk open convergence with Brian Biles, CEO and Co-founder of Datrium
Aug 15, 2017
Sponsored By: In this episode we talk with Brian Biles, CEO and Co-founder of Datrium. We last talked with Brian and Datrium in May of 2016 and at that time we called it deconstructed storage. These days, Datrium offers a converged infrastructure (C/I) solution, which they call “open convergence”. Datrium C/I: Datrium’s C/I solution stores persistent data …
48: Greybeards talk object storage with Enrico Signoretti, Head of Product Strategy, OpenIO
Jul 25, 2017
In this episode we talk with Enrico Signoretti, Head of Product Strategy for OpenIO, a software defined, object storage startup out of Europe. Enrico is an old friend, having been a member of many Storage Field Day events (SFD) in the past which both Howard and I attended and we wanted to hear what he …
47: Greybeards talk Storage as a Service with Lazarus Vekiarides, CTO & Co-Founder ClearSky Data
Jun 13, 2017
Sponsored By: In this episode, we talk with ClearSky Data’s Lazarus Vekiarides, CTO and Co-founder, who we have talked with before (see our podcast from October 2015). ClearSky Data provides a storage-as-a-service offering that uses an on-premises appliance plus point of presence (PoP) storage in the local metro area to hold customer data and offloads this data …
46: Greybeards discuss Dell EMC World2017 happenings on vBrownBag
Jun 02, 2017
In this episode Howard and I were both at Dell EMC World2017 this past month and Alastair Cooke (@DemitasseNZ) asked us to do a talk at the show for the vBrownBag group (Youtube video here). The GreyBeards asked for a copy of the audio for this podcast. Sorry about the background noise, but we recorded live at …
45: Greybeards talk desktop cloud backup/storage & disk reliability with Andy Klein, Director Marketing, Backblaze
May 11, 2017
In this episode, we talk with Andy Klein, Dir of Marketing for Backblaze, which backs up desktops and computers to the cloud and also offers cloud storage. Backblaze has a unique consumer data protection solution where customers pay a flat fee to backup their desktops and then may pay a separate fee for a large recovery. On their …
44: Greybeards talk 3rd platform backup with Tarun Thakur, CEO Datos IO
May 02, 2017
Sponsored By: In this episode, we talk with a new vendor that’s created a new method to backup database information. Our guest for this podcast is Tarun Thakur, CEO of Datos IO. Datos IO was started in 2014 with the express purpose to provide a better way to back up and recover databases in the cloud. …
43: GreyBeards talk Tier 0 again with Yaniv Romem CTO/Founder & Josh Goldenhar VP Products of Excelero
Apr 19, 2017
In this episode, we talk with another next gen, Tier 0 storage provider. This time our guests are Yaniv Romem CTO/Founder & Josh Goldenhar (@eeschwa) VP Products from Excelero, another new storage startup out of Israel. Both Howard and I talked with Excelero at SFD12 (videos here) earlier last month in San Jose. I was very impressed …
42: GreyBeards talk next gen, tier 0 flash storage with Zivan Ori, CEO & Co-founder E8 Storage.
Mar 15, 2017
In this episode, we talk with Zivan Ori (@ZivanOri), CEO and Co-founder of E8 Storage, a new storage startup out of Israel. E8 Storage provides a tier 0, next generation all flash array storage solution for HPC and high end environments that need extremely high IO performance, with high availability and modest data services. We first …
41: Greybeards talk time shifting storage with Jacob Cherian, VP Product Management and Strategy, Reduxio
Feb 17, 2017
In this episode, we talk with Jacob Cherian (@JacCherian), VP of Product Management and Product Strategy at Reduxio. They have produced a unique product that merges some characteristics of CDP storage and the best of hybrid and deduplicating storage today into a new primary storage system. We first saw Reduxio at VMworld a couple …
40: Greybeards storage industry yearend review podcast
Jan 03, 2017
In this episode, the Greybeards discuss the year in storage and naturally we kick off with the consolidation trend in the industry and the big one last year, the DELL-EMC acquisition. How the high margin EMC storage business is going to work in a low margin company like Dell is the subject of much speculation. That …