Data everywhere. But where is that file?

After asking around in preparation for this focus, we stumbled on a few studios mentioning something called Caringo – an Austin-based provider of storage tools, hardware, software and other things. Since it came recommended, we asked what that “storage thing they do” is …

With a decade of experience in media, medical, high-performance computing and adjacent fields, the people of Caringo have provided storage from very large to very small, in most reasonable configurations. Their tools work with all the big cloud storage providers, such as Microsoft Azure, Amazon Web Services and Google Cloud.

DP: Hello Adrian, my drives are full and I can’t find anything anymore …

Adrian J Herrera: Sorry to hear that – but storage that degrades as it reaches capacity and lack of searchability and accessibility were among the issues that the Caringo founders set out to solve. When Caringo was founded in 2005, the storage landscape was different. Enterprise storage devices were large monolithic hunks of metal comprised of proprietary software and hardware. They needed a forklift to be moved into a data center; and “cloud” storage as we know it today wasn’t publicly available. Enterprise storage systems were expensive and difficult to manage, and the data they stored was laborious to find and deliver. Many organizations and businesses were using tape for a data archive. Tape was also difficult to manage and, of course, data stored on tapes wasn’t online and accessible.

The founders of Caringo knew there was a better way and set out to change the economics of storage by creating a software-defined storage solution (now known as object storage) that installed on commodity hardware. The storage solution they created is easy to scale and employs automated management, with each file or object stored having a unique ID. All you need is that ID to find a specific item and to access it within your network or over the internet, regardless of where it is stored.

With this new object storage technology, you no longer needed to know the server name, directory path and file name. It sounds pretty normal today, but when we released our first version in 2006, it was a revolutionary approach to storing and accessing data. Since then, we have continued to innovate and enhance our product suite – giving us the most flexible, stable and efficient object storage platform in the market.
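To make the ID-based lookup described above concrete, here is a minimal in-memory sketch. It is not Caringo's implementation or API: the class name `FlatObjectStore`, its methods and the metadata keys are all hypothetical, chosen only to show a flat namespace in which the store hands back an ID and that ID alone is enough to retrieve the object.

```python
import uuid


class FlatObjectStore:
    """Toy object store: one flat namespace, no servers, paths or file names."""

    def __init__(self):
        # Maps object ID -> (raw bytes, metadata dictionary).
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        # The store, not the caller, generates the unique ID.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        # The ID is the only thing needed to retrieve the content.
        return self._objects[object_id][0]

    def metadata(self, object_id: str) -> dict:
        # Return a copy so callers cannot mutate stored metadata.
        return dict(self._objects[object_id][1])


store = FlatObjectStore()
shot_id = store.put(b"frame data", show="demo", shot="sh010")
print(shot_id, store.metadata(shot_id))
```

A real object store would spread the objects across many nodes and disks, but the caller's view stays the same: keep the ID, ask for the ID.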

DP: Have you worked with VFX / movie companies, and what special requirements did you see that differ from other industries?

Adrian J Herrera: We have production companies, film studios, video-on-demand providers, sports teams and broadcasters as customers, and all rely on VFX in some way. It’s important to note that object storage is a tier 2 or tier 3 storage technology, which means it’s used after something is produced – often as an archive or backup. That said, since object storage is basically a mash-up of storage and a web server – it enables on-demand delivery of content within a network or over the web. Instant access to archives, the ability to stream content from the archive layer, and plugging into specific workflows and asset management solutions are the mission-critical requirements we see most often from M&E customers.

DP: Looking towards classic broadcasting, what are the problems of the people that the VFX studios are delivering to?

Adrian J Herrera: From an asset-archive perspective, classic broadcasters are struggling with reusing content and recalling project files, driven primarily by new on-demand workflows. What they store on tape is taking too long to find and restore. More forward-thinking broadcasters are now deploying object storage as a layer of storage in front of tape, since the files on object storage are instantly available. From a VFX perspective, this means archived project files can now be found and delivered instantly. No need to wait for a tape (or many tapes) to load.

DP: If you see media productions, especially VFX with large-scale image and video files, as well as a load of smaller, KB-sized sidecars: What are your tips for keeping transfer rates reasonable?

Adrian J Herrera: File sizes, number of files and available bandwidth are all reasons why content-driven organizations need on-premises storage – a storage box in the studio network. It’s true that object storage is the enabling technology for all cloud storage services. But, when you are looking at file-access fees in the cloud, every API call (regardless of size) to a cloud service incurs a cost. To keep transfer rates reasonable, depending on your size, you need to keep the assets you will reuse instantly accessible in a location (like your own data center) where access costs are minimal.

For the world of movie production to become more organized, a metadata standard needs to be agreed upon so editing platforms and asset management solutions can leverage the metadata capabilities of the underlying storage layer. Alternatively, the industry could start adopting open NoSQL platforms like Elasticsearch. Of course, it’s easier said than done, but things are moving in the right direction. Artificial Intelligence and Machine Learning will likely play an important role here, automatically populating metadata.

DP: You are offering “S3” storage as well as object storage and Swarm services and software. What does that have to do with my files?

Adrian J Herrera: As with any storage solution (or really any technology), you need to be able to actually use it for it to have value to you. Historically, to use object storage, the application you were using needed a direct integration because every solution had a proprietary interface. File-system-based storage doesn’t have this issue because it relies on the file system to manage application access via standard storage protocols (like CIFS / SMB and NFS).

This is one of the reasons it has taken so long for object storage to become mainstream. But that’s all changing because of the Amazon S3 protocol. With Amazon’s dominance in the cloud storage space, their S3 API has become a de facto standard. The M&E application ecosystem is now finally catching up and almost all editing and digital asset management solutions either already support the S3 API or are planning to within the year. And, all major object storage solutions also support the S3 API so you can actually use object storage, specifically Caringo Swarm, with your existing applications and files.

DP: So, for a medium-sized studio, how would the transfer work to build a “bulletproof” Swarm environment?

Adrian J Herrera: One of our most recent products, Swarm Single Server, was developed specifically for small studios with limited IT staff. Swarm Single Server is an on-prem, S3-accessible, object-based storage device with built-in content management. The appliance contains all the hardware and software you need to keep archived content online, searchable and web-accessible – secure within your network. It includes 120 TB of capacity and 3 years of support and maintenance, and retails for 50,000 US dollars. That comes out to a little over 0.01 US dollars per GB per month over 3 years. If you need more capacity, simply plug in another Single Server. For medium-sized studios storing 500 TB to multiple PB, it will be more cost-effective for us to design a solution for you on the hardware of your choice.
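The per-gigabyte figure quoted above can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: it uses decimal units (1 TB = 1,000 GB), counts the full 120 TB as usable, and ignores power, cooling and staff costs, none of which the interview specifies. The function name `cost_per_gb_month` is illustrative.

```python
def cost_per_gb_month(price_usd, capacity_tb, months, gb_per_tb=1000):
    """Spread a one-time purchase price over capacity and time.

    Assumes decimal terabytes and that all capacity is usable.
    """
    return price_usd / (capacity_tb * gb_per_tb * months)


# Figures from the interview: 50,000 US dollars, 120 TB, 3 years.
single_server = cost_per_gb_month(50_000, 120, 36)
print(f"{single_server:.4f} USD per GB per month")
```

The result is roughly 0.0116 US dollars per GB per month, which matches the "a little over 0.01" quoted in the answer.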

DP: Does that tie into the different access points like pipeline management tools (for example ftrack or Autodesk Shotgun), the user clicking in some OS, the different playblast handlers and review tools and the backup process?

Adrian J Herrera: As I mentioned earlier, we can plug into any application that supports the S3 protocol. You can mount Swarm via macOS, Windows, NFS or SMB. We also have the ability to tier data from Windows Storage Servers or NetApp to Swarm (or even AWS, Microsoft Azure or Google Cloud) via FileFly. We haven’t specifically tested with Shotgun or ftrack.

DP: With long-term storage and years-long shows: What would be your recommended way of keeping retrievable files without breaking the bank?

Adrian J Herrera: As with any software-defined storage solution, it depends on your performance requirements. If you don’t need high performance, you can go with dense servers and optimize for cost. If you need to serve or stream content directly from Swarm or are frequently accessing assets, you probably want to optimize for throughput and use smaller hard drives in a high-capacity chassis.

DP: With that long-term storage: If money didn’t play a part, what would the perfect system be in your personal opinion?

Adrian J Herrera: If money didn’t play a part, then any setup that I can manage from my mansion on the Amalfi Coast or remotely monitor from my McLaren Senna would be ideal. All jokes aside, it depends. Object storage is about economically storing massive amounts of content and enabling efficient throughput. It can go as fast as the underlying infrastructure, so compute, HDD (or SSD), network speed and available bandwidth all play a big part. For a specific example, we have a performance benchmark overview for one of our customers who optimized their cluster for throughput vs. storage capacity. They used 12 Supermicro chassis, each with 45 drives of 12 TB, 2x 25 Gb NIC ports, 256 GB RAM and 24 cores. They also employed a 100 Gb leaf/spine, super-low-latency network configuration.
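For a sense of scale, the hardware figures in that example can be multiplied out. This is a back-of-the-envelope sketch, not a benchmark result: it assumes the NIC figure means 25 gigabits per second per port, and it reports raw upper bounds only, since usable capacity and real throughput depend on data protection settings and workload, which the interview does not give.

```python
# Figures taken from the cluster described in the interview.
CHASSIS = 12
DRIVES_PER_CHASSIS = 45
DRIVE_TB = 12
NIC_PORTS_PER_CHASSIS = 2
NIC_GBIT_PER_PORT = 25  # assumption: gigabits per second per port

# Raw capacity before replication or erasure coding overhead.
raw_capacity_tb = CHASSIS * DRIVES_PER_CHASSIS * DRIVE_TB

# Theoretical ceiling on combined node bandwidth, not measured throughput.
aggregate_nic_gbit = CHASSIS * NIC_PORTS_PER_CHASSIS * NIC_GBIT_PER_PORT

print(f"Raw capacity: {raw_capacity_tb} TB ({raw_capacity_tb / 1000:.2f} PB)")
print(f"Aggregate NIC bandwidth: {aggregate_nic_gbit} Gbit/s")
```

Under those assumptions the cluster holds 6,480 TB raw (about 6.5 PB) and its nodes could together move at most 600 Gbit/s.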

DP: Let’s keep looking at that team: What would you recommend in terms of fast storage for smaller teams, let’s say 10 people with about 20 TBs of active data?

Adrian J Herrera: We offer a free 20 TB license with our full-featured Developer Edition. So, for smaller teams with 20 TBs or less, our software is completely free. You can run everything in your VM farm if you wish or you can deploy on dedicated hardware. Running our software on dedicated hardware will always perform better than a VM-based solution. If you are interested, go to http://bit.ly/caringo_register and select “Swarm Developer Edition – complete VM environment” or “Swarm Evaluation software – bare metal deployment” in the “I am interested in” field.

DP: If Swarm and Caringo are too large a feature set, what would be the next step down the ladder in tools you would recommend (thinking about freelancers, one-man bands and specialists)?

Adrian J Herrera: One-man shops probably can’t afford to spend a lot of time managing infrastructure beyond their own workstation, so a wise move would be to use cloud-based services. We recommend using BT’s cloud storage service.

DP: If there is something you could tell people on how not to suffer from data-overload and delivery anymore, what would that be?

Adrian J Herrera: The first step is accepting the facts: data is no longer deleted, file sizes are increasing, file count is increasing, and access from any device in any location is now a requirement. You will need to take a tiered approach and understand what type of storage you need for your specific requirements. We have an educational blog and webinar on this specific topic that you might find helpful: What are the 5 Tiers of Storage for New Video Production Workflows? And, of course, the Caringo team is also available to help. If you have any questions, just send us an email or give us a call!
