ARTICLE
10 June 2024

Recapping 35 Years In 5 Industry Transitions: The Past Is Prologue - Part 2

In Part 1, I covered five major transitions that have occurred in the last 35 years in the M&E industry and with which I have had direct experience in inventing and creating solutions. To recap, they are:

  • The Introduction of Digital Nonlinear Editing Systems (DNLEs)
  • The Music Industry: From Physical to a Digital Supply Chain
  • File-Based Workflows and Worldwide Digital Media Supply Chains
  • Clouds and Real-Time Live Production
  • AI: Too Early and Now Poised for Real Implementation

The first four on the list proved to herald genuine advances in the content creation, distribution and consumption chain. In this blog installment, I will highlight some of the aspects of AI and ML for M&E on display at NAB 2024, the National Association of Broadcasters' annual conference, as well as some of the opportunities and challenges for new and incumbent vendors.

Developments in Audio Manipulation

Certain aspects of AI and ML clearly fit into the category of “things you couldn't do before that you can now do.” For example, automated speech-to-text systems for closed captioning have historically suffered high levels of inaccuracy when faced with multiple speakers, environmental noise and poor recordings. There are now solutions in which neural networks trained on thousands of hours of audio isolate speech from noisy conditions, separate multiple speakers and even remove unwanted pauses and utterances, all automatically. In other words, in one fell swoop, a multitude of major past impediments has been addressed.
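
For readers who want a concrete sense of the plumbing, here is a minimal sketch of such a captioning pipeline in Python. It uses OpenAI's open-source whisper package for the speech-to-text step; the denoise() helper is purely a stand-in for whatever noise-suppression model a vendor would supply, not a real API.

```python
# A minimal sketch of an automated captioning pipeline, assuming OpenAI's
# open-source "whisper" package for speech-to-text. The denoise() helper is a
# stand-in for a vendor's noise-suppression model, not a real API.
import whisper

def denoise(path: str) -> str:
    """Placeholder: run a noise-suppression model and return the cleaned file."""
    return path  # assumed: a cleaned copy is produced here

def caption_segments(path: str) -> list[dict]:
    model = whisper.load_model("base")
    result = model.transcribe(path)
    # Whisper returns segments with start/end times, ready to format as captions.
    return [{"start": s["start"], "end": s["end"], "text": s["text"].strip()}
            for s in result["segments"]]

for seg in caption_segments(denoise("interview.wav")):
    print(f"[{seg['start']:7.2f} -> {seg['end']:7.2f}] {seg['text']}")
```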

The Rise of Text-Driven Video Editing Systems

In the second volume of my textbook “Digital Nonlinear Editing,” I wrote, some 30 years ago, about a day when digital nonlinear editing systems (DNLEs) would no longer require the direct manipulation of video by choosing In and Out markers. Instead of manipulating picture frames, these new systems (which would take decades to finally reach market) work differently. After content is digitized, speech-to-text functions create a transcript that is aligned with the content's video and audio tracks. Then, through the familiar word processing actions of cut, copy and paste, words and sentences are moved around while, in the background, the video and audio are automatically reordered to reflect their new positions.
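
To make the mechanics concrete, here is a small illustrative sketch (all names are my own, not any product's API) of the data structure that makes this possible: each transcript word carries the timecodes of the media it is aligned to, so editing the text automatically yields an edit decision list.

```python
# An illustrative sketch (names are mine, not any product's API) of the core
# idea: every transcript word carries the timecodes it is aligned to, so
# editing the text yields an edit decision list automatically.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds into the source clip
    end: float

def edit_list(words: list[Word], gap: float = 0.05) -> list[tuple[float, float]]:
    """Collapse consecutive words into (in, out) source ranges to play."""
    ranges: list[tuple[float, float]] = []
    for w in words:
        if ranges and abs(w.start - ranges[-1][1]) < gap:
            ranges[-1] = (ranges[-1][0], w.end)  # extend a contiguous range
        else:
            ranges.append((w.start, w.end))      # start a new cut
    return ranges

transcript = [Word("thanks", 0.0, 0.4), Word("um", 0.4, 0.9),
              Word("for", 0.9, 1.1), Word("watching", 1.1, 1.6)]
kept = [w for w in transcript if w.text != "um"]  # "delete" a word in the text
print(edit_list(kept))  # -> [(0.0, 0.4), (0.9, 1.6)]: two cuts, filler removed
```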

There are opinions that this new approach does not lend itself to the editing of major motion pictures or television shows, but that is not the point. These solutions hold much promise because they give the masses of individuals who are not classically trained in video and audio editing a means to manipulate content. Consider, for example, the amount of content that is generated for nontheatrical film or television. Corporate enterprise, educational, marketing and training videos are just some of the areas where such easy-to-use solutions can be utilized.

Interestingly, text-based manipulation is not necessarily a new approach, given that the Ediflex system in the 1980s indexed text to video and audio timecode and, by highlighting a section of text, video could be rearranged. In essence this created a video playlist which, in its earliest incarnation, was then performed via six VHS machines.

Generative AI

At NAB 2024, the term “AI” was touted in every hall: AI and ML for automatic image retouching, object removal, fixing audio (see above), and the list goes on. Prior to NAB, there had been significant introductions of generative AI that captured attention (and some notoriety) and simultaneously amazed and concerned observers. Heretofore, the word “provenance” was used mainly within the world of artwork, paintings, literature and collections of various kinds. Today, establishing provenance and ownership, and proving either that content has not been manipulated or, if it has, that the method and audit trail of those manipulations have been preserved, are major concerns.
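
As a sketch of the underlying idea, and only that, here is one simple way an audit trail of manipulations can be made tamper-evident: a hash chain over edit records. Real provenance standards such as C2PA define far richer, cryptographically signed manifests; this toy version just illustrates the principle.

```python
# A toy illustration of a tamper-evident audit trail for manipulations: each
# edit record is chained to the previous one by a hash. Real provenance
# standards (e.g., C2PA) define far richer, cryptographically signed manifests.
import hashlib, json

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def record_edit(trail: list[dict], content: bytes, action: str) -> None:
    entry = {"action": action,
             "content_hash": digest(content),
             "prev": trail[-1]["entry_hash"] if trail else ""}
    entry["entry_hash"] = digest(json.dumps(entry, sort_keys=True).encode())
    trail.append(entry)

trail: list[dict] = []
record_edit(trail, b"<original frames>", "capture")
record_edit(trail, b"<retouched frames>", "object-removal")
# Altering any earlier entry invalidates every entry_hash that follows it.
```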

At NAB, there were examples of both AI-assisted and generative AI applications, at least in these early days of technology introduction and adoption. While it would be foolhardy for me to attempt to list all the companies with AI-related offerings, one can easily search for, say, “Generative AI + NAB 2024” to explore the wide variety of introductions and enhancements. To be sure, however, the percentage of companies actually providing solutions for purchase today is much smaller.

Automatic Metadata Extraction and Tagging

By now, many of us have seen the generation of imagery based on text prompts. At NAB, some practical usage of AI could be found in automating some of the more time-consuming but necessary tasks. Automatic metadata extraction continues to improve but at NAB took a leap forward in the array of metadata inspection, tagging, and content and scene descriptions that can be generated. Locating similar content across data stores that are local or across multiple cloud service providers (CSPs) is becoming easier to accomplish while preserving security.

The major hurdle that content owners still lament is “We don't know what content we have or where it is,” and solutions that address this are now becoming much easier to implement. Further, shot and scene description and automatic derivation of the nature of the content are becoming more powerful. Individuals who were tasked with logging, tagging and preparing content for use can now concentrate on other activities. Automation is also the only sustainable way to process the large amounts of content being generated across multiple cameras and multiple hours.
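
Here is an illustrative sketch of how “find similar content” typically works under the hood: each asset is reduced to an embedding vector by some model, and similarity is simple cosine math over those vectors. The embed() function below is a deterministic placeholder, not a real API.

```python
# An illustrative sketch of similarity search over a content catalog: each
# asset is reduced to an embedding vector and compared by cosine similarity.
# embed() is a deterministic placeholder, not a real model.
import hashlib
import numpy as np

def embed(description: str) -> np.ndarray:
    """Placeholder for a trained image/video embedding model."""
    seed = int(hashlib.md5(description.encode()).hexdigest(), 16) % 2**32
    v = np.random.default_rng(seed).standard_normal(512)
    return v / np.linalg.norm(v)

catalog = {name: embed(name) for name in
           ["sunset over harbor", "night city aerial", "locker room interview"]}

def most_similar(query: str) -> str:
    q = embed(query)  # unit vectors, so the dot product is cosine similarity
    return max(catalog, key=lambda name: float(np.dot(q, catalog[name])))

print(most_similar("boats at dusk"))
```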

Data Generation, Analysis and Generative AI for Live Events

As outlined in the first blog in this series, content acquisition, manipulation and distribution for live events from public clouds is no longer in the test and experimentation phase. REMI (remote integration model) productions are now the norm, and scaling is a logical focus. Key areas of effort at NAB included the merging of REMI, machine learning and generative AI driven by natural language processing (NLP) direction.

Athletes wearing a variety of sensors, combined with multiple cameras on site, generate tens of terabytes of data that must be analyzed and merged with historical statistics for player, play type, opponent and so on. Most of these components have been in use for several years. The new factor on display at NAB was the funneling of these components into generative AI engines to create visuals, trajectories and animations that previously involved significant delays or required multiple staff members. The goal, of course, is to generate more enriched programming and, hopefully, to build audience viewership and fan loyalty, and to do so as quickly as possible to capitalize on the real-time fan experience.
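
As an illustration of that data flow, and with every name below hypothetical, the shape of such a pipeline might look like this: live sensor readings are joined with historical statistics and handed to a generative engine as a natural language prompt.

```python
# A sketch of the data flow only, with every name hypothetical: live sensor
# readings are joined with historical statistics and handed to a generative
# engine as a natural language prompt. render_graphic() stands in for the
# actual model call a production would make.
from statistics import mean

history = {"player7": {"avg_sprint_kmh": 31.2, "goals_vs_opponent": 4}}

def build_prompt(player: str, live_speeds_kmh: list[float]) -> str:
    h = history[player]
    return (f"Animate {player}'s sprint: live peak {max(live_speeds_kmh):.1f} "
            f"km/h (mean {mean(live_speeds_kmh):.1f}) vs career average "
            f"{h['avg_sprint_kmh']} km/h; {h['goals_vs_opponent']} goals "
            f"against tonight's opponent.")

def render_graphic(prompt: str) -> None:
    print("-> generative engine:", prompt)  # placeholder for the real call

render_graphic(build_prompt("player7", [28.4, 33.1, 30.7]))
```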

Generative Not Only To Fix but Also To Create

Although we briefly discussed audio earlier, there were a variety of impressive introductions relating to the use of generative AI not only to fix but also to create new audio content. As you consider the following examples, think about what the historical process to accomplish each of them would have required (a sketch of one such operation follows the list):

  • The creation of multiple languages with perfect synchronization to lip movement
  • Replacing mispronounced words with correct pronunciation regardless of the spoken language
  • Creating words that were not originally spoken via text prompts
  • The creation of artificial voices for live sports narration in multiple languages

All of these would be inherently time-consuming, expensive and potentially difficult to achieve, given a strict timeline.
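
To give a sense of the mechanics behind the second bullet above, here is a sketch of the splicing step in Python: given a word's aligned timecodes, cut it out and crossfade a synthesized replacement into the gap. The synthesized audio itself would come from a text-to-speech or voice-cloning model, which is assumed here.

```python
# The splicing mechanics behind word replacement, sketched on raw sample
# arrays: remove the word using its aligned timecodes and crossfade a
# synthesized clip into the gap. The synthesized audio itself would come from
# a text-to-speech or voice-cloning model, which is assumed here.
import numpy as np

def splice_word(track: np.ndarray, start: int, end: int,
                replacement: np.ndarray, fade: int = 480) -> np.ndarray:
    """Replace track[start:end] with `replacement`, crossfading at both seams."""
    ramp = np.linspace(0.0, 1.0, fade)
    head, tail = track[:start], track[end:]
    fade_in = head[-fade:] * (1 - ramp) + replacement[:fade] * ramp
    fade_out = replacement[-fade:] * (1 - ramp) + tail[:fade] * ramp
    return np.concatenate([head[:-fade], fade_in, replacement[fade:-fade],
                           fade_out, tail[fade:]])

sr = 48_000  # samples per second
track = np.zeros(5 * sr)                 # a 5-second line of dialogue
new_word = 0.1 * np.ones(int(0.4 * sr))  # a 0.4-second synthesized word
fixed = splice_word(track, start=2 * sr, end=int(2.35 * sr), replacement=new_word)
```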

Creating Things That Never Existed and Replacing Those That Do

Deriving and creating new frames from existing frame samples is markedly improving. A missing shot, or a shot that runs too short, would in prior years require perceptibly slowing the footage to fit the required length, or necessitate seeking an alternative. With predictive frame derivation, pixel matching and duplication, it is possible to solve problems such as these. The key breakthroughs today are automation, quality, speed, better economics and less human intervention.
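
As a rough illustration of the principle (production systems use learned interpolation models, not this), an in-between frame can be derived from two existing frames by estimating dense optical flow and warping halfway along the motion. The sketch below uses OpenCV's Farneback flow estimator.

```python
# A rough illustration of deriving an in-between frame from two existing
# frames: estimate dense optical flow with OpenCV's Farneback method, then
# warp halfway along the motion. Production systems use learned interpolation
# models; this only demonstrates the principle.
import cv2
import numpy as np

def midpoint_frame(frame_a: np.ndarray, frame_b: np.ndarray) -> np.ndarray:
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray_a.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Approximate backward warp: sample frame_a half a step against the flow.
    map_x = (grid_x - 0.5 * flow[..., 0]).astype(np.float32)
    map_y = (grid_y - 0.5 * flow[..., 1]).astype(np.float32)
    return cv2.remap(frame_a, map_x, map_y, cv2.INTER_LINEAR)
```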

Object replacement has progressed by leaps and bounds. The state of the art had been mask and matte creation, replacement, 3D tracking, adjustments and so forth. At NAB, one could observe interesting demonstrations that combined text-to-video creation, object replacement and automatic tracking in very seamless fashion, albeit not yet in real time or at full resolution; those barriers will be addressed going forward. Picture a display case holding only a few of a desired object and then, via the components above, overflowing with that object, and you will get a sense of the applications for content creation.

A range of technologies can now be employed to identify, for example, the watch on a character's wrist and replace it with a different watch brand without requiring laborious human-assisted touchpoints. That cold weather drink can be replaced with a warm weather drink without any reshooting. As improvements continue, we will begin to experience product replacement in the same manner in which dynamic ad insertion operates today.
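
The final compositing step of such a replacement is conceptually simple, and is sketched below with NumPy: given a per-frame mask produced by a segmentation and tracking model (assumed here), the replacement product is alpha-blended into the original plate. Detection, tracking and relighting are the genuinely hard parts handled by the models upstream.

```python
# The compositing step of object replacement, sketched with NumPy: a per-frame
# mask (assumed to come from a segmentation/tracking model) alpha-blends the
# replacement product into the original plate. Detection, tracking and
# relighting are the hard parts handled by the models upstream.
import numpy as np

def composite(plate: np.ndarray, replacement: np.ndarray,
              mask: np.ndarray) -> np.ndarray:
    """plate, replacement: HxWx3 float images; mask: HxW alpha in [0, 1]."""
    alpha = mask[..., None]  # broadcast the mask over the color channels
    return plate * (1.0 - alpha) + replacement * alpha

h, w = 4, 4
plate = np.zeros((h, w, 3))          # original frame
new_product = np.ones((h, w, 3))     # the replacement watch, pre-positioned
mask = np.zeros((h, w)); mask[1:3, 1:3] = 1.0  # the tracked wrist region
print(composite(plate, new_product, mask)[1, 1])  # -> [1. 1. 1.]
```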

Defensibility and Balance Sheets

One of the most famous lines from the 1975 film “Jaws” is when Brody says to Quint, “You're gonna need a bigger boat.” As I walked through the various booths at NAB, I thought of that and changed the words to: “You're gonna need a bigger moat.”

For at least six months, I have seen prototype applications that used AI, generative text to video, text to speech and thematic derivation. Recently, I showed a public example of this to a colleague who said, “Give me 48 hours.” And, lo and behold, 48 hours later I was presented with a fairly accurate replica of what I had seen weeks earlier.

The point is clear: As large organizations develop and make available AI engines and technologies, what is the advantage of one company versus another if functionality can be rapidly duplicated by your competitor? Also, if these technologies require licensing fees, can those fees be absorbed by the licensee and built into the cost of an offering?

Further, as AI-driven generative engines increase in capability and availability, what is the defensible moat that an organization needs to stave off competition? An organization with a significant installed base and a robust distribution channel, one that can add generative AI to its core offerings in ways that are additive to existing functionality, may well have a defensible position and a wide enough moat.

Contrast that with startups that may lack significant cash on hand, free cash flow or the ability to license appropriately, and that have no installed base or distribution channel, and it becomes clear that remaining competitive is significantly more difficult.

Established Legacy Organizations

Long-time vendors serving the NAB community are companies with a loyal and typically large installed base of users. Further, over the many years their customers have been served, much good will has been established. These companies are encountering some obvious next steps, among them:

  • Transitioning users from purchase to subscription to SaaS in terms of how products are offered
  • Deciding whether cloud-based implementations make sense from a performance and operational perspective, and architecting those solutions to be cloud-first

The issue of a defensible moat will recur. There must be sufficient capital and time to respond to quickly evolving open-source and licensable technologies that fundamentally change the nature of an established offering and will almost certainly enable new entrants into the vendor landscape.

In closing, the preamble to this blog series posited that the pace of substantial transitions is simply accelerating. In the past, technology changes within, say, the broadcasting industry were measured in five- and 10-year increments (some would say multiple decades!), but significant changes will now be measured in months.

As always, the convergence of technology and artistry forming the content we interact with remains exciting and inspiring.

Originally published 06 June 2024

