DSC Weekly 27 Sept 2022 – Corpus Wars

  • Kurt Cagle 




Escher’s staircases done in the style of Norman Rockwell, courtesy of Stable Diffusion.

Corpus Wars

You’re an artist. You have spent years developing a particular style, honing your skills, and building a reputation. A corporation picks up your images, along with millions of others, by crawling a search engine. Shortly after that, artwork that looks a lot like yours but that you never produced starts showing up on the web, and your income begins to drop as generative AI copycat versions of your work begin to outcompete yours.

You’re a writer, a programmer, a musician, an industrial designer, an architect, a researcher. This is the reality now facing millions of people who have made their living as artists or artisans. Generative models, which include Generative Adversarial Networks (GANs) and the diffusion models behind tools such as Stable Diffusion, work by taking large amounts of signal data – images, music, text – along with labeling data for classification, and using this data to train a large machine learning pipeline whose output most closely matches a text description.

This is the same mechanism that search uses, with one important distinction: Whereas search retrieves pre-indexed content, GANs take the indexes as a map to identify and assign weights to multiple sources, then use algorithmic kernels to blend the resulting images. The nVidia GET3D algorithm goes one step further and generates, from multiple images, a 3D mesh or representation, ostensibly as a mechanism for populating virtual worlds in a metaverse. A related algorithm can then generate multiple “skins” using the same kind of adversarial system to paint and add textures to these meshes.
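The contrast between retrieving and blending can be made concrete with a toy sketch. It follows the simplified picture given above, not how any production system is built: real generative models learn parameters from their corpus and do not literally mix stored images at generation time. The names `search` and `blend`, the two-entry index, and the use of a softmax to turn match scores into weights are all assumptions made for illustration.

```python
import math


def similarity(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def search(query, index):
    """Retrieval: return the one stored item whose key best matches the query."""
    best = max(index, key=lambda item: similarity(query, item["key"]))
    return best["content"]


def blend(query, index):
    """Blending: weight every source by its match score, then mix the contents."""
    scores = [similarity(query, item["key"]) for item in index]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # softmax (assumed weighting)
    total = sum(exps)
    weights = [e / total for e in exps]
    size = len(index[0]["content"])
    return [
        sum(w * item["content"][i] for w, item in zip(weights, index))
        for i in range(size)
    ]


# Hypothetical two-item index: "key" is the label vector, "content" the image.
index = [
    {"key": [1.0, 0.0], "content": [1.0, 0.0]},
    {"key": [0.0, 1.0], "content": [0.0, 1.0]},
]
```

Calling `search([1.0, 0.0], index)` hands back the first stored item untouched, while `blend([1.0, 0.0], index)` returns a new vector that leans toward the first item but carries a trace of the second. That new vector belongs to neither source, which is the situation the rest of this piece is concerned with.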

The issue this brings up is a subtle but profound one: Can one copyright a style, and more to the point, is such copyright enforceable? Already, artists have begun to sue companies that have used their images as part of the corpus set for such GANs. This week, Getty Images, one of the largest stock photo companies in the world, announced a moratorium on purchasing AI-generated artwork for precisely this reason.

However, such artwork (and increasingly videos and literary works) can be difficult to adjudicate because what is being produced is not a copy of an original work but only echoes existing works stylistically. Artists such as Norman Rockwell or M.C. Escher, for instance, had very distinctive styles. Still, one can argue that a never-ending staircase done in the style of Norman Rockwell is not something that either artist would have produced naturally. It is this gray area that will likely be the foot in the door that GAN producers will use, at the expense of other artists.

Ironically, this may be the use case in which NFTs, which I’ve characterized as a technology in search of a problem, actually find one. Suppose that a phone camera, when it takes a picture, places an NFT onto a blockchain and embeds the public key for that NFT in the image itself. The images created through that phone would then carry an identifier, and any GAN system would be required by law to notify and compensate the owner of that NFT before adding the image (or other media work) to the corpus in question. One could arguably extend this to registering likenesses, regardless of the photographer in question.
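As a rough sketch of how such a scheme might hang together, the code below models the camera step and the corpus-building step. Everything in it is hypothetical: the `Ledger` class is an in-memory stand-in for a blockchain, the token identifier is simply a SHA-256 hash of the image bytes instead of a real NFT public key, and the `compensate` callback stands in for whatever notification and payment process the law would require. It also assumes that images carrying no registered token are admitted freely, which the proposal above does not specify.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Image:
    pixels: bytes
    metadata: dict = field(default_factory=dict)


class Ledger:
    """Toy in-memory stand-in for a blockchain: token id -> owner."""

    def __init__(self):
        self._owners = {}

    def mint(self, pixels: bytes, owner: str) -> str:
        # Hypothetical token id: a hash of the image content.
        token_id = hashlib.sha256(pixels).hexdigest()
        self._owners[token_id] = owner
        return token_id

    def owner_of(self, token_id):
        return self._owners.get(token_id)


def capture(pixels: bytes, owner: str, ledger: Ledger) -> Image:
    """Camera step: mint a token and embed its id in the image metadata."""
    token_id = ledger.mint(pixels, owner)
    return Image(pixels, {"token_id": token_id})


def build_corpus(images, ledger, compensate):
    """Admit a registered image only after its owner has been compensated.

    `compensate(owner, token_id)` is a caller-supplied callback returning
    True once the owner has been notified and paid.
    """
    corpus = []
    for image in images:
        token_id = image.metadata.get("token_id")
        owner = ledger.owner_of(token_id) if token_id else None
        # Assumption: images with no registered owner pass through unchecked.
        if owner is None or compensate(owner, token_id):
            corpus.append(image)
    return corpus
```

The design choice that matters is that the check sits at the point where the corpus is assembled: a registered image whose owner has not been paid never reaches training at all.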

This approach would solve several problems at once: it would stem the use of deep fakes and pay creatives for their work. It is also something that social media companies, in particular, will fight tooth and nail, as it dramatically increases the cost of building out a metaverse system.

It also highlights another significant problem with machine learning systems: they are utterly dependent upon media corpora to train their models. I expect that once the cost of corpus creation (including the NFT fees) is taken into account, machine learning will likely be restricted to where it is now, tracking transactional data within organizations, rather than unfettered intellectual property mining. However, such a framework does not exist today. Ultimately, whether one emerges will likely depend upon whether those with large content portfolios are willing to challenge the media miners.

Fortunately, AI-generated work is still in its early days where quality is concerned. Producing a high-quality image still requires experimentation and time, in effect requiring a human filter to judge when a given work is good enough to be acceptable. However, things are unlikely to remain static in this space for long.

In media res,

Kurt Cagle
Community Editor
Data Science Central

DSC Editorial Calendar: October 2022 

Every month, I’ll update this section with the topics I’m especially looking for in the coming month. Articles on these topics are more likely to be featured in our spotlight area. If you are interested in tackling one or more of them, we have the budget for dedicated articles. Please contact Kurt Cagle for details.

  • ESG (Environment-Social-Governance)
  • Digital Privacy
  • The Electric Economy
  • VUCA (Volatility-Uncertainty-Complexity-Ambiguity)
  • Labeled Property Graphs
  • Inferential Machine Learning
  • Geospatial Data
  • Drone Traffic Control
  • Linguistic Intelligence
  • Ethical AI

If you are interested in posting something else, that’s fine too, but these are areas that we believe are hot right now. 
