Machine learning based on large datasets is disrupting fundamental business models across multiple industries and is expected to affect all sectors and society as a whole. There will be winners and losers. Matt Hervey looks at best practices for protecting investments in AI and unlocking the value in data.
Jocelyn Paulley: Good morning everyone and welcome to the fourth in our IT Masterclass webinar series developed specifically for in-house counsel. Good morning, hello everyone and thank you for joining us. I'm Jocelyn Paulley, I'm a Partner in our Commercial IT & Outsourcing team at Gowling WLG, and I'm going to be chairing our Q&A session today with Matt Hervey, a Partner and Head of our AI team, and today's topic is intellectual property strategies for data and artificial intelligence. The background for today's session is that machine learning based on large data sets is disrupting fundamental business models across multiple industries in all sectors and ultimately even society itself is going to be affected. Clearly there are going to be winners and losers in that process, so Matt is going to talk to us today about how to protect your AI investments and how to unlock and realise the value in your data.
Just some housekeeping before we begin: please do put your questions in the Q&A box on your screen as you go along and as they occur to you, and we'll pick them up. Any we don't pick up as we go along we will deal with at the end. I can also confirm that the session today is going to be recorded, it will be available online afterwards and we will also circulate a recording to all of you who are attending.
The format today will be slightly different to our traditional IT Masterclasses if you attended the others in the series. Matt and I are going to have more of a Q&A conversational style session so we're not heavy on slides today so don't be concerned if you think your slide hasn't changed that you have in front of you.
OK, without further ado here we go. Good morning Matt.
Matt Hervey: Morning Joss, how are you?
Jocelyn: Very well thank you. So you are Head of the AI team here at Gowling WLG. AI will strike many of our attendees as a new legal discipline, so how have you come to be interested in the area, what's your expertise and who else is in your team?
Matt: So I've always been a technophile and before I got involved with the AI project I always did tech patent cases, so lots of telecoms work, video compression, that sort of thing. My interest was sparked back in about 2011 by a Wired magazine article on autonomous vehicles, which was research ongoing under the aegis of DARPA in the US since 2001. I hadn't previously heard of it and as a tech lawyer I immediately saw the issues for liability, for standard essential patents and all sorts of unresolved legal issues. Then brilliantly actually ThinkHouse got me thinking about this in terms of AI, and I gave a couple of talks over the last few years on that, and then I realised actually there are so many general issues, cross-practice issues, so affecting competition law in terms of access to data, IP rights about how you protect this, product liability, radical changes to employment if AI automates more tasks, and so I realised that really it actually required someone to pull all this together, and so I've written a book on that with Sweet & Maxwell, or rather I've edited it with about 20-odd authors from academia, the bar and private practice, to really capture the sort of holistic view that is required. As a firm we fully embrace that and we are gathering together a cross-practice team of experts, globally as well I should add, so that when any client comes in with any sort of project involving AI we can immediately give them a holistic view, because you may want to do one thing in terms of the business case but you, Joss, may say there's an issue with data protection, our regulatory specialist may say there's an issue with disclosure or proof of safety, I as an IP lawyer may have concerns about how to protect it, and then Bernardine Adkins, our competition specialist, may say that there is a competition issue. So I think, whereas some firms are looking at this mainly as a corporate transaction issue, we are very much making sure we can give holistic advice here.
Jocelyn: As an IT lawyer, I am used to hearing lots of buzzwords and, say, fads in the technology space. As new technologies come to the fore, everyone gets very excited, and then we go through a year or so of all being very excited, and they don't deliver or they're not adopted by as many or as quickly as we had hoped. Do you think that will also be the case with AI, or is it going to be different somehow?
Matt: I fully believe it's going to be different, and I have spent a decent chunk of time on other fads. 3D printing was expected to have a big impact on IP rights because if you could manufacture any product at home, it would change the whole market, counterfeiting would be a real issue, and I'm glad to say that I called it "hype" at the time. I personally am a bit of a blockchain sceptic as well. I think, in the vast majority of cases, a mere database would be perfectly adequate, and really it's things like crypto-currency or immutable records held by governments where there's a real application, and a lot of it's just hype and will go by the by.
But AI I think is really, really different, and that is because since about 2012, there's been a real step change in what can be achieved with AI, and it really affects almost every industry and I think that's because AI has now unlocked just a couple of really intractable human skills, particularly vision - so the ability to see and understand the world - and also to understand language - and if you think of it on those two bases, once you've unlocked those two, you've unlocked so many human activities. So self-driving cars is a remarkable technology, but really, the difficult bit was vision. Everything else is just physics and mechanics. It's not that interesting in terms of the technical challenge, but once you have vision, you can literally change your cities as a result, and the way we live. So the implications are vast, but also the reality is actually here, so self-driving cars are really imminent in terms of technology, at least at what we call level four, where, in certain circumstances, they could be wholly autonomous. It's really actually the legal issues, the legal impediments which are holding them back, so the likes of Tesla, in the States they say they're ready to deploy, but the estimates, at least for last year, were that there might be six years' work just to get the legislation changed to make it possible. Hence, our role in all of this to make this possible, and I want to make a few other points about why it really matters.
First of all, there's a McKinsey study on the adoption of AI by businesses, and back in 2018 they'd already found that a fifth of all European businesses were actually investing in AI and in some industries, it's closer to 100%. So for example, in the insurance industry, everyone seems to be piling in, and it just gives you a sense of its extent when the ball gets rolling. The importance has been recognised by governments, so AI is now the focus of the industrial strategies of most developed nations including the UK, and various governments and inter-governmental bodies are working on specific regulation, and that never happened for 3D printing and it never happened for blockchain - they were fitted into the normal regulations. We have specific regulatory frameworks being drawn up for automotive, for aviation, for medicine, for example, and at the EU level, and at our national level, there are general regulations being written for AI specifically. We have an Office for AI and we have the Centre for Data Ethics and Innovation in the UK, so it's actually created new Government bodies, so I think that gives you a sense of how important this is.
It's also been recognised by investors, and the UK enjoys the third highest level of investment in this field of any country in the world, and we have an amazing track record, so DeepMind was sold to Google for half a billion, BenevolentAI is valued at well over £1b and they are both AI poster-children for the UK, we have a Centre of Excellence at the Alan Turing Institute which pulls together our five leading universities and we have clusters at Cambridge and in London of serious expertise. We advise many start-ups out of the Cambridge-London corridor, particularly bringing together AI and life science. I think the other issue is its potential to disrupt, so life sciences, again, big pharma are now looking at a field of over 1000 new start-ups, really trying to eat their lunch, and they're beginning to invest in the technology as well, and automotive is a full-on existential race to see who wins the market and wins the future of autonomous driving. So really, I would suggest strongly that any in-house lawyer should at least know the general shape of the area in terms of the technology and in terms of its legal implications.
Jocelyn: It certainly seems very different to previous fads where you have world start-ups but not necessarily that engagement from existing larger companies and particularly what you might call more traditional industries - you mentioned insurance, you didn't see them rushing to adopt some other technologies. It does feel like it's going to be different.
In conversations I've had with clients, and in the work that you and I increasingly seem to do together in this space, the other fundamental element of AI in its value is the data that's either pushed in to help train, or indeed the data that results once the artificial intelligence has crunched all its numbers and worked its way through the process. The software and technology is nothing without the data that feeds into it and results from it, so it seems to me very much that the two go hand-in-hand. So should companies also be thinking about the data as well as an AI strategy or AI investment?
Matt: Yes, and I think that's a really important point because when you think about it, even if a company doesn't want to develop AI itself, even if it's not a tech company, it might well be sitting on data which would be of use to those people who wish to do that. I'll explain why the data matters, because AI is really old. People have been researching it since at least the 1950s, but the step change I mentioned, in around 2012, was really the leap forward in a particular form of AI called machine learning, and in essence, it's a computer programming a computer. It's really that step change which has allowed us to solve those intractable problems like vision, because we don't have to know how the system works - we just need the data so that it can figure it out for itself, and really it's supervised learning where data comes into its own, and in particular, large, even ultra-large labelled data sets. So ideally you'd find something pre-existing, and so you might have, for example, databases in medicine of x-rays which have been labelled with the diagnosis or, even better, what was finally established to be the case for an x-ray, and you can use that body of x-rays with the labelled information, so that a machine can learn to label a new x-ray and therefore to diagnose what's going on with a patient. Or you have to generate the labels - but again, you need the raw data of some sort, so for example, if you have a network of CCTV cameras, there will be companies who will want to purchase your video feed and then get your video feed hand-labelled to say that's a pedestrian, that's a car, that's a cyclist, to generate training data to go into machine learning, and so there's a hunt constantly for suitable data sets, and the really interesting thing is the value is so unclear at the moment, and changing, and I'll just give you a flavour of that.
First of all, your data may be of totally different value to different potential collaborators depending on what their aims are, and to give you an example, CCTV again of a street - an individual data source may contain so many different data points for different people, so CCTV might be useful for geography, for road layout, for driver behaviour, for traffic flows, for weather, and it might be very valuable in terms of weather and not very valuable in terms of road layout, but you never know, you just need to experience this and value it as best you can. Also your data may be radically different in value to a collaborator depending on what data they have. Your data may unlock the rest of their data in a key way, so that it becomes particularly valuable to them, and the other thing about data is it's inexhaustible in the sense that I can license it to multiple parties and the data doesn't disappear - it has different values to different people, so you have to constantly consider that, and the market itself is evolving.
The EU is deliberately introducing measures to create public data pools and to create the EU as a data marketplace. They're also looking at enforcing data sharing, so as to break down the big incumbents, particularly the big US tech companies who sit on so much data that they have such an advantage, so that may radically change the value of data. Then, of course, there are all these concerns via regulation on privacy, data bias - and that will affect the value of your data. Is it actually GDPR compliant? If it's not, does it have any value? Can it be used safely? Is your data legal, and are you liable for that data? So if your data isn't clean, if it's actually biased, what sort of warranties are you going to have to give? What effect will that have on the value?
So, really, I think at the moment, the simple best-practice rule is, given the uncertainty of value, do not give away your data, do not exclusively license your data. Keep as much control of it as you can, because it may be more valuable tomorrow.
Jocelyn: I think that's a new mind-set for a lot of companies, particularly in traditional industries, to think of their data itself as an asset where they have to think about protecting it, controlling it and valuing it, as you said, which is quite a difficult jump if, traditionally, you make cars or you're in insurance, or you're producing drugs - it's a very different world to be thinking about. So what, as lawyers, are the tools we could give people to help them think about protection? You are an IP lawyer, so I am imagining there's a strong bent towards certain IP rights you could use to help protect both the data and the AI developments themselves?
Matt: Yes, the interesting thing is that IP is my main concern and the thing I'd always turn to first, and it is undoubtedly a key consideration in AI. Some purists would say that trade secrets aren't an IP right, but really that's where the game in town is: trade secrets and contract. But let me just talk about traditional IP rights proper, just to give you a sense of the challenge. So I think the key assets when it comes to AI are your development tools, so you can have a platform for developing AI, so TensorFlow from Google is a platform anyone can use, and that will obviously attract copyright as a program, and branding and the like. You've got your learning techniques, you've got data processing methods, because you can't just put data into an AI, you need to process it in various ways: cleaning up the data, reducing the number of pixels, that sort of thing.
You have your trained AI models, so once you've actually done the training process, you may have a freestanding piece of software essentially, or something you've baked into a camera which would have your visual analysis, so a camera that can identify pedestrians and the like. Then you have products of AI, so this is their predictions, or maybe you have a journalist AI that's writing copy for newspapers, or you've got a creative AI that's creating paintings or just identifying targets of pharmaceutical research.
So you've got all of these potential assets and you're trying to fit them into traditional IP rights which simply were never designed with AI in mind and deliberately excluded data in the form of information. So just to give you a feel for that, the European Patent Convention was finalised in 1973 and that still sets the rules for what inventions can be protected, and at that point, computers were using punch-cards - they just weren't thinking about inventions by AI and the like. Also, information is expressly excluded from copyright protection by very old international treaty, so it was never the intention that these sorts of assets, well, they were never properly considered, let's put it that way.
So, if I look at patents first. There's been a massive increase in patents relating to AI over the last decade, I mean at least an 800% rise, but they're applications, not necessarily granted patents, and it's relatively hard to get patents in the AI space, and that's because, as I mentioned earlier, research has been going on since the 1940s and 50s, and a lot of the fundamental ideas are actually very old and can't be patented as they are already known. Secondly, patent protection excludes protection for mathematical methods, methods of doing business, and computer programs as such, and that carves out many potential AI inventions. Now, I can protect, for example, methods of preparing data, and methods of training on data, you know, clever little twiddles on how AI is done, but certainly not the data itself, that's just not within the scope of patents.
Then on copyright, data in the field of AI is quite a broad term, so it can just mean mere information, or it can mean the stuff you put into your AI, so when a computer scientist talks about data, they literally mean photos or video streams or handwritten notes, so they might well be copyright, individually - photographs, medical descriptions, articles, that sort of copyright, but not mere information, so not the extracted data, not weights and measures of your goods, that sort of thing. Copyright is also suitable for computer programs, so if you've written your AI platform, if a human has written it, the form of expression, bizarrely, it's a strange thing to think about computer programs, but that will be protected, so someone can't just copy your computer program. But sadly, they are entitled to copy its functions, so copyright is no way to protect the functions of a computer program. When it comes to works generated by AI, so its predictions or an artwork or an article for a newspaper, it's not really clear where we've ended up on that, because the UK had the intention of protecting such works and has a specific provision - section 9(3) for computer-generated works - but since that, the EU has harmonised the test for originality to be the author's own intellectual creation, and even our own UK IPO now says they don't know if section 9(3) still works, because of the later law in Europe, and they're consulting on that point at the moment.
Then finally, in terms of traditional IP rights, we have database rights, and they expressly exclude the data itself. It's all about the right kind of investment when it comes to sui generis database rights, so that's an EU-specific right and it won't cover databases created by UK entities from the beginning of the year. We're hoping for some sort of reciprocal right, we'll see, but frankly, it was never much used, because there was never any reciprocity for similar rights outside of Europe anyway, and very rapidly, the CJEU really cut down its effect, because it's all about investment - you have to have the right kind of investment for a sui generis database right and that's investment in obtaining or verifying or presenting the contents. The case law appeared to say that if you were generating the data for your day-to-day business, so if you were a pharmaceutical company and you had a load of clinical trial data to prove the efficacy of your drug, you didn't invest for obtaining, verifying or presenting the database, you were investing to get your clinical trial done - and so there's this sort of rule of thumb that it doesn't apply to spin-off data, which is a huge carve-out, and has really emptied the database right of most of its potential value. I think recent commentary has suggested that's an un-nuanced approach and there is no absolute rule against spin-off data, and the other thing with the rise of AI, in particular in machine learning, is that there's so much pre-processing of data that actually there's potentially the right kind of investment there, so if you can show that you invested to verify and present the data correctly for ingestion, as it's called in machine learning, you may have made that investment, and you may get a database right.
Jocelyn: As you said, the existing traditional IP rights do not lend themselves easily to the new structures, the new processes, so we are doing as lawyers often have to do in the technology space and working with the rules that we have to try and see and understand how they might apply or be useful to us in our new technology. So you have talked about a few traditional sorts of IP rights there, and said there is some level of assistance but there are quite a few difficulties and caveats as well, so what is your advice on the best way, or the way that is most likely to succeed, to use the existing powers as a protection?
Matt: So I think there is still scope for traditional IP rights and they should be pursued where they are likely to work, but some of this is so new, in terms of the importance of data, that there is just a lot of uncertainty about how it's going to come out. So I think the absolute best practice, the first ingredient, is trade secrets, because they are clearly broad enough to protect information itself, so mere data. They are broad enough to protect your algorithms even though they are functions that cannot be protected by copyright. So that is definitely the way to go, and this is really illustrated in the States by the litigation between Waymo and Uber. So they had a dispute as to autonomous vehicle technology and it was not a patent dispute, it came down to trade secrets, and I think that is very telling for where the real rights are probably going to be.
So in the UK we have always had, in recent history anyway, a common law right to confidential information. Then we have got a harmonised EU Trade Secrets regime which has been moved into our national laws. Really, in order to protect your trade secrets I would recommend a mixture of practical and legal steps. So practical measures to keep information secret. It is things like restricting physical access to your secrets. Restricting electronic access, so make sure employees have levels of permissions and passwords and there are access logs to core secrets. Then you can use electronic measures to monitor suspect activity, so if data has been transferred by email or to cloud storage or to memory sticks you can get an alert. Then it is just things like staff training and labelling stuff as secret so people know if what they are dealing with is a secret. Then the second fork of that is legal measures. So make sure with collaborators and guests NDAs are in place; make sure that there are proper terms of employment that deal with your secrets; staff handbooks and policies, training, that sort of thing. Ideally have a plan for if there is some sort of breach because the problem with a trade secret is, if it gets out, it is too late, but you may be able to use legal and practical measures fast enough to prevent dissemination even if one person is trying to take your trade secrets.
But the point to really emphasise here is, under the harmonised regime, the very definition of a trade secret requires you to have taken measures. So it literally says that a trade secret has to have been subject to reasonable steps under the circumstances by the person normally in control of the information to keep it secret. So my other practical piece of advice is, any measures you choose to protect your trade secret should be measures you can easily use as evidence in court. So things which generate their own logs, things which are documented so you won't have a problem there.
I would just warn of one other thing. There is a particular risk when it comes to AI and that is because the demand for people with skills in AI so far outstrips the supply that there is a shortage, and people are in high demand and they make good money and they are a very mobile workforce. So trade secrets are challenging. They are techy, they are millennial so they don't necessarily see jobs for life, they are part of the start-up culture, and so I think you have to be extra specially careful with trade secrets when it comes to data science basically and AI technology, even more than in traditional areas of research.
And then the second key issue I think here, and this is why Joss and I have ended up working together more than ever, is contracts. So the issue I have with IP is, one it may not apply, and two you don't know who is going to own it for sure. There are some default rules in copyright and patents but you can clear all of that up with a good contract. So take it away, tell me what I should do with contracts.
Jocelyn: Yes of course, as you say contracts are such an important tool in this area because I think, and Matt has made this point but just in case anyone did miss it, there does seem to be a myth that you can own data. Often if I have conversations with clients they say, well it's my data, of course a supplier can't just go off and re-use it. But for the reasons you have explained, the fundamental proprietary legal concept of ownership does not in fact apply to data; as you say, the world as we know it just wouldn't go round if it did. So what we have to do in contracts is exert rights contractually to control access to, use of, and the many uses of that data. So contracts are always important in English law because we don't have a constitution, you set down what you want to be agreed in that contract, and in this area they are more important than ever around use of data. You can use all the language and tools that we do for traditional software licensing and indeed for IP rights, because if you look at a SaaS contract it often talks about granting a licence despite the fact that technically there is no need to move that IP, as you are just accessing a service which is using software in which there are intellectual property rights. So it is still totally valid to use all those same concepts that you would do for licensing IPR or software around data. You can treat that asset in a contract as if there is IPR in it even though technically that does not exist outside of the contract. You can use all that terminology, so you can talk about a right to use data being revocable or irrevocable, or just for the term of the contract, which is obviously an important point to think about. Is it further sub-licensable by the party to which you are granting this licence? We are involved in schemes where we have data being licensed to re-sellers who go off and license it further, and clearly there are big players in the data market.
You look to the likes of Experian and companies like that whose whole market is bringing data sets together. You can talk about data being licensed within a territory or outside of it, and the key is always the permitted purposes: what are you setting out that this other party can or cannot do with that data? That is usually couched in terms of business-type purposes or use in particular products, but you can frame it how you need to for the context of your particular contract.
As well as all the traditional aspects we are used to thinking about, there are some new and different things to think about in an AI context. So if you are putting data into a system, as Matt said, that is going to be used by the training system, or it is going to be ingested and then run through a set of criteria and learnings, so it is going to be combined quite usually with other data. Are you happy for that to happen to your data? If you are not happy, then it might be that is not the right tool to use, but as with SaaS contracts and anything in IT it is about understanding the processes that are going to happen to understand if it is a risk that is appropriate or not. The data can be combined; the question is, is it combined in a way that it can then be separated at the end of the day, or is it then inherently within something else, some kind of derived data or output, or embedded in something else? And who then has what rights around that something else that is inextricably linked with someone else's data sets, or even multiple third parties' data sets if you are working with an aggregator and they are combining sources from multiple partners? And this multi-faceted point is another different element to play around with in contracts because typically what Matt and I are seeing is that these are not just two-way contracts; we don't yet have an established ecosystem where you have one supplying to another and building on a technology stack. These are multiple players from start-ups to big companies to traditional players coming together to collaborate.
Matt: In different roles as well, so you have the data scientist, the data supplier, the customer, the data platform and all these people have to work together.
Jocelyn: And all bring value. So it's not clear, if you are just sort of following traditional customer-supplier thinking, who is going to have what rights in these combined outputs at the end, so it is critical you think it through and are as clear as you can be in your contracts, with your defined terms, though it can be quite difficult. I have seen contracts talking about combined data, manipulated data, underived data and trying to separate out those threads to be clear as to who is doing what.
Particularly on exit, make sure it is clear: if you want to get back what you put in, state that as the case. Think about people needing to keep copies for backup, for insurance, or for other regulatory reasons. I know, Matt, we are going to talk later about how critical regulatory aspects can be when you are thinking about AI. Also you raised the point earlier about warranties. If I am providing data into a system, is there an expectation that it comes with any kind of warranty that this data is accurate, up to date, that it has retained its integrity? What about the effect of duplicate or contradicting values? Or if you are going to be a user of that data, what is the impact on your use case if in fact the output you have got is from a machine learning exercise where it has learnt either incorrect things or it's extrapolated correlations that we don't agree with in the real world? We have seen plenty of cases recently of discriminatory results in criminal law enforcement or in recruitment exercises where the data that the AI has used to understand its world has inherent discriminatory aspects because of the way the world has developed.
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.