Big Data: Legal Challenges (Full Report)

Copyright typically extends to cover original compilations of data. However, the courts have been careful to make clear that such protection does not cover the data itself but rather the particular collection of it. This distinction is in line with the classic division in copyright law between non-protectable ideas and protectable expression of those ideas.

The way in which the balance between expression and ideas has been struck, and the test for originality in the context of databases, vary from country to country.

It is instructive to consider precedents set in several territories:

Australia

The position in relation to copyright in compilations and databases in Australia was defined by the decision of the High Court in IceTV Pty Ltd v Nine Network Australia Pty Ltd10 ("IceTV") and the subsequent decision of the Full Federal Court in Telstra Corporation Limited v Phone Directories Company Pty Ltd 11 ("Phone Directories").

Prior to these cases, Australian law appeared to follow the traditional common law approach often referred to as the 'sweat of the brow' doctrine, under which the requirement of originality could be satisfied as a result of the application of skill, labour or judgment. The earlier Australian and English cases were considered carefully in Desktop Marketing Systems Pty Ltd v Telstra Corporation Ltd12, another case concerning telephone directories, and the approach is exemplified in the following statement of Sackville J13:

"The course of authority in the United Kingdom and Australia recognises that originality in a factual compilation may lie in the labour and expense involved in collecting the information recorded in the work, as distinct from the 'creative' exercise of skill or judgment, or the application of intellectual effort."

Thus it was recognised that labour, as opposed to some form of creative or intellectual input, was sufficient to satisfy the originality test.

In IceTV the High Court was called upon to consider copyright in Channel Nine's television programming guides, the Weekly Schedules. Copyright in the broadcast schedules as a whole was conceded, but the questions of originality and authorship arose in deciding whether the taking of time and title data amounted to the taking of a substantial part of the copyright work.

In the Federal Court, the primary question considered in this context was whether IceTV had appropriated the skill and labour expended on the program listings. At first instance, Bennett J concluded it had not. The 'slivers' of information taken were not of a sufficiently substantial quality to be considered a substantial part. In particular, she held that the only relevant skill and labour was that involved in putting together the Weekly Schedules (the expression of the information), which was routine, and not that involved in the actual programming decisions (the creation of the information).

The Full Federal Court, however, took a wider view and was prepared to take into account the skill and labour in the programming decisions. In doing so, it found that the time and title data was the "centerpiece" of the Weekly Schedules and so amounted to a substantial part.

The High Court delivered two complex judgments. French CJ, Crennan and Kiefel JJ focused on the idea/expression dichotomy, finding that there was insufficient originality in the arrangement of the time and title information for that to amount to a substantial part. Similarly, Gummow, Hayne and Heydon JJ found that the originality of the Weekly Schedules lay in the selection and presentation of the time and title information together with additional program information and synopses as a composite whole. The preparatory work involved in producing the time and title information was not relevant to substantiality and there was left only "the extremely modest skill and labour" in setting down the programs already selected. They also cautioned against reliance on the Desktop Marketing emphasis on appropriation of skill and labour, suggesting that the reasoning in Desktop Marketing may have been "out of line with the understanding of copyright law over many years" 14.

The other significant aspect of the IceTV judgments was the emphasis on authorship, noting the centrality of the concept to the statutory protection provided and warning of 'new challenges in the paradigm of the individual author' 15.

The following year it fell to the Full Federal Court to apply these principles in Telstra Corporation Ltd v Phone Directories Co Pty Ltd 16 ("the Phone Directories case"), in the context of the subsistence of copyright in the White and Yellow Pages directories.

The three judgments of the Full Court were unanimous in rejecting the submission from Telstra that 'industrious collection' alone would be sufficient for originality, and emphasised the importance of activities directed at the fixation of a work, Keane CJ noting 17:

"The dicta in IceTV shift the focus of inquiry away from a concern with the protection of the interests of a party who has contributed labour and expense to the production of a work, to the 'particular form of expression' which is said to constitute an original literary work, and to the requirement of the Act 'that the work originates with an author or joint authors from some independent intellectual effort'."

The judgments shy away from laying down a specific test in the wake of the demise of the 'industrious collection' test. Gordon J noted at first instance that after IceTV, the new formulation of the originality threshold is either 'independent intellectual effort' (per the French judgment)18 or 'sufficient effort of a literary nature' (per the Gummow judgment)19. In any event, the Telstra telephone directories did not meet the test.

The second important aspect of the Phone Directories case was the approach to authorship of the telephone directories. At first instance, Gordon J appeared to take a very strict approach to authorship based on the centrality of that concept in the IceTV judgments, to the extent that it appeared that the trial judge considered that it was necessary to name each individual author. In the case of telephone directories, the difficulties in this approach are apparent and the 91 affidavits filed by Telstra were not considered sufficient because they did not identify every author.

In the Full Court, Keane CJ and Perram J concluded that it is not necessary to identify every author by name. As noted by Perram J 20:

"All the Act requires in the case of s 32(2) is that there be an original work first published in Australia. The necessity for there to be an original work carries with it the necessity for there to be an author or authors but all that needs to be demonstrated is that such persons exist. Their identification is not legally required by the concept of an original work. The statement by Gummow, Hayne and Heydon JJ in IceTV that "[t]o proceed without identifying the work in suit and without informing the inquiry by identifying the author and the relevant time of making or first publication, may cause the formulation of the issues presented to the court to go awry" (at [105]) is, I think, a counsel of wisdom rather than a legal stipulation."

In the present case, identification of the authors again raised the problem of the division between collection of the data and arrangement of the database itself 21:

"The information in the directories was collected through processes which I would accept involved human industry and the results of which were stored in a substantial and sophisticated database. However, the creation of the material form of the directories was carried out by a computer program overseen by persons who had no substantive input into those forms. The questions which arise are, therefore, two. First, granted that there must be independent intellectual effort or sufficient effort of a literary nature, is that effort required to be directed at the creation of the material form of the work (here the form of the directories) or does it suffice that the effort was directed at some anterior activity (here the collection of information presented in the directories)? Secondly, if the intellectual effort must be directed at the creation of the material form of the directories, was there sufficient human effort involved in that process in this case to mean that the directories were reduced to a material form by an author or authors?"

In answer to those questions, Perram J held that although the directories were not copied from elsewhere, neither were they created by a human author. The human involvement in the collection of the data was not relevant because it predated the reduction of the collected information into material form. Although humans were ultimately in control of the software which reduced the information to a material form, their control was over a process of automation and they did not shape or direct the material form themselves (that process being performed by the software). The directories did not, therefore, have an author and copyright did not subsist in them.

In the wake of IceTV and Phone Directories, the prospects of copyright protection for many databases look slim. The requirement that a human author be involved in the reduction of the database to material form, and that there be some intellectual effort in the creation of that material form, will rule out protection for many relational databases, let alone Big Data databases, in which the form of the database is effectively irrelevant.

New Zealand

The leading New Zealand statement of the relevant principles is that of the Court of Appeal in University of Waikato v Benchmarking Services Ltd22. In that case, the Court of Appeal considered a claim of copyright infringement in respect of a survey which compiled financial data relating to New Zealand businesses. Information was compiled by way of a questionnaire of accounting firms. The data was analysed by the plaintiff and various ratios were calculated. The defendant was alleged to have created an exact copy of the layout and ratios used in the plaintiff's study. Unusually, the case concerned an application for summary judgment, which was ultimately granted.

The respondents did not dispute that copyright existed in the appellant's works. However, the Court of Appeal set out the principles relating to copyright in compilations in order to ascertain precisely what was protected by such copyright.

The Court noted that the definition of "literary work" in section 2 of the Copyright Act 1994 includes a table or compilation. The expression "compilation" is defined non-exclusively to include a compilation consisting wholly of works or parts of works, a compilation consisting partly of works or parts of works, and a compilation of data other than works or parts of works. The Court stated that it was well established that copyright could subsist in publications such as dictionaries, directories, maps or lists. However, in such cases there is no claim to any right in the information contained in the compilation where the compiler of factual information is not the author or originator of the individual facts recorded in the compilation.

The Court of Appeal found that the threshold for a work to qualify as original for the purposes of copyright protection is not high. The determining factor is whether sufficient time, skill, labour or judgment has been expended in producing the work. In the case of databases, this may arise through the manner in which the information is selected for inclusion in the publication, the format or presentation of the data or the selection and calculation of relevant ratios, percentiles, averages and other details. The Court, in particular, cited the decision of the House of Lords in Ladbroke (Football) Ltd v William Hill (Football) Ltd23, indicating that New Zealand followed the English 'sweat of the brow' approach.

In the case before it, the Court of Appeal found that while the raw data did not attract copyright, there were a number of unusual or unique features which clearly resulted from the expenditure of significant creative effort and skill on the appellant's part, including the headings adopted, the order in which they appeared, the selection and calculation of ratios, the presentation and calculation of figures and percentages and the overall format and presentation of the report.

Similar principles were applied by Allan J in YPG IP Limited v Yellow Pty Ltd 24, in which an interim injunction was sought to restrain publication of an alleged copy of the Yellow Pages business directory. Applying the University of Waikato principles, Allan J found that the plaintiffs had at least raised a serious question as to the entitlement of the Yellow Pages directories to copyright, based on the many hundreds of hours of employee time spent collecting, verifying, recording, assembling and maintaining the relevant data.

In a further interlocutory application in that case for further particulars, the defendants relied on the intervening Australian Telstra decision. The judge noted that the applicability of the principles established in the Australian decision would be a matter for the full trial. It appears that the New Zealand Yellow Pages case settled before trial and so the question of whether the Australian principles will apply in New Zealand has not yet been tested.

New Zealand has therefore to date adopted an approach more favourable to databases than that of IceTV and Phone Directories in Australia, given that labour alone will be sufficient to meet the originality threshold. It is still questionable, however, whether a Big Data database would meet the 'sweat of the brow' test, given that very little labour, skill or judgment might be said to be expended on the collation of the database.

United Kingdom

Prior to 1998, when amendments to the Copyright, Designs and Patents Act 1988 (the "CDPA") relating to databases came into effect25, English copyright law looked very similar to that of Australia and New Zealand, and indeed the Australasian law was based on English precedent.

However, the amendments to the CDPA implemented significant changes in the protection of databases, in particular giving effect to the European Database Directive, and making consequential changes to copyright law. The database right is dealt with further below.

With respect to copyright, the amended CDPA introduced a new definition of a database as "a searchable collection of systematically or methodically arranged works, data or other materials"26. Databases were included in the category of literary works; however, the standard of originality for database copyright was raised to require that the database be the author's own intellectual creation by reason of the selection or arrangement of the database contents27. Databases were also excluded from protection as a table or compilation, so that the only way to obtain copyright protection for a database was to satisfy the 'author's own intellectual creation' test.

In 2010, the English High Court was called upon to consider this test in the context of football fixture lists in Football Dataco Ltd v Brittens Pools Ltd28. It was clear on the evidence that significant intellectual effort was involved in the determination of the fixtures, i.e. the creation of the data. Floyd J held that 29:

"This work is not mere "sweat of the brow", by which I mean the application of rigid criteria to the processing of data. It is quite unlike the compiling of a telephone directory, in that at each stage there is scope for the application of judgment and skill. Unlike a "sweat of the brow" compilation, there are some solutions which will simply not work, and others which will be better. Mr Thompson explained that it might be the case that the computer would say that there was no solution for a given set of constraints. The quality of the solution depends in part on the skill of those involved."

However the question which arose was whether this intellectual effort was directed towards "selection or arrangement" of the database contents as required under the CDPA. Floyd J held that the selection or arrangement required was not confined to selection or arrangement performed after the data was created. The process of selection and arrangement of the contents of a database could, and often will, commence before all the data is created. In this case the relevant data included at least the dates on which matches in general would be played, the matches which were to be played and the dates of specific matches. The authors exercised choice over the dates on which the fixtures were played and the identity of the teams to play in each match on those dates. If this was not selection, then it was an arrangement proceeding from the starting materials of the clubs in the league and the dates of the rounds of matches, to produce an arrangement which brings them together.

In arriving at this conclusion, Floyd J rejected the applicability of the judgment of the European Court of Justice in Case C-203/02 British Horseracing Board Ltd v William Hill Organisation30 ("the BHB case"), which related to the sui generis database right in the 'runners and riders' database prepared by the British Horseracing Board. The database right requires investment in the selection, arrangement or verification of the contents of the database. In the BHB case, the process of entering a horse in the relevant database for a race required a number of prior checks as to the identity of the person making the entry, the characteristics of the horse and the classification of the horse, its owner and the jockey. However the ECJ held that such prior checks were made at the stage of creating the database for the race in question and therefore constituted investment in the creation of data and not in the verification of the contents of the database.

In urging caution in applying reasoning from sui generis database right cases to database copyright cases, Floyd J emphasised the difference in the basis of the two rights: the purpose of copyright is to provide encouragement for creative endeavour, whereas the sui generis right is designed to encourage investment in particular types of data gathering.

As to what was meant by the author's own independent intellectual effort, Floyd J held that although the court would not apply a qualitative or subjective assessment, the author must have exercised judgment, taste or discretion (good, bad or indifferent) in selecting or arranging the contents of the database. He also stated that "author's intellectual creation" did not require the reader of a database to be able to identify the author. Finally he held that the work done in the selection/arrangement of the database was quantitatively sufficient to amount to intellectual creation.

The Dataco case was appealed to the Court of Appeal, which referred to the ECJ the questions of whether intellectual effort and skill of creating data should be excluded from consideration for meeting the 'author's intellectual creation' test, whether this included adding important significance to a pre-existing item of data and whether the test required more than significant labour and skill of the author.

Noting that the purpose of the Database Directive was to stimulate the creation of data storage and processing systems in order to contribute to the development of an information market, the ECJ immediately directed attention towards the fact that the copyright protection provided for by that directive concerns the 'structure' of the database, and not its 'contents' nor, therefore, the elements constituting its contents. Similarly, it was apparent from Article 10(2) of the Agreement on Trade-Related Aspects of Intellectual Property Rights and from Article 5 of the WIPO Copyright Treaty that, while compilations of data which, by reason of the selection or arrangement of their contents, constitute intellectual creations are protected as such by copyright, that protection does not extend to the data itself.

In that context, the concepts of 'selection' and of 'arrangement' within the meaning of the Database Directive refer to the selection and the arrangement of data, through which the author of the database gives the database its structure. They do not extend to the creation of the data contained in the database.

Furthermore, the criterion of originality in a database's structure is satisfied when, through the selection or arrangement of the data which it contains, its author expresses his creative ability in an original manner by making free and creative choices. The criterion is not satisfied when the setting up of the database is dictated by technical considerations, rules or constraints which leave no room for creative freedom.

Therefore the fact that the setting up of the database required, irrespective of the creation of the data which it contains, significant labour and skill of its author, cannot as such justify the protection of it by copyright under Directive 96/9, if that labour and skill do not express any originality in the selection or arrangement of the data. Thus the ECJ firmly rejected the 'sweat of the brow' approach to originality in respect of databases.

The case will now return to the English Court of Appeal to apply these legal principles to the facts in the case. It seems likely that the outcome will depend on whether the "data" contained in the database is seen as the identity of the football teams and the dates for fixtures, in which case the organisation of which teams will play on what date may be construed as an arrangement of the data, or whether the data is seen as the identity of two teams chosen to play each other on a particular date, in which case it appears that all intellectual effort has gone into the creation of the data itself.

Again, under the United Kingdom test, there is little room for copyright protection for Big Data databases, given the usual lack of originality, or even effort, in the selection and arrangement of the data.

United States

Databases are protected by the United States Copyright Act 31 as compilations, defined as a "work formed by the collection and assembling of pre-existing materials or of data that are selected, coordinated, or arranged in such a way that the resulting work as a whole constitutes an original work of authorship.32"

The touchstone for copyright protection for databases is originality, as incorporated in the above definition of "compilation" and in the general originality requirement for works entitled to copyright 33.

The extent of such protection in relation to databases was considered by the Supreme Court in Feist Publications, Inc. v. Rural Telephone Service Company, Inc. 34, another case concerning telephone directories. The Supreme Court noted the key distinction between facts, in which there is no copyright, and a compilation of facts, which can, by its originality, become entitled to copyright. It also noted that the corresponding protection for a compilation is thin, since the underlying facts can be taken as long as the original compilation is not.

This distinction underlies the definition of compilation outlined above. Choices by the author of a database as to selection and arrangement, so long as they are made independently by the compiler and entail a minimal degree of creativity, are sufficiently original to obtain copyright protection. Facts are never original, so the compilation author can claim originality, if at all, only in the way the facts are presented.

The Court noted that the originality requirement is not particularly stringent and that the vast majority of compilations would pass this test. However the telephone directories at issue did not. The Court held that the selection of listings could not be more obvious: the information included was the name, town, and telephone number of each person applying for a telephone service. This was a "selection" of sorts, but it lacked the "modicum of creativity necessary to transform mere selection into copyrightable expression".

In coming to its conclusion, the Supreme Court firmly rejected the "sweat of the brow" or "industrious collection" test for originality, expressly stating that such analysis was faulty.

Is copyright protection available for Big Data databases?

As can be seen from the above, most of the jurisdictions considered have now adopted a test under which some level of creativity is required in the selection or arrangement of the contents of a database in order for it to qualify for copyright protection. Under such a test, Big Data, at least in its unstructured or semi-structured form, is very unlikely to be entitled to copyright protection, since it is the very nature of Big Data that it is usually unorganised and collected in a mechanical, rather than creative, fashion.


10 (2009) 239 CLR 458
11 [2010] FCA 44
12 (2002) 119 FCR 491
13 Ibid [407]
14 Supra note 10 at [188]
15 Supra note 10 at [23] per French CJ, Crennan and Kiefel JJ
16 (2010) 194 FCR 142
17 Ibid 169 [82]
18 Supra note 10 at 474 [33], 479 [48]
19 Supra note 10 at 494 [99]
20 (2010) 194 FCR 142, 181 [127]
21 Ibid, per Perram J [101]
22 [2004] NZCA 90; 8 NZBLC 101,561
23 [1964] 1 All ER 465, 469
24 [2007] NZHC 1947; (2008) 8 NZBLC 102,063
25 As a result of the Copyright and Rights in Databases Regulations 1997, implementing Council Directive No. 96/9/EC on the legal protection of databases ("the Database Directive")
26 Section 3A, CDPA
27 Ibid, implementing Article 3(1) of the Database Directive
28 [2010] EWHC 841 (Ch)
29 Ibid at [43]
30 [2004] ECR I-10415; [2005] E.C.D.R. 1; [2005] R.P.C. 13
31 17 U.S.C.
32 Ibid § 101
33 As required under 17 U.S.C. § 102(a)
34 499 U.S. 340 (1991)
