At a time when patent infringement suits such as Eolas v. Microsoft can result in half-billion-dollar awards, fast access to accurate patent data is in peak demand. Finding a "silver bullet" prior art patent can invalidate an asserted patent; and if such a patent describes the technology another company is developing, it can stop R&D in its tracks, saving the company tens of millions in wasted R&D investment.

Why is finding the "silver bullet" patent amongst more than 20 million issued patents so difficult? The answer is in two parts: a) inadequate search technology, and more importantly, b) poor patent data quality.

What is 6-Sigma? More importantly, what is "quality patent data"?

GE, Motorola and Allied Signal pioneered the implementation of a statistical quality process known as 6-Sigma to improve quality, lower rejection rates, and control the variables that have a measurable impact on manufacturing quality.

Since that time, thousands of companies have shown that 6-Sigma, when applied throughout an organization, can increase overall corporate net profits by 10% for each Sigma level of improvement! But until now, patent data didn't even appear on the "6-Sigma radar screen".

6-Sigma's underlying 5-part problem-solving methodology is "Define, Measure, Analyze, Improve, Control" (DMAIC), all aimed at delivering quantifiable performance improvement.

6-Sigma is the highest quality level of the "Sigma" scale, which sets forth the acceptable number of Defects Per Million Opportunities (DPMO) at each level.

Sigma Level / DPMO
6 Sigma / 3.4
5 Sigma / 233
4 Sigma / 6,210
3 Sigma / 66,807
2 Sigma / 308,537

But what does all this have to do with patent quality - or the implied "poor" quality of patent data?

While it's rather easy to count the number of defective plastic parts being produced by a molding machine, it's a little more difficult to count the errors in patent data.

There are currently about 2.5 million active US patents - all of them available in digital form, so they are searchable on various commercial and government databases. Other authorities, such as the European Patent Office and WIPO, similarly make most of their active patents searchable in digital form.

At 6-Sigma quality, there would be about 8.5 total data errors in a database containing all 2.5 million active US patents. At 4-Sigma quality, however, roughly 15,500 patents would contain errors. Other patent-issuing authorities fare about the same.

The primary cause of these digital errors can be traced back to the quality of the OCR scanning technology employed to convert paper patent documents to digital files. It is therefore more a matter of the technology employed than of mismanagement of patent data. In fact, in most countries there are no legislative requirements to keep digital patent data accurate - only the paper files. The digital files are made publicly available more as a convenience to society. Consequently, there is no pressing need to ensure quality patent data … and unfortunately, it shows.

As it turns out from PatentCafe's analysis, developed during the creation of its global patent database, the patent data currently available from most patent offices in developed countries generally falls between 4-Sigma and 5-Sigma quality. Here is a sampling of the errors encountered:

  • transposed patent issue dates (i.e.: 9197 rather than 1997)
  • missing patent claims
  • missing patent abstracts
  • truncated data (i.e.: claims 1-5 may appear on a patent that issued with 37 claims)
  • and many more.
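Defects like these lend themselves to automated screening. The sketch below is purely illustrative - the field names and rules are hypothetical, not PatentCafe's actual system - but it shows the kind of record-level checks that would flag the errors listed above for manual correction:

```python
# Illustrative record-level quality checks for digitized patent data.
# Field names and thresholds are hypothetical; a production pipeline
# would follow the field definitions in the applicable WIPO standard.

def find_defects(patent: dict) -> list[str]:
    defects = []

    # Transposed issue dates (e.g. "9197" instead of "1997"):
    # reject years outside a plausible range.
    year = patent.get("issue_year")
    if year is None or not (1790 <= year <= 2100):
        defects.append("implausible or missing issue year")

    # Missing abstract or claims.
    if not patent.get("abstract"):
        defects.append("missing abstract")
    claims = patent.get("claims", [])
    if not claims:
        defects.append("missing claims")

    # Truncated data: fewer claims present than the declared count.
    declared = patent.get("claim_count", len(claims))
    if len(claims) < declared:
        defects.append(f"truncated claims ({len(claims)} of {declared})")

    return defects

# A record with a transposed date and truncated claims is flagged:
bad = {"issue_year": 9197, "abstract": "A widget.",
       "claims": ["1. ..."], "claim_count": 37}
print(find_defects(bad))  # flags the bad year and the truncated claims
```

Records that fail any check would be routed to manual correction against the paper copy, rather than loaded into the database as-is.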

Conducting a keyword search of the CLAIMS of a patent, using a traditional Boolean search engine to access the patent database, will miss the patents without claims 100% of the time - and the search could thereby overlook thousands of highly relevant patents.

Finding errors in patent data presents a paradox: if a search misses a patent because of an error in its data, you never learn that the patent - or the error - existed.

So, unlike the defective plastic parts that you can see coming off a production line, one simply cannot "see" the patent records containing errors.

In an industry where a single patent can mean the difference between millions of dollars won or lost in litigation, tens of millions spent on R&D for a technology that has already been invented, or the loss of patent rights resulting from a successful invalidity challenge, missing ANY patent during a search for patentability, due diligence, or freedom to operate is inherently dangerous.

So how does one develop confidence in a patent database? One way is to develop confidence in the process by which the data provider tests, corrects and maintains the patent data that goes into the database.

The "quality" components for patent data have already been defined by the World Intellectual Property Organization (WIPO), which instituted Standard ST.32 - a complex document that defines every data field, data format and patent structure. It also sets the standard of compliance with various digital text standards established by the International Organization for Standardization - the organization that maintains the well-known ISO 9000 quality standards.

Following the WIPO standard as the foundational structure of ICO Global Patent Search, PatentCafe implemented a system to aggressively screen, test and reject patents that do not meet its quality criteria. The rejected patents are then manually corrected, using the accurate paper patent copy as the data source.

This process ensures that 100% of the patents entered into the ICO-GPS database are error-free relative to the baseline accuracy parameters established by the company.

Finally, there is a way to make sure the needle is IN the haystack, and a process to confidently find it.

For a more technical outline of 6-Sigma processes relating to patent data, download the white paper 6-Sigma as applied to Patent Data Quality and IP Management at

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.