Setting A Proper Confidence Factor Threshold In Speech Recognition: Non-Technical

This decision relates to a European patent application that concerns improving the speed and accuracy of speech recognition when one or more expected responses are likely. Speech recognition per se is typically recognized as being technical. However, the Board decided that the contribution of the distinguishing feature, namely setting a proper acceptance threshold for the confidence factor of the generated hypothesis based on the expected response, rested on the selection of certain mathematical operations and was thus not technical. Here are the practical takeaways from the decision T 1898/17 (Speech recognition/Vocollect) of October 5, 2021 of the Technical Board of Appeal 3.4.01:

Key takeaways

Finding an implementation of how to set the proper acceptance threshold for comparison with the confidence factor of the generated hypothesis, considering the expected response, involves only non-technical considerations in the form of the selection of certain mathematical operations.
The step of comparing the hypothesis to an expected response, and the subsequent step of adjusting the acceptance threshold in case of a favorable outcome, are non-technical. Comparing two parameters and, depending on the result, adjusting a third parameter is a purely mathematical operation.
It is true that speech recognition per se is typically recognized as being technical. However, it is not the technical character of the claim as a whole that is put in question, but the technical contribution of the distinguishing features to the prior art.

The invention

The invention concerns improving the speed and accuracy of speech recognition when one or more expected responses are likely.

When the acoustic features of speech are analyzed by matching them against an acoustic speech model, the word (or words) most likely to have been spoken is identified as a hypothesis. The likelihood that the hypothesis is correct is represented by a confidence factor assigned to each hypothesis.

If the confidence factor of the hypothesis is lower than an acceptance threshold, the hypothesis is not accepted and the user may be asked to repeat or spell the response, resulting in a loss of time.

The invention is aimed at situations in which a certain speech content is expected, such as a voice command for picking items from a warehouse where a worker may have to confirm an expected location and number of the picked items. By using the knowledge of an expected response, the invention facilitates voice recognition.

Fig. 1 of WO 2006/084228 A1

Claim 1 (main request)

A method for recognizing speech, the method comprising the steps of:

analyzing speech input to generate a hypothesis and a confidence factor associated with the hypothesis;

comparing said confidence factor to an acceptance threshold for accepting the hypothesis; and

comparing the hypothesis to an expected response, and:

if the comparison is not favorable, then not adjusting the acceptance threshold prior to comparing the confidence factor thereto,

if the comparison is favorable, adjusting the acceptance threshold, prior to comparing the confidence factor thereto, in order to make acceptance of the hypothesis more likely.
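The steps of claim 1 can be illustrated with a minimal sketch. This is not the applicant's implementation: the names `Hypothesis` and `accept`, the threshold values 0.8 and 0.6, the assumption that the confidence factor lies between 0 and 1, and the use of an exact string match as the "favorable" comparison are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str          # recognized word(s); hypothetical representation
    confidence: float  # confidence factor, assumed here to lie in [0, 1]


def accept(hypothesis, expected_response,
           base_threshold=0.8, relaxed_threshold=0.6):
    """Return True if the hypothesis is accepted.

    The threshold values are illustrative assumptions, not taken from
    the application or the decision.
    """
    threshold = base_threshold
    # Compare the hypothesis to the expected response. An exact string
    # match stands in for a "favorable" comparison in this sketch.
    if hypothesis.text == expected_response:
        # Favorable: lower the threshold before the confidence check,
        # which makes acceptance of the hypothesis more likely.
        threshold = relaxed_threshold
    # Not favorable: the threshold is left unadjusted.
    return hypothesis.confidence >= threshold
```

With these assumed values, a hypothesis "three" with confidence 0.7 is accepted when "three" is the expected response, whereas a hypothesis "four" with the same confidence is rejected.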

Is it patentable?

The first-instance Examining Division found that the independent claims lacked novelty (Article 54 EPC) in view of document D3 (EP 1 377 000 A1), which relates to speech recognition in automated directory services providing users with information (phone number, name or address) based on automated dialogue. In D3, a confidence factor is assigned to each hypothesis and compared to an acceptance threshold.

The applicant disagreed with the Examining Division's interpretation of the embodiment of document D3 in which a town name is received and the confidence level of the hypothesis is based on knowledge of the user's location.

The Board decided that neither the applicant's nor the Examining Division's interpretation was unambiguously disclosed in document D3:

17. Hence, there is no explicit or implicit disclosure of the step "comparing the hypothesis to an expected response". Consequently, there is also no disclosure of a threshold adjustment that depends on the outcome of said comparison.

However, the Board then considered whether these features were non-technical, using the established COMVIK approach:

19. In the previous paragraph it was established that the subject-matter of claim 1 differed from D3 in the step of comparing the hypothesis to an expected response, and in the subsequent step of adjusting the acceptance threshold in case of a favorable outcome. Comparing two parameters and, depending on the result, adjusting a third parameter is a purely mathematical operation. Therefore, the steps are non-technical by themselves. In other words, the distinguishing features are non-technical.

The Board went on to analyse whether the features serve a technical purpose compared to the teaching of the prior art:

21. D3 achieves the same overall goal as the invention: The a priori knowledge of an expected response is used for setting the acceptance threshold such that it is more likely to accept a hypothesis that corresponds to the expected response. Thereby, unnecessary fallback actions, like a repetition of the response, can be avoided. This saves time. The value of the acceptance threshold is not dependent on the setting process. Namely, it does not depend on whether it is assigned after comparison of the hypothesis with the expected response, or whether it is assigned from the start, taking due account of the expected response. Hence, no effect can be derived from the value of the acceptance threshold, as such. The effect of the distinguishing steps merely lies in the way, in which the expected response is used in order to compare the confidence factor of the hypothesis with the proper acceptance threshold. This effect does not serve any technical purpose and does, therefore, not have a technical character.
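The Board's point that the value of the acceptance threshold does not depend on the setting process can be illustrated with a small sketch. It is only an illustration under stated assumptions: neither function reproduces the actual implementation of D3 or of the application, and the threshold values, the six-word vocabulary, and all function names are hypothetical.

```python
BASE, RELAXED = 0.8, 0.6  # hypothetical threshold values
VOCABULARY = ["one", "two", "three", "four", "five", "six"]


def accept_adjust_after_compare(text, confidence, expected):
    # Threshold is adjusted only after comparing the hypothesis
    # with the expected response.
    threshold = BASE
    if text == expected:
        threshold = RELAXED
    return confidence >= threshold


def accept_preset(text, confidence, expected):
    # Thresholds are assigned from the start, one per possible
    # response, taking the expected response into account.
    thresholds = {
        word: (RELAXED if word == expected else BASE)
        for word in VOCABULARY
    }
    return confidence >= thresholds[text]


def decisions_agree(expected):
    # Check that both setting processes lead to the same decision
    # for every word and a handful of sample confidence factors.
    confidences = [0.5, 0.6, 0.7, 0.8, 0.9]
    return all(
        accept_adjust_after_compare(w, c, expected)
        == accept_preset(w, c, expected)
        for w in VOCABULARY
        for c in confidences
    )
```

In this toy setting, both processes compare the confidence factor with the same threshold value, so they accept and reject exactly the same hypotheses. Only the way in which the expected response is used differs.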

Therefore, the Board formulated the problem and determined that the solution was merely a selection of mathematical operations:

22. The objective problem can be seen as finding an implementation of how to set the proper acceptance threshold for comparison with the confidence factor of the generated hypothesis, considering the expected response.

23. The solution to this problem involves only non-technical considerations in the form of the selection of certain mathematical operations. The distinguishing, non-technical features, therefore, do not contribute to the technical character of the method defined in claim 1.

The Board disagreed with the applicant that the features have a technical effect:

26. In the opinion of the appellant, the distinguishing steps identified above did have technical effects. The comparison and threshold adjustment enabled a greater variety of responses to be recognized, because the speech was not directly matched to a limited list of expected responses, as it was the case in D3. Further, storage space could be saved, because the invention required the saving of only one threshold value, and not a separate value for every possible hypothesis. In addition, processing time and resources could be saved, because no threshold adjustment was necessary at all if the hypothesis did not match the expected response. The appellant added that the objective technical problem was to provide an alternative way of speech recognition. Speech recognition was commonly accepted to be of technical nature.

27. The arguments are not persuasive. As noted further above, the claim does not imply the variety of speech to be detected. The claim encompasses restricted speech recognition, limited for example to numbers one to six as envisaged in paragraph 47 of the application.

28. As to the storage capacity, independently of the fact that potential saving would be counterbalanced by the additional memory space required for the program incorporating the steps of comparing and adjusting, it is observed that the alleged effect cannot be derived from the claims wording. The claims wording also encompasses acceptance thresholds being defined for each possible response. That aside, D3 does not imply the presence of a large number of threshold values. There might well be only two values, one for the home-town of the user and one for the other towns.

29. There is also no apparent saving of processing time of the response. In D3, the threshold has either been set previously, after acquiring the knowledge on the caller's home town, or it is set only if the requested town corresponds to the home town. Hence, the processing of the response does not require more resources.

While the Board agreed that speech recognition is technical per se, it held that the contribution of the distinguishing features is not technical:

30. It is true that speech recognition per se is typically recognized as being technical. However, it is not the technical character of the claim as a whole that is put in question, but the technical contribution of the distinguishing features to the prior art, in this case to D3. As shown above, the distinguishing features are neither technical by themselves, nor do they contribute to solve a technical problem.

Therefore, claim 1 was considered not to involve an inventive step.

More information

You can read the whole decision here: T 1898/17 (Speech recognition / Vocollect) of 5.10.2021 of Technical Board of Appeal 3.4.01.
