The success of any forensic investigation hinges on information. Sources of information that now are standard for many investigations—such as e-mails, working papers and transactional data—help the investigator answer critical questions. E-mails and working papers detail who knew about key activities and who was responsible for them, as well as what the business policies and procedures were. Transactional data are quantitatively analyzed to answer questions involving “When?” and “How much?” Those sources of information, however, may not offer a complete view of how the events occurred. The logic for automated operations, known as source code, is a source of information for answering key questions about how the events occurred.

WHAT IS SOURCE CODE?

Source code, taken broadly, is a term used to describe computer instructions that are written by people and executed by computer programs. A source code file contains the logic that is to be executed by a program in a form that can be read and understood by a person. Source code is either compiled into an executable file or is interpreted by another program that translates the source code into computer operations.

Source code files are written in a programming language that is understood by the program used to execute the logic. Each programming language is different and typically is used for particular purposes. Some examples of source code include logic written for Microsoft Visual Basic for Applications, database scripts and programs written in .NET languages before they are compiled. Older financial firms occasionally have programs that were first developed decades ago, some as early as the 1970s, in mainframe and midrange programming languages such as COBOL and AS/400 Control Language. Web-based startup companies, on the other hand, favor modern programming languages like Ruby and Python.

Program source code is developed by a company’s IT and business function groups to perform specific operations. A software developer translates business requirements into a source code file that is then compiled or interpreted so that it can be executed. For example, an insurance company’s claims processing group writes a program to assign a score to each of the claims received that day. The program is scheduled to run automatically every day, and it assigns a score and stores the claim data and score in a database. The scoring logic can be reviewed by an investigator to determine how the company assigns scores.
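To make the claims example concrete, the following is a minimal sketch of what such scoring logic might look like in Python. Every rule, threshold, field name and function name here is hypothetical and chosen only for illustration; an actual program would encode whatever rules the company's claims group defined. A plain dictionary stands in for the database.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    amount: float
    days_since_policy_start: int
    prior_claims: int


def score_claim(claim: Claim) -> int:
    """Apply hypothetical business rules; a higher score means more scrutiny."""
    score = 0
    if claim.amount > 10_000:              # large claims add 40 points
        score += 40
    if claim.days_since_policy_start < 30:  # claims on very new policies add 35
        score += 35
    if claim.prior_claims >= 3:            # frequent claimants add 25
        score += 25
    return score


def process_daily_claims(claims, store):
    """Score each claim and save the claim data with its score."""
    for claim in claims:
        store[claim.claim_id] = {"amount": claim.amount, "score": score_claim(claim)}
    return store


# A dictionary stands in for the database in this sketch.
database = process_daily_claims(
    [Claim("C-001", 12_500.0, 12, 0), Claim("C-002", 800.0, 400, 1)],
    {},
)
```

An investigator reading logic like this can see exactly which conditions raise a score and where the result is stored, whether or not the written procedures describe it.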

WHY ANALYZE SOURCE CODE?

Source code is a valuable resource for an investigator for a number of reasons. First, source code contains information about the business rules used to perform operations. A company’s policies and procedures documents do not always match the actual operations of a company, and source code can be used to identify those discrepancies. Second, the source code for key business operations contains information about how and where the data from that operation were stored. An investigator can more quickly identify the key databases and other sources of information by knowing where the program stored its data.

Source code also can be analyzed vis-à-vis the data repositories to identify discrepancies. Source code provides the business rules for how the data were to be stored. If an investigator believes that data should not have been altered by anything but that program, the data can be tested to identify anomalies. These anomalies, in turn, may point to non-standard or fraudulent activity.
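One way to test stored data against the rules found in source code is to recompute each value from the rules and compare it with what is in the repository. The sketch below assumes a hypothetical scoring rule set and hypothetical record fields (`claim_id`, `amount`, `prior_claims`, `score`); in practice the rules would be transcribed from the program under review and the records pulled from the actual database.

```python
def expected_score(amount, prior_claims):
    """Rules as transcribed from the (hypothetical) scoring program."""
    score = 0
    if amount > 10_000:
        score += 40
    if prior_claims >= 3:
        score += 25
    return score


def find_anomalies(records):
    """Return (claim_id, stored_score, expected_score) where the two differ."""
    anomalies = []
    for record in records:
        expected = expected_score(record["amount"], record["prior_claims"])
        if record["score"] != expected:
            anomalies.append((record["claim_id"], record["score"], expected))
    return anomalies


# Illustrative records only; real data would come from the data repository.
stored_records = [
    {"claim_id": "C-001", "amount": 12_500.0, "prior_claims": 0, "score": 40},
    {"claim_id": "C-002", "amount": 15_000.0, "prior_claims": 4, "score": 0},
    {"claim_id": "C-003", "amount": 800.0, "prior_claims": 1, "score": 0},
]
anomalies = find_anomalies(stored_records)
```

Here the second record is flagged because its stored score does not match what the rules would have produced. A mismatch is not proof of wrongdoing; it may also indicate that an earlier version of the program was in use, so it is a lead for follow-up rather than a conclusion.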

Transactional data form the backbone of virtually all forensic investigations, but with the recent explosion in corporate data volume and complexity, additional information about the data can help the investigator answer questions about them. The storage volumes of very large data sets, often referred to as Big Data, are growing at exponential rates. A 2011 McKinsey Global Institute study found that companies in every sector had at least 100 terabytes of data stored and that one financial firm employing fewer than 1,000 employees had 3.8 petabytes of stored data (see footnote 1). Analyzing that much data may not be feasible given time and budget constraints.

Analyzing the logic used to create and process those data, however, can expedite the investigation. The logic reveals relationships between the systems and identifies the business rules used to generate particular values, so the investigator knows more about the data and can potentially cull which data need to be analyzed. Source code for key business operations can be analyzed to serve such a purpose.
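As an illustration of how source code can reveal relationships between systems, the sketch below scans database scripts for the tables each script reads from and writes to. The script names, table names and the regular expression are assumptions made for this example, and the pattern is a crude heuristic: it would, for instance, classify the table in a DELETE FROM statement as a read. A real review would use a proper SQL parser or manual inspection.

```python
import re
from collections import defaultdict

# Heuristic: a table name usually follows one of these SQL keywords.
TABLE_REF = re.compile(r"\b(FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def map_table_usage(scripts):
    """Map each script name to the sets of tables it appears to read and write."""
    usage = defaultdict(lambda: {"reads": set(), "writes": set()})
    for name, sql in scripts.items():
        for keyword, table in TABLE_REF.findall(sql):
            kind = "writes" if keyword.upper() in ("INTO", "UPDATE") else "reads"
            usage[name][kind].add(table.lower())
    return dict(usage)


# Hypothetical scripts standing in for a company's batch jobs.
scripts = {
    "load_claims.sql": "INSERT INTO claims_raw SELECT * FROM intake_feed;",
    "score_claims.sql": (
        "UPDATE claim_scores SET score = 0; "
        "SELECT c.id FROM claims_raw c JOIN policies p ON p.id = c.policy_id;"
    ),
}
usage = map_table_usage(scripts)
```

Even a rough map like this shows that one job's output table is another job's input, which helps the investigator decide which data stores matter before any data are extracted.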

WHY NOT ALWAYS ANALYZE SOURCE CODE?

Source code analysis is a time-consuming process that requires programming language expertise. Much like a spoken language, a programming language can be read and understood only by someone with a detailed grasp of its syntax and semantics. An investigator also may need to testify about the findings, which requires a demonstrable level of expertise that a court will accept.

The volume of code compounds the problem. To save time in their own operations, virtually all companies have automated the majority of their data-related processes, and automating complex operations requires thousands, if not millions, of lines of code. While automated source code analysis techniques exist, the majority of source code analysis is still performed by reading the code line by line.

Source code is not always complete or easily identifiable. Source code can reside in numerous places such as the programmer’s computer, a network file server or a source code repository. Many versions of the source code may exist, and sometimes the source code is compiled into an executable program and the original source code is lost. These factors can severely hinder a source code analysis.
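When more than one version of a source file is recovered, a line-by-line comparison shows what changed between them. The sketch below uses Python's standard `difflib` module; the two code fragments and the version labels are hypothetical and exist only to demonstrate the comparison.

```python
import difflib


def diff_versions(old_source, new_source, old_label="v1", new_label="v2"):
    """Return a unified diff of two versions of a source file as a list of lines."""
    return list(
        difflib.unified_diff(
            old_source.splitlines(),
            new_source.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


# Two hypothetical versions of the same rule, differing in one threshold.
version_1 = "if amount > 10000:\n    score += 40\n"
version_2 = "if amount > 50000:\n    score += 40\n"

changes = diff_versions(version_1, version_2)

# Keep only added or removed lines, skipping the file header lines.
changed_lines = [
    line
    for line in changes
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

A comparison like this identifies which rule changed, but not when the change took effect in production; that still has to be established from version control history, deployment records or other evidence.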

HOW DOES SOURCE CODE FIT INTO AN INVESTIGATION?

Source code, like other forms of information, is analyzed entirely or partially, depending on the nature of the investigation. A full analysis is warranted when the amount of source code is small or the business operations must be analyzed in depth. In that scenario, a business operation is in question, and data and working papers alone are not sufficient. The source code is analyzed by hand to identify the key business logic and data sources involved. The findings provide a complete picture of the automated operations.

An analysis of a subset of the source code is performed when specific information is required, but the entire automated process does not need to be known. When an investigator wants to know how a particular task was performed or how a data point was generated, automated searches of the source code are performed to pinpoint the source code file, and an analysis of that file is completed.
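An automated search of this kind can be as simple as scanning every source file for a term of interest, such as a database column name. The following is a minimal sketch, assuming the source files have been collected under one directory; the file extensions, the file names and the search term `fraud_score` are hypothetical. The demonstration builds a small temporary directory so that the example runs on its own.

```python
import tempfile
from pathlib import Path


def search_source(root, term, extensions=(".py", ".sql", ".vb", ".cbl")):
    """Return (file path, line number, line text) for each line containing term."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            # Replace undecodable bytes so older files do not stop the search.
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if term.lower() in line.lower():
                    hits.append((str(path), number, line.strip()))
    return hits


# Demonstration on a temporary directory with hypothetical files.
with tempfile.TemporaryDirectory() as root:
    Path(root, "score_claims.sql").write_text(
        "UPDATE claims SET fraud_score = 75;\n", encoding="utf-8"
    )
    Path(root, "notes.txt").write_text(
        "fraud_score is discussed here\n", encoding="utf-8"
    )
    hits = search_source(root, "fraud_score")
```

The search returns only the script that sets the value, because the text file is not among the listed source extensions. The investigator can then read that one file in detail rather than the entire code base.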

The requirements of an investigation will determine how the source code is analyzed. The analysis can cover every line of code or only the subset that relates to a specific operation. The source code can be analyzed in conjunction with working papers and transactional data or on its own. In any of those scenarios, if a company has developed its own processing logic, an investigator needs to remember that source code is another source of information that can be the key piece of the investigation.

Footnotes

1. Big Data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, May 2011, p. 4: http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation.

The views expressed herein are those of the author and do not necessarily represent the views of FTI Consulting, Inc. or its other professionals. (c)FTI Consulting, Inc., 2011. All rights reserved.