Lockheed Martin – 2000 Decennial Census
Image quality assessment, OCR, ICR, image editing, forms processing
Very few software firms can claim to affect every American. Yet touching “every hand in the land” accurately describes TMSSequoia (now Accusoft), thanks to its participation in the 2000 decennial census. Compared to the numbers involved with most document technology applications, the census statistics are flabbergasting: Over a quarter-billion people registered on forms from over 100 million families, making this the largest, most complex data capture project in history!
For the U.S. Census Bureau, approximately 8,000 staffers in four processing centers worked two shifts, seven days a week, to complete the project within its 171-day schedule. They occupied almost a million square feet of office space and operated over 160 high-speed scanners and approximately 8,000 PCs running Microsoft Windows NT. During the peak period, when citizens mailed in their census questionnaires, over 6,000,000 forms arrived daily.
The U.S. Constitution mandates a census every ten years, and the accuracy of the count, and hence of the software used to record it, is tightly regulated. TMSSequoia joined a consortium of hardware and software developers, coordinated and directed by Lockheed Martin, to deliver the largest census in history. The team finished ahead of schedule, with accuracy above the stringent requirements and approaching 100 percent. Data capture on mailed-in forms was completed two weeks early, giving the door-to-door enumerators a much-needed cushion.
TMSSequoia has been a leading software developer since 1981, providing products and toolkits to some of the world’s largest corporations and governmental agencies. Recipient of several notable awards, it has installed over a million copies of its software for more than 2,200 customers.
Lockheed Martin knew the stakes were high and that it would get only one chance at success. The Census Bureau was under considerable congressional pressure to deliver the census accurately and on time. The 1990 census had fallen short of standards set in 1980, and any further erosion of accuracy and confidence was unacceptable.
Thus, when Lockheed Martin assembled a team to deliver the solution, it would accept only the best. “We chose TMSSequoia, as one of the facilitating component software vendors, to perform image processing and image quality assessment,” reports Sean Murphy, Lockheed Martin Census Project Manager. “We also required best-of-breed software technology for workflow, scanner control, and data correction.” TMSSequoia’s accurate forms registration and robust image enhancement features were a significant part of Lockheed Martin’s competitive bid for the project.
“This was not only the most extensive use of technology to process a census, it was also the first time the Census has used automated recognition technology to read handwriting,” comments Terry Drabant, president of Lockheed Martin Mission Systems. “Our DCS 2000 helped the Census Bureau capture more data in less time and with greater accuracy than ever before.”
TMSSequoia’s image processing software and image quality assessment software joined with high-speed scanners and controllers; workflow; optical mark recognition (OMR); optical character recognition (OCR); and data correction software to complete the total solution.
The FormFix forms processing software registered each scanned form against a master form, so that every field on the scan could be located at its known position on the template. Data extracted from defined zones on the form then underwent OCR and Intelligent Character Recognition (ICR) to populate the census database. (OCR typically describes the automatic reading of machine-generated text, while ICR generally refers to handwriting recognition.) The software’s image enhancement features straightened skewed images and automatically removed specks and smudges from the documents.
The workflow of the DCS 2000 project was ambitious but successful. Every Census form carried a unique bar code identifier. When the forms arrived at the processing centers, each one was scanned and “checked in.” Then, as each form passed through DCS 2000, the captured data was matched to that form’s bar code. If the handwritten data could not be captured, the system alerted an operator, who removed the form to determine the problem. Some forms needed to be keyed by hand, while others just needed to be run through again. Before the forms left the processing center, each one was scanned once more and “checked out.” The computer verified that it had captured all of the data from that particular form; if not, it alerted a human operator, who pulled the form and re-ran it through the system. In all, 0.8 percent, or about 1.2 million of the 148 million forms, were re-run through the system to ensure that all of the data was captured from each form.
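Speck removal of the kind described above is often illustrated with connected-component filtering. The sketch below is a minimal, hypothetical example, not FormFix’s actual algorithm: it assumes a binary page stored as a list of rows of 0 (white) and 1 (black) pixels, and it treats any 8-connected group of black pixels smaller than `min_size` as a speck. The function name `despeckle` and the threshold are illustrative assumptions.

```python
from collections import deque


def despeckle(image, min_size=3):
    """Return a copy of a binary image with small connected components removed.

    image: list of rows, each a list of 0 (white) / 1 (black) pixels.
    min_size: hypothetical threshold; black components with fewer pixels
    than this are treated as specks and turned white.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    result = [row[:] for row in image]          # never modify the input
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one 8-connected black component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small components are assumed to be noise, not handwriting.
            if len(component) < min_size:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result
```

For example, on a 3×5 page containing a solid 2×3 block and one isolated dark pixel, `despeckle(page, min_size=3)` keeps the block and whitens the lone pixel. In practice the threshold would have to be tuned to scan resolution so that dots on handwritten characters are not erased.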
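The check-in / check-out cycle amounts to reconciling, per bar code, what was captured against what each form should have produced. The Python sketch below is hypothetical: `FormTracker`, the `required_fields` set, and the method names are illustrative assumptions, not the DCS 2000 data model.

```python
from dataclasses import dataclass, field


@dataclass
class FormTracker:
    """Hypothetical tracker for the check-in / capture / check-out cycle."""
    required_fields: frozenset                        # fields every form must yield
    checked_in: set = field(default_factory=set)      # bar codes seen at check-in
    captured: dict = field(default_factory=dict)      # bar code -> {field: value}

    def check_in(self, barcode):
        self.checked_in.add(barcode)

    def record_capture(self, barcode, field_name, value):
        # Captured data must match a form that was actually checked in.
        if barcode not in self.checked_in:
            raise KeyError(f"form {barcode} was never checked in")
        self.captured.setdefault(barcode, {})[field_name] = value

    def check_out(self, barcode):
        """Return the set of missing fields; an empty set means the form may leave."""
        data = self.captured.get(barcode, {})
        present = {name for name, value in data.items() if value is not None}
        return set(self.required_fields) - present

    def forms_to_rerun(self):
        """Bar codes whose capture is incomplete and must be pulled and re-run."""
        return sorted(b for b in self.checked_in if self.check_out(b))
```

A form whose `check_out` result is non-empty corresponds to the case where an operator pulls the form and runs it through again.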
The overall Census processing was divided into two “passes.” The first pass captured the basic data needed for congressional apportionment, while the second pass captured more detailed social and economic data. The DCS 2000 team forwarded results daily to the Census Bureau headquarters.
TMSSequoia technology de-skewed and enhanced each image and aligned each form properly with its master. Only after alignment did forms continue through the workflow to optical mark recognition and optical character recognition. This preprocessing (de-skewing, image enhancement, and form registration) was a critical part of the overall process, and it made the exceptionally high required accuracy rates possible.
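One common way to de-skew and register a form is to locate two printed registration marks on the scan and compare them with their positions on the master form: the angle between the two mark-to-mark lines gives the skew, and the resulting rotation, scale, and offset map each master zone onto the scan. The sketch below assumes mark detection has already been done and that a similarity transform (rotation, uniform scale, translation) is adequate. It is not a description of how TMSSequoia’s software works, and the function names are hypothetical. Straightening the image itself would amount to rotating it by the negative of the estimated angle.

```python
import math


def estimate_transform(master_marks, scanned_marks):
    """Estimate skew angle, scale and offset from two registration marks.

    master_marks / scanned_marks: ((x1, y1), (x2, y2)) in pixels.
    Returns (angle_radians, scale, tx, ty) such that
    scanned ≈ scale * R(angle) * master + (tx, ty).
    """
    (mx1, my1), (mx2, my2) = master_marks
    (sx1, sy1), (sx2, sy2) = scanned_marks
    # Skew is the difference in direction of the line joining the two marks.
    angle = math.atan2(sy2 - sy1, sx2 - sx1) - math.atan2(my2 - my1, mx2 - mx1)
    scale = math.hypot(sx2 - sx1, sy2 - sy1) / math.hypot(mx2 - mx1, my2 - my1)
    a, b = scale * math.cos(angle), scale * math.sin(angle)
    # Offset chosen so the first master mark lands exactly on the first scanned mark.
    tx = sx1 - (a * mx1 - b * my1)
    ty = sy1 - (b * mx1 + a * my1)
    return angle, scale, tx, ty


def map_point(transform, x, y):
    """Map a master-form point into scan coordinates."""
    angle, scale, tx, ty = transform
    a, b = scale * math.cos(angle), scale * math.sin(angle)
    return a * x - b * y + tx, b * x + a * y + ty


def map_zone(transform, zone):
    """Map a master zone (x0, y0, x1, y1) to the enclosing axis-aligned box on the scan."""
    x0, y0, x1, y1 = zone
    corners = [map_point(transform, x, y)
               for x, y in ((x0, y0), (x1, y0), (x0, y1), (x1, y1))]
    xs = [cx for cx, _ in corners]
    ys = [cy for _, cy in corners]
    return min(xs), min(ys), max(xs), max(ys)
```

For example, with marks at (100, 100) and (1100, 100) on the master and at (110, 95) and (1110, 95) on the scan, the estimate is zero skew, unit scale, and a shift of (10, -5), so a master zone (200, 300, 400, 350) maps to (210, 295, 410, 345) on the scan.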
During the first pass, the DCS 2000’s accuracy rate for the critical optical mark recognition was 99.89 percent, well above the 99 percent standard established by the Census Bureau. For OCR, the system boasted a 99.4 percent accuracy rating, again far exceeding the 98 percent requirement. The second pass resulted in 99.88 percent accuracy for OMR, and 99.7 percent for OCR. The accuracy rates for manual keying of data were greater than 97.6 percent, also above the 96.5 percent goal.
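Converting these accuracy percentages into errors per 10,000 fields makes the margins easier to compare. This small calculation uses only the figures reported above; the pairing of measured and required rates follows the first-pass and manual-keying comparisons in the text. Because the keying rate is given as a lower bound, its error count is an upper bound.

```python
def errors_per_10k(accuracy_percent):
    """Expected errors per 10,000 fields at a given accuracy percentage."""
    return round((100.0 - accuracy_percent) * 100)


# (measured accuracy %, required accuracy %) as reported for the first pass
# and for manual keying; the keying figure is a lower bound ("greater than").
FIGURES = {
    "OMR (first pass)": (99.89, 99.0),
    "OCR (first pass)": (99.4, 98.0),
    "Manual keying": (97.6, 96.5),
}


def error_budget_table(figures):
    """Return (name, measured errors per 10k, allowed errors per 10k) rows."""
    return [(name, errors_per_10k(measured), errors_per_10k(required))
            for name, (measured, required) in figures.items()]


for name, actual, allowed in error_budget_table(FIGURES):
    print(f"{name:18} {actual:4d} errors per 10,000 (standard allows {allowed})")
```

For the first pass this gives 11 OMR errors per 10,000 marks against 100 allowed, and 60 OCR errors against 200 allowed; keying comes to at most 240 against 350. The second-pass rates convert the same way (99.88 percent is about 12 errors per 10,000, and 99.7 percent about 30).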
The system’s accuracy also led to significant cost savings. “This system was so dependable and efficient, we reduced the number of human operators needed to support manual keying by as much as 75 percent,” said Richard Taylor, DCS 2000 system architect. “That reduced our customer’s labor costs substantially.”
With the decennial census successful and complete, the DCS 2000 system, including TMSSequoia’s forms processing and image enhancement functionality, will speed the tallying of information on more than 30 million returns in the 2001 British Census.
“It isn’t often a company gets the chance to be a part of history,” said former TMSSequoia president Debbie Mosier. “We believe TMSSequoia earned the opportunity because our development team is passionate about producing great software, which makes our image enhancement and forms processing technology world class. It’s definitely technology that the Census could count on.”