
Recommendations for Phase 2:
Selection of Service Providers and Funding (July 04 - June 05)
Phase 1 of the California Newspaper Digitization
Project had five goals:
- Determine how much information can be captured digitally from existing
microfilm. Conclusion: Although the quality of capture varies among
service providers, much existing microfilm appears adequate as a source
for digital capture.
- Document the strengths and limitations of available search and retrieval
software.
Conclusion: Although comparisons among products were hampered by the
absence of a common database (see Recommendations on performance, point 1,
below), reviewers felt that the results were valuable enough to justify
undertaking a project to create online access to historical California
newspapers.
- Compare benefits and costs among service providers.
Conclusion: Features varied widely, as did costs. However, because only
a minority of products were considered usable as is or with minor further
development, the cost options narrowed to relatively few.
- Estimate production requirements and costs for a one-million-page
database.
Conclusion: Including the costs of project operations but excluding
institutional indirect costs, the project is estimated at $1.5-3 million,
depending on the content management system and service provider. Production
requirements were not addressed in detail.
- Create a publicly accessible website to report the findings of the
study and to showcase the benefits of online access to historical newspapers.
Conclusion: The California Newspaper Digitization Project website is located
at http://cpc.stanford.edu/cndp. The Project has very much achieved its
goal of showcasing the benefits of online access to historical newspapers,
and continues to do so as more visitors explore the site. Many thanks
are extended to the service providers who participated in the Project
and helped build a broad base of public and professional support for
this type of information resource.
Recommendations on features
- Select a system with careful attention to both "essential"
and "important" features identified in the Phase 1 evaluation.
- Ensure that the system offers a "select publication" search feature
when the database contains several, or even hundreds of, newspaper titles.
- Explore the capacity to export machine-readable text files of selected
articles if the page images can be OCR'd with reasonable accuracy.
Recommendations on performance
- Require that all respondents to the RFP digitize the entire roll of
test microfilm. Side-by-side comparisons of image capture, OCR, and
retrieval across products were impossible in Phase 1 because not all
service providers digitized the whole roll of test microfilm. As far as
the evaluation team could tell, not a single page was digitized by every
service provider, so even visual comparisons could not be made, let alone
comparisons of search capabilities and results.
- With comparable databases from the RFP respondents:
  - determine the capture settings (e.g., film-imaging resolution and
    bit depth) that yield the most readable facsimile images, by visually
    inspecting the display of a given page as digitized by the different
    systems;
  - determine OCR accuracy by comparing samples of OCR'd text across
    the systems;
  - compare search results across the participating service providers
    for relative comprehensiveness and accuracy.
- Confirm with the service providers that the short response times
demonstrated at most of the test sites will remain short as the database
grows to hold full runs of individual titles and many additional titles.
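One way to make the OCR accuracy comparison concrete is to key a reference transcription of a sample page by hand and score each provider's OCR of that same page against it. The Python sketch below assumes that approach and measures character accuracy as 1 minus the character error rate (Levenshtein edit distance divided by the reference length). Neither this metric nor the function names come from the Phase 1 evaluation; they, the provider names, and the sample text are hypothetical.

```python
"""Sketch: score OCR output against a hand-keyed reference transcription.

Assumption (not from the Phase 1 report): accuracy = 1 - CER, where
CER = Levenshtein edit distance / length of the reference text.
"""


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_accuracy(ocr_text: str, reference: str) -> float:
    """Return 1 - CER, floored at 0.0 (CER can exceed 1 if the OCR adds text)."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    cer = levenshtein(ocr_text, reference) / len(reference)
    return max(0.0, 1.0 - cer)


def rank_providers(samples: dict[str, str], reference: str) -> list[tuple[str, float]]:
    """Score each provider's OCR of the same page and sort best first."""
    scores = [(name, character_accuracy(text, reference)) for name, text in samples.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    reference = "arrival of the steamer"  # hypothetical hand-keyed line
    samples = {  # hypothetical OCR output for the same line from two providers
        "provider_a": "arriva1 of tbe steamer",
        "provider_b": "arrival of the steamer",
    }
    for name, score in rank_providers(samples, reference):
        print(f"{name}: {score:.3f}")
```

A score like this is only meaningful when every respondent has digitized the same page, which is why the requirement to digitize the entire roll of test microfilm comes first.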
Recommendations on cost elements
Some price estimates submitted in response to Phase 1 were confidential, so
figures for individual service providers are not included on this website.
However, some guidance can be derived from the providers' various estimates,
with full recognition that they are only estimates:
- For a one-million-page newspaper database project, price estimates
for digitization and image processing (including OCR) ranged from about
$400,000 to $2,600,000. Generally, the more specialized the search capability
and the more sophisticated the retrieval, the more expensive the digitization
and image processing services.
- The RFP should state specifically whether hosting services are needed.
If hosting on provider-owned hardware and software is required, some
providers may choose not to respond to the RFP. Some service providers
offered to host public access to the database only on client-purchased
equipment; others offered only digitization and content management software;
and still others offered to procure the appropriate hardware and train
local staff in systems administration rather than host the database themselves.
- Much additional exploration is needed regarding preservation repository
services for the database. There may prove to be few preservation providers
to evaluate compared with conversion providers, and the nature of
preservation services beyond security, backup, and periodic copying
from medium to medium needs to be clearly understood.
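For budgeting, the digitization range above and the $1.5-3 million total project estimate from the Phase 1 goals can both be normalized to a cost per page for the one-million-page database. The Python sketch below is plain arithmetic on those Phase 1 estimates, not on any provider quote; `per_page_range` is a hypothetical helper introduced only for illustration, and the two ranges cover different scopes (digitization alone versus the whole project), so they are not directly comparable.

```python
"""Sketch: normalize Phase 1 total-cost estimates to a cost per page."""

PAGES = 1_000_000  # the one-million-page database assumed in Phase 1


def per_page_range(low_total: float, high_total: float, pages: int) -> tuple[float, float]:
    """Divide a (low, high) total-cost estimate by the number of pages."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return low_total / pages, high_total / pages


# Digitization and image processing (including OCR) only.
digitization = per_page_range(400_000, 2_600_000, PAGES)
# Whole project, including operations but excluding institutional indirect costs.
project = per_page_range(1_500_000, 3_000_000, PAGES)

if __name__ == "__main__":
    for label, (low, high) in (("digitization", digitization), ("project", project)):
        print(f"{label}: ${low:.2f} to ${high:.2f} per page")
```

Run as a script, this prints $0.40 to $2.60 per page for digitization and image processing, and $1.50 to $3.00 per page for the project as a whole.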