
Examining Database Projects – The 1881 Canadian Census Report

The major problems that Lisa Y. Dillon discusses in her article on the 1881 Canadian Census project can be traced to the collaboration (or lack thereof) between social science historians and genealogy groups, whose standards, goals and data-processing objectives conflicted. The sheer size of the 1881 census file and the limited funds available for data processing prompted this partnership between social science historians and genealogy groups, but it also led “to compromised data quality to a certain extent” (166).

Data cleaning and checking difficulties were caused not only by the “hundreds of geographically-dispersed volunteers using fairly basic data-entry programs” (166) but also by the decentralisation of data entry. The Church of Jesus Christ of Latter-day Saints (LDS) pursued objectives that differed from those of the social science historians. The problem was compounded by LDS’s use of Universal Data Entry Software rather than the CFP or MHCP programs, which offer automatic prompts or checks (167). By choosing simple software over software with built-in verification, LDS allowed “preventable typos” into the database (167).

Beyond its guidelines and choice of software, LDS also instructed its volunteers to enter the same data twice and compare the results. In my opinion, this instruction was laboriously counterproductive: appropriate software could have made the second pass unnecessary, and double entry is time-consuming without preventing an error made in the first entry from simply being repeated in the second.
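Double keying is a common verification technique, and its blind spot is easy to see in code. The minimal Python sketch below is only an illustration: it assumes a hypothetical record layout (the field names `surname`, `given_name`, `age` and `birthplace` are invented, not Dillon’s or LDS’s) and compares two keyings of one census line field by field. A slip made in only one pass is flagged, but a misreading typed identically twice agrees with itself and passes unnoticed.

```python
from typing import Dict, List

# Hypothetical field layout for one transcribed census line (an assumption,
# not the actual 1881 data-entry format).
FIELDS = ["surname", "given_name", "age", "birthplace"]

Record = Dict[str, str]


def compare_entries(first: Record, second: Record) -> List[str]:
    """Return the fields on which two keyings of the same line disagree."""
    return [field for field in FIELDS if first.get(field) != second.get(field)]


if __name__ == "__main__":
    # Invented example line; not taken from the census.
    source = {"surname": "Tremblay", "given_name": "Jean", "age": "34", "birthplace": "Quebec"}

    # A slip made in only one keying is flagged for review.
    print(compare_entries(dict(source, age="43"), dict(source)))  # ['age']

    # The same misreading typed in both keyings agrees with itself and passes.
    misread = dict(source, surname="Tremblav")
    print(compare_entries(misread, dict(misread)))  # []
```

A real workflow would add an arbitration step for flagged fields; that step is omitted here.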

The division and uneven distribution of microfilm strips, the failure to distinguish volunteers’ comments from enumerators’, the partial exclusion of crossed-out line entries, and the omission of family numbers further impeded accurate data entry.

These issues, however, could at least be addressed, unlike others that proved irresolvable, such as mistranscribed surnames (which might stem from an original enumerator error, poor microfilm quality or illegible handwriting) and discrepancies between Francophone and Anglophone volunteers’ transcriptions.

Dillon notes that Anglophone volunteers omitted French accents while Francophone volunteers included them, which created a significant inconsistency in the 1881 Census database. Consequently, additional work is required to restore the accents in the database.
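One way to locate the affected entries is to fold accents and group spellings that become identical. The Python sketch below is an illustrative assumption, not the procedure actually used on the 1881 database: it uses the standard library’s `unicodedata` module to strip combining diacritics, groups names that differ only in accents, and proposes restoring the accented form when exactly one accented variant was recorded. The helper names and sample surnames are invented.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, Set


def fold_accents(text: str) -> str:
    """Remove combining diacritics, e.g. 'Bélanger' becomes 'Belanger'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_variants(names: Iterable[str]) -> Dict[str, Set[str]]:
    """Group spellings that differ only in accents under their unaccented key."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        groups[fold_accents(name)].add(name)
    # Keep only keys that were transcribed in more than one way.
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


def restoration_map(names: Iterable[str]) -> Dict[str, str]:
    """Map an unaccented spelling to its accented variant when exactly one exists."""
    fixes: Dict[str, str] = {}
    for key, spellings in accent_variants(names).items():
        accented = [s for s in spellings if s != key]
        if key in spellings and len(accented) == 1:
            fixes[key] = accented[0]
    return fixes


if __name__ == "__main__":
    # Invented sample of surnames as different volunteers might have keyed them.
    names = ["Bélanger", "Belanger", "Côté", "Cote", "Coté", "Smith"]
    print(restoration_map(names))  # {'Belanger': 'Bélanger'}
```

Ambiguous groups such as `Côté`/`Coté`/`Cote` are deliberately left for a person to resolve, and accent folding can also merge names that are genuinely distinct, so any proposed restoration would still need checking against the microfilm.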

Although Dillon highlights a significant number of problems with the 1881 Census project, the genealogy volunteers were responsible for a surprisingly low percentage of errors: 1%. (It is worth noting, however, that Dillon’s 1% figure refers only to detected errors; the rate of undetected errors has not yet been accounted for.)

With these problems in mind, Dillon argues that the 1881 Census project was affected less by transcription errors than by decisions made at its outset (173). She therefore proposes that the conflicting standards of social science historians and genealogy groups could be reconciled by centralising data entry rules and processes and by identifying a central institution able to dedicate sufficient personnel to inspecting the project work (174).

Dillon’s latter point, the inspection of geographically dispersed volunteers’ work, may not have been feasible when the 1881 Census project was under way. Today, however, collaboration on and inspection of such work could easily take place in a Web 2.0 environment, since social media has become one of the primary means of communication for a large share of the population.

When marking up Prof. Ruth Sherry’s bibliographic index cards in XML and TEI P5 for the Frank O’Connor Research website, I found it extremely beneficial to be able to collaborate with my colleagues and supervisor via social media to resolve any issues. I also encountered some of the same difficulties that Dillon describes in the 1881 Census project: illegible handwriting, crossed-out material, typos and errors made by the author, and occasionally material so soiled that the text was indecipherable.

The markup itself, however, wasn’t the challenge; deciding how much of the material to mark up was. Dillon describes the same tension between the needs of academic and genealogical researchers: social science historians want to record everything, while genealogists aim for 100% of the records but limit the variables they capture. On this question I tend to side with the social science historians’ record-everything rule, because I believe in the adage “If you’re going to do it, do it right,” which is, I suppose, the message of Dillon’s article.
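TEI P5 offers elements for each of the card problems described above: `<del>` and `<add>` for crossed-out and inserted text, `<choice>` with `<sic>` and `<corr>` for an author’s error and its correction, `<unclear>` for a doubtful reading, and `<gap>` for text that cannot be read at all. The Python sketch below builds one invented index-card line with the standard library’s `xml.etree.ElementTree`. The card text, attribute values and helper names (`tei_tag`, `append`, `encode_card_line`) are hypothetical, and this is not the encoding actually used on the Frank O’Connor Research website.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# TEI P5 namespace, registered as the default so elements serialize unprefixed.
TEI = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI)


def tei_tag(name: str) -> str:
    """Qualify an element name with the TEI namespace."""
    return f"{{{TEI}}}{name}"


def append(parent: ET.Element, name: str, text: Optional[str] = None,
           tail: Optional[str] = None, **attrs: str) -> ET.Element:
    """Append a TEI child with optional text, trailing text and attributes."""
    child = ET.SubElement(parent, tei_tag(name), attrs)
    child.text = text
    child.tail = tail
    return child


def encode_card_line() -> ET.Element:
    """Encode one invented index-card line showing four transcription problems."""
    p = ET.Element(tei_tag("p"))
    p.text = "Reviewed "
    # Crossed-out word kept but marked as deleted; the replacement was written above it.
    append(p, "del", "March", rend="strikethrough")
    append(p, "add", "May", tail=" ", place="above")
    # Doubtful handwriting: give the best reading and say why it is uncertain.
    append(p, "unclear", "1946", tail=", ", reason="illegible")
    # The card writer's own typo: record both the original and the correction.
    choice = append(p, "choice", tail="; ")
    append(choice, "sic", "Dubln")
    append(choice, "corr", "Dublin")
    # Soiled passage that cannot be read at all: record a gap instead of guessing.
    append(p, "gap", reason="illegible", quantity="2", unit="words")
    return p


if __name__ == "__main__":
    print(ET.tostring(encode_card_line(), encoding="unicode"))
```

The distinction between `<unclear>` (a doubtful reading that is still transcribed) and `<gap>` (nothing transcribed) is one place where the decision about how much to mark up becomes concrete.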

Dillon, Lisa Y. “International Partners, Local Volunteers and Lots of Data: The 1881 Canadian Census Report.” History and Computing 12.2 (2000): 163–176. JSTOR. Web.

Frank O’Connor Research Website
