The Kenny Report - help needed

Heya lads

I’ve started the process of digitising significant historical government reports, starting with Ansbacher and Kenny.

I have scanned and uploaded three versions of the Kenny report and need your help to create a clean digital version. The optical character recognition process is not 100% (especially on old and scanned text) and it would be nice to have a new digital version that could be printed in the future - the report is long since out of print.

My suggestion is this:

Go here: thestory.ie/2009/12/17/the-kenny-report/

Click on the third “raw version” which is on Google Documents and seek to become a collaborator with me to correct misspellings and faulty OCR processes. With a few people working on it we could have a correct and clean copy of the report in a small amount of time. You can check against the original scan which is on the first link.

I hope some of you can help out!

Do we want to pick a group of pages each to work on, rather than have everyone cover the same ground again?
For instance I can start at page 100 and continue through to 120?
Someone else can do 120 to 140 etc?
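
For what it's worth, splitting the pages up could be as simple as the rough sketch below. It is only an illustration: the function name and the volunteer names are made up, the chunk size of 20 follows the suggestion above, and the total of 124 pages is the figure given elsewhere in this thread. Unlike "100 to 120, 120 to 140", the ranges here do not share a boundary page.

```python
def allocate_pages(total_pages, chunk_size, volunteers):
    """Hand out consecutive, non-overlapping page ranges to volunteers in turn.

    Returns a list of (volunteer, first_page, last_page) tuples.
    All names here are hypothetical; this is a sketch, not an agreed scheme.
    """
    assignments = []
    start = 1
    turn = 0
    while start <= total_pages:
        # The last chunk may be shorter than chunk_size.
        end = min(start + chunk_size - 1, total_pages)
        volunteer = volunteers[turn % len(volunteers)]
        assignments.append((volunteer, start, end))
        start = end + 1
        turn += 1
    return assignments


if __name__ == "__main__":
    # Placeholder volunteer names, 124 pages, 20 pages per chunk.
    for name, first, last in allocate_pages(124, 20, ["poster_a", "poster_b", "poster_c"]):
        print(f"{name}: pages {first} to {last}")
```

Anyone who finishes early just takes the next free range.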

Could we invoke the collective power of play to do this for this document and others, using the pin as a catalyst?

Fucking sure we can! I’ve a great idea.

Actually, scratch that, Google have made it so :unamused:

What I was thinking of was a program that gave out one page of text per user log-in (voluntarily accepted, of course) to be spell checked. It would be great to have some automation to dole out the pages efficiently. This little routine might only need a minute or two (possibly less), as it's not hard to spell check one page quickly when the spell checker is highlighting the problematic words.

Integrating it into the log-in to phpBB, or as a “donate a minute of your time” button, could do wonders for the openness of information in general. This is the real power of the net: the collective interaction it provides us.
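
To make the "one page per log-in" idea a bit more concrete, here is a minimal sketch of how the hand-out could work. It is purely illustrative: the class and method names are invented, nothing here touches phpBB or Google Docs, and a real version would need to store its state somewhere persistent. Each page is handed to only one person at a time, so nobody covers the same ground twice.

```python
from collections import deque


class PageQueue:
    """Hypothetical in-memory queue that hands out one unchecked page at a time."""

    def __init__(self, total_pages):
        self.pending = deque(range(1, total_pages + 1))  # pages not yet handed out
        self.checked_out = {}                            # page -> user working on it
        self.done = set()                                # pages already spell checked

    def next_page(self, user):
        """Give the user the next free page, or None if nothing is left."""
        if not self.pending:
            return None
        page = self.pending.popleft()
        self.checked_out[page] = user
        return page

    def mark_done(self, page):
        """Record that a handed-out page has been checked."""
        if page in self.checked_out:
            del self.checked_out[page]
            self.done.add(page)

    def release(self, page):
        """Put a page back at the front of the queue if the user gave up on it."""
        if page in self.checked_out:
            del self.checked_out[page]
            self.pending.appendleft(page)


if __name__ == "__main__":
    queue = PageQueue(124)            # 124 pages, as stated in this thread
    page = queue.next_page("someone")
    print("Please spell check page", page)
    queue.mark_done(page)
```

The release step is there for people who accept a page and then wander off; without it the page would never be offered to anyone else.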

I see you can achieve this with Google Docs, but what happens when you have people working on the same page? What does Google do in this instance to mitigate unnecessary overlap?

Well, you can turn on the spell check first; it will highlight the main mistakes in yellow… I don't think there's a need to allocate pages, most of the mistakes are OCR ones… and are obviously in need of correction. The changes are also tracked… click File, Revision history.

Leave formatting issues to the end; I will reimport back into MS Word and create a final version.

I can just edit it directly - it doesn’t offer me the “collaborate” option.

Is this because I’m not French?

There are 124 pages in the report.

It is broken down into Majority report and Minority report.

The reports are further broken down into chapters I → XI

Chapter II of the majority report has been cleaned up. Note to those editing subsequent chapters: some paragraphs were missing from Chapter II compared with the original document. Something to bear in mind.

If you have a high-resolution display, the handiest way to edit the Google collaborative document is to open the original PDF side by side with it (tip: open both windows and select tile vertically on Windows desktops).
You can zoom the PDF to make it legible.

If you want to pick two to four chapters each (or however many time allows), we can probably get it done in several days.

I decided to open the doc completely to make it easier. Watch out for vandalism though.

The original OCR is packed with errors?

Should we correct those?

I’m looking at paragraph 29 as a random starting point and all the errors I see so far are in the original print! (i.e. the OCR has worked well!).

Grand - had a play yesterday but didn’t want to do anything serious without checking first. Forgot to check. Thanks.

As long as the new doc is exactly the same as the original… you will see character errors in the Google doc… wayward symbols etc… but I did run an extensive OCR process to remove as many as I could automatically.
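
If anyone wants a quick way to find the wayward symbols before reading line by line, something like the rough sketch below might help. It is only a heuristic of my own, not what the OCR software does: it flags lines containing characters outside ordinary letters, digits and common punctuation, or digits jammed against letters (e.g. "bui1ding"). The allowed punctuation list is an assumption, and it will throw up false positives such as "4th", so treat the output as a list of places to look, not a list of errors.

```python
import re

# Punctuation assumed to be legitimate in the report (an assumption; adjust as needed).
ALLOWED_PUNCT = ".,;:!?'\"()-/&%£’‘“”"

# Flag any character that is not a word character, whitespace or allowed punctuation,
# or a digit directly touching an ASCII letter.
SUSPECT = re.compile(
    r"[^\w\s" + re.escape(ALLOWED_PUNCT) + r"]"
    r"|(?<=[A-Za-z])\d|\d(?=[A-Za-z])"
)


def flag_suspect_lines(text):
    """Return (line_number, line) pairs that look like they contain OCR debris."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPECT.search(line)
    ]


if __name__ == "__main__":
    # Made-up sample text, not taken from the report.
    sample = "The price of land\nbui1ding s~tes\nChapter 5"
    for number, line in flag_suspect_lines(sample):
        print(number, line)
```

Every flagged line still needs checking against the original scan, since the print itself has errors too.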

Bollocks, I’m looking at the wrong comparator, I think. I should be looking at the pdf on the Labour site as the original?

I’ll do chapter V of the majority report.

Yes, here it is: irishlabour.com/dublinopinion/kenny%20Report.pdf

Maybe, use the Labour one if you like…

Chapter III

Just an example of the OCR errors:

That’s the kind of thing that would need looking at.

How to link footnotes?

edit: Ignore - going blind…

Chap V done, moving on to Chap VI