Fiat Coupe Forum
- Founded by Kayjey & James Northam
- Funded by the Club for the benefit of all owners
Forum powah! #1413440
28/02/2013 11:56
Joined: Dec 2005
Posts: 33,792
Berlin
barnacle Offline OP
Club Member 18 - ex-Minister without Portfolio
Forum Demigod
Here's a good one.

I'm looking for a reference corpus of scanned data for OCR testing. It has to be free, it has to be scanned at 200dpi or better (preferably 300dpi), and it has to be available to non-academic researchers like me.

I've found that some of the free newspapers work moderately well, but they all seem to be scanned at too low a resolution to get sane results from OCR, though they can usually be read OK.

I need about a million words, preferably three or four million... that's about five King James Bibles' worth.

Any thoughts?


Don't get no respect! Coupe Fiat 1994-2000 - an owner's guide <-- clicky!
Re: Forum powah! [Re: barnacle] #1413461
28/02/2013 13:54
Lego
Unregistered
You could possibly try car workshop repair manuals.
It might just mean registering at a car club first. I remember downloading gigabytes of scanned repair manuals for one of my old cars years ago.
I'll try to find the site again, if it still exists.

Last edited by Lego; 28/02/2013 13:54.
Re: Forum powah! [Re: barnacle] #1413470
28/02/2013 14:22
Joined: Dec 2005
Posts: 8,671
Lightwater, Surrey
DaveG Offline
Club Treasurer Member 311
Je suis un Coupé
Well what about our own (fccuk) scanned copy of the Coupe workshop manual? Might as well produce something useful at the end of it!


1996 Portofino 20vt & 2000 Pearl White Plus
2008 Ferrari F430 & 2017 Fiat 124 Spider
Re: Forum powah! [Re: barnacle] #1413475
28/02/2013 15:28
Joined: Dec 2005
Posts: 33,792
Berlin
barnacle Offline OP
Club Member 18 - ex-Minister without Portfolio
Forum Demigod

Good thoughts but text is preferable to images.


Re: Forum powah! [Re: barnacle] #1413476
28/02/2013 15:32
Joined: Dec 2005
Posts: 21,934
Aldershot
PeteP Offline
Hon Club Member 005, Membership Secretary
Forum Fossil


16VT and X1/9 1500

We must all do our part for the planet.
I unplugged a row of electric cars that nobody was using.
I even unplugged my own.
Re: Forum powah! [Re: PeteP] #1413482
28/02/2013 16:12
Lego
Unregistered
Needs to be scanned text Pete.

Re: Forum powah! [Re: barnacle] #1413536
28/02/2013 21:09
Joined: Dec 2005
Posts: 8,671
Lightwater, Surrey
DaveG Offline
Club Treasurer Member 311
Je suis un Coupé
There is text in the Coupe workshop manual; pictures too, but still a lot of text ;)


Re: Forum powah! [Re: barnacle] #1413580
01/03/2013 07:33
Joined: Dec 2005
Posts: 33,792
Berlin
barnacle Offline OP
Club Member 18 - ex-Minister without Portfolio
Forum Demigod
But not three million words.

The problem is that the good OCR products are too good: if I just feed one a clean image of a book I get *no* errors in a hundred thousand words. As I'm trying to do some tests that require errors to be there, that's useful but a little pointless...
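For anyone wanting to put a number on "no errors", here is a minimal sketch of one way to count OCR errors against a known-good text. It is not the method used in the post (which isn't described); it assumes a plain word-level edit distance, and the names `edit_distance` and `word_error_rate` are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions and substitutions each cost 1.
    Works on strings (characters) or lists (e.g. words).
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop r from the reference
                            curr[j - 1] + 1,      # extra item h in the OCR output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference_text, ocr_text):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference_text.split()
    ocr_words = ocr_text.split()
    if not ref_words:
        raise ValueError("reference text is empty")
    return edit_distance(ref_words, ocr_words) / len(ref_words)


if __name__ == "__main__":
    truth = "the quick brown fox jumps"
    scanned = "the qu1ck brown fox jumps"
    print(word_error_rate(truth, scanned))
```

The comparison is quadratic in the text length, so on a corpus of millions of words it would have to be run page by page or line by line rather than on the whole text at once.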

I've been playing with a complex but repeatable pre-processing method: turn the text into a PDF, extract the pages from the PDF at a low resolution, save them as JPEGs, and then run the OCR. The scaling and the JPEG artifacts add noise, which introduces some errors into the OCR output.

I guess it will have to do.
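A rough sketch of what the render-and-OCR half of that pipeline could look like. The post doesn't name any tools, so everything here is an assumption: it supposes the PDF already exists, that poppler's `pdftoppm` does the low-resolution JPEG rendering, and that the `tesseract` command-line program does the OCR. The function names and the default of 100 dpi are hypothetical.

```python
import subprocess
from pathlib import Path


def render_command(pdf_path, out_prefix, dpi=100):
    """Build the pdftoppm call that renders every page as a JPEG.

    -r sets the render resolution in dpi; -jpeg selects JPEG output.
    A low dpi is what degrades the page images.
    """
    return ["pdftoppm", "-r", str(dpi), "-jpeg", str(pdf_path), str(out_prefix)]


def ocr_command(image_path):
    """Build the tesseract call; it writes <output base>.txt next to the image."""
    out_base = Path(image_path).with_suffix("")
    return ["tesseract", str(image_path), str(out_base)]


def degrade_and_ocr(pdf_path, work_dir, dpi=100):
    """Render the PDF at low resolution, OCR each page, return the page texts.

    Assumes pdftoppm and tesseract are installed and on the PATH.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(render_command(pdf_path, work_dir / "page", dpi), check=True)

    texts = []
    # pdftoppm names its output <prefix>-<page number>.jpg
    for image in sorted(work_dir.glob("page-*.jpg")):
        subprocess.run(ocr_command(image), check=True)
        texts.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return texts
```

Because the same PDF, resolution and tools give the same output every time, the degradation stays repeatable, which is the point of the method described above. Lowering `dpi` should increase the error count, though by how much would need to be measured.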


Re: Forum powah! [Re: barnacle] #1413616
01/03/2013 10:30
Joined: Aug 2000
Posts: 9,729
Zele, Belgium
Kayjey Offline
Club Member #10
Je suis un Coupé
I found the ultimate test for OCR was scanning in pages from a book my father wrote on an electronic typewriter 7 years ago. The ink ribbon was past its best, and the tops of the characters were only half or so the density of the bottoms. I resorted to retyping almost everything because the OCR programs really couldn't make much of it. I probably still have the scans. Can you do something with those?


- Kayjey -

Re: Forum powah! [Re: barnacle] #1413648
01/03/2013 12:51
Joined: Dec 2005
Posts: 33,792
Berlin
barnacle Offline OP
Club Member 18 - ex-Minister without Portfolio
Forum Demigod
It would be interesting to find out, though I couldn't use it as a reference for a published paper. You must have an email address for me? Current OCR programs are very, very good...


Re: Forum powah! [Re: barnacle] #1413650
01/03/2013 12:58
Joined: Dec 2005
Posts: 16,603
Corridor of Uncertainty
Jim_Clennell Offline
Forum veteran
Was your father's book written in English, Klaas? Neil, I thought your algorithm only worked on English text?

Re: Forum powah! [Re: barnacle] #1413666
01/03/2013 14:25
Joined: Dec 2005
Posts: 33,792
Berlin
barnacle Offline OP
Club Member 18 - ex-Minister without Portfolio
Forum Demigod
Ah, yes... forgot that. Though it would still be interesting to see what happens. I theorise it would work with Latin languages, less so with French (yes, yes, I know) and German. Though there are enough French and German influences in British spelling that Low German might work...


