Rhadon - 24-11-2002 at 17:19
Hi
I'm currently scanning some chemistry books and will upload them for you once they are done. But I can't decide whether to run OCR on them or publish
them as PDF files made up of bitmap images.
The OCR route takes significantly longer and is much more work, but the result looks better and the file size is smaller.
The bitmap version can be done much faster, but it will not look as good.
Some of the books will be OCR'ed, some won't. But there are still a few that I'm not sure what to do with, so please help me decide.
raistlin - 24-11-2002 at 17:36
Hey Rhadon, do an OCR. Drop me a U2U if you want any help, I think my dad might have some OCR software around here somewhere...
Rhadon - 24-11-2002 at 17:48
Thank you for offering your help, Raistlin. Unfortunately the OCR itself is not the most laborious part: proofreading the text, correcting
mistakes, positioning images, labelling images and giving the document a nice layout are far more work.
Thank you also for offering software, but I already have FineReader 6.0, which I suppose is the best program for this job.
how about this
Polverone - 25-11-2002 at 00:49
I notice that archived electronic journal articles from the ACS are bitmap scans of the original paper but also include an OCR version somehow embedded
in the same file beneath the bitmap. The OCR information that is included has not been hand-corrected - so it has a rather high error rate - but it is
still very useful if you're just looking for certain words in the text. The bitmap version, of course, retains all the diagrams and formulae that OCR
wouldn't handle very well. Can you do this? It won't save space but it will make the texts more useful than plain bitmaps and a lot faster to process
than complete human-proofed OCR.
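
To illustrate why uncorrected OCR text is still worth having for searches, here is a minimal sketch in Python. It is not how the ACS files or any PDF reader work; it only shows that a tolerant word match can still find a term that the OCR has misread. The function name `fuzzy_find`, the 0.8 cutoff and the sample sentence are all assumptions made up for this example.

```python
import difflib
import re


def fuzzy_find(ocr_text, query, cutoff=0.8):
    """Return the words in ocr_text that closely resemble query.

    cutoff is difflib's similarity threshold (0..1). The value 0.8 is an
    arbitrary choice for this sketch, not a tuned setting.
    """
    # Keep digits inside tokens, because OCR often substitutes them for
    # letters (0 for o, 1 for l).
    words = set(re.findall(r"[A-Za-z0-9]+", ocr_text.lower()))
    matches = difflib.get_close_matches(query.lower(), words, n=10, cutoff=cutoff)
    return sorted(matches)


# Hypothetical uncorrected OCR output with typical misreads.
page = "The n1tration of to1uene with mixed acid gives nitrotoluene isomers."
hits = fuzzy_find(page, "nitration")
print(hits)
```

An exact search for "nitration" would miss this page entirely; the tolerant match still finds the misread word, which is the practical benefit of keeping the raw OCR layer under the bitmap.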
Rhadon - 25-11-2002 at 11:01
The books that will be published as bitmaps will most likely be processed with Adobe Acrobat Capture. I haven't had time to test it
yet, but it should create a bitmap PDF file as you described it (text "beneath" the bitmap).
The same system was used for many eBooks on mathematics that I own, and they have a very low error rate (except for the special characters)! I
think that this depends strongly on the resolution of the source images, which will be 600 DPI in most cases.
Rhadon - 26-11-2002 at 16:24
Damn, I have some severe problems with Acrobat Capture. I have tested two releases now, and neither of them works properly. Some of the problems are
the same in both versions, others occur in only one of them.
At the very least, this will delay the release date of the books.
Blind Angel - 28-11-2002 at 18:56
Stupid question, but what is OCR?
Rhadon - 29-11-2002 at 08:58
Blind Angel: It's not a stupid question. OCR stands for "Optical Character Recognition". Here is what it is and why we do it:
When you scan a text, you get an image at first. The image shows the text page exactly as it appears in the book. When I was talking about a
bitmap version, I meant that those images would be put together into a PDF file. Unfortunately, such PDF files are either quite
large or hard to read because of poor image quality. Usually you are also unable to search the images the way you can search ordinary text (which can
be a great disadvantage if you are looking for a particular piece of information).
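
To put rough numbers on the size problem, here is a back-of-the-envelope sketch in Python. The 6 x 9 inch page size is an assumption (it is not the size of any particular book here), and the figures are for uncompressed images; a real PDF compresses them, so treat these as upper bounds only.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Assumed page size of 6 x 9 inches.
for dpi in (150, 300, 600):
    mono = raw_page_bytes(6, 9, dpi, 1)  # 1-bit black and white
    gray = raw_page_bytes(6, 9, dpi, 8)  # 8-bit grayscale
    print(f"{dpi} DPI: {mono / 1e6:.2f} MB mono, {gray / 1e6:.2f} MB gray")
```

Doubling the resolution quadruples the pixel count, which is why a scan that is sharp enough to read comfortably gets large so quickly.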
So, we use an OCR program, such as "Abbyy FineReader", which is able to recognize the characters in the image and enables you to export the
whole text thus obtained. Since there are always some characters that are not recognized correctly (e.g. the small 'l' looks quite similar to the
capital 'I', and 'O' to zero), the text has to be proofread for mistakes. If it contains images, they will have to be placed in appropriate positions
in the text.
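
As an illustration of the look-alike problem, here is a small Python sketch that could help with proofreading. It is not what FineReader does internally. It collapses the characters that are easily confused into one representative each, and then checks whether an unknown token would match a known term once the look-alikes are ignored. The vocabulary and the sample tokens are hypothetical.

```python
# Collapse look-alike characters to one representative each:
# zero -> capital O, capital I -> small l, digit one -> small l.
_SKELETON = str.maketrans({"0": "O", "I": "l", "1": "l"})


def skeleton(token):
    """Return token with look-alike characters made indistinguishable."""
    return token.translate(_SKELETON)


def suggest_fixes(tokens, vocabulary):
    """Map each probably misread token to the known term it resembles."""
    by_skeleton = {skeleton(term): term for term in vocabulary}
    fixes = {}
    for tok in tokens:
        if tok in vocabulary:
            continue  # already a known term, nothing to fix
        match = by_skeleton.get(skeleton(tok))
        if match is not None:
            fixes[tok] = match
    return fixes


# Hypothetical word list and OCR output.
vocab = {"HNO3", "flask", "Iodine"}
ocr_tokens = ["HN03", "f1ask", "lodine", "HNO3", "100"]
fixes = suggest_fixes(ocr_tokens, vocab)
print(fixes)
```

A check like this only catches misreads of terms that are already in the word list, so it reduces the proofreading work rather than replacing it.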
bitmaps
blazter - 30-11-2002 at 09:00
Personally, I think bitmaps are the way to go. Just make sure that they can be extracted from the PDF or whatever format they are put in. That way,
if someone is ambitious enough, they can do the OCR themselves and proof it against the bitmaps that they have.
Sometimes chemistry books are better in bitmap format because of the images. But they would be even better if they were converted to an easily
printed format like PDF or even HTML which could be searched easily.
Blind Angel - 30-11-2002 at 11:58
That answers more questions than I thought. I always wondered how they made these e-books you can find on the net.
Rhadon - 1-12-2002 at 09:45
It wasn't until today that I realized that FineReader offers an option to do exactly what I wanted to do with Acrobat Capture. The release of
the first book is coming closer.
Announcement: The first book is done
Rhadon - 1-12-2002 at 15:08
Those who have access to EliteForums FTP will be able to download "Nitration and aromatic reactivity" by J. G. Hoggett in an hour or so.
I will upload it to some webspace which can be accessed by everyone when I can find the time to do so. Anyway, I'd be glad if someone else could do
that since I'm quite busy.
Eliteforum - 2-12-2002 at 18:27
I might set up Apache (or IIS depending) so people can upload via FTP and download via HTTP.
If it sounds like a good idea, drop me a line.
The only downside is that we may have a lot of people downloading/leeching, and speeds may be affected.
Rhadon - 2-12-2002 at 23:25
The idea is a good one, but we must find a way to keep the leechers out. Password protection would be nice, but things like that are tricky.
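
For what it's worth, here is a minimal sketch of the simplest kind of password check, HTTP Basic authentication, written in Python. It is not a configuration for Apache or IIS, and the user name and password are made up. Note that Basic authentication only base64-encodes the credentials, so without an encrypted connection they travel in what is effectively clear text.

```python
import base64
import hmac


def check_basic_auth(header, user, password):
    """Return True if an HTTP Authorization header carries the expected
    Basic credentials, False otherwise."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[6:], validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes alike.
        return False
    expected = f"{user}:{password}"
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(decoded.encode("utf-8"), expected.encode("utf-8"))


# Hypothetical credentials, for illustration only.
header = "Basic " + base64.b64encode(b"reader:s3cret").decode("ascii")
print(check_basic_auth(header, "reader", "s3cret"))
```

A single shared password like this keeps casual leechers out but does nothing once the password itself gets passed around, which is part of why this is tricky.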