Need PDF Editor

Jan 12, 2012
2,113
I'm doing some editing of a story that a friend wrote, and I've run into a technical problem: the PDF is 500+ pages of text, and I have no way to edit it. It won't open in Microsoft Word (which keeps returning errors, regardless of whether I try to open it in Repair or Read-Only mode) or in AbleWord, which just flat-out won't work.

Is there a good program for editing massive text PDFs?

(To answer the obvious question: it's a surprise clean-up of a story he wrote a few years ago, so I can't ask him what program he originally used to write it.)
 

Frezzato

New member
Oct 17, 2012
2,448
You might want to start here [http://www.pcadvisor.co.uk/how-to/software/how-edit-pdfs-for-free-3460855/] as it's fairly recent info.

I don't recommend any PDF products from Nuance.

I once used a program called Nuance PDF Converter [http://www.amazon.com/PDF-Converter-Professional-5-0-VERSION/dp/B001327K8O/ref=sr_1_22?ie=UTF8&qid=1434990303&sr=8-22&keywords=nuance+pdf], although I can't remember if it was version 5 or 6. I think they stopped numbering it at 8 and moved on to a naming convention like "Power PDF Standard" or something like that.

Nuance PDF 5/6 was excellent, with caveats. When it worked, it was golden, but some PDFs couldn't be edited outright, so you would have to run the OCR and convert it to Word. If the PDF contained anything other than straight paragraphs, the conversion would be filled with all sorts of artifacts and paragraph returns that were made in an effort to match any offset text blocks or captions. It always took so long to correct the botched conversion that I often felt it would be faster to just retype everything from scratch, which I sometimes did (I type 120 wpm on a good day).

I used Nuance PDF 5 at work though, and when I bought my own copy at home I went with the then-latest version, PDF Converter 8. It ran like shit. Oh right, both PDF Converter 5 and 8 have to "phone home" as part of a DRM scheme. PDF 8 was phoning home so much that it killed my already poor internet connection and slowed my PC to a crawl. It was a very stupid experience.

I have a Pro copy of Adobe Acrobat myself, but based on the article above, I suspect that LibreOffice [https://www.libreoffice.org/] is your best (free) choice. The author did note that each line in the conversion gets its own text box, which doesn't sound all that convenient. Still, I would give it a try first.
 

DoPo

"You're not cleared for that."
Jan 30, 2012
8,663
To my knowledge, you can't really edit PDFs. Or rather, some programs can, but that's not how you usually create them; what you normally do is write a normal[footnote]In the sense that "it's not a PDF"; there are multiple available options, though.[/footnote] document and then convert/compile it into a PDF.

LaTeX is one of the most common tools for generating PDFs. It's a markup language, which means you take your raw text, add some markup tags, and that is then used to produce the PDF. Alternatively, and more easily, you can create a different kind of document: word processor documents (.doc, .docx) can be converted straight to PDF from MS Word (or other word processors), and you could also create an HTML page and use a browser to save it as a PDF. There are a few other methods, but these are probably the most common.
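For a sense of what "raw text plus markup tags" looks like, here is a minimal sketch of a LaTeX source file. It assumes a standard TeX distribution (such as TeX Live or MiKTeX) and the stock `article` class; the file name, title and author are placeholders, not anything taken from the PDF in question.

```latex
% story.tex -- hypothetical file name
\documentclass[12pt]{article}  % the class sets the overall page layout

\title{Untitled Story}          % placeholder title
\author{Your Friend}            % placeholder author
\date{}                         % leave the date empty

\begin{document}
\maketitle

\section{Chapter One}
Plain paragraphs need no tags at all; you just type the text.
A blank line starts a new paragraph.

This is a second paragraph with some \emph{emphasised} words.

\end{document}
```

Running `pdflatex story.tex` should produce `story.pdf` in the same folder.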

All in all, the easiest way to edit the thing is to get the source (or just the text) and redo the PDF.

There is some software that allows editing PDFs, but I don't know how good it is; all the PDF editing I've done so far has been filling in forms, not using PDF as a writing tool. And I'm not sure how many, if any, of the available applications would handle a massive file. Dunno, maybe somebody else can help on this front.

If the source is not available, and formatting isn't an issue (it can either be redone relatively easily, or there isn't any, or it's irrelevant), then you could try extracting the text, working on just that, and then (if needed) exporting it to PDF. That would probably be your best bet, if you ask me. There are a few tools that do that, and as long as the PDF is sensible, they can just parse out the text and spit it out in some convenient format. I believe Acrobat Reader can do it, too, but there is also this [http://www.extractpdf.com/], or anything you can find by searching for "extracting text from PDF". Once you have the text, just edit it in anything convenient, be it a text editor or a word processor.

If you find that a tool chokes on the large file, you can slice your PDF up into smaller chunks; this seems like a relatively simple and straightforward way to do it [http://www.wikihow.com/Extract-Pages-from-a-PDF-Document-to-Create-a-New-PDF-Document].
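If you happen to have a TeX distribution installed anyway, the `pdfpages` package offers another way to cut out a chunk. This is a sketch assuming the source file is named `story.pdf` (a hypothetical name) and sits in the same folder. Note that it only copies the pages as they are, so the text in the new PDF is no more editable than before; it just makes the file smaller for tools that choke on size.

```latex
% chunk1.tex -- builds chunk1.pdf from pages 1-100 of story.pdf
\documentclass{article}
\usepackage{pdfpages}   % provides \includepdf

\begin{document}
% pages=1-100 selects the range; use pages=101-200 for the next chunk, and so on
\includepdf[pages=1-100]{story.pdf}
\end{document}
```

Compile with `pdflatex chunk1.tex`, then change the range and file name for each further chunk.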

Finally, it's worth mentioning, although I wouldn't recommend it: you can decompile the PDF. Again, there are a few tool options for this; Google would be your best friend if you want to give it a try. In theory, this takes a PDF, does some magic, and returns the source, which you can edit and then recompile to get a PDF with the exact same layout and formatting. However, a decompiler would give you LaTeX, not the actual file that was used, so depending on how exactly the PDF was created, this technique is not guaranteed to give you really good results. If it was an MS Word document that was converted to a PDF, for example, chances are you're going to get a lot of garbage spat out as the source.
 

DoPo

"You're not cleared for that."
Jan 30, 2012
8,663
Frezzato said:
You might want to start here [http://www.pcadvisor.co.uk/how-to/software/how-edit-pdfs-for-free-3460855/] as it's fairly recent info.
I read that article and I had to laugh at the conclusion:

Article said:
After trying many free and commercial tools it's clear that PDFs simply aren't designed for editing
Wow, jeez, it's almost as if that was clear without even needing to try these tools. Because PDFs really aren't designed to be edited. That is an actual part of their design, yes. That is, in fact, a very prominent part of their design; it's pretty much a big part of their purpose. The intended lifecycle of a PDF document is: 1. Create. 2. Distribute. 3. Done.

The PDF editors do something that I can only assume can be described as "horror" in order to offer the editing functionality, hence it's doable but not necessarily pretty.
 

Frezzato

New member
Oct 17, 2012
2,448
DoPo said:
Heh, yes, this is true. And in retrospect, I remember now what I was using Nuance for, and it wasn't always converting entire documents, which, again, were usually a nightmare to convert. My manager sometimes had me "correcting" PDFs by changing a few things here and there because I was the only one who understood how to use it. It was nothing illegal, mind you, but it was done to bring our documents up to the level required by the customer, which was usually some draconian standard dictated by a petty despot. It's best not to dwell on details, though.

I will say, however, that unless a strict chain of custody is maintained, and a PDF is produced with all possible anti-tampering measures in place, the PDF is not truly secure. You would be surprised what can be accomplished with a PDF program like Nuance and MS Paint.
 

DoPo

"You're not cleared for that."
Jan 30, 2012
8,663
Frezzato said:
I will say, however, that unless a strict chain of custody is maintained, and a PDF is produced with all possible anti-tampering measures in place, the PDF is not truly secure. You would be surprised what can be accomplished with a PDF program like Nuance and MS Paint.
I'm not talking about security - the availability of PDF editors would prove that it's not supposed to be a secure format by default. It's just not designed to be editable, which is also easily proven by the abundance of PDF editors which work to varying degrees and in different ways. Also, since it's just the standard.

Yes, they can be edited, but that comes down to either manipulating the raw PDF, which is messy and horrendous, or converting it to a different format internally and recompiling it back into a PDF, which is also messy and horrendous, though possibly to a smaller extent.
 
Jan 12, 2012
2,113
Thanks for the help, guys. It resisted being converted to .doc and .docx, but hopefully LibreOffice or that text extractor will work.

I had no idea that PDFs were so heinous to work with. Related question: Is there a format that researchers, copyeditors, etc. use when they need to handle large documents? I've written a bit in the past and I've noticed that Word starts acting up once you pass a certain threshold, plus it's a pain if you have any images in the document.
 

Frezzato

New member
Oct 17, 2012
2,448
DoPo said:
Frezzato said:
I will say, however, that unless a strict chain of custody is maintained, and a PDF is produced with all possible anti-tampering measures in place, the PDF is not truly secure. You would be surprised what can be accomplished with a PDF program like Nuance and MS Paint.
I'm not talking about security - the availability of PDF editors would prove that it's not supposed to be a secure format by default. It's just not designed to be editable, which is also easily proven by the abundance of PDF editors which work to varying degrees and in different ways. Also, since it's just the standard.

Yes, they can be edited, but that comes down to either manipulating the raw PDF, which is messy and horrendous, or converting it to a different format internally and recompiling it back into a PDF, which is also messy and horrendous, though possibly to a smaller extent.


Yes, that's all true. I have yet to find an ideal program even for converting. OCR technology has come a long way, but not far enough.


Thunderous Cacophony, I think you stand a fighting chance if you're dealing with plain paragraphs. I would start basic.

Depending on how the PDF was created, you might be able to just highlight and select the text. Adobe provides a free Acrobat Reader [https://get.adobe.com/reader/] (beware any stupid additional program installs). If you can highlight text, then you might be able to just 'select all' and copy. For a 500-page document, this might take a while to copy to the clipboard. If you have MS Word (really any word processor), you should be able to paste the entire doc. The result might not be ideal, as Word may struggle to match the typography.

I would try the free Acrobat reader before trying another program. All you really need (if free Acrobat doesn't work) is an editor that has the ability to highlight/copy/paste from the PDF to do the above steps. Well, that and have the ability to save a Word doc as a PDF once you're done. I say skip the "PDF editing" portion and just grab the text.

Again, I'm using Acrobat Pro, but I just copied a PDF book into Word and, while it isn't exactly pretty, it did the job. I believe there's a function within Acrobat Pro that lets an author block even copying, but let's hope your friend didn't know about stuff like that.
 

DoPo

"You're not cleared for that."
Jan 30, 2012
8,663
Thunderous Cacophony said:
I had no idea that PDFs were so heinous to work with. Related question: Is there a format that researchers, copyeditors, etc. use when they need to handle large documents? I've written a bit in the past and I've noticed that Word starts acting up once you pass a certain threshold, plus it's a pain if you have any images in the document.
Yep, they use PDF.

Or, rather, LaTeX, which they then compile into a PDF. Academics do use it a lot, especially in the science fields. In theory, it's not that bad, you do something like

\section{Chapter One}
My paragraph text goes here.
and then compile it and you get a paragraph of your text. That paragraph then obeys whatever text flow rules you set up: say, you may want two (or three) columns per page; you can make each line end on a whole word, or allow words to be broken; you can specify how the text flows around images (for example, it may not flow around them at all, or the image can always sit on the left); you can even start each paragraph in a fancy way (say, with a giant calligraphed first letter, or just slightly indented).
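A small sketch of a few of those flow rules, assuming the widely available `lettrine`, `wrapfig` and `lipsum` packages (`lipsum` only supplies filler text for the demo). Ordinary paragraphs are just text separated by blank lines; the rules live in the preamble and apply to all of them. The grey box is a placeholder standing in for a real image.

```latex
\documentclass{article}   % \documentclass[twocolumn]{article} gives two columns per page
\usepackage{lettrine}     % drop caps: a large first letter
\usepackage{wrapfig}      % lets text flow around a figure
\usepackage{lipsum}       % filler text for this demo only

\setlength{\parindent}{1.5em} % indent the first line of each paragraph
\hyphenpenalty=10000          % effectively forbid hyphenation so lines end on whole words
\exhyphenpenalty=10000        % same for breaks after an explicit hyphen

\begin{document}

\lettrine{O}{nce} upon a time, this paragraph started with a drop cap.
\lipsum[1]

\begin{wrapfigure}{l}{0.4\textwidth} % l = keep the figure on the left
  \centering
  \rule{0.35\textwidth}{3cm}         % placeholder box instead of a real image
  \caption{A placeholder figure.}
\end{wrapfigure}
\lipsum[2-3]

\end{document}
```

Forbidding hyphenation outright can leave some lines looser than usual; that is the trade-off for lines that always end on a whole word.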

You can also add other markup that would decorate all your pages, say, with a border, or a header, or a footer (or any combination thereof).
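For the header and footer part, a common approach is the `fancyhdr` package. This is a sketch with placeholder header text; borders would need a different package and are not shown here.

```latex
\documentclass{article}
\usepackage{fancyhdr}

\pagestyle{fancy}                    % use the custom header/footer on every page
\fancyhf{}                           % clear all default header and footer fields
\fancyhead[L]{Untitled Story}        % left header: placeholder title
\fancyhead[R]{Draft}                 % right header: placeholder text
\fancyfoot[C]{\thepage}              % centred footer: page number
\renewcommand{\headrulewidth}{0.4pt} % thin rule under the header

\begin{document}
\section{Chapter One}
Text goes here.
\end{document}
```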

You can also define things that aren't paragraphs, like citation blocks or anything else you can think of (citation references, example sections, tooltip boxes, info segments, just to name a few possibilities). You can make up your own markup, too, for anything custom you want.
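Making up your own markup usually means `\newcommand` (for inline tags) and `\newenvironment` (for blocks). The names `\charname` and `infobox` below are made up for illustration; nothing here requires extra packages.

```latex
\documentclass{article}

% Hypothetical custom command: character names set in small caps
\newcommand{\charname}[1]{\textsc{#1}}

% Hypothetical custom block for an "info segment"
\newenvironment{infobox}
  {\begin{quote}\small\textbf{Note:}\ } % inserted at \begin{infobox}
  {\end{quote}}                         % inserted at \end{infobox}

\begin{document}
\charname{Mara} opened the door.

\begin{infobox}
This paragraph is set apart as an info segment.
\end{infobox}
\end{document}
```

The benefit is that changing the definition once restyles every use in the document.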

You can not only do text control and styling; the LaTeX processing can also give you dynamic[footnote]-ish; it's done once per compilation[/footnote] content, such as a table of contents (same as with a Word document, it just takes the headings and such), image insertion (you mark images up with just a reference and the engine inserts them into the document) or even references. That last one is quite cool: you keep a completely separate file of references, in which you describe each source (authors, publishing year, ISBN, yadda yadda, plus a simple name), and in the text you just reference the simple name. The LaTeX processor then inserts the reference in the correct format in the text and lists all of the references after the document, also in the format you want. If you want a reference to show up as "[1]" with a matching numbered entry at the bottom, that's what you get; if you then want to change every reference in the document to a different format, with the list at the end formatted to match, you just recompile with that option instead of changing everything by hand.
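A sketch of the table of contents, image, cross-reference and separate-references-file workflow. It assumes a LaTeX release from 2019 or later (for `filecontents` with the `overwrite` option, used here only to keep the example in one file) and the `mwe` package, which ships the stand-in image `example-image` in most TeX distributions. The bibliography entry is invented for illustration.

```latex
\begin{filecontents*}[overwrite]{refs.bib}
@book{example-source,
  author    = {A. N. Author},
  title     = {A Hypothetical Book},
  publisher = {Example Press},
  year      = {2010}
}
\end{filecontents*}
\documentclass{article}
\usepackage{graphicx}            % provides \includegraphics

\begin{document}
\tableofcontents                 % generated from the \section headings

\section{Introduction}\label{sec:intro}
As described in a made-up source~\cite{example-source}, the
figure in Section~\ref{sec:intro} is Figure~\ref{fig:demo}.

\begin{figure}[h]
  \centering
  \includegraphics[width=0.5\textwidth]{example-image} % swap in your own image file
  \caption{A figure inserted by reference.}
  \label{fig:demo}
\end{figure}

\bibliographystyle{plain}        % try alpha or abbrv, then recompile to restyle every citation
\bibliography{refs}              % reads refs.bib
\end{document}
```

With the file saved as, say, `demo.tex`, the usual build is `pdflatex demo`, `bibtex demo`, then `pdflatex demo` twice more so the citations, references and table of contents all resolve.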

And there is more. It's not entirely dissimilar to HTML or forum markup, but it's quite a bit more powerful. It's not the same, mind you, but it has the same idea and a somewhat similar execution; it is, after all, a markup language.

As I said in the beginning, though, that's the theory. In practice, you might find some parts of LaTeX to be a black art. It's not that bad for the most part, as you can just use (or create, if you wish) a template and stick to that, but if you want to do something fancy without fully understanding how, it may require a goat sacrifice and/or black candles and chants.
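One way to "create a template and stick to it" is to put your house style in a small package of your own and load it from every document. This is a sketch under the same assumption as above (LaTeX 2019 or later for `filecontents`, used only to keep it in one file); the package name `mystory` is hypothetical.

```latex
\begin{filecontents*}[overwrite]{mystory.sty}
% mystory.sty: hypothetical house-style package, written once and reused
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{mystory}
\RequirePackage{fancyhdr}      % headers and footers
\RequirePackage{lettrine}      % drop caps
\setlength{\parindent}{1.5em}  % paragraph indentation
\pagestyle{fancy}
\fancyhf{}
\fancyfoot[C]{\thepage}        % page number centred in the footer
\end{filecontents*}
\documentclass{book}
\usepackage{mystory}           % every document loading this gets the same look

\begin{document}
\chapter{Chapter One}
\lettrine{T}{he} story starts here.
\end{document}
```

Keeping the fiddly setup in one file means the actual writing only ever touches plain text and a handful of tags.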

Still, it's better for editing large documents since, as you've seen, Word tends to choke badly on them. Heck, Word is better about it today: in the past, there was some arbitrary but strict limit (say, 37 pages; consistent but weird and not generally advertised) beyond which Word would completely crap out. Corruption of the document was not uncommon, often with bizarre effects; say, you might be able to save, but never open the file again. Nowadays it tends to be a tad more graceful when it starts dying in agony from having too much content. It also tends to handle it better, if I recall correctly, in that it doesn't really die; it just stays in the process of dying in (slightly less) agony.

I do recommend giving it a look; I just want to warn you that it's not as user-friendly as Word and has a learning curve.
 

Smooth Operator

New member
Oct 5, 2010
8,156
Sadly, PDF remains one of the many relics of weird shit: every piece of software can display it properly, unlike every other text document format in this universe, but almost none can edit it properly because everyone converts documents to PDF differently.
On top of that, if the author set the file to secure, you have no chance of ever getting the contents out directly; you are stuck with screenshots, which are of no use for editing.

You can also try Foxit Reader; it's very fast and compact, so you can put it to use quickly, but I've had no more luck getting it to edit than with anything else.

Edit: Professionally distributed works are usually PDFs because they always appear as they were created: whether they came from 50 years ago or will be read 50 years in the future, you can count on the formatting to remain as intended. Also, you aren't expected to hand out an easy-to-manipulate file, because anyone could change your work and send it on with your signature still there; the advised way to distribute serious work is to set your PDF to read-only, so its integrity remains no matter how it gets distributed.
Internally you can use whatever you want as long as it keeps working, but it's never smart to make a thousand-page document, because it's a ***** to open/edit/save: you are dragging all of the data along for the ride, and should anything go wrong with the file, the entire thing is fucked.
I suggest OpenOffice with its open file format, because those guys work towards keeping their stuff working across all platforms and versions at all times, and you don't need their software to open it again, since it's a widely used format. Even MS got forced into supporting the format, but of course they made sure it will never convert properly on their stuff.