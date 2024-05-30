This is an overview article, not a comparison of all the different programs available for editing PDF documents. We’ll go through the available free options and touch on data security, but first, let’s outline the main problems.

The PDF format itself is quite versatile; it can be used as a container to store a wide variety of information: videos, raster and vector objects, fonts, and even other documents. A PDF can be digitally signed or password protected. Depending on the need, a document can also show either the scanned image or a layer of recognized text. What many users lack, however, is the ability to make changes.

Test document

We picked the file Welcome_PDF.pdf for the demonstration. It looks simple, has only five pages, and takes up 154 kilobytes. But it has a rich design, with pictures wrapped by text and embedded fonts. This file is the welcome message shown when you first start Adobe Acrobat, so you can study it yourself if you want.

To have a scan to work with, we printed the original file and scanned it, which gave us Welcome_PDF_scan.pdf. At 600 DPI, the scanned version takes up 20 megabytes. This is quite a lot for a text document, but it only emphasizes the problem. MFPs usually offer to save a document at 400-600 DPI as a TIFF or PDF file. Both formats can store many pages in a single file, but both require specific tools for any further work with the document. And there is real demand for editing scanned documents: removing blank pages, putting pages in the right order, or reducing the file size (for example, when you need to submit a scan to a government agency that only accepts files up to 1-2 MB).
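
To see why the scan is so large, it helps to estimate the raw pixel data behind it. The sketch below assumes a US Letter page (8.5 × 11 inches) and 24-bit color; neither is stated above, and the function name is purely illustrative.

```python
def raw_scan_size(dpi, pages, width_in=8.5, height_in=11.0, bytes_per_pixel=3):
    """Return (width_px, height_px, total_bytes) for an uncompressed scan.

    The page size and color depth are assumptions, not values from the test file.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bytes = width_px * height_px * bytes_per_pixel * pages
    return width_px, height_px, total_bytes


w, h, size = raw_scan_size(dpi=600, pages=5)
print(f"{w} x {h} px per page, {size / 1_000_000:.0f} MB uncompressed")
```

Under these assumptions, five pages hold roughly 505 MB of raw pixels, so the 20 MB file is already heavily compressed. Halving the resolution cuts the pixel count to a quarter, which is why the DPI setting matters so much for the final size.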

IrfanView as a Swiss Army knife

The program is free for everyone except commercial organizations. It is constantly evolving, has many additional plugins, supports a large number of graphic formats, and offers good automation tools. With this program you can:

scan documents in batches (via USB);

disassemble a pdf document into separate pages;

edit or delete some pages;

adjust contrast and gamma, add filters;

reduce size in batches, increase sharpness;

collect individual pages into a pdf.

We installed the program iview467_x64 and the official plugins iview467_plugins.

You can also install the Tesseract text recognition module (select the required languages when installing it).

There’s also an add-on called Ghostscript. It opens vector formats quickly, and when converting pdfs, it gives slightly better quality than the standard plugin.

After installation, go to the settings and check that the plugin was detected automatically in [Plugins] and [Ghostscript Plugin Options]. If you have to process documents full of text, tables, and graphics, it is better to set the [Set DPI] parameter to 120×120 or even 160×160. After that, go to [PDF Options] and disable the internal plugin.

Open the Welcome_PDF_scan.pdf file in IrfanView and first go to the [Edit Multipage PDF] tab, which demonstrates quick and easy editing of a pdf document. Here, you can only reorder pages or delete unnecessary ones.

Now we go to [Multipage Images] and see a rather laconic export window. In the jpg settings, choose a quality of about 85-90. Uncheck the options for keeping EXIF, IPTC, and XMP data so that no accompanying metadata adds to the size. We get the pages in jpg format.
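
The same page export can be sketched outside the GUI by calling Ghostscript directly. This is an alternative route, not what IrfanView runs internally. The executable name (`gs`; on Windows it is typically `gswin64c`), the output file pattern, and the helper names are assumptions; the resolution and quality values reuse the numbers from the text.

```python
import subprocess


def gs_pdf_to_jpg_cmd(pdf_path, out_pattern="page-%03d.jpg", dpi=160,
                      quality=85, gs_exe="gs"):
    """Build a Ghostscript command that renders each PDF page to a JPEG."""
    return [
        gs_exe,
        "-dNOPAUSE", "-dBATCH", "-dQUIET",   # run non-interactively
        "-sDEVICE=jpeg",                     # one JPEG per page
        f"-r{dpi}",                          # rendering resolution in DPI
        f"-dJPEGQ={quality}",                # JPEG quality, 0-100
        f"-sOutputFile={out_pattern}",       # %03d becomes the page number
        pdf_path,
    ]


def export_pages(pdf_path, **options):
    """Run the export; requires Ghostscript to be installed and on PATH."""
    subprocess.run(gs_pdf_to_jpg_cmd(pdf_path, **options), check=True)


cmd = gs_pdf_to_jpg_cmd("Welcome_PDF_scan.pdf", quality=90)
print(" ".join(cmd))
```

Building the command separately from running it makes it easy to inspect the exact arguments before anything is written to disk.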

Now go to [Multipage Images] again, select the pages you need, and leave Compression at 95. The result is an image.pdf file with a size of 497 kilobytes.

What else can be done? You can reduce the size by using additional batch actions. Select [File] and add the files you want to process. Then, in Advanced, you can set a new size, add contrast, crop, and much more.

You can even add some of the supported filters if you find them on the Internet and, for example, batch reduce noise. Save the result with jpg quality at 80, and when creating a pdf, select compression at 80 units. We get the image2.pdf file with a size of 335 kilobytes.
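
If Ghostscript is installed anyway, a scanned PDF can also be shrunk in one step, without splitting it into images first. The sketch below is a different technique from the IrfanView batch workflow described above: it re-writes the PDF through Ghostscript's `pdfwrite` device with one of its built-in quality presets. The helper name and the output file name are hypothetical.

```python
GS_PRESETS = {"/screen", "/ebook", "/printer", "/prepress", "/default"}


def gs_shrink_pdf_cmd(src, dst, preset="/ebook", gs_exe="gs"):
    """Build a Ghostscript command that re-writes a PDF with a quality preset."""
    if preset not in GS_PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        gs_exe,
        "-dNOPAUSE", "-dBATCH", "-dQUIET",   # run non-interactively
        "-sDEVICE=pdfwrite",                 # output is a new PDF
        f"-dPDFSETTINGS={preset}",           # preset controls image downsampling
        f"-sOutputFile={dst}",
        src,
    ]


print(" ".join(gs_shrink_pdf_cmd("Welcome_PDF_scan.pdf", "scan_small.pdf")))
```

The resulting size depends on the document, so the output should be compared visually with the source, just as with the IrfanView results.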

Now let’s compare the files with the original. The leftmost fragment has more contrast and takes up the least space, but artifacts are already visible. Next to it is a version that looks quite similar to the scan. Then comes the scan, and then the original document itself. All of this is pretty fast, with many options for reaching the desired size.
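
The file sizes quoted so far can be put side by side with a few lines of arithmetic. In this sketch, the 20 MB scan is taken as 20,000 KB, and the 1 MB upload limit is a hypothetical value picked from the 1-2 MB range mentioned earlier.

```python
# Sizes in kilobytes, as reported in the article (the scan is approximate).
sizes_kb = {
    "Welcome_PDF.pdf (original)": 154,
    "Welcome_PDF_scan.pdf (600 DPI scan)": 20_000,
    "image.pdf (compression 95)": 497,
    "image2.pdf (compression 80)": 335,
}
LIMIT_KB = 1_000  # hypothetical upload limit of about 1 MB


def summarize(sizes, baseline_key, limit_kb):
    """Return (name, size_kb, times_smaller_than_baseline, fits_limit) rows."""
    baseline = sizes[baseline_key]
    return [
        (name, kb, round(baseline / kb, 1), kb <= limit_kb)
        for name, kb in sizes.items()
    ]


rows = summarize(sizes_kb, "Welcome_PDF_scan.pdf (600 DPI scan)", LIMIT_KB)
for name, kb, ratio, fits in rows:
    print(f"{name:38} {kb:>6} KB  {ratio:>6}x smaller  fits: {fits}")
```

Both processed files come out dozens of times smaller than the scan and fit under the assumed limit, while the untouched scan does not.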

Firefox for reviewing pdf

What else can you use to edit a pdf file? Take Firefox! Yes, this browser can do more than just view. The idea of editing is to create a layer over the original document. On the one hand, it won’t help you correct a typo or add a page. On the other hand, it is a universal way to review a document, add a picture, or paint over an unnecessary element.

We open both the document and the scan and edit each of them. The following tools are available:

“Marker” – on a scan, it works like a real marker pen; in a text document, you can mark freehand or exactly along the text.

“Text” – you can choose the size and color of the text, but not the font.

“Pencil” – you can choose the thickness, color, and transparency.

“Add image” – even avif and webp are supported. Added images can be stretched.

When editing, there is no way to undo the last action, but all the changes made are separate objects, each of which can be moved or deleted.

After saving, the file size changes by only several kilobytes, but when you reopen the file in Firefox, the new objects can no longer be deleted or edited.

Adobe Acrobat as an editor

If you’ve been using this program for many years, you’ll remember some of the renaming. Adobe used to release the free viewer as Adobe Reader and the paid editor as Adobe Acrobat. Now the viewer is called Adobe Acrobat and the editor is called Adobe Acrobat Pro. The paradigm has changed again, and the viewer itself can do some editing.

Let’s install Adobe Acrobat v24.2 and open the two files that have already been edited in Firefox. We can see how many edits were made to each page, and we can also write a response to each change. It is also possible to edit or delete previously made changes.

Adobe Acrobat itself offers more tools than Firefox. It is possible to fill out questionnaires (you can add a cross, check mark, or dot). You can also add primitives in the form of a line, circle, or rectangle, and use them to cover up unnecessary text or create an improvised table. But of course, you can’t edit the original text.

It’s annoying that some tools turn out to be just an activation window for the paid version. I was also surprised that there is no way to add a graphic object to the document; you can only attach an arbitrary file at the place you need.

After saving, the file size also changes by several kilobytes, so you don’t have to worry about getting a “bloated” document.

Full pdf editing (almost)

For the tests, we’ll only need Welcome_PDF.pdf, as there’s little point in editing a scanned document.

LibreOffice

We install LibreOffice v24.2. In this package, pdf files are opened by LibreOffice Draw, which does a good job of editing simple files. Due to the specific structure of pdf files, we cannot select a paragraph, because each line is a separate object. But we can make changes to a line. You can also move or delete other objects.

Make changes to the document and save it via [File] > [Export as PDF].

Microsoft Office

Now let’s open the file in Microsoft Office 365, see the error, and close it.

For a test, this is not a big problem, because you know in advance what will happen next. After editing, both office suites save the file using their own fonts. Therefore, if you use them for editing, be sure to compare the result with the original and look for possible formatting problems.

What can online services do?

There are a lot of different online services, and you can find free options for recognition and translation, or just conversion to other formats.

If you work for a large company, such services are most likely either technically blocked or prohibited by policy. For everyone else, a warning: data from your documents can be used against you or your company. Ranking all services by level of danger was not the goal of this article. However, the author has encountered schemes in which unknown persons, armed with a lot of information from documents processed online, posed as a partner or a bank and very convincingly asked for a money transfer.

Stirling Tools

Let’s take a look at one open-source service that can be installed locally in a company – Stirling-Tools.

The service has so many features that listing them would take a separate article; it gathers almost every pdf function in one place. It can recognize text, number pages, convert to other formats, and much more. Let’s turn a pdf into a doc: this service, too, has problems with the formatting of our document. This does not happen every time, though, and other documents may convert fine.

Google Docs

For comparison, the files were also edited in Google Docs. This service recognized Welcome_PDF_scan.pdf and converted Welcome_PDF.pdf to odt format (because the docx file it created could not be opened by Microsoft Word). What can we say? The formatting in the created document is also not very good. We won’t attach screenshots with all the errors, but imagine a document that has grown from five pages to seven! However, if you process a document without embedded fonts and without text wrapping around images, the result will improve in all editors.

How to get only text without formatting?

For a text document:

LibreOffice Draw

File > Export. When saving, select Htm. Open in a browser, copy the text.
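
Instead of copying from the browser by hand, the text can be pulled out of the exported file with a short script. This is a minimal sketch using only Python's standard library; the class and function names are made up for illustration, and the sample markup is invented rather than taken from a real LibreOffice export.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text of an HTML document, one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = ("<html><head><style>p{color:red}</style></head>"
          "<body><p>Welcome to</p><p>Adobe Acrobat</p></body></html>")
print(html_to_text(sample))
```

Because each line of a pdf arrives as a separate object, the extracted text is likely to come out line by line, so paragraphs may need to be rejoined afterwards.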

Adobe Acrobat

[Menu] > [Save as …] > [save as *.txt], but this depends on the file’s security settings. Sometimes you can just select the text and copy it to the clipboard.

For a scanned document:

Google Docs

Save the scan to Google Drive and open it in Google Docs. A recognized Welcome_PDF_scan.pdf appears along with the Welcome_PDF_scan

Tesseract

This OCR tool can be attached to IrfanView or Stirling Tools. In IrfanView, it works like this:
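
Independently of the IrfanView dialog, Tesseract can also be called on its own from the command line. The sketch below only builds the command; it assumes that `tesseract` is on PATH, that the input is a page image (Tesseract reads image files, not pdfs), and that the listed language packs are installed. The file names and the helper name are hypothetical.

```python
def tesseract_cmd(image_path, out_base, langs=("eng",),
                  searchable_pdf=False, exe="tesseract"):
    """Build a Tesseract command for one page image.

    Without extra arguments Tesseract writes <out_base>.txt; adding the
    "pdf" config makes it write a searchable <out_base>.pdf instead.
    """
    cmd = [exe, image_path, out_base, "-l", "+".join(langs)]
    if searchable_pdf:
        cmd.append("pdf")
    return cmd


print(" ".join(tesseract_cmd("page-001.jpg", "page-001", langs=("eng", "deu"))))
print(" ".join(tesseract_cmd("page-001.jpg", "page-001", searchable_pdf=True)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once Tesseract is installed.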

What we have

As you can see, the browser is good for reviewing, the official viewer for filling out questionnaires, and LibreOffice can even edit documents with an express prohibition on editing. We hope you learned something new and that you can keep your data safe. Write in the comments about your own experiences and life hacks!