How to Extract and Save Images from a PDF File in Linux

You can easily convert PDF files to editable text in Linux using the “pdftotext” command line tool. However, if there are any images in the original PDF file, they are not extracted. To extract images from a PDF file, you can use another command line tool called “pdfimages”.

NOTE: When we say to type something in this article and there are quotes around the text, DO NOT type the quotes, unless we specify otherwise.

The “pdfimages” tool is part of the poppler-utils package. You can check to see if it’s installed on your system and install it if necessary using the steps described in this article.
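
As a minimal sketch of that check, the snippet below tests whether a tool is on the PATH. The helper name check_tool is made up for this illustration, and the apt-get hint assumes a Debian or Ubuntu system; other distributions use a different package manager and possibly a different package name.

```shell
# Report whether a command-line tool is available on the PATH.
# check_tool is a hypothetical helper name used only for this sketch.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 is installed at $(command -v "$1")"
    else
        echo "$1 was not found"
        return 1
    fi
}

# Assumption: Debian/Ubuntu, where the package is called poppler-utils.
check_tool pdfimages || echo "Try: sudo apt-get install poppler-utils"
```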

To extract images from a PDF file using pdfimages, press “Ctrl + Alt + T” to open a Terminal window. Type the following command at the prompt.

pdfimages /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image

NOTE: For all the commands shown in this article, replace the first path and the PDF filename with the path and filename of your original PDF file. The second path should be the path to the folder in which you want to save the extracted images. The word “image” at the end of the second path is the prefix for the output filenames; replace it with whatever text you want each filename to start with. The images are numbered automatically (000, 001, 002, 003, etc.), and a dash is added between the prefix and the number. In our example, the image filenames are image-000.ppm, image-001.ppm, and so on.
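
The two paths can be wrapped in a small shell function, shown here as a sketch rather than a definitive script. The name extract_images is hypothetical. The function creates the output folder first, on the assumption that pdfimages will not create a missing folder for you.

```shell
# extract_images is a hypothetical wrapper name used only for this sketch.
# Usage: extract_images <pdf-file> <output-folder> [prefix]
extract_images() {
    pdf=$1
    out_dir=$2
    prefix=${3:-image}

    if [ ! -f "$pdf" ]; then
        echo "No such PDF file: $pdf" >&2
        return 1
    fi

    # Create the output folder first in case it does not exist yet.
    mkdir -p "$out_dir" || return 1

    # Files are written as <out_dir>/<prefix>-000.ppm, <prefix>-001.ppm, ...
    pdfimages "$pdf" "$out_dir/$prefix"
}
```

With the paths from our example, you would call it as: extract_images /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages image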

The default image format is PPM (portable pixmap) for non-monochrome images, or PBM (portable bitmap) for monochrome images. These formats are designed to be easily exchanged between platforms.

NOTE: You may get two image files for each image in your PDF file. The second file of each pair is blank, so you can tell which files contain the actual images by looking at their thumbnails in the file manager.
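
If you want to see what pdfimages finds before writing any files, the poppler version of the tool has a “-list” option that prints one row per embedded image. The sketch below wraps it in a hypothetical helper named list_images. The extra blank files may correspond to mask entries in that listing, but that is an assumption, not something the listing confirms by itself.

```shell
# list_images is a hypothetical helper name used only for this sketch.
# Usage: list_images <pdf-file>
list_images() {
    if [ ! -f "$1" ]; then
        echo "No such PDF file: $1" >&2
        return 1
    fi

    # -list prints a table of the embedded images instead of extracting them.
    pdfimages -list "$1"
}
```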

To create .jpg image files, add the “-j” option to the command, as shown below.

pdfimages -j /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image

NOTE: You can also change the default output to PNG using the “-png” option or TIFF using the “-tiff” option.
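
One way to keep the three format options straight is a small lookup function, sketched below. The name format_flag is hypothetical; the flags it prints are the “-j”, “-png”, and “-tiff” options mentioned above.

```shell
# format_flag is a hypothetical helper name used only for this sketch.
# It prints the pdfimages option for a given output format.
format_flag() {
    case "$1" in
        jpg)  echo "-j" ;;
        png)  echo "-png" ;;
        tiff) echo "-tiff" ;;
        *)
            echo "Unknown format: $1" >&2
            return 1
            ;;
    esac
}
```

For example, pdfimages "$(format_flag png)" followed by the two paths from our example requests PNG output.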

The main image file for each image is saved as a .jpg file. The second blank image is still a .ppm or .pbm file.

If you only want to convert images on and after a certain page, use the “-f” option with a number to indicate the first page to convert, as shown in the example command below.

pdfimages -f 2 -j /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image

NOTE: We combined the “-j” option with the “-f” option to get .jpg images, and we do the same with the “-l” option described below.

To convert all images before and on a certain page, use the “-l” (a lowercase “L”, not the number “1”) option with a number to indicate the last page to convert, as shown below.

pdfimages -l 1 -j /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image

NOTE: You can use the “-f” and “-l” options together to convert images in a specific page range in the middle of your document.
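
As a sketch of using both options at once, the function below passes a first and last page to pdfimages after checking that the range makes sense. The name extract_page_range is hypothetical, and the range check is our own addition, not something pdfimages requires.

```shell
# extract_page_range is a hypothetical wrapper name used only for this sketch.
# Usage: extract_page_range <first-page> <last-page> <pdf-file> <output-prefix>
extract_page_range() {
    first=$1
    last=$2
    pdf=$3
    prefix=$4

    # Reject a range whose first page comes after its last page.
    if [ "$first" -gt "$last" ]; then
        echo "First page ($first) is after last page ($last)" >&2
        return 1
    fi

    # -f is the first page and -l the last page to process; -j requests .jpg output.
    pdfimages -f "$first" -l "$last" -j "$pdf" "$prefix"
}
```

For example, extract_page_range 2 4 /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image would process pages 2 through 4 only.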

If there is an owner password on the PDF file, use the “-opw” option followed by the password in single quotes, as shown below. If the password on the PDF file is a user password, use the “-upw” option with the password instead.

NOTE: Make sure there are single quotes around your password in the command.

pdfimages -opw 'password' -j /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image
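
The choice between the two password options can be sketched as a small function. The name extract_with_password and its “owner”/“user” keywords are hypothetical; only the “-opw” and “-upw” options come from pdfimages itself.

```shell
# extract_with_password is a hypothetical wrapper name used only for this sketch.
# Usage: extract_with_password <owner|user> <password> <pdf-file> <output-prefix>
extract_with_password() {
    kind=$1
    password=$2
    pdf=$3
    prefix=$4

    # Map the kind of password to the matching pdfimages option.
    case "$kind" in
        owner) flag=-opw ;;
        user)  flag=-upw ;;
        *)
            echo "Password kind must be 'owner' or 'user'" >&2
            return 1
            ;;
    esac

    # Quoting "$password" keeps spaces and special characters intact.
    pdfimages "$flag" "$password" -j "$pdf" "$prefix"
}
```

For a user password, you would call it as: extract_with_password user 'password' /home/lori/Documents/SampleWithImages.pdf /home/lori/Documents/ExtractedImages/image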

For more information about using the pdfimages command, type “pdfimages” at the prompt in a Terminal window and press “Enter”. The command usage displays with a list of options available for use in the command.

Translated from: https://www.howtogeek.com/228796/how-to-extract-and-save-images-from-a-pdf-file-in-linux/
