Εξαγωγή Εικόνων από Έγγραφα και Φιλτράρισμα Πόρων με το GroupDocs.Parser Java

Η εξαγωγή εικόνων από έγγραφα είναι μια κοινή απαίτηση κατά την κατασκευή pipelines επεξεργασίας εγγράφων. Σε αυτό το tutorial θα ανακαλύψετε πώς να εξάγετε εικόνες από έγγραφα χρησιμοποιώντας το GroupDocs.Parser για Java, και επίσης θα μάθετε πώς να φιλτράρετε πόρους ώστε να φορτώνονται μόνο τα αρχεία που χρειάζεστε. Θα περάσουμε από τη ρύθμιση της βιβλιοθήκης, τη δημιουργία ενός προσαρμοσμένου ExternalResourceHandler, και την εφαρμογή λογικής φιλτραρίσματος για να διατηρήσετε την εφαρμογή σας γρήγορη και ασφαλή.

Quick Answers

  • What does GroupDocs.Parser do? It parses a wide range of document formats and gives you access to text, images, and other embedded resources.
  • Can I skip unwanted images? Yes—by implementing a custom ExternalResourceHandler you can decide which resources to load.
  • Which Maven version is required? Use GroupDocs.Parser Java 25.5 or newer.
  • Do I need a license? A free trial works for evaluation; a permanent license is required for production.
  • Is this approach thread‑safe? Parsing objects are not shared across threads; create a new Parser instance per thread.

What is “extract images from documents”?

Όταν ένα έγγραφο περιέχει ενσωματωμένες εικόνες, διαγράμματα ή άλλα μέσα, το “extract images from documents” σημαίνει την προγραμματιστική ανάκτηση αυτών των δυαδικών αρχείων ώστε να μπορείτε να τα αποθηκεύσετε, να τα εμφανίσετε ή να τα επεξεργαστείτε περαιτέρω εκτός του αρχικού αρχείου.

Why filter resources while extracting images?

Το φιλτράρισμα πόρων σας βοηθά να:

  • Μειώσετε την κατανάλωση μνήμης αγνοώντας μεγάλα ή άσχετα αρχεία.
  • Βελτιώσετε την ασφάλεια αποτρέποντας τη φόρτωση πιθανώς μη ασφαλούς περιεχομένου.
  • Επιταχύνετε την επεξεργασία, ειδικά με τεράστια έγγραφα που περιέχουν πολλούς ενσωματωμένους αντικειμενους.

Prerequisites

  • Java Development Kit (JDK) – version 8 or higher.
  • Maven – for dependency management.
  • Basic familiarity with Java I/O and exception handling.

Setting Up GroupDocs.Parser for Java

Add the GroupDocs repository and the parser dependency to your pom.xml:

<repositories>
   <repository>
      <id>repository.groupdocs.com</id>
      <name>GroupDocs Repository</name>
      <url>https://releases.groupdocs.com/parser/java/</url>
   </repository>
</repositories>

<dependencies>
   <dependency>
      <groupId>com.groupdocs</groupId>
      <artifactId>groupdocs-parser</artifactId>
      <version>25.5</version>
   </dependency>
</dependencies>

Alternatively, download the latest version from GroupDocs.Parser for Java releases.

License Acquisition

  • Free Trial – explore core features without cost.
  • Temporary License – unlock full functionality during evaluation.
  • Purchased License – required for commercial deployment.

How to filter resources while extracting images

Step 1: Create a custom handler

Define a class that extends ExternalResourceHandler. Inside the onLoading method you decide which resources to keep.

import com.groupdocs.parser.options.ExternalResourceHandler;
import com.groupdocs.parser.data.ExternalResourceLoadingArgs;

class Handler extends ExternalResourceHandler {
    @Override
    public void onLoading(ExternalResourceLoadingArgs args) {
        if (!args.getUri().endsWith("installation.png")) {
            args.setSkipped(true);
        }
        super.onLoading(args);
    }
}

Step 2: Configure ParserSettings with the handler

Pass your Handler instance to ParserSettings and use it when opening a document.

import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;
import com.groupdocs.parser.exceptions.IOException;
import com.groupdocs.parser.options.ParserSettings;

public class LoadExternalResources {
    public static void run() throws IOException {
        ParserSettings settings = new ParserSettings(new Handler());
        
        try (Parser parser = new Parser("YOUR_DOCUMENT_DIRECTORY", settings)) {
            Iterable<PageImageArea> images = parser.getImages();
            
            for (PageImageArea image : images) {
                System.out.println(image.getFileType());
            }
        }
    }
}

Step 3: Fine‑tune the filtering logic

If you need more sophisticated rules—such as filtering by image size, format, or URI pattern—extend the onLoading method accordingly:

@Override
public void onLoading(ExternalResourceLoadingArgs args) {
    if (!args.getUri().endsWith("installation.png")) {
        args.setSkipped(true);
    }
}

Practical Applications

  1. Document Management Systems – Pull only the necessary images from scanned contracts to generate thumbnails.
  2. Data Extraction Services – Skip decorative graphics and focus on charts that contain valuable data.
  3. Web Scraping Tools – Filter out tracking pixels while retrieving meaningful media from HTML‑based documents.

Performance Considerations

  • Filter early: Apply your custom handler before iterating over resources to avoid loading unwanted data into memory.
  • Dispose promptly: Use try‑with‑resources (try (Parser parser = …)) to free native resources.
  • Async processing: For large batches, process documents in parallel streams while keeping each Parser instance confined to a single thread.

Common Issues & Solutions

IssueWhy it HappensFix
No images returnedHandler skips all resources inadvertentlyVerify the if condition and ensure args.setSkipped(true) is only called for unwanted URIs.
IOException on large filesInsufficient heap memoryIncrease JVM heap (-Xmx2g) or process pages in smaller chunks.
License not recognizedUsing trial DLL with production codeApply the correct license file path via License.setLicense("path/to/license").

Frequently Asked Questions

Q: What is the primary purpose of using a custom ExternalResourceHandler?
A: It lets you control which external resources are loaded, enhancing security and performance by filtering out unnecessary files.

Q: Can I use GroupDocs.Parser for Java without a license?
A: Yes, a free trial is available, but some advanced features may be limited until you obtain a temporary or purchased license.

Q: How do I handle exceptions during parsing with GroupDocs.Parser?
A: Wrap parsing calls in try‑catch blocks for IOException and other specific exceptions to gracefully handle errors.

Q: What are common pitfalls when filtering resources?
A: Incorrect URI checks can skip needed files; use logging or breakpoints to verify your conditions.

Q: Is it possible to parse non‑HTML documents using GroupDocs.Parser for Java?
A: Absolutely—GroupDocs.Parser supports PDFs, Word, Excel, PowerPoint, and many other formats.

Next Steps

Dive deeper into the library by exploring the API Reference or experimenting with additional settings such as ParserSettings.setDetectTables(true) for table extraction.


Last Updated: 2025-12-29
Tested With: GroupDocs.Parser 25.5 for Java
Author: GroupDocs

Resources