Add documents to index with chunk-based search in Java

Applications that index large document collections and then run fast, chunk‑based queries need a solution that scales without exhausting memory. This tutorial walks you through setting up GroupDocs.Search for Java, adding multiple document folders to an index, and configuring the engine to improve search performance while keeping the Java heap used by the search index under control. Whether you’re indexing legal contracts, support tickets, or research papers, the steps below give you a working starting point for a production implementation.

Quick Answers

  • What is the first step? Create a search index folder.
  • How do I include many files? Use index.add() for each document folder.
  • Which option enables chunk search? options.setChunkSearch(true).
  • Can I continue searching after the first chunk? Yes, call index.searchNext() with the token.
  • Do I need a license? A free trial or temporary license works for development; a full license is required for production.

What You’ll Learn

  • How to create a search index in a specified folder.
  • Steps to add documents to the index from multiple locations.
  • Configuring search options to enable chunk‑based searching.
  • Performing initial and subsequent chunk‑based searches.
  • Real‑world scenarios where chunk‑based document search shines.

Prerequisites

To follow this guide, ensure you have:

  • Required Libraries: GroupDocs.Search for Java 25.4 or later.
  • Environment Setup: A compatible Java Development Kit (JDK) installed.
  • Knowledge Prerequisites: Basic Java programming and Maven familiarity.

Setting Up GroupDocs.Search for Java

To begin, integrate GroupDocs.Search into your project using Maven:

<repositories>
   <repository>
      <id>repository.groupdocs.com</id>
      <name>GroupDocs Repository</name>
      <url>https://releases.groupdocs.com/search/java/</url>
   </repository>
</repositories>

<dependencies>
   <dependency>
      <groupId>com.groupdocs</groupId>
      <artifactId>groupdocs-search</artifactId>
      <version>25.4</version>
   </dependency>
</dependencies>

Alternatively, download the latest version from the GroupDocs.Search for Java releases page.

License Acquisition

To try out GroupDocs.Search:

  • Free Trial – test core features without commitment.
  • Temporary License – extended access for development.
  • Purchase – full license for production use.

Basic Initialization and Setup

Create an index in the folder where you want the searchable data to live:

import com.groupdocs.search.*;

public class CreateIndex {
    public static void main(String[] args) {
        String indexFolder = "YOUR_DOCUMENT_DIRECTORY\\output\\AdvancedUsage\\Searching\\SearchByChunks";
        // Creating an index in the specified folder
        Index index = new Index(indexFolder);
    }
}
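
The hard-coded `\\` separators in the path above only work as intended on Windows. As a minimal sketch, assuming you want the same folder layout on any operating system, the path can be assembled with `java.nio.file.Paths`. The `IndexPaths` class and its `indexFolder` method are hypothetical helpers introduced here for illustration; they are not part of GroupDocs.Search.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class IndexPaths {
    // Builds the index folder path from a base directory using the
    // platform's own separator instead of hard-coded backslashes.
    static Path indexFolder(String baseDirectory) {
        return Paths.get(baseDirectory, "output", "AdvancedUsage", "Searching", "SearchByChunks");
    }

    public static void main(String[] args) {
        // "YOUR_DOCUMENT_DIRECTORY" is the same placeholder used in this tutorial.
        Path folder = indexFolder("YOUR_DOCUMENT_DIRECTORY");
        System.out.println(folder);
        // The resulting string can then be passed to the Index constructor:
        // Index index = new Index(folder.toString());
    }
}
```

The helper only builds the path; it does not create the folder or touch the index.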

How to add documents to index

Now that the index exists, the next step is to add documents to it from the locations where your files are stored.

1. Creating an Index

Overview: Set up a directory for the search index.

String indexFolder = "YOUR_DOCUMENT_DIRECTORY\\output\\AdvancedUsage\\Searching\\SearchByChunks";
Index index = new Index(indexFolder);

2. Adding Documents to Index

Overview: Pull in files from several source folders.

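// Replace each placeholder with the path of a different source folder
String documentsFolder1 = "YOUR_DOCUMENT_DIRECTORY";
String documentsFolder2 = "YOUR_DOCUMENT_DIRECTORY";
String documentsFolder3 = "YOUR_DOCUMENT_DIRECTORY";
// Each call indexes the documents found in that folder
index.add(documentsFolder1);
index.add(documentsFolder2);
index.add(documentsFolder3);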
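
Before handing folders to `index.add()`, it can help to drop paths that do not exist and to avoid adding the same folder twice. The following is a sketch under stated assumptions: `FolderCheck` and `existingFolders` are hypothetical names invented for this example, and the `index.add()` call is left as a comment so the snippet runs without the library.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class FolderCheck {
    // Returns the distinct folders that exist on disk, in their original order.
    static List<Path> existingFolders(List<String> candidates) {
        Set<Path> unique = new LinkedHashSet<>();
        for (String candidate : candidates) {
            Path path = Paths.get(candidate).toAbsolutePath().normalize();
            if (Files.isDirectory(path)) {
                unique.add(path);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Placeholders as used in this tutorial; replace them with real folders.
        List<String> folders = Arrays.asList("YOUR_DOCUMENT_DIRECTORY", "YOUR_DOCUMENT_DIRECTORY");
        for (Path folder : existingFolders(folders)) {
            System.out.println("Would index: " + folder);
            // index.add(folder.toString());
        }
    }
}
```

Normalizing to an absolute path first means that two spellings of the same folder, such as `docs` and `./docs`, are treated as one entry.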

3. Configuring Search Options

Overview: Enable chunk‑based searching on the SearchOptions object.

SearchOptions options = new SearchOptions();
options.setChunkSearch(true);

4. Performing the Initial Search

Overview: Run the first query using the chunk‑enabled options.

String query = "invitation";
SearchResult result = index.search(query, options);

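5. Continuing the Search Through Remaining Chunks

Overview: Iterate through the remaining chunks until the search is complete, handling each chunk's results as it arrives.

// Process the first chunk held in result here, before continuing
while (result.getNextChunkSearchToken() != null) {
    result = index.searchNext(result.getNextChunkSearchToken());
    // Process this chunk's results before requesting the next one
}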
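
The loop above follows a continuation-token pattern: each call returns one chunk of results plus a token, and a null token signals that nothing is left. The sketch below models that control flow in plain Java so it runs without the library. `ChunkedSearcher`, `ChunkResult`, and `Token` are hypothetical stand-ins invented for illustration, not GroupDocs.Search classes; only the shape of the loop mirrors `index.search()` and `index.searchNext()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChunkLoopSketch {

    // Hypothetical continuation token: remembers the query and the next chunk to scan.
    static final class Token {
        final String query;
        final int chunkIndex;

        Token(String query, int chunkIndex) {
            this.query = query;
            this.chunkIndex = chunkIndex;
        }
    }

    // Hypothetical result for one chunk; nextToken is null when no chunks remain.
    static final class ChunkResult {
        final List<String> hits;
        final Token nextToken;

        ChunkResult(List<String> hits, Token nextToken) {
            this.hits = hits;
            this.nextToken = nextToken;
        }
    }

    // Hypothetical searcher that scans one chunk of documents per call.
    static final class ChunkedSearcher {
        private final List<List<String>> chunks;

        ChunkedSearcher(List<List<String>> chunks) {
            this.chunks = chunks;
        }

        ChunkResult search(String query) {
            return searchNext(new Token(query, 0));
        }

        ChunkResult searchNext(Token token) {
            List<String> hits = new ArrayList<>();
            if (token.chunkIndex < chunks.size()) {
                for (String document : chunks.get(token.chunkIndex)) {
                    if (document.contains(token.query)) {
                        hits.add(document);
                    }
                }
            }
            int next = token.chunkIndex + 1;
            Token nextToken = next < chunks.size() ? new Token(token.query, next) : null;
            return new ChunkResult(hits, nextToken);
        }
    }

    // Collects hits from the first chunk and from every following chunk.
    static List<String> collectAll(ChunkedSearcher searcher, String query) {
        List<String> all = new ArrayList<>();
        ChunkResult result = searcher.search(query);
        all.addAll(result.hits); // the first chunk must be handled too
        while (result.nextToken != null) {
            result = searcher.searchNext(result.nextToken);
            all.addAll(result.hits);
        }
        return all;
    }

    public static void main(String[] args) {
        ChunkedSearcher searcher = new ChunkedSearcher(Arrays.asList(
                Arrays.asList("party invitation", "invoice"),
                Arrays.asList("meeting notes"),
                Arrays.asList("wedding invitation")));
        System.out.println(collectAll(searcher, "invitation"));
    }
}
```

The point of the model is that results are consumed inside the loop. A loop that only reassigns the result variable ends up holding just the last chunk, which may well be empty.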

Chunk‑based searching breaks massive document collections into manageable pieces, reducing memory pressure and speeding up response times. It’s especially beneficial when:

  1. Legal teams need to locate specific clauses across thousands of contracts.
  2. Customer support portals must surface relevant knowledge‑base articles instantly.
  3. Researchers sift through extensive datasets without loading entire files into memory.

How this approach increases search performance

By searching one chunk at a time rather than the whole index at once, the engine can:

  • Skip irrelevant sections early, cutting CPU cycles.
  • Keep only the active chunk in memory, which directly lowers the Java heap consumed during a search.
  • Parallelize chunk processing on multi‑core machines for faster results.

Managing Java search index memory

While chunk‑based search already reduces the memory footprint, you can tune the JVM and the index further:

  • Allocate sufficient heap (-Xmx2g or higher) based on index size.
  • Use index.optimize() after bulk additions to compress the index structure.
  • Monitor GC pauses with tools like VisualVM to avoid latency spikes.
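
To confirm that a heap setting such as -Xmx2g actually reached the JVM, the limits can be read at runtime with the standard `Runtime` API. This is a minimal sketch; `HeapCheck` and `toMiB` are hypothetical names used only in this example.

```java
final class HeapCheck {
    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        // Upper bound the JVM will try to use, set by -Xmx or the JVM default.
        long max = runtime.maxMemory();
        // Heap currently in use: memory reserved so far minus the free part of it.
        long used = runtime.totalMemory() - runtime.freeMemory();
        System.out.println("Max heap (MiB):  " + toMiB(max));
        System.out.println("Used heap (MiB): " + toMiB(used));
    }
}
```

Running it as `java -Xmx2g HeapCheck.java` should report a maximum close to 2048 MiB; the exact figure can be slightly lower depending on the garbage collector in use.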

Performance Considerations

  • Memory Management – Allocate sufficient heap space (-Xmx) for large indexes.
  • Resource Monitoring – Keep an eye on CPU usage during indexing and search operations.
  • Index Maintenance – Periodically rebuild or clean the index to discard stale data.

Common Pitfalls & Troubleshooting

Issue | Why It Happens | Fix
OutOfMemoryError during indexing | Heap size too low | Increase JVM heap (-Xmx2g or higher)
No results returned | Chunk token not processed | Ensure the while loop runs until getNextChunkSearchToken() is null
Slow search performance | Index not optimized | Run index.optimize() after bulk additions

Frequently Asked Questions

Q: What is chunk‑based searching?
A: Chunk‑based searching divides the dataset into smaller pieces, allowing efficient queries over large volumes of data without loading entire documents into memory.

Q: How do I update my index with new files?
A: Simply call index.add() with the path to the new documents; the index will incorporate them automatically.

Q: Can GroupDocs.Search handle different file formats?
A: Yes, it supports PDFs, DOCX, XLSX, PPTX, and many other common formats.

Q: What are typical performance bottlenecks?
A: Memory constraints and unoptimized indexes are the most common; allocate sufficient heap and regularly optimize the index.

Q: Where can I find more detailed documentation?
A: Visit the official GroupDocs.Search Documentation for in‑depth guides and API references.

Q: Does chunk‑based search work with encrypted PDFs?
A: Yes, as long as you provide the password via the appropriate API overload.

Q: How can I monitor indexing progress?
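A: Subscribe to the index's progress events, for example OperationProgressChanged exposed through index.getEvents(), before calling add(); check the API reference for the exact handler signature in your version.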

Last Updated: 2026-02-21
Tested With: GroupDocs.Search 25.4 for Java
Author: GroupDocs