Add documents to index with chunk-based search in Java

Applications that index large document collections and then query them need a search setup that scales without exhausting memory. This tutorial walks you through setting up GroupDocs.Search for Java, adding multiple document folders to an index, and enabling chunk‑based search to improve search performance while keeping the Java search index’s memory usage under control. Whether you’re indexing legal contracts, support tickets, or research papers, the steps below give you a working starting point.

Quick Answers

What is the first step? Create a search index folder.
How do I include many files? Use index.add() for each document folder.
Which option enables chunk search? options.setChunkSearch(true).
Can I continue searching after the first chunk? Yes, call index.searchNext() with the token.
Do I need a license? A free trial or temporary license works for development; a full license is required for production.

What You’ll Learn

How to create a search index in a specified folder.
Steps to add documents to an index from multiple locations.
Configuring search options to enable chunk‑based searching.
Performing initial and subsequent chunk‑based searches.
Real‑world scenarios where chunk‑based document search shines.

Prerequisites

To follow this guide, ensure you have:

Required Libraries: GroupDocs.Search for Java 25.4 or later.
Environment Setup: A compatible Java Development Kit (JDK) installed.
Knowledge Prerequisites: Basic Java programming and Maven familiarity.

Setting Up GroupDocs.Search for Java

To begin, integrate GroupDocs.Search into your project using Maven:

<repositories>
   <repository>
      <id>repository.groupdocs.com</id>
      <name>GroupDocs Repository</name>
      <url>https://releases.groupdocs.com/search/java/</url>
   </repository>
</repositories>

<dependencies>
   <dependency>
      <groupId>com.groupdocs</groupId>
      <artifactId>groupdocs-search</artifactId>
      <version>25.4</version>
   </dependency>
</dependencies>

Alternatively, download the latest version from GroupDocs.Search for Java releases.

License Acquisition

GroupDocs.Search offers three licensing options:

Free Trial – test core features without commitment.
Temporary License – extended access for development.
Purchase – full license for production use.

Basic Initialization and Setup

Create an index in the folder where you want the searchable data to live:

import com.groupdocs.search.*;

public class CreateIndex {
    public static void main(String[] args) {
        String indexFolder = "YOUR_DOCUMENT_DIRECTORY\\output\\AdvancedUsage\\Searching\\SearchByChunks";
        // Creating an index in the specified folder
        Index index = new Index(indexFolder);
    }
}
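
The path in the example above uses Windows separators. If your code also runs on Linux or macOS, a minimal sketch like the following builds the same folder layout with java.nio.file and creates it before the index is opened. IndexFolders and prepare are hypothetical names, not part of GroupDocs.Search, and the Index call appears only as a comment so the sketch runs with the JDK alone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of GroupDocs.Search.
class IndexFolders {
    // Builds <base>/output/AdvancedUsage/Searching/SearchByChunks using the
    // platform's own separator and creates any missing directories.
    static Path prepare(String baseDirectory) {
        Path folder = Paths.get(baseDirectory,
                "output", "AdvancedUsage", "Searching", "SearchByChunks");
        try {
            return Files.createDirectories(folder);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot create index folder " + folder, e);
        }
    }

    public static void main(String[] args) {
        // The temp directory stands in for YOUR_DOCUMENT_DIRECTORY.
        Path folder = prepare(System.getProperty("java.io.tmpdir"));
        System.out.println("Index folder: " + folder);
        // With GroupDocs.Search on the classpath, the path would then be used as:
        // Index index = new Index(folder.toString());
    }
}
```

Files.createDirectories does nothing when the folder already exists, so calling prepare on every start-up is safe.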

How to add documents to index

Now that the index exists, the next step is to add documents to it from the folders where your files are stored.

1. Creating an Index

Overview: Set up a directory for the search index.

String indexFolder = "YOUR_DOCUMENT_DIRECTORY\\output\\AdvancedUsage\\Searching\\SearchByChunks";

Index index = new Index(indexFolder);

2. Adding Documents to Index

Overview: Pull in files from several source folders.

// Placeholders: point each variable at a different source folder
String documentsFolder1 = "YOUR_DOCUMENT_DIRECTORY";
String documentsFolder2 = "YOUR_DOCUMENT_DIRECTORY";
String documentsFolder3 = "YOUR_DOCUMENT_DIRECTORY";

index.add(documentsFolder1);
index.add(documentsFolder2);
index.add(documentsFolder3);
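
When the folder list comes from configuration, it can help to check each path before handing it to the index. The sketch below is one way to do that under stated assumptions: FolderBatch and addExisting are hypothetical names, and the indexing step is passed in as a callback so the example runs without GroupDocs.Search on the classpath.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of GroupDocs.Search.
class FolderBatch {
    // Passes every existing directory to the indexer callback and returns the
    // entries that were skipped because they are missing or not directories.
    static List<String> addExisting(List<String> folders, Consumer<String> indexer) {
        List<String> skipped = new ArrayList<>();
        for (String folder : folders) {
            if (Files.isDirectory(Paths.get(folder))) {
                indexer.accept(folder);
            } else {
                skipped.add(folder);
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        List<String> folders = Arrays.asList(
                System.getProperty("java.io.tmpdir"),
                "no-such-folder-for-this-demo");
        // With GroupDocs.Search available, the callback would be
        // folder -> index.add(folder) instead of a print statement.
        List<String> skipped = addExisting(folders,
                folder -> System.out.println("Would index: " + folder));
        System.out.println("Skipped: " + skipped);
    }
}
```

Returning the skipped paths rather than throwing lets you log or report them while the valid folders are still indexed.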

3. Configuring Search Options for Chunk Search

Enable chunk‑based searching by tweaking the options object.

SearchOptions options = new SearchOptions();

options.setChunkSearch(true);

4. Performing Initial Chunk‑Based Search

Run the first query using the chunk‑enabled options.

String query = "invitation";

SearchResult result = index.search(query, options);

5. Continuing Chunk‑Based Search

Iterate through the remaining chunks until the search is complete.

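// Handle the first chunk's result before entering the loop
while (result.getNextChunkSearchToken() != null) {
    result = index.searchNext(result.getNextChunkSearchToken());
    // Handle this chunk's result here; the next iteration overwrites it
}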
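
The loop above reassigns result on every pass, so anything you want to keep has to be collected as you go. The following sketch shows that token-following pattern with a running total. It uses hypothetical stand-ins (ChunkLoopSketch, Chunk, and a searchNext function) rather than the real GroupDocs.Search types so it runs with the JDK alone, and it assumes each call reports only its own chunk's hits, which you should confirm against the API reference.

```java
import java.util.function.Function;

// Hypothetical stand-ins so the sketch runs without GroupDocs.Search.
class ChunkLoopSketch {
    // One chunk's outcome: its own hit count plus the token for the next
    // chunk, or null when the search is finished.
    static final class Chunk {
        final int occurrences;
        final Object nextToken;

        Chunk(int occurrences, Object nextToken) {
            this.occurrences = occurrences;
            this.nextToken = nextToken;
        }
    }

    // Follows tokens until none remain, summing the per-chunk counts.
    static int totalOccurrences(Chunk first, Function<Object, Chunk> searchNext) {
        int total = first.occurrences;
        Chunk current = first;
        while (current.nextToken != null) {
            current = searchNext.apply(current.nextToken);
            total += current.occurrences;
        }
        return total;
    }

    public static void main(String[] args) {
        // Simulated index with three chunks; a token is just the chunk position.
        int[] perChunk = {4, 0, 7};
        Function<Object, Chunk> searchNext = token -> {
            int i = (Integer) token;
            Integer next = i + 1 < perChunk.length ? Integer.valueOf(i + 1) : null;
            return new Chunk(perChunk[i], next);
        };
        Chunk first = new Chunk(perChunk[0], 1);
        System.out.println("Total occurrences: " + totalOccurrences(first, searchNext));
    }
}
```

Running main prints "Total occurrences: 11". In real code the same shape would apply with index.search() producing the first result and index.searchNext() producing the rest.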

Why use chunk‑based search?

Chunk‑based searching processes a large index one portion at a time: each call returns the results for one chunk plus a token for the next. This reduces memory pressure and shortens the wait for the first results. It’s especially beneficial when:

Legal teams need to locate specific clauses across thousands of contracts.
Customer support portals must surface relevant knowledge‑base articles instantly.
Researchers sift through extensive datasets without loading entire files into memory.

How this approach increases search performance

By searching smaller chunks rather than whole files, the engine can:

Skip irrelevant sections early, cutting CPU cycles.
Keep only the active chunk in memory, which directly lowers the memory the search index consumes in the JVM.
Parallelize chunk processing on multi‑core machines for faster results.

Managing Java search index memory

While chunk‑based search already reduces memory footprint, you can further tune the JVM:

Allocate sufficient heap (-Xmx2g or higher) based on index size.
Use index.optimize() after bulk additions to merge index segments and keep the index structure compact.
Monitor GC pauses with tools like VisualVM to avoid latency spikes.
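
To confirm that a heap setting such as -Xmx2g actually took effect, you can ask the running JVM directly. This is a minimal sketch using only java.lang.Runtime; HeapReport and its methods are hypothetical names introduced for illustration.

```java
// Hypothetical helper; uses only java.lang.Runtime from the JDK.
class HeapReport {
    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Heap currently in use: the allocated heap minus its free portion.
    static long usedBytes(Runtime runtime) {
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        // maxMemory() roughly corresponds to the -Xmx setting, or to the
        // JVM's default ceiling when no -Xmx flag is given.
        System.out.println("Max heap (MiB):  " + toMiB(runtime.maxMemory()));
        System.out.println("Used heap (MiB): " + toMiB(usedBytes(runtime)));
    }
}
```

Logging these two numbers before and after a bulk indexing run gives a rough picture of how much headroom remains.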

Performance Considerations

Memory Management – Allocate sufficient heap space (-Xmx) for large indexes.
Resource Monitoring – Keep an eye on CPU usage during indexing and search operations.
Index Maintenance – Periodically rebuild or clean the index to discard stale data.

Common Pitfalls & Troubleshooting

Issue	Why It Happens	Fix
`OutOfMemoryError` during indexing	Heap size too low	Increase JVM heap (`-Xmx2g` or higher)
No results returned	Chunk token not processed	Ensure the `while` loop runs until `getNextChunkSearchToken()` is `null`
Slow search performance	Index not optimized	Run `index.optimize()` after bulk additions

Frequently Asked Questions

Q: What is chunk‑based searching?
A: Chunk‑based searching divides the dataset into smaller pieces, allowing efficient queries over large volumes of data without loading entire documents into memory.

Q: How do I update my index with new files?
A: Simply call index.add() with the path to the new documents; the index will incorporate them automatically.

Q: Can GroupDocs.Search handle different file formats?
A: Yes, it supports PDFs, DOCX, XLSX, PPTX, and many other common formats.

Q: What are typical performance bottlenecks?
A: Memory constraints and unoptimized indexes are the most common; allocate sufficient heap and regularly optimize the index.

Q: Where can I find more detailed documentation?
A: Visit the official GroupDocs.Search Documentation for in‑depth guides and API references.

Q: Does chunk‑based search work with encrypted PDFs?
A: Yes, as long as you provide the password via the appropriate API overload.

Q: How can I monitor indexing progress?
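A: Subscribe to the progress events exposed through index.getEvents() before calling index.add(), or hook into logging callbacks; confirm the exact event names for your version in the API reference.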

Resources

Documentation: GroupDocs.Search for Java Docs
API Reference: GroupDocs.Search API Reference
Download: GroupDocs.Search Releases
GitHub: GroupDocs.Search GitHub Repository
Free Support: GroupDocs Forum
Temporary License: Obtain a Temporary License

Last Updated: 2026-02-21
Tested With: GroupDocs.Search 25.4 for Java
Author: GroupDocs