Add documents to an index with chunk-based search in Java
Applications that index large document sets and then run fast, chunk‑based queries need a solution that scales without exhausting memory. This tutorial walks you through setting up GroupDocs.Search for Java, adding multiple document folders to an index, and configuring chunk‑based search to improve performance while keeping the index’s memory usage under control. Whether you’re indexing legal contracts, support tickets, or research papers, the steps below give you a working implementation to build on.
Quick Answers
- What is the first step? Create a search index folder.
- How do I include many files? Use index.add() for each document folder.
- Which option enables chunk search? options.setChunkSearch(true).
- Can I continue searching after the first chunk? Yes, call index.searchNext() with the token.
- Do I need a license? A free trial or temporary license works for development; a full license is required for production.
What You’ll Learn
- How to create a search index in a specified folder.
- Steps to add documents to an index from multiple locations.
- Configuring search options to enable chunk‑based searching.
- Performing initial and subsequent chunk‑based searches.
- Real‑world scenarios where chunk‑based document search shines.
Prerequisites
To follow this guide, ensure you have:
- Required Libraries: GroupDocs.Search for Java 25.4 or later.
- Environment Setup: A compatible Java Development Kit (JDK) installed.
- Knowledge Prerequisites: Basic Java programming and Maven familiarity.
Setting Up GroupDocs.Search for Java
To begin, integrate GroupDocs.Search into your project using Maven:
<repositories>
    <repository>
        <id>repository.groupdocs.com</id>
        <name>GroupDocs Repository</name>
        <url>https://releases.groupdocs.com/search/java/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.groupdocs</groupId>
        <artifactId>groupdocs-search</artifactId>
        <version>25.4</version>
    </dependency>
</dependencies>
Alternatively, download the latest version from GroupDocs.Search for Java releases.
License Acquisition
To try out GroupDocs.Search:
- Free Trial – test core features without commitment.
- Temporary License – extended access for development.
- Purchase – full license for production use.
Basic Initialization and Setup
Create an index in the folder where you want the searchable data to live:
import com.groupdocs.search.*;

public class CreateIndex {
    public static void main(String[] args) {
        String indexFolder = "YOUR_DOCUMENT_DIRECTORY\\output\\AdvancedUsage\\Searching\\SearchByChunks";
        // Creating an index in the specified folder
        Index index = new Index(indexFolder);
    }
}
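The path in the snippet above uses Windows-style separators. As a minimal sketch, assuming you want the same folder layout on any operating system, the path can be assembled with java.nio.file instead. The class IndexFolderResolver and its methods are hypothetical helpers, not part of GroupDocs.Search; the string they return is what you would pass to new Index(...).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of GroupDocs.Search): builds the index folder
// path with platform-neutral separators and creates the folder up front.
class IndexFolderResolver {

    // Joins the base directory with the sub-folders used in this tutorial.
    static Path resolve(String baseDirectory) {
        return Paths.get(baseDirectory, "output", "AdvancedUsage", "Searching", "SearchByChunks");
    }

    // Creates the folder (and any missing parents) and returns its absolute path.
    // Creating it early surfaces permission problems before indexing starts.
    static String prepare(String baseDirectory) {
        Path folder = resolve(baseDirectory);
        try {
            Files.createDirectories(folder);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot create index folder: " + folder, e);
        }
        return folder.toAbsolutePath().toString();
    }
}
```

With this in place, the index folder could be obtained as IndexFolderResolver.prepare("YOUR_DOCUMENT_DIRECTORY") and handed to the Index constructor unchanged.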
How to add documents to an index
Now that the index exists, the next step is to add documents to it from the locations where your files are stored.
1. Creating an Index
Overview: Set up a directory for the search index.
String indexFolder = "YOUR_DOCUMENT_DIRECTORY\\output\\AdvancedUsage\\Searching\\SearchByChunks";
Index index = new Index(indexFolder);
2. Adding Documents to Index
Overview: Pull in files from several source folders.
String documentsFolder1 = "YOUR_DOCUMENT_DIRECTORY";
String documentsFolder2 = "YOUR_DOCUMENT_DIRECTORY";
String documentsFolder3 = "YOUR_DOCUMENT_DIRECTORY";
index.add(documentsFolder1);
index.add(documentsFolder2);
index.add(documentsFolder3);
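Before calling index.add() on several locations, it can help to confirm that each folder actually exists. The following is a small sketch under that assumption; SourceFolders and its existing method are hypothetical names introduced for illustration and are not part of the GroupDocs.Search API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of GroupDocs.Search): keeps only the source
// folders that exist and are readable, so each later index.add(...) call
// receives a valid location.
class SourceFolders {

    static List<String> existing(List<String> candidates) {
        List<String> valid = new ArrayList<>();
        for (String candidate : candidates) {
            Path path = Paths.get(candidate);
            if (Files.isDirectory(path) && Files.isReadable(path)) {
                valid.add(path.toString());
            } else {
                // Report the problem instead of failing the whole indexing run.
                System.err.println("Skipping missing or unreadable folder: " + candidate);
            }
        }
        return valid;
    }
}
```

The returned list can then be iterated, calling index.add(folder) for each entry, which keeps a single mistyped path from interrupting the rest of the run.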
3. Configuring Search Options for Chunk Search
Enable chunk‑based searching by tweaking the options object.
SearchOptions options = new SearchOptions();
options.setChunkSearch(true);
4. Performing Initial Chunk‑Based Search
Run the first query using the chunk‑enabled options.
String query = "invitation";
SearchResult result = index.search(query, options);
5. Continuing Chunk‑Based Search
Iterate through the remaining chunks until the search is complete.
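while (result.getNextChunkSearchToken() != null) {
    // Process the current chunk's results here; the next call replaces them
    result = index.searchNext(result.getNextChunkSearchToken());
}
// result now holds the final chunk, which still needs to be processed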
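The loop in step 5 reassigns result on every pass, so anything you need from a chunk has to be read before the next searchNext() call. The sketch below shows that accumulate-then-continue pattern with stand-in types: ChunkPage and ChunkedSearchDemo are hypothetical classes that only mimic the token-based flow, not the GroupDocs.Search SearchResult or token classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Stand-in type for illustration only; it mimics one chunk of results plus a
// continuation token, but it is NOT a GroupDocs.Search class.
class ChunkPage {
    final List<String> hits;   // documents found in this chunk
    final Integer nextToken;   // null when no chunks remain

    ChunkPage(List<String> hits, Integer nextToken) {
        this.hits = hits;
        this.nextToken = nextToken;
    }
}

class ChunkedSearchDemo {

    // Collects hits from every chunk: consume the current page first,
    // then request the next one until the token is null.
    static List<String> collectAll(ChunkPage first, Function<Integer, ChunkPage> searchNext) {
        List<String> all = new ArrayList<>(first.hits);
        ChunkPage page = first;
        while (page.nextToken != null) {
            page = searchNext.apply(page.nextToken);
            all.addAll(page.hits);
        }
        return all;
    }
}
```

In a real application the same shape applies: read what you need from the first SearchResult, then from each result returned by index.searchNext(), and stop once getNextChunkSearchToken() returns null.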
Why use chunk‑based search?
Chunk‑based searching breaks massive document collections into manageable pieces, reducing memory pressure and speeding up response times. It’s especially beneficial when:
- Legal teams need to locate specific clauses across thousands of contracts.
- Customer support portals must surface relevant knowledge‑base articles instantly.
- Researchers sift through extensive datasets without loading entire files into memory.
How this approach increases search performance
By searching smaller chunks rather than whole files, the engine can:
- Skip irrelevant sections early, cutting CPU cycles.
- Keep only the active chunk in memory, which directly lowers the memory the search index consumes.
- Parallelize chunk processing on multi‑core machines for faster results.
Managing Java search index memory
While chunk‑based search already reduces memory footprint, you can further tune the JVM:
- Allocate sufficient heap (-Xmx2g or higher) based on index size.
- Use index.optimize() after bulk additions to compress the index structure.
- Monitor GC pauses with tools like VisualVM to avoid latency spikes.
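To confirm that a heap setting such as -Xmx2g actually took effect, the JVM can report its own limits. This is a minimal sketch using only the standard Runtime class; HeapReport is a hypothetical class name chosen for this example.

```java
// Hypothetical utility: reports the heap limits the JVM is running with,
// which helps verify an -Xmx setting before a large indexing job starts.
class HeapReport {

    private static final long MEGABYTE = 1024L * 1024L;

    // Upper bound the JVM will attempt to use for the heap, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / MEGABYTE;
    }

    // Heap currently occupied (allocated minus free), in megabytes.
    static long usedHeapMegabytes() {
        Runtime runtime = Runtime.getRuntime();
        return (runtime.totalMemory() - runtime.freeMemory()) / MEGABYTE;
    }
}
```

Logging HeapReport.maxHeapMegabytes() at startup and HeapReport.usedHeapMegabytes() after each index.add() call gives a rough picture of how much headroom remains; a profiler such as VisualVM is still the better tool for detailed analysis.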
Performance Considerations
- Memory Management – Allocate sufficient heap space (-Xmx) for large indexes.
- Resource Monitoring – Keep an eye on CPU usage during indexing and search operations.
- Index Maintenance – Periodically rebuild or clean the index to discard stale data.
Common Pitfalls & Troubleshooting
| Issue | Why It Happens | Fix |
|---|---|---|
| OutOfMemoryError during indexing | Heap size too low | Increase JVM heap (-Xmx2g or higher) |
| No results returned | Chunk token not processed | Ensure the while loop runs until getNextChunkSearchToken() is null |
| Slow search performance | Index not optimized | Run index.optimize() after bulk additions |
Frequently Asked Questions
Q: What is chunk‑based searching?
A: Chunk‑based searching divides the dataset into smaller pieces, allowing efficient queries over large volumes of data without loading entire documents into memory.
Q: How do I update my index with new files?
A: Simply call index.add() with the path to the new documents; the index will incorporate them automatically.
Q: Can GroupDocs.Search handle different file formats?
A: Yes, it supports PDFs, DOCX, XLSX, PPTX, and many other common formats.
Q: What are typical performance bottlenecks?
A: Memory constraints and unoptimized indexes are the most common; allocate sufficient heap and regularly optimize the index.
Q: Where can I find more detailed documentation?
A: Visit the official GroupDocs.Search Documentation for in‑depth guides and API references.
Q: Does chunk‑based search work with encrypted PDFs?
A: Yes, as long as you provide the password via the appropriate API overload.
Q: How can I monitor indexing progress?
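A: Subscribe to the index’s progress events (exposed through index.getEvents()) before calling index.add(), or hook into logging callbacks; check the API reference for the exact event names in your version.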
Resources
- Documentation: GroupDocs.Search for Java Docs
- API Reference: GroupDocs.Search API Reference
- Download: GroupDocs.Search Releases
- GitHub: GroupDocs.Search GitHub Repository
- Free Support: GroupDocs Forum
- Temporary License: Obtain a Temporary License
Last Updated: 2026-02-21
Tested With: GroupDocs.Search 25.4 for Java
Author: GroupDocs