使用 GroupDocs.Editor 提取文件元資料（Java）

如果您需要快速且可靠地 extract document metadata java，您來對地方了。無論您是在構建文件歸檔服務、遷移管道，或是自動化報告工具，了解如何從 Word、Excel 以及純文字檔案中提取格式、頁數或加密狀態等屬性，都能節省大量人工時間。本指南將使用 GroupDocs.Editor for Java 完整說明流程，示範如何 get document info java，並涵蓋密碼保護檔案等常見情境。

快速解答

什麼程式庫可以在 Java 中提取文件元資料？ GroupDocs.Editor for Java.
哪個方法可以在不載入內容的情況下取得元資料？ getDocumentInfo(null).
我可以從受密碼保護的檔案讀取元資料嗎？ 可以 – 處理 PasswordRequiredException 和 IncorrectPasswordException.
生產環境需要授權嗎？ 需要有效的 GroupDocs.Editor 授權；亦提供免費試用。
支援哪個 Java 版本？ Java 8 或更高版本。

什麼是 extract document metadata java？

在 Java 中提取文件元資料是指以程式方式讀取檔案的描述性資訊——例如類型、大小、頁數或是否已加密——而不必開啟完整的文件內容。此輕量化方式非常適合索引、驗證及工作流程自動化。

為什麼使用 GroupDocs.Editor for Java？

GroupDocs.Editor 提供統一的 API，支援多種格式（DOCX、XLSX、XML、TXT 等），並抽象化各檔案類型的複雜性。它亦內建對密碼保護文件的處理，使其成為執行 get document info java 任務的一站式解決方案。

前置條件

Java Development Kit (JDK) 8 或更新版本。
Maven 用於相依管理（或手動下載）。
基本的 Java 程式設計知識。

設定 GroupDocs.Editor for Java

透過 Maven 安裝

Add the repository and dependency to your pom.xml:

<repositories>
   <repository>
      <id>repository.groupdocs.com</id>
      <name>GroupDocs Repository</name>
      <url>https://releases.groupdocs.com/editor/java/</url>
   </repository>
</repositories>

<dependencies>
   <dependency>
      <groupId>com.groupdocs</groupId>
      <artifactId>groupdocs-editor</artifactId>
      <version>25.3</version>
   </dependency>
</dependencies>

直接下載

或者，從 GroupDocs.Editor for Java releases 下載最新的二進位檔。

取得授權

Free Trial – 免費試用 API。
Temporary License – 如需延長評估時間，可透過 this link 取得。
Purchase – 取得完整授權以供正式環境部署。

基本初始化與設定

import com.groupdocs.editor.Editor;

public class DocumentEditorSetup {
    public static void main(String[] args) {
        String filePath = "YOUR_DOCUMENT_DIRECTORY/SAMPLE_DOCX";
        Editor editor = new Editor(filePath);
        // Initialize your document processing workflow here
        editor.dispose();
    }
}

如何從 Word 文件提取 document metadata java

功能 1：從 Word 文件提取元資料

步驟 1 – 載入文件

import com.groupdocs.editor.Editor;
import com.groupdocs.editor.IDocumentInfo;
import com.groupdocs.editor.metadata.WordProcessingDocumentInfo;

String docxInputFilePath = "YOUR_DOCUMENT_DIRECTORY/SAMPLE_DOCX";
Editor editorDocx = new Editor(docxInputFilePath);

步驟 2 – 取得文件資訊

IDocumentInfo infoDocx = editorDocx.getDocumentInfo(null);
if (infoDocx instanceof WordProcessingDocumentInfo) {
    WordProcessingDocumentInfo casted = (WordProcessingDocumentInfo) infoDocx;
    // Access properties like format, page count, and more
}
editorDocx.dispose();

為什麼這很重要：getDocumentInfo(null) 只取得元資料，降低記憶體使用，同時提供您在 Word 檔案上執行 get document info java 所需的全部資訊。

如何取得 spreadsheet 的 document info java

功能 2：檢查試算表文件類型

步驟 1 – 載入試算表檔案

import com.groupdocs.editor.Editor;
import com.groupdocs.editor.IDocumentInfo;
import com.groupdocs.editor.metadata.SpreadsheetDocumentInfo;

String xlsxInputFilePath = "YOUR_DOCUMENT_DIRECTORY/SAMPLE_XLSX";
Editor editorXlsx = new Editor(xlsxInputFilePath);

步驟 2 – 檢查並提取試算表細節

IDocumentInfo infoXlsx = editorXlsx.getDocumentInfo(null);
if (infoXlsx instanceof SpreadsheetDocumentInfo) {
    SpreadsheetDocumentInfo casted = (SpreadsheetDocumentInfo) infoXlsx;
    // Retrieve properties like tab count, size, etc.
}
editorXlsx.dispose();

如何在提取元資料時處理受密碼保護的檔案

功能 3：處理受密碼保護的文件

步驟 1 – 載入受保護的文件

import com.groupdocs.editor.Editor;
import com.groupdocs.editor.IDocumentInfo;
import com.groupdocs.editor.PasswordRequiredException;
import com.groupdocs.editor.IncorrectPasswordException;

String xlsInputFilePath = "YOUR_DOCUMENT_DIRECTORY/SAMPLE_XLS_PROTECTED";
Editor editorXls = new Editor(xlsInputFilePath);

步驟 2 – 嘗試存取並管理密碼

try {
    IDocumentInfo infoXls = editorXls.getDocumentInfo(null); // Attempt without password
} catch (PasswordRequiredException ex) {
    System.out.println("A password is required to access this document.");
}

try {
    IDocumentInfo infoXls = editorXls.getDocumentInfo("incorrect_password");
} catch (IncorrectPasswordException ex) {
    System.out.println("The provided password is incorrect. Please try again.");
}

IDocumentInfo infoXls = editorXls.getDocumentInfo("excel_password"); // Correct password
if (infoXls instanceof SpreadsheetDocumentInfo) {
    SpreadsheetDocumentInfo casted = (SpreadsheetDocumentInfo) infoXls;
    // Extract document details
}
editorXls.dispose();

專業提示：始終將元資料呼叫包在 try‑catch 區塊中，以確保應用程式能抵禦缺少或錯誤的密碼。

如何從純文字格式提取元資料

功能 4：基於文字的文件元資料提取

步驟 1 – 載入文字文件

import com.groupdocs.editor.Editor;
import com.groupdocs.editor.IDocumentInfo;
import com.groupdocs.editor.metadata.TextualDocumentInfo;

String xmlInputFilePath = "YOUR_DOCUMENT_DIRECTORY/SAMPLE_XML";
Editor editorXml = new Editor(xmlInputFilePath);

步驟 2 – 提取並顯示資訊

IDocumentInfo infoXml = editorXml.getDocumentInfo(null);
if (infoXml instanceof TextualDocumentInfo) {
    TextualDocumentInfo casted1 = (TextualDocumentInfo) infoXml;
    // Access encoding, size, etc.
}
editorXml.dispose();

實務應用

Automated Document Archiving – 提取元資料以標記並儲存檔案，免除手動輸入。
Workflow Automation – 使用提取的屬性將文件導向正確的處理管道。
Data Migration – 在系統間遷移內容時保留原始檔案屬性。

效能考量

立即釋放 Editor 實例（editor.dispose()）以釋放原生資源。
盡可能以串流方式處理大型檔案，以避免高記憶體消耗。
使用 Java 效能分析工具 來分析程式碼，找出因重複呼叫元資料而產生的瓶頸。

常見問題與解決方案

問題	解決方案
`casted` 上的 `NullPointerException`	在轉型前確認 `instanceof` 檢查已成功。
檔案路徑錯誤	使用絕對路徑或透過 `Paths.get(...)` 解析相對路徑。
不支援的格式	確保檔案類型列於 GroupDocs.Editor 支援的格式清單中。
密碼錯誤	再次確認密碼字串；請注意大小寫敏感。

常見問答

Q: 我可以使用此 API 從 PDF 檔案提取元資料嗎？
A: GroupDocs.Editor 專注於可編輯的格式（DOCX、XLSX 等）。若需處理 PDF，請使用 GroupDocs.Viewer 或 PDF 專屬 API。

Q: 取得元資料是否需要載入整個文件？
A: 不需要。getDocumentInfo(null) 只讀取標頭資訊，使操作保持輕量。

Q: 此函式庫如何處理大型 Excel 活頁簿？
A: 元資料提取僅讀取活頁簿的摘要資訊，完整工作表資料不會載入記憶體。

Q: 有沒有方法批次處理多個檔案？
A: 有 – 迭代檔案清單，在迴圈中重複使用相同的 Editor 模式，使用後釋放每個實例。

Q: 若文件損壞該怎麼辦？
A: API 會拋出 InvalidFormatException。請捕獲該例外並記錄檔案以供人工檢查。

結論

您現在已掌握使用 GroupDocs.Editor 在 Word、Excel 以及文字檔案上 extract document metadata java 與 get document info java 的完整、可投入生產的解決方案。將這些程式碼片段整合至您的服務中，並依照提供的例外模式處理邊緣情況，即可獲得更快速且更可靠的文件處理管道。

最後更新： 2025-12-18
測試版本： GroupDocs.Editor 25.3
作者： GroupDocs