ASP.Net: Import BulletList from Word document using OpenXML

comunidadmexicanaroma
 
on Mar 23, 2021 11:22 AM

Hi,

I need to import Word (.DOC and .DOCX) documents directly into a MySQL database using OpenXML.

In this code I use a regular expression to check whether a line starts with whitespace, a letter, a digit, the bullet character (•) or the - character.

But this code imports only the text contained in the Word document; the list numbers and bullet characters are not imported.

For example, if the Word file contains

  1. List 1
  2. List 2
  3. List 3

or

  • List 4
  • List 5
  • List 6

and I run this code:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(file, false))
{
    Body body = wordDoc.MainDocumentPart.Document.Body;
    string contents = "";

    // Keep paragraphs whose text starts with whitespace, a letter, a digit, "•" or "-".
    var reg = new Regex(@"^[\s\p{L}\d•-]");

    foreach (Paragraph co in body.Descendants<Paragraph>().Where(p => reg.IsMatch(p.InnerText)))
    {
        contents += co.InnerText + "<br />";
        //insert contents into database;
    }
}

then the table in the MySQL database contains only the text, without the numbers or bullets:

+----------+
| contents |
+----------+
| List 1   |
| List 2   |
| List 3   |
| List 4   |
| List 5   |
| List 6   |
+----------+
6 rows in set (0.08 sec) 

Can someone help me?

Any help would be greatly appreciated.

Thank you.

dharmendr
 
on Mar 24, 2021 03:29 AM
on Jan 04, 2022 10:23 AM

Hi comunidadmexi...,

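You need to check the paragraph's ParagraphProperties, specifically its NumberingProperties, to detect a bulleted or numbered list item. This cannot be verified with a regular expression, because Word does not store the bullet or number in the paragraph text: the paragraph only references a numbering definition (numId and ilvl), and Word draws the marker from that definition. Also note that the Open XML SDK only opens .docx files; legacy binary .doc files have to be converted to .docx first.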
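To see why the regular expression cannot help, here is a minimal sketch that builds two paragraphs in memory with the Open XML SDK (no .docx file is needed; ListMarkerDemo and its methods are hypothetical names). One paragraph is a list item that references numbering definition 1, the other is a plain paragraph with the same text. Their InnerText is identical, so the regex matches both, and only NumberingProperties tells them apart.

```csharp
using System;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Wordprocessing;

static class ListMarkerDemo
{
    // Same pattern as in the question.
    static readonly Regex Reg = new Regex(@"^[\s\p{L}\d•-]");

    // A list item as Word stores it: the marker comes from w:numPr (numId/ilvl), not from the text.
    public static Paragraph ListItem(string text) =>
        new Paragraph(
            new ParagraphProperties(
                new NumberingProperties(
                    new NumberingLevelReference { Val = 0 },
                    new NumberingId { Val = 1 })),
            new Run(new Text(text)));

    // An ordinary paragraph with no numbering.
    public static Paragraph PlainParagraph(string text) =>
        new Paragraph(new Run(new Text(text)));

    // The check used in the answer: a list item has NumberingProperties.
    public static bool IsListItem(Paragraph p) =>
        p.ParagraphProperties?.NumberingProperties != null;

    public static void Main()
    {
        Paragraph listItem = ListItem("List 4");
        Paragraph plain = PlainParagraph("List 4");

        Console.WriteLine($"{listItem.InnerText} | {plain.InnerText}");                  // List 4 | List 4
        Console.WriteLine($"{Reg.IsMatch(listItem.InnerText)} {Reg.IsMatch(plain.InnerText)}"); // True True
        Console.WriteLine($"{IsListItem(listItem)} {IsListItem(plain)}");                // True False
    }
}
```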

Refer to the code below.

Namespaces

C#

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using System.Text.RegularExpressions;

VB.Net

Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Wordprocessing
Imports System.Text.RegularExpressions

Code

C#

protected void Page_Load(object sender, EventArgs e)
{
    if (!this.IsPostBack)
    {
        string file = @"C:\Users\developer4\Desktop\Test.docx";
        // Open read-only: the document is only read, not modified.
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(file, false))
        {
            Body body = wordDoc.MainDocumentPart.Document.Body;
            string contents = "";

            // The regular expression cannot detect list items; it is only useful for other checks.
            var reg = new Regex(@"^[\s\p{L}\d\•\-]");

            foreach (Paragraph co in body.Descendants<Paragraph>())
            {
                // A paragraph in a bulleted or numbered list carries w:numPr (NumberingProperties).
                if (co.ParagraphProperties != null && co.ParagraphProperties.NumberingProperties != null)
                {
                    contents += co.InnerText + "<br />";
                    //insert contents into database.
                }
                else
                {
                    // Do other checking.
                }
            }
        }
    }
}
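Checking NumberingProperties tells you that a paragraph is a list item, but not whether it is bulleted or numbered. That information lives in the numbering part (numbering.xml): the paragraph's numId points to a w:num element, which points to a w:abstractNum, whose w:lvl for the paragraph's ilvl holds the format (w:numFmt) and the marker text (w:lvlText). A minimal sketch of that lookup follows; ListHelper, GetListLevel and IsBullet are hypothetical names, and it assumes the numbering is applied directly on the paragraph (numbering inherited from a paragraph style and w:lvlOverride are not handled).

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: resolves a list paragraph to its w:lvl definition.
static class ListHelper
{
    // Returns the level definition, or null if the paragraph has no direct numbering
    // or the document has no numbering part.
    public static Level GetListLevel(WordprocessingDocument doc, Paragraph p)
    {
        NumberingProperties numPr = p.ParagraphProperties?.NumberingProperties;
        int? numId = numPr?.NumberingId?.Val?.Value;
        if (numId == null || numId == 0) return null; // numId 0 means "no numbering"
        int ilvl = numPr.NumberingLevelReference?.Val?.Value ?? 0;

        Numbering numbering = doc.MainDocumentPart?.NumberingDefinitionsPart?.Numbering;
        if (numbering == null) return null;

        // w:num maps the paragraph's numId to an abstract numbering definition.
        NumberingInstance num = numbering.Elements<NumberingInstance>()
            .FirstOrDefault(n => n.NumberID?.Value == numId);
        int? abstractId = num?.GetFirstChild<AbstractNumId>()?.Val?.Value;
        if (abstractId == null) return null;

        // w:abstractNum holds one w:lvl per indentation level.
        AbstractNum abstractNum = numbering.Elements<AbstractNum>()
            .FirstOrDefault(a => a.AbstractNumberId?.Value == abstractId);
        return abstractNum?.Elements<Level>()
            .FirstOrDefault(l => l.LevelIndex?.Value == ilvl);
    }

    // True when the level's w:numFmt is "bullet".
    public static bool IsBullet(Level level) =>
        level?.GetFirstChild<NumberingFormat>()?.Val?.Value == NumberFormatValues.Bullet;
}
```

Inside the loop, `ListHelper.IsBullet(ListHelper.GetListLevel(wordDoc, co))` would then separate bulleted items from numbered ones, and `GetFirstChild<LevelText>()` on the returned level gives the marker pattern (for example "%1.").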

VB.Net

Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs) Handles Me.Load
    If Not Me.IsPostBack Then
        Dim file As String = "C:\Users\developer4\Desktop\Test.docx"
        ' Open read-only: the document is only read, not modified.
        Using wordDoc As WordprocessingDocument = WordprocessingDocument.Open(file, False)
            Dim body As Body = wordDoc.MainDocumentPart.Document.Body
            Dim contents As String = ""
            ' The regular expression cannot detect list items; it is only useful for other checks.
            Dim reg = New Regex("^[\s\p{L}\d\•\-]")
            For Each co As Paragraph In body.Descendants(Of Paragraph)()
                ' A paragraph in a bulleted or numbered list carries w:numPr (NumberingProperties).
                If co.ParagraphProperties IsNot Nothing AndAlso co.ParagraphProperties.NumberingProperties IsNot Nothing Then
                    contents += co.InnerText & "<br />"
                    'insert contents into database.
                Else
                    ' Do other checking.
                End If
            Next
        End Using
    End If
End Sub
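If the goal is to store the visible marker too (for example "1. List 1" instead of "List 1"), the number has to be rebuilt, because Word computes it while rendering. The sketch below assumes you already have each list paragraph's numId, ilvl and the w:lvlText of its level (read from numbering.xml by following numId → w:num → w:abstractNum → w:lvl). ListCounter and NextLabel are hypothetical names; it assumes decimal formats starting at 1, keeps one set of counters per numId, and ignores w:start, w:lvlRestart, w:lvlOverride and non-decimal formats such as letters or roman numerals. In w:lvlText, "%1" stands for the level 0 counter, "%2" for level 1, and so on; bullet levels contain no placeholder and are returned as-is (Word's default bullet is often a Symbol-font character rather than a literal "•").

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: rebuilds the visible marker of a list item from its lvlText pattern.
class ListCounter
{
    // One counter array per numId; the index is ilvl (Word uses levels 0 to 8).
    private readonly Dictionary<int, int[]> counters = new Dictionary<int, int[]>();

    public string NextLabel(int numId, int ilvl, string levelText)
    {
        if (ilvl < 0 || ilvl > 8) throw new ArgumentOutOfRangeException(nameof(ilvl));

        if (!counters.TryGetValue(numId, out int[] c))
        {
            c = new int[9];
            counters[numId] = c;
        }

        c[ilvl]++;
        // A new item at this level restarts all deeper levels.
        for (int i = ilvl + 1; i < c.Length; i++) c[i] = 0;

        // Replace %1..%9 with the counter of level 0..8.
        return Regex.Replace(levelText, "%([1-9])",
            m => c[m.Groups[1].Value[0] - '1'].ToString());
    }
}
```

For example, three items of the numbered list in the question with lvlText "%1." give "1.", "2." and "3.", which can be prefixed to InnerText before inserting into the database.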