Changing Word Document Formatting with Python

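I have a book manuscript in which every 《A》(B) construct needs B set in italics. Changing them one by one was far too tedious, so I wrote a Python script to handle it.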
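To make the target pattern concrete, here is a minimal sketch of a regular expression that finds B in plain strings, before any Word file is involved. It assumes the parentheses may be either half-width () or full-width （）; the helper name `find_targets` and the sample sentence are made up for illustration.

```python
import re

# Assumption: B may be wrapped in half-width () or full-width （） parentheses.
# Group 1 is the title A, group 2 is the parenthetical B.
PATTERN = re.compile(r'《([^》]*)》[(（]([^)）]*)[)）]')


def find_targets(text):
    """Return (title, parenthetical, start, end) for each 《A》(B) in text.

    start and end are the character offsets of B, i.e. the part to italicize.
    """
    return [
        (m.group(1), m.group(2), m.start(2), m.end(2))
        for m in PATTERN.finditer(text)
    ]


# Hypothetical sample sentence mixing both kinds of parentheses.
sample = '参见《红楼梦》(Dream of the Red Chamber)与《西游记》（Journey to the West）。'
for title, inner, start, end in find_targets(sample):
    print(title, '->', inner, (start, end))
```

The offsets of group 2 are what matter later: they mark exactly the slice of the paragraph text that should become italic.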

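The python-docx library is honestly quite rough: it cannot fix every occurrence, and it can even scramble the fonts. In my tests it still caught roughly 70% of the cases, which cuts the workload enormously ... and where the fonts or layout do get messed up, re-typesetting those parts is enough.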
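One possible reason for missed occurrences (an assumption, not something checked against the manuscript itself) is that `doc.paragraphs` in python-docx only returns body-level paragraphs and skips text inside tables. A minimal sketch of a helper that also walks table cells might look like this; the name `iter_paragraphs` is hypothetical.

```python
def iter_paragraphs(container):
    """Yield paragraphs of a document or table cell, including nested tables.

    `container` is anything exposing `.paragraphs` and `.tables`, which in
    python-docx covers both Document and table cell objects.
    Caveats: paragraphs come out grouped (body first, then tables), not in
    reading order, and a merged cell may be visited more than once.
    """
    yield from container.paragraphs
    for table in container.tables:
        for row in table.rows:
            for cell in row.cells:
                yield from iter_paragraphs(cell)
```

Looping over `iter_paragraphs(doc)` instead of `doc.paragraphs` would extend a script to table text. Headers, footers, and text boxes are still not covered by this sketch.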

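import os
import re

from docx import Document

# Matches 《A》(B) with half-width () or full-width （） parentheses.
# Group 1 is B, the text between the parentheses.
PATTERN = re.compile(r'《[^》]*》[(（]([^)）]*)[)）]')


def italicize_pattern_in_docx(docx_path):
    doc = Document(docx_path)

    for paragraph in doc.paragraphs:
        text = paragraph.text
        matches = list(PATTERN.finditer(text))
        if not matches:
            continue  # leave paragraphs without a match untouched

        # Rebuild the paragraph once, italicizing every match in it.
        # Note: clear() drops run-level formatting (fonts, bold, ...),
        # which is why fonts may need fixing afterwards.
        paragraph.clear()
        pos = 0
        for match in matches:
            start, end = match.span(1)
            if start > pos:
                paragraph.add_run(text[pos:start])
            if end > start:
                paragraph.add_run(text[start:end]).italic = True
            pos = end
        if pos < len(text):
            paragraph.add_run(text[pos:])

    # Prefix only the file name, so paths with directories still work.
    folder, name = os.path.split(docx_path)
    doc.save(os.path.join(folder, 'modified_' + name))


docx_path = 'document.docx'
italicize_pattern_in_docx(docx_path)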
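The font breakage comes from `paragraph.clear()`, which throws away the formatting of the existing runs. A hedged alternative is to split the existing runs at the match boundaries and clone them, so that each piece keeps its original run properties. The sketch below is an assumption-laden variant, not a tested replacement: it relies on the private `run._r` XML element of python-docx, and the names `italic_spans`, `split_points`, and `italicize_in_place` are invented here.

```python
import copy
import re

# Assumption: B may sit in half-width () or full-width （） parentheses.
PATTERN = re.compile(r'《[^》]*》[(（]([^)）]*)[)）]')


def italic_spans(text):
    """Return the (start, end) offsets of every non-empty B in text."""
    return [m.span(1) for m in PATTERN.finditer(text) if m.group(1)]


def split_points(run_start, run_end, spans):
    """Cut [run_start, run_end) at span boundaries.

    Returns (start, end, italic) pieces covering the run in order.
    """
    cuts = {run_start, run_end}
    for span in spans:
        cuts.update(p for p in span if run_start < p < run_end)
    cuts = sorted(cuts)
    return [
        (a, b, any(s <= a and b <= e for s, e in spans))
        for a, b in zip(cuts, cuts[1:])
    ]


def italicize_in_place(paragraph):
    """Italicize each B while keeping the formatting of the existing runs."""
    # python-docx is imported lazily so the helpers above stay dependency-free.
    from docx.text.run import Run

    runs = list(paragraph.runs)
    text = ''.join(run.text for run in runs)
    spans = italic_spans(text)
    offset = 0
    for run in runs:
        start, end = offset, offset + len(run.text)
        offset = end
        pieces = split_points(start, end, spans)
        if not any(italic for _, _, italic in pieces):
            continue  # nothing to italicize in this run
        # Reuse the original run for the first piece and insert deep copies
        # (which carry the same run properties) right after it for the rest.
        # Uses the private `_r` element: an assumption about library internals.
        targets = [run]
        anchor = run._r
        for _ in pieces[1:]:
            clone = copy.deepcopy(run._r)
            anchor.addnext(clone)
            anchor = clone
            targets.append(Run(clone, paragraph))
        for target, (a, b, italic) in zip(targets, pieces):
            target.text = text[a:b]
            if italic:
                target.italic = True
```

To try it, call `italicize_in_place(paragraph)` for each paragraph of a loaded `Document` and save the result under a new name. Because matching is done on the concatenated run text, a match that straddles several runs is still handled; runs inside hyperlinks or other wrappers are not visited by `paragraph.runs` and stay as they are.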