(At the very beginning we were all children~ At the very end we long to become angels~)

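The Very Beginning#

I wanted to write code and play with servers back in middle school, and once I got to university I finally started doing it.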

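The Very Beginning: WordPress#

When I first got my hands on a server, I followed online tutorials to set up a blog.
There are plenty of tutorials on building a site with WordPress, and it really is a good fit for beginners.
I also tried GitHub's static pages back then, but I didn't understand Git or a JavaScript runtime like Node.js. I couldn't even work with arrays; all I could write was cout<<"hello world"<<endl;
So I picked WordPress and started my from-zero WordPress life on a 1C1G VPS (1 vCPU, 1 GB RAM).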

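My Love-Hate Relationship with WordPress and How I Grew#

At the very start I didn't even have a domain name; the site ran bare on IP:PORT.
I was using BT Panel (宝塔) at the time, but only for about 4-5 months before I switched to 1Panel.

By the way, always lock your screen when you step away from your computer. Someone once used my own shell to drop a PHP one-line webshell on my server, so I suppose I have been the victim of a close-access attack QAQ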

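Later I bought a domain and found that pointing it at ports 80/443 on a server in mainland China requires ICP filing (备案).
I didn't want to go through the filing, so I moved to an overseas VPS and put Cloudflare's CDN in front of it, pulling from the origin.
I migrated a few more times after that, and even had a hosting provider disappear on me, but throughout this period Cloudflare was still fetching from the origin over plain HTTP.
At this point I still didn't know why HTTPS matters, what SSL is, or what a certificate is.
I also tried some free PHP web hosting services and learned that free is the most expensive option. Their control panels were hard to use and the disk quotas were tiny, so any page with heavier static assets had to link them from external storage. Before that I had also self-hosted an image host (I got tired of maintaining it and simply deleted it).
I also deployed the SafeLine WAF (雷池) and a honeypot, then decided a simple blog didn't need them and took them down.
Switching to 1Panel also more or less forced me to learn Docker.
Around this time, I think, I bought an Inspur NF5280M4 and a OneCloud box (玩客云), and started running FRP + Tailscale + WireGuard together. FRP turned out to be too inefficient, so I dropped it and kept Tailscale as the primary with WireGuard as the backup. I then noticed that Tailscale often failed to connect directly and fell back to the official DERP relay servers. I self-hosted a DERP server too, but later found it was easier to just go all in with FOFA.
I used PVE (Proxmox VE) for a short while.
Then I spent nearly two months writing a blog CMS with Vue 3 + Spring Boot 3 + PostgreSQL. When it was done it turned out to be a mess, so I never put it into service and kept using WordPress.
I think this is when I switched the origin pull to HTTPS.
Then I suddenly got into QQ bots and K8s, probably to prepare for running the following year's competition. I played with Ollama, NapCat, and GZCTF. I couldn't make sense of the K8s deployment docs and tutorials back then, so I deployed K3s instead.
I think this is when I replaced PVE with ESXi 7.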

Half a year later

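Then I spun up a few VMs and played around with a Minecraft Paper server.
Then, while helping a friend (@qianjunasukami) with a K8s cluster, I deployed Dify, n8n, Helm Dashboard, and Argo CD. I think that was all; I can't remember exactly. K8s is really hard and I'm still hesitant to work with it, even though its disaster recovery and self-healing are strong.
Around this time I also reproduced the training of a GPT model on Chinese Wikipedia data from before 2018. AI is harder still.
Still in this period, I created three VMs and built my own K8s cluster (one Agent + two Workers). I also deployed a single-node setup; see the post "Debian 12: Deploying a Single-Node Kubernetes 1.32.5".
Yet again in this same period, I tried cloud-init; see the post "Trying Cloud-init on ESXi".
And it was during this period that I migrated from WordPress to Astro, replaced ESXi 7 with PVE 9.1, and deployed a Ceph cluster (YakumoRan). Thanks to @qianjunasukami, @YakumoRan, @Yi, and all the other Nameless who walked this journey with me. May this journey lead us starward.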

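Why Migrate from WordPress to Astro?#

The first reason is that I got tired of moving static assets and the database around. I change servers a lot, and although everything is backed up to Cloudflare R2, every migration is still a hassle.
WordPress's new editor also has problems when the site runs over HTTPS behind a CDN origin pull, and fixing it means hand-editing config.php or settings.php (I don't remember which; WordPress's own files are named wp-config.php and wp-settings.php). On top of that, I have gotten used to writing Markdown.
Mostly, though, this way I get to turn my GitHub contribution wall green, hahaha. I also tried Hugo, which a friend uses, but I really can't handle Go's template syntax. I chose Fuwari simply because it suits my style.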

Migration Script#

It was written with AI. Honestly it's a bit rough, but it gets the job done.

# Install the libraries
pip install python-wordpress-xmlrpc python-frontmatter html2text requests

# Usage
# Command-line arguments (recommended)
python wp2fuwari.py \
  --url https://your-blog.com \
  --user your-username \
  --pass your-password \
  --author your-name \
  --output fuwari-export

Optional Parameters#

Parameter	Description	Example
--url	WordPress site URL	https://blog.example.com
--user	Username	admin
--pass	Password	secret
--author	Author name	John Doe
--output	Output directory	fuwari-export
--no-images	Do not download images	-

# Import into Fuwari

# Copy the posts
cp -r fuwari-export/posts/* fuwari/src/content/posts/

# Copy the images
cp -r fuwari-export/images/* fuwari/public/images/

# Build the project
cd fuwari
pnpm build

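FAQ#

Q: XML-RPC connection fails?

Make sure XML-RPC is enabled in the WordPress admin under Settings > Writing. WordPress 3.5 and later enable it by default and no longer show that toggle, so on a current install check instead whether a security plugin, WAF, or CDN rule is blocking /xmlrpc.php.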
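To narrow down where an XML-RPC failure comes from, it can help to probe the endpoint with nothing but the Python standard library before running the full migration. The following is a minimal sketch, not part of the migration script: `xmlrpc_endpoint` and `probe_xmlrpc` are hypothetical helper names, and it assumes the site answers the standard `system.listMethods` introspection call at `/xmlrpc.php`.

```python
import xmlrpc.client
from xml.parsers.expat import ExpatError


def xmlrpc_endpoint(site_url: str) -> str:
    """Build the XML-RPC endpoint URL from the site root."""
    return site_url.rstrip("/") + "/xmlrpc.php"


def probe_xmlrpc(site_url: str) -> str:
    """Call system.listMethods and describe what came back."""
    proxy = xmlrpc.client.ServerProxy(xmlrpc_endpoint(site_url))
    try:
        methods = proxy.system.listMethods()
    except xmlrpc.client.ProtocolError as err:
        # HTTP-level failure, for example a 403 from a WAF or CDN rule
        return f"HTTP error {err.errcode}: {err.errmsg}"
    except xmlrpc.client.Fault as err:
        # The endpoint answered but rejected the call
        return f"XML-RPC fault {err.faultCode}: {err.faultString}"
    except ExpatError as err:
        # The response was not XML, for example an HTML challenge page
        return f"Response was not valid XML: {err}"
    except OSError as err:
        # DNS, TLS, or connection problems
        return f"Connection problem: {err}"
    return f"OK, {len(methods)} methods exposed"
```

Calling `print(probe_xmlrpc("https://blog.example.com"))` with your own site URL (the one here is a placeholder) gives a one-line diagnosis; an HTTP error usually points at something in front of WordPress rather than at WordPress itself.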

Q: Image downloads fail?

Check failed_images.txt. The script automatically handles base64-encoded images.
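For entries in failed_images.txt that are inline `data:` URIs rather than real URLs, the recovery step boils down to splitting off the `base64,` prefix and decoding the rest. A minimal sketch of that idea, assuming the entry has the usual `data:<mime>;base64,<payload>` shape (`decode_data_uri` is a hypothetical name used only for illustration):

```python
import base64


def decode_data_uri(data_uri: str) -> tuple[str, bytes]:
    """Split a base64 data: URI into (MIME type, raw bytes)."""
    header, sep, payload = data_uri.partition("base64,")
    if not sep:
        raise ValueError("not a base64 data URI")
    # header looks like "data:image/png;" at this point
    mime = header.removeprefix("data:").rstrip(";") or "application/octet-stream"
    return mime, base64.b64decode(payload)


mime, raw = decode_data_uri("data:text/plain;base64,aGVsbG8=")
print(mime, raw)  # text/plain b'hello'
```

The returned bytes can be written straight to a file opened in `'wb'` mode; the MIME type is only a hint for choosing a file extension.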

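Q: The format conversion isn't perfect?

Migrate a small batch of posts first as a test, then adjust the cleanup rules based on the results.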
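One way to iterate on the cleanup rules without touching the live site is to paste a fragment of a post's raw HTML into a string and run the regular expressions against it. The sketch below reuses two of the patterns from the script's `clean_wp_tags`; the sample markup and the helper name `strip_gutenberg` are made up for illustration.

```python
import re

# A hand-written sample of Gutenberg output (illustrative, not from a real post)
SAMPLE = (
    '<!-- wp:image {"id":12} -->\n'
    '<figure class="wp-block-image size-large"><img src="a.png" alt=""/></figure>\n'
    '<!-- /wp:image -->'
)


def strip_gutenberg(text: str) -> str:
    """Drop HTML comments (including wp: block markers) and wp-block classes."""
    text = re.sub(r'<!--.*?-->', '', text, flags=re.DOTALL)
    text = re.sub(r'\s*class="wp-block[^"]*"', '', text)
    return text.strip()


print(strip_gutenberg(SAMPLE))
# <figure><img src="a.png" alt=""/></figure>
```

When a post converts badly, adding its offending fragment as another sample makes it easy to see whether a new rule fixes it without breaking the earlier cases.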

#!/usr/bin/env python3
# wp2fuwari.py - complete script for migrating WordPress to Fuwari

# ==================== Python 3.10+ compatibility patch ====================
import collections
import collections.abc

# Restore the collections aliases removed in Python 3.10
# (older libraries still import them from collections)
for attr in ('Iterable', 'Iterator', 'Mapping', 'MutableMapping',
             'Sequence', 'MutableSequence', 'Callable'):
    if not hasattr(collections, attr):
        setattr(collections, attr, getattr(collections.abc, attr))

# ==================== Regular imports ====================
from wordpress_xmlrpc import Client
from wordpress_xmlrpc.methods.posts import GetPosts
import frontmatter
import os
import re
import sys
import argparse
from html2text import HTML2Text
from datetime import datetime
import urllib.parse
import requests
import html

# ==================== Configuration ====================
class Config:
    WP_URL = ''  # e.g. https://blog.example.com
    WP_USER = ''
    WP_PASS = ''
    OUTPUT_DIR = 'fuwari-export'
    POSTS_DIR = f'{OUTPUT_DIR}/posts'
    IMAGES_DIR = f'{OUTPUT_DIR}/images'
    # Download settings
    DOWNLOAD_IMAGES = True
    TIMEOUT = 30
    RETRY = 3
    # Author name: change the default here
    AUTHOR = 'your-name'  # <- set this to the default author name you want

# ==================== Utility functions ====================
def clean_slug(title):
    """Generate a clean URL slug"""
    slug = re.sub(r'[^\w\s-]', '', title).strip().lower()
    slug = re.sub(r'[-\s]+', '-', slug)
    return slug[:50]

def clean_wp_tags(text):
    """Strip WordPress Gutenberg block markup"""
    # Remove block comment markers (HTML comments open with "<!--")
    text = re.sub(r'<!--\s*wp:\w+(\s+[^>]*)?\s*-->', '', text)
    text = re.sub(r'<!--\s*/wp:\w+\s*-->', '', text)
    # Remove any remaining HTML comments
    text = re.sub(r'<!--.*?-->', '', text, flags=re.DOTALL)
    # Remove WordPress Gutenberg block class names
    text = re.sub(r'\s*class="wp-block[^"]*"', '', text)
    text = re.sub(r'\s*class="size-[^"]*"', '', text)
    return text

def extract_description(content, excerpt, max_len=150):
    """Extract a plain-text description with all HTML tags removed"""
    # Strip WordPress block markup first
    content = clean_wp_tags(content)
    excerpt = clean_wp_tags(excerpt) if excerpt else excerpt

    def clean_html(text):
        """Remove all HTML tags and entities"""
        # Remove HTML tags
        text = re.sub(r'<[^>]+>', '', text)
        # Decode HTML entities
        text = html.unescape(text)
        # Collapse extra whitespace
        text = re.sub(r'\s+', ' ', text).strip()
        return text

    # Prefer the excerpt when it is long enough to be meaningful
    if excerpt and len(excerpt.strip()) > 10:
        text = clean_html(excerpt)
        desc = text[:max_len] + ('...' if len(text) > max_len else '')
        return desc

    # Otherwise fall back to the start of the body, minus Markdown punctuation
    text = clean_html(content)
    text = re.sub(r'[#*`!\[\]\(\)]', '', text)
    desc = text[:max_len] + ('...' if len(text) > max_len else '')
    return desc

def ensure_dirs():
    """Create the output directories"""
    os.makedirs(Config.POSTS_DIR, exist_ok=True)
    os.makedirs(Config.IMAGES_DIR, exist_ok=True)

def init_html2text():
    """Configure HTML2Text"""
    h = HTML2Text()
    h.body_width = 0
    h.ignore_links = False
    h.ignore_images = False
    h.wrap_links = False
    h.ignore_tables = False
    h.skip_internal_links = False
    h.inline_links = True
    h.protect_links = True
    h.wrap_list_items = False
    h.include_sup_sub = True
    return h

# ==================== Image handling ====================
class ImageDownloader:
    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
        })
        self.downloaded = set()
        self.failed = []

    def extract_images(self, md_content, post_date):
        """Extract image references and rewrite them to local paths"""
        cover_candidate = ''
        image_map = {}

        def replace_image(match):
            nonlocal cover_candidate
            alt_text = match.group(1)
            img_url = urllib.parse.unquote(match.group(2))
            # Parse the file name out of the URL
            parsed = urllib.parse.urlparse(img_url)
            orig_filename = os.path.basename(parsed.path) or 'image.jpg'
            # Sanitize the file name
            clean_name = re.sub(r'[^\w.\-]', '_', orig_filename)
            if '.' not in clean_name:
                clean_name += '.jpg'
            # Add a date prefix
            date_prefix = post_date.strftime('%Y%m%d')
            new_filename = f'{date_prefix}_{clean_name}'
            new_path = f'/images/{new_filename}'
            # Record the mapping: original URL -> local file name
            image_map[img_url] = new_filename
            # Use the first image with a known extension as the cover candidate
            exts = ['.jpg', '.jpeg', '.png', '.webp', '.gif']
            if not cover_candidate and any(ext in clean_name.lower() for ext in exts):
                cover_candidate = new_path
            return f'![{alt_text}]({new_path})'

        new_content = re.sub(r'!\[(.*?)\]\((.*?)\)', replace_image, md_content)
        return new_content, cover_candidate, image_map

    def download(self, image_map, post_date):
        """Download every image in image_map"""
        if not Config.DOWNLOAD_IMAGES or not image_map:
            return
        # Derive the likely WordPress upload path from the post date
        year = post_date.strftime('%Y')
        month = post_date.strftime('%m')
        date_str = post_date.strftime('%Y%m%d')
        for orig_url, filename in image_map.items():
            if filename in self.downloaded:
                continue
            # Build the list of candidate URLs
            urls_to_try = []
            # If orig_url is an absolute URL, try it first
            if orig_url.startswith('http'):
                urls_to_try.append(orig_url)
            # Standard WordPress upload paths
            base_url = Config.WP_URL.rstrip('/')
            # Strip the date prefix to recover the original file name
            orig_name = filename.replace(f'{date_str}_', '')
            urls_to_try.extend([
                f"{base_url}/wp-content/uploads/{filename}",
                f"{base_url}/wp-content/uploads/{year}/{month}/{orig_name}",
                f"{base_url}/wp-content/uploads/{year}/{orig_name}",
            ])
            # Try each candidate until one succeeds
            saved = False
            for url in urls_to_try:
                try:
                    resp = self.session.get(url, timeout=Config.TIMEOUT)
                    if resp.status_code == 200 and len(resp.content) > 100:
                        filepath = os.path.join(Config.IMAGES_DIR, filename)
                        with open(filepath, 'wb') as f:
                            f.write(resp.content)
                        self.downloaded.add(filename)
                        print(f"      ↓ {filename}")
                        saved = True
                        break
                except Exception:
                    continue
            if not saved:
                self.failed.append((filename, orig_url))
                print(f"      ✗ {filename} (download failed)")

# ==================== Main export logic ====================
def export_posts():
    """Export WordPress posts"""
    # Validate the configuration
    if not all([Config.WP_URL, Config.WP_USER, Config.WP_PASS]):
        print("Error: please set WP_URL, WP_USER, WP_PASS")
        print("Example: python wp2fuwari.py --url https://blog.com --user admin --pass secret")
        sys.exit(1)
    ensure_dirs()
    h = init_html2text()
    downloader = ImageDownloader()
    # Connect to WordPress
    print(f"Connecting to {Config.WP_URL} ...")
    try:
        wp = Client(f"{Config.WP_URL}/xmlrpc.php", Config.WP_USER, Config.WP_PASS)
    except Exception as e:
        print(f"Connection failed: {e}")
        sys.exit(1)
    # Fetch the posts
    print("Fetching the post list...")
    posts = wp.call(GetPosts({
        'post_type': 'post',
        'post_status': 'publish',
        'number': 1000
    }))
    print(f"Found {len(posts)} posts\n")
    # Process each post
    success = 0
    failed = 0

    for i, post in enumerate(posts, 1):
        try:
            title_display = post.title[:40] if len(post.title) > 40 else post.title
            print(f"[{i}/{len(posts)}] {title_display}...")
            # HTML to Markdown
            md_content = h.handle(post.content)
            # Strip WordPress Gutenberg block markup
            md_content = re.sub(r'<!--\s*wp:\w+(\s+[^>]*)?\s*-->', '', md_content)
            md_content = re.sub(r'<!--\s*/wp:\w+\s*-->', '', md_content)
            md_content = re.sub(r'class="[^"]*"', '', md_content)  # strip class attributes
            md_content = html.unescape(md_content)  # decode HTML entities
            md_content = re.sub(r'\n{3,}', '\n\n', md_content)  # collapse extra blank lines
            # Process images
            md_content, cover, image_map = downloader.extract_images(md_content, post.date)
            # Build the front matter (Fuwari format)
            # Use datetime objects so frontmatter writes the dates unquoted
            published_date = post.date.replace(tzinfo=None)
            updated_date = (
                post.date_modified.replace(tzinfo=None)
                if hasattr(post, 'date_modified') and post.date_modified
                else post.date.replace(tzinfo=None)
            )
            metadata = {
                'title': post.title.strip(),
                'published': published_date,
                'updated': updated_date,
                'description': extract_description(post.content, post.excerpt),
                'author': Config.AUTHOR,
                'category': post.terms[0].name if post.terms else 'uncategorized',
                'tags': [t.name for t in post.terms if hasattr(t, 'taxonomy') and t.taxonomy == 'post_tag'] or [],
                'cover': cover,
                'draft': False,
            }
            # Create the Markdown file
            md_file = frontmatter.Post(md_content, **metadata)
            slug = clean_slug(post.title)
            filename = f"{post.date.strftime('%Y-%m-%d')}-{slug}.md"
            filepath = os.path.join(Config.POSTS_DIR, filename)
            fm_string = frontmatter.dumps(md_file, allow_unicode=True)
            with open(filepath, 'w', encoding='utf-8') as f:
                f.write(fm_string)
            # Download the images
            if image_map:
                print(f"    Found {len(image_map)} images")
                downloader.download(image_map, post.date)
            success += 1
        except Exception as e:
            print(f"    ✗ Failed: {e}")
            failed += 1

    # Summary
    sep_line = '=' * 50
    print(f"\n{sep_line}")
    print("Export complete!")
    print(f"Succeeded: {success} | Failed: {failed}")
    print(f"Images: {len(downloader.downloaded)} succeeded, {len(downloader.failed)} failed")
    print(f"Output: {os.path.abspath(Config.OUTPUT_DIR)}")
    # Write the failure log
    if downloader.failed:
        log_path = f'{Config.OUTPUT_DIR}/failed_images.txt'
        with open(log_path, 'w', encoding='utf-8') as f:
            for name, url in downloader.failed:
                f.write(f"{name}\t{url}\n")
        print(f"Failure list: {log_path}")

    def process_failed_images():
        """Decode base64-encoded images recorded in the failed-image list"""
        import base64

        failed_log_path = f'{Config.OUTPUT_DIR}/failed_images.txt'
        if not os.path.exists(failed_log_path):
            return []
        print("\nProcessing base64 images from the failed-image list...")
        with open(failed_log_path, 'r', encoding='utf-8') as f:
            lines = f.readlines()
        processed = []
        failed_again = []
        for line in lines:
            line = line.strip()
            if not line or '\t' not in line:
                continue
            filename, url_or_data = line.split('\t', 1)
            # Check whether the entry is base64-encoded
            if 'base64,' in url_or_data:
                try:
                    # Extract and decode the base64 payload
                    base64_data = url_or_data.split('base64,')[1]
                    image_data = base64.b64decode(base64_data)
                    # Save the image
                    filepath = os.path.join(Config.IMAGES_DIR, filename)
                    with open(filepath, 'wb') as f:
                        f.write(image_data)
                    print(f"  ✓ {filename} (base64 decoded)")
                    processed.append(filename)
                except Exception as e:
                    print(f"  ✗ {filename} (base64 decode failed: {e})")
                    failed_again.append(line)
            else:
                # Keep failure records that are not base64
                failed_again.append(line)
        # Update the failure list
        if failed_again:
            with open(failed_log_path, 'w', encoding='utf-8') as f:
                for line in failed_again:
                    f.write(line + '\n')
            print(f"{len(failed_again)} images still need manual handling")
        else:
            os.remove(failed_log_path)
            print("All failed images have been processed")
        return processed

    # Next-step hints
    print("\nNext steps:")
    print(f"  1. cp -r {Config.POSTS_DIR}/* fuwari/src/content/posts/")
    print(f"  2. cp -r {Config.IMAGES_DIR}/* fuwari/public/images/")
    print("  3. cd fuwari && pnpm build")

    # Process base64-encoded failed images at the end of export_posts(),
    # after the next-step hints
    process_failed_images()

# ==================== Command-line entry point ====================
def main():
    parser = argparse.ArgumentParser(description='Migrate WordPress to Fuwari')
    parser.add_argument('--url', help='WordPress site URL')
    parser.add_argument('--user', help='Username')
    parser.add_argument('--pass', dest='password', help='Password')
    parser.add_argument('--output', default='fuwari-export', help='Output directory')
    parser.add_argument('--author', default=None, help='Author name (falls back to Config.AUTHOR if omitted)')
    parser.add_argument('--no-images', action='store_true', help='Do not download images')
    args = parser.parse_args()
    # Apply the command-line arguments
    if args.url:
        Config.WP_URL = args.url
    if args.user:
        Config.WP_USER = args.user
    if args.password:
        Config.WP_PASS = args.password
    Config.OUTPUT_DIR = args.output
    Config.POSTS_DIR = f'{args.output}/posts'
    Config.IMAGES_DIR = f'{args.output}/images'
    # Override the configured author only when one is given on the command line
    if args.author:
        Config.AUTHOR = args.author
    Config.DOWNLOAD_IMAGES = not args.no_images
    export_posts()

if __name__ == '__main__':
    main()
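
The exported file names come from the slug rule in `clean_slug`, so it can be worth trying a few titles in isolation before a full run. The sketch below copies that rule as it appears in the script; the sample titles are made up. Because `\w` matches Unicode word characters in Python 3, Chinese titles keep their characters and only punctuation is dropped.

```python
import re


def clean_slug(title: str) -> str:
    """Same rule as the script: drop punctuation, hyphenate whitespace, cap at 50 chars."""
    slug = re.sub(r'[^\w\s-]', '', title).strip().lower()
    slug = re.sub(r'[-\s]+', '-', slug)
    return slug[:50]


print(clean_slug("Hello, World!"))  # hello-world
print(clean_slug("Debian 12 部署 Kubernetes 1.32.5"))  # debian-12-部署-kubernetes-1325
```

Note that dots are removed, so a version such as 1.32.5 collapses to 1325, and two posts published on the same day with titles that share their first 50 slug characters would write to the same file.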