正在查看 1 个帖子:1-1 (共 1 个帖子)
- 作者帖子
野草游客帖子链接:求助部分日本图书馆以及大学馆藏照片下载方法 – 书格
使用工具:编程语言python、chrome浏览器、chatgpt以及kimi ai
说明:没有技术含量,仅做分享。
思路分析:首先我们要解析这个xml,获取关键信息。其次就是分析信息,构造url,下载碎图。最后就是按事先设定的图片命名方式,进行拼图。
测试链接1:https://opac.lib.takushoku-u.ac.jp/kyugaichi/htmls/resources/2013_013_001/root.xml
测试链接2:当麻練供養図 | 誕生寺所蔵
第一步:我们先把https://www.nara-wu.ac.jp/aic/gdb/mahoroba/y29/taimanerikuyouzu/limes/taimanerikuyouzu.html 改成
https://www.nara-wu.ac.jp/aic/gdb/mahoroba/y29/taimanerikuyouzu/resources/tanjoji_taimanerikuyouzu/root.xml
第二步使用pycharm或者其它能运行py脚本的ide工具运行下面代码即可。
缺点就是速度比较慢,比不上类似的软件,有些参数也可以修改。
import os import requests from PIL import Image import xml.etree.ElementTree as ET from datetime import datetime timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") download_path = os.path.join('E:\25', f'download_{timestamp}') os.makedirs(download_path, exist_ok=True) # 下载和解析 XML dzi = "https://opac.lib.takushoku-u.ac.jp/kyugaichi/htmls/resources/2013_013_001/root.xml" response = requests.get(dzi) xml_content = response.text # 打印 XML 内容进行调试 print("XML 内容:") print(xml_content) root = ET.fromstring(xml_content) # 获取 <image> 节点并提取信息 ,没有的可以换成resource image_node = root.find('.//image') if image_node is None: image_node = root.find('.//resource') if image_node is not None: print("找到节点:", image_node.tag) else: print("未找到节点") if image_node is not None: wi = int(image_node.attrib['width']) hi = int(image_node.attrib['height']) tsize = int(image_node.attrib['tilewidth']) else: raise ValueError("XML 文件中未找到 <image> 节点") # 输出获取的参数 print(f"Width: {wi}, Height: {hi}, Tile Width: {tsize}") # 计算列数和行数 cols = (wi // tsize) + 1 rows = (hi // tsize) + 1 # 下载图像瓦片并拼接 if not os.path.exists(download_path): os.makedirs(download_path) for i in range(rows): string = [] for j in range(cols): nh = tsize if i < rows - 1 else hi % tsize or tsize # 确保至少有一个瓦片 nw = tsize if j < cols - 1 else wi % tsize or tsize num1 = f"{j * tsize:05d}" num2 = f"{i * tsize:05d}" num3 = f"{nw:05d}" num4 = f"{nh:05d}" image_name = f"{num1}{num2}{num3}{num4}.jpg" img_path = os.path.join(download_path, f"{j}_{i}.jpg") file_url = f"{dzi.replace('root.xml', '0/')}{image_name}" print(f"Downloading {file_url}") img_response = requests.get(file_url) if img_response.status_code == 200: with open(img_path, "wb") as f: f.write(img_response.content) string.append(img_path) else: print(f"Failed to download {file_url}. Status code: {img_response.status_code}") # 拼接当前行的图像 if string: row_image = Image.open(string[0]) for img_path in string[1:]: next_image = Image.open(img_path) new_width = row_image.width + next_image.width new_image = Image.new('RGB', (new_width, row_image.height)) new_image.paste(row_image, (0, 0)) new_image.paste(next_image, (row_image.width, 0)) row_image = new_image row_image.save(os.path.join(download_path, f"row{i}.jpg")) # 最后合并所有行的图像 final_images = [Image.open(os.path.join(download_path, f"row{i}.jpg")) for i in range(rows)] full_image = Image.new('RGB', (wi, hi)) y_offset = 0 for img in final_images: full_image.paste(img, (0, y_offset)) y_offset += img.height full_image.save(os.path.join(download_path, "full.jpg")) print(f"全图已保存为 {os.path.join(download_path, 'full.jpg')}")
- 作者帖子
正在查看 1 个帖子:1-1 (共 1 个帖子)
正在查看 1 个帖子:1-1 (共 1 个帖子)