Multi-modal Synergy: Bridging Chinese Culture Expression and Teaching Interaction in Art English Textbooks via Self-supervised Learning
Published: 28 April 2025 | Version 1 | DOI: 10.17632/znrhysn7hk.1
Contributor:
Biao Kong
Description
This dataset is tailored to English textbooks used in art colleges and covers three modalities: text, image, and video. It contains 300,000 texts, 80,000 images, and 2,000 hours of video, with content spanning multiple knowledge domains of art textbooks. A further 10,000 texts, 5,000 images, and 500 hours of video from social media were added to test model performance on noisy data. The dataset aims to improve the semantic consistency, cultural expression integrity, and teaching interactivity of textbooks, and to support research on cross-cultural communication and multimodal integration in art English textbooks.
Categories
Art