The application of artificial intelligence-assisted technology in cultural and creative product design


Experimental environment and evaluation process

Appendices 1 and 2 present the experimental environment utilized in this study, detailing the hardware and software configurations. The parameter settings for the VAE and RL models are provided in Appendices 3 and 4.

The experiment evaluates the performance of the proposed VAE and RL combination in cultural and creative product design using several metrics, including model accuracy, generation quality, user satisfaction, and computational efficiency. These results are compared with those of traditional single models, which have not integrated multiple techniques or methods. These single models include fundamental approaches such as GANs, VAEs, and RL.

In evaluating the quality of the generated designs, both quantitative and qualitative measures are applied. Quantitatively, the Structural Similarity Index (SSIM) is employed to assess the similarity between AI-generated designs and reference designs from historical data, providing an objective measure of output accuracy and quality. SSIM evaluates image similarity based on luminance, contrast, and structural features. Additionally, Contrast Fidelity, Texture Fidelity, and Color Fidelity are used to assess visual contrast, texture detail, and color reproduction accuracy, respectively, by comparing the generated designs to real-world counterparts. Texture Fidelity can be analyzed by extracting texture features, for example with Gabor filters, and calculating the similarity between the features extracted from the generated and reference designs. The SSIM formula is expressed in Eq. (4):

$$\mathrm{SSIM}\left( x,y \right)=\frac{\left( 2\mu_x\mu_y+C_1 \right)\left( 2\sigma_{xy}+C_2 \right)}{\left( \mu_x^2+\mu_y^2+C_1 \right)\left( \sigma_x^2+\sigma_y^2+C_2 \right)}$$

(4)

In Eq. (4), $\mu_x$ and $\mu_y$ are the means of images $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ are their variances, $\sigma_{xy}$ represents the covariance between the two images, and $C_1$ and $C_2$ are constants to prevent division by zero. Contrast Fidelity, which assesses how well the contrast is preserved in the generated image, is defined as Eq. (5):

$$\text{Contrast Fidelity}=\frac{\text{std}_{\text{generated}}}{\text{std}_{\text{reference}}}$$

(5)

In Eq. (5), $\text{std}_{\text{generated}}$ is the standard deviation of the generated image, and $\text{std}_{\text{reference}}$ is the standard deviation of the reference image. A Contrast Fidelity value close to 1 indicates strong contrast preservation.
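As a rough illustration of Eqs. (4) and (5), the sketch below evaluates both quantities once over whole grayscale images given as flat lists of pixel values. This is a simplification: SSIM is commonly computed in local windows and then averaged, and the study does not state its window size or constants. The function names and the choices $C_1=(0.01L)^2$ and $C_2=(0.03L)^2$ (with $L$ the pixel value range) are assumptions made for this example.

```python
from statistics import fmean, pstdev, pvariance


def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Evaluate Eq. (4) once over two whole images (flat pixel lists)."""
    if len(x) != len(y):
        raise ValueError("images must contain the same number of pixels")
    # Assumed stabilizing constants; the study does not report its values.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x, var_y = pvariance(x, mu_x), pvariance(y, mu_y)
    # Population covariance between the two images.
    cov_xy = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def contrast_fidelity(generated, reference):
    """Eq. (5): ratio of pixel standard deviations (generated / reference)."""
    return pstdev(generated) / pstdev(reference)


if __name__ == "__main__":
    reference = [10, 50, 90, 130, 170, 210]
    generated = [12, 48, 95, 128, 165, 205]
    print(round(global_ssim(generated, reference), 4))
    print(round(contrast_fidelity(generated, reference), 4))
```

Note that `contrast_fidelity` is undefined for a perfectly flat reference image (zero standard deviation), so a real pipeline would need to handle that case explicitly.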

Color Fidelity is assessed using the CIEDE2000 color difference formula, as shown in Eq. (6).

$$\Delta E_{00}=\sqrt{\left( L_1^*-L_2^* \right)^2+\left( a_1^*-a_2^* \right)^2+\left( b_1^*-b_2^* \right)^2}$$

(6)

In Eq. (6), $L^*$, $a^*$, and $b^*$ represent the color components in the CIE Lab color space, and the subscripts 1 and 2 denote the two colors being compared.
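Eq. (6), as written, is the Euclidean distance between two colors in CIE Lab space; the complete CIEDE2000 formula additionally applies lightness, chroma, and hue weighting terms that are not shown in this section. The minimal sketch below implements only the expression in Eq. (6). The function names are hypothetical, the inputs are assumed to be already converted to Lab, and how the study maps the color difference to a fidelity score between 0 and 1 is not stated, so no such mapping is attempted.

```python
import math
from statistics import fmean


def delta_e(lab1, lab2):
    """Eq. (6) as written: Euclidean distance between two (L*, a*, b*) triples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


def mean_delta_e(generated_pixels, reference_pixels):
    """Average per-pixel color difference between two images in Lab space."""
    if len(generated_pixels) != len(reference_pixels):
        raise ValueError("images must contain the same number of pixels")
    return fmean(delta_e(g, r) for g, r in zip(generated_pixels, reference_pixels))


if __name__ == "__main__":
    generated = [(50.0, 10.0, 10.0), (60.0, -5.0, 20.0)]
    reference = [(50.0, 13.0, 14.0), (60.0, -5.0, 20.0)]
    print(mean_delta_e(generated, reference))
```

A smaller average difference means closer color reproduction; identical images give 0.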

For the qualitative assessment, a comprehensive set of evaluation criteria is used to ensure the design works meet high standards. These criteria include innovation, aesthetic appeal, cultural relevance, cultural adaptability, and design practicality. Innovation refers to the uniqueness and creativity of the design. Experts evaluate the level of innovation using a scoring rubric with a scale from 1 to 10, where 1 indicates a lack of innovation and 10 represents high creativity. The scale is defined as follows: 1–3 points indicate a lack of innovation, with designs closely resembling common, repetitive patterns; 4–6 points reflect some innovation but insufficient distinction; 7–8 points demonstrate strong innovation with notable uniqueness; and 9–10 points represent highly innovative designs showcasing ground-breaking ideas.

Aesthetic appeal focuses on the visual impact of the design, including elements such as color combinations, shapes, and compositions. Both users and experts rate the designs based on their aesthetic quality, ensuring objectivity and consistency in the evaluation process. Cultural relevance assesses how well the design aligns with a specific culture. Experts score designs based on their effectiveness in cultural communication, ensuring that the designs not only have aesthetic appeal but also resonate with cultural significance. Cultural adaptability evaluates how effectively the design integrates into a specific cultural context. Experts score based on the cultural elements embodied in the design, ensuring its relevance to the target culture. Finally, design practicality concerns the feasibility of the design in practical applications, assessing functionality, usability, and market acceptance.

To gather expert feedback, professionals from various fields within the cultural and creative industries (such as designers, cultural researchers, and market analysts) are invited to participate. Prior to the evaluation, experts undergo training on the evaluation criteria and scoring system, ensuring they fully understand the requirements and meaning behind each indicator. Experts use standardized review forms to score the designs, ensuring a systematic and consistent evaluation process. Simultaneously, user feedback is collected through structured questionnaires, covering aspects such as visual appeal, cultural communication effectiveness, and design practicality. Users rate each aspect on a Likert scale (1 indicating strong disagreement and 5 indicating strong agreement), and open-ended questions are added to gain deeper insights.

During the analysis phase, the collected scoring data are subjected to statistical analysis to compute the mean and standard deviation for each criterion. Analysis of variance (ANOVA) is used to assess significant differences between the various designs. Additionally, thematic analysis is performed on users’ open-ended feedback to identify key patterns and themes, providing valuable insights. This evaluation framework helps to better understand design performance within the cultural and creative industries, offering theoretical foundations and practical recommendations for future design practices.
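The statistical step described above can be sketched as follows. The scores are made-up placeholders rather than data from the study, the function names are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which estimator is used. The code computes only the one-way ANOVA F statistic and its degrees of freedom; obtaining a p-value requires the F distribution from a statistics library.

```python
from statistics import fmean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) for one criterion."""
    return fmean(scores), stdev(scores)


def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean(v for g in groups for v in g)
    group_means = [fmean(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical expert scores (1-10) for three designs on one criterion.
    design_scores = {"A": [8, 9, 9, 8], "B": [7, 8, 8, 7], "C": [9, 9, 8, 9]}
    for name, scores in design_scores.items():
        print(name, summarize(scores))
    print(one_way_anova_f(list(design_scores.values())))
```

A large F value relative to the critical value for the given degrees of freedom suggests that at least one design's mean score differs from the others.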

User interaction data, such as actions like viewing, selecting, and modifying designs, provides insights into user engagement with the designs. For computational efficiency, the focus is on assessing the runtime and resource consumption of the model during the design generation process. Time measurements for each model’s design generation are recorded under identical hardware conditions, allowing for a comparative analysis of computational efficiency across different models. This data is then compared with the performance of traditional single models, providing a thorough evaluation of the advantages of combining VAE and RL models in the creation of cultural and creative products.
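One way to record generation time under identical conditions is sketched below. The `generate` callable stands in for a model's design-generation call and is purely hypothetical, as are the warm-up and repetition counts; the study does not describe its timing harness.

```python
import time
from statistics import fmean


def average_runtime(generate, n_runs=10, warmup=2):
    """Average wall-clock seconds per call of `generate` over `n_runs` runs."""
    # Warm-up calls are excluded so one-off setup costs do not skew the mean.
    for _ in range(warmup):
        generate()
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        generate()
        timings.append(time.perf_counter() - start)
    return fmean(timings)


if __name__ == "__main__":
    def dummy_generate():
        # Placeholder workload standing in for a model inference call.
        return sum(i * i for i in range(10_000))

    print(f"{average_runtime(dummy_generate):.6f} s per design")
```

Running the same harness for every model on the same machine keeps the comparison fair, which is the condition the text emphasizes.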

Performance evaluation

Model generation results

The results of several design cases generated by the proposed model are presented in Fig. 5.

These examples demonstrate that the generated designs successfully retain the original design elements while integrating innovative features to enhance their creativity and imaginative appeal. For instance, in Fig. 5a, the Jingdezhen ceramics incorporate additional creative patterns and flowing elements inspired by the original museum collection, resulting in a more dynamic and visually engaging interpretation of the porcelain. In Fig. 5b, the textiles reinterpret traditional Qing Dynasty garments by introducing distinctive colors and intricate embroidery, blending traditional craftsmanship with modern stylistic elements to align with contemporary aesthetic preferences. Similarly, Fig. 5c reimagines an ancient chair by incorporating modern design elements, creating a visually innovative piece that harmonizes the ingenuity of ancient Chinese design with modern aesthetic standards, rendering it highly appealing to contemporary audiences.

Quantitative evaluation results of generated products

The evaluation results for model accuracy are shown in Fig. 6.

In Fig. 6, the VAE + RL model demonstrates exceptional performance across all metrics, particularly excelling in accuracy and F1 score compared to other models. The accuracy of the VAE + RL model reaches 94.5%, significantly surpassing other single models such as VAE (92.3%), RL (88.7%), and GAN (87.0%). These findings underscore the effectiveness of combining VAE with RL in capturing both global and local features during complex design generation tasks, resulting in enhanced overall performance. In comparison, GPT and Llama-3 models also exhibit commendable performance, with accuracies of 90.4% and 91.2%, respectively. While these models excel in text generation tasks, they face slight limitations in handling cross-modal tasks, particularly those requiring seamless integration of visual and textual data, when compared to the VAE + RL model. Notably, GPT and Llama-3 models achieve recall scores of 91.7% and 90.8%, respectively, reflecting their strong capabilities in recognizing design elements. However, the VAE + RL model outperforms in design diversity and innovation, achieving an F1 score of 93.4%, further reinforcing its superiority in terms of design quality and user satisfaction.

Despite the widespread use of the GAN model in various generation tasks, its performance in this study is relatively suboptimal, achieving an F1 score of only 87.5%, which is lower than that of other models. This result underscores the GAN model’s limitations in managing design complexity and innovation, as it struggles to balance generation quality with design creativity as effectively as the VAE + RL model. These findings highlight that the VAE + RL model excels not only in accuracy and recall but also outperforms other models in overall performance, particularly in generating high-quality, innovative design samples. While GPT and Llama-3 models showcase exceptional abilities in text generation, their effectiveness diminishes when addressing tasks that demand design innovation and the integration of multimodal data. This contrast underscores the substantial advantage of combining VAE and RL techniques, which enhances both the generation and optimization of designs in the cultural and creative product design domain.
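For reference, accuracy, recall, and F1 score are related as in the sketch below, which derives them from confusion-matrix counts. How the study labels a generated design as a correct or incorrect prediction is not described in this section, so the counts here are hypothetical and serve only to show the arithmetic.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(classification_metrics(tp=90, fp=6, fn=5, tn=99))
```

Because F1 combines precision and recall, a model can have high recall yet a lower F1 if its precision is weaker.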

Figure 7 provides a visual comparison of the generated quality among different models, illustrating the superior outcomes achieved by the VAE + RL approach.

Figure 7 provides a comparative analysis of generative quality across different models, emphasizing the superiority of the VAE + RL model across all evaluated metrics. The VAE + RL model achieves an SSIM of 0.92, a Contrast Fidelity of 0.95, a Texture Fidelity of 0.94, and a Digital Color Fidelity of 0.93. These results demonstrate that designs generated by the VAE + RL model not only closely align with real-world structural characteristics but also exhibit exceptional fidelity in contrast, texture, and color. This superior performance is attributed to the synergy between the VAE and RL. The VAE effectively captures diverse latent representations from existing designs, while RL optimizes these designs iteratively through interaction with feedback mechanisms, producing high-quality outputs.

In comparison, GPT and Llama-3 models also deliver strong generative performance, particularly in structural similarity and contrast fidelity. GPT achieves an SSIM of 0.89 and a Contrast Fidelity of 0.91, while Llama-3 records an SSIM of 0.88 and a Contrast Fidelity of 0.90. These models, although primarily developed for natural language generation, display robust generative capabilities that extend to design tasks. However, their performance in Texture Fidelity and Digital Color Fidelity is slightly lower, reflecting their limited specialization in image-based generation compared to the VAE + RL approach.

When used independently, VAE and RL achieve moderate generative quality. The VAE records an SSIM of 0.87, a Contrast Fidelity of 0.90, a Texture Fidelity of 0.88, and a Digital Color Fidelity of 0.85. Similarly, RL achieves an SSIM of 0.85, a Contrast Fidelity of 0.88, a Texture Fidelity of 0.85, and a Digital Color Fidelity of 0.84. While both models produce coherent designs individually, their outputs lack the polished quality achieved through their combined use. In contrast, the GAN model underperforms across all metrics, with an SSIM of 0.83, a Contrast Fidelity of 0.86, a Texture Fidelity of 0.83, and a Digital Color Fidelity of 0.81. While GANs are known for their ability to generate diverse designs, they often face challenges such as mode collapse and training instability, which negatively impact their consistency and overall quality. These results underscore the substantial benefits of integrating VAE and RL, particularly for tasks demanding high-fidelity outputs and innovative design capabilities. The combination proves to be a robust approach in the domain of cultural and creative product design, offering superior performance and adaptability compared to traditional single-model methods.

Qualitative evaluation results of generated products

Figure 8 illustrates the qualitative evaluation scores assigned by experts and users for various design works based on the defined metrics, with scores ranging from 1 (very poor quality) to 10 (excellent quality).

The scoring results presented in Fig. 8 show that:

1. Design A, which includes three ceramic products, received high ratings for innovativeness and aesthetic appeal, with particular emphasis on design practicality, resulting in a user satisfaction rate of 95.2%. This design was recognized for its ability to combine traditional ceramic techniques with modern creative elements, ensuring both visual appeal and functionality.

2. Design B, featuring three improved versions of Qing Dynasty garments, excelled in cultural relevance and cultural adaptability, scoring 9. This score reflects the design’s success in integrating traditional cultural elements with contemporary aspects. Although its aesthetic appeal was slightly lower than that of Design A, its overall performance remained strong, with a user satisfaction rate of 90.3%, indicating its effectiveness in blending cultural heritage with modern aesthetics.

3. Design C, which includes three enhanced furniture pieces (chairs), achieved a score of 9 for both innovativeness and cultural relevance, reflecting the design’s successful fusion of traditional cultural elements with modern design innovation. The user satisfaction rate for this design was 90.7%, underscoring the appeal of the furniture in merging design innovation with cultural expression.

The results indicate variations in performance across the different design types concerning innovativeness, cultural adaptability, and user satisfaction. Design A stands out for its strong performance in practicality and aesthetic appeal, while Design B demonstrates its value in merging cultural heritage with modern enhancements. Design C further highlights the potential of integrating cultural relevance with innovative design, offering valuable insights into the field of furniture design.

Model efficiency evaluation

Figure 9 illustrates the resource consumption and runtime of different models.

Figure 9 compares the resource consumption and runtime of various models, offering insights into their computational efficiency and suitability for real-time design generation:

1. The VAE + RL model exhibits exceptional performance with an average training time of 5 h and an average inference time of 0.2 s. This high efficiency makes it ideal for practical applications where rapid generation of high-quality designs is essential, providing a competitive advantage in environments requiring fast turnaround times.

2. The GPT and Llama-3 models demonstrate strong generative capabilities but are relatively less efficient in terms of resource consumption and runtime. GPT requires an average of 8 h for training and 0.4 s for inference, while Llama-3 needs 9 h for training and 0.35 s for inference. These longer training times reflect the higher computational demands of these models, which may limit their effectiveness in real-time design applications where speed and efficiency are critical.

3. The VAE and GAN models show moderate resource consumption, with training times of 6 h and 7 h, respectively, and inference times of 0.25 s and 0.3 s. These models offer a balance between generative quality and resource efficiency, making them suitable for applications requiring both high-quality outputs and optimized resource use.

4. The RL model requires 8 h of training, which matches GPT and is exceeded only by Llama-3 (9 h). Its inference time of 0.35 s equals that of Llama-3 and is faster only than GPT (0.4 s). The extended training period is needed to optimize design decisions effectively.

In conclusion, the VAE + RL model emerges as the most efficient and effective choice for cultural and creative product design, offering both superior generative quality and resource efficiency. In contrast, the GPT and Llama-3 models are more computationally demanding and may be better suited for scenarios with ample computational resources or more stringent quality requirements. This analysis highlights the strengths and limitations of each model, helping guide their optimal application in different contexts.

Statistical analysis results

To further validate these findings, an ANOVA test is performed, and Table 1 provides the adjusted performance statistics for each model.

Table 1 Statistical analysis of different model performances.

The data in Table 1 demonstrate the superior performance of the VAE + RL model in cultural and creative product design. The traditional design methods, with a mean score of 72.24, are significantly outperformed by more modern approaches, showing their limited capability in addressing complex and innovative design tasks. These traditional methods fall short in both efficiency and design innovation when compared to the newer, data-driven models. The VAE model shows a significant improvement over traditional methods, with a mean score of 86.84, reflecting its ability to generate innovative designs due to its strong generative capabilities. The RL model, scoring 84.33, is particularly strong in optimizing decision-making processes, but its design innovation falls slightly behind the VAE model, as expected given its focus on optimizing rather than generating designs. The VAE + RL model achieves the highest mean score of 90.50, underscoring the synergy between the generative power of VAE and the decision-making capabilities of RL. This combination not only fosters higher levels of design innovation but also enhances user satisfaction. Additionally, the standard deviation of the VAE + RL model is 2.00, indicating stable performance across various design tasks. Its scores range from 87.00 to 94.00, which further supports the model’s reliability in delivering superior results across different types of cultural and creative designs.

Although the GPT model (mean score: 89.00) and the Llama-3 model (mean score: 87.00) perform well in innovation, their strengths are primarily in text generation and handling complex design issues. When it comes to decision optimization and overall design processing, both models fall behind the VAE + RL model. GPT and Llama-3, though excelling in text generation, lack the integrated design optimization capabilities that the VAE + RL model offers. In conclusion, the VAE + RL model stands out as the most effective approach for cultural and creative product design, not only enhancing design innovation but also improving the efficiency and stability of the design process. This combination leads in terms of design quality, user satisfaction, and design optimization, offering valuable insights and robust technical support for future advancements in the field. Compared to the individual VAE, RL, GPT, and Llama-3 models, VAE + RL delivers a more comprehensive and effective solution for cultural and creative product design.

Turing test results

The Turing test conducted to assess the intelligence of the VAE + RL model involves determining whether the designs generated by the model can be perceived as indistinguishable from those created by human designers. This test provides direct insight into the model’s generative capabilities and its ability to mimic human creativity.

The VAE + RL model is first used to generate a series of design schemes covering a broad range of diversity and creativity. These designs incorporate various styles and themes to simulate real-world cultural and creative design tasks. The generated designs are then mixed with those created by human designers to form a comprehensive evaluation set. To maintain fairness in the evaluation process, all designs are anonymized, removing any identifiable markers that could indicate whether a design was generated by the VAE + RL model or by a human designer. A double-blind assessment is conducted with a panel consisting of 20 design experts and 20 ordinary users. Each evaluator is provided with a set of designs that includes both VAE + RL model-generated and human-created designs. Evaluators are tasked with determining whether each design was created by a human, with their judgments based on factors such as innovation, practicality, and artistic value. After the evaluations are collected, the proportion of model-generated designs that are incorrectly identified as human-created is calculated. The classification results from each evaluator are then aggregated, and a confusion matrix is used, along with accuracy metrics, to assess the intelligence level of the model.
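The aggregation step can be sketched as below: each judgment pairs a design's true source with the source an evaluator guessed, and the confusion matrix is a count of those pairs. The label strings, function names, and sample judgments are hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter


def confusion_matrix(true_sources, judged_sources):
    """Count (true source, judged source) pairs over all judgments."""
    return Counter(zip(true_sources, judged_sources))


def fooled_rate(matrix):
    """Share of model-generated designs judged to be human-created."""
    model_total = matrix[("model", "human")] + matrix[("model", "model")]
    return matrix[("model", "human")] / model_total if model_total else 0.0


def judge_accuracy(matrix):
    """Share of all judgments that identify the true source correctly."""
    total = sum(matrix.values())
    correct = matrix[("model", "model")] + matrix[("human", "human")]
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical judgments from one evaluator, not data from the study.
    truth = ["model", "model", "model", "human", "human"]
    guess = ["human", "human", "model", "human", "model"]
    m = confusion_matrix(truth, guess)
    print(dict(m), fooled_rate(m), judge_accuracy(m))
```

A higher `fooled_rate` means the model's designs pass as human work more often, while a higher `judge_accuracy` means evaluators separate the two sources more reliably.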

Table 2 presents the feedback based on different design types and evaluator groups. Each test includes 20 design samples from both the VAE + RL model and human designers.

Table 2 Turing test results.

Table 2 illustrates the performance variations of the VAE + RL model across different design types. The model’s designs in modern art and digital illustration are often perceived as human-created, demonstrating strong performance in these areas. In contrast, the VAE + RL model shows weaker performance in traditional craft and product design, with notably lower accuracy rates in product design, as evaluated by both experts and general users.

In the fields of modern art and digital illustration, the VAE + RL model performs notably well. For modern art, experts recognize 12 of the model-generated designs as resembling human creations, resulting in an accuracy rate of 60.0%. General users assess 17 designs with a higher accuracy rate of 85.0%. In the digital illustration category, experts evaluate 17 designs with an accuracy rate of 85.0%, while general users assess 19 designs, achieving a 95.0% accuracy rate. Conversely, the model is less effective in traditional craft and product design. For traditional craft, experts identify 13 designs with an accuracy rate of 65.0%, while general users assess 17 designs, yielding an 85.0% accuracy rate. In product design, experts evaluate 15 designs with a 75.0% accuracy rate, while general users assess 16 designs with an 80.0% accuracy rate. Overall, design experts judge fewer model-generated designs to be human-created than general users do in every category, reflecting their more precise evaluation of the designs. Among general users, the rate is highest for digital illustration (95.0%) and lowest for product design (80.0%). This discrepancy could be attributed to differences in design experience and sensitivity to details.
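The rates quoted in this paragraph follow directly from the counts out of 20 model-generated samples per test. The short sketch below reproduces that arithmetic from the figures stated in the text; the dictionary layout and function name are illustrative choices only.

```python
SAMPLES_PER_TEST = 20  # model-generated designs per test, as stated in the text

# Counts of model-generated designs judged human-created, taken from the text.
counts = {
    "modern art": {"experts": 12, "users": 17},
    "digital illustration": {"experts": 17, "users": 19},
    "traditional craft": {"experts": 13, "users": 17},
    "product design": {"experts": 15, "users": 16},
}


def rate_percent(count, total=SAMPLES_PER_TEST):
    """Convert a count of designs into a percentage of the test set."""
    return 100 * count / total


if __name__ == "__main__":
    for design_type, by_group in counts.items():
        experts = rate_percent(by_group["experts"])
        users = rate_percent(by_group["users"])
        print(f"{design_type}: experts {experts:.1f}%, users {users:.1f}%")
```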

Discussion

The outstanding performance of the VAE + RL model in design quality and user satisfaction underscores its effectiveness in generating high-quality and attractive product designs. Jang et al.²⁹ highlight that combining generative models with decision optimization methods can significantly improve the quality and diversity of design solutions. This finding aligns with that viewpoint, as both SSIM and user satisfaction indicators were higher than those of other models. Furthermore, the high rating of design innovation by users emphasizes the critical role of innovation in the cultural and creative industries⁵⁰, reflecting Chen⁵¹ who identifies innovation as a key driver of success in cultural products. Despite the VAE + RL model’s impressive performance, its resource consumption and runtime limitations are important considerations. Zhan et al.⁵² note that highly complex model optimization can result in significant increases in computational costs. This suggests that while the model excels in design quality and user satisfaction, its computational demands could hinder its broader adoption, particularly in resource-constrained environments.

The advantages of the VAE + RL model go beyond traditional evaluation metrics, demonstrating adaptability to market demands. As Vuong and Mai⁵³ assert, integrating generative models with decision optimization more effectively meets market needs for innovation and personalization. This integration enables the VAE + RL model to produce design solutions that better align with user expectations, thereby strengthening its competitive position in the market. However, given the model’s high computational demands, further research should focus on optimizing computational efficiency. Techniques such as model pruning and quantization could help reduce resource consumption while maintaining core performance. Choudhary et al.⁵⁴ suggest that model compression techniques are effective in lowering computational costs and enhancing practical application efficiency. Additionally, strategies like distributed computing and parallel processing could reduce model training and inference times, improving computational efficiency. Beyond the cultural and creative design sector, the design optimization capabilities of the VAE + RL model hold potential for various other domains. Applications could extend to fields like architectural design and product development. Future research should explore these cross-domain applications, assess the model’s performance in different design tasks, and develop targeted optimization strategies to enhance its utility.
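To make the two compression ideas mentioned above concrete, the sketch below applies magnitude pruning and symmetric 8-bit quantization to a plain list of weights. It is a toy illustration under stated assumptions, not the method of the study or of any cited work: real models would apply these steps per layer with a deep learning framework, and the function names and the sparsity level are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        # Remove at most k weights, even if several share the threshold value.
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned


def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -0.1, 0.05, -2.0]
    pruned = magnitude_prune(weights, sparsity=0.5)
    codes, scale = quantize_int8(pruned)
    print(pruned, codes, dequantize(codes, scale))
```

Pruning reduces the number of non-zero parameters, and quantization reduces the bits stored per parameter; both trade a small approximation error for lower memory and compute cost.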

A comparison with similar studies underscores the advantages of the proposed research methodology and its outcomes. For example, Liu et al.⁵⁵ explored the application of AI in the labor market, emphasizing data-driven decision support based on statistical analysis. In contrast, the current research not only provides data-driven design decision support but also generates diverse design solutions through VAEs and enhances design efficiency and user satisfaction through RL. Furthermore, this approach is specifically tailored to the cultural and creative design domain, with the specificity of the application scenario and the creativity of the generated solutions representing key advantages of the model.

Similarly, Li et al.⁵⁶ focused on AI-supported industrial perception, addressing sensor and data processing challenges in intelligent manufacturing. While their research emphasizes hardware integration and industrial optimization, the present study focuses on optimizing design processes in the cultural and creative sector, substantially improving design quality and user experience. For instance, user satisfaction in this study reached 95%, while existing models in industrial perception often overlook user feedback on design solutions. Additionally, the model addresses gaps in existing research by incorporating cultural adaptability and diversity generation assessments. Furthermore, Zhu⁵⁷ proposed an adaptive agent decision-making model based on deep RL, applied to decision optimization in the logistics sector. While both studies utilize RL frameworks, the current research distinguishes itself by integrating RL with VAEs. This not only optimizes design decisions but also leverages generative models to enhance the diversity of design solutions for cultural and creative products. Additionally, the model emphasizes the generation of optimized solutions based on user feedback, marking a significant departure from the logistics focus on efficiency and path optimization. As such, this approach is particularly suited for the design innovation domain, demonstrating greater adaptability and practical significance.

The comparative analysis above highlights the specificity and innovativeness of the proposed research methodology in the context of cultural and creative design, offering valuable insights for future studies in related fields. This research introduces a generative optimization model that combines VAE and RL, achieving both theoretical and practical advancements in the domain of cultural and creative product design. The experimental results and comparative analysis clearly demonstrate the method’s superiority in design quality, diversity, and user satisfaction. More importantly, this study presents a new paradigm for AI-assisted design within the cultural and creative industries. Unlike traditional design support systems, the proposed approach not only facilitates decision-making but also generates design solutions. The integration of generative models shifts the design process from “selection optimization” to “solution creation,” establishing a foundation for tackling more complex and diversified design challenges in the future. In terms of application prospects, the model developed in this study extends beyond the cultural and creative sector and holds significant cross-domain potential. For instance, its capabilities in generation and optimization can be applied to industrial design, educational content creation, and other areas, thus expanding the possibilities for AI-driven intelligent design. This cross-domain adaptability underscores the model’s versatility and provides substantial opportunities for future research and development.
