Playground v2 – New 1024px Aesthetic Model

Playground v2 is a diffusion-based text-to-image generative model. The model was trained from scratch by the research team at Playground.

Images generated by Playground v2 are favored 2.5 times more than those produced by Stable Diffusion XL, according to Playground’s user study.

We are thrilled to release intermediate checkpoints at different training stages, including evaluation metrics, to the community. We hope this will encourage further research into foundational models for image generation.

Lastly, we introduce a new benchmark, MJHQ-30K, for automatic evaluation of a model’s aesthetic quality.

Please see our blog for more details.

Install diffusers >= 0.24.0 and some dependencies:

pip install "diffusers>=0.24.0" transformers accelerate safetensors

To use the model, run the following snippet.

Note: It is recommended to use guidance_scale=3.0.

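from diffusers import DiffusionPipeline
import torch

# Load the 1024px aesthetic checkpoint in half precision (fp16 weight variant).
pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2-1024px-aesthetic",
    torch_dtype=torch.float16,
    use_safetensors=True,
    add_watermarker=False,  # skip the invisible watermark on outputs
    variant="fp16"
)
pipe.to("cuda")

# Generate one image with the recommended guidance scale and save it.
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt, guidance_scale=3.0).images[0]
image.save("astronaut.png")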
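The guidance_scale recommended above sets the strength of classifier-free guidance. As a sketch of the standard formulation used by Stable Diffusion-style pipelines in diffusers (we assume Playground v2 follows it, since it loads through DiffusionPipeline), the model predicts noise with and without the prompt and extrapolates from the unconditional prediction toward the text-conditioned one. The arrays and the helper name apply_cfg are hypothetical toy stand-ins, not real model outputs or a diffusers API.

```python
import numpy as np

def apply_cfg(noise_uncond: np.ndarray, noise_text: np.ndarray, guidance_scale: float) -> np.ndarray:
    """Classifier-free guidance: start from the unconditional prediction and
    move along (and, for scale > 1, past) the text-conditioned direction."""
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)

# Toy noise predictions (hypothetical values, not produced by the model).
uncond = np.array([0.0, 0.2, -0.1])
text = np.array([0.1, 0.4, -0.3])

guided = apply_cfg(uncond, text, guidance_scale=3.0)
print(guided)
```

With guidance_scale=1.0 the result equals the text-conditioned prediction; larger values push further along the difference between the two predictions.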

To use the model with software such as Automatic1111 or ComfyUI, you can use the playground-v2.fp16.safetensors file.

According to user studies conducted by Playground, involving over 2,600 prompts and thousands of users, the images generated by Playground v2 are favored 2.5 times more than those produced by Stable Diffusion XL.

We report user preference metrics on PartiPrompts, following standard practice, and on an internal prompt dataset curated by the Playground team. The “Internal 1K” prompt dataset is diverse and covers various categories and tasks.

During the user study, we instruct users to evaluate image pairs on both (1) their aesthetic preference and (2) image-text alignment.
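The card does not spell out how the 2.5× figure is computed. Assuming it is the ratio of pairwise wins for Playground v2 to wins for SDXL with ties excluded, it could be tallied as in the sketch below; the vote format, the helper preference_ratio, and the sample votes are all hypothetical. Under that assumption, a 2.5:1 ratio corresponds to Playground v2 winning 2.5 / 3.5 ≈ 71% of decided comparisons.

```python
from collections import Counter

def preference_ratio(votes: list[str], model_a: str, model_b: str) -> float:
    """Ratio of pairwise wins for model_a over wins for model_b.

    Each vote names the preferred model, or "tie". Ties are ignored
    (an assumption; the card does not say how ties were handled).
    """
    counts = Counter(votes)
    if counts[model_b] == 0:
        raise ValueError(f"no wins recorded for {model_b}")
    return counts[model_a] / counts[model_b]

# Hypothetical votes: 5 wins for Playground v2, 2 for SDXL, 1 tie.
votes = ["playground-v2"] * 5 + ["sdxl"] * 2 + ["tie"]
ratio = preference_ratio(votes, "playground-v2", "sdxl")
win_share = ratio / (1 + ratio)  # fraction of decided pairs won by model_a
print(f"ratio={ratio:.2f}, win share={win_share:.1%}")
```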

MJHQ-30K benchmark, overall FID (lower is better):

Model                             Overall FID
SDXL-1-0-refiner                  9.55
playground-v2-1024px-aesthetic    7.07

We introduce a new benchmark, MJHQ-30K, for automatic evaluation of a model’s aesthetic quality. The benchmark computes FID on a high-quality dataset to gauge aesthetic quality.
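FID compares Gaussian fits to two sets of image features: ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^(1/2)). The sketch below computes it from precomputed feature arrays with NumPy and SciPy. Standard FID uses Inception-v3 pooling features, but the card does not state which extractor MJHQ-30K uses, so feature extraction is omitted and random arrays stand in for real features.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FID between two (n_samples, n_features) arrays of image features."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)

    # Matrix square root of the covariance product; numerical error can
    # introduce a tiny imaginary component, which is discarded.
    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))

# Random stand-ins for extracted features (not real benchmark data).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 16))
shifted = rng.normal(loc=0.5, size=(500, 16))

print(frechet_distance(real, real))     # close to 0
print(frechet_distance(real, shifted))  # larger, because the means differ
```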

We have curated a high-quality dataset from Midjourney, featuring 10 common categories, with each category containing 3,000 samples. Following common practice, we use aesthetic score and CLIP score to ensure high image quality and high image-text alignment. Furthermore, we take extra care to make the data diverse within each category.
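The card says aesthetic and CLIP scores were used to keep image quality and image-text alignment high, but it gives neither thresholds nor the exact procedure. A minimal sketch of such a filter, assuming per-sample scores are already computed: the Sample fields, the thresholds, and the "highest aesthetic score first" selection rule are hypothetical choices for illustration, and the extra diversity step mentioned above is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Sample:
    prompt: str
    category: str
    aesthetic_score: float  # assumed precomputed by an aesthetic predictor
    clip_score: float       # assumed precomputed, on a 100 x cosine scale

def curate(samples, min_aesthetic=6.0, min_clip=28.0, per_category=3000):
    """Keep samples passing both thresholds, capped per category.

    Thresholds and the ranking rule are hypothetical, not MJHQ-30K's.
    """
    by_category = defaultdict(list)
    for s in samples:
        if s.aesthetic_score >= min_aesthetic and s.clip_score >= min_clip:
            by_category[s.category].append(s)
    curated = {}
    for category, kept in by_category.items():
        kept.sort(key=lambda s: s.aesthetic_score, reverse=True)
        curated[category] = kept[:per_category]
    return curated

# Toy pool: only the first sample passes both thresholds.
pool = [
    Sample("portrait of a violinist", "people", 6.5, 31.0),
    Sample("blurry crowd", "people", 4.0, 30.0),
    Sample("street-style outfit", "fashion", 7.1, 25.0),
]
print({c: [s.prompt for s in v] for c, v in curate(pool).items()})
```

With 10 categories capped at 3,000 samples each, the curated set holds at most 10 × 3,000 = 30,000 samples, matching the benchmark's name.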

For Playground v2, we report both the overall FID and per-category FID. All FID metrics are computed at resolution 1024×1024. Our benchmark results show that our model outperforms SDXL-1-0-refiner in overall FID and in all category FIDs, especially in the people and fashion categories. This is in line with the results of the user study, which indicates a correlation between human preference and FID score on the MJHQ-30K benchmark.

We release this benchmark to the public and encourage the community to adopt it for benchmarking their models’ aesthetic quality.

FID and CLIP score on the MSCOCO14 evaluation set:

Model                       FID      CLIP Score
SDXL-1-0-refiner            13.04    32.62
playground-v2-256px-base    9.83     31.90
playground-v2-512px-base    9.55     32.08

Apart from playground-v2-1024px-aesthetic, we release intermediate checkpoints at different training stages to the community to foster foundation model research in pixels. Here, we report the FID and CLIP scores on the MSCOCO14 evaluation set for reference. (Note that our reported numbers may differ from those in SDXL's published results, as our prompt list may be different.)
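CLIP score measures image-text alignment as the cosine similarity between CLIP image and text embeddings. The card does not specify which variant it reports; a common convention, used for example by torchmetrics' CLIPScore, scales the similarity by 100 and clips negative values to 0, and the sketch below assumes that convention. Embedding extraction with a CLIP model is omitted, and the vectors are toy stand-ins.

```python
import numpy as np

def clip_score(image_embs: np.ndarray, text_embs: np.ndarray) -> float:
    """Mean of max(100 * cos(image, text), 0) over matched (image, caption) pairs.

    Both arrays have shape (n_pairs, dim); row i of each belongs to the same pair.
    """
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    cos = np.sum(img * txt, axis=1)  # per-pair cosine similarity
    return float(np.mean(np.maximum(100.0 * cos, 0.0)))

# Toy embeddings (hypothetical): one perfectly aligned pair, one orthogonal pair.
images = np.array([[1.0, 0.0], [0.0, 1.0]])
texts = np.array([[2.0, 0.0], [1.0, 0.0]])
print(clip_score(images, texts))  # (100 + 0) / 2 = 50.0
```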
