AI and Photography: Part 3 - Midjourney vs Stable Diffusion

Written by Yan Zhang
Published by Yvette Depaepe, the 24th of May 2024

“Machines will be capable, within twenty years, of doing any work a man can do.” ~Herbert A. Simon (1965)

In July 2023, I attended an AI research forum where an Amazon researcher presented several AI projects under way at Amazon. During the event we had lunch together, and when she learned that I was also a photographer, she told me bluntly: "Midjourney ended photography!" Whatever one thinks of this statement, her words reflect the view of many professionals engaged in cutting-edge research on generative AI. In this article, from the perspective of both an AI scientist and a professional photographer, I try to explore thoroughly the profound impact that generative AI is having on traditional photography, and how we, as photographers, should face this challenge.

Next week: Part 4 - The Photographer's Confusion

Midjourney vs Stable Diffusion

2023 will definitely be a year written into the history of AI.

In early 2023, ChatGPT, a large language model (LLM) launched by OpenAI, reached 100 million users in just two months. By mid-2023, the applications of ChatGPT and its successor GPT-4 had expanded significantly from the initial question answering and document editing and creation to a much wider range of fields, including finance, health care, education, and software development.

At the same time, research on diffusion-model-based image generation, represented by Midjourney, Stable Diffusion, and DALL·E 2, also achieved major breakthroughs. The main function of these models is to generate images in various styles from prompts. Most remarkably, Midjourney and Stable Diffusion can generate realistic images that resemble photographs.

Images Generated by Midjourney

Generally speaking, Midjourney can generate high-quality, photorealistic images from relatively simple and direct prompts. Below are several images generated with Midjourney v5.0 and v6.0.

“Everest Base camp”. Generated on Midjourney, by Yan Zhang.

“A young woman portrait”. Generated on Midjourney, by Yan Zhang.

“Mysterious forest”. Generated on Midjourney, by Yan Zhang.

“Dream seascape”. Generated on Midjourney, by Yan Zhang.

The pictures above show that Midjourney can produce nearly perfect "photographs". Midjourney is also good at generating non-photographic artworks, and can even imitate the styles of specific artists, as in the following example.

“Picasso’s women”. Generated on Midjourney, by Yan Zhang.

Midjourney's image-generation power has been widely recognised. However, because it is a fully closed system, its model structure and training methods are not known to the public, and users must pay a fee to use it through the Discord platform.

Stable Diffusion Model Structure

Stable Diffusion is an image-generation diffusion model launched by Stability AI in July 2022. Unlike Midjourney, Stable Diffusion is a completely open system, so we can examine every technical detail of the model, from its structure to its training process.

Figure 6. The main model structure of Stable Diffusion.

Once we understand the basic idea of the diffusion model (see Figure 4 and Figure 5), the structure of the main Stable Diffusion model in Figure 6 is not difficult to follow. The encoder compresses a training image x into a latent vector z, and forward diffusion begins: noise is gradually added to the latent vector until it becomes a noisy latent vector z_T. Then reverse diffusion begins. At this point, the additional "text/image" condition is converted into a latent representation by a transformer and injected into the reverse diffusion process. During reverse diffusion, the U-Net neural network uses a specific algorithm to remove the noise step by step and restore a latent vector z, from which the decoder finally generates a new image x̂.
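The forward (noising) half of Figure 6 can be sketched in a few lines. This is a minimal illustration, assuming the simple linear noise schedule of the original DDPM formulation (beta rising from 1e-4 to 0.02 over 1,000 steps); Stable Diffusion's actual schedule differs. The names `linear_beta_schedule`, `cumulative_alphas` and `forward_diffuse` are hypothetical, and a plain Python list stands in for the latent vector z.

```python
from __future__ import annotations

import math
import random


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Per-step noise variances beta_1..beta_T, increasing linearly (DDPM-style assumption)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: list[float]) -> list[float]:
    """alpha_bar_t = product over s <= t of (1 - beta_s): how much original signal survives at step t."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def forward_diffuse(z0: list[float], t: int, alpha_bars: list[float],
                    rng: random.Random) -> tuple[list[float], list[float]]:
    """Jump straight to step t: z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]  # fresh Gaussian noise, one value per latent entry
    zt = [math.sqrt(a_bar) * z + math.sqrt(1.0 - a_bar) * e for z, e in zip(z0, eps)]
    return zt, eps  # in training, the U-Net learns to predict eps from (zt, t, condition)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    z0 = [0.5, -1.0, 2.0, 0.0]  # toy stand-in for the encoder's latent z
    for t in (0, 499, 999):
        zt, _ = forward_diffuse(z0, t, alpha_bars, rng)
        print(t, alpha_bars[t], [round(v, 2) for v in zt])
```

At t = 0 almost all of the signal survives (alpha_bar is 0.9999), while by the last step alpha_bar is on the order of 1e-5, so z_t is essentially pure noise: the z_T of Figure 6.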

Note that once the model has been trained, we only need the reverse diffusion process as an inference engine to generate images. The input text/image is converted into a latent vector by the transformer, reverse diffusion through the U-Net starts from random noise and produces a clean latent vector, and the VAE decoder turns that latent into the new image.
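The inference loop just described can be sketched as DDPM-style ancestral sampling. This is a simplified illustration, not Stable Diffusion's production sampler (practical setups often use faster samplers and classifier-free guidance). `predict_noise` is a hypothetical stand-in for the trained U-Net, `cond` stands for the output of the Conditioning transformer, and the returned latent would still have to pass through the VAE decoder.

```python
import math
import random
from typing import Callable, List

# Hypothetical signature for the trained U-Net: (noisy latent z_t, step t, condition) -> predicted noise.
NoisePredictor = Callable[[List[float], int, List[float]], List[float]]


def reverse_diffuse(predict_noise: NoisePredictor, cond: List[float], dim: int,
                    betas: List[float], rng: random.Random) -> List[float]:
    """Start from pure noise z_T and remove the predicted noise one step at a time, down to z_0."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)

    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # z_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        eps_hat = predict_noise(z, t, cond)  # the condition steers every denoising step
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(zi - coef * ei) / math.sqrt(1.0 - betas[t]) for zi, ei in zip(z, eps_hat)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # one standard variance choice: sigma_t^2 = beta_t
            z = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            z = mean  # the final step adds no fresh noise
    return z  # z_0, which the VAE decoder would turn into an image


if __name__ == "__main__":
    # Toy predictor that ignores the condition, only to show the loop runs end to end.
    dummy_unet: NoisePredictor = lambda z, t, c: [0.1 * v for v in z]
    betas = [1e-4 + i * (0.02 - 1e-4) / 49 for i in range(50)]
    print(reverse_diffuse(dummy_unet, cond=[0.0], dim=4, betas=betas, rng=random.Random(0)))
```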

The Stable Diffusion model in Figure 6 can also be roughly divided into three major components: the red VAE module on the left, the green U-Net module in the middle, and the Conditioning transformer on the right. This decomposition makes it easier to describe the Stable Diffusion extensions we discuss later.

Figure 7. The three modules of Stable Diffusion, corresponding to the main structure in Figure 6. The VAE (Variational AutoEncoder) compresses and restores images; the U-Net neural network performs the reverse diffusion process, which we also call inference; the Conditioning transformer is an encoder that converts text and image conditions and feeds them into the reverse diffusion process.
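To make the division of labour in Figure 7 concrete, here is a small sketch of how the three modules hand data to one another at inference time, together with the latent-size arithmetic. It assumes the Stable Diffusion v1 VAE settings (spatial downsampling by a factor of 8 and 4 latent channels); `encode_text`, `reverse_diffusion` and `vae_decode` are hypothetical placeholders for the real modules, passed in as plain functions.

```python
from typing import Callable, Sequence, Tuple


def latent_shape(height: int, width: int, downsample: int = 8,
                 latent_channels: int = 4) -> Tuple[int, int, int]:
    """(channels, height, width) of the VAE latent for an RGB image (assumed SD v1 settings)."""
    if height % downsample or width % downsample:
        raise ValueError("image sides must be multiples of the downsampling factor")
    return latent_channels, height // downsample, width // downsample


def compression_ratio(height: int, width: int, downsample: int = 8, latent_channels: int = 4) -> float:
    """How many pixel values (3 RGB channels) correspond to one latent value."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (3 * height * width) / (c * h * w)


def text_to_image(prompt: str,
                  encode_text: Callable[[str], Sequence[float]],
                  reverse_diffusion: Callable[[Sequence[float]], Sequence[float]],
                  vae_decode: Callable[[Sequence[float]], object]) -> object:
    """Inference path through the three modules of Figure 7."""
    cond = encode_text(prompt)    # Conditioning transformer: prompt -> embedding
    z0 = reverse_diffusion(cond)  # U-Net: noisy latent z_T -> clean latent z_0, guided by cond
    return vae_decode(z0)         # VAE decoder: latent -> image


if __name__ == "__main__":
    print(latent_shape(512, 512))       # (4, 64, 64)
    print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the U-Net works on 16,384 latent values instead of 786,432 pixel values, which is why running diffusion in the VAE's latent space is so much cheaper. Note that the VAE encoder does not appear in the text-to-image path; it is needed for training and for image-to-image work.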

Stability AI used 5 billion (image, text) pairs collected by LAION as the training dataset, with images at 512×512 resolution. Training ran on 256 Nvidia A100 GPUs on Amazon Web Services (AWS) (the 40 GB A100 variant); the initial model training took 150,000 GPU hours and cost about US$600,000.
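The figures quoted above also let us estimate the wall-clock duration of training and the implied hourly price. A back-of-the-envelope sketch, assuming all 256 GPUs ran in parallel for the whole run and that the US$600,000 covers GPU time only; `training_cost_summary` is a hypothetical helper.

```python
def training_cost_summary(gpu_hours: float, num_gpus: int, total_cost_usd: float) -> dict:
    """Back-of-the-envelope numbers, assuming every GPU runs in parallel for the whole job."""
    wall_clock_hours = gpu_hours / num_gpus
    return {
        "wall_clock_hours": wall_clock_hours,
        "wall_clock_days": wall_clock_hours / 24,
        "usd_per_gpu_hour": total_cost_usd / gpu_hours,
    }


summary = training_cost_summary(gpu_hours=150_000, num_gpus=256, total_cost_usd=600_000)
print(summary)
# {'wall_clock_hours': 585.9375, 'wall_clock_days': 24.4140625, 'usd_per_gpu_hour': 4.0}
```

That is roughly 586 hours, a little over 24 days of continuous training, at about US$4 per GPU hour.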

Images Generated by Stable Diffusion

Generally speaking, with the same prompt, the pictures generated by Stable Diffusion are not as good as those from Midjourney. For example, using the same prompt as the "Mysterious forest" picture generated by Midjourney above, SD v1.5 produced the following picture:

“Mysterious forests”. Generated on Stable Diffusion (using the same prompt as the Midjourney image of the same title above), by Yan Zhang.

Clearly, the picture above is not as good as the one generated by Midjourney, in terms of both photographic aesthetics and image quality. However, it would be a mistake to conclude that Stable Diffusion is far inferior to Midjourney.

Because it is open source, Stable Diffusion offers virtually unlimited possibilities for further research and development. We briefly outline this work below.

Using a rich prompt structure and various extensions, Stable Diffusion can also generate realistic “photographic works” comparable to those of Midjourney.

“Future city”. Generated on Stable Diffusion, by Yan Zhang.

“A young woman portrait”. Generated on Stable Diffusion, by Yan Zhang.

“Alaska Snow Mountain Night”. Generated on Stable Diffusion, by Yan Zhang.

Stable Diffusion Extensions

Because Stable Diffusion is open source, AI researchers can study its structure and source code in detail and build various extensions that enhance the model's functions and applications.

Research and development extending Stable Diffusion has focused mainly on the U-Net (see Figure 7), in two main directions. (1) Starting from the original Stable Diffusion U-Net, a personalized U-Net sub-model is trained on a small, specific dataset. When this sub-model is embedded in Stable Diffusion, it generates images in the personalized style the user wants. DreamBooth, LoRA, Hypernetworks, etc., all belong to this type of work.
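Of the methods listed above, LoRA is the easiest to show in a few lines. Its core idea is to keep a pretrained weight matrix W frozen and learn only a low-rank correction B·A, so the adapted layer computes W·x + (alpha / r)·B·A·x. The sketch below uses plain Python lists and hypothetical function names; in Stable Diffusion such adapters are commonly attached to the U-Net's attention layers, and real implementations operate on tensors.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float = 1.0) -> Vector:
    """y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pretrained weight; only A (r x d_in) and
    B (d_out x r) are trained. B usually starts at zero, so training begins
    from exactly the original model's behaviour.
    """
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable values in A and B, versus d_out * d_in for full fine-tuning."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    A = [[1.0, 1.0]]    # rank r = 1
    B = [[0.0], [0.0]]  # zero-initialised: output equals the frozen layer's
    print(lora_forward(W, A, B, [1.0, 1.0]))              # [3.0, 7.0]
    print(lora_trainable_params(768, 768, 4), 768 * 768)  # 6144 589824
```

For a hypothetical 768 × 768 layer with rank 4, that is 6,144 trainable values instead of 589,824, which helps explain why such sub-models are small and can be trained on a handful of images.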

(2) Enhancing control over Stable Diffusion's image generation process. Work in this direction designs and trains a dedicated neural-network control module so that users can intervene directly in the generation process according to their own requirements, for example changing a character's pose or replacing the face or background. ControlNet, ROOP, etc., are control-module extensions that belong to this category.
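The trick that lets ControlNet add control without damaging the pretrained model can be sketched simply. In ControlNet, a trainable copy of the U-Net's encoder processes the control input (for example a pose map), and its features are fed back into the frozen U-Net through "zero convolutions": 1×1 convolutions whose weights and biases start at zero. The simplified sketch below models only that connection, with feature maps flattened into lists of per-position channel vectors; all function names are hypothetical.

```python
from typing import List, Tuple

Pixel = List[float]       # channel vector at one spatial position
FeatureMap = List[Pixel]  # spatial positions flattened into a list


def zero_conv_1x1(channels: int) -> Tuple[List[List[float]], List[float]]:
    """Weights and bias of a 1x1 convolution, both initialised to zero."""
    return [[0.0] * channels for _ in range(channels)], [0.0] * channels


def conv_1x1(weight: List[List[float]], bias: List[float], pixel: Pixel) -> Pixel:
    """A 1x1 convolution is a linear map over channels, applied at each position."""
    return [sum(w * p for w, p in zip(row, pixel)) + b for row, b in zip(weight, bias)]


def inject_control(unet_features: FeatureMap, control_features: FeatureMap,
                   weight: List[List[float]], bias: List[float]) -> FeatureMap:
    """Add the projected control features to the frozen U-Net features, position by position."""
    return [[u + c for u, c in zip(u_px, conv_1x1(weight, bias, c_px))]
            for u_px, c_px in zip(unet_features, control_features)]


if __name__ == "__main__":
    unet = [[1.0, 2.0], [3.0, 4.0]]     # 2 positions x 2 channels from the frozen U-Net
    control = [[5.0, 6.0], [7.0, 8.0]]  # features from the control branch
    w, b = zero_conv_1x1(2)
    print(inject_control(unet, control, w, b))  # [[1.0, 2.0], [3.0, 4.0]] (unchanged at start)
```

Because the projection starts at zero, the combined model initially behaves exactly like the original Stable Diffusion; the control signal gains influence only as training updates those weights.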

In addition, we can modify the original U-Net structure of Stable Diffusion and train part or all of the modified diffusion model on a specific dataset. A base diffusion model trained this way can target specific application domains, such as medicine or environmental science.

Stable Diffusion sub-model example. The author of this article downloaded 7 photos of Tom Hanks from the Internet, shown in (a), and then used the DreamBooth extension to train an "AI-TomHanks" sub-model on just these 7 photos. Embedding this sub-model in Stable Diffusion generates AI versions of Tom Hanks pictures, as shown in (b).
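DreamBooth fine-tunes the model on the few subject photos, each paired with a prompt containing a rare identifier token. The published method also adds a prior-preservation term: images of the general class (for example generic portraits of a man), generated by the original model, keep the fine-tuned model from forgetting what that class looks like. A minimal sketch of this combined objective, in the noise-prediction form common in implementations; `dreambooth_loss` and its parameters are hypothetical names, and in real training the predictions would come from the U-Net.

```python
from typing import List


def mse(target: List[float], prediction: List[float]) -> float:
    """Mean squared error between the true noise and the U-Net's predicted noise."""
    return sum((t - p) ** 2 for t, p in zip(target, prediction)) / len(target)


def dreambooth_loss(noise_subject: List[float], pred_subject: List[float],
                    noise_prior: List[float], pred_prior: List[float],
                    prior_weight: float = 1.0) -> float:
    """Subject reconstruction term plus a weighted class-prior-preservation term."""
    subject_term = mse(noise_subject, pred_subject)  # learn the new subject (e.g. the 7 photos)
    prior_term = mse(noise_prior, pred_prior)        # stay faithful to generic class images
    return subject_term + prior_weight * prior_term


if __name__ == "__main__":
    print(dreambooth_loss([1.0, 0.0], [0.0, 0.0], [0.5, 0.5], [0.5, -0.5]))  # 1.0
```

Setting `prior_weight` to zero reduces this to plain fine-tuning on the subject photos, which tends to overfit when only a handful of images are available.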

Besides the U-Net, the VAE and the Conditioning transformer can also be modified and extended, but we will not go into details here.

Comparisons between Midjourney and Stable Diffusion

Based on my own experience, I compare the two on the following main features.

User friendliness: From a user's perspective, I think Midjourney is easier to use than Stable Diffusion; it is easier to generate satisfying pictures with it. As a Stable Diffusion user, you will find that generating a high-quality image requires not only work on the prompt but also a suitable sub-model (also called a checkpoint), whether you are using SD v1.5 or SDXL v1.0, which makes it relatively difficult.

Flexibility: Midjourney and Stable Diffusion offer different approaches to controlling and modifying the final output image during generation. However, I think Midjourney's approach is more intuitive and practical, giving users more flexibility. Although Stable Diffusion provides more complex and richer image-editing capabilities, such as inpainting, outpainting, and upscaling, these are not easy for ordinary users to use in practice.

Functionality diversity: Thanks to its open source and extensibility, Stable Diffusion's functions have been continuously enhanced, making it increasingly popular across application domains in business, education, medicine, and scientific research. For artistic picture generation alone, however, both Midjourney and Stable Diffusion can generate stunning artistic pictures (photography, painting, cartoon, 3D, sculpture, etc.).

Image quality: Both systems can generate high-quality artistic images of all types. However, as mentioned before, Midjourney is slightly better than Stable Diffusion in the aesthetics and quality of the generated images.

Extendibility/Free use: First of all, Midjourney is neither free to use nor open source. For users who want to use generative AI software for free and have some IT background, I strongly recommend installing Stable Diffusion on their own computers, so that they can freely create anything they are interested in.

Photographers ask me: which one should we choose, Midjourney or Stable Diffusion?

My suggestions are as follows. (a) If you are limited by technical skills and/or resources (for example, you don't know how to install and use Stable Diffusion, or your computer lacks sufficient GPU capacity), simply choose Midjourney. Although it requires a subscription fee, once you have learned it you will certainly be able to create great AI artworks, and you can also use it to enhance your photography post-processing workflow.

(b) If you are only interested in generating AI artwork and processing photos, I also recommend using only Midjourney and not considering Stable Diffusion at all.

(c) If you have some IT background and are interested in the technical details of generating a wide range of artistic images, especially if you want to generate personalized images, then I strongly recommend Stable Diffusion, because it is currently the most comprehensive generative AI software for image generation.

“Mountain sunrise”. Generated on Midjourney, by Yan Zhang.

“Silent valley”. Generated on Stable Diffusion, by Yan Zhang.

Mini AI knowledge: AI Winter - refers to the period from 1974 to 2000, when AI research and development, mainly in the United States, was at a low ebb and research funding and investment were significantly reduced. The main reason for the AI winter is that, from the mid-1960s on, a series of large-scale AI research projects failed or made no substantial progress. These include the failure of machine translation and single-layer neural network research projects in the late 1960s; the failure of speech understanding research at Carnegie Mellon University in the mid-1970s; and the stagnation of fifth-generation computer research and large-scale expert system development during the 1980s and 1990s.

Comments

Ulrike Eisenmann (PRO): Stable Diffusion, see https://arxiv.org/pdf/2112.10752: researchers from the University of Munich published the schematic picture in April 2022. It is shown above without a reference, or did I miss it?

Miro Susta (CREW): Dear Ulrike, in this respect they did not use it legally; this picture is licensed by Hiroshima International University.

Ulrike Eisenmann (PRO): Ah, interesting! That does not make it better ;-)

Cristiano Giani (PRO): ...well, if a user starts producing "WOW" images from tomorrow, compared to the "MMM... NOT BAD" ones he produced until yesterday, he has certainly started using Midjourney.... :) :) :)

Miro Susta (CREW): 😊👍😊

Ulrike Eisenmann (PRO): Completely agree with Miro. What does a Midjourney result have to do with photography? OK, it grabs parts of other people's photographs and combines them. Yes, dear Yan, you use prompts very well, but the pictures created do not look like photographs; they look really artificial. I was also very astonished by the title, since AI is strictly forbidden here. Sorry to be so negative about this article.

Miro Susta (CREW): 😊👍😊

Steven T (CREW): Yan Zhang, thank you for the articles and images. I don't understand the technical details of how AI creates images, but it's clear that it will be a big change for visual artistry, much like the big change around 1840 when the invention of photography threatened to replace painting, and the more recent change when digital imaging and Photoshop began to replace film and the darkroom. Painting survived, and photography will too. Perhaps the evolution of technology will push us towards making meaningful images that can't be described with words.

Miro Susta (CREW): Midjourney and Stable Diffusion are two leading generative AI applications that offer highly advanced functionality for creating images. While these two generative AI image creators share a similar focus, they differ significantly in their approach to AI image generation. Their difference boils down to a preference for artistic nuance versus extensive customization: Midjourney is best for creating artistic, visually compelling images; Stable Diffusion is best for extensive customization and technical control over image generation. Both Midjourney and Stable Diffusion are AI applications able to quickly generate images from text prompts. Use of AI for image generation is strictly forbidden on 1x. After reading this article (actually all 3 parts), I am not able to revise my previous comments on it. IMHO, this article is a promotion of AI; is it appropriate to promote something that is banned on the 1x platform? Further, part of this article originates from the (licensed) work of Shigekazu Ishihara, Rueikai Ruo and Keiko Ishihara (Hiroshima International University), without mentioning their names. Dear Yan, please do not be upset about my comment(s) on this article; this is just my personal opinion on the subject. I am an engineer and also a supporter of AI, but definitely not in photography. I wish you and all 1x readers a lovely weekend.

Yan Zhang (CREW): Dear Miro, thanks for reading this article and providing your comments. As an AI researcher, I am definitely an AI supporter. However, if you read the later parts of this article, you will know my position on the relationship between AI and photography. Most importantly, whether you like it or not, AI is here, and its impact on photography is increasing, with most of the photography industry embracing AI. When Adobe officially embedded generative AI into Photoshop for the first time in 2023, it became clear that the traditional meaning of photography is getting complicated, and we may need to redefine it.

Miro Susta (CREW): Dear Yan, I appreciate your answer to my comment very much, and I understand it very well. I know that we can't stop any development, as you said, not in photography either. I'm very sad about it; for me, photography is an art, created by the photographer and his camera. Now with AI we don't need the camera anymore; we can create beautiful pictures with words only. I tried it, and it works very well. I must repeat, IMHO it is pushing real photo work, or photo artwork, offside. I'm not a very good photographer, but photography is a part of my life. I was always very proud when one of my humble photos was published or even awarded, but now I have to reconsider whether to continue, because my chances in competition with AI-touched photos are rather slim. Once more, thank you for this most educative article. Have a very nice weekend.
