For the past five months, I interned at CAI Stack as a Data Science Intern, working mostly on research-oriented projects with the Data Science team. This post marks the end of my internship and summarizes my experience and work over those five months. It has been a brilliant learning experience: I learned about very recent technologies and how they can be used in a commercially feasible, practical manner within reasonable constraints. Beyond the technologies themselves, it taught me to optimize, that is, to get the best results quickly and easily.
My projects mainly revolved around potential use cases for generative networks in fashion. Broadly speaking, the end goal was to implement a virtual try-on network: one that takes an in-shop clothing image and an image of a person as input and outputs an image of that person wearing those clothes. The model I implemented focuses on tops, with complete apparel transfer left as potential future work.
For this, we first needed a segmentation algorithm. Even though open-source state-of-the-art models could have been used, we stuck with robust image processing techniques for segmentation. The idea was to localize the face, estimate the model's skin color from the face region, and use that to divide the image into hair, clothes, skin, and background.
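To make the skin-color idea concrete, here is a minimal sketch in Python with NumPy. It is not the code used during the internship: the function name `skin_mask_from_face`, the `(top, left, height, width)` box format, the median-color estimate, and the `tolerance` threshold are all assumptions chosen for illustration, and the face box is assumed to come from some separate face detector.

```python
import numpy as np

def skin_mask_from_face(image, face_box, tolerance=40.0):
    """Return a boolean (H, W) mask of pixels close to the face's skin color.

    image:     (H, W, 3) uint8 array.
    face_box:  hypothetical (top, left, height, width) box from a face detector.
    tolerance: assumed maximum Euclidean color distance to count as skin.
    """
    top, left, height, width = face_box
    # Sample only the central half of the box to avoid hair and background.
    r0, r1 = top + height // 4, top + 3 * height // 4
    c0, c1 = left + width // 4, left + 3 * width // 4
    patch = image[r0:r1, c0:c1].reshape(-1, 3).astype(np.float64)
    # The median is less sensitive than the mean to eyes, lips, and shadows.
    skin_color = np.median(patch, axis=0)
    # Per-pixel color distance to the estimated skin color.
    distance = np.linalg.norm(image.astype(np.float64) - skin_color, axis=-1)
    return distance < tolerance
```

Pixels outside the skin mask would still need to be split into hair, clothes, and background by further rules, which this sketch does not attempt.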
Once we have the clothing segment, we can compare it geometrically to the in-shop clothing. The goal is to learn a transform on the in-shop clothing that makes it as geometrically similar as possible to the clothing on the model. The image below illustrates this with a grid of six images: the top left is the in-shop clothing, the top right is the clothing segment from the model, the bottom left is the learned transform, and the top middle is the result of applying that transform to the in-shop clothing.
The above examples were generated during training, so the in-shop clothing and the clothing worn by the model are the same garment. This also makes qualitative assessment easier. The network architecture used to learn this transform is briefly described below.
We call the part of the network that learns this transformation the Geometric Matching Module, since it warps the in-shop clothing so that it matches the clothing currently worn by the model geometrically. Some of the results after training are shown below.
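As a rough illustration of what "geometrically similar" means here, the sketch below warps a binary clothing mask and scores its overlap with a target mask. This is not the Geometric Matching Module: the post does not specify the transform family, so a fixed affine map stands in for the learned transform, and the helper names `warp_affine_nearest` and `iou` are hypothetical.

```python
import numpy as np

def warp_affine_nearest(mask, matrix, offset):
    """Warp a 2-D array by inverse mapping: output[p] = mask[matrix @ p + offset].

    Coordinates are (row, col). Source positions that fall outside the array
    are filled with zeros; nearest-neighbor sampling is used.
    """
    height, width = mask.shape
    rows, cols = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    coords = np.stack([rows.ravel(), cols.ravel()]).astype(np.float64)  # (2, H*W)
    src = np.asarray(matrix, dtype=np.float64) @ coords
    src = src + np.asarray(offset, dtype=np.float64).reshape(2, 1)
    src = np.rint(src).astype(int)
    inside = (src[0] >= 0) & (src[0] < height) & (src[1] >= 0) & (src[1] < width)
    out = np.zeros(height * width, dtype=mask.dtype)
    out[inside] = mask[src[0, inside], src[1, inside]]
    return out.reshape(height, width)

def iou(a, b):
    """Intersection over union of two binary masks (1.0 if both are empty)."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0
```

A training loop would adjust the transform parameters so that a score like this improves; in the actual module those parameters are predicted by a network rather than set by hand.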
The instinctive way to apply the new clothing is to simply paste it over the image. As one can see, though, this causes problems: the warped clothing overlaps the hair and hands, and parts of the previous clothing remain visible, which makes the result look very unrealistic. Our solution was the try-on module, an encoder-decoder network that smooths out the image.
The try-on module produces a smoother image that looks much more realistic than simply pasting the clothing over the model. This article deliberately avoids an in-depth description of the work; for a thorough description of the model and training strategy, it is advisable to read the paper here.
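The overlap problem with naive pasting can be shown in a few lines of NumPy. The sketch below is only an illustration of that failure mode, not the try-on module: it swaps the encoder-decoder network for a hand-supplied occluder mask, and the function names and the `occluder_mask` input (marking hair and hands) are assumptions.

```python
import numpy as np

def naive_paste(person, cloth, cloth_mask):
    """Overwrite every pixel under cloth_mask, including hair and hands."""
    out = person.copy()
    out[cloth_mask] = cloth[cloth_mask]
    return out

def occlusion_aware_paste(person, cloth, cloth_mask, occluder_mask):
    """Paste cloth only where it is not covered by hair or hands.

    occluder_mask is a hypothetical boolean (H, W) mask of pixels that should
    stay in front of the clothing.
    """
    visible = cloth_mask & ~occluder_mask
    out = person.copy()
    out[visible] = cloth[visible]
    return out
```

Even with a perfect occluder mask, a hard paste like this leaves sharp seams and cannot remove leftover pieces of the old garment, which is the kind of artifact a learned network is better placed to handle.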
The project described above was one of the many things I worked on during my internship at CAI Stack. Every one of my projects led to immense learning and taught me the importance of being a quick learner. Almost all of the concepts used throughout the internship were new to me, and I had to gain an in-depth understanding of them relatively quickly. Overcoming this challenge was not only enjoyable but also instilled confidence in me.
I am really thankful to all my mentors, who guided me in different areas such as improving code performance, debugging, and writing clean, modular code. I am definitely taking away a lot of lessons to build upon.