Recent research on mid-air gesture interaction for TV control has aimed to standardize these gestures. To this end, researchers have developed a design approach that relies on agreement rates among gestures elicited from end users. In contrast to this agreement-based approach, a recent study has shown that the most common mid-air gestures might not be the most favored ones. In addition, researchers have claimed that agreement studies ignore users’ cultural and conceptual biases. Mid-air gesture interaction research may therefore benefit from a qualitative analysis of users’ mid-air gesture set design processes. To this end, this study investigated users’ task conceptualizations and mental models. A simulated mid-air gesture-based video streaming experiment was conducted with 10 participants (4 female, 6 male). Through the lens of Conceptual Metaphor Theory, the study investigated the similarities between the participants’ conceptual representations. The findings demonstrated that the participants’ conceptualizations had clear references to their bodies and to their prior physical experiences with objects, which were reflected as linguistic representations of orientational and ontological metaphors in the participants’ explanations. Further findings addressed intersections between the participants’ mental models.