Getting Social with Video
What we can learn during a video call is far greater than what is provided by audio alone.
Meeting face-to-face is usually much better than a phone call or audio conference, and video conferencing provides that face-to-face conversation. Video conferencing is also easily justified because it can reduce travel expenses and eliminate travel time, making users more productive. But beyond cost, I argue that video has benefits over audio conferencing because it conveys more information than just a picture with sound.
I record many podcasts face-to-face at conferences. Face-to-face conversations are easier to conduct because I can read the facial expressions and body language of the other individual. I can also take in the background sounds and visuals, tuning them out or responding to them accordingly.

What You See Is Not Sound
When you look at a person or group of people, you are aware of their behavior both consciously and unconsciously. You and the other party are conveying more than what is spoken. Robert Masters, in his article "Compassionate Wrath: Transpersonal Approaches to Anger," presented his view of affect, feeling, and emotion:
"Affect is an innately structured, non-cognitive evaluative sensation that may or may not register in consciousness; feeling is affect made conscious, possessing an evaluative capacity that is not only physiologically based, but that is often also psychologically (and sometimes relationally) oriented; and emotion is psychosocially constructed, dramatized feeling."
When watching others in a video conversation, we can observe affect, feeling, and emotion. What we can learn during a video call is far greater than what is provided by audio alone.
The two pictures below are taken from "Emotional and Social Signals: A Neglected Frontier in Multimedia Computing," written by Hatice Gunes and Hayley Hung. The article investigates computer processing of social signals.
Here, I'm using some of the content to analyze what people see and do during a video conversation. Inspect the picture below to see what parts of video may not be provided in an audio conference.
- Roles -- Either or both participants can be dominant or submissive.
- Gesture -- Emotion and statements can be made through hand, head, and body gestures to emphasize a point or to intimidate the other participant.
- Eye contact and position -- If eye contact is not present, then the person may be avoiding responsibility for a statement, distracted from the conversation, or expressing emotion such as rolling their eyes.
- Posture -- Sitting up straight can convey confidence, showing that the person is in control of themselves. A slouching posture, on the other hand, can demonstrate disinterest or disagreement.
- Distance to the camera -- Just as some postures may not inspire confidence, staying farther away from the camera may indicate a lack of confidence or a desire not to be part of the video conversation.
- Emotional state -- The person could be relaxed, tense, scared, uncomfortable, intimidated, aggressive, assertive, or virtually anything in between.
- Facial expressions -- A blank face, a poker face, is hard to read. Most of us do not have that type of control all of the time, maybe ever. The changing facial expressions that occur during a conversation all provide insight into the other person's state of mind.
The context surrounding the conversation, both visible and invisible, can further augment the social signals. There are three contexts that surround the video conversation:
- Social context -- This covers the personalities of the conversation parties, their gender, possibly nationality, attractiveness, and whether they are alone or with others who may or may not participate in the conversation. Elements of social context include whether the participants know each other, how well, and what their previous emotional relationship is (i.e., do they like and respect each other). Even the way participants are dressed can affect the conversation.
- Situational context -- Where the parties are located (classroom, office, indoors or outdoors, hotel room, lobby, conference/huddle room, convention center) will influence what is said, conversation security, and attitude. Does the conversation cover a past discussion, a new one, or a future activity?
- Affective context -- This is the state of the user: busy, distracted, tired, agitated, awake, too much coffee, or just not interested. The same can be said of those surrounding the speaker: Is the surrounding mood loud, quiet, bright, dimly lit, a lot of physical motion, or just calm?
There are people who can pick up on many of the social cues I have attributed to video conversations, but most can't. Video clearly adds to the comprehension of a conversation. The "Emotional and Social Signals" paper mentioned at the beginning of this blog explores the idea that computer recognition of these cues is on the horizon. This may help some people, but I think the human interpretation of social signals will still be superior.
We may, in the future, see training offered for video participants to learn social expressions and conditions when watching others. If this happens, you can expect training for the opposite behavior as well: training so that people reveal only the social cues they want to provide.