Research has shown that the human lip and its movements are a rich source of information about both speech content and a speaker’s identity. Lip image segmentation is therefore a fundamental step in many lip-reading and visual speaker authentication systems. Owing to variations in lip color and lighting conditions, and especially the complex appearance of an open mouth, accurate lip region segmentation remains a challenging task. To address this problem, this paper proposes a new fuzzy deep neural network whose architecture integrates fuzzy units with traditional convolutional units. The convolutional units extract discriminative features at multiple scales, providing comprehensive information for pixel-level lip segmentation, while the fuzzy logic modules handle various kinds of uncertainty to yield a more robust segmentation result. An end-to-end training scheme then learns the optimal parameters of both the fuzzy and the convolutional units jointly. A dataset containing more than 48,000 images of various speakers, captured under different lighting conditions, was used to evaluate lip segmentation performance. The experimental results show that the proposed method achieves state-of-the-art performance compared with other algorithms.
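The abstract does not specify the internal form of the fuzzy units. As an illustration only, one common way such units are built in fuzzy deep networks is to map each convolutional feature value to a membership degree through a learnable Gaussian membership function, then combine the memberships with the original features. The sketch below assumes this Gaussian form and a simple elementwise fusion; the function name `fuzzy_unit`, the per-channel parameterization, and the fusion rule are all hypothetical choices, not the paper's actual design.

```python
import numpy as np

def fuzzy_unit(features, centers, sigmas):
    """Map each feature value to a fuzzy membership degree via a
    Gaussian membership function (an assumed, commonly used form;
    the paper's exact formulation may differ).

    features: (C, H, W) feature maps from a convolutional unit
    centers, sigmas: (C,) learnable per-channel membership parameters
    """
    c = centers[:, None, None]  # broadcast over spatial dims
    s = sigmas[:, None, None]
    # membership degree in (0, 1]: 1 at the center, decaying with distance
    return np.exp(-((features - c) ** 2) / (s ** 2 + 1e-8))

# Toy example: 2 channels, 2x2 spatial resolution.
feats = np.array([[[0.0, 1.0],
                   [2.0, 3.0]],
                  [[1.0, 1.0],
                   [1.0, 1.0]]])
centers = np.array([1.0, 1.0])
sigmas = np.array([1.0, 1.0])

mem = fuzzy_unit(feats, centers, sigmas)
# One simple fusion of fuzzy and convolutional outputs: elementwise product,
# which attenuates features whose values are far from the learned centers.
fused = feats * mem
```

In an end-to-end setting, `centers` and `sigmas` would be trained by backpropagation alongside the convolutional weights, which is consistent with the joint training scheme the abstract describes.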