Good authentication performance and liveness detection are two key requirements in many authentication systems. To avoid replay attacks, a novel visual speaker authentication scheme with random prompt texts is proposed. Compared with the fixed-password scenario, visual speaker authentication with random prompt texts is much more challenging because it is impossible to ask the client to pronounce every possible prompt text to be used as training samples. To solve this problem, a new deep convolutional neural network is proposed in this paper. It has three functional parts, namely, the lip feature network, the identity network, and the content network. In the lip feature network, a series of 3D residual units is adopted, which can depict the static and dynamic characteristics of the lip biometrics comprehensively. The identity network and the content network are each designed around the distinguishing features of the identity and content authentication tasks. An end-to-end, multi-task learning scheme is proposed which can optimize the weights of all three networks simultaneously. Experiments have been carried out to evaluate the performance of the proposed network under both the fixed-password and the random prompt texts scenarios. The experimental results show that the proposed approach achieves superior performance in the fixed-password scenario compared with several state-of-the-art approaches. Furthermore, it also achieves satisfactory authentication results in the random prompt texts scenario and thus provides a reliable solution for user authentication where liveness is guaranteed.
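
To illustrate why a random prompt text defeats replay attacks, the following is a minimal sketch of the accept/reject logic, not the authors' implementation. It assumes the identity network and the content network each yield a score in [0, 1]; the digit vocabulary, the prompt length, the function names `make_prompt` and `authenticate`, and both thresholds are hypothetical choices made only for illustration.

```python
import secrets

# Hypothetical prompt alphabet; the abstract does not specify one.
VOCAB = "0123456789"


def make_prompt(length=6):
    """Draw a fresh random prompt for each authentication attempt.

    A video recorded for an earlier prompt will most likely not match
    a newly drawn one, which is what makes replay attacks fail.
    """
    return "".join(secrets.choice(VOCAB) for _ in range(length))


def authenticate(identity_score, content_score,
                 id_threshold=0.5, content_threshold=0.5):
    """Accept only if both the speaker and the spoken prompt are verified.

    identity_score: assumed output of the identity network in [0, 1].
    content_score: assumed output of the content network in [0, 1],
        i.e. how well the lip video matches the issued prompt.
    The thresholds are placeholders, not values from the paper.
    """
    return (identity_score >= id_threshold
            and content_score >= content_threshold)
```

Under these assumptions, a replayed video of the genuine client would score high on identity but low on content for a newly drawn prompt, so `authenticate(0.9, 0.1)` rejects it, whereas a live genuine client uttering the issued prompt would pass both checks.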