Below are some questions that have recently been asked at top IT companies. If you are appearing for a Hadoop Administrator interview, you should definitely be prepared for them.
Comment below with what you think the answers to these questions could be.
What is your cluster configuration?
How many master and slave nodes are there in your cluster?
What are the components in your cluster?
Do you have a NoSQL database?
What is Hive?
When would you use HBase, and when Hive?
Do you have any experience with shell scripting?
Have you ever come across garbage collection or out-of-memory errors?
How much memory have you given to your data nodes?
How many jobs can you run in your cluster simultaneously?
How much data is ingested into your cluster every day?
What is Kafka? What is Flume?
What is the difference between Kafka and Flume?
If you shut down your cluster, will you still be able to reach your Cloudera Manager page?
Suppose you shut down the Cloudera agent completely, and when you go to the Cloudera page it gives you a "page not found" error. How will you bring up Cloudera and your cluster?
What is performance tuning?
What if a job fails? How will you troubleshoot it?
What type of scheduling are you guys using?
How many queues are there for scheduling jobs?
How do you decide which job will go to which queue?
Explain in depth how HDFS works in HA mode.
Who has write permission on the JournalNodes?
What kind of fencing have you configured on your cluster?
How does epoch fencing work?
How does SSH fencing work?
How does shell/bash fencing work?
What is .bashrc? What is written there?
What variables are set in .bashrc?
Which file is used for user management?
What are the two important files used for user management for Hadoop?
If I have to make changes for 1,000 users on a Hadoop cluster, which user management file will I use? (This is a pure Linux file.)
Explain in detail how YARN works.
How is a container created, and on what basis?
How are containers allocated, how do we decide container size based on the job and the available RAM, and how are YARN slots allocated?
When a job is about to run, who provides the client with the details of the resource configuration, or of where on the DataNodes to run the job?
What scheduling is used in YARN?
What algorithm does YARN scheduling use?
What types of schedulers are there?
Explain how Kafka works.
What errors did you face during Kafka installation?
What details do you have to take care of while installing Kafka?
Which Linux messaging services have you used?
Is your cluster in an HA environment?
Which components need to have HA configured?
What technologies do you use for Hadoop security?
Which tool have you used for Hadoop cluster governance?
Have you worked with or used Atlas?
Have you configured Sentry or Ranger?
Give an example of row-level filtering done in Ranger, such as on files, tables, and columns.
For example, a user in the US should be able to view one particular view.
How was the rule written?
What logic was used, and what are the different kinds of logic used for writing these security filters?
Tell me about the filter you wrote for security at the database level.
There are 5 locations, all in the same group. How do you write the filter to make sure that a person from one location can access the data of a specific location, even though they are all in the same group?
Have you set up a cluster from scratch?
What is the biggest cluster you have set up?
Explain RAID.
What is the combination of RAID 1 and RAID 0?
How would RAID 0 give redundancy?
Does striping happen in RAID 0?
How do you use RAID in the operating system?
Explain LVM
Explain how the ext3 file system works.
Explain how any Linux file system works.
Have you worked on server-level configuration on Linux?
Do you know DNS?
Explain how DNS works.
How do you get hostnames resolved in a Hadoop cluster?
What is used for name resolution in a small cluster? (/etc/hosts)
For a big cluster, we use Amazon DNS or create our own DNS server.
Explain how this DNS server is configured.
Did you work on NFS?
Have you worked with Samba storage?
Suppose you are given a bare-metal box and have to install a Hadoop cluster on it. What details do you have to take care of? Explain in detail everything you will do to set up the operating system and then Hadoop.
What are the prerequisites at the operating-system level?
What is swappiness
What is swap space and how it is utilized in Linux
What is the full form of SELinux?
What is the full form of THP?
In Red Hat, the NTP service has been replaced by something else with some greater features. What is it called?
What is an FQDN?
Is it one of the prerequisites for Hadoop?
Explain what SELinux is and what kind of security it provides to the kernel.
How does SELinux take care of security at the user level?
What kind of warnings and errors do you get when you disable SELinux?
If you install Hadoop and then enable SELinux, what errors do you get in Cloudera?
What best practices are followed to keep a Hadoop cluster up and running?
Suppose you need to set up a 50-node cluster. How much time will you take to set up the complete Hadoop cluster, including the operating system?
How will you deploy the OS on 10 nodes?
What common issues have you troubleshot on the cluster, and how did you troubleshoot them?
What are the counters in Hadoop jobs?
Hadoop Interview Questions, reviewed by Admin on March 29, 2018