I wanted to share the slides from my DevTeach talk on Elasticsearch, where I introduced the Elastic Stack, which consists of Elasticsearch, Logstash and Kibana. I also covered the constraints we had and the design approaches we took.
Hope you enjoy, and expect more Elasticsearch blog posts this year :)
Transcript
1. STORE 2 MILLION OF AUDIT LOGS A DAY INTO ELASTICSEARCH Taswar Bhatti (Microsoft MVP) GEMALTO @taswarbhatti http://taswar.zeytinsoft.com taswar@gmail.com
2. WHO AM I? • 4 years Microsoft MVP • 17 years in the software industry • Currently working as System Architect in the Enterprise Security space (Gemalto) • You may not have heard of Gemalto, but 1/3 of the world's population uses Gemalto; they just don't know it • Gemalto has stacks built in many environments: .NET, Java, Node, Lua, Python, mobile (Android, iOS), ebanking, etc. 9/22/2017
3. AGENDA • Problem we had and wanted to solve with Elastic Stack • Intro to Elastic Stack (Ecosystem) • Logstash • Kibana • Beats • Elasticsearch flow designs that we considered • Future plans for using Elasticsearch
4. QUESTION & POLL • How many of you are using Elastic or some other logging solution? • How do you normally log? Where do you log? • Do you log to a relational database?
5. HOW DO YOU TROUBLESHOOT OR FIND YOUR BUGS • Typically, in a distributed environment one has to go through the logs to find out where the issue is • There could be multiple systems to go through to find which machine/server generated the log, or multiple logs to monitor • You may even monitor firewall logs to find which data center the traffic is routing through • Chuck Norris never troubleshoots; the troubles kill themselves when they see him coming
7. OUR PROBLEM • We had distributed systems (microservices) that would generate many different types of logs, in different data centers • We also had authentication audit logs that had to be secure and stored for 1 year • We generate around 2 million audit log records a day, 4TB with replication • We need to generate reports out of our data for customers • We were still using a monolith solution in some core parts of the application • Growing pains of a successful application • We want to use a centralized scalable logging system for all our
8. FINDING BUGS THROUGH LOGS
9. A LITTLE HISTORY OF ELASTICSEARCH • Shay Banon created Compass in 2004 • Released the first version of Elasticsearch in 2010 • Elasticsearch the company was formed in 2012 • Shay's wife is still waiting for her recipe app
11. ELASTIC STACK
12. ELASTICSEARCH • Written in Java, backed by Lucene • Schema-free, REST & JSON based document store • Search engine • Distributed, horizontally scalable • No database storage; storage is Lucene • Apache 2.0 License
13. COMPANIES USING ELASTIC STACK
14. ELASTICSEARCH INDICES • Elastic organizes documents in indices • Lucene writes and maintains the index files • Elasticsearch writes and maintains metadata on top of Lucene • Example: field mappings, index settings and other cluster metadata
15. DATABASE VS ELASTIC
16. ELASTIC CONCEPTS • Cluster: a cluster is a collection of one or more nodes (servers) • Node: a node is a single server that is part of your cluster, stores your data, and participates in the cluster's indexing and search capabilities • Index: an index is a collection of documents that have somewhat similar characteristics (e.g. Product, Customer, etc.) • Type: within an index, you can define one or more types. A type is a logical category/partition of your index • Document: a document is a basic unit of information that can be indexed • Shard/Replica: an index is divided into multiple pieces called shards; replicas are copies of your shards
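The shard and replica concepts above can be illustrated with a small Python sketch. It builds the kind of settings body Elasticsearch accepts when an index is created (`number_of_shards`, `number_of_replicas`) and shows the general idea of routing a document to a primary shard by hashing a routing value modulo the shard count. Elasticsearch uses its own hash function for this; `zlib.crc32` is only a stand-in, and the shard counts and document ids are made up for illustration.

```python
import json
import zlib

# Settings body for index creation; the counts here are illustrative only.
index_settings = {
    "settings": {
        "number_of_shards": 5,    # primary shards the index is split into
        "number_of_replicas": 1,  # copies kept of each primary shard
    }
}


def pick_shard(routing_value: str, number_of_shards: int) -> int:
    """Map a routing value (by default the document id) to a primary shard.

    The idea is hash(routing) modulo the number of primary shards.
    crc32 is a stand-in hash for this sketch, so the shard numbers
    will not match what a real cluster would choose.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


def total_shard_copies(settings: dict) -> int:
    """Primaries plus their replicas: the shard copies the cluster must place."""
    s = settings["settings"]
    return s["number_of_shards"] * (1 + s["number_of_replicas"])


if __name__ == "__main__":
    print(json.dumps(index_settings, indent=2))
    for doc_id in ("audit-1", "audit-2", "audit-3"):  # hypothetical ids
        print(doc_id, "-> shard", pick_shard(doc_id, 5))
    print("shard copies:", total_shard_copies(index_settings))
```

Because the target shard depends on the number of primary shards, that number is fixed when the index is created, while the number of replicas can be changed later.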
17. ELASTIC NODES • Master Node: controls the cluster • Data Node: data nodes hold data and perform data-related operations such as CRUD, search, and aggregations • Ingest Node: ingest nodes are able to apply an ingest pipeline to a document in order to transform and enrich the document before indexing • Coordinating Node: only routes requests, handles the search reduce phase, and distributes bulk indexing
18. SAMPLE JSON DOCUMENT HTTP CALL JSON DOCUMENT
19. ELASTICSEARCH CLUSTER
20. TYPICAL CLUSTER SHARD & REPLICA
21. SHARD SEARCH AND INDEX
22. DEMO OF ELASTICSEARCH
23. LOGSTASH • Ruby application that runs under JRuby on the JVM • Collects, parses, and enriches data • Horizontally scalable • Apache 2.0 License • Large number of public plugins written by the community https://github.com/logstash-plugins
24. TYPICAL USAGE OF LOGSTASH
26. LOGSTASH INPUT
27. LOGSTASH FILTER
28. LOGSTASH OUTPUT
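The input, filter and output slides describe the three stages of a Logstash pipeline. The following is a conceptual Python sketch of that flow, not Logstash itself and not the configuration shown in the talk: the input stage yields raw log lines, the filter stage parses and enriches them, and the output stage formats events as a newline-delimited body of the kind the Elasticsearch `_bulk` API accepts. The field names, the `datacenter` enrichment, the `_parse_failure` tag and the `audit-logs` index name are all hypothetical. Elasticsearch versions that still use mapping types also expect a type in the action line or in the request URL.

```python
import json
from typing import Iterable, Iterator


def input_stage(lines: Iterable[str]) -> Iterator[str]:
    """Input: yield raw, non-empty log lines (stands in for a file or Beats input)."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def filter_stage(raw_lines: Iterable[str], datacenter: str) -> Iterator[dict]:
    """Filter: parse each JSON line and enrich it with an extra field."""
    for raw in raw_lines:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            # Keep unparseable lines instead of dropping them silently.
            event = {"message": raw, "tags": ["_parse_failure"]}
        event["datacenter"] = datacenter  # hypothetical enrichment
        yield event


def output_stage(events: Iterable[dict], index: str) -> str:
    """Output: build a newline-delimited body for the Elasticsearch _bulk API."""
    parts = []
    for event in events:
        parts.append(json.dumps({"index": {"_index": index}}))  # action line
        parts.append(json.dumps(event))                         # document line
    # A bulk body has to end with a newline.
    return "\n".join(parts) + "\n" if parts else ""


if __name__ == "__main__":
    sample = ['{"level": "info", "message": "login ok"}', "", "not json"]
    events = filter_stage(input_stage(sample), datacenter="dc1")
    print(output_stage(events, index="audit-logs"), end="")
```

The stages are generators, so events stream through one at a time rather than being held in memory, which loosely mirrors how a pipeline processes a continuous flow of logs.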
29. DEMO LOGSTASH
30. BEATS
31. BEATS • Lightweight shippers written in Golang (non-JVM shops can use them) • They follow the Unix philosophy: do one specific thing, and do it well • Filebeat: log files (think of it as tail -f on steroids) • Metricbeat: CPU, memory (like top), Redis, MongoDB usage • Packetbeat: Wireshark-like, uses libpcap to monitor packets (HTTP etc.) • Winlogbeat: Windows event logs to Elastic • Dockbeat: monitoring Docker • Large community; lots of other beats offered as open source
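To make the "tail -f on steroids" description of Filebeat more concrete, here is a minimal Python sketch of the underlying idea: remember how far a log file has been read and, on each poll, return only the complete lines appended since then. This is an illustration of the technique, not Filebeat's implementation; the function name `read_new_lines` is hypothetical, and a real shipper would also persist the offset and handle file rotation more carefully.

```python
import os
import tempfile
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return the complete lines appended since `offset`, plus the new offset."""
    # If the file shrank (for example it was truncated), start over.
    if os.path.getsize(path) < offset:
        offset = 0
    lines = []
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in f:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; pick it up next time
            lines.append(raw.decode("utf-8").rstrip("\r\n"))
            offset += len(raw)
    return lines, offset


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "app.log")
        with open(log_path, "w", encoding="utf-8") as f:
            f.write('{"message": "first"}\n{"message": "sec')
        lines, pos = read_new_lines(log_path, 0)
        print(lines, pos)  # only the first, complete line is shipped
        with open(log_path, "a", encoding="utf-8") as f:
            f.write('ond"}\n')
        lines, pos = read_new_lines(log_path, pos)
        print(lines, pos)  # the second line arrives once it is complete
```

Tracking a byte offset rather than a line count means a restart can resume exactly where reading stopped, provided the offset is stored somewhere durable.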
33. FILEBEAT
34. X-PACK • Elastic's commercial offering (this is one of the ways they make money) • X-Pack is an Elastic Stack extension that bundles: • Security (HTTPS to Elastic, password to access Kibana) • Alerting • Monitoring • Reporting • Graph capabilities • Machine Learning
36. KIBANA • Visual application for Elasticsearch (JS, Angular, D3) • Powerful frontend for dashboards that visualize index information from Elasticsearch • Historical data to form charts, graphs, etc. • Real-time search for index information
38. DEMO KIBANA
39. DESIGNS WE WENT THROUGH • We started with a simple design to measure throughput • One instance of Logstash and one instance of Elasticsearch with Filebeat
40. DOTNET CORE APP • We used a .NET Core application to generate logs • Serilog to write the logs in JSON format to a file • Filebeat was installed on the Linux machine to ship the logs to Logstash
41. PERFORMANCE ELASTIC • 250 log items per second for 30 minutes
42. OVERVIEW
43. LOGSTASH
44. ELASTIC SEARCH RUN TWO • 1000 logs per second, run for 30 minutes
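For a sense of scale, the two test runs can be compared with the production figure from the problem statement (around 2 million audit records a day). The arithmetic below is a back-of-the-envelope sketch; it assumes a constant rate during each run and an even spread over the day for the production average, which real traffic will not follow.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def total_events(rate_per_second: int, minutes: int) -> int:
    """Events produced by a run at a constant rate for the given duration."""
    return rate_per_second * minutes * 60


def average_rate(events_per_day: int) -> float:
    """Average events per second if a day's volume were spread evenly."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print("run one:", total_events(250, 30), "events")
    print("run two:", total_events(1000, 30), "events")
    print("2M/day average:", round(average_rate(2_000_000), 1), "events/s")
```

Under these assumptions run one produces 450,000 events and run two 1,800,000, so the second run pushes close to a full day's volume through the pipeline in 30 minutes, while the daily average works out to roughly 23 events per second. Peaks will be higher than the average, which is presumably why the runs test well above it.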
45. PERFORMANCE
46. OTHER DESIGNS
47. WHAT WE ARE GOING WITH FOR NOW, UNTIL...
48. CONSIDERATIONS OF DATA • Indexing by day makes sense in some cases • In others you may want to index by size instead (Black Friday has more traffic than other days); Elasticsearch doesn't like it when shards are not balanced • Don't index everything; if you are not going to search on specific fields, mark them as text
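Indexing by day usually comes down to a naming convention: each day's documents go into an index whose name carries the date, similar to the date-suffixed names Logstash generates by default. The Python sketch below shows one way to compute such names and to list the indices a date-range query would need to touch. The `audit` prefix and the field names are hypothetical. The mapping fragment at the end shows one option Elasticsearch mappings offer for fields that will never be searched, the `index` parameter set to false; whether that matches what the talk intended is an assumption.

```python
from datetime import date, timedelta
from typing import List


def daily_index_name(prefix: str, day: date) -> str:
    """Name of the index holding one day's documents, e.g. audit-2017.09.22."""
    return f"{prefix}-{day:%Y.%m.%d}"


def indices_for_range(prefix: str, start: date, end: date) -> List[str]:
    """Daily index names covering start..end inclusive."""
    days = (end - start).days
    return [daily_index_name(prefix, start + timedelta(days=n))
            for n in range(days + 1)]


# Mapping properties fragment with hypothetical field names.
# "index": False keeps the value in the stored document but makes it unsearchable.
properties = {
    "user": {"type": "keyword"},
    "raw_payload": {"type": "text", "index": False},
}

if __name__ == "__main__":
    print(daily_index_name("audit", date(2017, 9, 22)))
    print(indices_for_range("audit", date(2017, 9, 30), date(2017, 10, 1)))
```

With daily indices, a retention rule such as the one-year requirement for audit logs can be applied by dropping whole indices rather than deleting individual documents.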
49. FUTURE CONSIDERATIONS • Investigate Elasticsearch machine learning • Elasticsearch with Kafka for cross data center replication
50. THANK YOU & OPEN TO QUESTIONS • Questions??? • Contact: Taswar@gmail.com • Blog: http://Taswar.zeytinsoft.com • Twitter: @taswarbhatti • LinkedIn (find me and add me)